Abstract
Electronic databases, from phone to e-mail logs, currently provide detailed records of human communication patterns, offering novel avenues to map and explore the structure of social and communication networks. Here we examine the communication patterns of millions of mobile phone users, allowing us to simultaneously study the local and the global structure of a society-wide communication network. We observe a coupling between interaction strengths and the network's local structure, with the counterintuitive consequence that social networks are robust to the removal of the strong ties but fall apart after a phase transition if the weak ties are removed. We show that this coupling significantly slows the diffusion process, resulting in dynamic trapping of information in communities, and find that, when it comes to information diffusion, weak and strong ties are both simultaneously ineffective.
Keywords: complex systems, complex networks, diffusion and spreading, phase transition, social systems
Uncovering the structure and function of communication networks has always been constrained by the practical difficulty of mapping out interactions among a large number of individuals. Indeed, most of our current understanding of communication and social networks is based on questionnaire data, reaching typically a few dozen individuals and relying on the individual's opinion to reveal the nature and the strength of the ties. The fact that currently an increasing fraction of human interactions are recorded, from e-mail (1–3) to phone records (4), offers unprecedented opportunities to uncover and explore the large scale characteristics of communication and social networks (5). Here we take a first step in this direction by exploiting the widespread use of mobile phones to construct a map of a society-wide communication network, capturing the mobile interaction patterns of millions of individuals. The data set allows us to explore the relationship between the topology of the network and the tie strengths between individuals, information that was inaccessible at the societal level before. We demonstrate a local coupling between tie strengths and network topology, and show that this coupling has important consequences for the network's global stability if ties are removed, as well as for the spread of news and ideas within the network.
A significant portion of a country's communication network was reconstructed from 18 weeks of all mobile phone call records among ≈20% of the country's entire population, 90% of whose inhabitants had a mobile phone subscription [see supporting information (SI) Appendix]. Whereas a single call between two individuals during 18 weeks may not carry much information, reciprocal calls of long duration between two users serve as a signature of some work-, family-, leisure-, or service-based relationship. Therefore, to translate the phone log data into a network representation that captures the characteristics of the underlying communication network, we connected two users with an undirected link if there had been at least one reciprocated pair of phone calls between them (i.e., A called B, and B called A) and defined the strength, wAB = wBA, of a tie as the aggregated duration of calls between users A and B. This procedure eliminates a large number of one-way calls, most of which correspond to single events, suggesting that they typically reach individuals that the caller does not know personally. The resulting mobile call graph (MCG) (4) contains N = 4.6 × 10^6 nodes and L = 7.0 × 10^6 links, the vast majority (84.1%) of these nodes belonging to a single connected cluster [giant component (GC)]. Given the very large number of users and communication events in the database, we find that the statistical characteristics of the network and the GC are largely saturated, observing little difference between a two- or a three-month-long sample. Note that the MCG captures only a subset of all interactions between individuals, a detailed mapping of which would require face-to-face, e-mail, and land line communications as well. Yet, although mobile phone data capture just a slice of communication among people, research on media multiplexity suggests that the use of one medium for communication between two people implies communication by other means as well (6). 
Furthermore, in the absence of directory listings, the mobile phone data are skewed toward trusted interactions (that is, people tend to share their mobile numbers only with individuals they trust). Therefore, the MCG can be used as a proxy of the communication network between the users. It is of sufficient detail to allow us to address the large-scale features of the underlying human communication network and the major trends characterizing it.
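The construction rule described above (keep a link only if calls were reciprocated, and use the aggregated call duration as the tie strength) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used on the data: the function name build_call_graph, the record layout (caller, callee, duration in seconds), and the toy log are all hypothetical.

```python
from collections import defaultdict

def build_call_graph(calls):
    """Build an undirected weighted graph from call records.

    calls: iterable of (caller, callee, duration_seconds) tuples
           (hypothetical record layout).
    Returns {frozenset({a, b}): aggregated duration} containing only
    pairs with at least one call in each direction.
    """
    directed = defaultdict(float)  # (caller, callee) -> summed duration
    for caller, callee, duration in calls:
        if caller != callee:
            directed[(caller, callee)] += duration

    weights = {}
    for (a, b), dur in directed.items():
        if (b, a) in directed:  # keep the pair only if B also called A
            # w_AB = w_BA: total airtime in both directions
            weights[frozenset((a, b))] = dur + directed[(b, a)]
    return weights

# Toy call log (made-up data): A-B and C-D are reciprocated, A-C is one-way.
calls = [
    ("A", "B", 60), ("B", "A", 120), ("A", "B", 20),
    ("A", "C", 30),
    ("C", "D", 10), ("D", "C", 5),
]
mcg = build_call_graph(calls)
```

In this toy log the one-way call from A to C is discarded, whereas the A-B tie accumulates the duration of all three calls between the two users.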
Results
The MCG has a skewed degree distribution with a fat tail (Fig. 1A), indicating that although most users communicate with only a few individuals, a small minority talks with dozens (4, 7). If the tail is approximated by a power law, which appears to fit the data better than an exponential distribution, the obtained exponent γk = 8.4 is significantly higher than the value observed for landlines (γ = 2.1 for the in-degree distribution; see refs. 8 and 32). For such a rapidly decaying degree distribution, the hubs are few, and therefore many properties of traditional scale-free networks (33), from anomalous diffusion (9) to error tolerance (10), are absent. This decay is probably rooted in the fact that institutional phone numbers, corresponding to the vast majority of large hubs in the case of land lines, are absent, and in contrast with e-mail, in which a single e-mail can be sent to many recipients, resulting in well-connected hubs (1), a mobile phone conversation typically represents a one-to-one communication. The tie strength distribution is broad (Fig. 1B), however, decaying with an exponent γw = 1.9, so that although the majority of ties correspond to a few minutes of airtime, a small fraction of users spend hours chatting with each other. This finding is rather unexpected, given that fat-tailed tie strength distributions have been observed mainly in networks characterized by global transport processes, such as the number of passengers carried by the airline transportation network (11), the reaction fluxes in metabolic networks (12), or packet transfer on the Internet (13), in which case the individual fluxes are determined by the global network topology. An important feature of such global flow processes is local conservation: All passengers arriving to an airport need to be transported away, each molecule created by a reaction needs to be consumed by some other reaction, or each packet arriving to a router needs to be sent to other routers. 
Although the main purpose of the phone is information transfer between two individuals, such local conservation that constrains or drives the tie strengths is largely absent, making any relationship between the topology of the MCG and local tie strengths less than obvious.
Complex networks often organize themselves according to a global efficiency principle, meaning that the tie strengths are optimized to maximize the overall flow in the network (13, 14). In this case the weight of a link should correlate with its betweenness centrality, which is proportional to the number of shortest paths between all pairs of nodes passing through it (refs. 13, 15, and 16, and S. Valverde and R. V. Sole, unpublished work). Another possibility is that the strength of a particular tie depends only on the nature of the relationship between two individuals and is thus independent of the network surrounding the tie (dyadic hypothesis). Finally, the much studied strength of weak ties hypothesis (17–19) states that the strength of a tie between A and B increases with the overlap of their friendship circles, resulting in the importance of weak ties in connecting communities. The hypothesis leads to high betweenness centrality for weak links, which can be seen as the mirror image of the global efficiency principle.
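For reference, the link betweenness centrality invoked above is often written in the following form. The normalization by the total number of shortest paths is the common textbook convention and is an assumption here, as the text only states proportionality to the number of shortest paths through the link; the symbols σ_st and σ_st(ij) are introduced purely for illustration.

```latex
b_{ij} \;\propto\; \sum_{s \neq t} \frac{\sigma_{st}(ij)}{\sigma_{st}}
```

Here σ_st denotes the number of shortest paths between nodes s and t, and σ_st(ij) the number of those paths that pass through the link between i and j.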
In Fig. 2A, we show the network in the vicinity of a randomly selected individual, where the link color corresponds to the strength of each tie. It appears from this figure that the network consists of small local clusters, typically grouped around a high-degree individual. Consistent with the strength of weak ties hypothesis, the majority of the strong ties are found within the clusters, indicating that users spend most of their on-air time talking to members of their immediate circle of friends. In contrast, most links connecting different communities are visibly weaker than the links within the communities. As a point of comparison, when we randomly permute the link strengths among the connected user pairs (Fig. 2B), in what would be consistent with the dyadic hypothesis, we observe dramatically more weak ties within the communities and more strong ties connecting distinct communities. Finally, even more divergent with the observed data (Fig. 2A), we illustrate what the world would be like if, as predicted by the global efficiency principle and betweenness centrality, intercommunity ties (“bridges”) were strong and intracommunity ties (“local roads”) weak (Fig. 2C). To quantify the differences observed in Fig. 2, we measured the relative topological overlap of the neighborhood of two users vi and vj, representing the proportion of their common friends (20) Oij = nij/((ki − 1) + (kj − 1) − nij), where nij is the number of common neighbors of vi and vj, and ki (kj) denotes the degree of node vi (vj). If vi and vj have no common acquaintances, then we have Oij = 0, the link between i and j representing potential bridges between two different communities. If i and j are part of the same circle of friends, then Oij = 1 (Fig. 1C). The dyadic hypothesis implies the absence of a relationship between the local network topology and weights, and, indeed, we find that permuting randomly the tie strengths between the links results in Oij that is independent of wij (Fig. 1D). 
We find that under the global efficiency principle 〈O〉b decreases with the betweenness centrality bij, indicating that on average the links with the highest betweenness centrality bij have the smallest overlap. For the real communication network, in contrast, 〈O〉w increases as a function of the percentage of links with weights smaller than w, demonstrating that the stronger the tie between two users, the more their friends overlap, a correlation that is valid for ≈95% of the links (Fig. 1D). This result is broadly consistent with the strength of weak ties hypothesis, offering its first societal-level confirmation. It suggests that tie strength is, in part, driven by the network structure in the tie's immediate vicinity. This suggestion is in contrast with a purely dyadic view, according to which the tie strength is determined only by the characteristics of the individuals it connects, or the global view, which asserts that tie strength is driven by the whole network topology.
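The relative topological overlap Oij = nij/((ki − 1) + (kj − 1) − nij) defined above can be computed directly from neighbor sets. The following is a minimal sketch, assuming the graph is stored as a dictionary of neighbor sets; the function name and the toy graph are hypothetical, and returning 0 for an isolated link (where the denominator vanishes) is a convention chosen here, not one stated in the text.

```python
def topological_overlap(adj, i, j):
    """Relative neighborhood overlap O_ij of the linked nodes i and j.

    adj: dict mapping each node to the set of its neighbors
         (assumed to contain the link i-j).
    """
    n_ij = len(adj[i] & adj[j])  # number of common neighbors
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    # Assumption: an isolated link (k_i = k_j = 1) gets overlap 0.
    return n_ij / denom if denom > 0 else 0.0

# Toy graph: a triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
o_ab = topological_overlap(adj, "A", "B")  # both endpoints share their only other neighbor
o_ac = topological_overlap(adj, "A", "C")  # C has one neighbor that A lacks
o_cd = topological_overlap(adj, "C", "D")  # no common neighbors: a potential bridge
```

Averaging such values over links binned by tie strength would give a curve of the kind described for 〈O〉w.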
To understand the systemic or global implications of this local relationship between tie strength and network structure, we explore the network's ability to withstand the removal of either strong or weak ties. To evaluate the impact of removing ties, we measure the relative size of the giant component Rgc(f), providing the fraction of nodes that can all reach each other through connected paths as a function of the fraction of removed links, f. We find that removing ties in rank order from the weakest (or smallest overlap) to the strongest (greatest overlap) leads to the network's sudden disintegration at fw = 0.8 (fO = 0.6) (Fig. 3 A and B). In contrast, removing first the strongest (or highest overlap) ties will shrink the network but will not precipitously break it apart (Fig. 3 A and B). The precise point at which the network disintegrates can be determined by monitoring S̃ = Σ_{s<s_max} n_s s^2/N, where n_s is the number of clusters containing s nodes and the sum runs over all cluster sizes s smaller than that of the largest cluster, s_max. According to percolation theory, if the network collapses because of a phase transition at fc, then S̃ diverges as f approaches fc (21, 22). Indeed, we find that S̃ develops a peak if we start with the weakest (or smallest overlap) links (Fig. 3 C and D). Finite size scaling, a well established technique for identifying the phase transition, indicates that the values of the critical points are fcO(∞) = 0.62 ± 0.05 and fcw(∞) = 0.80 ± 0.04 for the removal of the weak ties, but there is no phase transition when the strong ties are removed first.
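The link-removal experiment can be illustrated with a small sketch that removes a fraction f of links in rank order and then measures Rgc and S̃ as defined above. This is a toy illustration under stated assumptions rather than the analysis pipeline behind Fig. 3: the function names, the union-find helper, and the example graph (two triangles joined by one weak bridge) are all hypothetical.

```python
from collections import Counter

def cluster_sizes(nodes, edges):
    """Sizes of connected clusters, found with a simple union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return list(Counter(find(v) for v in nodes).values())

def percolation_stats(nodes, weighted_edges, f, weakest_first=True):
    """Remove a fraction f of links in rank order; return (R_gc, S_tilde)."""
    ranked = sorted(weighted_edges, key=lambda e: e[2],
                    reverse=not weakest_first)
    n_removed = round(f * len(ranked))
    kept = [(a, b) for a, b, _ in ranked[n_removed:]]
    sizes = cluster_sizes(nodes, kept)
    n = len(nodes)
    s_max = max(sizes)
    r_gc = s_max / n  # relative size of the largest cluster
    # S_tilde sums s^2 over all clusters strictly smaller than the largest.
    s_tilde = sum(s * s for s in sizes if s < s_max) / n
    return r_gc, s_tilde

# Toy network: two strongly tied triangles joined by a single weak bridge.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
         (4, 5, 10), (5, 6, 10), (4, 6, 10),
         (3, 4, 1)]
weak_first = percolation_stats(nodes, edges, f=1 / 7, weakest_first=True)
strong_first = percolation_stats(nodes, edges, f=1 / 7, weakest_first=False)
```

In this toy case, removing the single weakest link splits the network in half, whereas removing one strong link leaves it fully connected, which mirrors the qualitative contrast described in the text.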
Taken together, these results document a fundamental difference between the global role of the strong and weak ties in social networks: The removal of the weak ties leads to a sudden, phase transition-driven collapse of the whole network. In contrast, the removal of the strong ties results only in the network's gradual shrinking but not its collapse. This finding is somewhat unexpected, because in most technological and biological networks the strong ties are believed to play a more important structural role than the weak ties, and in such systems the removal of the strong ties leads to the network's collapse (10, 23–25). This counterintuitive finding underlies the distinct role weak and strong ties play in a social network: Given that the strong ties are predominantly within the communities, their removal will only locally disintegrate a community but not affect the network's overall integrity. In contrast, the removal of the weak links will delete the bridges that connect different communities, leading to a phase transition driven network collapse.
The purpose of the mobile phone is information transfer between two individuals. Yet, given that the individuals are embedded in a social network, mobile phones allow news and rumors to diffuse beyond the dyad, occasionally reaching a large number of individuals, a much studied diffusion problem in both sociology (26) and network science (7). However, most of our current knowledge about information diffusion is based on analyses of unweighted networks, in which all tie strengths are considered equal (26). To see whether the observed local relationship between the network topology and tie strength affects global information diffusion, at time 0 we infected a randomly selected individual with some novel information. We assumed that at each time step, each infected individual, vi, can pass the information to his/her contact, vj, with effective probability Pij = xwij, where the parameter x controls the overall spreading rate. (Note that the qualitative nature of results is independent of the choice of x; see SI Appendix for details.) Therefore, the more time two individuals spend on the phone, the higher the chance that they will pass on the monitored information. The spreading mechanism is similar to the susceptible-infected model of epidemiology in which recovery is not possible, i.e., an infected individual will continue transmitting information indefinitely (27). As a control, we considered spreading on the same network, but replaced all tie strengths with their average value, resulting in a constant transmission probability for all links.
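The spreading rule described above can be sketched as a discrete-time susceptible-infected simulation in which an infected node passes the information along a tie with probability Pij = x·wij. The following is a minimal sketch under stated assumptions: the function names, the weighted adjacency layout, and the toy network are hypothetical, and capping the probability at 1 is a choice made here so that the sketch stays valid for any x.

```python
import random

def si_spread(adj_w, seed, x, max_steps, rng):
    """Susceptible-infected spreading with P_ij = x * w_ij per time step.

    adj_w: dict node -> {neighbor: tie strength}
    Returns (history, first_tie): the number of infected nodes after
    each step, and for each node the strength of the tie through which
    it was first infected.
    """
    infected = {seed}
    first_tie = {}
    history = [1]
    for _ in range(max_steps):
        newly = {}
        for i in infected:
            for j, w in adj_w[i].items():
                if j in infected or j in newly:
                    continue
                # Assumption: cap at 1 so x * w is always a probability.
                if rng.random() < min(1.0, x * w):
                    newly[j] = w
        infected.update(newly)
        first_tie.update(newly)
        history.append(len(infected))
    return history, first_tie

def with_average_weights(adj_w):
    """Control network: every tie strength replaced by the mean strength."""
    ws = [w for nbrs in adj_w.values() for w in nbrs.values()]
    mean_w = sum(ws) / len(ws)
    return {i: {j: mean_w for j in nbrs} for i, nbrs in adj_w.items()}

# Toy chain A-B-C (made-up weights); x = 1 makes every transmission certain.
adj_w = {"A": {"B": 2.0}, "B": {"A": 2.0, "C": 4.0}, "C": {"B": 4.0}}
history, first_tie = si_spread(adj_w, "A", x=1.0, max_steps=3,
                               rng=random.Random(0))
control = with_average_weights(adj_w)
```

Collecting the first_tie values over many runs and seeds would give a distribution of the kind discussed for the tie strengths through which individuals are first infected, and running si_spread on the control network gives the corresponding equal-weight comparison.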
As Fig. 4A shows (the real diffusion simulation), we find that information transfer is significantly faster on the network for which all weights are equal, the difference being rooted in a dynamic trapping of information in communities. Such trapping is clearly visible if we monitor the number of infected individuals in the early stages of the diffusion process (Fig. 4B). Indeed, we observe rapid diffusion within a single community, corresponding to fast increases in the number of infected users, followed by plateaus, corresponding to time intervals during which no new nodes are infected before the news escapes the community. When we replace all link weights with an average value w̄ (the control diffusion simulation) the bridges between communities are strengthened, and the spreading becomes a predominantly global process, rapidly reaching all nodes through a hierarchy of hubs (23).
The dramatic difference between the real and the control spreading process raises an important question: Where do individuals get their information? We find that the distribution of the tie strengths through which each individual was first infected (Fig. 4C) has a prominent peak at w ≈ 10^2 seconds, indicating that, in the vast majority of cases, an individual learns about the news through ties of intermediate strength. The distribution changes dramatically in the control case, however, when all tie strengths are taken to be equal during the spreading process. In this case, the majority of infections take place along the ties that are otherwise weak (Fig. 4D). Therefore, in contrast with the celebrated role of weak ties in information access (17, 19), we find that both weak and strong ties have a relatively insignificant role as conduits for information (“the weakness of weak and strong ties”), the former because the small amount of on-air time offers little chance of information transfer and the latter because they are mostly confined within communities, with little access to new information.
To illustrate the difference between the real and the control simulation, we show the spread of information in a small neighborhood (Fig. 4 E and F). First, the overall direction of information flow is systematically different in the two cases, as indicated by the large shaded arrows. In the control runs, the information mainly follows the shortest paths. When the weights are taken into account, however, information flows along a strong tie backbone, and large regions of the network, connected to the rest of the network by weak ties, are only rarely infected. For example, the lower half of the network is rarely infected in the real simulation but is always infected in the control run. Therefore, the diffusion mechanism in the network is drastically altered when we neglect the tie strengths, which accounts for the differences between the curves seen in Fig. 4 A and B.
Discussion
Although the study of communication and social networks has a long history, examining the relationship between tie strengths and topology in society-spanning networks has generally been impossible. In this paper, taking advantage of society-wide data collection capabilities offered by mobile phone logs, we show that tie strengths correlate with the local network structure around the tie, and both the dyadic hypothesis and the global efficiency principle are unable to account for the empirical observations.
It has long been known that many networks show resilience to random node removal, but are fragile to the removal of the hubs (10, 28–30). In terms of the links, one would also expect that the strong ties play a more important role in maintaining the network's integrity than the weak ones. Our analyses document the opposite effect in communication networks: The removal of the weak ties results in a phase transition-like network collapse, whereas the removal of strong ties has little impact on the network's overall integrity. Furthermore, we find that the observed coupling between the network structure and tie strengths significantly slows information flow, trapping it in communities, explaining why successful searches in social networks are conducted primarily through intermediate- to weak-strength ties while avoiding the hubs (3). Therefore, to enhance the spreading of information, one needs to intentionally force it through the weak links or, alternatively, adopt an active information search procedure.
Taken together, weak ties appear to be crucial for maintaining the network's structural integrity, but strong ties play an important role in maintaining local communities. Both weak and strong ties are ineffective, however, when it comes to information transfer, given that most news in the real simulations reaches an individual for the first time through ties of intermediate strength.
The observed coupling between tie strengths and local topology has significant implications for our ability to model processes taking place in social networks. Indeed, many current network models either assign the same strength to all ties or assume that tie strengths are determined by the network's global characteristics, such as betweenness centrality. In addition, some of the most widely used algorithms for identifying communities and groups in complex networks either use betweenness centrality (16) or are based on topological measures (31). Our finding that link weights and betweenness centrality are negatively correlated in mobile communication networks, together with the insights provided by the visually apparent community structure (Fig. 2), offers new opportunities to design clustering algorithms that are tailored to communication networks, and forces us to reevaluate many results that were obtained on unweighted graphs. Putting the structural and functional pieces together, we conjecture that communication networks are better suited to local information processing than global information transfer, a result that has the makings of a paradox. Indeed, the underlying reason for characterizing communication networks with global network concepts, such as path length and betweenness centrality, is rooted in the expectation that communication networks transmit information globally.
Acknowledgments
We thank Tamás Vicsek for useful discussions. J.-P.O. thanks the Graduate School in Computational Methods of Information Technology (ComMIT), the Finnish Academy of Science and Letters, and the Väisälä Foundation for a travel grant to visit A.-L.B. at Harvard University. This research was partially supported by the Academy of Finland, Centres of Excellence Programmes, Project nos. 44897 and 213470 and Grant OTKA K60456. G.S. and A.-L.B. were supported by National Science Foundation Grants ITR DMR-0426737, CNS-0540348, and IIS-0513650 and by the James S. McDonnell Foundation.
Abbreviation
MCG, mobile call graph.
Footnotes
Conflict of interest statement: A.-L.B. served as a paid consultant for the phone company that provided the phone data.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/cgi/content/full/0610245104/DC1.
References
- 1. Ebel H, Mielsch L-I, Bornholdt S. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Top. 2002;66:35103. doi: 10.1103/PhysRevE.66.035103.
- 2. Eckmann J-P, Moses E, Sergi D. Proc Natl Acad Sci USA. 2004;101:14333–14337. doi: 10.1073/pnas.0405728101.
- 3. Dodds PS, Muhamad R, Watts DJ. Science. 2003;301:827–829. doi: 10.1126/science.1081058.
- 4. Aiello W, Chung F, Lu L. Proceedings of the 32nd ACM Symposium on the Theory of Computing; New York: Assoc Comput Machinery; 2000. pp. 171–180.
- 5. Wasserman S, Faust K. Social Network Analysis: Methods and Applications. Cambridge: Cambridge Univ Press; 1994.
- 6. Haythornthwaite C. Inf Commun Soc. 2005;8:125–147.
- 7. Newman MEJ, Watts DJ, Barabási A-L. The Structure and Dynamics of Networks. Princeton: Princeton Univ Press; 2006.
- 8. Dorogovtsev SN, Mendes JFF. Evolution of Networks. New York: Oxford Univ Press; 2003.
- 9. Pastor-Satorras R, Vespignani A. Phys Rev Lett. 2001;86:3200–3203. doi: 10.1103/PhysRevLett.86.3200.
- 10. Cohen R, Erez K, ben Avraham D, Havlin S. Phys Rev Lett. 2000;85:4626–4628. doi: 10.1103/PhysRevLett.85.4626.
- 11. Colizza V, Barrat A, Barthélemy M, Vespignani A. Proc Natl Acad Sci USA. 2006;103:2015–2020. doi: 10.1073/pnas.0510525103.
- 12. Almaas E, Kovács B, Vicsek T, Oltvai ZN, Barabási A-L. Nature. 2004;427:839–843. doi: 10.1038/nature02289.
- 13. Goh K-I, Kahng B, Kim D. Phys Rev Lett. 2001;87:278701. doi: 10.1103/PhysRevLett.87.278701.
- 14. Maritan A, Colaiori F, Flammini A, Cieplak M, Banavar JR. Science. 1996;272:984–986. doi: 10.1126/science.272.5264.984.
- 15. Freeman LC. Sociometry. 1977;40:35–41.
- 16. Girvan M, Newman MEJ. Proc Natl Acad Sci USA. 2002;99:7821–7826. doi: 10.1073/pnas.122653799.
- 17. Granovetter M. Am J Sociol. 1973;78:1360–1380.
- 18. Granovetter M. Getting a Job: A Study of Contacts and Careers. 2nd Ed. Chicago: Univ Chicago Press; 1995.
- 19. Csermely P. Weak Links: Stabilizers of Complex Systems from Proteins to Social Networks. 1st Ed. Berlin: Springer; 2006.
- 20. Jaccard P. Bull Soc Vaudoise Sci Nat. 1901;37:547–579.
- 21. Stauffer D, Aharony A. Introduction to Percolation Theory. 2nd Ed. London: CRC; 1994.
- 22. Bunde A, Havlin S. Fractals and Disordered Systems. 2nd Ed. New York: Springer; 1996. p. 51.
- 23. Barthélemy M, Barrat A, Pastor-Satorras R, Vespignani A. Phys Rev Lett. 2004;92:178701. doi: 10.1103/PhysRevLett.92.178701.
- 24. Toroczkai Z, Bassler KE. Nature. 2004;428:716. doi: 10.1038/428716a.
- 25. Gallos LK, Cohen R, Argyrakis P, Bunde A, Havlin S. Phys Rev Lett. 2005;94:188701. doi: 10.1103/PhysRevLett.94.188701.
- 26. Rogers EM. Diffusion of Innovations. 5th Ed. New York: Free Press; 2003.
- 27. See, e.g., Hethcote HW. SIAM Review. 2000;42:599–653.
- 28. Albert R, Jeong H, Barabási A-L. Nature. 2000;406:378–382. doi: 10.1038/35019019.
- 29. Onnela J-P, Kaski K, Kertész J. Eur Phys J B. 2004;38:353–362.
- 30. Callaway DS, Newman MEJ, Strogatz SH, Watts DJ. Phys Rev Lett. 2000;85:5468–5471. doi: 10.1103/PhysRevLett.85.5468.
- 31. Palla G, Derényi I, Farkas I, Vicsek T. Nature. 2005;435:814–818. doi: 10.1038/nature03607.
- 32. Barabási A-L, Albert R. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509.
- 33. Caldarelli G. Scale-Free Networks. London: Oxford Univ Press; 2007.