Significance
Both our mobility and communication patterns obey spatial constraints: Most of the time, our trips or communications occur over a short distance, and occasionally, we take longer trips or call a friend who lives far away. These spatial dependencies, best described as power laws, play a consequential role in broad areas ranging from how an epidemic spreads to diffusion of ideas and information. Here we established the first formal link, to our knowledge, between mobility and communication patterns by deriving a scaling relationship connecting them. The uncovered scaling theory not only allows us to derive human movements from communication volumes, or vice versa, but it also documents a new degree of regularity that helps deepen our quantitative understanding of human behavior.
Keywords: human mobility, social interactions, mobile phone data, social networks, spatial networks
Abstract
Massive datasets that capture human movements and social interactions have catalyzed rapid advances in our quantitative understanding of human behavior in recent years. One important aspect affecting both areas is the critical role space plays. Indeed, growing evidence suggests that both our movements and our communication patterns are associated with spatial costs that follow reproducible scaling laws, each characterized by its specific critical exponents. Human mobility and social networks have developed concomitantly as two prolific yet largely separate fields. Although they often study the same datasets, no relationship is known between the critical exponents they explore. Here, by exploiting three different mobile phone datasets that simultaneously capture these two aspects, we discovered a new scaling relationship, mediated by a universal flux distribution, which links the critical exponents characterizing the spatial dependencies in human mobility and social networks. Therefore, the widely studied scaling laws uncovered in these two areas are not independent but connected through a deeper underlying reality.
Over the past few years, we have witnessed tremendous progress in uncovering patterns behind human mobility (1–7) and social networks (8–10). This progress is owed partly to the increasing availability of large-scale datasets capturing human behavior at a new level of detail, resolution, and scale (11, 12). Building on a rich, fundamental literature from the social sciences (13–19), these data offer a huge opportunity for research, fueling concomitant advances in both human mobility and social networks with profound consequences in broad domains. One important aspect affecting both areas is the critical role space plays: growing evidence suggests that both our movements and our communication patterns are associated with spatial costs that follow reproducible scaling laws. Indeed, previous studies have shown that human travel adheres to spatial constraints (20), characterized by Lévy flights and continuous-time random walk models (1, 2, 4). The resulting scaling law has proven critical in various phenomena driven by human mobility, from the spread of viruses (21–23) to migrations (2, 6) and emergency response (24–26). In another related yet distinct area, there is much empirical evidence for the geographic effect on communication patterns (20). This evidence documents that the probability for two individuals to communicate decays with distance, following power law distributions (20, 27–30). This robust pattern plays an important role in navigating the social network (31), from routing (32, 33) to the search for experts (34, 35) to the spread of information (27, 36) and innovations (37). Human movements and social interactions bear high-level similarities in the role spatial distance plays and are often referred to as two prominent examples of spatial networks (20). Yet they remain largely separate lines of inquiry, with no known connection between their critical exponents.
This is particularly perplexing given the fact that they often exploit the same datasets (5, 20, 38–40) and are treated similarly in most modeling frameworks (6, 41).
In this paper, we test the hypothesis that the previously observed spatial dependency captures a convolution of geographical propensity and a popularity-based heterogeneity among locations. To do so, we exploit three large-scale mobile phone datasets from different countries across two continents (see Datasets for more details). By separating these two factors, we discovered a scaling relationship linking the critical exponents associated with the spatial effect on movement and communication patterns, effectively reducing the number of independent parameters characterizing human behavior. The uncovered scaling theory not only allows us to derive human movements from communication volumes, or vice versa, but also hints at a deeper connection that may exist among all networked systems where space plays a role. Such systems range from transportation (2, 6, 42) and communications (27, 29, 30) to the Internet (32, 33) and human brains (43).
Results
Mobile communication records, cataloged by mobile phone carriers for billing purposes, provide an extensive proxy of human movements and social interactions at a societal scale. Indeed, by keeping track of each phone call between two users and the spatiotemporal information about the user who initiated the call, mobile phone data offer information on both human mobility and social communication patterns at the same time. In this study, we compiled a uniquely rich database consisting of three different datasets that are of a similar level of detail yet with different demographics, economic status, and scales. The resulting data corpus includes D1, which contains 1.3 million users in Portugal and covers a period of 1 mo; D2, which is a dataset from an unnamed western European country that covers a 1-y period for about 6 million users; and D3, which is collected by the largest mobile phone carrier in Africa, covering a period of 4 y in Rwanda.
To quantify the spatial effect on social communication patterns, we measure the distance distribution of communications using two frequently used distance metrics.
Communication Distance Distribution.
The distance $r$ characterizing social communications is the geodesic distance between two individuals $u$ and $v$ who communicate via phone calls or short message service (SMS). Previous studies suggested that the probability for two individuals to communicate decreases with distance, following a power law distribution (20, 29, 44). Here we recovered previous results (Fig. 1A), finding that the distance distribution of each studied system, $P(r)$, can be approximated by a power law tail:
$$P(r) \sim r^{-\beta_r}. \tag{1}$$
We find the exponents $\beta_r$ to be similar for D1 and D2 and slightly different for D3 (Fig. 1A and Table 1).
Table 1.
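Dataset | $\alpha_\rho$ | $\beta_\rho$ | $\gamma_\rho$ | $\delta_\rho$ | $\beta_\rho^*$ | $\alpha_r$ | $\beta_r$ | $\gamma_r$ | $\delta_r$ | $\beta_r^*$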
We measured $\alpha_\rho$, $\beta_\rho$, $\gamma_\rho$, and $\delta_\rho$ independently for each dataset by using rank as the distance metric. We estimated the errors of these measurements at the 95% confidence level. We then compute the predicted exponent $\beta_\rho^*$ using Eq. 8. Its error, $\sigma_{\beta_\rho^*}$, is calculated by error propagation, $\sigma_{\beta_\rho^*}^2 = (\gamma_\rho\,\sigma_{\alpha_\rho})^2 + (\alpha_\rho\,\sigma_{\gamma_\rho})^2 + \sigma_{\delta_\rho}^2$. We find that $\beta_\rho^*$ largely agrees with $\beta_\rho$ within uncertainties across all datasets. Similarly, we repeated the same measurements using geodesic distance, obtaining $\alpha_r$, $\beta_r$, $\gamma_r$, $\delta_r$, and their corresponding errors. These allow us to compute $\beta_r^*$ and its error $\sigma_{\beta_r^*}$. We find that $\beta_r^*$ also approximates $\beta_r$ well. The largest deviations are observed in D3, which has much larger uncertainties in the estimates of all exponents because of its much smaller data size. Both our data size and the noninteger nature of the distance metrics prevent us from using standard fitting algorithms for power laws (57). We therefore computed all exponents using the least-squares method.
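The error propagation described in this caption can be sketched in Python. This is a minimal illustration under three assumptions. First, Eq. 8 is taken to have the form $\beta^* = \alpha\gamma - \delta$, obtained by combining Eq. 7 (with $f \sim \rho^{\delta}$) and Eq. 5. Second, the errors of $\alpha$, $\gamma$, and $\delta$ are treated as independent. Third, the function names are hypothetical, and the inputs in the usage lines are placeholders, not values from Table 1.

```python
import math

def predicted_beta(alpha, gamma, delta):
    """Assumed form of Eq. 8: beta* = alpha * gamma - delta."""
    return alpha * gamma - delta

def predicted_beta_error(alpha, gamma, sigma_alpha, sigma_gamma, sigma_delta):
    """First-order propagation of independent errors through beta* = alpha*gamma - delta.

    Partial derivatives: d(beta*)/d(alpha) = gamma, d(beta*)/d(gamma) = alpha,
    d(beta*)/d(delta) = -1.
    """
    return math.sqrt((gamma * sigma_alpha) ** 2
                     + (alpha * sigma_gamma) ** 2
                     + sigma_delta ** 2)

def agrees(beta, sigma_beta, beta_star, sigma_beta_star):
    """True if measured and predicted exponents overlap within their combined error."""
    return abs(beta - beta_star) <= math.hypot(sigma_beta, sigma_beta_star)

# Placeholder inputs (not measured values):
beta_star = predicted_beta(1.3, 0.6, 0.2)
sigma_star = predicted_beta_error(1.3, 0.6, 0.05, 0.03, 0.04)
```

Adding the squared terms in quadrature is only valid if the three exponent estimates are uncorrelated. If they were fitted from overlapping data, a covariance term would be needed.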
Rank Distribution.
Within a country, the population is not distributed uniformly in space. To account for such inhomogeneity, previous studies proposed the rank measure as an alternative for quantifying the effective distance between two individuals (27). The rank between two users $u$ and $v$ is the number of people closer to $u$ than $v$ is, formally defined as $\rho_{uv} = |\{w : r_{uw} < r_{uv}\}|$. We measure the rank distributions $P(\rho)$ for our three datasets (Fig. 1B), finding that $P(\rho)$ is characterized by a power law tail, consistent with previous studies (20, 27):
$$P(\rho) \sim \rho^{-\beta_\rho}. \tag{2}$$
The exponents $\beta_\rho$ for our three datasets are shown in Table 1.
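The geodesic distance and the rank can be computed directly from their definitions. The sketch below is an illustration under three assumptions. It treats the Earth as a sphere (haversine formula, mean radius 6371 km). It takes user positions as (latitude, longitude) in degrees. It uses a brute-force scan over all users for each pair, so it is practical only for small samples. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption

def geodesic_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def rank(u, v, positions):
    """rho_uv: number of people strictly closer to u than v is.

    positions: dict mapping user id -> (lat, lon) in degrees.
    Note that rank is not symmetric: rank(u, v) may differ from rank(v, u).
    """
    d_uv = geodesic_km(*positions[u], *positions[v])
    return sum(
        1
        for w, pos in positions.items()
        if w not in (u, v) and geodesic_km(*positions[u], *pos) < d_uv
    )
```

Because rank counts people rather than kilometers, a pair in a dense city and a pair in a sparse rural area can have the same geodesic distance but very different ranks. This difference is the population-density correction the rank measure is meant to provide.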
Similarly, for mobility patterns, the jump size distribution is the quantity most commonly used to quantify spatial constraints on human movements. Here we measure it using both distance metrics.
Jump Size Distribution.
Jump size $\Delta r$ measures the displacement, in kilometers, between two consecutive sightings of an individual. A fundamental property of human mobility is that the aggregated jump size distribution follows a power law (1, 2, 4),
$$P(\Delta r) \sim \Delta r^{-\alpha_r}, \tag{3}$$
indicating that most of the time people travel over short distances, between home and work for example, whereas they occasionally take longer trips. We measured $P(\Delta r)$ in our data corpus (Fig. 1C), finding little variation in $\alpha_r$ between datasets D1 and D2 ($\alpha_r \approx 1.8$ for D2) but slight differences for dataset D3 (Table 1).
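The caption of Table 1 states that all exponents were obtained by least squares. One common way to do this is a linear fit in log–log space to a logarithmically binned, normalized histogram of the tail $x \ge x_{\min}$. The sketch below follows that approach, but it is an assumption, not necessarily the exact procedure used in this study. The choice of `x_min`, the number of bins, and the minimum bin count are hypothetical parameters.

```python
import numpy as np

def fit_power_law_tail(samples, x_min, n_bins=20, min_count=10):
    """Least-squares estimate of a in P(x) ~ x**(-a) for x >= x_min.

    At least two bins must contain min_count or more samples.
    """
    x = np.asarray(samples, dtype=float)
    x = x[x >= x_min]
    # Logarithmic bins give roughly equal weight to each decade of the tail.
    edges = np.logspace(np.log10(x_min), np.log10(x.max()), n_bins + 1)
    counts, edges = np.histogram(x, bins=edges)
    density = counts / (np.diff(edges) * counts.sum())  # normalized density
    centers = np.sqrt(edges[:-1] * edges[1:])            # geometric bin centers
    keep = counts >= min_count                           # drop sparse, noisy bins
    slope, _intercept = np.polyfit(np.log10(centers[keep]), np.log10(density[keep]), 1)
    return -slope
```

Dropping sparsely populated bins trades a little range for much less noise in the extreme tail, where log-counts of a handful of events dominate an unweighted fit.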
Rank Jump Size Distribution.
To account for biases from population density, we measure the rank $\Delta\rho$ of each jump. We find that $P(\Delta\rho)$ is also characterized by a power law tail, as suggested by previous studies (20, 39),
$$P(\Delta\rho) \sim \Delta\rho^{-\alpha_\rho}. \tag{4}$$
As shown in Fig. 1D, $\alpha_\rho$ is rather similar for D1 and D2 ($\alpha_\rho \approx 1.28$ for D2) but different for D3 (Table 1).
Taken together, the spatial scaling of social interactions [$P(r)$ and $P(\rho)$] for dataset $i$ is characterized by the exponents $\beta_r^i$ and $\beta_\rho^i$, respectively. Human movements [$P(\Delta r)$ and $P(\Delta\rho)$] are characterized by the exponents $\alpha_r^i$ and $\alpha_\rho^i$. These quantities were reported previously by independent research groups with different measurement details (1, 2, 29, 44). Here we measure them systematically using the comprehensive database we compiled. We find that within each of the two categories, the critical exponents ($\alpha$ or $\beta$) in different countries are rather similar to each other. For example, there is little difference between the three $\beta_r$ or the three $\alpha_r$ exponents. For the rank metrics, D1 and D2 are also very similar to each other, whereas D3 has slightly different exponents. Most noticeably, however, we observe substantial and systematic differences between $\alpha$ and $\beta$. Such differences contradict current modeling frameworks, from the gravity model (45) to the radiation model (6). Given the same population distribution, these frameworks treat the two classes of problems as similar phenomena and thus predict the same scaling exponent within each country. This raises a critical question: What is the origin of the observed differences between the exponents $\alpha$ and $\beta$?
$P(r)$ [or $P(\rho)$] measures the intensity of social communications as a function of distance, capturing at a population-averaged level the social fluxes between different locations. On the other hand, $P(\Delta r)$ [or $P(\Delta\rho)$] measures the aggregated jumps between places, corresponding to the mobility fluxes from one location to another. We denote the social fluxes by $S_{ij}$ and the mobility fluxes by $T_{ij}$, i.e., the total number of communications ($S_{ij}$) and jumps ($T_{ij}$) from location $i$ to location $j$. We measure $S_{ij}$ and $T_{ij}$ between any two locations over a 1-mo period. We find that both social and mobility fluxes follow fat-tailed distributions across our three studied datasets (Fig. 2). This is somewhat expected: if we view each location as a node and fluxes as links connecting different locations, the fat-tailed distributions of fluxes are consistent with previous results on link weight distributions (46). Hence, Fig. 2 documents an inherent heterogeneity between locations: there are few fluxes between most locations, yet a nonnegligible fraction of location pairs have a large number of fluxes.

The fat-tailed nature of flux distributions raises an important question: Can the distance dependencies (Fig. 1) be accounted for by the observed heterogeneity in fluxes alone (Fig. 2)? To this end, we take D1 as an exemplary case and control for the spatial effect by grouping location pairs of similar distance $\rho$. Within each group, we measure the distributions of social fluxes [$P(S_{ij}|\rho)$ in Fig. 3A] and mobility fluxes [$P(T_{ij}|\rho)$ in Fig. 3B]. We find that the fluxes follow a fat-tailed distribution within each group, indicating that much heterogeneity in fluxes remains even among locations at similar distances. Moreover, locations that are nearby (small $\rho$) tend to have higher fluxes, corresponding to higher intensity in both communications (Fig. 3A) and movements (Fig. 3B). Indeed, the curves in Fig. 3 A and B shift to the right as $\rho$ decreases, indicating that the probability for two locations to have large fluxes decays with distance. This is consistent with preceding results (Fig. 1 and Eqs. 2 and 4): most communications and movements are associated with short distances, which account for the majority of the fluxes. However, as shown in Fig. 3 A and B, not all pairs of nearby locations have large fluxes; on the contrary, most of them have very small fluxes. Rather, a small fraction of location pairs in each distance group, i.e., the tails of $P(S_{ij}|\rho)$ and $P(T_{ij}|\rho)$, generates the majority of fluxes. Most surprisingly, once we rescale the flux distributions by the average fluxes, $\langle S\rangle_\rho$ or $\langle T\rangle_\rho$, all curves shown in Fig. 3 A and B (10 curves in total) collapse into one single curve (Fig. 3C). This collapse suggests that a single universal flux distribution characterizes both social interactions and human movements, independent of distance. To be specific, this data collapse indicates that
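$$P(S_{ij}\,|\,\rho) = \frac{1}{\langle S\rangle_\rho}\,F\!\left(\frac{S_{ij}}{\langle S\rangle_\rho}\right), \qquad P(T_{ij}\,|\,\rho) = \frac{1}{\langle T\rangle_\rho}\,F\!\left(\frac{T_{ij}}{\langle T\rangle_\rho}\right), \tag{5}$$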
where $F(x)$ is a distance-independent function. The data collapse in Fig. 3C is rather remarkable. It indicates that the observed localization in social communications and human movements can be decomposed into two independent factors. One is the universal distribution $F(x)$, which is distance independent and characterizes the inherent popularity-based heterogeneity among locations. The other is the average flux at a given distance, $\langle S\rangle_\rho$ for social and $\langle T\rangle_\rho$ for mobility fluxes, which now encodes all of the distance dependence. We repeated our measurements using $r$ as the distance metric, again finding an excellent data collapse (Fig. 3 D–F).
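The rescaling behind Eq. 5 amounts to dividing each flux by the mean flux of its distance group. The sketch below assumes fluxes and distance-group labels are stored as flat NumPy arrays, one entry per location pair. The function name is hypothetical. If Eq. 5 holds, the rescaled values from different groups should follow the same distribution.

```python
import numpy as np

def rescale_by_group_mean(fluxes, groups):
    """Return x = flux / <flux>_group for each location pair (the rescaling of Eq. 5).

    fluxes: fluxes S_ij (or T_ij), one per location pair
    groups: distance-bin label of each pair (same length as fluxes)
    """
    fluxes = np.asarray(fluxes, dtype=float)
    _labels, inverse = np.unique(np.asarray(groups), return_inverse=True)
    inverse = inverse.ravel()  # guard against NumPy versions that keep the input shape
    group_means = np.bincount(inverse, weights=fluxes) / np.bincount(inverse)
    return fluxes / group_means[inverse]
```

To check the collapse, one would histogram the rescaled values separately for each group (for example, with logarithmic bins) and compare the curves. By construction, every group has a rescaled mean of 1.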
The universal function uncovered in Eq. 5 indicates that social and mobility fluxes are key quantities for characterizing communication and mobility patterns, prompting us to measure the correlations between them. We group location pairs ($i$ and $j$) by their distance $\rho$ and measure the relationship between $S_{ij}$ and $T_{ij}$ within each group (five distance groups in Fig. 4 A–E). In these scatterplots, each gray dot represents a pair of locations. Its x–y coordinates correspond to the mobility [$T_{ij}$] and social [$S_{ij}$] fluxes from $i$ to $j$. We find strong correlations between these two quantities regardless of the separation between the locations. To quantify this correlation, we measure the average social flux given the mobility flux at a certain distance, $\langle S\rangle_{T,\rho}$ (colored symbols in Fig. 4 A–E), formally defined as
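$$\langle S\rangle_{T,\rho} = \frac{\sum_{i,j} S_{ij}\,\delta(T_{ij}-T)\,\delta(\rho_{ij}-\rho)}{\sum_{i,j}\delta(T_{ij}-T)\,\delta(\rho_{ij}-\rho)}, \tag{6}$$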
where $\delta(x)$ is the delta function [$\delta(x) = 1$ when $x = 0$, and $\delta(x) = 0$ otherwise]. We find that the average social fluxes follow a power law scaling relationship with $T$, i.e.,
$$\langle S\rangle_{T,\rho} = f(\rho)\,T^{\gamma_\rho}, \tag{7}$$
where the scaling exponent $\gamma_\rho$ takes similar values for different $\rho$ and is smaller than 1. This indicates that social fluxes scale sublinearly with mobility fluxes, independent of distance. The prefactor in Eq. 7, $f(\rho)$, corresponds to the shift along the y axis across Fig. 4 A–E. We find that as distance increases, the average social flux increases for the same volume of mobility flux. Hence, $f(\rho)$ characterizes the cost tradeoff between phone communications and human movements. Rescaling $\langle S\rangle_{T,\rho}$ by $f(\rho)$, we find that all curves collapse onto a straight line (Fig. 4F). This indicates that $f(\rho) \sim \rho^{\delta_\rho}$, where $\delta_\rho > 0$ because $f(\rho)$ increases with distance. We repeated the same measurements for D2 and D3. Although each dataset is characterized by a different set of $\gamma_\rho$ and $\delta_\rho$, Eq. 7 holds consistently across datasets (Fig. 4 G and H). We also repeated our analysis by replacing $\rho$ with the geodesic distance $r$, again finding results consistent with Eq. 7. Each dataset is well described by its characteristic set of $\gamma_r$ and $\delta_r$ exponents, demonstrating the robustness of our findings (Correlation Between Social and Mobility Fluxes with Geodesic Distance).
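A minimal sketch of how Eqs. 6 and 7 can be evaluated in practice follows. Because fluxes are integer counts, the delta functions in Eq. 6 are implemented by grouping pairs with exactly the same $T_{ij}$ within one distance group. The exponent $\gamma$ and $\log f$ are then obtained by least squares in log–log space. The exponent of $f(\rho)$ is obtained by a second log–log fit of $f$ against $\rho$ across groups. The function names and the exact-value grouping are assumptions. With sparse data, one would instead bin $T$ logarithmically.

```python
import numpy as np

def conditional_mean(S, T):
    """Eq. 6 within one distance group: mean S_ij over pairs with exactly T_ij = T."""
    S = np.asarray(S, dtype=float)
    values, inverse = np.unique(np.asarray(T), return_inverse=True)
    inverse = inverse.ravel()
    means = np.bincount(inverse, weights=S) / np.bincount(inverse)
    return values, means

def fit_gamma_and_prefactor(S, T):
    """Fit log <S>_T = log f + gamma * log T (Eq. 7) for one distance group."""
    values, means = conditional_mean(S, T)
    keep = (values > 0) & (means > 0)  # logarithms need positive values
    gamma, log_f = np.polyfit(np.log(values[keep]), np.log(means[keep]), 1)
    return gamma, np.exp(log_f)

def fit_delta(group_distances, prefactors):
    """Fit f(rho) ~ rho**delta across distance groups (slope in log-log space)."""
    delta, _intercept = np.polyfit(np.log(group_distances), np.log(prefactors), 1)
    return delta
```

In this sketch, `fit_gamma_and_prefactor` would be called once per distance group, and the resulting prefactors passed to `fit_delta` together with a representative distance for each group.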
Most importantly, Eq. 7 together with the data collapses in Fig. 3 C and F (Eq. 5) allows us to derive a new scaling relationship,
$$\beta_\rho = \alpha_\rho\,\gamma_\rho - \delta_\rho, \tag{8}$$
connecting the exponent that characterizes social communications ($\beta_\rho$) with the exponent characterizing human movements ($\alpha_\rho$) (see Derivation of the Scaling Relationship Between Exponents for details). Similarly, for the geodesic distance metric $r$, we obtain
$$\beta_r = \alpha_r\,\gamma_r - \delta_r. \tag{9}$$
We measure each exponent in Eqs. 8 and 9 independently for each dataset, finding excellent agreement between the empirical measurements and our theoretical predictions (Table 1). Hence, Eqs. 8 and 9 offer an explicit link between the critical exponents characterizing spatial dependencies in human movements and social interactions. They show that the social exponent ($\beta$) can be expressed in terms of the mobility exponent ($\alpha$), a robust result that holds for both distance metrics. This scaling relationship between the two classes of exponents is mediated by the universal flux distribution $F(x)$ uncovered in this study. It bridges two fields that are traditionally disjoint (12, 20), showing that they represent different facets of a deeper underlying reality. It also effectively reduces the number of independent parameters characterizing human behavior.
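As a sanity check of the derivation, one can build synthetic fluxes that satisfy Eqs. 5 and 7 by construction and verify that the fitted social exponent matches the prediction. This check does not replace the empirical test in Table 1. The sketch assumes, for illustration only, that $\langle T\rangle_\rho = \rho^{-\alpha}$, that $F(x)$ is exponential with unit mean, that $f(\rho) = \rho^{\delta}$, and that Eq. 8 has the form $\beta = \alpha\gamma - \delta$. The same draws of $x$ are reused at every $\rho$, which makes the factor $\int x^{\gamma}F(x)\,dx$ identical across distances. The fit is therefore exact up to floating-point error.

```python
import numpy as np

def simulate_social_exponent(alpha, gamma, delta, n_pairs=10_000, seed=0):
    """Fit beta from synthetic fluxes built to satisfy Eqs. 5 and 7 (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(1.0, n_pairs)   # rescaled fluxes drawn from F(x), reused at every rho
    rhos = np.logspace(1, 4, 12)
    mean_S = []
    for rho in rhos:
        T = rho ** (-alpha) * x         # Eq. 5: T / <T>_rho follows F, with <T>_rho = rho**(-alpha)
        S = rho ** delta * T ** gamma   # Eq. 7 with f(rho) = rho**delta
        mean_S.append(S.mean())         # <S>_rho
    slope, _intercept = np.polyfit(np.log(rhos), np.log(mean_S), 1)
    return -slope                       # <S>_rho ~ rho**(-beta)
```

Replacing the shared draws with fresh samples at each distance would add Monte Carlo noise to the fitted exponent but would not change its expected value.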
The uncovered relationship offers a powerful framework to derive quantities pertaining to one field from those of the other. Next, we show one practical application in the public health domain as an exemplary case. Over the past few years, many computational studies have highlighted the importance of social data in tackling public health challenges (10, 47). Among them, epidemic spreading is perhaps one of the most prominent (48–51). To this end, we simulate a virus spreading process using D1 to demonstrate how our findings can be used to connect human mobility and social interactions. Of the many ingredients in computational models of virus spreading, human mobility is among the most critical (1, 22, 23, 51–53). To understand how human movements catalyze society-wide spreading processes, we infect a few randomly selected individuals with a hypothetical germ in a random location at time $t = 0$. We denote the infection rate of this germ by μ. We assume that at each time step an infected individual can spread the disease to others within his or her vicinity, i.e., to individuals within the range of the same mobile tower. At the same time, any infected individual can recover from the disease at rate ν. This process is known as the susceptible–infectious–susceptible (SIS) model, commonly used in modeling disease spreading (54, 55).
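The modeling details used in this study are given in Supporting Information and are not reproduced here. As a hedged illustration only, the sketch below implements one common deterministic (mean-field) variant of a metapopulation SIS model. Residents of each location visit other locations with probabilities given by the row-normalized flux matrix. They catch the infection at rate μ in proportion to the infected fraction among the people present, and they recover at rate ν. The commuting-style coupling, the function name, and the rule that residents of locations without outgoing flux stay home are all assumptions.

```python
import numpy as np

def spatial_sis(flux, population, infected0, mu, nu, steps):
    """Mean-field metapopulation SIS model driven by a flux matrix (illustrative sketch).

    flux[i, j]: trips from location i to j (real T_ij or an approximation of it)
    population[i]: residents of location i (kept constant)
    infected0[i]: infected residents of location i at t = 0
    Returns an array of shape (steps + 1, n) with the infected density I_i / N_i.
    """
    flux = np.asarray(flux, dtype=float)
    N = np.asarray(population, dtype=float)
    I = np.asarray(infected0, dtype=float).copy()
    n = len(N)
    # P[i, j]: probability that a resident of i is found at j; rows without
    # outgoing flux default to staying home (identity row).
    P = np.eye(n)
    out = flux.sum(axis=1)
    has_out = out > 0
    P[has_out] = flux[has_out] / out[has_out][:, None]
    history = [I / N]
    for _ in range(steps):
        infected_present = P.T @ I  # infected people located at each j
        people_present = P.T @ N    # all people located at each j
        pressure = np.divide(mu * infected_present, people_present,
                             out=np.zeros(n), where=people_present > 0)
        new_infections = (N - I) * (P @ pressure)  # susceptibles exposed where they go
        I = np.clip(I + new_infections - nu * I, 0.0, N)
        history.append(I / N)
    return np.array(history)
```

For a single isolated location, this reduces to the discrete logistic SIS map, whose endemic density is $1 - \nu/\mu$ when $\mu > \nu$. This limiting case is a quick check of any implementation.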
Choosing any set of μ and ν, we can simulate a spatial SIS model by following the real mobility fluxes between locations ($T_{ij}$) measured from our dataset (see Supporting Information for modeling details). This raises an interesting question: Had we not had access to mobility information, how well could we have approximated the observed spreading pattern using social fluxes rescaled by the predicted scaling (Eq. 8)?
Following Eq. 7 and using the exponents from Eq. 8, mobility fluxes between locations $i$ and $j$ can be approximated by rescaled social fluxes, $\tilde{T}_{ij} = \left(S_{ij}\,\rho_{ij}^{-\delta_\rho}\right)^{1/\gamma_\rho}$. Here $\gamma_\rho$ and $\delta_\rho$ take the values measured for D1 (Table 1), and $\rho_{ij}$ is the rank distance between the two locations. We simulate a spreading process in Portugal using three sets of fluxes: the real mobility fluxes $T_{ij}$, the rescaled social fluxes $\tilde{T}_{ij}$, and mobility fluxes $T_{ij}^{\mathrm{grav}}$ approximated by the widely used gravity model (20, 45) (see Determination of Gravity Law's Parameters for more details). To compare these results, all simulations in this example use the same parameters (μ and ν) and initial conditions, with the initially infected users located in a major city (Lisbon).
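The substitution described above can be sketched as follows. The inversion assumes Eq. 7 with $f(\rho) \propto \rho^{\delta_\rho}$, so that $\tilde{T}_{ij} \propto (S_{ij}\rho_{ij}^{-\delta_\rho})^{1/\gamma_\rho}$ up to an overall constant. That constant cancels in any model that uses only row-normalized fluxes. The inversion requires $\rho_{ij} > 0$. For comparison, a generic gravity model $T^{\mathrm{grav}}_{ij} = k\,P_i^{a}P_j^{b}/d_{ij}^{c}$ is included. Its parameters here are placeholders, not the values determined in Determination of Gravity Law's Parameters. Function names are hypothetical.

```python
import numpy as np

def mobility_from_social(S, rho, gamma, delta):
    """Rescaled social fluxes T~_ij = (S_ij * rho_ij**(-delta))**(1/gamma), inverting Eq. 7.

    S and rho are arrays of the same shape; rho must be positive.
    """
    S = np.asarray(S, dtype=float)
    rho = np.asarray(rho, dtype=float)
    return (S * rho ** (-delta)) ** (1.0 / gamma)

def gravity_fluxes(pop, dist, a=1.0, b=1.0, c=2.0, k=1.0):
    """Generic gravity model T_ij = k * pop_i**a * pop_j**b / dist_ij**c, with T_ii = 0.

    The default parameter values are placeholders, not fitted values.
    """
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        T = k * np.outer(pop ** a, pop ** b) / dist ** c
    np.fill_diagonal(T, 0.0)  # no self-flux; also discards the division by the zero diagonal
    return T
```

Both functions return flux matrices of the same shape, so either can be passed to the same flux-driven spreading simulation in place of the measured $T_{ij}$.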
We measure the density of infected users in each location $i$ for the three cases (Fig. 5 A–C). We find remarkable agreement between the simulation driven by the rescaled social fluxes and the spreading pattern obtained with the real mobility fluxes. Moreover, a close-up of the city of Porto reveals the superior accuracy of our model compared with the predictions of the gravity model. We quantify how much each of the two approximations deviates from the real spreading pattern (Fig. 5 D and E). The drastic difference between Fig. 5 D and E highlights the superior predictive power of our model.
To systematically assess and compare the accuracy of our results, we simulated 500 independent spreading processes following the same procedure described above. For each process, we randomly chose the μ and ν parameters, the initially infected location, and the number of infected users. We find that the errors obtained from the 500 simulations are systematically lower than those of the gravity model across all stages of the spreading processes (Fig. 5F). This demonstrates the practical relevance of our scaling relationship for predicting mobility patterns from social communication records.
The practical applications are most useful when only one of the two facets of information is available. For example, companies that provide social networking functionalities or services have proliferated over the past few years. For many of them, mobility information is essential but difficult to collect. Conversely, companies providing location services such as the global positioning system (GPS) have many mobility records yet typically lack social information. In both cases, our method may provide a reasonable estimate to fill the void. This is particularly promising given that it outperforms the gravity model, the prevailing framework for predicting movements. Therefore, in many cases, our method may serve as a viable alternative, working in unison with model-based approaches or, in certain cases, even replacing them. This could improve the predictive accuracy for many phenomena affected by mobility and transport processes. It could be particularly useful for developing countries, where many people still live in data-scarce environments.
Taken together, by analyzing three large-scale mobile phone datasets from three different countries, we uncovered a new scaling relationship between the critical exponents that characterize spatial dependencies in human mobility and social interactions. This scaling relationship is mediated by a universal flux distribution for both movement and communication patterns. It indicates that the previously observed distance dependencies capture a convolution of geographical propensity and a popularity-based heterogeneity among locations. Separating these two factors allows us to establish a formal connection between critical exponents that were perceived as independent. Together, our results document a new order of regularity that helps deepen our quantitative understanding of human behavior. Last, our results may reach far beyond the communication and transportation systems studied in this paper. Many networked systems are subject to spatial costs in establishing connections in a very similar fashion, from routers linked by physical cables to form the globally connected Internet to axons that connect different regions of the human brain. Hence, our results may provide relevant insights into a diverse set of networked systems where space plays a role (20), opening up a promising direction for future investigation.
Finally, our study is not without limitations. The critical exponents we studied here capture macroscopic patterns of mobility and social interactions. However, both processes are affected by various sociodemographic factors within and across countries, resulting in population variations that may not be captured adequately by power law exponents alone. It would be fruitful to analyze the degree to which such factors affect mobility and social interactions when more sociodemographic information becomes available. Such information would also help us uncover the deeper reasons behind variations across countries. Among our three datasets, for example, D3 seems to be an outlier, with critical exponents different from those of D1 and D2. Furthermore, although our datasets capture people and their interactions, the focus of our paper is on data rather than people: the virtue of our results lies in the statistical regularities revealed by our datasets. As such, our paper focuses on facts that can be measured from the data rather than on the deeper sociological reasons behind these observations. Last, to what degree are movements and social interactions estimated from mobile phone datasets representative? Studies that compare self-report surveys and observational data (56), together with results obtained using higher-resolution traces (12), offer additional, convincing assurance that our results are not affected by the peculiarities of the call detail records used in our study (Potential Limitations of Mobile Phone Datasets). Nevertheless, further studies are needed to test these assumptions more systematically.
Materials and Methods
Details of studied datasets are described in Datasets. Mathematical derivations of the scaling relationships in Eqs. 8 and 9 are summarized in Derivation of the Scaling Relationship Between Exponents. The same measurements as Figs. 2 and 3 obtained by using D2 and D3 are shown in Distribution of Social and Mobility Fluxes for D2 and D3. Data necessary to replicate results of this study (D1, D2, and D3) are available upon request. The use of mobile phone datasets for research purposes was approved by the Northeastern University Institutional Review Board. Informed consent was not necessary because research was based on previously collected anonymous datasets.
Datasets
Mobile communication records, cataloged by mobile phone carriers for billing purposes, provide an extensive proxy of human movements and social interactions at a societal scale. These records keep track of each phone call between two users together with spatiotemporal information about the users who initiated and received the call. Mobile phone data therefore offer information on both human mobility and social communication patterns at the same time, as detailed below.
In this project, we compiled a uniquely rich database consisting of three different datasets that are of a similar level of detail yet with different demographics, economic status, and scales:
Dataset D1 contains mobile phone calls between 1.3 million users over a period of 1 mo in 2006 from a European country (Portugal). For each phone call, the caller and the callee, both anonymized with a key (hash code); the time; the date; and the phone towers routing the communication are recorded. Only phone calls between users that called each other at least 5 times over a period of 18 mo are known. Furthermore, only the coordinates of the mobile phone towers are known; hence, the position of a user within the range of an antenna is unknown.
Dataset D2 covers a 6-mo period of mobile phone calls between 6 million anonymized users from a large European country. For each phone call, the caller, the callee, the time, and the towers routing the communication are recorded. Similarly to D1, only the coordinates of the mobile towers are known; hence, the position of a user within the range of an antenna is unknown.
Dataset D3 covers a period from 2005 to 2009 and is made of all transaction logs of all mobile phone activity that occurred in an African country (Rwanda) over the 5-y period. The data originate from the largest mobile phone operator in that country and contain about 1.5 million phone calls. The logs include the date, the time, and the mobile phone towers routing the call for each of the phone calls and are again anonymous. Again, only the coordinates of the mobile towers are known; hence, the position of a user within the range of an antenna is unknown.
Inferring Mobility and Social Fluxes
Mobility Fluxes.
For each phone call, the position of the tower routing the call is known for the caller. Because we know the location of each tower, we know that the user was within the range of the tower's service area. By looking at each consecutive phone call made by a user, we can thus reconstruct the user's jumps between the two consecutive locations where his or her calls were initiated. By aggregating all movements of all users, we obtain the total number of jumps from any tower $i$ to any tower $j$ ($T_{ij}$). Jumps made outside continental territory (i.e., involving islands) were not taken into account. The jump sizes in D1, D2, and D3 are bounded by national frontiers and by coverage limitations driven by geographical constraints in each country. We consider the number of jumps between two locations as the mobility flux between them.
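The reconstruction of jumps from consecutive calls can be sketched in a few lines of Python. The record layout `(user, timestamp, tower)` and the option to drop consecutive calls from the same tower are assumptions made for illustration. Filtering of island towers is omitted.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def mobility_fluxes(calls, skip_same_tower=False):
    """Count T_ij, the jumps from tower i to tower j between a user's consecutive calls.

    calls: iterable of (user, timestamp, tower) records for calls initiated by each user.
    """
    T = Counter()
    ordered = sorted(calls, key=itemgetter(0, 1))  # group by user, then order in time
    for _user, user_calls in groupby(ordered, key=itemgetter(0)):
        towers = [tower for _, _, tower in user_calls]
        for src, dst in zip(towers, towers[1:]):
            if skip_same_tower and src == dst:
                continue
            T[(src, dst)] += 1
    return T
```

Whether to keep consecutive calls from the same tower (zero-length jumps) is a design choice. Keeping them preserves the diagonal $T_{ii}$, while dropping them counts only actual displacements.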
Social Fluxes.
For each phone call, the position of the tower routing the call is known for both the caller and the callee. By considering all phone calls, we thus know the total number of calls from a tower $i$ to a tower $j$ ($S_{ij}$). We consider the number of phone calls between two locations as the social flux between them.
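Aggregating calls into the social flux matrix is equally direct. The sketch makes two illustrative assumptions, neither of which reflects the format of the original records. It assumes each call has already been reduced to a (caller tower, callee tower) pair. It also assumes towers are mapped to consecutive matrix indices.

```python
import numpy as np

def social_flux_matrix(calls, tower_index):
    """S[i, j] = number of calls placed from tower i (caller) to tower j (callee).

    calls: iterable of (caller_tower, callee_tower) pairs, one per call
    tower_index: dict mapping tower id -> row/column index
    """
    n = len(tower_index)
    S = np.zeros((n, n), dtype=np.int64)
    for caller_tower, callee_tower in calls:
        S[tower_index[caller_tower], tower_index[callee_tower]] += 1
    return S
```

The resulting matrix is directed. If an undirected notion of social flux between two locations were needed, one could use `S + S.T`.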
Jump Size Distribution at Fixed Interevent Time
It is known that the distribution of interevent times between two consecutive calls (locations) from the same user is heterogeneous (2). It is thus important to investigate if the observed displacement statistics (the jumps) are affected by this characteristic.
We want to simulate location traces left by a phone on a regular basis, instead of those generated by calls, using the data available to us. To do so, we calculate the location displacement over a fixed time interval instead of between two consecutive phone calls. More specifically, we use dataset D1 and calculate the jump size distribution for displacements separated by a time $\Delta t$. We systematically vary $\Delta t$ from 1 h to 1 d (Fig. S1). We find that the distributions collapse for different choices of $\Delta t$, suggesting that consecutive calls serve as a good proxy for movements. Moreover, the curves can be well approximated by a power law, consistent with our previous results. Note that our results are bounded by the maximum distance a user can travel during $\Delta t$, which explains the differences in the tail of the distribution.
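The fixed-interval resampling can be sketched as follows for a single user. The sketch makes three simplifying assumptions: between calls, a user is at the location of their most recent call; timestamps are sorted; and coordinates are planar (e.g., projected to kilometers). The function name is hypothetical.

```python
import bisect
import math

def fixed_interval_jumps(times, xs, ys, dt):
    """Displacements between a user's position at time t and at time t + dt.

    times: sorted call timestamps; xs, ys: planar coordinates of each call.
    The position at any moment is taken to be that of the most recent call.
    """
    jumps = []
    for k, t in enumerate(times):
        target = t + dt
        if target > times[-1]:
            break  # position after the last call is unknown; later t only go further
        m = bisect.bisect_right(times, target) - 1  # last call at or before t + dt
        jumps.append(math.hypot(xs[m] - xs[k], ys[m] - ys[k]))
    return jumps
```

Pooling these displacements over all users for each choice of $\Delta t$ gives the distributions to compare.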
Distribution of Social and Mobility Fluxes for D2 and D3
In Figs. S2 and S3, we present the results obtained for datasets D2 and D3 regarding two sets of distributions for pairs of locations at similar distances ($r$ and $\rho$). The first is the distribution of social fluxes, $P(S_{ij}|\rho)$ and $P(S_{ij}|r)$ (Fig. 3 A and D for D1). The second is the distribution of mobility fluxes, $P(T_{ij}|\rho)$ and $P(T_{ij}|r)$ (Fig. 3 B and E for D1). We also show how the flux distributions collapse for both datasets when they are rescaled by their average flux, $\langle S\rangle$ or $\langle T\rangle$ (Fig. 3 C and F for D1). The same procedure used for D1 is applied to D2 (Fig. S2) and D3 (Fig. S3). We again find that the fluxes in each group follow a fat-tailed distribution. This indicates that much heterogeneity in fluxes also exists among locations at similar distances for both D2 and D3. Locations that are nearby (small $\rho$ or $r$) tend to have higher fluxes, corresponding to higher intensity in both communications (Figs. S2 A and D and S3 A and D) and movements (Figs. S2 B and E and S3 B and E). These results corroborate our findings for D1. Indeed, the curves shift to the right as $\rho$ (or $r$) decreases, indicating that the probability for faraway location pairs to have large fluxes is much lower. Once we rescale the flux distributions by the average flux, $\langle S\rangle$ or $\langle T\rangle$, all of the curves collapse into a single curve (Figs. S2 C and F and S3 C and F). This demonstrates again that a single universal flux distribution characterizes both social communication and human movement fluxes, independent of distance.
Correlation Between Social and Mobility Fluxes with Geodesic Distance
As developed in the main text for the rank-based distance, here we analyze the correlations between social fluxes and mobility fluxes in the case of the geodesic distance. We group location pairs ($i$ and $j$) by their distance $r$ and measure the relationship between $S_{ij}$ and $T_{ij}$ within each group (five distance groups in Fig. S4 A–E). In these scatterplots, each gray dot represents a pair of locations. Its x–y coordinates correspond to the mobility [$T_{ij}$] and social [$S_{ij}$] fluxes from $i$ to $j$ for dataset D1.
Same as for the rank-based measures, we find again strong correlations between these two quantities regardless of how far away these locations are separated. To quantify this correlation, we measure the average social fluxes given the mobility fluxes at a certain distance, (colored symbols in Fig. S4 A–E), which is formally defined as
[S1] |
where is the delta function [ when , and otherwise]. We find that the average social fluxes have again a power law scaling relationship with , following
[S2] |
where the scaling exponent for different r, indicating social fluxes again scale sublinearly with mobility fluxes. The prefactor in Eq. S2, , corresponds to the shift along the y axis through Fig. S4 A–E. We find, as distance increases, the average social fluxes increase given the same volume of mobility fluxes. Rescaling by , we find all curves collapse into a straight line (Fig. S4F), indicating .We repeated the same measurement for D2 and D3. We found that although each dataset is again characterized by a different set of and , Eq. S2 (same as Eq. 7 in the main manuscript) holds consistently well across different datasets (Fig. S4 F–H), demonstrating the robustness of our findings for both the distance r and .
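For a single distance group, the conditional average of Eq. S1 and the power law fit of Eq. S2 can be sketched as follows. Grouping pairs by identical mobility-flux values plays the role of the delta function; with sparse or noisy data, logarithmic binning of $w_m$ would be a common alternative (an assumption, not stated in the text). The function names and the synthetic data are hypothetical, and the fit is an ordinary least-squares line in log-log space.

```python
import math
from collections import defaultdict


def conditional_mean_social(pairs):
    """<w_s | w_m> for one distance group: mean social flux of pairs sharing a mobility flux.

    pairs: iterable of (w_m, w_s) tuples, one per location pair in the group.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for w_m, w_s in pairs:
        sums[w_m] += w_s
        counts[w_m] += 1
    return {w_m: sums[w_m] / counts[w_m] for w_m in sums}


def fit_power_law(cond_mean):
    """Fit log<w_s|w_m> = log K + beta * log w_m by least squares; return (K, beta)."""
    points = [(math.log(w_m), math.log(w_s)) for w_m, w_s in cond_mean.items()
              if w_m > 0 and w_s > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = (sum((x - mx) * (y - my) for x, y in points)
            / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - beta * mx), beta


# Usage: a synthetic group in which w_s = 2 * w_m**0.5 exactly
group = [(1, 2.0), (1, 2.0), (4, 4.0), (16, 8.0), (16, 8.0)]
K, beta = fit_power_law(conditional_mean_social(group))
print(K, beta)
```

Repeating the fit for each distance group yields one $K(r)$ and one $\beta_r$ per group; dividing each group's conditional averages by its $K(r)$ is the rescaling used for the collapse in Fig. S4F.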
Derivation of the Scaling Relationship Between Exponents
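As stated in the main manuscript, we find that the average social flux follows a power law scaling relationship with the mobility flux $w_m$, i.e.,
$$\langle w_s \mid w_m, \rho\rangle \propto w_m^{\beta}, \tag{S3}$$
where $\beta < 1$, indicating that social fluxes scale sublinearly with mobility fluxes, independent of distance. As described in Correlation Between Social and Mobility Fluxes with Geodesic Distance, a similar result is obtained for the geodesic distance metric $r$ (Eq. S2).
The data collapse observed in Fig. 3 C and F, i.e.,
$$P(w \mid \rho) = \frac{1}{\langle w(\rho)\rangle}\, F\!\left(\frac{w}{\langle w(\rho)\rangle}\right), \qquad w \in \{w_s, w_m\}, \tag{S4}$$
together with Eq. S3, allows us to derive a new scaling relationship between different critical exponents. Indeed, the average social flux at distance $\rho$, $\langle w_s(\rho)\rangle$, can be obtained by integrating $\langle w_s \mid w_m, \rho\rangle$ over $P(w_m \mid \rho)$:
$$\langle w_s(\rho)\rangle = \int_0^\infty \langle w_s \mid w_m, \rho\rangle\, P(w_m \mid \rho)\, dw_m. \tag{S5}$$
Substituting Eqs. S4 and S3 into Eq. S5, we have
$$\langle w_s(\rho)\rangle \propto \int_0^\infty w_m^{\beta}\, \frac{1}{\langle w_m(\rho)\rangle}\, F\!\left(\frac{w_m}{\langle w_m(\rho)\rangle}\right) dw_m = \langle w_m(\rho)\rangle^{\beta} \int_0^\infty x^{\beta} F(x)\, dx, \tag{S6}$$
where we used $x \equiv w_m / \langle w_m(\rho)\rangle$ as a change of variable. As $\langle w_s(\rho)\rangle \propto \rho^{-\alpha_s}$, and similarly $\langle w_m(\rho)\rangle \propto \rho^{-\alpha_m}$, we have
$$\rho^{-\alpha_s} \propto \rho^{-\beta\alpha_m} \int_0^\infty x^{\beta} F(x)\, dx. \tag{S7}$$
The tail of $F(x)$ decays fast enough for the integral in Eq. S7 to converge to a constant that does not depend on $\rho$. Hence, Eq. S7 leads to a scaling relationship,
$$\alpha_s = \beta\, \alpha_m, \tag{S8}$$
connecting the exponent that characterizes social communications ($\alpha_s$) with the exponent characterizing human movements ($\alpha_m$). Similarly, for the geodesic distance metric $r$, where Eq. S2 takes the form $\langle w_s \mid w_m, r\rangle = K(r)\, w_m^{\beta_r}$ with a distance-dependent prefactor $K(r)$, we obtain
$$\langle w_s(r)\rangle \propto K(r)\, \langle w_m(r)\rangle^{\beta_r}. \tag{S9}$$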
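A quick numerical sanity check of the derivation in Eqs. S5–S8, written as a sketch under stated assumptions: the mean mobility flux is set to $\langle w_m(\rho)\rangle = \rho^{-\alpha_m}$, mobility fluxes are generated as $w_m = \langle w_m(\rho)\rangle\,x$ with $x$ drawn from a fixed distribution $F$ (exactly the change of variable of Eq. S6), and $w_s = w_m^{\beta}$ (Eq. S3 with its constant set to 1). The lognormal $F$ and the exponent values are hypothetical stand-ins, not measured quantities; any $F$ with a convergent integral gives the same fitted exponent.

```python
import math
import random


def mean_social_flux(mean_mobility, beta, xs):
    """<w_s(rho)> from Eqs. S5-S6: mean of w_m**beta with w_m = <w_m(rho)> * x, x ~ F.

    The proportionality constant of Eq. S3 is set to 1; it only shifts the intercept.
    """
    return sum((mean_mobility * x) ** beta for x in xs) / len(xs)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Illustrative exponents (hypothetical, not fitted to any dataset)
alpha_m, beta = 1.2, 0.6

# Rescaled fluxes x = w_m / <w_m(rho)>, assumed lognormal with unit mean
rng = random.Random(42)
xs = [rng.lognormvariate(-0.5, 1.0) for _ in range(20_000)]

ranks = [1, 2, 5, 10, 20, 50, 100]
mean_w_m = [rho ** -alpha_m for rho in ranks]        # <w_m(rho)> ~ rho^(-alpha_m)
mean_w_s = [mean_social_flux(m, beta, xs) for m in mean_w_m]
alpha_s = -loglog_slope(ranks, mean_w_s)             # fitted <w_s(rho)> ~ rho^(-alpha_s)

print(f"alpha_s = {alpha_s:.6f}, beta * alpha_m = {beta * alpha_m:.6f}")
```

Because the same samples of $x$ are reused at every $\rho$, the integral $\int x^{\beta}F(x)\,dx$ becomes one constant factor, and the fitted $\alpha_s$ matches $\beta\alpha_m$ up to floating-point error, mirroring Eq. S8.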
The scaling analyses performed here have their roots in the canonical statistical physics literature, namely, the scaling identities of phase transitions and critical phenomena. The power law scaling behavior in the vicinity of a continuous transition is captured by a set of critical exponents characterizing fundamental quantities such as the free energy, specific heat, magnetization, and susceptibility. Initially, these critical exponents were measured independently and were found to vary slightly across materials. Later came a burst of results demonstrating that these exponents are not independent but are in fact connected through what we now call scaling identities. Famous examples include Rushbrooke’s, Widom’s, Josephson’s, and Fisher’s identities (58).
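For concreteness, the four identities named above can be written out explicitly, in the standard thermodynamic notation ($\alpha$ specific heat, $\beta$ order parameter, $\gamma$ susceptibility, $\delta$ critical isotherm, $\nu$ correlation length, $\eta$ anomalous dimension, $d$ spatial dimension). These are the exponents of critical phenomena, not the flux exponents used elsewhere in this text. As a worked check, we use the exactly known exponents of the two-dimensional Ising model (our illustrative choice); Josephson's (hyperscaling) identity involves $d$ and holds below the upper critical dimension.

```latex
% Scaling identities among thermodynamic critical exponents
\begin{aligned}
\text{Rushbrooke:}\quad & \alpha + 2\beta + \gamma = 2 \\
\text{Widom:}\quad      & \gamma = \beta(\delta - 1) \\
\text{Fisher:}\quad     & \gamma = \nu(2 - \eta) \\
\text{Josephson:}\quad  & \nu d = 2 - \alpha
\end{aligned}

% 2D Ising: alpha = 0, beta = 1/8, gamma = 7/4, delta = 15, nu = 1, eta = 1/4, d = 2
\begin{aligned}
0 + 2\cdot\tfrac{1}{8} + \tfrac{7}{4} &= 2 \\
\tfrac{1}{8}\,(15 - 1) &= \tfrac{7}{4} \\
1\cdot\left(2 - \tfrac{1}{4}\right) &= \tfrac{7}{4} \\
1\cdot 2 &= 2 - 0
\end{aligned}
```

The analogy with Eq. S8 is that, once one exponent relationship ($\beta$) and one scaling form ($F$) are established, the remaining exponents are no longer free.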
Determination of Gravity Law’s Parameters
The gravity law assumes that the mobility flux between an origin location i and a destination location j can be expressed as a function of the populations of the two locations ($N_i$ and $N_j$) and the geodesic distance between them ($r_{ij}$) as
$$w_g(i,j) = G\, \frac{N_i^{a}\, N_j^{b}}{r_{ij}^{c}}, \tag{S10}$$
where $G$, $a$, $b$, and $c$ are free parameters (6, 20, 45, 59). Taking the logarithm of both sides, we obtain
$$\ln w_g(i,j) = \ln G + a \ln N_i + b \ln N_j - c \ln r_{ij}. \tag{S11}$$
Using the observed mobility fluxes $w_m(i,j)$ in place of $w_g(i,j)$, we can then estimate $G$, $a$, $b$, and $c$ through a least-squares regression of Eq. S11.
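The regression of Eq. S11 can be sketched with NumPy's linear least squares. This assumes one row per location pair with strictly positive flux, population, and distance; pairs with zero observed flux have to be dropped first because the logarithm is undefined (the text does not say how such pairs were handled). `fit_gravity` is a hypothetical helper, and the usage example generates noise-free fluxes from made-up parameters so that the fit can be checked against known values.

```python
import numpy as np


def fit_gravity(flux, pop_i, pop_j, dist):
    """Least-squares fit of ln w = ln G + a ln N_i + b ln N_j - c ln r (Eq. S11).

    All arguments are 1-D arrays over location pairs with strictly positive entries.
    Returns (G, a, b, c).
    """
    X = np.column_stack([
        np.ones_like(dist, dtype=float),  # intercept -> ln G
        np.log(pop_i),                    # -> a
        np.log(pop_j),                    # -> b
        -np.log(dist),                    # minus sign so the coefficient is c itself
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(flux), rcond=None)
    ln_g, a, b, c = coef
    return float(np.exp(ln_g)), float(a), float(b), float(c)


# Usage on synthetic pairs generated from hypothetical parameters
rng = np.random.default_rng(0)
n_pairs = 200
pop_i = rng.integers(100, 10_000, n_pairs)
pop_j = rng.integers(100, 10_000, n_pairs)
dist = rng.uniform(1.0, 300.0, n_pairs)  # km
flux = 0.05 * pop_i**0.8 * pop_j**0.7 / dist**1.5
G_fit, a_fit, b_fit, c_fit = fit_gravity(flux, pop_i, pop_j, dist)
print(G_fit, a_fit, b_fit, c_fit)
```

With real fluxes the residuals are not zero, and an ordinary least-squares fit in log space is only one of several possible estimators for gravity models.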
Epidemic Spreading Simulations
To assess the accuracy and usefulness of our rescaling formula, we simulated a susceptible–infected–susceptible (SIS) process, commonly used to model disease spreading (54, 55), driven by three sets of fluxes: the observed mobility fluxes, the rescaled social fluxes, and the mobility fluxes approximated by the well-known gravity model (20, 45).
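One simple reading of "rescaled social fluxes", sketched under the assumption that the distance-independent form $\langle w_s \mid w_m\rangle = k\, w_m^{\beta}$ of Eq. S3 can be inverted pair by pair, giving an estimated mobility flux $w_m \approx (w_s/k)^{1/\beta}$. The function name, the dict-of-pairs layout, and the values of `k` and `beta` are hypothetical; the rescaling formula of the main manuscript may contain additional terms.

```python
def rescale_social_to_mobility(social_flux, beta, k=1.0):
    """Estimate mobility fluxes by inverting w_s = k * w_m**beta for each location pair.

    social_flux: dict mapping (origin, destination) to the social flux between them.
    beta, k: exponent and prefactor of the sublinear relation, assumed fitted beforehand.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    return {pair: (w / k) ** (1.0 / beta) for pair, w in social_flux.items()}


# Usage with made-up social fluxes and an illustrative beta
social = {("A", "B"): 8.0, ("A", "C"): 27.0}
print(rescale_social_to_mobility(social, beta=0.5))
```

Because $\beta < 1$, the inversion amplifies differences: pairs with high communication volume are assigned disproportionately larger mobility fluxes.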
We consider the process where each location i (mobile tower) is characterized by a constant population size $N_i$, equal to the number of distinct users present in the vicinity of the mobile tower over the period covered by the dataset D1. The total population in our system is thus given by $N = \sum_i N_i$, and the system is at equilibrium because the population is constant. In each location, users are classified according to their infectious state: they are either infectious (I) or susceptible to infection (S). Denoting by $S_i$ and $I_i$ the numbers of susceptible and infected users in location i, the standard spatial generalization of the SIS model is given by the reactions
$$S_i + I_i \xrightarrow{\;\mu\;} 2I_i, \tag{S12}$$
$$I_i \xrightarrow{\;\nu\;} S_i, \tag{S13}$$
$$S_i \xrightarrow{\;p_{ij}\;} S_j, \tag{S14}$$
$$I_i \xrightarrow{\;p_{ij}\;} I_j, \tag{S15}$$
where reaction S12 indicates that susceptible users can become infectious at a rate μ and reaction S13 corresponds to infected users recovering from the disease at a rate ν. In addition to the standard SIS dynamics, susceptible as well as infected users can randomly move from a location i to another location j, as described in reactions S14 and S15. These movements from i to j are governed by the probability rate
$$p_{ij} = \frac{w_{ij}}{N_i}, \tag{S16}$$
where $w_{ij}$ is the flux from i to j used to drive the simulation (observed, rescaled social, or gravity). Because the system is at equilibrium, the flux of users from i to j must balance that from j to i (detailed balance condition):
$$N_i\, p_{ij} = N_j\, p_{ji}, \tag{S17}$$
which is fulfilled by Eq. S16.
In this case, the spatial SIS model can be written as a set of m coupled ordinary differential equations (ODEs), one for the number of infected users in each of the m locations (22, 60):
$$\frac{dI_i}{dt} = \mu\,\frac{(N_i - I_i)\,I_i}{N_i} - \nu\, I_i - I_i \sum_{j} p_{ij} + \sum_{j} p_{ji}\, I_j, \qquad i = 1, \dots, m. \tag{S18}$$
Solving these equations gives the evolution of the number of infected users in each location over time.
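A minimal numerical sketch of the metapopulation SIS dynamics, assuming the equation form $dI_i/dt = \mu (N_i - I_i) I_i / N_i - \nu I_i - I_i \sum_j p_{ij} + \sum_j p_{ji} I_j$ with $p_{ij} = w_{ij}/N_i$. Both the frequency-dependent infection term and this choice of $p_{ij}$ are assumptions made for illustration. Forward Euler is used for transparency; an adaptive solver such as SciPy's `solve_ivp` would be a more robust choice in practice. `simulate_sis` is a hypothetical name.

```python
import numpy as np


def simulate_sis(w, N, I0, mu, nu, dt=0.01, steps=1000):
    """Forward-Euler integration of coupled SIS equations over m locations.

    w:  (m, m) array of fluxes from i to j (observed, rescaled social, or gravity).
    N:  length-m populations; I0: initial numbers of infected users.
    Returns an array of shape (steps + 1, m) with I_i at every step.
    """
    N = np.asarray(N, dtype=float)
    p = np.asarray(w, dtype=float) / N[:, None]  # assumed p_ij = w_ij / N_i
    np.fill_diagonal(p, 0.0)                     # no self-moves
    leave = p.sum(axis=1)                        # total rate of leaving location i
    I = np.asarray(I0, dtype=float).copy()
    history = [I.copy()]
    for _ in range(steps):
        infection = mu * I * (N - I) / N
        recovery = nu * I
        migration = p.T @ I - leave * I          # sum_j p_ji I_j - I_i sum_j p_ij
        I = I + dt * (infection - recovery + migration)
        history.append(I.copy())
    return np.array(history)


# Usage: three locations with symmetric (made-up) fluxes, so detailed balance holds
w = np.array([[0, 5, 1], [5, 0, 2], [1, 2, 0]])
N = np.array([100, 80, 50])
hist = simulate_sis(w, N, I0=[10, 0, 0], mu=0.3, nu=0.1, dt=0.01, steps=5000)
print(hist[-1])
```

With symmetric fluxes the migration terms cancel in the total, so the populations stay constant. When μ > ν, every location then approaches the endemic fraction $1 - \nu/\mu$ of its population.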
Denoting by $N_i$ the number of users at location i, by $A_i$ the area of location i, and by $I_i^m(t)$, $I_i^s(t)$, and $I_i^g(t)$ the number of infected users at time t in location i when using the observed mobility fluxes, the rescaled social fluxes, and the gravity-model fluxes, respectively, we measure the density of infected users estimated in each location i for the three cases (Fig. 5 A–C for D1). We find remarkable agreement between the spreading patterns simulated with the rescaled social fluxes and those obtained with the observed mobility fluxes. Moreover, a close-up of the city of Porto reveals that our model is more accurate than the gravity model. To quantify the differences between the two methods, we measure
$$\epsilon_i^s(t) = \frac{\lvert I_i^s(t) - I_i^m(t)\rvert}{I_i^m(t)} \tag{S19}$$
and
$$\epsilon_i^g(t) = \frac{\lvert I_i^g(t) - I_i^m(t)\rvert}{I_i^m(t)}, \tag{S20}$$
corresponding to the relative error of the infection rate in each location i at time t for both methods (Fig. 5 D and E). The drastic difference between Fig. 5 D and E highlights that $\epsilon_i^s(t)$ is lower than $\epsilon_i^g(t)$ in this particular example, again documenting the superior predictive power of our model.
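The per-location relative errors of Eqs. S19 and S20, and their mean over locations, can be sketched as below. This assumes the infected counts at one time step are stored in dicts keyed by location; locations with no infected users in the observed-mobility run are skipped because the relative error is undefined there (the text does not say how such locations are treated). The function names and all numbers are made up for illustration.

```python
def relative_errors(predicted, reference):
    """Per-location relative error |I_pred - I_ref| / I_ref at one time step.

    predicted, reference: dicts mapping a location to its number of infected users.
    Locations with zero reference infections are skipped.
    """
    return {loc: abs(predicted[loc] - ref) / ref
            for loc, ref in reference.items() if ref > 0}


def mean_error(errors):
    """Average of the per-location errors."""
    return sum(errors.values()) / len(errors)


# Usage with made-up counts at one time step
ref = {"a": 100.0, "b": 50.0, "c": 0.0}      # run driven by observed mobility fluxes
social = {"a": 90.0, "b": 55.0, "c": 1.0}    # run driven by rescaled social fluxes
gravity = {"a": 60.0, "b": 80.0, "c": 3.0}   # run driven by gravity-model fluxes
eps_s = relative_errors(social, ref)
eps_g = relative_errors(gravity, ref)
print(mean_error(eps_s), mean_error(eps_g))
```

Because $N_i$ and $A_i$ are the same in all three runs, the relative error of the infected counts equals the relative error of the infected densities.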
To systematically assess and compare the accuracy of our results, we simulated 500 independent spreading processes following the same procedure described above, but choosing the parameters μ and ν, the initially infected location, and the initial number of infected users at random. For each simulation, we compute the mean relative errors over all locations, $\langle\epsilon^s(t)\rangle$ and $\langle\epsilon^g(t)\rangle$, from Eqs. S19 and S20 for the rescaled social fluxes and the gravity model, respectively, at different stages (time steps). We find that $\langle\epsilon^s(t)\rangle$ obtained from the 500 simulations is systematically lower than $\langle\epsilon^g(t)\rangle$ across all stages of the spreading processes (Fig. 5F), demonstrating the practical relevance of our scaling relationship for predicting mobility patterns from social communication records.
Normalizing the Time Steps of the Spreading Processes
In this section, we describe the procedure we use to compare spatial spreading processes whose initial conditions differ.
As formulated in Eq. S18, each spatial process is characterized by a set of m coupled ODEs. Each ODE corresponds to a spreading subprocess within a location, and each one reaches its steady state after a different number of time steps (61, 62). Here we consider the global process to be at equilibrium when no more changes are observed in any of its subprocesses, i.e., $dI_i/dt = 0$ for every location i.
As described in the main manuscript, we simulated 500 spatial spreading processes, each with the parameters μ and ν, the initially infected location, and the initial number of infected users chosen at random. Each process k thus reaches equilibrium at a different time $T_k$. To compare their accuracy at different stages of the process, we normalize the time steps of each process k by its time to equilibrium, i.e., $\tilde t = t / T_k$. As a result, a normalized time step $\tilde t = 0.5$ for any process k corresponds to half the time it takes that process to reach equilibrium. This normalization is used in Fig. 5F to compare processes at similar stages.
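The normalization can be sketched as follows, assuming each simulated process is stored as a list of per-location infected counts, one entry per time step. "No more changes" is approximated numerically as no location changing by more than a tolerance `tol` from that step onward. The tolerance and function names are hypothetical. The equilibrium step is taken after the *last* change rather than the first small one, so that a quiet start (few infected users, tiny changes) is not mistaken for equilibrium.

```python
def equilibrium_step(history, tol=1e-6):
    """First step after which no location changes by more than tol anymore.

    history: list of per-location infected counts, one list per time step.
    """
    last_change = 0
    for t in range(1, len(history)):
        if any(abs(a - b) > tol for a, b in zip(history[t], history[t - 1])):
            last_change = t
    return last_change


def normalized_times(history, tol=1e-6):
    """Time steps divided by the time to equilibrium T_k, from 0 up to 1."""
    t_eq = max(equilibrium_step(history, tol), 1)  # guard against a process that never changes
    return [t / t_eq for t in range(t_eq + 1)]


# Usage on a made-up two-location history that settles at step 2
history = [[0, 0], [1, 0], [2, 1], [2, 1], [2, 1]]
print(equilibrium_step(history), normalized_times(history))
```

Processes normalized this way can then be compared at equal fractions of their own time to equilibrium, as in Fig. 5F.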
Potential Limitations of Mobile Phone Datasets
For studies on mobility and social interactions, mobile phone datasets are currently the most relevant data source. Indeed, at present, the most detailed information on human mobility across a large segment of the population is collected by mobile phone carriers, which record the closest mobile tower each time a user uses his or her phone. Other possible data sources include dollar bills, GPS, and check-in datasets from location-based social networking services, all of which suffer from well-known limitations that mobile phone datasets resolve. Dollar bills are carried by many different individuals; hence, mobility inferred from them captures population-level aggregated movements rather than individual mobility. GPS tracks individual positions continuously and with high precision, but it operates on a much smaller scale (typically hundreds of people) than mobile phone records, which cover millions of individuals. Check-in datasets record mobility information only when users voluntarily report their positions, and only for the subset of the population that uses the service, whereas mobile phones objectively collect mobility information across a societal-scale population. For this reason, research on human mobility has expanded rapidly since mobile phone datasets became available, resulting in a number of fundamental papers. Furthermore, mobile phone datasets offer comprehensive information on phone calls and text messages, providing social network information in addition to the mobility trajectory of each individual. Therefore, mobile phone datasets are excellent data sources for studying human mobility and social networks simultaneously.
However, mobile phone datasets have a number of well-known limitations. Most notably, there are three aspects:
First, because mobile phones approximate a user’s location by the tower that routed the call, the spatial resolution of the dataset is limited by the area covered by each tower, which typically ranges between 1 and 3 km. This is the spatial limitation of the data. Fortunately, earlier research has examined this issue extensively and documented that, at least for the types of results discussed here, the findings are not affected by this limitation.
Second, a user’s position is recorded only when he or she makes a call or sends a text message, and human communications follow bursty patterns. This is the temporal limitation of the data. There are, however, ample reasons to believe that our results are not affected by it. Studies comparing mobility patterns obtained from mobile phone data with those from other continuous tracing technologies consistently find the two to be largely indistinguishable (2). These include GPS traces (2) as well as high-resolution mobile phone records (4, 5). Although we do not have direct access to these datasets, we used our own datasets to calculate location displacements over a fixed time interval instead of between two consecutive phone calls, thereby artificially creating mobility traces that occur on a regular basis. We find that the results are remarkably stable as we vary the time interval systematically from 1 h to 1 d (Jump Size Distribution at Fixed Interevent Time). Together, these results suggest that although mobility information in mobile phone datasets is obtained from calling patterns, call detail record (CDR) data capture representative mobility patterns, offering convincing reassurance that our results are not affected by this limitation.
Third, social network information is inferred from calling patterns, yet mobile phone calls can be ambiguous and hence may not represent true social relationships. This is the third limitation of the data. However, Eagle et al. (56) compared self-reported survey data on mobility and social interactions with observational data obtained from mobile phones, demonstrating a high degree of accuracy (95%) in inferring friendship structures from observational data alone.
Taken together, mobile phone datasets are the best and largest datasets available for the type of study conducted here. Although they have well-known limitations, the prior studies and robustness checks summarized above indicate that our results are not affected by these limitations.
Acknowledgments
This work was supported by the College of Information Sciences and Technology at Pennsylvania State University; the Network Science Collaborative Technology Alliance sponsored by the US Army Research Laboratory under Agreement W911NF-09-2-0053; and the Defense Threat Reduction Agency Awards WMD BRBAA07-J-2-0035 and BRBAA08-Per4-C-2-0033. P.D. is supported by the National Fund for Scientific Research (FNRS) and by the Research Department of the Communauté française de Belgique (Large Graph Concerted Research Action).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. A.D. is a guest editor invited by the Editorial Board.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1525443113/-/DCSupplemental.
References
1. Brockmann D, Hufnagel L, Geisel T. The scaling laws of human travel. Nature. 2006;439(7075):462–465. doi: 10.1038/nature04292.
2. González MC, Hidalgo CA, Barabási AL. Understanding individual human mobility patterns. Nature. 2008;453(7196):779–782. doi: 10.1038/nature06958.
3. Song C, Qu Z, Blumm N, Barabási AL. Limits of predictability in human mobility. Science. 2010;327(5968):1018–1021. doi: 10.1126/science.1177170.
4. Song C, Koren T, Wang P, Barabási A-L. Modelling the scaling properties of human mobility. Nat Phys. 2010;6(10):818–823.
5. Wang D, Pedreschi D, Song C, Giannotti F, Barabási A-L. Human mobility, social ties, and link prediction. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery; New York: 2011. pp. 1100–1108.
6. Simini F, González MC, Maritan A, Barabási A-L. A universal model for mobility and migration patterns. Nature. 2012;484(7392):96–100. doi: 10.1038/nature10856.
7. de Montjoye Y-A, Hidalgo CA, Verleysen M, Blondel VD. Unique in the Crowd: The privacy bounds of human mobility. Sci Rep. 2013;3:1376. doi: 10.1038/srep01376.
8. Watts DJ. Six Degrees: The Science of a Connected Age. WW Norton; New York: 2004.
9. Barabási A-L. Linked: The New Science of Networks. Perseus; Cambridge, MA: 2002.
10. Lazer D, et al. Life in the network: The coming age of computational social science. Science. 2009;323:721. doi: 10.1126/science.1167742.
11. Blondel VD, et al. 2012. Data for development: The D4D challenge on mobile phone data. arXiv:1210.0137.
12. Blondel VD, Decuyper A, Krings G. 2015. A survey of results on mobile phone datasets analysis. arXiv:1502.03406.
13. Wasserman S, Faust K. Social Network Analysis: Methods and Applications. Vol 8. Cambridge Univ Press; Cambridge, UK: 1994.
14. Milgram S. The small world problem. Psychol Today. 1967;1(1):61–67.
15. Borgatti SP, Mehra A, Brass DJ, Labianca G. Network analysis in the social sciences. Science. 2009;323(5916):892–895. doi: 10.1126/science.1165821.
16. Granovetter MS. The strength of weak ties. Am J Sociol. 1973;78(6):1360–1380.
17. Coleman JS. Social capital in the creation of human capital. Am J Sociol. 1988;94(Suppl):S95–S120.
18. Lin N, Cook K, Burt RS. Social capital: Theory and research. In: Sociology and Economics: Controversy and Integration Series. Aldine de Gruyter; New York: 2001. pp. 31–56.
19. Fukuyama F. Trust: The Social Virtues and the Creation of Prosperity. Vol 457. Free Press; New York: 1996.
20. Barthélemy M. Spatial networks. Phys Rep. 2011;499(1):1–101.
21. Colizza V, Barrat A, Barthélemy M, Vespignani A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc Natl Acad Sci USA. 2006;103(7):2015–2020. doi: 10.1073/pnas.0510525103.
22. Colizza V, Pastor-Satorras R, Vespignani A. Reaction–diffusion processes and metapopulation models in heterogeneous networks. Nat Phys. 2007;3(4):276–282.
23. Balcan D, et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci USA. 2009;106(51):21484–21489. doi: 10.1073/pnas.0906910106.
24. Bagrow JP, Wang D, Barabási A-L. Collective response of human populations to large-scale emergencies. PLoS One. 2011;6(3):e17680. doi: 10.1371/journal.pone.0017680.
25. Lu X, Bengtsson L, Holme P. Predictability of population displacement after the 2010 Haiti earthquake. Proc Natl Acad Sci USA. 2012;109(29):11576–11581. doi: 10.1073/pnas.1203882109.
26. Gao L, et al. Quantifying information flow during emergencies. Sci Rep. 2014;4:3997. doi: 10.1038/srep03997.
27. Liben-Nowell D, Novak J, Kumar R, Raghavan P, Tomkins A. Geographic routing in social networks. Proc Natl Acad Sci USA. 2005;102(33):11623–11628. doi: 10.1073/pnas.0503018102.
28. Wong LH, Pattison P, Robins G. A spatial model for social networks. Physica A. 2006;360(1):99–120.
29. Lambiotte R, et al. Geographical dispersal of mobile communication networks. Physica A. 2008;387(21):5317–5325.
30. Scellato S, Noulas A, Lambiotte R, Mascolo C. Socio-spatial properties of online location-based social networks. In: Proceedings of the Fifth International Conference on Weblogs and Social Media. Association for the Advancement of Artificial Intelligence; Menlo Park, CA: 2011. pp. 329–336.
31. Kleinberg JM. Navigation in a small world. Nature. 2000;406(6798):845. doi: 10.1038/35022643.
32. Boguñá M, Krioukov D, Claffy KC. Navigability of complex networks. Nat Phys. 2008;5(1):74–80.
33. Boguñá M, Papadopoulos F, Krioukov D. Sustaining the Internet with hyperbolic mapping. Nat Commun. 2010;1:62. doi: 10.1038/ncomms1063.
34. Adamic LA, Lukose RM, Puniyani AR, Huberman BA. Search in power-law networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2001;64(4 Pt 2):046135. doi: 10.1103/PhysRevE.64.046135.
35. Adamic L, Adar E. How to search a social network. Soc Networks. 2005;27(3):187–203.
36. Wang D, et al. Information spreading in context. In: Proceedings of the 20th International Conference on World Wide Web. Association for Computing Machinery; New York: 2011. pp. 735–744.
37. Rogers EM. Diffusion of Innovations. Simon and Schuster; New York: 2010.
38. Cho E, Myers SA, Leskovec J. Friendship and mobility: User movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery; New York: 2011. pp. 1082–1090.
39. Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C. A tale of many cities: Universal patterns in human urban mobility. PLoS One. 2012;7(5):e37027. doi: 10.1371/journal.pone.0037027.
40. Toole JL, Herrera-Yagüe C, Schneider CM, González MC. Coupling human mobility and social ties. J R Soc Interface. 2015;12(105):20141128. doi: 10.1098/rsif.2014.1128.
41. Zipf GK. The P1 P2/D hypothesis: On the intercity movement of persons. Am Sociol Rev. 1946;11(6):677–686.
42. Rodrigue J-P, Comtois C, Slack B. The Geography of Transport Systems. Routledge; New York: 2013.
43. Bullmore E, Sporns O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat Rev Neurosci. 2009;10(3):186–198. doi: 10.1038/nrn2575.
44. Krings G, Calabrese F, Ratti C, Blondel VD. Urban gravity: A model for inter-city telecommunication flows. J Stat Mech. 2009;2009(7):L07003.
45. Erlander S, Stewart NF. The Gravity Model in Transportation Analysis: Theory and Extensions. Vol 3. VSP; Zeist, The Netherlands: 1990.
46. Barrat A, Barthélemy M, Pastor-Satorras R, Vespignani A. The architecture of complex weighted networks. Proc Natl Acad Sci USA. 2004;101(11):3747–3752. doi: 10.1073/pnas.0400087101.
47. Giles J. Computational social science: Making the links. Nature. 2012;488(7412):448–450. doi: 10.1038/488448a.
48. WHO Ebola Response Team. Ebola virus disease in West Africa--the first 9 months of the epidemic and forward projections. N Engl J Med. 2014;371(16):1481–1495. doi: 10.1056/NEJMoa1411100.
49. Gomes MFC, et al. 2014. Assessing the international spreading risk associated with the 2014 West African Ebola outbreak. PLoS Currents Outbreaks, September 2, 2014, Edition 1.
50. Shaman J, Karspeck A, Yang W, Tamerius J, Lipsitch M. Real-time influenza forecasts during the 2012–2013 season. Nat Commun. 2013;4:2837. doi: 10.1038/ncomms3837.
51. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. 2014. Epidemic processes in complex networks. arXiv:1408.2701.
52. Wang P, González MC, Hidalgo CA, Barabási A-L. Understanding the spreading patterns of mobile phone viruses. Science. 2009;324(5930):1071–1076. doi: 10.1126/science.1167053.
53. Brockmann D, Helbing D. The hidden geometry of complex, network-driven contagion phenomena. Science. 2013;342(6164):1337–1342. doi: 10.1126/science.1245200.
54. Pastor-Satorras R, Vespignani A. Epidemic spreading in scale-free networks. Phys Rev Lett. 2001;86(14):3200–3203. doi: 10.1103/PhysRevLett.86.3200.
55. Boguñá M, Pastor-Satorras R. Epidemic spreading in correlated complex networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2002;66(4 Pt 2):047104. doi: 10.1103/PhysRevE.66.047104.
56. Eagle N, Pentland AS, Lazer D. Inferring friendship network structure by using mobile phone data. Proc Natl Acad Sci USA. 2009;106(36):15274–15278. doi: 10.1073/pnas.0900282106.
57. Clauset A, Shalizi C, Newman M. 2007. Power-law distributions in empirical data. arXiv:0706.1062.
58. Kardar M. Statistical Physics of Fields. Cambridge Univ Press; Cambridge, UK: 2007.
59. Viboud C, et al. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science. 2006;312(5772):447–451. doi: 10.1126/science.1125237.
60. Hufnagel L, Brockmann D, Geisel T. Forecast and control of epidemics in a globalized world. Proc Natl Acad Sci USA. 2004;101(42):15124–15129. doi: 10.1073/pnas.0308344101.
61. Anderson RM, May RM. Infectious Diseases of Humans. Vol 1. Oxford Univ Press; Oxford, UK: 1991.
62. Heesterbeek J. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. Vol 5. Wiley; Chichester, UK: 2000.