Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2009 Jul 14;106(31):12640–12645. doi: 10.1073/pnas.0902667106

Scaling laws of human interaction activity

Diego Rybski a, Sergey V Buldyrev b, Shlomo Havlin c, Fredrik Liljeros d, Hernán A Makse a,1
PMCID: PMC2722366  PMID: 19617555

Abstract

Even though people in our contemporary technological society are depending on communication, our understanding of the underlying laws of human communicational behavior continues to be poorly understood. Here we investigate the communication patterns in 2 social Internet communities in search of statistical laws in human interaction activity. This research reveals that human communication networks dynamically follow scaling laws that may also explain the observed trends in economic growth. Specifically, we identify a generalized version of Gibrat's law of social activity expressed as a scaling law between the fluctuations in the number of messages sent by members and their level of activity. Gibrat's law has been essential in understanding economic growth patterns, yet without an underlying general principle for its origin. We attribute this scaling law to long-term correlation patterns in human activity, which surprisingly span from days to the entire period of the available data of more than 1 year. Further, we provide a mathematical framework that relates the generalized version of Gibrat's law to the long-term correlated dynamics, which suggests that the same underlying mechanism could be the source of Gibrat's law in economics, ranging from large firms, research and development expenditures, gross domestic product of countries, to city population growth. These findings are also of importance for designing communication networks and for the understanding of the dynamics of social systems in which communication plays a role, such as economic markets and political systems.

Keywords: growth, Gibrat's law, long-term correlations, memory, network growth


The question of whether unforeseen outcomes of social activity follow emergent statistical laws has been an acknowledged problem in the social sciences since at least the last decade of the 19th century (14). Earlier discoveries include Pareto's law for income distributions (5), Zipf's law initially applied to word frequency in texts and later extended to firms, cities and others (6), and Gibrat's law of proportionate growth in economics (79).

Social networks are permanently evolving and Internet communities are growing more each day. Having access to the communication patterns of Internet users opens the possibility to unveil the origins of statistical laws that may lead us to the better understanding of human behavior as a whole. In this paper, we analyze the dynamics of sending messages in 2 Internet communities in search of statistical laws of human communication activity. The first online community (OC1) is mainly used by the group of men who have sex with men (MSM).* The data consists of over 80,000 members and more than 12.5 million messages sent over the course of 63 days. The target group of the second online community (OC2) is teenagers (10). The data covers 492 days of activity with more than 500,000 messages sent among almost 30,000 members. Both web sites are also used for social interaction in general. All data are completely anonymous, lack any message content, and consist only of the times at which the messages are sent and the identification numbers of the senders and receivers.

The act of writing and sending messages is an example of an intentional social action. In contrast to routinized behavior, the actants are aware of the purpose of their actions (2, 3). Nevertheless, the emergent properties of the collective behavior of the actants are unintended. In Fig. 1 A, we show a typical example of the activity of a member of OC1 depicting the times when the member sends messages. Figure 1 B provides the cumulative number of messages sent (green curve) compared with a random surrogate dataset (brown curve) obtained by shuffling the data, as discussed in Materials and Methods. As would be expected, there are large fluctuations in the members' activity when compared with a random signal (1113, 15). The messages sent at random display small temporal fluctuations, whereas the OC1 member, sends many more messages in the beginning and a lot fewer at the end of the period of data acquisition (as also seen in Fig. 1 C, displaying the number of messages sent per day). Although such extreme events or bursts have been documented for many systems, including e-mail and letter post communication, instant messaging, web browsing and movie watching (1115), their origin is still an open question.

Fig. 1.

Fig. 1.

A typical example of an individuals' message activity. (A) Instants at which messages were sent by a member belonging to OC1. (B) Cumulative number of messages m(t) (green) and the same but with the messages placed at random (brown). (C) Sequence of number of messages sent per day, μ(t), for the same individual. (DF) Color-coded sequences μ(t) for members sending M = 100; 1,000; or 10,000 messages overall, respectively. The color is proportional to the logarithm of the number of messages per day (red, 1 message; blue, 400 messages; white, no message).

Results

Growth in the Number of Messages

The cumulative number m j(t) expresses how many messages have been sent by a certain member j up to a given time t [for better readability, we will not write the index j explicitly, m(t); see details on the notation in the SI Appendix, Sec. I]. The dynamics of m(t) between times t 0 and t 1 within the period of data acquisition T (t 0 < t 1T) can be considered as a growth process, where each member exhibits a specific growth rate r j (r for short notation):

graphic file with name zpq99909-8764-m01.jpg

where m 0m(t 0) and m 1m(t 1) are the number of messages sent until t 0 and t 1, respectively, by every member. To characterize the dynamics of the activity, we consider 2 measures. (i) The conditional average growth rate, 〈r(m 0)〉, quantifies the average growth of the number of messages sent by the members between t 0 and t 1 depending on the initial number of messages, m 0. In other words, we consider the average growth rate of only those members who have sent m 0 messages until t 0 (see Materials and Methods for more details). (ii) The conditional standard deviation of the growth rate for those members who have sent m 0 messages until t 0, σ(m0)(r(m0)r(m0))2, expresses the statistical spread or fluctuation of growth among the members depending on m 0. Both quantities are relevant in the context of Gibrat's law in economics (79) which proposes a proportionate growth process, entailing the assumption that the average and the standard deviation of the growth rate of a given economic indicator are constant and independent of the specific indicator value. That is, both 〈r(m 0)〉 and σ(m 0) are independent of m 0 (9).

In Fig. 2 A and B, we show the results of 〈r(m 0)〉 and σ(m 0) versus m 0 for both online communities. We find that the conditional average growth rate is fairly independent of m 0. On the other hand, the standard deviation decreases as a power-law of the form:

graphic file with name zpq99909-8764-m02.jpg

We obtain by least-square fitting the exponents βOC1 = 0.22 ± 0.01 for OC1 and βOC2 = 0.17 ± 0.03 for OC2 (the values deviate slightly for large m 0 due to low statistics). Although the web sites are used by different member populations, the power law and the obtained exponents are quite similar. The exponents are also close to those reported for growth in economic systems such as firms and countries (0.15 − 0.18) (16), research and development expenditures at universities (0.25) (17), scientific output (0.28 − 0.4) (18), and city population growth (0.19 − 0.27) (19). The approximate agreement between the exponents obtained for very different systems (social or of human origin) can be considered as a generalization of Gibrat's law, suggesting that the mechanisms behind the growth properties in different systems may originate in the human activity represented by Eq. 2.

Fig. 2.

Fig. 2.

Average and standard deviation of the growth rate versus number of messages. (A) Results for OC1. The average growth rate of messages conditional to m 0 is almost constant and the standard deviation decays with an exponent βOC1 = 0.22 ± 0.01. (B) Results for OC2. The standard deviation conditional to m 0 decays with an exponent βOC2 = 0.17 ± 0.03. (C) Results for OC1, when the messages are shuffled, displaying βrnd = 1/2. (D) Results for OC2, when the messages are shuffled. In all cases, t 0 corresponds to half of the period of data acquisition and t 1 to the end, which we found to provide optimal statistics (see SI Appendix, Fig. 1).

Figure 2 C and D depicts the results when we randomize the data of OC1 and OC2, respectively (see Materials and Methods for details of the randomization procedure), such that any temporal correlations are removed. The typical dynamics for such surrogate data set are shown in Fig. 1 B (the brown curve), which displays a clear random pattern of small fluctuations in comparison with the original data of larger fluctuations (green curve). We find that the random signal displays a close to constant average growth rate 〈r(m 0)〉 and that the fluctuations behave as in Eq. 2 but with an exponent βrnd = 1/2 (Fig. 2 C and D). The origin of this value has a simple explanation: If an isolated individual randomly flips an ideal coin with no memory of the previous attempt, then the fluctuations from the expected value of the fraction of obtained heads decay as a square-root of the number of throws, implying βrnd = 1/2. In contrast to randomness, here we hypothesize that the origin of the generalized version of Gibrat's law with β < 1/2 in Eq. 2 is a nontrivial long-term correlation in communication activity. These correlations possibly arise from internal and external stimuli from other members transmitted through the highly connected network of individuals, an effect that is absent in the randomized data. The exponent value of β ≈ 0.2 for OC1 and OC2 implies that the fluctuations of very active members are smaller than the ones of less active members, but they are significantly larger compared with the random case (compare Fig. 2 A and B with Fig. 2 C and D).

Long-Term Correlations

The exceptional quality of the data (more than 10 million messages spanning several effective decades of magnitude in terms of both activity and time) allows to test the above hypothesis by investigating the presence of temporal correlations in the individuals' activity. We aggregate the data to records of messages per day (an example is shown in Fig. 1 C) to avoid the daily cycle in the activity and analyze the number of messages sent by individuals per day, μ(t), where t denotes the day [m(t)t'=1tμ(t'), Fig. 1 DF shows the color coded daily activity of 3 members in OC1]. For every member, we obtain a record of a length of 63 days (OC1) or 492 days (OC2). We note that former studies reporting Eq. 2 (such as refs. 1619) typically were not based on data with temporal resolution as we use it here and therefore were not able to investigate its origin in terms of temporal correlations.

We quantify the temporal correlations in the members' activity by mapping the problem to a 1-dimensional random walk. The quantity Y(t)t'=1t(μ(t')μ(t)), where 〈μ(t)〉 is the average of the corresponding record μ(t), represents the position of the random walker that performs an up or down step given by μ(t′) − 〈μ(t)〉 at time step t′. The correlations after Δt steps are reflected in the behavior of the root-mean-square displacement F(Δt)[Y(t+Δt)Y(t)]2 (20), where 〈·〉 is the average over t and members. If the activity μ(t) is uncorrelated or short-term correlated, then one obtains Ft) ∼ (Δt)1/2, Fick's law of diffusion, after some cross-over time. In the case of long-term correlations, the result is a power-law increase

graphic file with name zpq99909-8764-m03.jpg

where H > 1/2 is the fluctuation exponent [also known as Hurst exponent (20)]. In statistical physics, long-term correlation or persistence is also referred to as long-term “memory”. Because, in general, the records might be affected by trends, we use the standard detrended fluctuation analysis (DFA) (21) to calculate H (see SI Appendix, section III for a detailed description).

The results for OC1 are shown in Fig. 3 A and B, where we calculate Eq. 3 by separating the members in groups with different total number of messages sent by the members, M. We find that Ft) asymptotically follows a power law with H ≈ 1/2 for the less active members who sent fewer than 10 messages in the entire period (M < 10). The dynamics of the more active members display clear long-term correlations. We find that the fluctuation exponent increases to H ≈ 0.75 for members with M > 103 (see Fig. 3 B). The smaller value of H for less active members could be due to the small amount of information that these members provide in the available time of data acquisition. When we shuffle the data to remove any temporal correlations, we obtain the random exponent H rnd = 1/2 (as seen in Fig. 3 B), confirming that the correlations in the data are due to temporal structure.

Fig. 3.

Fig. 3.

Long-term correlations in the message activity of OC1 (A and B) and OC2 (C and D). (A) DFA fluctuation functions averaged conditional to M, the total number of messages sent by each member (black, 1-2; red, 3-7; green, 8-20; blue, 21-54; orange, 55-148; brown, 149-403; maroon, 404-1096; violet, 1097-2980; turquoise, 2981-8103). The dotted lines serve as guides, the one in the bottom corresponds to the uncorrelated case, whereas the one in the top corresponds to the exponent 0.75. (B) Fluctuation exponent H measured from A on the scales 10 days ≤Δ t ≤ 63 days as a function of the total number of messages sent, M, for real (blue) and individually shuffled (green) records. (C) DFA fluctuation functions averaged conditional to M (colors as in A). The dotted lines correspond to the uncorrelated case (bottom) and to the exponent 1 (top). (D) Fluctuation exponents obtained from C on the scales 32 days ≤Δ t ≤ 200 days as a function of the total number of messages sent, M. Due to weak statistics causing large error bars, we do not consider the last 2 values for M > 500 as reliable. For clarity, the fluctuation functions in A and C are shifted vertically.

The dynamics of the message activity in OC2 is similar to OC1 (see Fig. 3 C). On large time scales, we measure the fluctuation exponent increasing from H ≈ 1/2 to H ≈ 0.9, with increasing M (the exponents for very active members are based on poor statistics and therefore carry large error bars). Analogous to the results obtained for OC1, there are no correlations in the shuffled records (H rnd = 1/2 in Fig. 3 D). The fact that H > 1/2 means that a sudden burst in activity of a member persists on times scales ranging from days to years. The distribution of activity is self-similar over time. Similar correlation results have been found in traded values of stocks and e-mail data (22).

Relation Between β and H

Next, we elaborate the mathematical framework that relates the growth process Eq. 2 to the long-term correlations, Eq. 3. To relate the exponent from Eq. 2, β, to the temporal correlation exponent γ, from Eq. 4, and therefore to H, one can first rewrite Eq. 1 as

graphic file with name zpq99909-8764-m04.jpg

Next, the total increment of messages Δm is expressed in terms of smaller increments μ(t), such as messages per day:

graphic file with name zpq99909-8764-m05.jpg

which is (assuming stationarity) statistically equivalent to Δm=t=1Δtμ(t), and one can write r1m0t=1Δtμ(t) for the growth rate. The conditional average growth is then

graphic file with name zpq99909-8764-m06.jpg

Then the conditional standard deviation σ(m0)=[r(m0)r(m0)]2 can be written in terms of the autocorrelation function as follows:

graphic file with name zpq99909-8764-m07.jpg

where C(Δt)=1σμ2[μ(t)μ(t)][μ(t+Δt)μ(t)] is the autocorrelation function of μ(t) and σμ is the standard deviation of μ(t). The autocorrelation function Ct) measures the interdependencies between the values of the record μ(t). For uncorrelated values, Ct) is zero for Δt > 0 because, on average, positive and negative products of the record will cancel out each other. In the case of short-term correlations, Ct) has a characteristic decay time, Δt ×. A prominent example is the exponential decay Ct) ∼ exp(−Δtt ×). Long-term correlations are described by a slower decay, namely a power-law

graphic file with name zpq99909-8764-m08.jpg

with the correlation exponent 0 < γ < 1, which is related to the fluctuation exponent H from Eq. 3 by γ = 2 − 2H (20). We note that γ = 1 (or γ > 1) corresponds to an uncorrelated record with H = 1/2. A key property of long-term correlations is a pronounced mountain–valley structure in the records (20). Statistically, large values of μ(t) are likely to be followed by large values and small values by small values. Ideally, this feature holds on all time scales, which means a sequence in daily, weekly, or monthly resolution is correlated in the same way as the original sequence.

Assuming long-term correlations asymptotically decaying as in Eq. 4, we approximate the double sum with integrals and obtain

graphic file with name zpq99909-8764-m09.jpg

In order to relate Δt and m 0, one can use Δt = x t 0 [where x is an arbitrary (small) constant that simply states how large Δt is compared with t 0], and m 0t 0, which states that the number of messages is proportional to time assuming stationary activity. By using these 2 arguments we obtain

graphic file with name zpq99909-8764-m10.jpg

Comparing this equation with Eq. 2, we finally obtain β = γ/2, and, with γ = 2 − 2H,

graphic file with name zpq99909-8764-m11.jpg

Eq. 5 is a scaling law formalizing the relation between growth and long-term correlations in the activity and is confirmed by our data. For OC1, we measured βOC1 ≈ 0.22 yielding H OC1 ≈ 0.78 from Eq. 5, which is in approximate agreement with the (maximum) exponent we obtained by direct measurements for OC1 (H = 0.75 ± 0.05 from Fig. 3 B). For OC2, we obtained βOC2 ≈ 0.17 and therefore H OC2 ≈ 0.83 through Eq. 5, which is not too far from the (maximum) exponent found by direct measurements for OC2 (H = 0.88 ± 0.03). According to Eq. 5, the original Gibrat's law (βG = 0) corresponds to very strong long-term correlations with H G = 1. This is the case when the activity on all time scales exhibits equally strong correlations. In contrast, βrnd = 1/2 represents completely random activity (H rnd = 1/2), as obtained for the randomized data in Fig. 3 B and D.

The mathematical framework relating long-term correlations quantified by H and the growth fluctuations quantified by β could be relevant to other complex systems. While the generalized version of Gibrat's law has been reported for economic indicators displaying β ≈ 0.2 (1618), the origin of this scaling law is not clear and is still being investigated. Our results suggest that the value of β could be explained by the existence of long-term correlations in the activity of the corresponding system ranging from firms and markets to social and population dynamics. In turn, Eq. 5 establishes a missing link between studies of growth processes in economic or social systems (1618) and studies of long-term correlations, such as in finance and the economy (23), Ethernet traffic (24), and human brain (25) or motor activity (26). Our results foreshadow the possibility that systems involving other types of human interaction, such as various Internet activities, communication via cell phones, trading activity, etc., may display similar growth and correlation properties as found here, offering the possibility of explaining their dynamics in terms of the long-term persistence of individuals' behavior.

Growth of the Degree in the Underlying Social Network

Communication among the members of a community represents a type of a social interaction that defines a network, whereas a message is sent either based on an existing relation between 2 members or establishing a new one. There is considerable interest in the origin of broad distributions of activity in social systems. Two paradigms have been invoked for various applications in social systems: the “rich-get-richer” idea used by Simon in 1955 (27) and the models based on optimization strategies as proposed by Mandelbrot (28). Regarding network models, the preferential attachment (PA) model has been introduced (29) to generate a type of stochastic scale-free network with a power-law degree distribution in the network topology. Considering the social network of members linked when they exchange at least 1 message (that has not been sent before), we examine the dynamic of the number of outgoing links of each member [the out-degree k(t)] in analogy to Eq. 2.

We start from the empty set of nodes consisting of all the members in the community and chronologically add a directed link between 2 members when a messages is sent. In analogy to the growth in the number of messages m(t) of each member, we study the growth of the members' out-degree k(t), i.e. the number of links to others. We define the growth rate of every member as

graphic file with name zpq99909-8764-m12.jpg

where k 0k(t 0) is the out-degree of a member at time t 0 and k 1k(t 1) is the out-degree at time t 1. Again, there is a growth rate for each member j, but for better readability we skip the index. In Fig. 4, we study 〈r k(k 0)〉, the average growth rate conditional to the initial out-degree k 0, and σk(k 0), the standard deviation of the growth rate conditional to the initial out-degree k 0 for OC1 and OC2. We obtain almost constant average growth 〈r k(k 0)〉 as a function of k 0 as in the study of messages.

Fig. 4.

Fig. 4.

Mean out-degree growth rate and standard deviation versus initial out-degree. (A) Results for OC1. The average growth of out-degree conditional to the out-degree at t 0 is almost constant. The standard deviation decays with an exponent βk,OC1 = 0.22 ± 0.02. (B) Results for OC2. The standard deviation conditional to the out-degree at t 0 decays with an exponent βk,OC2 = 0.17 ± 0.08. The quantities are analogous to those of Fig. 2 except that here the growth rate of the out-degree r k is considered instead of the number of messages sent.

The conditional standard deviation of the network degree, σk(k 0), is shown in Fig. 4 for both social communities. We obtain a power-law relation analogous to Eq. 2:

graphic file with name zpq99909-8764-m13.jpg

with β-exponents very similar to those found for the number of messages, namely βk,OC1 = 0.22 ± 0.02 for OC1 and βk,OC2 = 0.17 ± 0.08 for OC2. These values are consistent with those we obtained for the activity of sending messages.

Next, we consider the preferential attachment model, which has been introduced to generate scale-free networks (29) with power-law degree distribution P(k) of the type investigated in the present study. Essentially, it consists of subsequently adding nodes to the network by linking them to existing nodes that are chosen randomly with a probability proportional to their degree. We consider the undirected network and study the degree growth properties using Eqs. 6 and 7 and calculate the conditional average growth rate 〈r PA(k 0)〉 and the conditional standard deviation σPA(k 0). The times t 0 and t 1 are defined by the number of nodes attached to the network. Fig. 2 in the SI Appendix, section IV shows the results where an average degree 〈k〉 = 20; 50,000 nodes in t 0 and 100,000 nodes in t 1 were chosen. We find constant average growth rate that does not depend on the initial degree k 0. The conditional standard deviation is a function of k 0 and exhibits a power-law decay characterized by Eq. 7 with βPA = 1/2. The value βPA = 1/2 in Eq. 5 corresponds to H = 1/2, indicating complete randomness. There is no memory in the system. Because each addition of a new node is completely independent from precedent ones, there cannot be temporal correlations in the activity of adding links. Therefore, purely preferential attachment type of growth is not sufficient to describe the social network dynamics found in the present study and further temporal correlations have to be incorporated according to Eq. 3.

For the PA model, it has been shown that the degree of each node grows in time as k(t)˜(tt*)b, where t* is the time when the corresponding node was introduced to the system and b = 1/2 is the dynamics exponent in growing network models (30). Accordingly, the growth rate is given by rPA=blnt1t0, which is a constant independent of k 0, in accordance with our numerical findings. Furthermore, in the SI Appendix, section IV we analytically obtain the exponent βPA = 1/2 and confirm the numerical results as well. Interestingly, an extension of the standard PA model has been proposed (31) that takes into account different fitnesses of the nodes to acquiring links involving a distribution of b-exponents and therefore a distribution of growth rates. This model opens the possibility to relate the distribution of fitness values to the fluctuations in the growth rates, a point that requires further investigation.

Discussion

From a statistical physics point of view, the finding of long-term correlations opens the question of the origin of such a persistence pattern in the communication. At this point, we speculate on 2 possible scenarios that require further studies. The question is whether the finding of an exponent H > 0.5 is due to a power-law (Levy-type) distribution (32, 33) in the time interval between 2 messages of the same person or just from pure correlations or long-term memory in the activity of people. In the first scenario, the intervals between the messages follow a power law (13, 34). Accordingly, the activity pattern comprises many short intervals and few long ones, implying persistent epochs of small and large activity. This fractal-like activity leads to long-term correlations with H > 1/2 (see the analogous problem of the origin of long-term correlations in DNA sequences as discussed in ref. 33). This scenario implies a direct link between the correlations and the distribution of interevent intervals which can be obtained analytically. In the second scenario, the intervals between the messages do not follow a Levy-type distribution, but the value of the time intervals are not independent of each other, again representing long-term persistence. For example, the distribution of interevent times could be stretched exponential [see recent work on the study of extreme events of climatological records exhibiting long-term correlations (35)]. Thus, deciding between these 2 possible scenarios for the origin of correlations in activity requires an extended analysis of interevent intervals as well as correlations to determine whether the behavior is Levy-like or pure memory-like. A careful statistical analysis is needed.

To some extent, the human nature of persistent interactions enables the prediction of the actants' activity. Our finding implies that traditional meanfield approximations based on the assumption that the particular type of human activity under study can be treated as a large number of independent random events (Poisson statistics) may result in faulty predictions. On the contrary, from the growth properties found here, one can estimate the probability for members of a certain activity level to send more than a given number of messages in the future. This result may help to improve the proper allocation of resources in communication-based systems ranging from economic markets to political systems. As a by-product, our finding that the activity of sending messages exhibits long-term persistence suggests the existence of an underlying long-term correlated process. This process can be understood as an unknown individual state driven by various internal and external stimuli (36, 37) providing the probability to send messages. In addition, the memory in activity found here could be the origin of the long-term persistence found in other records representing a superposition of the individuals' behavior, such as the Ethernet traffic (24), highway traffic, stock markets, and so forth.

Materials and Methods

Calculations of 〈r(m 0)〉, σ(m 0) and Optimal Times t 0 and t 1.

The average growth rate, 〈r(m 0)〉, and the standard deviation, σ(m0)=r(m0)2r(m0)2, are defined as follows. Calling P(r|m 0) the conditional probability density of finding a member with growth rate r(m 0) with the condition of initial number of messages m 0, we obtain

graphic file with name zpq99909-8764-m14.jpg

and

graphic file with name zpq99909-8764-m15.jpg

In order to calculate the growth rate in Eq. 1, one has to choose the times t 0 and t 1 in the period of data acquisition T. Naturally, it is best to use all data in order to have optimal statistics. Accordingly, t 1 is chosen best at the end of the available data (t 1 = T). We argue that if the choice of t 0 is too small, then m(t 0) is zero for many members (those that send their messages later), which are then rejected in the calculation because of the division in Eq. 1. Conversely, if the choice of t 0 is too large, then there is not enough time to observe the member's activity and r = 0 will occur frequently, indicating no change (members have sent their messages before). Thus, there must be an optimal time in between. In the SI Appendix, section II, Fig. 1, we plot, as a function of t 0, the number of members with at least 1 message at t 0 [m 0 > 0] and further exhibit at least some activity until t 1 = T [m 1m 0 > 0]. For both online communities, we find an optimal t 0 in the middle of the period of observation t 0 = T/2, a value that is used for the analysis in the main text.

Shuffling of the Message Data.

The raw data comprises 1 entry for each message consisting of the time when the message is sent, the sender identifier, and the receiver identifier. For example:

time sender receiver
1 a b
2 a c
4 b a
6 c d
7 a b

At t = 1 member a sends a message to member b, at t = 2 member a sends a message to member c, and so on.

The randomized surrogate dataset is created by randomly swapping the instants (time) at which the messages are sent between 2 events chosen at random. Thus, each message entry randomly obtains the time of another one, meaning that the total number of messages is preserved and the associations between them get shuffled. Temporal correlations are destroyed, but the set of instants at which the messages are sent remains unchanged. For instance, swapping events at t = 1 and t = 6 results in t = 1, c → d, and t = 6, a → b.

Supplementary Material

Supporting Information

Acknowledgments.

We thank C. Briscoe, L.K. Gallos, and H.D. Rozenfeld for discussions. This work was supported by National Science Foundation Grant SES-0624116. F.L. acknowledges financial support from The Swedish Bank Tercentenary Foundation.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

*

The study of the de-identified MSM dating site network data was approved by the Regional Ethical Review Board in Stockholm, record 2005/5:3.

This article contains supporting information online at www.pnas.org/cgi/content/full/0902667106/DCSupplemental.

References

  • 1.Merton RK. The unanticipated consequences of purposive social action. Am Sociol Rev. 1936;1:894–904. [Google Scholar]
  • 2.Weber M. Economy and Society. vol. 1. Berkeley: University of California Press; 1968. [Google Scholar]
  • 3.Giddens A. Stanford: Stanford Univ Press; 1993. New Rules of Sociological Method. [Google Scholar]
  • 4.Durkheim É. Le suicide. Étude de sociologie. Paris: Les Presses Universitaires de France; 1897. [Google Scholar]
  • 5.Pareto V. Geneva: Droz; 1896. Cours d'Economie Politique. [Google Scholar]
  • 6.Zipf G. Selective Studies and the Principle of Relative Frequency in Language. Cambridge, MA: Harvard Univ Press; 1932. [Google Scholar]
  • 7.Gibrat R. Les Inégalités Économiques. Paris: Libraire du Recueil Sierey; 1931. [Google Scholar]
  • 8.Sutton J. Gibrat's legacy. J Econ Lit. 1997;35:40–59. [Google Scholar]
  • 9.Gabaix X. Zipf's law for cities: An explanation. Q J Econ. 1999;114:739–767. [Google Scholar]
  • 10.Holme P, Edling CR, Liljeros F. Structure and time evolution of an Internet dating community. Soc Networks. 2004;26:155–174. [Google Scholar]
  • 11.Paxson V, Floyd S. Wide area traffic: The failure of Poisson modeling. IEEE/ACM Trans Networking. 1995;3:226–244. [Google Scholar]
  • 12.Dewes C, Wichmann A, Feldman A. Proceedings of the. 2003 ACM SIGCOMM Conference on Internet Measurement (IMC-03) New York: ACM Press; 2003. [Google Scholar]
  • 13.Barabási A-L. The origin of bursts and heavy tails in human dynamics. Nature. 2005;435:207–211. doi: 10.1038/nature03459. [DOI] [PubMed] [Google Scholar]
  • 14.Oliveira JG, Barabási AL. Darwin and Einstein correspondence patterns. Nature. 2005;437:1251. doi: 10.1038/4371251a. [DOI] [PubMed] [Google Scholar]
  • 15.Zhou T, Kiet HAT, Kim BJ, Wang BH, Holme P. Role of activity in human dynamics. Europhys Lett. 2008;82:28002. [Google Scholar]
  • 16.Stanley MHR, et al. Scaling behaviour in the growth of companies. Nature. 1996;379:804–806. [Google Scholar]
  • 17.Plerou V, Amaral LAN, Gopikrishnan P, Meyer M, Stanley HE. Similarities between the growth dynamics of university research and of competitive economic activities. Nature. 1999;400:433–437. [Google Scholar]
  • 18.Matia K, Amaral LAN, Luwel M, Moed HF, Stanley HE. Scaling phenomena in the growth dynamics of scientific output. J Am Soc Inf Sci Tec. 2005;56:893–902. [Google Scholar]
  • 19.Rozenfeld HD, et al. Laws of population growth. Proc Nat Acad Sci USA. 2008;105:18702–18707. doi: 10.1073/pnas.0807435105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Feder J. Fractals, Physics of Solids and Liquids. New York: Plenum Press; 1988. [Google Scholar]
  • 21.Peng CK, et al. Mosaic organization of DNA nucleotides. Phys Rev E. 1994;49:1685–1689. doi: 10.1103/physreve.49.1685. [DOI] [PubMed] [Google Scholar]
  • 22.Eisler Z, Bartos I, Kertész J. Fluctuation scaling in complex systems: Taylor's law and beyond. Adv Phys. 2008;57:89–142. [Google Scholar]
  • 23.Mantegna RN, Stanley HE. An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge, UK: Cambridge Univ Press; 1999. [Google Scholar]
  • 24.Leland WE, Taqqu MS, Willinger W, Wilson DV. On the self-similar nature of ethernet traffic (extended version) IEEE/ACM Trans Networking. 1994;2:1–15. [Google Scholar]
  • 25.Linkenkaer-Hansen K, Nikouline VV, Palva JM, Ilmoniemi RJ. Long-range temporal correlations and scaling behavior in human brain oscillations. J Neurosci. 2001;21:1370–1377. doi: 10.1523/JNEUROSCI.21-04-01370.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Ivanov PC, Hu K, Hilton MF, Shea SA, Stanley HE. Endogenous circadian rhythm in human motor activity uncoupled from circadian influences on cardiac dynamics. Proc Nat Acad Sci USA. 2007;104:20702–20707. doi: 10.1073/pnas.0709957104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Simon HA. On a class of skew distribution functions. Biometrika. 1955;42:425–440. [Google Scholar]
  • 28.Mandelbrot B. In: An Informational Theory of the Statistical Structure of Language. Jackson W, editor. London: Butterworth; 1953. pp. 486–504. [Google Scholar]
  • 29.Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509. [DOI] [PubMed] [Google Scholar]
  • 30.Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74:47–97. [Google Scholar]
  • 31.Bianconi G, Barabási AL. Competition and multiscaling in evolving networks. Europhys Lett. 2001;54:436–442. [Google Scholar]
  • 32.Shlesinger MF, West BJ, Klafter J. Lévy dynamics of enhanced diffusion: Application to turbulence. Phys Rev Lett. 1987;58:1100–1103. doi: 10.1103/PhysRevLett.58.1100. [DOI] [PubMed] [Google Scholar]
  • 33.Buldyrev SV, et al. Generalized Lévy-walk model for DNA nucleotide sequences. Phys Rev E. 1993;47:4514–4523. doi: 10.1103/physreve.47.4514. [DOI] [PubMed] [Google Scholar]
  • 34.Gerstein GL, Mandelbrot B. Random walk models for spike activity of single neuron. Biophys J. 1964;4:41–68. doi: 10.1016/s0006-3495(64)86768-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Bunde A, Eichner JF, Kantelhardt JW, Havlin S. Long-term memory: A natural mechanism for the clustering of extreme events and anomalous residual times in climate records. Phys Rev Lett. 2005;94:048701. doi: 10.1103/PhysRevLett.94.048701. [DOI] [PubMed] [Google Scholar]
  • 36.Hedström P. Dissecting the Social: On the Principles of Analytical Sociology. Cambridge, UK: Cambridge Univ Press; 2005. [Google Scholar]
  • 37.Kentsis A. Mechanisms and models of human dynamics. Nature. 2006;441:E5–E6. doi: 10.1038/nature04901. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES