Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2013 Oct 7;110(43):17259–17262. doi: 10.1073/pnas.1304179110

Excitable human dynamics driven by extrinsic events in massive communities

Joachim Mathiesen a,1, Luiza Angheluta b, Peter T H Ahlgren a, Mogens H Jensen a
PMCID: PMC3808623  PMID: 24101482

Significance

Online social networks have over the last decade influenced the way people interact. Data from Twitter allow for a detailed study of the activity in online massive communities. By studying the frequency by which international brands appear on Twitter and the trade of financial securities on financial markets, we find a characteristic bursty behavior of the activity levels of Twitter users and market participants. We explain the bursty behavior by a simple model of the large-scale human behavior and quantify the correlations in the activity levels. The statistical similarity of the two social systems is an indication that the complex process underlying individual decision-making might not be very different for Twitter users and participants in financial markets.

Keywords: social networks, statistical mechanics, extreme events

Abstract

Using empirical data from a social media site (Twitter) and on trading volumes of financial securities, we analyze the correlated human activity in massive social organizations. The activity, typically excited by real-world events and measured by the occurrence rate of international brand names and trading volumes, is characterized by intermittent fluctuations with bursts of high activity separated by quiescent periods. These fluctuations are broadly distributed with an inverse cubic tail and have long-range temporal correlations with a Inline graphic power spectrum. We describe the activity by a stochastic point process and derive the distribution of activity levels from the corresponding stochastic differential equation. The distribution and the corresponding power spectrum are fully consistent with the empirical observations.


Online social networks have emphatically changed the way people interact. The development of network theories and growth in available data on human behavior (14) has prompted an explosive interest in research on the evolution of behaviors (57) and social structures (8, 9). Among the many forms of online social media, microblogging services such as Twitter (1012) are characterized by a real-time dynamics with large numbers of user broadcasts related to real-world events. Twitter is a popular microblogging platform where a registered user can submit small pieces of information, named “tweets,” that are either private or made public to the user’s followers. The length of a tweet is limited to 140 characters and its content ranges widely from personal updates to massively distributed advertisements or political messages. Twitter has a global outreach and, hence, is used by an increasing number of companies and political organizations to disseminate news. To a large extent, Twitter users can be seen as direct social sensors to measure the popularity of various topics. A large part of recent research on Twitter uses user activity as predictor for real-world events including the dynamics of stock market prices (1,3), box office revenues (14), real-time detection of the location of earthquakes hitting populated areas (15), and for opinion mining and political sentiment analysis (16). Large-scale behavioral data from other online media have been shown to have a similar predicting power, e.g., Google query volumes have been used to detect early signs of stock market moves (17) or more general movements in society (18).

Whereas behavioral data from Twitter have been suggested to predict many real-world events or have been used in mapping out social networks, the statistics of the combined user activity are not well understood. Here we suggest a stochastic model for the user activity in massive online communities. Our model sheds light on the statistical properties of the large-scale user activity on Twitter as well as the underlying correlations. Similar to the user activity on Twitter, trading volumes on the stock market reflect the interest that investors have in particular securities or products at given prices. Interestingly, as we will point out below, we find that the trading activity on financial securities is quite similar to the large-scale user activity on Twitter.

We have automatically queried Twitter for tweets containing one or more international brand names (SI Text). These tweets appear at highly irregular time intervals (Fig. 1), reflecting intermittent user activity levels. We consider the broadcasting of tweets to be a random point process with large fluctuations in the time intervals between the online appearances of messages. The number of tweets containing a certain brand “A” as a function of t is therefore given by a time signal Inline graphic composed of isolated events occurring at random times Inline graphic, Inline graphic where the index Inline graphic refers to a specific posting event. For each query, a number of the latest tweets Inline graphic is returned. Thus, we determine an average tweet rate for a given query k as

graphic file with name pnas.1304179110eq1.jpg

where Inline graphic and Inline graphic correspond to the time of the oldest, respectively, latest, tweet returned by the query k. The time interval Inline graphic for a fixed number Inline graphic of tweets is a highly fluctuating variable from one query to another. We notice that the appearance of tweets on Twitter resembles a nonhomogeneous Poisson process with random fluctuations in the average tweet rate Inline graphic.

Fig. 1.

Fig. 1.

Temporal variation of tweet rates of three international brands, IBM, Pepsi, and Toyota. The time signals are for all brands intermittent, i.e., they have longer periods of relatively steady activity levels interrupted by sudden high-activity spikes. The time signals are all modulated by an underlying periodic variation over days and weeks.

We have further collected data for the trading volume of selected equities over a period similar to that covered by the Twitter data. In particular, we consider the trading volume in three shares: Apple Inc. (AAPL), Nokia Corporation (NOK), and Green Mountain Coffee Roasters Inc. (GMCR). The number of shares traded for each security was accumulated and sampled in 1-min intervals. Changes in trading volume and price are known to be highly leptokurtic (19), have long-range correlations (20), and intriguing scaling properties (21, 22). We consider the volume as a simple proxy for the temporal interest that the market has in a given security, disregarding more complex effects that might influence the price formation process.

In the Twitter data, we distinguish two types of user activity, one where users post messages independently of other tweets and one where users interact directly, e.g., by reposting information from other users in so-called retweets or by submitting responses to existing tweets. In general, a retweet contains text from the original tweet together with a reference to the author who posted it. Retweets typically form a smaller subset of all tweets, and the frequency by which individual tweets are retweeted follows a power law with an exponent similar to the out-degree distribution of Twitter users (SI Text). The out-degree is here measured by the number of followers of a user, i.e., the number of people that directly receive tweets posted by that user. This suggests that the rate of information spread by reposting is proportional with the number of followers and is limited by the local network topology. However, because there are only few tweets that generate a large flux of retweets, this may not be the most efficient way to diffuse information between Twitter’s users. There are information pathways that are not only related to the network topology, but, to a larger degree, correspond to many users tweeting at the same time about the same thing triggered by events outside the network.

The intermittent dynamics of individuals has previously been modeled in terms of a timing selection mechanism between different tasks (23). The prioritization of various tasks is suggested to lead to a bursty dynamics with power-law distributed waiting times. This is in contrast with a homogeneous Poisson process, where the waiting time between tasks that are being selected at random follows an exponential distribution. Here, we propose a global measure of collective human behavior and introduce a stochastic model for the global activity rate associated with many interacting individuals in a large social organization. The activity rate is characterized by long-term memory effects as well as nonexponential distributed waiting times.

Results

Interestingly, the intermittent tweet rates of specific brands follow a distribution Inline graphic with a power-law tail with an exponent β close to an inverse cube, Inline graphic (SD), as seen in Fig. 2 and in SI Text. For the trading volumes, we achieve values for the exponent Inline graphic, Inline graphic, and Inline graphic. Moreover, the fluctuations in the flux of tweets are long-range correlated with a power spectrum that decays as Inline graphic, where f is the frequency and Inline graphic (SD), in an intermediate frequency window corresponding to timescales from 20 min to 24 h as shown in Fig. 3. This means that tweets posted at a given time are influenced with an equal strength by tweets on all timescales ranging back as far as Inline graphic h. At the same time, high bursts of new tweets, extreme events, occur in the tail of Inline graphic. The Inline graphic noise is a widespread phenomenon observed in a variety of different systems, including voltage fluctuations (24), heartbeats (25), freeway traffic (26), music (27), and trades in financial securities (20), along with many other examples. Although there is not a unified theory that would apply to all systems exhibiting Inline graphic noise, there are numerous models that reproduce the Inline graphic fluctuations in temporal signals with fluctuations drawn from different distributions. On the other hand, there are also plenty of studies that focus on the non-Gaussian, power-law statistics of time signals, Inline graphic, independent of their power spectrum. More recent studies on stochastic point processes investigated the relationship between Pareto-type distributions of the variables and their Inline graphic power spectrum, e.g., refs. 2830. The idea behind a stochastic point process is to model the average waiting time between random, discrete events by a multiplicative noise process. Essentially, the complexity of scale-free distributed variables with a Inline graphic spectral density emerges from the multiplicative noise. Here we present a stochastic point process that captures both the Inline graphic and Inline graphic -noise features. Other models of correlated human behavior (31) predict similarly a power-law distribution of the activity rates. However, these models possess a weaker memory effect and do not reproduce the scaling exponent for the power spectrum that we observe.

Fig. 2.

Fig. 2.

Probability density functions of (A) brand tweet rates and (B) trading volume rates for stocks. The mean values have been subtracted from the individual rates and the rates have been normalized by their respective SDs. The density functions collapse and reveal a common scaling behavior for relatively large rates. The green line added to both A and B is a guide to the eye and is consistent with a scaling exponent of Inline graphic. In B, we consider trading activities in the companies Apple Inc. (AAPL), Nokia Corporation (NOK), and Green Mountain Coffee Roasters Inc. (GMCR).

Fig. 3.

Fig. 3.

Power spectra of the activity rates Inline graphic for the individual brands in SI Text (A) and stocks (B). For high frequencies 0.1–24 h−1, the spectra have a characteristic Inline graphic behavior with a crossover to white noise at very high frequencies. The green line is a guide to the eye and corresponds to a Inline graphic behavior.

We assume a scenario where the human activity in the case of no external input is determined by a natural drift toward inactivity. That is, the waiting time τ since the last activity increases with time Inline graphic. On the other hand, excitation by external events drives the system toward higher activity levels. Without correlations in the user activity, the rate Inline graphic is determined by a balance between the drift toward inactivity and repeated excitation. We assume that the correlation in the human dynamics is controlled by a current waiting time between events in the shape of a multiplicative noise with an amplitude given by Inline graphic. The stochastic process for the average waiting time is therefore determined by a stochastic differential equation of the form

graphic file with name pnas.1304179110eq2.jpg

where η is Gaussian noise with zero mean and unit variance, and the deterministic part Inline graphic is chosen such that the process attains a nontrivial stationary distribution. Collective interactions between users sending messages on Twitter are effectively modeled by the intrinsic, multiplicative noise. The amplitude of the intrinsic noise term is proportional to Inline graphic, which implies that if the dynamics was solely driven by noise, the waiting time τ would have an absorbing state, i.e., Inline graphic, corresponding to a tweeting activity that is constant in time and never ceases. The drift term equal to 1 is added to mimic that, if nothing happens, the average waiting time would increase linearly with time as mentioned above. Due to this constant drift term, the absorbing state Inline graphic is never attained, although there are sudden excursions in its neighborhood. The stationary probability distribution function of waiting times Inline graphic corresponding to Eq. 2 is obtained as the steady-state solution of the Fokker–Plank equation in the Ito formulation and is given as

graphic file with name pnas.1304179110eq3.jpg

apart from the normalization constant. The divergence at large τ is suppressed by the cutoff function which depends on Inline graphic. However, at small τ, corresponding to large tweet rates γ, the function Inline graphic is irrelevant because Inline graphic. Thus, in the scaling regime, we can safely ignore Inline graphic. Using that Inline graphic, the stochastic dynamics for γ follows from Eq. 2 by Ito’s lemma and is given as

graphic file with name pnas.1304179110eq4.jpg

where we ignore the contributions due to Inline graphic that only determine the range over which Inline graphic is power-law distributed as Inline graphic. The power spectrum of tweet rate fluctuations is determined by the joint distribution Inline graphic associated with the stochastic process in Eq. 4. By the Wiener–Khintchine theorem, the power spectral density of γ is related to the correlation function as

graphic file with name pnas.1304179110eq5.jpg

where the correlation function Inline graphic is defined as

graphic file with name pnas.1304179110eq6.jpg

As with other point processes, one may assume that the transition distribution has an eigenfunction expansion, and therefore

graphic file with name pnas.1304179110eq7.jpg

where the unnormalized probability eigenfunctions Inline graphic satisfy the master equation corresponding to Eq. 1 and are given by

graphic file with name pnas.1304179110eq8.jpg

Combining Eqs. 6 and 7, we have that

graphic file with name pnas.1304179110eq9.jpg

where Inline graphic is the first moment of the probability eigenfunction. By its Fourier transform as in Eq. 5, the power spectrum can be written as a sum of Lorentzian spectra

graphic file with name pnas.1304179110eq10.jpg

The relation between Inline graphic and the eigenvalues Inline graphic follows from the structure of the unnormalized eigenfunction obtained from Eq. 7 and given as Inline graphic, where Inline graphic is the Bessel function of the first kind. Hence, the first moment of it is Inline graphic. The regime Inline graphic is obtained when Inline graphic.

The joint appearance of the Inline graphic fluctuations distributed with an inverse cubic law for the flux of tweets and volume of trades has interesting implications. Although there is no consensus on a unique underlying process generating Inline graphic noise, we associate this kind of fluctuations with complex systems that exhibit an increase in structures and information due to a long-term memory. In general, the collective interactions in relation to trading and tweeting exhibit the characteristics of an emergent phenomenon.

Intermittent fluctuations in the rate of tweets on Twitter can happen by two types of information pathways: (i) the influx of tweets triggered from the outside world onto Twitter (many independent users tweet about the same thing) and (ii) cascades of retweets or replies to existing tweets. Because there is a larger influx of new tweets compared with the rate of retweets, we conclude that most of the information spread across the online network happens in a “one-step” cascade when many unrelated people tweet about the same thing.

The activity on Twitter may not be very different from the dynamics of stock trades on financial markets because both are influenced by the social behavior inside massive communities combined with simple rules on the interaction set by, e.g., the platform through which the individuals interact. Furthermore, the similarity on large scales indicates a common feature in the complex process underlying the decision-making of users on Twitter and participants in financial markets.

Materials and Methods

From the Twitter Application Programming Interface (API), we automatically query for tweets on Twitter containing one or more of 92 preselected international brand names. The full list of brand names is given in SI Text. For each query to Twitter a maximum number of 1,500 tweets is returned by the API. Each returned tweet has a time stamp, which can be used to estimate an average tweet rate. The dataset used in this study was created by monitoring the public timeline for a period of 4 mo, November 2010 to February 2011; 2 mo, January–February 2012; and 2 mo, October–November 2012. During these periods, we computed the tweet rates of selected international brands with a sampling rate down to half a minute.

Supplementary Material

Supporting Information

Acknowledgments

Suggestions and comments by Anders Blok are gratefully acknowledged. This study was supported by the Danish National Research Foundation through the Center for Models of Life.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1304179110/-/DCSupplemental.

References

  • 1.King G. Ensuring the data-rich future of the social sciences. Science. 2011;331(6018):719–721. doi: 10.1126/science.1197872. [DOI] [PubMed] [Google Scholar]
  • 2.Ginsberg J, et al. Detecting influenza epidemics using search engine query data. Nature. 2009;457(7232):1012–1014. doi: 10.1038/nature07634. [DOI] [PubMed] [Google Scholar]
  • 3.Borgatti SP, Mehra A, Brass DJ, Labianca G. Network analysis in the social sciences. Science. 2009;323(5916):892–895. doi: 10.1126/science.1165821. [DOI] [PubMed] [Google Scholar]
  • 4.Oliveira JG, Barabási A-L. Human dynamics: Darwin and Einstein correspondence patterns. Nature. 2005;437(7063):1251. doi: 10.1038/4371251a. [DOI] [PubMed] [Google Scholar]
  • 5.Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S. Catastrophic cascade of failures in interdependent networks. Nature. 2010;464(7291):1025–1028. doi: 10.1038/nature08932. [DOI] [PubMed] [Google Scholar]
  • 6.Rybski D, Buldyrev SV, Havlin S, Liljeros F, Makse HA. Scaling laws of human interaction activity. Proc Natl Acad Sci USA. 2009;106(31):12640–12645. doi: 10.1073/pnas.0902667106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Centola D. The spread of behavior in an online social network experiment. Science. 2010;329(5996):1194–1197. doi: 10.1126/science.1185231. [DOI] [PubMed] [Google Scholar]
  • 8.Arenas A, Danon L, Díaz-Guilera A, Gleiser PM, Guimera R. Community analysis in social networks. Eur Phys J B. 2004;38(2):373–380. [Google Scholar]
  • 9. Blondel, V., Guillaume, J., Lambiotte, R., Lefebvre, E. (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008:P10008.
  • 10. Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? Proceedings of the 19th International Conference on World Wide Web (Association for Computing Machinery, New York), pp 591–600.
  • 11.Mandavilli A. Peer review: Trial by Twitter. Nature. 2011;469(7330):286–287. doi: 10.1038/469286a. [DOI] [PubMed] [Google Scholar]
  • 12.Huberman BA, Romero DM, Wu F. Crowdsourcing, attention and productivity. J Inf Sci. 2009;35:758–765. [Google Scholar]
  • 13. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: Real-time event detection by social sensors. Proceedings of the 19th International Conference on World Wide Web (Association for Computing Machinery, New York), pp 851–860.
  • 14.Tumasjan A, Sprenger TO, Sandner PG, Welpe IM. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM. 2010;10:178–185. [Google Scholar]
  • 15.Bollen J, Mao H, Zeng J. Twitter mood predicts the stock market. Comp. Sci. 2011;2:1–8. [Google Scholar]
  • 16. Asur S, Huberman BA (2010), Predicting the future with social media. Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference (IEEE, Los Alamitos, CA), Vol 1, pp 492–499.
  • 17.Preis T, Moat HS, Stanley HE. Quantifying trading behavior in financial markets using Google Trends. Sci Rep. 2013;3:1684. doi: 10.1038/srep01684. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Choi H, Varian H. Predicting the present with Google Trends. Econ Rec. 2012;88:2–9. [Google Scholar]
  • 19.Cont R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant Finance. 2001;1:223–236. [Google Scholar]
  • 20. Bonanno, G. and Lillo, F. and Mantegna, R.N. (2000), Dynamics of the number of trades of financial securities. Physica A 280(1):136–141.
  • 21.Mantegna RN, Stanley HE. Scaling behaviour in the dynamics of an economic index. Nature. 1995;376:46–49. [Google Scholar]
  • 22.Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. A theory of power-law distributions in financial market fluctuations. Nature. 2003;423(6937):267–270. doi: 10.1038/nature01624. [DOI] [PubMed] [Google Scholar]
  • 23.Barabási A-L. The origin of bursts and heavy tails in human dynamics. Nature. 2005;435(7039):207–211. doi: 10.1038/nature03459. [DOI] [PubMed] [Google Scholar]
  • 24.Voss RF, Clarke J. Flicker (1/f) noise: Equilibrium temperature and resistance fluctuations. Phys Rev B. 1976;13:556. [Google Scholar]
  • 25. Kobayashi, M and Musha, T (1982), 1/f fluctuation of heartbeat period, IEEE Trans Biomed Eng. 6:456–457. [DOI] [PubMed]
  • 26.Zhang X, Hu G. 1/f noise in a two-lane highway traffic model. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1995;52(5):4664–4668. doi: 10.1103/physreve.52.4664. [DOI] [PubMed] [Google Scholar]
  • 27.Voss RF, Clarke J. 1/f noise in music: Music from 1/f noise. J Acoust Soc Am. 1978;63:258. [Google Scholar]
  • 28.Kaulakys B, Gontis V, Alaburda M. Point process model of 1/f noise vs a sum of Lorentzians. Phys Rev E Stat Nonlin Soft Matter Phys. 2005;71(5 Pt 1):051105. doi: 10.1103/PhysRevE.71.051105. [DOI] [PubMed] [Google Scholar]
  • 29.Ruseckas J, Kaulakys B. 1/f Noise from nonlinear stochastic differential equations. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;81(3 Pt 1):031105. doi: 10.1103/PhysRevE.81.031105. [DOI] [PubMed] [Google Scholar]
  • 30. Erland S, Greenwood PE, Ward LM (2011) 1/fα noise is equivalent to an eigenstructure power relation. Europhys Lett 95:60006.
  • 31.Karsai M, Kaski K, Barabási A-L, Kertész J. Universal features of correlated bursty behaviour. Sci Rep. 2012;2:397. doi: 10.1038/srep00397. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES