Significance
Online social networks have over the last decade influenced the way people interact. Data from Twitter allow for a detailed study of the activity in online massive communities. By studying the frequency by which international brands appear on Twitter and the trade of financial securities on financial markets, we find a characteristic bursty behavior of the activity levels of Twitter users and market participants. We explain the bursty behavior by a simple model of the large-scale human behavior and quantify the correlations in the activity levels. The statistical similarity of the two social systems is an indication that the complex process underlying individual decision-making might not be very different for Twitter users and participants in financial markets.
Keywords: social networks, statistical mechanics, extreme events
Abstract
Using empirical data from a social media site (Twitter) and on trading volumes of financial securities, we analyze the correlated human activity in massive social organizations. The activity, typically excited by real-world events and measured by the occurrence rate of international brand names and trading volumes, is characterized by intermittent fluctuations with bursts of high activity separated by quiescent periods. These fluctuations are broadly distributed with an inverse cubic tail and have long-range temporal correlations with a power spectrum. We describe the activity by a stochastic point process and derive the distribution of activity levels from the corresponding stochastic differential equation. The distribution and the corresponding power spectrum are fully consistent with the empirical observations.
Online social networks have emphatically changed the way people interact. The development of network theories and growth in available data on human behavior (1–4) has prompted an explosive interest in research on the evolution of behaviors (5–7) and social structures (8, 9). Among the many forms of online social media, microblogging services such as Twitter (10–12) are characterized by a real-time dynamics with large numbers of user broadcasts related to real-world events. Twitter is a popular microblogging platform where a registered user can submit small pieces of information, named “tweets,” that are either private or made public to the user’s followers. The length of a tweet is limited to 140 characters and its content ranges widely from personal updates to massively distributed advertisements or political messages. Twitter has a global outreach and, hence, is used by an increasing number of companies and political organizations to disseminate news. To a large extent, Twitter users can be seen as direct social sensors to measure the popularity of various topics. A large part of recent research on Twitter uses user activity as predictor for real-world events including the dynamics of stock market prices (1,3), box office revenues (14), real-time detection of the location of earthquakes hitting populated areas (15), and for opinion mining and political sentiment analysis (16). Large-scale behavioral data from other online media have been shown to have a similar predicting power, e.g., Google query volumes have been used to detect early signs of stock market moves (17) or more general movements in society (18).
Whereas behavioral data from Twitter have been suggested to predict many real-world events or have been used in mapping out social networks, the statistics of the combined user activity are not well understood. Here we suggest a stochastic model for the user activity in massive online communities. Our model sheds light on the statistical properties of the large-scale user activity on Twitter as well as the underlying correlations. Similar to the user activity on Twitter, trading volumes on the stock market reflect the interest that investors have in particular securities or products at given prices. Interestingly, as we will point out below, we find that the trading activity on financial securities is quite similar to the large-scale user activity on Twitter.
We have automatically queried Twitter for tweets containing one or more international brand names (SI Text). These tweets appear at highly irregular time intervals (Fig. 1), reflecting intermittent user activity levels. We consider the broadcasting of tweets to be a random point process with large fluctuations in the time intervals between the online appearances of messages. The number of tweets containing a certain brand “A” as a function of t is therefore given by a time signal composed of isolated events occurring at random times , where the index refers to a specific posting event. For each query, a number of the latest tweets is returned. Thus, we determine an average tweet rate for a given query k as
where and correspond to the time of the oldest, respectively, latest, tweet returned by the query k. The time interval for a fixed number of tweets is a highly fluctuating variable from one query to another. We notice that the appearance of tweets on Twitter resembles a nonhomogeneous Poisson process with random fluctuations in the average tweet rate .
We have further collected data for the trading volume of selected equities over a period similar to that covered by the Twitter data. In particular, we consider the trading volume in three shares: Apple Inc. (AAPL), Nokia Corporation (NOK), and Green Mountain Coffee Roasters Inc. (GMCR). The number of shares traded for each security was accumulated and sampled in 1-min intervals. Changes in trading volume and price are known to be highly leptokurtic (19), have long-range correlations (20), and intriguing scaling properties (21, 22). We consider the volume as a simple proxy for the temporal interest that the market has in a given security, disregarding more complex effects that might influence the price formation process.
In the Twitter data, we distinguish two types of user activity, one where users post messages independently of other tweets and one where users interact directly, e.g., by reposting information from other users in so-called retweets or by submitting responses to existing tweets. In general, a retweet contains text from the original tweet together with a reference to the author who posted it. Retweets typically form a smaller subset of all tweets, and the frequency by which individual tweets are retweeted follows a power law with an exponent similar to the out-degree distribution of Twitter users (SI Text). The out-degree is here measured by the number of followers of a user, i.e., the number of people that directly receive tweets posted by that user. This suggests that the rate of information spread by reposting is proportional with the number of followers and is limited by the local network topology. However, because there are only few tweets that generate a large flux of retweets, this may not be the most efficient way to diffuse information between Twitter’s users. There are information pathways that are not only related to the network topology, but, to a larger degree, correspond to many users tweeting at the same time about the same thing triggered by events outside the network.
The intermittent dynamics of individuals has previously been modeled in terms of a timing selection mechanism between different tasks (23). The prioritization of various tasks is suggested to lead to a bursty dynamics with power-law distributed waiting times. This is in contrast with a homogeneous Poisson process, where the waiting time between tasks that are being selected at random follows an exponential distribution. Here, we propose a global measure of collective human behavior and introduce a stochastic model for the global activity rate associated with many interacting individuals in a large social organization. The activity rate is characterized by long-term memory effects as well as nonexponential distributed waiting times.
Results
Interestingly, the intermittent tweet rates of specific brands follow a distribution with a power-law tail with an exponent β close to an inverse cube, (SD), as seen in Fig. 2 and in SI Text. For the trading volumes, we achieve values for the exponent , , and . Moreover, the fluctuations in the flux of tweets are long-range correlated with a power spectrum that decays as , where f is the frequency and (SD), in an intermediate frequency window corresponding to timescales from 20 min to 24 h as shown in Fig. 3. This means that tweets posted at a given time are influenced with an equal strength by tweets on all timescales ranging back as far as h. At the same time, high bursts of new tweets, extreme events, occur in the tail of . The noise is a widespread phenomenon observed in a variety of different systems, including voltage fluctuations (24), heartbeats (25), freeway traffic (26), music (27), and trades in financial securities (20), along with many other examples. Although there is not a unified theory that would apply to all systems exhibiting noise, there are numerous models that reproduce the fluctuations in temporal signals with fluctuations drawn from different distributions. On the other hand, there are also plenty of studies that focus on the non-Gaussian, power-law statistics of time signals, , independent of their power spectrum. More recent studies on stochastic point processes investigated the relationship between Pareto-type distributions of the variables and their power spectrum, e.g., refs. 28–30. The idea behind a stochastic point process is to model the average waiting time between random, discrete events by a multiplicative noise process. Essentially, the complexity of scale-free distributed variables with a spectral density emerges from the multiplicative noise. Here we present a stochastic point process that captures both the and -noise features. Other models of correlated human behavior (31) predict similarly a power-law distribution of the activity rates. However, these models possess a weaker memory effect and do not reproduce the scaling exponent for the power spectrum that we observe.
We assume a scenario where the human activity in the case of no external input is determined by a natural drift toward inactivity. That is, the waiting time τ since the last activity increases with time . On the other hand, excitation by external events drives the system toward higher activity levels. Without correlations in the user activity, the rate is determined by a balance between the drift toward inactivity and repeated excitation. We assume that the correlation in the human dynamics is controlled by a current waiting time between events in the shape of a multiplicative noise with an amplitude given by . The stochastic process for the average waiting time is therefore determined by a stochastic differential equation of the form
where η is Gaussian noise with zero mean and unit variance, and the deterministic part is chosen such that the process attains a nontrivial stationary distribution. Collective interactions between users sending messages on Twitter are effectively modeled by the intrinsic, multiplicative noise. The amplitude of the intrinsic noise term is proportional to , which implies that if the dynamics was solely driven by noise, the waiting time τ would have an absorbing state, i.e., , corresponding to a tweeting activity that is constant in time and never ceases. The drift term equal to 1 is added to mimic that, if nothing happens, the average waiting time would increase linearly with time as mentioned above. Due to this constant drift term, the absorbing state is never attained, although there are sudden excursions in its neighborhood. The stationary probability distribution function of waiting times corresponding to Eq. 2 is obtained as the steady-state solution of the Fokker–Plank equation in the Ito formulation and is given as
apart from the normalization constant. The divergence at large τ is suppressed by the cutoff function which depends on . However, at small τ, corresponding to large tweet rates γ, the function is irrelevant because . Thus, in the scaling regime, we can safely ignore . Using that , the stochastic dynamics for γ follows from Eq. 2 by Ito’s lemma and is given as
where we ignore the contributions due to that only determine the range over which is power-law distributed as . The power spectrum of tweet rate fluctuations is determined by the joint distribution associated with the stochastic process in Eq. 4. By the Wiener–Khintchine theorem, the power spectral density of γ is related to the correlation function as
where the correlation function is defined as
As with other point processes, one may assume that the transition distribution has an eigenfunction expansion, and therefore
where the unnormalized probability eigenfunctions satisfy the master equation corresponding to Eq. 1 and are given by
Combining Eqs. 6 and 7, we have that
where is the first moment of the probability eigenfunction. By its Fourier transform as in Eq. 5, the power spectrum can be written as a sum of Lorentzian spectra
The relation between and the eigenvalues follows from the structure of the unnormalized eigenfunction obtained from Eq. 7 and given as , where is the Bessel function of the first kind. Hence, the first moment of it is . The regime is obtained when .
The joint appearance of the fluctuations distributed with an inverse cubic law for the flux of tweets and volume of trades has interesting implications. Although there is no consensus on a unique underlying process generating noise, we associate this kind of fluctuations with complex systems that exhibit an increase in structures and information due to a long-term memory. In general, the collective interactions in relation to trading and tweeting exhibit the characteristics of an emergent phenomenon.
Intermittent fluctuations in the rate of tweets on Twitter can happen by two types of information pathways: (i) the influx of tweets triggered from the outside world onto Twitter (many independent users tweet about the same thing) and (ii) cascades of retweets or replies to existing tweets. Because there is a larger influx of new tweets compared with the rate of retweets, we conclude that most of the information spread across the online network happens in a “one-step” cascade when many unrelated people tweet about the same thing.
The activity on Twitter may not be very different from the dynamics of stock trades on financial markets because both are influenced by the social behavior inside massive communities combined with simple rules on the interaction set by, e.g., the platform through which the individuals interact. Furthermore, the similarity on large scales indicates a common feature in the complex process underlying the decision-making of users on Twitter and participants in financial markets.
Materials and Methods
From the Twitter Application Programming Interface (API), we automatically query for tweets on Twitter containing one or more of 92 preselected international brand names. The full list of brand names is given in SI Text. For each query to Twitter a maximum number of 1,500 tweets is returned by the API. Each returned tweet has a time stamp, which can be used to estimate an average tweet rate. The dataset used in this study was created by monitoring the public timeline for a period of 4 mo, November 2010 to February 2011; 2 mo, January–February 2012; and 2 mo, October–November 2012. During these periods, we computed the tweet rates of selected international brands with a sampling rate down to half a minute.
Supplementary Material
Acknowledgments
Suggestions and comments by Anders Blok are gratefully acknowledged. This study was supported by the Danish National Research Foundation through the Center for Models of Life.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1304179110/-/DCSupplemental.
References
- 1.King G. Ensuring the data-rich future of the social sciences. Science. 2011;331(6018):719–721. doi: 10.1126/science.1197872. [DOI] [PubMed] [Google Scholar]
- 2.Ginsberg J, et al. Detecting influenza epidemics using search engine query data. Nature. 2009;457(7232):1012–1014. doi: 10.1038/nature07634. [DOI] [PubMed] [Google Scholar]
- 3.Borgatti SP, Mehra A, Brass DJ, Labianca G. Network analysis in the social sciences. Science. 2009;323(5916):892–895. doi: 10.1126/science.1165821. [DOI] [PubMed] [Google Scholar]
- 4.Oliveira JG, Barabási A-L. Human dynamics: Darwin and Einstein correspondence patterns. Nature. 2005;437(7063):1251. doi: 10.1038/4371251a. [DOI] [PubMed] [Google Scholar]
- 5.Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S. Catastrophic cascade of failures in interdependent networks. Nature. 2010;464(7291):1025–1028. doi: 10.1038/nature08932. [DOI] [PubMed] [Google Scholar]
- 6.Rybski D, Buldyrev SV, Havlin S, Liljeros F, Makse HA. Scaling laws of human interaction activity. Proc Natl Acad Sci USA. 2009;106(31):12640–12645. doi: 10.1073/pnas.0902667106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Centola D. The spread of behavior in an online social network experiment. Science. 2010;329(5996):1194–1197. doi: 10.1126/science.1185231. [DOI] [PubMed] [Google Scholar]
- 8.Arenas A, Danon L, Díaz-Guilera A, Gleiser PM, Guimera R. Community analysis in social networks. Eur Phys J B. 2004;38(2):373–380. [Google Scholar]
- 9. Blondel, V., Guillaume, J., Lambiotte, R., Lefebvre, E. (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008:P10008.
- 10. Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? Proceedings of the 19th International Conference on World Wide Web (Association for Computing Machinery, New York), pp 591–600.
- 11.Mandavilli A. Peer review: Trial by Twitter. Nature. 2011;469(7330):286–287. doi: 10.1038/469286a. [DOI] [PubMed] [Google Scholar]
- 12.Huberman BA, Romero DM, Wu F. Crowdsourcing, attention and productivity. J Inf Sci. 2009;35:758–765. [Google Scholar]
- 13. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: Real-time event detection by social sensors. Proceedings of the 19th International Conference on World Wide Web (Association for Computing Machinery, New York), pp 851–860.
- 14.Tumasjan A, Sprenger TO, Sandner PG, Welpe IM. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM. 2010;10:178–185. [Google Scholar]
- 15.Bollen J, Mao H, Zeng J. Twitter mood predicts the stock market. Comp. Sci. 2011;2:1–8. [Google Scholar]
- 16. Asur S, Huberman BA (2010), Predicting the future with social media. Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference (IEEE, Los Alamitos, CA), Vol 1, pp 492–499.
- 17.Preis T, Moat HS, Stanley HE. Quantifying trading behavior in financial markets using Google Trends. Sci Rep. 2013;3:1684. doi: 10.1038/srep01684. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Choi H, Varian H. Predicting the present with Google Trends. Econ Rec. 2012;88:2–9. [Google Scholar]
- 19.Cont R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant Finance. 2001;1:223–236. [Google Scholar]
- 20. Bonanno, G. and Lillo, F. and Mantegna, R.N. (2000), Dynamics of the number of trades of financial securities. Physica A 280(1):136–141.
- 21.Mantegna RN, Stanley HE. Scaling behaviour in the dynamics of an economic index. Nature. 1995;376:46–49. [Google Scholar]
- 22.Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. A theory of power-law distributions in financial market fluctuations. Nature. 2003;423(6937):267–270. doi: 10.1038/nature01624. [DOI] [PubMed] [Google Scholar]
- 23.Barabási A-L. The origin of bursts and heavy tails in human dynamics. Nature. 2005;435(7039):207–211. doi: 10.1038/nature03459. [DOI] [PubMed] [Google Scholar]
- 24.Voss RF, Clarke J. Flicker (1/f) noise: Equilibrium temperature and resistance fluctuations. Phys Rev B. 1976;13:556. [Google Scholar]
- 25. Kobayashi, M and Musha, T (1982), 1/f fluctuation of heartbeat period, IEEE Trans Biomed Eng. 6:456–457. [DOI] [PubMed]
- 26.Zhang X, Hu G. 1/f noise in a two-lane highway traffic model. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1995;52(5):4664–4668. doi: 10.1103/physreve.52.4664. [DOI] [PubMed] [Google Scholar]
- 27.Voss RF, Clarke J. 1/f noise in music: Music from 1/f noise. J Acoust Soc Am. 1978;63:258. [Google Scholar]
- 28.Kaulakys B, Gontis V, Alaburda M. Point process model of 1/f noise vs a sum of Lorentzians. Phys Rev E Stat Nonlin Soft Matter Phys. 2005;71(5 Pt 1):051105. doi: 10.1103/PhysRevE.71.051105. [DOI] [PubMed] [Google Scholar]
- 29.Ruseckas J, Kaulakys B. 1/f Noise from nonlinear stochastic differential equations. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;81(3 Pt 1):031105. doi: 10.1103/PhysRevE.81.031105. [DOI] [PubMed] [Google Scholar]
- 30. Erland S, Greenwood PE, Ward LM (2011) 1/fα noise is equivalent to an eigenstructure power relation. Europhys Lett 95:60006.
- 31.Karsai M, Kaski K, Barabási A-L, Kertész J. Universal features of correlated bursty behaviour. Sci Rep. 2012;2:397. doi: 10.1038/srep00397. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.