Proceedings of the National Academy of Sciences of the United States of America. 2025 Jan 23;122(4):e2410227122. doi: 10.1073/pnas.2410227122

Spreading dynamics of information on online social networks

Fanhui Meng a,1, Jiarong Xie b,c,d,1, Jiachen Sun d,1, Cong Xu e, Yutian Zeng e, Xiangrong Wang f, Tao Jia g,h, Shuhong Huang i, Youjin Deng j,k, Yanqing Hu e,l,2
PMCID: PMC11789049  PMID: 39847317

Significance

We find that in the information-spreading process of online social networks, the social reinforcement effect and the social weakening effect coexist, and this is a universal phenomenon. We propose a concise mathematical model which can well describe the empirical spreading dynamics. Our theory indicates that the highly clustered nature of the social network structure results in high-frequency information bursts with relatively small coverage, enabling social media to have high capacity and diversity for information dissemination.

Keywords: social media, spreading dynamics, information spreading

Abstract

Social media is profoundly changing our society with its unprecedented spreading power. Due to the complexity of human behaviors and the diversity of massive messages, the information-spreading dynamics are complicated, and the reported mechanisms are different and even controversial. Based on data from mainstream social media platforms, including WeChat, Weibo, and Twitter, cumulatively encompassing a total of 7.45 billion users, we uncover a ubiquitous mechanism that the information-spreading dynamics are basically driven by the interplay of social reinforcement and social weakening effects. Accordingly, we propose a concise equation, which, surprisingly, can well describe all the empirical large-scale spreading trajectories. Our theory resolves a number of controversial claims and satisfactorily explains many phenomena previously observed. It also reveals that the highly clustered nature of social networks can lead to rapid and high-frequency information bursts with relatively small coverage per burst. This vital feature enables social media to have a high capacity and diversity for information dissemination, beneficial for its ecological development.


With the advantages of open, easily accessible, and real-time communication, social media has given rise to new social communities involving billions of people across borders, races, and cultures (1–3). It plays a pivotal role in almost all aspects of human societies (4–8), including public health (9), political campaigns (10), e-commerce (11–14), awareness of societal issues (6, 15), information security (10, 16), and so on. Social media has profoundly changed the way people share information. Hence, understanding the information-spreading mechanisms, and especially how global collective spreading behaviors are triggered by individuals, has attracted extensive attention in recent years (16–22).

In earlier studies (23–29), the dynamics of information spreading were believed to be analogous to the spread of disease in a population. Later, with evidence from empirical experiments (19, 30), it was realized that this is not the case. Specifically, in 2010, Centola observed in an online propagation experiment that the more times people see a piece of information, the more willing they are to spread it (30). Accordingly, the social reinforcement effect in information spreading on online social networks was proposed. Bakshy et al. repeatedly observed the reinforcement effect through online Facebook experiments (19). Before online social media, this phenomenon had already been observed in offline experiments dating back to Granovetter (31), and it has continued to attract interest over the past decade (6, 7, 21, 24, 32).

However, our empirical observations of data, generated naturally in the real world rather than in human-designed experiments, show that the spreading dynamics exhibit complex patterns (Fig. 1A), far beyond the simple scenario of the social reinforcement effect. For example, as the number of exposures increases, the retweeting probability for some messages first increases and then saturates, while that for other messages increases rapidly and then decreases dramatically (Fig. 1A). In fact, reliable observations of real-world natural spreading reported by other researchers also conflict with the social reinforcement effect (21, 32–37). Romero et al. found that the spread of information on Twitter is not entirely the result of social reinforcement, as multiple exposures would reduce the spreading probability (35); Ugander et al. observed on Facebook that the higher the proportion of common neighbors between users, the significantly lower the probability of transmission (36); Cheng et al. also found that the likelihood of forwarding a message typically decreases with the number of exposures on Facebook (37).

Fig. 1.

The empirical and fitted retweeting probabilities. (A) The empirical spreading trajectories of messages collected from WeChat. Each solid line represents the retweeting probability $\beta(x)$ versus the number of exposures $x$ for a single message. (B) The distribution of $x^*$, i.e., the number of exposures at which $\beta(x)$ reaches its maximum, based on the WeChat messages. Typically, $\beta(x)$ is maximized at $x = 2$ or $3$, which together account for 55% of all cases. Inset: The agreement between theoretical and empirical results of $x^*$ (SI Appendix, section III). (C) The empirical (dots) and fitted (solid lines) retweeting probabilities of 6 representative messages from WeChat, obtained by fitting Eq. 1 to the empirical spreading trajectories. The gray shaded area is the 95% CI. Please see SI Appendix, section IV for the rest of the fitted Weibo trajectories. (D) The empirical versus fitted retweeting probabilities for the WeChat messages. Dots of the same color represent data of the same message. All dots lie close to the diagonal line, suggesting a good fit of Eq. 1 to the empirical data.

Moreover, an evident paradox arises between the theory of the reinforcement effect and the empirical propagation of information on large-scale social networks. On the one hand, the reinforcement reflects a positive mutual feedback between the number of exposures and the forwarding probability. As information spreads and the number of times users receive it increases (30), the retweet probabilities of potential spreaders are enhanced, and, as feedback, the number of exposures for nonspreading users is further increased. This would commonly lead to a flood of information, with nearly all users retweeting the same information. On the other hand, the total number of messages forwarded daily is limited under the reinforcement effect as it would require enormous retweets per day per user to stimulate numerous messages on a large scale, while the time users spend on social media is physiologically limited. However, the fact is that tons of messages are spreading on social networks every day, but barely any on an overwhelmingly large scale (even outbreak sizes are tiny compared to the size of the entire network, e.g., at most 0.04%; see Table 1).

Table 1.

Description of the major datasets used

| Platform | Time period | Spreading scale | Information outbreak sizes | Number of messages | Total number of exposed users | Total number of retweeting users |
|---|---|---|---|---|---|---|
| WeChat | 2019.06–2019.10 | Large-scale | 10,733–468,970 | 182 | 5,412,813,032 | 23,983,694 |
| Weibo | 2017.01–2017.10 | Large-scale | 3,235–35,045 | 100 | 2,033,741,843 | 1,189,138 |
| Weibo | 2017.01–2017.10 | Small-scale | 1–100 | 7,946 | 4,346,107,108 | 91,627 |
| Twitter | 2010.10 | Small-scale | 1–100 | 11,072 | 16,945,294 | 81,633 |

“Information outbreak size” is the range of the number of users retweeting each message; “Total number of exposed users” is the sum of the number of users exposed to each message; “Total number of retweeting users” is the sum of the number of users retweeting each message. Please refer to SI Appendix, section I for more detailed data descriptions and SI Appendix, section V for an explanation of how exposures are measured. Here, all the information we study is publicly accessible.

Here, we study the spreading dynamics of online social networks from a unique perspective, recognizing that the propagation of a piece of information on social media is essentially the spreading of people’s behavior related to that information. We illustrate this perspective by comparing information spreading with the spread of a virus in a population. In virus transmission, the virus takes people as hosts and people play a passive role, with little awareness until symptoms appear. In contrast, people are proactive in information propagation: they repeat or mimic the retweeting behavior of their peers. In this sense, the propagation of information can be regarded as the propagation of human behaviors. This indicates that, when studying the mechanisms of information spreading, the focus should lie mainly on human behavior rather than only on the information itself. Therefore, the spreading mechanism, in principle, relies heavily on the gains of mimicking peer behaviors via social interactions among users.

We systematically analyze the spreading trajectories of real information and the corresponding individual retweeting behaviors on three large-scale social media platforms (Table 1): WeChat, Weibo, and Twitter. A brief description of the empirical data is provided in Materials and Methods; please refer to SI Appendix, section I for more detailed information. We find that users’ forwarding behavior on a piece of information is jointly determined by the following three factors: the intrinsic importance of the information, the average proportion of common neighbors between a user and her friends, and the users’ uncertainty, that is, the users’ sensitivity to the number of exposures and their deviation in estimating the proportion of common neighbors. The first indicates the inherent spreading power of the information, and the latter two represent interactions between users. Based on them, we formulate a concise equation for the propagation dynamics:

$\beta_i(x) = \alpha_i x (1-\gamma)^{x^{\omega_i}}$, [1]

where $\beta_i(x)$ is the retweeting probability of message $i$ at the $x$th exposure to a user, and $\alpha_i$, $\gamma$, and $\omega_i$ describe, respectively, the three factors mentioned above.
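As a minimal sketch of how Eq. 1 behaves, the snippet below evaluates $\beta(x)$ for the first 15 exposures. The function name `retweet_probability` is ours, and the parameter values are the illustrative WeChat-like settings quoted in the Fig. 3 caption, not values fitted to any particular message.

```python
def retweet_probability(x, alpha, gamma, omega):
    """Eq. 1: beta(x) = alpha * x * (1 - gamma) ** (x ** omega)."""
    return alpha * x * (1.0 - gamma) ** (x ** omega)

# Illustrative values taken from the Fig. 3 caption (assumed typical, not fitted here).
alpha, gamma, omega = 0.006, 0.24, 0.95

curve = [retweet_probability(x, alpha, gamma, omega) for x in range(1, 16)]
# Exposure count (1-based) at which the curve is largest.
peak_x = 1 + max(range(len(curve)), key=curve.__getitem__)

for x, b in enumerate(curve, start=1):
    print(x, round(b, 5))
print("largest beta at x =", peak_x)
```

The linear factor $\alpha x$ dominates for small $x$ and the damping factor takes over afterwards, so the printed values rise and then fall.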

Note that although there are three parameters in Eq. 1, $\omega_i$ is the only one that has to be fitted, while $\alpha_i$ and $\gamma$ can be directly measured from the empirical spreading trajectories and the network structure, respectively. We find that this equation captures most of the empirical spreading trajectories well (Fig. 1 B–D). It also resolves a number of controversial claims, including the contradiction of the social reinforcement mechanism, social weakening effects (35), weak ties promoting spreading (19), and more common neighbors reducing the spreading transmissibility (36), among other phenomena left unexplained by previous theories. Furthermore, the equation suggests that the highly clustered nature of social media leads to high-frequency information bursts and a small coverage per burst. This endows social media with high throughput and diversity for information dissemination, which is highly beneficial to its ecological sustainability.

Results

Retweeting Probability.

The process of information spreading on social media is driven by the behavior of users: they tend to forward a message to their friends after seeing it multiple times (30). Thus, the interaction associated with retweeting behaviors is the core of the propagation dynamics. Formally, we define $\beta_i(x)$ of message $i$ as the proportion of users who perform the forwarding operation at the $x$th exposure, which is expressed as

$\beta_i(x) = \frac{m_i(x)}{r_i(x)}$, [2]

where $m_i(x)$ is the number of users who forward the information at the $x$th exposure, and $r_i(x)$ is the number of users who have been exposed to the information no fewer than $x$ times and have not forwarded it during the first $x-1$ exposures (Materials and Methods and SI Appendix, section II). For messages spread on a relatively large scale, we are able to accurately measure the retweeting probability $\beta(x)$, a generic notation of $\beta_i(x)$. Our empirical data clearly demonstrate diverse and complicated spreading patterns (Fig. 1A).
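The counting behind Eq. 2 can be sketched as follows. The input format is an assumption made for illustration: each exposed user is a pair `(n_exposures, retweet_at)`, where `retweet_at` is the exposure index at which the user forwarded the message, or `None` if the user never did. The actual measurement procedure is described in SI Appendix, sections II and V.

```python
from collections import Counter

def empirical_beta(users):
    """Return {x: m(x) / r(x)} following Eq. 2.

    users: iterable of (n_exposures, retweet_at) pairs (hypothetical format);
    retweet_at is None for users who never forwarded the message.
    """
    m, r = Counter(), Counter()
    for n_exposures, retweet_at in users:
        # A user counts toward r(x) if they saw the message at least x times
        # and had not forwarded it during the first x - 1 exposures.
        last = n_exposures if retweet_at is None else retweet_at
        for x in range(1, last + 1):
            r[x] += 1
        if retweet_at is not None:
            m[retweet_at] += 1
    return {x: m[x] / r[x] for x in sorted(r)}

# Toy data: six users with one to three exposures each.
toy_users = [(1, None), (1, 1), (2, None), (2, 2), (3, 2), (3, None)]
beta = empirical_beta(toy_users)
print(beta)
```

A user who forwards at exposure 2 no longer contributes to $r(3)$, which is why the denominator shrinks faster than the raw exposure counts.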

Unified Equation for Propagation Dynamics.

To quantitatively describe the complicated pattern of $\beta(x)$, we begin with a more general form of Eq. 1, by defining $\beta_i(x) = \alpha_i x (1-\gamma)^{f_i(x)}$ (Fig. 2). In this equation, $\alpha_i$ represents the intrinsic importance of message $i$; the second multiplier $x$ describes the basic linear reinforcement effect; $\gamma$ is the average proportion of common neighbors between any two users; and $f_i(x)$ is a power index for message $i$ used to calibrate the effective exposure rate resulting from users’ retweeting, which can be understood as users’ uncertainty about the gains of retweeting the message. The specific case of Eq. 1, in which $f_i(x)$ is determined to be $x^{\omega_i}$, is discussed later.

Fig. 2.

The unified spreading equation. (A) The unified spreading equation and the interpretation of each term for a single message $i$. Parameters $\alpha_i$, $\gamma$ and the term $f_i(x)$ correspond to the intrinsic spreading power, the average proportion of common friends between a user and its neighbors (SI Appendix, section XIV.B), and the users’ uncertainty about the gains of retweeting message $i$, respectively. To be more specific, $\alpha_i x$ describes a linear social reinforcement effect; $(1-\gamma)^{f_i(x)}$ formulates the calibration of the effective exposure rate, i.e., the proportion of a user’s friends who would receive a message for the first time if the user retweets the message. (B) The explicit form of $f_i(x)$ as $x^{\omega_i}$ obtained from empirical WeChat data. Dots of the same color represent data of the same message. The solid lines show theoretical fits of $f_i(x) = x^{\omega_i}$ obtained by Eq. 3. Note that while the solid lines seem to display a linear relationship between $f_i(x)$ and $x$, i.e., $f_i(x) = \omega_i x$, this is actually not the case, as the boundary condition $f_i(1) = 1$ is not satisfied under $f_i(x) = \omega_i x$. (C) The distribution of $\gamma$ measured directly from the network structure of WeChat, which follows a log-normal distribution with mean 0.24. Inset: The estimated $\gamma$ obtained by fitting to empirical WeChat trajectories. The calculation of $\gamma$ is illustrated by an example in (A): the proportion of common friends of node $j$ with its neighbor $k$ is $\gamma_{jk} = 2/6$; averaging over all neighbors of $j$ yields $\gamma_j$; averaging over all nodes $j$ finally gives $\gamma$.

Eq. 1 encompasses the three core elements involved in information spreading, which form two competing mechanisms. On the one hand, when a user receives the same information multiple times, the perceived importance of the information increases, leading to the social reinforcement mechanism described by $\alpha_i x$. On the other hand, when a user receives the same information multiple times (typically from different friends), on average many of her friends should have already been exposed to the information. This would reduce the proportion of potential “fresh audiences” (who receive the information for the first time) if the user forwards the information, as described by $(1-\gamma)^{f_i(x)}$. In summary, the information-spreading mechanism represented by $\beta(x)$ is formulated from the perspective of human spreading behaviors, i.e., $\beta(x)$ is the gain (which equals the product of information importance and proportion of potential audiences) that motivates users’ forwarding behavior.

For better understanding, we exemplify the above form of $\beta_i(x)$ via three simple and hypothetical cases. First, consider the case when $\gamma = 1$, i.e., every pair of users is connected in a network. If a user posts a message, then all users would be exposed to the message, making it valueless (zero gain) for another user to retweet the message. This is consistent with the result of Eq. 1, i.e., $\beta_i(x) \equiv 0$. Second, for the case when $f_i(x) \equiv 1$, Eq. 1 reduces to a simple linear reinforcement effect. However, $f_i(x)$ should depend on $x$ in general, as the more times a message appears in one’s social network, the more of her friends are expected to be aware of the message already. Third, assuming that in a social network every two users share a proportion $\gamma$ of common neighbors and each user posts messages independently, then only a proportion $(1-\gamma)^x$ of a user’s friends would remain unaware of a message if it is posted by $x$ of the user’s friends. This corresponds to the case when $f_i(x) = x$ in Eq. 1. As will be shown later, the form of $f_i(x)$ may not be perfectly linear but can be superlinear or sublinear in $x$ in real cases.

Form of the Power Index.

The form of the power index $f_i(x)$ can be determined by combining theoretical and empirical analyses. From the theoretical perspective, we use the boundary condition $f_i(1) = 1$. This comes from the fact that when a user is exposed to a message for the first time ($x = 1$), the proportion $\gamma$ of her neighborhood that overlaps with the message poster is also aware of the message, yielding an effective exposure rate of $1-\gamma$ if the user forwards the message. A direct result of the boundary condition is $\beta_i(1) = \alpha_i(1-\gamma)$. Then for $x > 1$, we plug $\alpha_i = \beta_i(1)/(1-\gamma)$ into Eq. 1 and obtain

$f_i(x) = \frac{\ln[\beta_i(x)/\beta_i(1)] - \ln x}{\ln(1-\gamma)} + 1$. [3]

Eq. 3 enables us to study the properties of $f_i(x)$ via empirical spreading trajectories. Specifically, from the empirical spreading trajectory of message $i$, we obtain the observed retweeting probability $\beta_i(x)$ via Eq. 2 and plug it into Eq. 3 to get the empirical value of $f_i(x)$ (Fig. 2B). Note that the value of $\gamma$ used in the computation above is $\gamma = 0.24 \pm 0.001$, obtained as the average proportion of common neighbors between any two users based on the network structure of WeChat, as shown in Fig. 2C. It turns out that $f_i(x)$ can be well fitted by $f_i(x) = x^{\omega_i}$, while other straightforward functions, such as linear functions, cannot adequately capture its variability (see discussions in SI Appendix, section VI). Consequently, the equation for propagation dynamics is established as Eq. 1. In fact, the classical SIR model can be recovered from Eq. 1 for a random network at small spreading coverage (SI Appendix, section VII.A).
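The two steps described above (computing the empirical $f_i(x)$ from Eq. 3, then fitting $f_i(x) = x^{\omega_i}$) might look as follows. The log-log regression through the origin is our choice for this sketch, picked because it enforces $f_i(1) = 1$; the authors' fitting procedure is given in SI Appendix, section VI and may differ. The function names and the synthetic trajectory are illustrative.

```python
import math

def power_index(beta, gamma):
    """Eq. 3: f(x) = (ln[beta(x)/beta(1)] - ln x) / ln(1 - gamma) + 1 for x > 1.

    beta: dict {x: retweeting probability}; it must contain x = 1.
    """
    log1mg = math.log(1.0 - gamma)
    return {x: (math.log(b / beta[1]) - math.log(x)) / log1mg + 1.0
            for x, b in beta.items() if x > 1 and b > 0}

def fit_omega(f):
    """Slope of ln f(x) versus ln x, forced through the origin so f(1) = 1."""
    pts = [(math.log(x), math.log(v)) for x, v in f.items() if v > 0]
    return sum(lx * lf for lx, lf in pts) / sum(lx * lx for lx, _ in pts)

# Synthetic, noise-free trajectory generated from Eq. 1 (not real data).
gamma, true_omega = 0.24, 0.95
beta = {x: 0.006 * x * (1.0 - gamma) ** (x ** true_omega) for x in range(1, 11)}

f = power_index(beta, gamma)
omega_hat = fit_omega(f)
print(round(omega_hat, 6))
```

On noise-free input the fitted exponent reproduces the generating value; with real trajectories, points where $\beta_i(x)$ is zero or very noisy would need separate handling.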

The superiority of our model is further demonstrated by Fig. 3A. Comparing the theoretical retweeting probability curves under different models, only our model produces a curve that first rises and then falls, which is a prevalent pattern of information spreading on social media platforms. Furthermore, our model supports the weak tie theory, that is, weaker social ties facilitate communication and information exchange. Fig. 3B shows that the retweeting probability of users with lower γ (i.e., those who are less closely connected to their friends) is significantly higher than that of users with higher γ. The weak tie effect cannot be explained by either the SIR model or the social reinforcement model.

Fig. 3.

Theoretical retweeting probability curves, weak tie effect, and model comparison. (A) The theoretical retweeting probability curve under different models. The SIR model assumes a constant retweeting probability, i.e., $\beta(x) \equiv \beta$. “SocRein I model” and “SocRein II model” refer to two social reinforcement models with retweeting probabilities $\beta(x) = \beta(1)(1+b)^{x-1}$ (39) and $\beta(x) = 1-(1-\beta(1))(1-b)^{x-1}$ (40), respectively, where $b$ is a parameter for the strength of reinforcement. Parameter settings: our model, $\alpha = 0.006$, $\gamma = 0.24$, $\omega = 0.95$; SIR model, $\beta = \alpha(1-\gamma) = 0.00456$; SocRein I model, $b = 0.05$; SocRein II model, $b = 0.0005$; LT (linear threshold) model, activation threshold $= 0.025$. These parameters are set close to the typical values from the WeChat dataset (SI Appendix, section X), and they do not affect the shape of the curves. The figure shows that only our model exhibits nonmonotonicity, which best describes the empirical data. (B) The theoretical retweeting probability curve under our model with different values of $\gamma$. The value of $\gamma$ can be considered as a measure of how closely a user is connected to her friends. From this perspective, the figure shows the weak tie effect (smaller $\gamma$) under our model. (C) Fitting the average simulated retweeting probability with our model and with an alternative model given by Eq. 5. The average retweeting probability is generated from Eq. 5, where $\tilde{\omega}_i$ follows a power law distribution. For other distributional assumptions, such as $\tilde{\omega}_i$ following a normal or an exponential distribution, consistent results can be found in SI Appendix, section VI. (D) Performance of fitting the average simulated retweeting probability with our model and the alternative model. The RMSE of the alternative model has the same value for all $\gamma$, indicating a redundancy between $\gamma$ and $\omega$. The alternative model cannot automatically select the best estimate for $\gamma$; in contrast, our model performs better, as its best-estimated value of $\gamma$ closely matches $\gamma = 0.24$, the ground truth of the parameter.

Fitting Empirical Data.

Eq. 1 with $\gamma = 0.24$ fits the retweeting probabilities of large-scale spreading WeChat messages well (Fig. 1 C and D and SI Appendix, Fig. S4). Fitting results for each large-scale trajectory on WeChat and Weibo are shown in SI Appendix, Figs. S25 and S26. Details of the evaluation metrics of model fitting performance can be found in SI Appendix, section VIII. Other than fitting Eq. 1 with a fixed $\gamma$ obtained from the network structure, we can also fit the equation by treating $\gamma$ as an unknown parameter (see details in SI Appendix, section IX). Specifically, for any given value of $\gamma$, we find the optimal $\omega_i$ and $\alpha_i$ simultaneously by fitting Eq. 1 to the observed spreading trajectory of message $i$ and compute the corresponding RMSE. The estimated value of $\gamma$ is then the one at which the RMSE averaged over all messages reaches its minimum. As shown in the Inset of Fig. 2C, $\gamma$ is estimated to be $0.28 \pm 0.02$ [the SD is obtained by the bootstrap method (38)].
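The procedure of treating $\gamma$ as unknown can be sketched with a plain grid search. This is an illustration under our own assumptions, not the authors' optimizer (see SI Appendix, section IX): for fixed $\gamma$ and $\omega_i$, Eq. 1 is linear in $\alpha_i$, so $\alpha_i$ is obtained in closed form by least squares while $\omega_i$ and $\gamma$ are scanned over hypothetical grids. The trajectories are synthetic.

```python
import math

def fit_message(xs, betas, gamma, omegas):
    """Best (rmse, alpha, omega) for one trajectory at a fixed gamma."""
    best = None
    for omega in omegas:
        g = [x * (1.0 - gamma) ** (x ** omega) for x in xs]
        # For fixed gamma and omega, Eq. 1 is linear in alpha.
        alpha = sum(b * gi for b, gi in zip(betas, g)) / sum(gi * gi for gi in g)
        rmse = math.sqrt(sum((b - alpha * gi) ** 2 for b, gi in zip(betas, g)) / len(xs))
        if best is None or rmse < best[0]:
            best = (rmse, alpha, omega)
    return best

def estimate_gamma(trajectories, gammas, omegas):
    """Gamma that minimizes the RMSE averaged over all trajectories."""
    def mean_rmse(gamma):
        total = sum(fit_message(xs, bs, gamma, omegas)[0] for xs, bs in trajectories)
        return total / len(trajectories)
    return min(gammas, key=mean_rmse)

# Synthetic noise-free trajectories generated from Eq. 1 with gamma = 0.24.
xs = list(range(1, 11))
true_params = [(0.006, 0.90), (0.004, 0.95), (0.008, 1.00)]  # (alpha_i, omega_i)
trajectories = [(xs, [a * x * (1.0 - 0.24) ** (x ** w) for x in xs])
                for a, w in true_params]

gammas = [i / 100 for i in range(10, 41, 2)]   # hypothetical search grid
omegas = [i / 100 for i in range(50, 151, 5)]  # hypothetical search grid
estimated = estimate_gamma(trajectories, gammas, omegas)
print(estimated)
```

With noisy empirical data the minimum is shallower, which is why the paper reports a bootstrap SD alongside the estimate.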

Note that from a purely model-fitting perspective, γ is just a parameter to be fitted and its physical meaning is obscure, i.e., there is no explicit reason why γ should reflect the structure of the underlying social network for message spreading and more specifically, represent the proportion of common neighbors. Therefore, it is really surprising to find that the value of γ=0.28±0.02, as estimated by fitting Eq. 1 to the empirical trajectories, is consistent with γ=0.24±0.001, which is directly measured from the social network structure without referring to Eq. 1. This striking consistency provides strong support for the form of Eq. 1.

Under Eq. 1, the retweeting probability can take a nonmonotonic shape. Specifically, $\beta_i(x)$ first increases with $x$ because of the term $\alpha_i x$, demonstrating a social reinforcement effect. Then, as the term $(1-\gamma)^{x^{\omega_i}}$ starts to dominate, $\beta_i(x)$ decreases with $x$ after reaching a turning point $x_i^*$. The turning point can be determined mathematically as (see SI Appendix, section III for the details)

$x_i^* = \left[-\frac{1}{\omega_i \ln(1-\gamma)}\right]^{1/\omega_i}$. [4]

The theoretical relationship between $x_i^*$ and $\omega_i$ given by Eq. 4 (with $\gamma = 0.24$) is displayed in the Inset of Fig. 1B, showing that $x_i^*$ decreases rapidly as $\omega_i$ increases. The scatter plot of $x_i^*$ versus $\omega_i$ obtained from the empirical spreading trajectories of all WeChat messages is added for comparison with the theoretical curve. The high degree of agreement between the two again demonstrates that Eq. 1 fits the empirical data very well. We also plot the histogram of the empirical $x_i^*$ (Fig. 1B) and find that $x_i^*$ is mostly 2 or 3 for the WeChat messages, i.e., users typically have the maximum retweeting probability when they are exposed to a message for the second or the third time.
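For readers who want the intermediate step, the turning point can be obtained by treating $x$ as continuous and setting the derivative of $\ln \beta_i(x)$ to zero. This is a short sketch consistent with Eq. 1; the authors' full derivation is in SI Appendix, section III, and the symbol $x_i^*$ denotes the turning point.

```latex
\begin{align*}
\ln \beta_i(x) &= \ln \alpha_i + \ln x + x^{\omega_i} \ln(1-\gamma), \\
\frac{d \ln \beta_i(x)}{dx} &= \frac{1}{x} + \omega_i\, x^{\omega_i - 1} \ln(1-\gamma) = 0
\;\Longrightarrow\; x^{\omega_i} = -\frac{1}{\omega_i \ln(1-\gamma)}, \\
x_i^* &= \left[ -\frac{1}{\omega_i \ln(1-\gamma)} \right]^{1/\omega_i}.
\end{align*}
```

Because $\ln(1-\gamma) < 0$ for $0 < \gamma < 1$, the bracket is positive. As a continuous-$x$ illustration with $\gamma = 0.24$, $\omega_i = 0.95$ gives $x_i^* \approx 4.1$ and $\omega_i = 1.2$ gives $x_i^* \approx 2.5$, in line with the statement that the turning point moves to smaller $x$ as $\omega_i$ grows.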

Interpretation and Universality of Eq. 1.

Here, the interpretation and universality of the proposed Eq. 1 for information spreading are addressed. While $\alpha_i$ and $\gamma$ are clearly interpreted as the intrinsic spreading power of message $i$ and the average proportion of common neighbors between two users, the interpretation of $\omega_i$ is not so straightforward. To explain $\omega_i$, consider three alternative forms of the damping term in the equation, namely $(1-\gamma)^{\tilde{\omega}_i x}$, $(1-\tilde{\omega}_i\gamma)^x$, and $[(1-\gamma)\tilde{\omega}_i]^x$. These three forms are shown to be mathematically equivalent (SI Appendix, section VI.B), so only the first one is discussed:

$\beta_i(x) = \alpha_i x (1-\gamma)^{\tilde{\omega}_i x}$, [5]

where $\tilde{\omega}_i$ can be explained as the overall uncertainty of users in the estimation of $\gamma$ for a given message $i$. In reality, the uncertainty of different users is generally not the same (18, 41, 42), so let $\tilde{\omega}_{ij}$ denote the uncertainty of user $j$ for a given message $i$. For a given $i$, when $\tilde{\omega}_{ij}$ follows a certain distribution, such as a power law distribution, a normal distribution, or an exponential distribution, the average of $(1-\gamma)^{\tilde{\omega}_{ij} x}$ over all users can be well approximated by a stretched exponential function (43–45), which is $(1-\gamma)^{x^{\omega_i}}$ in our Eq. 1 (see Fig. 3 C and D and SI Appendix, section VI for details). This serves as an explanation of why Eq. 1 best fits the empirical spreading trajectories. Moreover, it follows that $\omega_i$ can be interpreted as the overall users’ uncertainty about the average proportion of common neighbors for a given message $i$.
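The averaging argument can be illustrated numerically. The sketch below assumes, purely for illustration, that the per-user uncertainties $\tilde{\omega}_{ij}$ are drawn from an exponential distribution with mean 1. It then computes the effective exponent $g(x)$ defined by $\langle (1-\gamma)^{\tilde{\omega}_{ij} x} \rangle = (1-\gamma)^{g(x)}$ and measures how $g(x)$ scales with $x$ on a log-log plot. The names `effective_exponent` and `loglog_slope` are ours.

```python
import math
import random

def effective_exponent(x, gamma, samples):
    """g(x) such that mean((1 - gamma) ** (w * x)) equals (1 - gamma) ** g(x)."""
    avg = sum((1.0 - gamma) ** (w * x) for w in samples) / len(samples)
    return math.log(avg) / math.log(1.0 - gamma)

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of ln y against ln x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)

rng = random.Random(0)
# Assumed distribution of user uncertainties: exponential with mean 1.
samples = [rng.expovariate(1.0) for _ in range(20000)]

xs = list(range(1, 11))
g = [effective_exponent(x, 0.24, samples) for x in xs]
omega_eff = loglog_slope(xs, g)
print(round(omega_eff, 3))
```

A single shared $\tilde{\omega}$ would give $g(x)$ proportional to $x$ (slope 1). Under the assumed heterogeneity the slope falls below 1, so the averaged damping grows sublinearly in $x$, the kind of behavior a power $x^{\omega_i}$ in the exponent can absorb. How good the approximation is for other distributions is examined in SI Appendix, section VI.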

In addition to large-scale spreading messages, the performance of Eq. 1 on small-scale spreading messages is also investigated. A natural challenge arises for small-scale spreading messages: the scale of $\beta(x)$ is typically small, e.g., $10^{-3}$ for WeChat, so a large number of retweeters and exposures is required for a stable calculation of $\beta(x)$, especially for larger $x$; however, the numbers of retweeters and exposures are generally insufficient for these messages, resulting in significant fluctuations in the calculated $\beta(x)$. To overcome this challenge, we propose to merge the trajectories of small-scale spreading messages by taking the average of $\beta(x)$ over the messages. The merged trajectories exhibit a pattern consistent with the large-scale spreading trajectories, i.e., first rising and then falling (SI Appendix, sections IV and XI). In this sense, Eq. 1 also captures the spreading mechanism of small-scale spreading messages.

As a consistent information-spreading pattern has been observed on mainstream social media platforms such as WeChat, Weibo, Twitter, and Facebook (36), we believe that Eq. 1 offers a unified framework for information spreading on online social networks. In this work, we analyze the retweeting behavior of a cumulative total of 7.45 billion users. Although this may not be enough to claim that Eq. 1 is universally applicable, it suggests a widespread phenomenon in human behavior and indicates that Eq. 1 may be universal, which awaits confirmation through analysis of more comprehensive data in the future.

Profound Influence on Social Media.

Last, we would like to investigate the pattern of information spreading given by Eq. 1. Specifically, three characteristics of the spreading process of a message are considered: the spreading coverage s—the number of individuals who retweet the message normalized by the network size when the spreading extends globally [also called “outbreak size” (13)], the outbreak probability p(s)—the probability that the spreading coverage is s (13, 41, 46), and the spreading speed T—the average time duration it takes to propagate the information from the source user to each retweeting user (see SI Appendix, sections XII and XIII for the calculations). For illustration purposes, we compare our model with the classical SIR model and the social reinforcement model through simulations. Details on simulations are provided in Materials and Methods.

It turns out that our model yields a small $s$ with $s \ll p(s)$ (Fig. 4A), while $s = p(s)$ under the SIR model and a large $s$ with $s \gg p(s)$ under the social reinforcement model. Fig. 4 C–E further visualize an information outbreak under our model, the SIR model, and the social reinforcement model, respectively, on a network with empirical WeChat features (see SI Appendix, section XIV for details). These results imply that information spreading under our model is relatively small in outbreak size and high in outbreak probability, in contrast to the large outbreak size (almost the entire network) predicted by the social reinforcement model (30, 39, 40). This is consistent with empirical observations and reasonably explains the contradiction raised by the social reinforcement mechanism. Furthermore, Fig. 4B demonstrates that the spreading speed is the fastest under our model, followed by the social reinforcement model, and the slowest under the SIR model. Altogether, the small outbreak size, the high outbreak probability, and the rapid information outburst facilitate a large number of information outbreaks per unit of time, which is of great significance to the diversity of information spreading on social media.
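To make the simulated quantities concrete, here is a minimal cascade sketch in which each user's retweeting probability depends on how many times they have been exposed. It is not the authors' simulation, which runs on a network with empirical WeChat features (SI Appendix, section XIV): the random graph, the processing order, and all parameter values below are assumptions chosen only to keep the example small.

```python
import random
from collections import deque

def random_graph(n, mean_degree, rng):
    """Undirected random graph as adjacency sets (hypothetical stand-in network)."""
    adj = [set() for _ in range(n)]
    for _ in range(n * mean_degree // 2):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def simulate(adj, beta, source, rng):
    """Return the spreading coverage: the fraction of users who retweet."""
    exposures = [0] * len(adj)
    retweeted = [False] * len(adj)
    retweeted[source] = True
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if retweeted[v]:
                continue
            exposures[v] += 1  # v sees the message one more time
            if rng.random() < beta(exposures[v]):
                retweeted[v] = True
                queue.append(v)
    return sum(retweeted) / len(adj)

def eq1(x, alpha=0.05, gamma=0.24, omega=0.95):
    """Eq. 1 with illustrative parameters (alpha is enlarged for this toy graph)."""
    return alpha * x * (1.0 - gamma) ** (x ** omega)

rng = random.Random(1)
adj = random_graph(500, 10, rng)
coverage = simulate(adj, eq1, source=0, rng=rng)
print(coverage)
```

Swapping `eq1` for a constant function gives an SIR-like comparison, and repeating the run over many sources would give an empirical estimate of the outbreak probability $p(s)$.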

Fig. 4.

Characteristics of the information-spreading process based on simulations. (A) The outbreak probability $p(s)$ versus the spreading coverage $s$ for our model, the classical SIR model, and the social reinforcement model. “Our model” refers to the simulation based on our model with fixed $\omega$ ($= 0.95$, i.e., the average of $\omega_i$ over the 182 WeChat messages) and varying $\alpha$. The parameters of “SocRein I model” and “SocRein II model” are $b = 0.01$ and $b = 0.0001$, respectively. Under the SIR model (squares), we have $s = p(s)$, which is consistent with the classical propagation theory (13, 46). Compared with the SIR model, the curve of $p(s)$ versus $s$ under our model (circles) is above the diagonal line and close to the vertical axis, suggesting that $s \ll p(s)$. Under the social reinforcement models (triangles), the curves of $p(s)$ versus $s$ are below the diagonal line and close to the vertical line $s = 1$, indicating $s \gg p(s)$. This means that once an outbreak occurs, the spreading coverage is typically very large (close to the entire network) under the social reinforcement model. (B) The time $T$ versus the intrinsic spreading power $\alpha$. Here, $T$ is the average time duration it takes to propagate the information from the source user to each retweeting user. (C–E) Visualization of information-spreading trajectories under our model, the SIR model (no outbreak), and the social reinforcement model II. The parameters used for the three models are $\alpha = 0.001$, $\gamma = 0.24$, $\omega = 0.95$, and $b = 0.0001$. The thickness of a line represents the number of exposures $x$ of the user who retweeted the message along the line (the larger the $x$, the thicker the line). The outbreak size under our model is modest, compared to the smallest scale under the SIR model and the largest scale (almost the entire network) under the social reinforcement model. The spreading processes of the three models are shown in Movies S1–S3.

Conclusion and Discussion

In this work, we study the information propagation on social media as the propagation of human behaviors. Therein, associating the spreading probability with the gain that motivates the spreading behavior, we propose a concise and universal equation for the information-spreading dynamics on social media. The equation not only fits every empirical spreading trajectory collected on large-scale social media platforms surprisingly well, but also clarifies phenomena that have failed to be explained by previous theories. Particularly, we find that the empirical retweeting probabilities show a ubiquitous pattern of first rising and then falling. With the two competing mechanisms in our model, we can well explain this pattern as well as phenomena like the weak tie effect. Our model demonstrates that the highly clustered nature of real social media facilitates rapid, frequent, and relatively small-scale information bursts, enabling social media to have a high capacity and diversity for information dissemination. Our model links microscopic individual interactions to the macroscopic phenomena of social media. Due to its simplicity and adaptability, we believe that the proposed equation is likely to serve as a fundamental equation for information spreading and inspire a wave of theoretical and applied social media research.

Despite the contributions of this study, several limitations need to be acknowledged. First, during the process of information dissemination on social media, users may add new comments when forwarding a message, e.g., Twitter’s quote tweet feature (37, 47). However, our current model and data are insufficient to clarify how these comments influence the dissemination process. Second, the increasing presence of artificially manipulated bots on social media and their impact on the dissemination process is not explicitly addressed in this paper. Although we theoretically demonstrate that the proposed model can explore these two aspects (SI Appendix, sections XVI and XVII), there is a lack of comprehensive real-world data for an in-depth discussion at present. Finally, the integration of the model presented in this paper with artificial intelligence algorithms to better predict information-spreading size remains an important topic for future research (48).

Materials and Methods

Empirical Data and Preprocessing.

We collect empirical data from three mainstream social media platforms, namely WeChat, Weibo, and Twitter (49). The datasets are strictly deidentified so that no detailed user profiles are available. In total, 294, 8,283, and 66,064 spreading trajectories are obtained from WeChat, Weibo, and Twitter, respectively. Then, a preprocessing step called truncation is applied to the data to pick out trajectories with at least 5 data points on the retweeting probability curve β(x), each involving no fewer than 100 retweeting users. The reasons for this preprocessing are 1) with too few data points on β(x), the solution of our spreading equation Eq. 1 would not be unique; 2) with too few retweeting users, the computed empirical retweeting probabilities would contain too much noise. For more detailed descriptions of the empirical data, as well as sensitivity analyses on the truncation performed, please refer to SI Appendix, sections I and XV.A. As a result, the trajectories of 182 messages from WeChat and 100 messages from Weibo are used in the analyses of large-scale spreading messages. For Twitter, since the data are incomplete as stated in ref. 49, we focus only on the small-scale spreading messages, which are merged and show a first-rises-and-then-falls pattern consistent with that of the large-scale spreading messages (see SI Appendix, section XV.B for the data preprocessing applied to the Twitter data).
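One reading of the truncation rule can be sketched in Python as follows. The function name `keep_trajectory`, the dictionary representation of β(x), and the interpretation that the 100-user threshold applies to the trajectory as a whole are assumptions made for illustration, not details taken from the source; the precise procedure is described in SI Appendix.

```python
def keep_trajectory(beta_curve, n_retweeters, min_points=5, min_retweeters=100):
    """Return True if a trajectory passes the (assumed) truncation rule.

    beta_curve    -- hypothetical dict mapping exposure count x to the
                     empirical retweeting probability beta(x)
    n_retweeters  -- number of retweeting users in the trajectory
    """
    has_enough_points = len(beta_curve) >= min_points
    has_enough_users = n_retweeters >= min_retweeters
    return has_enough_points and has_enough_users


# Illustrative values only; they are not taken from the datasets.
example_curve = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.02, 5: 0.01}
print(keep_trajectory(example_curve, 150))
```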

Computation of the Empirical Retweeting Probability.

The probability of users forwarding a message can be calculated based on its spreading trajectory and the associated network structure (35, 36, 50). Below, we describe how the numerator m_i(x) and the denominator r_i(x) of the retweeting probability defined by Eq. 2 in the main text are determined from empirical data. From the empirical spreading trajectory of message i on the three social media platforms, we obtain the following information for each user j: her total number of exposures to the message, her neighbors who posted the message, the times when her neighbors posted the message, whether she retweets the message, and, if so, from which neighbor she retweets. Taking a user j who is exposed to message i three times in total as an illustrative example, we describe how she is counted in m_i(x) and r_i(x). Suppose that three of her neighbors, denoted as a, b, and c, posted the message at times satisfying t_a<t_b<t_c. If user j does not retweet the message after seeing it three times, then she is counted in r_i(1), r_i(2), and r_i(3). However, if user j retweets from b, for example, then she is considered to be retweeting at the second exposure regardless of whether the retweeting occurs before or after t_c, meaning that she is counted in r_i(1), r_i(2), and m_i(2). Please see SI Appendix, section II for more details.
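The counting rule in this example can be sketched in Python as follows. This is a minimal illustration, assuming each user is represented by a hypothetical pair (times at which her neighbors posted, the neighbor she retweeted from or None) and that the retweeting probability is the ratio of the numerator to the denominator; the function and variable names are not from the source.

```python
from collections import Counter


def exposure_counts(users):
    """Accumulate the numerator m(x) and denominator r(x) for one message.

    users -- iterable of (posted_at, source) pairs, where posted_at maps each
             posting neighbor to her posting time and source is the neighbor
             the user retweeted from (None if she never retweeted).
    """
    m, r = Counter(), Counter()
    for posted_at, source in users:
        # Order the posting neighbors by the time they posted the message.
        order = sorted(posted_at, key=posted_at.get)
        # Exposure index of the retweet, or the total number of exposures
        # when the user never retweets.
        last = order.index(source) + 1 if source is not None else len(order)
        for x in range(1, last + 1):
            r[x] += 1
        if source is not None:
            m[last] += 1
    return m, r


def retweet_probability(m, r):
    """Empirical retweeting probability at each exposure count x."""
    return {x: m[x] / r[x] for x in sorted(r)}


# The two cases from the text: neighbors a, b, c post at t_a < t_b < t_c.
users = [
    ({"a": 1.0, "b": 2.0, "c": 3.0}, None),  # never retweets
    ({"a": 1.0, "b": 2.0, "c": 3.0}, "b"),   # retweets from b
]
m, r = exposure_counts(users)
beta = retweet_probability(m, r)
print(dict(r), dict(m), beta)
```

The first user contributes to the denominator at x = 1, 2, 3; the second contributes to the denominator at x = 1, 2 and to the numerator at x = 2, matching the worked example.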

Simulation of Information-Spreading Process.

Here, we describe how the simulations of the information-spreading process are performed to generate Fig. 4. The spreading process of a piece of information on a network takes the network size N, the adjacency matrix A, and the retweeting probability β(x) (determined by the model we use) as inputs, and outputs the spreading trajectory encoded as a matrix T. Each row of T records the details of one retweeting, i.e., the ID of the retweeting user, the ID of the retweeting user’s neighbor from whom she retweets, and the time of retweeting. Therefore, the dimension of T is (s−1)×3, where s is the spreading coverage (excluding the source user of the information-spreading process). Put simply, the simulation randomly selects a user in the network as the source of information spreading, and then sequentially selects users that have at least one retweeting neighbor to further propagate the information with probability β(x). However, the concrete simulation process contains many details that need further explanation.

To elaborate on the detailed simulation process for a given piece of information, we introduce some additional notation. The set of neighbors of each user j in the network is denoted as N_j = {k : k ≠ j and A(j,k) = 1}. Each user has two possible states, i.e., the S-state (the susceptible state, in which the user has not retweeted the information) and the I-state (the infected state, in which the user has retweeted the information). The set of S-state users with at least one I-state neighbor at a specific time point t is denoted as S_curr. Assuming that the retweeting activities of these |S_curr| users form a Poisson process, the expected time duration from the current time point until the first retweeting occurs is Δt ∝ 1/|S_curr|. Since our focus is on relative time duration, we set Δt = 1/|S_curr| for convenience. In addition, the following information is recorded for each user j to keep track of her status at t: s_j, the current state of the user (S or I); n^I_j, the number of the user’s neighbors in the I-state (i.e., the number of infected neighbors); x_j, the number of exposures to the information before the user retweets it; and t^I_j, the time at which the user enters the I-state (i.e., the time of infection). Note that we always have x_j ≤ n^I_j, and x_j could be less than n^I_j as the user may not see every retweeting of her neighbors, which is closer to the situation in practice. The detailed steps of the simulation process are:

  1. Initialize the spreading process: set t = 0, n^I_j = x_j = 0, t^I_j = 0 for j = 1, 2, …, N, and T = [].

  2. Randomly select a user j as the source node of information spreading: update her state (s_j = I), her neighbors’ state (s_k = S), her neighbors’ n^I (n^I_k = 1 for k ∈ N_j), and set S_curr = N_j.

  3. Randomly select a user k from S_curr and try to turn it into the I-state in the following way:

    • (a) Update the current time point: t = t + 1/|S_curr|;
    • (b) Sort the n^I_k infected neighbors of user k by their time of infection in ascending order, select the last m = n^I_k − x_k of them, and denote the list of IDs as v(k) = (v_1, v_2, …, v_m);
    • (c) Iterate through each user v_l in v(k): update x_k = x_k + 1 and infect user k with probability β(x_k); if user k is not infected by user v_l, continue to the next user in v(k); otherwise, update user k’s state (s_k = I) and time of infection (t^I_k = t), add a row (k, v_l, t) to T, update user k’s neighbors’ n^I (n^I_r = n^I_r + 1 for r ∈ N_k), and go to step (d);
    • (d) Update S_curr = {r : s_r = S and x_r < n^I_r}.
  4. Repeat step 3 until S_curr is empty.
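
The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors’ code: the function name `simulate_spreading`, the use of neighbor lists in place of the adjacency matrix A, and the constant β used in the usage example are assumptions made for the sketch. The constant β is only a placeholder and is not the spreading equation of the paper.

```python
import random


def simulate_spreading(neighbors, beta, rng=None):
    """Simulate one spreading process and return the trajectory T.

    neighbors -- list of neighbor lists (assumed symmetric, no self-loops)
    beta      -- function mapping the exposure count x to a probability
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    n = len(neighbors)
    # Step 1: initialize states, counters, clock, and trajectory.
    infected = [False] * n   # state of each user (False = S, True = I)
    n_inf = [0] * n          # number of infected neighbors
    x = [0] * n              # number of exposures so far
    t_inf = [0.0] * n        # time of infection
    t = 0.0
    T = []

    # Step 2: pick a random source and expose her neighbors.
    source = rng.randrange(n)
    infected[source] = True
    for k in neighbors[source]:
        n_inf[k] = 1
    s_curr = list(neighbors[source])

    # Steps 3 and 4: repeat until no susceptible user has unseen exposures.
    while s_curr:
        k = rng.choice(s_curr)
        t += 1.0 / len(s_curr)                                    # (a)
        inf_nb = sorted((v for v in neighbors[k] if infected[v]),
                        key=lambda v: t_inf[v])                   # (b)
        m = n_inf[k] - x[k]
        for v in inf_nb[len(inf_nb) - m:]:                        # (c)
            x[k] += 1
            if rng.random() < beta(x[k]):
                infected[k] = True
                t_inf[k] = t
                T.append((k, v, t))
                for r in neighbors[k]:
                    n_inf[r] += 1
                break
        # (d) Recomputed from scratch for clarity; this costs O(N) per step.
        s_curr = [r for r in range(n) if not infected[r] and x[r] < n_inf[r]]
    return T


# Usage on a hypothetical ring of six users with a placeholder constant beta.
ring = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
trajectory = simulate_spreading(ring, lambda x: 0.5, random.Random(1))
print(len(trajectory), "retweets recorded")
```

Each row of the returned list corresponds to one row of T: the retweeting user, the neighbor she retweeted from, and the retweeting time. For large networks, updating S_curr incrementally instead of rescanning all users would be the natural optimization.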

Supplementary Material

Appendix 01 (PDF)

Movie S1.

Spreading process of our model. Visualization of information spreading process under our model. The outbreak probability is high, and the outbreak size is modest.

Movie S2.

Spreading process of SIR model. Visualization of information spreading process under the SIR model. Both the outbreak probability and the outbreak size are small (only spreading a few steps).

Movie S3.

Spreading process of social reinforcement model II. Visualization of information spreading process under the social reinforcement model II. Both the outbreak probability and the outbreak size are large (almost spreading the entire network).


Acknowledgments

This work was supported by the Shenzhen-Hong Kong-Macau Science and Technology Project (Category C) (Project No. SGDX20230821091559022), the National Natural Science Foundation of China (Grant Nos. T2350710802, 12275118, and 12275263), the Fundamental Research Funds for the Central Universities (Grant Nos. 124330008 and SWU-XDJH202303), the 13th Five-year plan for Education Science Funding of Guangdong province (Grant No. 2021GXJK349), the Guangdong Major Project of Basic and Applied Basic Research (Grant No. 2023B0303000009), the National Major Scientific Instruments and Equipments Development Project of National Natural Science Foundation of China (Grant No. 62327808), the University Innovation Research Group of Chongqing (Grant No. CXQT21005), the Innovation Program for Quantum Science and Technology (Grant No. 2021ZD0301900), and the Natural Science Foundation of Fujian province of China (Grant No. 2023J02032).

Author contributions

Y.H. designed research; F.M. performed research; F.M., J.X., J.S., T.J., Y.D., and Y.Z. analyzed data; Y.H., J.X., F.M., and S.H. established the equation; and F.M., C.X., X.W., and Y.H. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

Anonymized processed data have been deposited in Figshare (51). Some study data are available. Constrained by the platforms that provided the data, we can share only some of the original data and all of the processed data, as follows. WeChat: According to the cooperation regulations with Tencent, we cannot make WeChat’s data (spreading trajectories and the underlying network) publicly available. Researchers who would like to access those data can apply for the research projects offered by Tencent. These projects aim to establish a platform for industry-university research cooperation and academic exchange for scholars worldwide. Details can be found at refs. 52 and 53. Weibo: We can disclose all of the processed data. Twitter: ref. 54. Previously published data were used for this work (49, 54, 55).

References

  • 1. Wang P., González M. C., Hidalgo C. A., Barabási A. L., Understanding the spreading patterns of mobile phone viruses. Science 324, 1071–1076 (2009).
  • 2. Lorenz-Spreen P., Mønsted B. M., Hövel P., Lehmann S., Accelerating dynamics of collective attention. Nat. Commun. 10, 1–9 (2019).
  • 3. Lazer D., et al., Computational social science. Science 323, 721–723 (2009).
  • 4. Rajkumar K., Saint-Jacques G., Bojinov I., Brynjolfsson E., Aral S., A causal test of the strength of weak ties. Science 377, 1304–1310 (2022).
  • 5. Wang D., Uzzi B., Weak ties, failed tries, and success. Science 377, 1256–1258 (2022).
  • 6. Chetty R., et al., Social capital I: Measurement and associations with economic mobility. Nature 608, 108–121 (2022).
  • 7. Lehmann S., Ahn Y. Y., Complex Spreading Phenomena in Social Systems (Springer, 2018), vol. 10, pp. 978–983.
  • 8. Deville P., et al., Scaling identity connects human mobility and social interactions. Proc. Natl. Acad. Sci. U.S.A. 113, 7047–7052 (2016).
  • 9. Gallotti R., Valle F., Castaldo N., Sacco P., De Domenico M., Assessing the risks of ‘infodemics’ in response to COVID-19 epidemics. Nat. Hum. Behav. 4, 1285–1293 (2020).
  • 10. Bovet A., Makse H. A., Influence of fake news in Twitter during the 2016 US presidential election. Nat. Commun. 10, 1–14 (2019).
  • 11. Kempe D., Kleinberg J., Tardos É., “Maximizing the spread of influence through a social network” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2003), pp. 137–146.
  • 12. Kitsak M., et al., Identification of influential spreaders in complex networks. Nat. Phys. 6, 888–893 (2010).
  • 13. Hu Y., et al., Local structure can identify and quantify influential global spreaders in large scale social networks. Proc. Natl. Acad. Sci. U.S.A. 115, 7468–7472 (2018).
  • 14. Morone F., Makse H. A., Influence maximization in complex networks through optimal percolation. Nature 524, 65–68 (2015).
  • 15. Bakshy E., Messing S., Adamic L. A., Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).
  • 16. Del Vicario M., et al., The spreading of misinformation online. Proc. Natl. Acad. Sci. U.S.A. 113, 554–559 (2016).
  • 17. Watts D. J., A simple model of global cascades on random networks. Proc. Natl. Acad. Sci. U.S.A. 99, 5766–5771 (2002).
  • 18. Kwak H., Lee C., Park H., Moon S., “What is Twitter, a social network or a news media?” in Proceedings of the 19th International Conference on World Wide Web (ACM, 2010), pp. 591–600.
  • 19. Bakshy E., Rosenn I., Marlow C., Adamic L. A., “The role of social networks in information diffusion” in Proceedings of the 21st International Conference on World Wide Web (ACM, 2012), pp. 519–528.
  • 20. Vosoughi S., Roy D., Aral S., The spread of true and false news online. Science 359, 1146–1151 (2018).
  • 21. Hébert-Dufresne L., Scarpino S. V., Young J. G., Macroscopic patterns of interacting contagions are indistinguishable from social reinforcement. Nat. Phys. 16, 426–431 (2020).
  • 22. Gleeson J. P., Ward J. A., O’Sullivan K. P., Lee W. T., Competition-induced criticality in a model of meme popularity. Phys. Rev. Lett. 112, 048701 (2014).
  • 23. Castellano C., Fortunato S., Loreto V., Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591 (2009).
  • 24. Vespignani A., Modelling dynamical processes in complex socio-technical systems. Nat. Phys. 8, 32–39 (2012).
  • 25. Iribarren J. L., Moro E., Impact of human activity patterns on the dynamics of information diffusion. Phys. Rev. Lett. 103, 038702 (2009).
  • 26. Goel S., Anderson A., Hofman J., Watts D. J., The structural virality of online diffusion. Manag. Sci. 62, 180–196 (2016).
  • 27. Gruhl D., Guha R., Liben-Nowell D., Tomkins A., “Information diffusion through blogspace” in Proceedings of the 13th International Conference on World Wide Web (ACM, 2004), pp. 491–501.
  • 28. Liu C., Zhan X. X., Zhang Z. K., Sun G. Q., Hui P. M., How events determine spreading patterns: Information transmission via internal and external influences on social networks. New J. Phys. 17, 113045 (2015).
  • 29. Moreno Y., Nekovee M., Pacheco A. F., Dynamics of rumor spreading in complex networks. Phys. Rev. E 69, 066130 (2004).
  • 30. Centola D., The spread of behavior in an online social network experiment. Science 329, 1194–1197 (2010).
  • 31. Granovetter M., Threshold models of collective behavior. Am. J. Sociol. 83, 1420–1443 (1978).
  • 32. Myers S. A., Leskovec J., “Clash of the contagions: Cooperation and competition in information diffusion” in 2012 IEEE 12th International Conference on Data Mining (IEEE, 2012), pp. 539–548.
  • 33. Davis J. T., Perra N., Zhang Q., Moreno Y., Vespignani A., Phase transitions in information spreading on structured populations. Nat. Phys. 16, 590–596 (2020).
  • 34. Hébert-Dufresne L., Althouse B. M., Complex dynamics of synergistic coinfections on realistically clustered networks. Proc. Natl. Acad. Sci. U.S.A. 112, 10551–10556 (2015).
  • 35. Romero D. M., Meeder B., Kleinberg J., “Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter” in Proceedings of the 20th International Conference on World Wide Web (ACM, 2011), pp. 695–704.
  • 36. Ugander J., Backstrom L., Marlow C., Kleinberg J., Structural diversity in social contagion. Proc. Natl. Acad. Sci. U.S.A. 109, 5962–5966 (2012).
  • 37. Cheng J., et al., “Do diffusion protocols govern cascade growth?” in Proceedings of the International AAAI Conference on Web and Social Media (2018), vol. 12.
  • 38. Efron B., Tibshirani R. J., An Introduction to the Bootstrap (Chapman and Hall/CRC, 1994).
  • 39. Huo L., Chen S., Rumor propagation model with consideration of scientific knowledge level and social reinforcement in heterogeneous network. Physica A 559, 125063 (2020).
  • 40. Zheng M., Lü L., Zhao M., Spreading in online social networks: The role of social reinforcement. Phys. Rev. E 88, 012818 (2013).
  • 41. Xie J., et al., Detecting and modelling real percolation and phase transitions of information on social media. Nat. Hum. Behav. 5, 1161–1168 (2021).
  • 42. Myers S. A., Leskovec J., “The bursty dynamics of the twitter information network” in Proceedings of the 23rd International Conference on World Wide Web (ACM, 2014), pp. 913–924.
  • 43. Wikipedia, Stretched exponential function. https://en.wikipedia.org/wiki/Stretched_exponential_function. Accessed 20 May 2024.
  • 44. Lindsey C. P., Patterson G. D., Detailed comparison of the Williams–Watts and Cole–Davidson functions. J. Chem. Phys. 73, 3348–3357 (1980).
  • 45. Berberan-Santos M. N., Bodunov E. N., Valeur B., Mathematical functions for the analysis of luminescence decays with underlying distributions 1. Kohlrausch decay function (stretched exponential). Chem. Phys. 315, 171–182 (2005).
  • 46. Newman M. E. J., Networks: An Introduction (Oxford University Press, 2010).
  • 47. Cheng J., Adamic L. A., Kleinberg J. M., Leskovec J., “Do cascades recur?” in Proceedings of the 25th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, 2016), pp. 671–681.
  • 48. Cheng J., Adamic L. A., Dow P. A., Kleinberg J. M., Leskovec J., “Can cascades be predicted?” in Proceedings of the 23rd International Conference on World Wide Web (ACM, 2014), pp. 925–936.
  • 49. Hodas N. O., Lerman K., “How visibility and divided attention constrain social contagion” in 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing (IEEE, 2012), pp. 249–257.
  • 50. Hodas N. O., Lerman K., The simple rules of social contagion. Sci. Rep. 4, 1–7 (2014).
  • 51. Meng F., Preliminary disclosure of code and data. Figshare. 10.6084/m9.figshare.27265428.v1. Deposited 21 October 2024.
  • 52. Tencent, 2023 CCF-Tencent Rhino-Bird Young Faculty Open Research Fund. https://www.wizsci.com/project/detail/1450?lang=en. Accessed 8 October 2024.
  • 53. Tencent, Tencent University Relations. https://ur.tencent.com/. Accessed 8 October 2024.
  • 54. Lerman K., Twitter 2010 dataset. https://web.archive.org/web/20230316132044 and https://www.isi.edu/~lerman/downloads/twitter/twitter2010.html. Accessed 20 May 2024.
  • 55. Lerman K., Ghosh R., Surachawala T., Social contagion: An empirical study of information spread on Digg and Twitter follower graphs. arXiv [Preprint] (2012). 10.48550/arXiv.1202.3162. Accessed 20 May 2024.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
