Abstract
Information overload in social networks makes it increasingly difficult for users to find information that matches their interests. This paper considers Twitter-like social networks and proposes models to characterize the process of information diffusion under information overload. Users are classified into types according to their in-degrees and out-degrees, and user behaviors are generalized into two categories: generating and forwarding. The view scope is introduced to model a user's information-processing capability under information overload. Information diffusion efficiency is characterized by the average number of times a message appears in view scopes after it is generated by a user of a given type, and this quantity is calculated theoretically. To verify the accuracy of the theoretical analysis, we conduct simulations, whose results agree closely with the theoretical results. These results are important for understanding diffusion dynamics in social networks, and the analysis framework can be extended to more realistic situations.
1. Introduction
Research on social networks has received remarkable attention in the past decade, since social networks provide numerous features that encourage information sharing among users. Among existing social networks, microblogging services (e.g., Twitter and Facebook) have become increasingly popular and provide new ways for people to stay connected with their friends. Because microblogging supports lightweight communication, it is an important candidate medium for informal communication.
Twitter is arguably one of the best-known microblogging platforms currently available and is used by hundreds of millions of people all over the world. Twitter users post updates from computers or mobile phones to broadcast what happens in their daily lives, such as what they are reading, thinking, and experiencing. Users declare the people they are interested in by following them. When user A follows user B, we say that user A is one of user B's followers and user B is one of user A's followees. Twitter users can post short messages (up to 140 characters), called tweets, and can also forward messages, called retweets. Each user has a web form in which all new messages (both tweets and retweets) from his/her followees are arranged in reverse-chronological order. So after logging in, a user will see whether his/her followees have posted new messages.
Essentially, relationships in Twitter are asymmetric, since a user who is followed by another user does not necessarily have to reciprocate by following him/her back. Some social networks adopt symmetric relationships. For example, in Facebook, a relationship is established when a request for friendship is accepted by a user, which adds each user to the other's contact list. If one user removes the other, the relationship is broken. Therefore, an important difference between these two social networks is that the network of Twitter is directed, while that of Facebook is undirected. Having noticed the increasing popularity of Twitter, we take Twitter-like social networks into account in this paper.
Compared with traditional media such as newspapers and television, social networks allow the creation and exchange of user-generated content, and every user can produce and distribute messages. This results in an explosively growing amount of information and makes many social networks increasingly information saturated. Besides, owing to their potential for marketing and advertising, Twitter and some other social networks are considered efficient channels for stimulating the awareness and adoption of products or services. One important benefit of these social networks is that the costs of generating and transmitting information are almost negligible, so advertising messages can reach wide audiences within a short period of time [1]. This also leads to a large volume of advertising information. However, because users' information-processing capability is limited, if messages arrive faster than users can process them, some messages will be lost without catching users' attention; this is where information overload occurs [2]. Under information overload, users find it difficult to locate useful messages that match their personal interests, which has a serious negative impact on the user experience. Therefore, to understand and then address the information overload issue arising in social networks, it is important to model and analyze the process of information diffusion under information overload, which is the focus of this paper.
Most research on diffusion dynamics in social networks has focused on the spread of one phenomenon at a time, for example, diffusion models for disease [3], influence [4], knowledge [5], and cooperation [6]. Recently, some researchers have begun to study competitive diffusion, which models how multiple competing epidemics [7], influences [8], or phenomena [9] diffuse through a complex or social network. These problems are somewhat similar to the one considered in this paper, but they do not characterize the information overload phenomenon in social networks, where every user can generate new messages. In our previous work [10, 11], we studied the process of information diffusion under information overload in Facebook-like social networks; however, the network structure of Twitter is very different from that of Facebook. Besides, considerable effort has been devoted to alleviating information overload, where filter-based or cost-based approaches are usually adopted [12–14]. However, to the best of our knowledge, no prior work models and analyzes the process of information diffusion under information overload for Twitter-like social networks.
The remainder of this paper is organized as follows. We describe the models in Section 2 and analyze the process of information diffusion under information overload in Section 3. To verify the accuracy of theoretical analysis results, we conduct simulations and provide the simulation results in Section 4. Finally, we conclude this paper in Section 5.
2. Model Descriptions
In this section, we propose models to capture the characteristics of Twitter-like social networks, namely, the network, user behaviors, and information diffusion under information overload. Based on these models, we can analyze the process of information diffusion under information overload theoretically.
2.1. Network
We model a Twitter-like social network as a directed network, where nodes represent users and links represent the relationships between pairs of users. Note that a user who is followed by another user does not necessarily have to reciprocate by following him/her back. We let the direction of a link be the same as the direction of information diffusion. For example, in Figure 1, user A is followed by users B, C, D, and E, so the update messages of user A can be received by users B, C, D, and E, whereas user A can only receive the update messages of user C.
Figure 1. Network for Twitter-like social networks.
Since isolated users never get involved in the process of information diffusion, we neglect all isolated users and classify the remaining users into types according to their in-degrees and out-degrees; that is, a user with in-degree $i$ and out-degree $j$ is of type $(i, j)$, where $i + j \geq 1$. For type $(i, j)$ users, we define $e_{i,j}^{k,l}$ to be the probability that a randomly chosen follower is of type $(k, l)$. Then, we have $j \geq 1$, $k \geq 1$, and
$$\sum_{k=1}^{M} \sum_{l=0}^{N} e_{i,j}^{k,l} = 1. \tag{1}$$
We further define $q_{i,j}$ to be the fraction of type $(i, j)$ users in the network, and we get
$$\sum_{i=0}^{M} \sum_{j=0}^{N} q_{i,j} = 1. \tag{2}$$
Consider the ensemble of networks in which the distributions $\{e_{i,j}^{k,l}\}$ and $\{q_{i,j}\}$ take specified values. This defines a random graph model similar to those defined in [15, 16]; that is, the network is drawn uniformly at random from the ensemble of all possible networks with the distributions $\{e_{i,j}^{k,l}\}$ and $\{q_{i,j}\}$. We denote by $M$ the maximum in-degree and by $N$ the maximum out-degree over all users. Then, this network can be characterized by the $(M + 1) \times N \times M \times (N + 1)$ tensor $\{e_{i,j}^{k,l}\}$ and the $(M + 1) \times (N + 1)$ matrix $\{q_{i,j}\}$. Note that in a Twitter-like social network, users usually have moderate numbers of followees because their attention is limited, so we usually have $M \ll N$.
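As a minimal sketch of how the two distributions could be estimated from a concrete network, the following Python snippet takes a list of directed links (each pointing from followee to follower, i.e., in the direction of information diffusion) and tabulates the fraction of users of each type and the follower-type probabilities. The function name `type_distributions`, the dictionary-based representation, and the example links are illustrative assumptions, not part of the model.

```python
from collections import Counter, defaultdict

def type_distributions(edges):
    """Estimate q[(i, j)] and e[(i, j)][(k, l)] from directed links.

    Each link (a, b) points from followee a to follower b, i.e. in the
    direction of information diffusion. Isolated users never appear in
    `edges`, so they are ignored automatically.
    """
    in_deg, out_deg = Counter(), Counter()
    for a, b in edges:
        out_deg[a] += 1   # a gains a follower
        in_deg[b] += 1    # b gains a followee
    users = set(in_deg) | set(out_deg)
    utype = {u: (in_deg[u], out_deg[u]) for u in users}

    # q[(i, j)]: fraction of non-isolated users of type (i, j).
    counts = Counter(utype.values())
    q = {t: c / len(users) for t, c in counts.items()}

    # e[(i, j)][(k, l)]: probability that a randomly chosen follower
    # of a type (i, j) user is of type (k, l).
    link_counts = defaultdict(Counter)
    for a, b in edges:
        link_counts[utype[a]][utype[b]] += 1
    e = {}
    for t, row in link_counts.items():
        total = sum(row.values())
        e[t] = {s: c / total for s, c in row.items()}
    return q, e

# Hypothetical example in the spirit of Figure 1: A is followed by
# B, C, D, and E, and A follows C.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("C", "A")]
q, e = type_distributions(edges)
```

In this toy network user A is of type (1, 4), user C of type (1, 1), and users B, D, and E of type (1, 0), so each row of `e` sums to 1 as the normalization requires.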
2.2. User Behaviors
In Twitter-like social networks, different functions are adopted to diffuse information. After logging in, users can post tweets to broadcast things which happen in their daily lives. There are also other functions such as reply and retweet which allow users to interact with their friends. In this paper, we generalize these behaviors into two categories: generating and forwarding; that is, users can generate new messages or forward messages generated by other users. Note that forwarded messages can still be forwarded.
To model a user's ability to process messages under information overload, we introduce the term view scope, which indicates the messages a user can process at a time. Note that for users in Twitter-like social networks, messages are listed in reverse-chronological order. So for a user with view scope number $S$, if information overload occurs, he/she can process (i.e., browse) the latest $S$ messages after logging in, while earlier ones are lost. In this paper, we assume a homogeneous view scope number $S$ for all users.
To model user behaviors, we make the same assumptions as [10]:
(i) The process of user login follows a Poisson process with rate $\lambda$.
(ii) After logging in and browsing the messages, a user may choose to log off or react to these messages (i.e., generate or forward a message); the reacting probability is $p_1$.
(iii) Among the reacting actions, users may choose to forward a randomly chosen browsed message with probability $p_2$ or generate a new message with probability $1 - p_2$.
Actually, user online activities may be bursty, and users may generate or forward multiple messages at a time. However, we make these assumptions to simplify the analysis here and plan to extend this analysis framework to more realistic situations in our future work.
2.3. Information Diffusion under Information Overload
Under information overload, messages arrive faster than users can process them, and some messages are lost without catching users' attention. We use Figure 2 to illustrate the evolution of view scopes under information overload. Suppose user A is followed by other users, such as users B, C, and D. The view scopes of these users are depicted in Figure 2(a). After user A processes the messages in his/her view scope (i.e., $M_{A,1}, M_{A,2}, \ldots, M_{A,S}$), he/she may generate a new message or forward a message in his/her view scope. No matter which action is chosen, this message (say $M_{A,0}$) will be placed at the top of all his/her followers' view scopes, and the messages at the bottom of his/her followers' view scopes (i.e., $M_{B,S}$, $M_{C,S}$, and $M_{D,S}$) will be discarded due to the information overload effect, as depicted in Figure 2(b).
Figure 2. Evolvement of view scopes under information overload.
One may argue that the view scope of user A should be cleared after he/she has processed all the messages. However, for simplicity we assume memoryless users here. That is to say, processed messages can still be processed as long as they are in the view scope. We will model the behaviors of users with memories in our future work.
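The view-scope update described in this subsection maps naturally onto a fixed-length queue. The sketch below is one way to express it in Python, assuming each view scope is a `collections.deque` created with `maxlen=S`; the names `post` and `view_scopes`, the message labels, and the small value of `S` are hypothetical choices made for illustration.

```python
from collections import deque

S = 3  # view scope number (small value chosen for illustration)

def post(message, followers, view_scopes):
    """Place `message` at the top of every follower's view scope.

    With maxlen=S, appendleft drops the oldest message at the bottom
    once a view scope is full, which mimics information overload.
    """
    for f in followers:
        view_scopes[f].appendleft(message)

# Followers B, C, and D of user A start with full view scopes.
view_scopes = {u: deque([f"M_{u},{n}" for n in range(1, S + 1)], maxlen=S)
               for u in ("B", "C", "D")}
post("M_A,0", ["B", "C", "D"], view_scopes)
```

Because users are assumed memoryless, nothing is removed from user A's own view scope when he/she posts; only the followers' queues change.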
3. Performance Analysis
In this section, we analyze the process of information diffusion under information overload based on the proposed models. Specifically, we are interested in the information diffusion efficiency, which is characterized by the average number of times a message appears in view scopes after it is generated by a type $(i, j)$ user (denoted by $u_{i,j}$). To achieve this goal, we first calculate the average number of times a message is forwarded by a type $(k, l)$ user after it arrives in this user's view scope (denoted by $v_{k,l}$).
3.1. Calculation of $v_{k,l}$
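Since users log in following a Poisson process with rate $\lambda$, the probability that a user logs in and then generates or forwards a message within a time slot of length $\Delta$ is $\lambda \Delta p_1$. Consider a type $(k, l)$ user (say user B). Recall that he/she is memoryless and, after deciding to react to the browsed messages, forwards a randomly chosen message in his/her view scope with probability $p_2$. So if a message (say $M_0$) is in his/her view scope, which holds $S$ messages, the average number of times that he/she will forward this message in $t$ time slots is
$$\frac{\lambda \Delta p_1 p_2 t}{S}. \tag{3}$$
The followees of user B will generate or forward messages, which will be placed at the top of his/her view scope. Let $\Delta \to 0$; then, for user B, the probability that multiple followees generate or forward messages in the same time slot can be neglected. So the probability that a new message arrives in user B's view scope in a time slot is $k \lambda \Delta p_1$. Note that message $M_0$ will be discarded after $S$ new messages arrive. Then the probability that message $M_0$ stays in user B's view scope for $t$ time slots, that is, that the $S$th new message arrives in slot $t$, is
$$\binom{t-1}{S-1} (k \lambda \Delta p_1)^{S} (1 - k \lambda \Delta p_1)^{t-S}, \quad t \geq S. \tag{4}$$
Therefore, the average number of times that user B will forward message $M_0$ is
$$v_{k,l} = \sum_{t=S}^{\infty} \binom{t-1}{S-1} (k \lambda \Delta p_1)^{S} (1 - k \lambda \Delta p_1)^{t-S} \frac{\lambda \Delta p_1 p_2 t}{S} = \lambda \Delta p_1 p_2 (k \lambda \Delta p_1)^{S} \sum_{t=S}^{\infty} \binom{t}{S} (1 - k \lambda \Delta p_1)^{t-S}. \tag{5}$$
From (5.56) at [17, page 199], we get
$$\sum_{t=S}^{\infty} \binom{t}{S} (1 - k \lambda \Delta p_1)^{t-S} = \frac{1}{(k \lambda \Delta p_1)^{S+1}}. \tag{6}$$
So, we have
$$v_{k,l} = \frac{p_2}{k}. \tag{7}$$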
Remark 1 —
Intuitively, the larger the view scope number $S$ is, the longer a message stays in the view scope and the more often this message is forwarded. However, from (7) we find that $v_{k,l}$ is independent of $S$. This is because a larger $S$ leads to more messages stored in the view scope, which reduces the probability that a given message is chosen to be forwarded in a time slot; the two effects cancel exactly.
3.2. Calculation of $u_{i,j}$
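The claim in Remark 1 can be checked numerically. The sketch below assumes that the number of slots a message survives in a view scope follows a negative binomial distribution (the message is dropped in the slot where the $S$th new message arrives, and each slot brings a new message with probability $a = k\lambda\Delta p_1$), and that the user forwards the message with probability $b/S$ per slot, where $b = \lambda\Delta p_1 p_2$. The function name `expected_forwards`, the parameter `rate` (standing for $\lambda\Delta p_1$), and the numerical values are illustrative assumptions.

```python
def expected_forwards(k, S, rate=0.01, p2=0.5, tol=1e-9):
    """Average number of times a type (k, l) user forwards a message.

    rate stands for lambda * Delta * p1, the per-slot probability that a
    user reacts. The sum runs over the slot t in which the S-th new
    message arrives and pushes the tracked message out.
    """
    a = k * rate          # per-slot probability of a new arrival
    b = rate * p2         # per-slot probability that the user forwards
    pmf = a ** S          # P(S-th arrival occurs in slot t) for t = S
    t, total, mass = S, 0.0, 0.0
    while 1.0 - mass > tol and t < 1_000_000:
        total += pmf * b * t / S
        mass += pmf
        # Negative binomial recurrence: C(t, S-1) / C(t-1, S-1) = t / (t-S+1).
        pmf *= (1.0 - a) * t / (t - S + 1)
        t += 1
    return total

for S in (1, 10, 50):
    print(S, expected_forwards(k=5, S=S))   # each value is close to p2 / k = 0.1
```

The mean of the negative binomial survival time is $S/a$, so the product with $b/S$ is $b/a = p_2/k$ for every $S$, which is why the printed values agree.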
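Consider a type $(i, j)$ user (say user A) and suppose his/her followers are divided into some partition $\{r_{1,0}, r_{1,1}, \ldots, r_{M,N}\}$, where $r_{k,l}$ is the number of type $(k, l)$ followers and
$$\sum_{k=1}^{M} \sum_{l=0}^{N} r_{k,l} = j. \tag{8}$$
The probability that the partition takes a particular value $\{r_{k,l}\}$ is given by the multinomial distribution [16]
$$P(\{r_{k,l}\} \mid i, j) = j! \prod_{k=1}^{M} \prod_{l=0}^{N} \frac{\left(e_{i,j}^{k,l}\right)^{r_{k,l}}}{r_{k,l}!}. \tag{9}$$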
We define the generating function $G_{i,j}(z)$ for the distribution of the number of times a message appears in view scopes after it is generated by a type $(i, j)$ user and the generating function $H_{k,l}(z)$ for the distribution of the number of times a message appears in view scopes after it arrives in the view scope of a type $(k, l)$ user. Then
| (10) |
| (11) |
where δ is the Kronecker delta function, and
| (12) |
By substituting (11) into (10), we get
| (13) |
Then, by substituting (9) into (13) and performing the sum over $\{r_{k,l}\}$, we have
| (14) |
By solving this equation, we can derive the distribution of the number of times a message appears in view scopes after it is generated by a type (i, j) user. However, here we just calculate the average number of times, which is
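$$u_{i,j} = G'_{i,j}(1) = j \left[ 1 + p_2 \sum_{k=1}^{M} \sum_{l=0}^{N} \frac{e_{i,j}^{k,l}}{k} \, G'_{k,l}(1) \right]. \tag{15}$$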
We know that users with out-degree 0 can generate or forward messages, but no one can receive them. So, $G_{i,0}(z) = 1$ and $G'_{i,0}(1) = 0$. We further know that users with in-degree 0 never forward messages; that is, $G'_{0,j}(1)$ never contributes to the right-hand side of (15). So we can first calculate $G'_{i,j}(1)$ for $i, j \geq 1$ and then obtain $G'_{0,j}(1)$ from (15).
We know that $\{e_{i,j}^{k,l}\}$ becomes an $M \times N \times M \times N$ tensor for $i, j, k, l \geq 1$, which is still hard to handle. We rearrange the elements of this tensor so that they form a matrix, an operation called matricizing [18]. Specifically, we let
| (16) |
Then $1 \leq x, y \leq MN$ and
| (17) |
We can write (17) in matrix form and get
| (18) |
where
| (19) |
So we get
| (20) |
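For concreteness, one way to write the matricized linear system is sketched below. It assumes $v_{k,l} = p_2/k$ and a row-major index map; the symbols $\mathbf{u}$, $\mathbf{a}$, and $\mathbf{B}$ and the particular index map are naming assumptions made here for illustration and may differ from the notation of (16)–(20). Terms with $l = 0$ are omitted because $u_{k,0} = 0$.

```latex
\begin{aligned}
u_{i,j} &= j + j \sum_{k=1}^{M} \sum_{l=1}^{N} e_{i,j}^{k,l}\, v_{k,l}\, u_{k,l},
  \qquad v_{k,l} = \frac{p_2}{k}, \\
x &= (i-1)N + j, \qquad y = (k-1)N + l,
  \qquad 1 \le i, k \le M,\; 1 \le j, l \le N, \\
a_x &= j, \qquad B_{x,y} = \frac{j\, p_2}{k}\, e_{i,j}^{k,l}, \\
\mathbf{u} &= \mathbf{a} + \mathbf{B}\mathbf{u}
  \;\Longrightarrow\; \mathbf{u} = (\mathbf{I} - \mathbf{B})^{-1} \mathbf{a}.
\end{aligned}
```

The inverse is meaningful as an average only when the spectral radius of $\mathbf{B}$ is below 1, that is, when a message is not forwarded without bound on average.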
Remark 2 —
From (20), we observe that $u_{i,j}$ is determined by $p_2$ and $\{e_{i,j}^{k,l}\}$ and is independent of other factors such as $S$, $p_1$, and $\{q_{i,j}\}$.
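The following Python sketch computes $u_{i,j}$ numerically under the assumption that the mean recursion takes the form $u_{i,j} = j\,(1 + p_2 \sum_{k,l} e_{i,j}^{k,l} u_{k,l} / k)$, which is what $v_{k,l} = p_2/k$ implies when every forward restarts a cascade from the forwarding user. It uses fixed-point iteration rather than an explicit matrix inverse; both give the same solution whenever the iteration converges. The dictionary layout of `e`, the function name `diffusion_efficiency`, and the example network are illustrative assumptions.

```python
def diffusion_efficiency(e, p2, tol=1e-12, max_iter=100_000):
    """Iterate u <- a + B u until the values stop changing.

    e maps a user type (i, j) to a dict {(k, l): probability} describing
    the types of its followers. Types absent from e have no followers,
    so their u is 0 and they contribute nothing downstream.
    """
    types = list(e)
    u = {t: 0.0 for t in types}
    for _ in range(max_iter):
        new_u = {}
        for (i, j), row in e.items():
            # Each type (k, l) follower forwards p2 / k times on average,
            # and every forward behaves like a fresh message from that user.
            downstream = sum(prob * u.get((k, l), 0.0) / k
                             for (k, l), prob in row.items())
            new_u[(i, j)] = j * (1.0 + p2 * downstream)
        if max(abs(new_u[t] - u[t]) for t in types) < tol:
            return new_u
        u = new_u
    raise RuntimeError("iteration did not converge")

# Hypothetical network: a follower of a type (1, 2) user is of type (1, 2)
# with probability 0.5 and of type (1, 0) otherwise.
e = {(1, 2): {(1, 2): 0.5, (1, 0): 0.5}}
u = diffusion_efficiency(e, p2=0.5)
```

For this example the recursion reads $u_{1,2} = 2 + 0.5\,u_{1,2}$, so the iteration settles at $u_{1,2} = 4$; note that neither $S$ nor $p_1$ appears anywhere in the computation.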
4. Simulations
To verify the accuracy of theoretical analysis results, we conduct simulations and provide the simulation results in this section. We first take into account a directed ER network and then a growing network model, which generates directed and degree-correlated networks.
The simulations are conducted in a discrete fashion. Specifically, time is slotted, and in each time slot a random user is selected to generate or forward a message. Denoting the number of users by $K$, we run each simulation for $KT$ time slots, where $T = 10000$. That is to say, each user is selected $T$ times on average to generate or forward messages. We further set $S = 10$ and $p_2 = 0.5$.
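The slotted procedure can be sketched in Python as follows. This is a minimal illustration rather than the simulator used for the reported results: the function name `simulate`, the data structures, and the rule that a user with an empty view scope always generates a new message are assumptions, and messages generated near the end of a run are counted even though their cascades are cut short.

```python
import random
from collections import defaultdict, deque

def simulate(edges, S=10, p2=0.5, T=1000, seed=0):
    """Slotted simulation of diffusion under information overload.

    edges: directed links (a, b) from followee a to follower b.
    Returns {(i, j): average number of view-scope appearances of a
    message generated by a type (i, j) user}.
    """
    rng = random.Random(seed)
    followers = defaultdict(list)
    in_deg = defaultdict(int)
    for a, b in edges:
        followers[a].append(b)
        in_deg[b] += 1
    users = sorted(set(followers) | set(in_deg))
    scope = {u: deque(maxlen=S) for u in users}   # view scopes, newest first
    origin = []        # origin[m]: user who generated message m
    appearances = []   # appearances[m]: times message m entered a view scope

    for _ in range(len(users) * T):
        u = rng.choice(users)                     # one random user per slot
        if scope[u] and rng.random() < p2:
            m = rng.choice(list(scope[u]))        # forward a browsed message
        else:
            m = len(origin)                       # generate a new message
            origin.append(u)
            appearances.append(0)
        for f in followers[u]:
            scope[f].appendleft(m)                # oldest entry drops off
            appearances[m] += 1

    totals, counts = defaultdict(float), defaultdict(int)
    for m, u in enumerate(origin):
        t = (in_deg[u], len(followers[u]))
        totals[t] += appearances[m]
        counts[t] += 1
    return {t: totals[t] / counts[t] for t in totals}

# With p2 = 0 nothing is forwarded, so a message from a type (i, j)
# user appears exactly j times.
demo = simulate([("A", "B"), ("A", "C"), ("C", "A")], S=3, p2=0.0, T=50)
```

Setting `p2=0` gives a quick sanity check, since the average for each type must then equal its out-degree $j$.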
4.1. Directed ER Network
In the directed ER network, we let the user number K = 2001 and the average user in-degree (or out-degree) α = 10. That is to say, each link is included in the network with probability p = α/(K − 1) = 0.005.
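A directed ER network of this kind can be generated with a few lines of Python. The sketch below assumes nodes are labeled $0, \ldots, K-1$ and that self-links are excluded; the function name `directed_er` and the smaller value of $K$ in the example are illustrative choices.

```python
import random

def directed_er(K, alpha, seed=0):
    """Directed ER network: each ordered pair (a, b) with a != b is
    linked independently with probability p = alpha / (K - 1)."""
    rng = random.Random(seed)
    p = alpha / (K - 1)
    return [(a, b) for a in range(K) for b in range(K)
            if a != b and rng.random() < p]

edges = directed_er(K=201, alpha=10)
mean_degree = len(edges) / 201   # should be close to alpha
```

Each node has $K - 1$ potential followers, so the expected out-degree (and in-degree) is $p(K - 1) = \alpha$.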
The results for $u_{i,j}$ from simulations and theoretical analysis (i.e., (20)) are depicted in Figure 3. To quantify the gap between these results, we plot the differences in Figure 4, which shows that the theoretical analysis results coincide very well with the simulation results.
Figure 3. Results for $u_{i,j}$ from (a) simulations and (b) theoretical analysis.
Figure 4. Differences between the results for $u_{i,j}$ from simulations and theoretical analysis.
4.2. Growing Network Model
Degree correlations among nodes essentially characterize the structure of a network, and many real-world networks show degree correlations [19–21]. In particular, social networks show assortative mixing, that is, a preference of high-degree nodes to be connected to other high-degree nodes [15, 16].
To generate directed and degree-correlated networks, we adopt the growing network model proposed in [22], where in each step the probability of adding a new node and creating a link from one of the earlier nodes (say A) is
| (21) |
and the probability of adding a new link and connecting two old nonlinked nodes (say from A to B) is
| (22) |
where $V$ is the node set, $d_{*}^{\mathrm{out}}$ ($d_{*}^{\mathrm{in}}$) is the out-degree (in-degree) of node $*$, and the parameters $\beta$ and $\gamma$ must obey the constraints $\beta > 0$ and $\gamma > -1$ to ensure that each node will be chosen with positive probability.
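A possible implementation of the growth process is sketched below. It assumes that, with probability $q$, a new node is added and receives a link from an earlier node A chosen with probability proportional to $d_A^{\mathrm{out}} + \beta$, and that otherwise a link is added from A to B, with A chosen proportionally to $d_A^{\mathrm{out}} + \beta$ and B proportionally to $d_B^{\mathrm{in}} + \gamma$, rejecting pairs that are identical or already linked. These proportionality rules, the role of $q$, the two-node seed network, and the function name `growing_network` are assumptions made for illustration and should be checked against (21), (22), and [22].

```python
import random

def growing_network(K, q, beta, gamma, seed=0):
    """Grow a directed network to K nodes (assumed attachment rules)."""
    assert beta > 0 and gamma > -1
    rng = random.Random(seed)
    # Assumed seed network: two mutually linked nodes, so every node
    # has in-degree >= 1 and all weights below stay positive.
    edges = {(0, 1), (1, 0)}
    d_out, d_in = [1, 1], [1, 1]

    def pick(weights):
        return rng.choices(range(len(weights)), weights=weights)[0]

    while len(d_out) < K:
        a = pick([d + beta for d in d_out])      # source of the new link
        if rng.random() < q:
            b = len(d_out)                       # new node, linked from A
            d_out.append(0)
            d_in.append(0)
        else:
            b = pick([d + gamma for d in d_in])  # existing target node
            if a == b or (a, b) in edges:
                continue                         # reject and draw again
        edges.add((a, b))
        d_out[a] += 1
        d_in[b] += 1
    return sorted(edges)

edges = growing_network(K=200, q=0.6, beta=5, gamma=5)
```

Because every new node arrives with exactly one incoming link, all nodes have in-degree at least 1, which is consistent with the constraint $\gamma > -1$ keeping the in-degree weights positive.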
Here, we set $K = 2000$, $q = 0.6$, and $\beta = \gamma = 5$ to generate a degree-correlated network. The distributions of in-degrees and out-degrees are depicted in Figure 5, which shows that in-degrees and out-degrees follow power-law distributions. The degree correlations are depicted in Figure 6, which shows that degree correlations at users are evident: users with high in-degrees usually have high out-degrees. However, the degree correlations at links are not so obvious.
Figure 5. Distributions of (a) in-degrees and (b) out-degrees.
Figure 6. Degree correlations. (a) Degree correlations at users. Columns indicate the distributions of out-degrees for users with given in-degrees. (b) Degree correlations at links. Columns indicate the distributions of followers' in-degrees for users with given out-degrees.
Simulation results for $u_{i,j}$ are depicted in Figure 7(a), while the theoretical analysis results are depicted in Figure 7(b). We also plot the differences in Figure 8, which shows that the theoretical analysis results are quite consistent with the simulation results, especially for users with low in-degrees and out-degrees. Even for the only user of type $(5, 39)$, the difference is about 6, which is very small compared with the value of $u_{5,39}$.
Figure 7. Results for $u_{i,j}$ from (a) simulations and (b) theoretical analysis.
Figure 8. Differences between the results for $u_{i,j}$ from simulations and theoretical analysis.
5. Conclusion
Having noticed the increasing popularity of Twitter and the negative influence of information overload, we consider Twitter-like social networks and propose models to capture their characteristics, namely, the network, user behaviors, and information diffusion under information overload. Based on these models, we analyze the process of information diffusion under information overload theoretically, and the accuracy of the theoretical analysis is verified by simulations. These results are important for understanding diffusion dynamics in social networks and are useful for advertisers in viral marketing. However, to simplify the analysis, we make some assumptions, such as Poisson arrivals and memoryless users, which may be unrealistic. In future work, we seek to extend these models to characterize more realistic situations and to validate the theoretical analysis results with empirical evidence. Besides, the impact of degree correlations on spreading dynamics appears to be nontrivial [23], and it has been demonstrated that degree correlations strongly influence information diffusion [24]. Another direction for future work is to analyze the impact of degree correlations on information diffusion under information overload.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants nos. 61105124 and 71331008, the Research Fund for the Doctoral Program of Higher Education of China under Grant no. 20114307120023, and the China Scholarship Council under Grant no. 2011611534.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
1. Leskovec J, Adamic LA, Huberman BA. The dynamics of viral marketing. ACM Transactions on the Web. 2007;1(1).
2. Koroleva K, Krasnova H, Gunther O. ‘Stop spamming me!’-exploring information overload on Facebook. Proceedings of the 16th Americas Conference on Information Systems (AMCIS '10); August 2010; Lima, Peru.
3. Anderson RM, May RM. Population biology of infectious diseases: part I. Nature. 1979;280(5721):361–367. doi: 10.1038/280361a0.
4. Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03); August 2003; New York, NY, USA. pp. 137–146.
5. Cowan R, Jonard N. Network structure and the diffusion of knowledge. Journal of Economic Dynamics and Control. 2004;28(8):1557–1575.
6. Santos FC, Pacheco JM, Lenaerts T. Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(9):3490–3494. doi: 10.1073/pnas.0508201103.
7. Karrer B, Newman MEJ. Competing epidemics on complex networks. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics. 2011;84(3):036106. doi: 10.1103/PhysRevE.84.036106.
8. Borodin A, Filmus Y, Oren J. Threshold models for competitive influence in social networks. Proceedings of the 6th International Conference on Internet and Network Economics (WINE '10); 2010; Stanford, Calif, USA. pp. 539–550.
9. Broecheler M, Shakarian P, Subrahmanian VS. A scalable framework for modeling competitive diffusion in social networks. Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), and the 2nd IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT '10); August 2010; pp. 295–302.
10. Li P, Xing K, Wang D, Zhang X, Wang H. Information diffusion in Facebook-like social networks under information overload. International Journal of Modern Physics C. 2013;24(7):1350047.
11. Li P, Sun Y, Chen Y, Tian Z. Estimating user influence in online social networks subject to information overload. International Journal of Modern Physics B. 2014;28(3):1450004.
12. Solan E, Reshef E. The effect of filters on spam mail. Discussion Papers. 1402. Evanston, Ill, USA: Northwestern University, Center for Mathematical Studies in Economics and Management Science; 2005.
13. Kraut RE, Sunder S, Morris J, Telang R, Filer D, Cronin M. Markets for attention: will postage for email help? Proceedings of the 8th 2002 ACM Conference on Computer Supported Cooperative Work (CSCW '02); November 2002; New Orleans, La, USA. pp. 206–215.
14. Cheng J, Sun A, Zeng D. Information overload and viral marketing: countermeasures and strategies. Lecture Notes in Computer Science. 2010;6007:108–117.
15. Newman MEJ. Assortative mixing in networks. Physical Review Letters. 2002;89(20):208701. doi: 10.1103/PhysRevLett.89.208701.
16. Newman MEJ. Mixing patterns in networks. Physical Review E. 2003;67:026126. doi: 10.1103/PhysRevE.67.026126.
17. Graham RL, Knuth DE, Patashnik O. Concrete Mathematics. Reading, Mass, USA: Addison Wesley; 1994.
18. Bader BW, Kolda TG. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software. 2006;32(4):635–653.
19. Pastor-Satorras R, Vázquez A, Vespignani A. Dynamical and correlation properties of the internet. Physical Review Letters. 2001;87(25):1–4. doi: 10.1103/PhysRevLett.87.258701.
20. Vázquez A, Pastor-Satorras R, Vespignani A. Large-scale topological and dynamical properties of the Internet. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics. 2002;65(6):1–12. doi: 10.1103/PhysRevE.65.066130.
21. Newman MEJ, Park J. Why social networks are different from other types of networks. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics. 2003;68(3):1–18. doi: 10.1103/PhysRevE.68.036122.
22. Krapivsky PL, Rodgers GJ, Redner S. Degree distributions of growing networks. Physical Review Letters. 2001;86(23):5401–5404. doi: 10.1103/PhysRevLett.86.5401.
23. Payne JL, Dodds PS, Eppstein MJ. Information cascades on degree-correlated random networks. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics. 2009;80(2):026125. doi: 10.1103/PhysRevE.80.026125.
24. Karsai M, Kivelä M, Pan RK, et al. Small but slow world: how network topology and burstiness slow down spreading. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics. 2011;83(2):025102. doi: 10.1103/PhysRevE.83.025102.
