A Model-Free Scheme for Meme Ranking in Social Media

Saike He; Xiaolong Zheng; Daniel Zeng

doi:10.1016/j.dss.2015.10.002

Author manuscript; available in PMC: 2017 Jan 1.

Published in final edited form as: Decis Support Syst. 2015 Oct 13;81:1–11. doi: 10.1016/j.dss.2015.10.002

Saike He ¹, Xiaolong Zheng ¹,*, Daniel Zeng ¹,²

PMCID: PMC4725602 NIHMSID: NIHMS729681 PMID: 26823638

Abstract

The prevalence of social media has greatly catalyzed the dissemination and proliferation of online memes (e.g., ideas, topics, melodies, and tags). However, this abundance of information exceeds the capacity of online users to consume it. Ranking memes by popularity could benefit online advertising and content distribution. Despite this importance, few existing studies solve the problem well: they either rest on impractical assumptions or cannot characterize dynamic information. In this paper, we develop a model-free scheme to rank online memes in the context of social media. The scheme is capable of characterizing the nonlinear interactions of online users that mark the process of meme diffusion. Empirical studies on two large-scale, real-world datasets (one in English and one in Chinese) demonstrate the effectiveness and robustness of the proposed scheme. In addition, owing to its fine-grained modeling of user dynamics, the ranking scheme can also be used to explain meme popularity through the lens of social influence.

Index Terms: Meme ranking, model-free scheme, transfer entropy

1. Introduction

The term meme (pronounced “meem”) was coined by Richard Dawkins four decades ago, in analogy with the gene in genetics [1]. It is defined as a "unit of conceptual replication" that identifies an idea, topic, or style that spreads from person to person within a culture. Like genes under natural selection, which confer 'differential reproductivity', memes compete for our scarce individual and collective attention [2]. During this process, some of them quickly lose popularity while others persist for long periods of time. In recent years, the advent of various social media platforms has lowered the cost of information generation, boosting the potential reach of each meme among online users. This abundance of information exceeds the human capacity to consume it [3–5]. Therefore, an effective ranking scheme is imperative to focus limited human attention on the most important memes. Appropriate solutions to this problem would have direct implications for refining online advertising and content distribution. In online advertising, new revenue models could be developed to charge advertisers for the amount of attention that a meme will receive. In media outlets, ranking information can be used to highlight the most popular memes in real time. Such condensed results are especially beneficial in emergent situations where information fragments appear at random moments, such as social events [6, 7], public health [8, 9], and political campaigns [10–12]. In light of this importance, meme ranking has attracted considerable research interest in various disciplines [2, 13, 14].

However, to the best of our knowledge, few existing studies provide an adequate solution for the meme ranking task. Traditional bottom-up approaches attempt to construct various diffusion models in analogy to behavior replication [15–18], epidemic contagion [14, 19–24], or competitive gaming [25, 26]. Although these models can help to track the diffusion process and measure its effect on online users, their computational cost is often comparatively high, and the underlying problems are sometimes even NP-hard [27]. Furthermore, oversimplifications made in these models, such as user homogeneity [2, 16], static network structure [19], and finite interaction patterns [28, 29], can lead to unrealistic or even misleading conclusions. Recently, several top-down approaches have been developed to characterize meme dynamics, which is critical in assessing the evolution and mutation of online memes [4]. These studies mainly focus on quantifying topological centrality [30, 31], content similarity [32], or user behaviors [13] based on large-scale datasets. This line of research has provided significant insights into the trends of the web. Yet, lacking fine-grained modeling of user interactions in meme diffusion, these approaches still cannot characterize meme dynamics well.

To address the above challenges, in this paper we develop a model-free ranking scheme that characterizes meme dynamics with few assumptions. Unlike previous work, our ranking scheme is grounded in information theory and can capture complex meme dynamics without modeling the exact diffusion process. In addition, while most existing studies are concerned with aggregate measures for meme ranking, the scheme presented here allows a more fine-grained characterization of information diffusion among online users. This key property enables us not only to rank memes at the macro level, but also to inquire into the key factors determining meme popularity at the micro level. For evaluation, we use two different genres of datasets: one from a Chinese microblogging system and the other from an American political blog forum. Experimental results on these two datasets validate the efficiency and robustness of the scheme compared with several benchmark approaches. By examining two key factors pertaining to meme spreaders, we also uncover several principles governing meme popularity. These findings may have both academic and industrial implications for understanding other types of memes such as innovation [15], rumor [19], and viral marketing [14, 28].

The remainder of the paper is structured as follows. Section 2 reviews the existing studies most relevant to our task. Section 3 presents the technical details of the proposed meme ranking scheme. Section 4 gives the empirical results of our proposed scheme in comparison with several existing approaches. Finally, Section 5 concludes the paper with a summary and a discussion of future research directions.

2. Literature review

The original work on memes traces back to a theory proposed by Dawkins [1], who first coined the concept of the meme. This concept is used to describe the potential process of information diffusion among online users, in analogy with the gene in genetics. In the rest of this section, we present the existing studies relevant to our work from two perspectives: meme diffusion and meme ranking.

2.1 Meme diffusion

Existing studies concerning meme diffusion mainly focus on constructing theoretical models from different views. These models can help to uncover, to a certain extent, the potential evolutionary patterns of meme diffusion. They can be roughly categorized into three groups: cascade models, epidemic models, and competitive models.

2.1.1 Cascade models

One well-known cascade model is proposed by Bikhchandani et al. [16], who explore social changes by assuming that all users hold the same belief when choosing their behavior. This assumption clearly does not hold in real-world situations. Kempe et al. [15] then study online innovation diffusion and try to maximize its influence among users by selecting a subset of key nodes. In their cascade model, the dynamics of neighbor pairs are considered independently, whereas in fact user dynamics are highly interwoven. Models for multiple cascades have been studied by extending the existing independent cascade model. These models generally assume that the status of each node remains unchanged once it has been influenced by other nodes [17]. Myers and Leskovec [18] further infer social relations based on information propagation in latent social networks. Both the cascades and the infections are postulated to be conditionally independent in their propagation model. One common drawback of all these studies is that the assumptions made in modeling clearly do not hold in real-world practice. In contrast, our model makes no explicit assumptions about information dynamics.

2.1.2 Epidemic models

The epidemic analogy between information and viruses has opened a new perspective for investigating meme diffusion and evolution. This, in turn, has led to pervasive applications of compartmental models such as SI, SIR, and SIS [20, 21, 33]. The spread of rumors and the detection of their source are studied with the classical susceptible-infected (SI) model [19]. This model depends heavily on the network structure, which keeps developing and evolving. Some researchers study meme dynamics in the context of personal publishing. Gruhl et al. [23] employ snapshot models to depict topic propagation in blogspace. Their models are designed to characterize the dynamics of both communities and users. Article memes are studied by expressing complex human dynamics in analogy with infection by a virus [22]. These studies often assume that the background environment is constant, which is not very practical in real-world situations. In another strand of research, Richardson and Domingos [14] seek to optimize viral marketing plans by mining knowledge-sharing websites. In their probabilistic models, only one type of marketing action is considered. This simplification may run counter to actual marketing scenarios.

2.1.3 Competitive models

To study the competition among memes for public attention, Weng et al. [2] employ a parsimonious agent-based model. However, their model relies heavily on the underlying network structure and does not account for differences in user interest. Wei et al. [34] try to predict meme prevalence by considering network structure and information propagation at the same time. They assume that all nodes are passive and can be characterized with the same propagation model. Further, a mixture of meme effects on an individual is forbidden. Such postulates may not reflect the real situation in many circumstances. Goldenberg et al. [28] try to understand personal communications in word-of-mouth marketing. However, their complex system modeling technique can cope with only two types of predefined social interactions.

2.2 Meme ranking

Though there is comprehensive work investigating meme diffusion, to the best of our knowledge, the existing studies concerning meme ranking are comparatively limited. In what follows, we present a brief survey for this line of research.

Ienco et al. [35] and Bonchi et al. [27] initially attempt to construct propagation models to rank memes, but find that these models are practically infeasible because the underlying problems are NP-hard. Consequently, they turn to several heuristic methods. However, these methods cannot distinguish the direction of information flow, which is crucial in determining user importance in meme diffusion. Unlike their work, in this paper we adopt an asymmetric measure that is capable of capturing the direction of information flow among users. In another line of research, Bauckhage [13] ranks memes according to their average daily activity. Since the activity level is measured as a relative value, the ranking result may be confounded by other memes beyond consideration. Thus, it is quite possible that the activity of a meme increases while its share drops due to the proliferation of unknown memes.

Other studies try to rank memes based on topological centrality measures, such as in/out-degree and number of followers. Gloor [30] measures trends on the web based on betweenness centrality. This measure requires complete knowledge of the underlying network structure, which is unavailable in most scenarios. PageRank is a centrality algorithm that has been used widely in network analysis and ranking-related tasks. Rather than prioritizing authoritative blogs, Adar et al. [31] try to rank blogs from the perspective of information diffusion. To this end, they propose the iRank algorithm to rank blogs based on implicit link structure. Their approach requires an additional resource to train a link predictor, whose performance relies heavily on the quality of this resource. However, such a resource is not always available in real-world practice, which limits the applicability of the approach on a wide scale. Gordevicius et al. [32] focus on ranking news stories. Instead of using hypertext links, they construct implicit links based on content similarity. Their algorithm is computationally expensive, since it is equivalent to obtaining the stationary distribution of a random walk over a whole graph. Besides, the ranking result varies with the similarity measure used. In contrast, our approach does not need any information other than user behaviors. In addition, its computational complexity is acceptable.

Recently, studies on meme ranking have turned to exploring dynamic information. One significant study is presented by Kwak et al. [36], who attempt to rank trending topics based on singleton, reply, mention, and retweet information. Since such information is highly topic-dependent, a more reliable scheme is imperative. This constitutes the main motivation for our work in this paper.

3. The meme ranking scheme

This section describes our proposed scheme for meme ranking. We first explain the rationale for the scheme, and then introduce its formulation and computation in turn.

3.1 Scheme construction

Though meme ranking has attracted considerable attention and has been explored in different frameworks, existing ranking criteria are rather task dependent, and there is still no theoretical guideline for measuring meme popularity. Empirical analysis suggests that highly popular memes are often associated with influential users spreading them [37]. We therefore design a scheme that ranks memes based on the influence of the users engaged in spreading them. Because the number of users varies across memes, we propose to measure meme popularity by the average influence of all users of each meme. This averaging is assumed to partially mitigate confounding factors caused by external inference [38], unobserved heterogeneity [39], or other contextual effects [40]. The proposed ranking scheme can then be formulated as:

\mathrm{Pop}_{m} = \frac{1}{\# U_{m}} \sum_{u \in U_{m}} \mathrm{Influence}(u)

(1)

where Pop_m quantifies the popularity of meme m; U_m is the set of users participating in spreading it; the operator ‘#’ denotes the size (cardinality) of the set that follows it; and Influence(u) is the influence of user u.
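As a minimal sketch of how Eq. (1) could be applied, the snippet below averages a per-user influence score over the spreaders of each meme and sorts memes by that average. The function names, the dictionary layout, and the toy influence values are illustrative assumptions, not part of the original scheme; the influence scores themselves would come from the procedure in Section 3.2.

```python
def meme_popularity(spreaders, influence):
    """Average influence over the set of users spreading a meme (Eq. 1)."""
    users = set(spreaders)  # U_m is a set, so repeated users count once
    if not users:
        raise ValueError("a meme needs at least one spreader")
    return sum(influence[u] for u in users) / len(users)


def rank_memes(meme_to_users, influence):
    """Return meme ids sorted from most to least popular."""
    scores = {m: meme_popularity(us, influence) for m, us in meme_to_users.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): influence scores per user and spreaders per meme.
influence = {"u1": 0.9, "u2": 0.1, "u3": 0.5}
meme_to_users = {"a": ["u1", "u2"], "b": ["u3", "u1", "u1"]}
ranking = rank_memes(meme_to_users, influence)
```

Because the score is an average rather than a sum, a meme with few but influential spreaders can outrank a meme with many weakly influential ones.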

Given the ranking scheme defined by (1), we proceed to identify influence for each online user, as discussed next.

3.2 Influence identification

Though influence identification has been explored relatively thoroughly in social dynamics, it has not previously been formulated in a model-free manner. To address the problems of previous mutual information-based approaches, we adopt the recently developed concept of transfer entropy [41] to guide the design of our scheme, since this measure is asymmetric and can capture arbitrary nonlinear interactions well. While various kinds of information can be used to measure user influence, in this ranking scheme we choose to use user behaviors during the meme diffusion process. This ensures that the identified user influence is directly relevant to meme dynamics. In what follows, we present our model-free influence identification approach.

3.2.1 Problem formulation

Consider a pair of users x and y in online social media whose behavior histories can be approximated by two Markov processes X = x_t and Y = y_t (Fig. 1). We define an entropy rate h₁ [42] that measures the amount of additional information required to represent the next behavior $x_{t + 1}$ of user x given the historical information of both users:

h_{1} = - \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log p (x_{t + 1} | x_{t}^{m}, y_{t}^{n})

(2)

where $x_{t}^{m} = (x_{t}, \dots, x_{t - m + 1})$ and $y_{t}^{n} = (y_{t}, \dots, y_{t - n + 1})$; m and n are the orders (memory lengths) of the Markov processes X and Y, respectively.

Fig. 1. Illustration of transfer entropy. Each solid vertical line corresponds to a user behavior launched at timestamp t (green for user x and blue for user y). $H (x_{t + 1} | x_{t}^{m})$ is the uncertainty about the next behavior of user x given only the history of x, and $H (x_{t + 1} | x_{t}^{m}, y_{t}^{n})$ is the remaining uncertainty about user x when the behaviors of user y are also known.

If instead the observation $x_{t + 1}$ is assumed not to depend on the historical observations $y_{t}^{n}$, the entropy rate becomes:

h_{2} = - \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log p (x_{t + 1} | x_{t}^{m})

(3)

Then, the difference between the entropy rates h₂ and h₁ is given by:

h_{2} - h_{1} = - \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log p (x_{t + 1} | x_{t}^{m}) + \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log p (x_{t + 1} | x_{t}^{m}, y_{t}^{n}) = \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log (\frac{p (x_{t + 1} | x_{t}^{m}, y_{t}^{n})}{p (x_{t + 1} | x_{t}^{m})})

(4)

With substitutions

p (x_{t + 1} | x_{t}^{m}, y_{t}^{n}) = p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) / p (x_{t}^{m}, y_{t}^{n})

(5)

p (x_{t + 1} | x_{t}^{m}) = p (x_{t + 1}, x_{t}^{m}) / p (x_{t}^{m})

(6)

We obtain:

h_{2} - h_{1} = \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log (\frac{p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) \cdot p (x_{t}^{m})}{p (x_{t}^{m}, y_{t}^{n}) \cdot p (x_{t + 1}, x_{t}^{m})})

(7)

Equation (7) captures the transfer entropy from user y to user x, which can be further rewritten into a conditional mutual information:

T E (Y \to X) = h_{2} - h_{1} = - \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log (\frac{p (x_{t + 1}, x_{t}^{m})}{p (x_{t}^{m})}) + \sum_{x_{t + 1}, x_{t}^{m}, y_{t}^{n}} p (x_{t + 1}, x_{t}^{m}, y_{t}^{n}) log (\frac{p (x_{t + 1}, x_{t}^{m}, y_{t}^{n})}{p (x_{t}^{m}, y_{t}^{n})}) = H (x_{t + 1} | x_{t}^{m}) - H (x_{t + 1} | x_{t}^{m}, y_{t}^{n})

(8)

where H(·|·) denotes the conditional entropy over the given distribution. For the sake of simplicity, we take m=n from this point on.

Equation (8) quantifies the amount of information that the history of user y contributes to predicting the next behavior of user x, beyond what the history of x already provides. This can be regarded as a reflection of the influence wielded by user y on user x (Fig. 1). Because (8) is asymmetric, it can be used to analyze user heterogeneity with regard to personal influence and to investigate how that heterogeneity relates to meme popularity, as shown in the experiment section.
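To make Eqs. (7) and (8) concrete, the following sketch estimates TE(Y→X) from two equal-length symbol sequences by plugging empirical frequencies into Eq. (7). This is a plain frequency-count (plug-in) estimator chosen for illustration, with m = n = k; it is an assumption and does not reproduce the binning and bias-handling steps described in the following subsections. The function name and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def transfer_entropy(x, y, k=1):
    """Plug-in estimate of TE(Y -> X) in bits, with history length m = n = k."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    joint = Counter()  # counts of (x_{t+1}, x_t^k, y_t^k)
    for t in range(k - 1, len(x) - 1):
        xh = tuple(x[t - k + 1 : t + 1])
        yh = tuple(y[t - k + 1 : t + 1])
        joint[(x[t + 1], xh, yh)] += 1
    total = sum(joint.values())
    # Marginal counts needed by Eq. (7).
    c_xh_yh, c_xn_xh, c_xh = Counter(), Counter(), Counter()
    for (xn, xh, yh), c in joint.items():
        c_xh_yh[(xh, yh)] += c
        c_xn_xh[(xn, xh)] += c
        c_xh[xh] += c
    te = 0.0
    for (xn, xh, yh), c in joint.items():
        # p(xn,xh,yh) * log[ p(xn,xh,yh) p(xh) / (p(xh,yh) p(xn,xh)) ];
        # the sample size cancels inside the logarithm.
        ratio = c * c_xh[xh] / (c_xh_yh[(xh, yh)] * c_xn_xh[(xn, xh)])
        te += (c / total) * log2(ratio)
    return te


# Toy example: x repeats y with a lag of one step, so y's past predicts x.
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
x = [0] + y[:-1]
te_y_to_x = transfer_entropy(x, y, k=1)
te_const_to_x = transfer_entropy(x, [0] * len(x), k=1)  # constant source
```

In the toy example the estimate from y to x is positive, while a constant source sequence yields zero, which reflects the idea that only an informative history can transfer entropy.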

3.2.2 Influence estimation

We now turn to estimating the transfer entropy defined in (8). Since only finite data samples are available, we adopt the bin-based approach used by Kaiser and Schreiber [43]. In our formulation, the behavior history of user x is recorded as a time series:

S_x = \{t_{j} : 0 < t_{1} < t_{2} < \dots\}

(9)

To indicate whether the user launches a behavior within a given time span, a binned indicator function is introduced:

B_x (a, b) = \begin{cases} 1 & \text{if } \exists\, t_{j} \in S_x \cap (b, a], \\ 0 & \text{otherwise.} \end{cases}

(10)

Over a long observation interval [δ, T], we can define the probability function of user behavior as:

P (B_x (t, t - δ) = x_{t}) \equiv \frac{1}{T - δ} \int_{δ}^{T} [B_x (t, t - δ) = x_{t}] \, d t

(11)

where ‘[]’ equals 1 when the enclosed logical expression is true and 0 otherwise.

Similarly, a joint probability distribution could be defined over a sequence of adjacent bins:

P (B_x (t, t - δ_{0}) = x_{t}, B_x (t - δ_{0}, t - δ_{0} - δ_{1}) = x_{t - 1}, \dots)

(12)

where δ₀, δ₁, …, δ_k are the bin widths. Without loss of generality, we omit the binned indicator function, so that (12) becomes P(x_t, x_t−1, …, x_t−k), whose most compact form is $X_{t}^{(t - k)} \equiv \{x_{t}, x_{t - 1}, \dots, x_{t - k}\}$.

Then, for two users x and y with recorded time series S_x and S_y, the joint probability distribution over a common set of bins δ₀, δ₁, …, δ_k can be denoted as $P (x_{t}^{m}, y_{t}^{n})$ .
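The binning step can be illustrated with a short sketch. It implements the indicator of Eq. (10) for a sorted list of event timestamps and then reads off the binary history symbols over a sequence of adjacent bins, as in Eq. (12). The function names, the example timestamps, and the bin widths are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def binned_indicator(timestamps, a, b):
    """B_x(a, b): 1 if some event time t_j lies in the bin (b, a], else 0.

    `timestamps` must be sorted in increasing order.
    """
    if b > a:
        raise ValueError("expected b <= a")
    lo = bisect_right(timestamps, b)  # index of the first t_j > b
    hi = bisect_right(timestamps, a)  # index of the first t_j > a
    return 1 if hi > lo else 0


def history(timestamps, t, widths):
    """Binary symbols (x_t, x_{t-1}, ...) over adjacent bins ending at time t."""
    symbols = []
    upper = t
    for w in widths:
        symbols.append(binned_indicator(timestamps, upper, upper - w))
        upper -= w
    return tuple(symbols)


# Hypothetical behavior times of one user and three adjacent bins.
s_x = [1.0, 2.5, 7.0]
h = history(s_x, 8.0, [1.0, 2.0, 4.0])  # bins (7, 8], (5, 7], (1, 5]
```

Sliding t over the observation interval and counting how often each symbol tuple occurs would then give empirical estimates of the probabilities in Eqs. (11) and (12).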

Given this notation, TE(Y→X) defined in (8) can be readily calculated. Because TE(Y→X) captures only pairwise (peer-to-peer) influence, the total influence Influence(y) of user y is obtained by summing the influence that this user wields on all other users in the community.

3.2.3 Bias remediation

Estimation based on limited data leads to biases and statistical errors [44, 45]. These errors come mainly from two sources: systematic deviation and statistical deviation. Systematic deviation is tackled with randomized experiments, as discussed in Section 4. Statistical deviation can be handled mainly through two methods: ex-ante limitation and ex-post elimination. With the former, statistical deviation can be restrained to an arbitrary level with respect to the given dataset. Ex-post elimination, on the other hand, works by first estimating the bias itself and then adjusting the final result accordingly. This method requires some general a priori knowledge (e.g., the Panzeri-Treves bias estimate [46]) and works as a post hoc remedy. Thus, we consider ex-ante limitation more appropriate for our scenario of influence estimation.

In the proposed scheme, we use Simpson’s rule [47] to estimate the influence defined in (8), which is formulated as:

\int_{a}^{b} T E^{'} (t) d t = \int_{a}^{b} [\frac{(t - c) (t - b)}{(a - c) (a - b)} T E^{'} (a) + \frac{(t - a) (t - b)}{(c - a) (c - b)} T E^{'} (c) + \frac{(t - a) (t - c)}{(b - a) (b - c)} T E^{'} (b)] d t = \dots = \frac{b - a}{6} [T E^{'} (a) + 4 T E^{'} (\frac{a + b}{2}) + T E^{'} (b)]

(13)

where TE'(t) is the derivative of TE(t), and c = (a + b)/2 is the midpoint of the interval [a, b].

Simpson’s method approximates the target function with a piecewise quadratic, so the approximation is exact whenever the function is itself quadratic. This property guarantees an unbiased estimation for user influence. In addition, Simpson’s rule is computationally efficient, since the computational complexity of estimating (8) is O(N log N). This cost is acceptable for the meme ranking task.
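The closed form at the end of Eq. (13) is short enough to check numerically. The sketch below implements the single-interval rule and a simple composite version, and applies it to a quadratic integrand, for which the rule is exact up to floating-point rounding. The integrand here is a hypothetical stand-in for TE'(t), and the function names are illustrative.

```python
def simpson(f, a, b):
    """Single-interval Simpson's rule: (b - a)/6 * [f(a) + 4 f((a+b)/2) + f(b)]."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def composite_simpson(f, a, b, n=10):
    """Apply the single-interval rule on n equal sub-intervals of [a, b]."""
    if n <= 0:
        raise ValueError("n must be positive")
    h = (b - a) / n
    return sum(simpson(f, a + i * h, a + (i + 1) * h) for i in range(n))


def quadratic(t):
    """Hypothetical quadratic integrand standing in for TE'(t)."""
    return 3.0 * t * t + 2.0 * t + 1.0


# The exact integral of 3t^2 + 2t + 1 over [0, 2] is t^3 + t^2 + t = 14.
single = simpson(quadratic, 0.0, 2.0)
composite = composite_simpson(quadratic, 0.0, 2.0, n=4)
```

For integrands that are not polynomials of low degree, the composite version with more sub-intervals would normally be preferred, since the error of each piece shrinks with the interval width.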

4. Experimental results

In this section, we first introduce the datasets used for evaluation, and then describe the randomized trials used to alleviate systematic deviation. Finally, we present our experimental design and corresponding results.

4.1 Datasets

We evaluate the proposed ranking scheme on two different genres of dataset: Sina Weibo¹ and Daily Kos².

4.1.1 Sina Weibo

Sina Weibo is a Twitter-like microblogging system in China. With more than 40 million active users spreading approximately 100 million messages each day³, this system is generally considered an ideal laboratory for investigating information contagion, especially for Chinese content. Of particular interest is the section of Sina Weibo named ‘Hot Topics’. In this section, trending topics are ranked according to their popularity among the public in China. Within each topic, a vast number of messages keep evolving and mutating as the topic flows through the network. In this scenario, a topic is an incarnation of meme, while the messages spreading along different threads are operational proxies to track its dynamics. In what follows, without ambiguity, we will use the term meme to refer to each topic in Sina Weibo.

To evaluate the meme ranking task, we crawled all 10 memes in the ‘Hot Topics’ section. For each meme, we collected only the top 10 threads. Because a huge number of messages are generated in each thread, these threads already provide sufficient information about user behaviors and their timestamps. Thus, this dataset (Weibo hereafter) is well suited to evaluating the meme ranking task. Statistical information for each meme in the Weibo dataset is shown in Table 1. Each meme comprises at least 19,000 messages and 19,000 users, which is large enough for our evaluation.

Table 1.

Statistical information for Weibo dataset.

Meme ID	Meme Title	#Messages	#Users
1		70424	69397
2		19901	19477
3		88179	86200
4		74196	72686
5		220564	216234
6		33092	29808
7		125200	122331
8		74453	58128
9		36033	33723
10		53938	50425

Total	--	795980	758409


Note: Data collected on April 25th, 2013. ‘#’ denotes ‘the number of’.

Meme ID corresponds to its ranking position, and ‘--’ means not applicable.

4.1.2 Daily Kos

Daily Kos is an American political blog that enables users to publish news and share opinions freely. The site was founded in 2002 and soon became the premier online political discussion community, with 2.5 million visitors per month⁴. On this blog platform, professional political writers post directly to the front page, while regular users can post “diaries”. In response to these front page entries and diaries, users write comments and make recommendations, thus driving the spread of topics in the community. Ultimately, fiercely discussed diaries are assigned a specific tag, and popular tags are ranked in a ‘HOT TAGS’ section on the front page. In this blog community, we consider the hot tags to be popular memes. Hereafter, meme is used interchangeably with tag in Daily Kos.

To construct the evaluation dataset, we crawled the top 10 hot tags with the 200 most recent diaries. For each diary, we also collected all the related comments and timestamps. Table 2 summarizes the statistics for the Daily Kos dataset (Kos hereafter). Comparing Table 2 with Table 1, we find that users in Kos contribute more actively: they generate about 20 times as many messages per user. This trait may lead to ranking results that differ from those in Weibo, as investigated in the following experiments.

Table 2.

Statistical information for Kos dataset.

Meme ID	Meme Title	#Messages	#Users
1	Recommended	372225	9454
2	Affordable Care Act	180798	10855
3	Community	260810	5506
4	HealthCare	147686	10557
5	Elections	148456	10851
6	Republicans	157238	10049
7	2014	64318	7043
8	Environment	151736	7519
9	Economy	148888	8830
10	BarackObama	228088	10039

Total	--	1860243	90703


Note: Data collected on April 2nd, 2014. ‘#’ denotes ‘the number of’.

Meme ID corresponds to its ranking position, and ‘--’ means not applicable.

4.2 Randomized trials

To tackle the systematic deviation caused by multiple sources of bias, we design a randomized trial to minimize the potential negative effects. Our strategy is similar to that of He et al. [48], but comparatively practical and effective: we randomly sample user behaviors with their corresponding timestamps for Weibo and Kos respectively⁵. To avoid data sparsity, users who have fewer than two behavior records are pruned.

This procedure brings four main benefits. First, it alleviates the effects of selection bias, such as crawling strategy, and time point for data collection. Second, it guarantees that the sampled data are representative enough for the whole volume. Third, it statistically mitigates the effect of data incompleteness. Finally, it controls the inferences brought about by information leakage [49]. As users may be exposed to multiple memes at one time, randomized manipulation can offset such inferences with systematic expectation.

In the following experiments, we execute 10 independent randomized samplings for each approach, and the experimental results are averaged across all the runs. If not explicitly stated otherwise, we adopt a sampling rate of 5%.

4.3 Experimental design

In the following sections, we investigate meme ranking task by studying three major issues:

Issue 1: ranking performance. We explore how the proposed ranking scheme performs on two large-scale, real-world datasets.
Issue 2: ranking robustness. We examine the robustness of the ranking scheme by testing whether its performance is sensitive to different sampling rates.
Issue 3: popularity factors. We quantify two factors related to user heterogeneity, and dissect how they influence meme popularity.

4.3.1 Parameter setup

To quantify user influence in the proposed ranking scheme, we use repost behavior in Weibo and comment behavior in Kos. In all the following experiments, we take m=n=3 in (8) as the Markovian order for user behaviors. According to our empirical analysis, higher values do not improve performance and only increase computational cost. Considering the long tail phenomenon in online social behaviors [50–52], as well as the varying observation periods in different datasets, we divide the time bins of user behavior elastically by selecting δ₀ = 0.01T, δ₁ = 0.1T, and δ₂ = 0.2T respectively⁶, where T is the total observation period in a dataset.

4.3.2 Evaluation method

For an objective evaluation of the meme ranking results, we conduct experiments under two criteria: an Edit Distance [53] based criterion and a Kendall tau Distance [54] based criterion. The gold standard is the ranking of memes given by the original website. In the following experiments, we examine whether this ranking can be predicted based merely on the partial dataset available from the website.

Kendall tau Distance Based Criterion

Kendall tau Distance is a ranking metric defined as the number of pairwise disagreements between two rankings [55–57]. Due to its advantages in computability and interpretability, Kendall tau Distance has been used widely in information retrieval to evaluate ranking quality [55, 58–60]. Given two lists L₁ and L₂, the Kendall tau Distance between them is:

K (τ_{1}, τ_{2}) = | \{(i, j) : i < j, (τ_{1} (i) < τ_{1} (j) \land τ_{2} (i) > τ_{2} (j)) \lor (τ_{1} (i) > τ_{1} (j) \land τ_{2} (i) < τ_{2} (j))\} |

(14)

where τ₁(i) and τ₂(i) are the ranking positions of element i in L₁ and L₂. K(τ₁,τ₂) equals 0 if the two lists are identical and n(n−1)/2 (where n is the length of the list) if one list is the reverse of the other. For convenience of comparison, we normalize K(τ₁,τ₂) and convert it to a similarity value:

\mathrm{Kendall\text{-}tau\_Sim} (L_{1}, L_{2}) = 1 - \frac{2 K (τ_{1}, τ_{2})}{n (n - 1)}

(15)

Kendall-tau_Sim(L₁, L₂) takes values in the interval [0, 1], where 1 indicates that the two lists match perfectly.
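Equations (14) and (15) translate directly into a few lines of code. The sketch below counts discordant pairs between two rankings of the same items and converts the count to the normalized similarity; the function name and the example lists are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_sim(l1, l2):
    """Normalized Kendall tau similarity of two rankings of the same items."""
    if len(l1) != len(l2) or set(l1) != set(l2) or len(set(l1)) != len(l1):
        raise ValueError("lists must rank the same distinct items")
    n = len(l1)
    if n < 2:
        return 1.0  # a single item cannot be ranked inconsistently
    pos1 = {item: i for i, item in enumerate(l1)}
    pos2 = {item: i for i, item in enumerate(l2)}
    # A pair is discordant when the two rankings order it in opposite ways.
    discordant = sum(
        1
        for i, j in combinations(l1, 2)
        if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0
    )
    return 1.0 - 2.0 * discordant / (n * (n - 1))


gold = [1, 2, 3, 4]
same = kendall_tau_sim(gold, [1, 2, 3, 4])
reverse = kendall_tau_sim(gold, [4, 3, 2, 1])
one_swap = kendall_tau_sim([1, 2, 3], [2, 1, 3])
```

The pairwise loop is quadratic in the list length, which is unproblematic for rankings of ten memes.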

Edit Distance based Criterion

Edit Distance is an alternative criterion used in evaluating ranking result. It is defined based on the Levenshtein distance, thus sensitive to item positioning in ranking. This trait enables close dissection on the difference between the gold standard and the ranking for evaluations.

Also, we convert the original value given by the Edit Distance to a normalized similarity score:

\mathrm{Edit\_Sim} (L_{1}, L_{2}) = 1 - \frac{\mathrm{edit\_dist} (L_{1}, L_{2})}{\mathrm{max\_length} (L_{1}, L_{2})}

(16)

where L₁ and L₂ are the two lists to be compared, edit_dist() is a function calculating the Edit Distance, and max_length() denotes the maximum length of the two lists.

The similarity score Edit_Sim(L₁,L₂) lies in the unit interval [0,1], and higher values indicate better ranking results.
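A minimal sketch of Eq. (16) follows, using the standard dynamic-programming form of the Levenshtein distance with unit costs for insertion, deletion, and substitution. The unit costs and the function names are assumptions made for illustration; the source only states that the criterion is based on the Levenshtein distance.

```python
def edit_distance(l1, l2):
    """Levenshtein distance between two sequences (unit-cost operations)."""
    prev = list(range(len(l2) + 1))  # distances from the empty prefix of l1
    for i, a in enumerate(l1, start=1):
        curr = [i]
        for j, b in enumerate(l2, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_sim(l1, l2):
    """Normalized similarity: 1 - edit_dist / max_length, as in Eq. (16)."""
    longest = max(len(l1), len(l2))
    if longest == 0:
        return 1.0  # two empty lists are treated as identical
    return 1.0 - edit_distance(l1, l2) / longest


identical = edit_sim([1, 2, 3], [1, 2, 3])
swapped = edit_sim([1, 2, 3], [2, 1, 3])
```

Unlike the Kendall tau criterion, swapping two adjacent items costs two edits here, so this score penalizes positional mismatches more heavily.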

4.3.3 Benchmark approaches

To evaluate the comparative performance of the proposed ranking scheme, we also introduce four representative benchmark approaches in the literature, as discussed below.

Benchmark 1: followers-centered approach (Follower_Num)

Previous studies have employed different topological characteristics of social networks to measure social influence, such as in/out-degree [61] and PageRank [62]. Here, we use the number of followers to quantify user influence. Despite its simplicity, the number of followers is considered a reasonable indicator of user influence, since following links determine the flow of information [63] and more following links mean more opportunities to influence others. More sophisticated algorithms might yield better results, and we will examine one of them later. Accordingly, for each meme m, its overall popularity score Pop_FollowNr_m can be defined as:

\mathrm{Pop\_FollowNr}_{m} = \frac{1}{\# U_{m}} \sum_{u \in U_{m}} \# \mathrm{Follower} (u)

(17)

where, U_m represents the collection of users engaged in spreading meme m, Follower(u) is the collection of all followers of user u, and the operator ‘#’ measures the volume size of the set next to it.
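A small Python sketch of (17) is given below, assuming the follower counts of the users in U_m are already available as a dictionary. The function name and the toy data are hypothetical.

```python
def pop_follower_num(follower_counts):
    """Average follower count over the users engaged in one meme.

    follower_counts maps each user in U_m to # Follower(u).
    """
    if not follower_counts:
        return 0.0
    return sum(follower_counts.values()) / len(follower_counts)


# Toy data (hypothetical): three users spreading the same meme.
example_followers = {"u1": 120, "u2": 30, "u3": 0}
print(pop_follower_num(example_followers))  # 50.0
```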

Benchmark 2: PageRank-based approach (PageRank)

PageRank is an alternative metric for quantifying user influence. Apart from the number of links, PageRank also accounts for their quality. Thus, PageRank is expected to outperform follower-number-based approaches. However, PageRank requires a priori explicit knowledge of the underlying network structure. In practice, an accurate characterization of the network structure is almost impossible, as it changes and evolves continuously. In addition, network structure is usually meme independent. Thus, it seems inappropriate to rank online memes by applying PageRank directly.

To tackle this issue, we construct two social networks based on user behavior information, which is readily accessible and meme dependent. Taking each user as a node in the network, we add an edge from user u to user v if u reposts (in Weibo) or comments on (in Kos) a message of v. Statistics of the two constructed networks are shown in Table 3 and Table 4, respectively. We notice that the average network density [64] in the Kos dataset is higher (approximately twice that in Weibo). This may be attributed to a high level of user contribution in meme diffusion.
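The density formula is not spelled out in the text. The values in Table 3 and Table 4 are, however, consistent with the usual undirected definition 2E/(N(N−1)), which the following Python sketch assumes. The function name is illustrative.

```python
def network_density(num_nodes, num_edges):
    """Density under the assumed definition 2E / (N (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Node and edge counts of Weibo memes 1 and 2, taken from Table 3.
for nodes, edges in [(32725, 32514), (4958, 6776)]:
    # Report in units of 1e-05, as in the table's "Density (E-05)" column.
    print(round(network_density(nodes, edges) * 1e5, 3))
```

Under this assumption the two printed values are 6.072 and 55.141, the same as the density entries for memes 1 and 2 in Table 3.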

Table 3.

Network properties for memes in Weibo.

Meme ID	# Nodes	#Edges	Density (E-05)
1	32725	32514	6.072
2	4958	6776	55.141
3	46217	108254	10.236
4	34750	94713	15.687
5	117393	148111	2.149
6	11076	14351	23.398
7	55996	80235	5.117
8	29388	41001	9.495
9	16239	29975	22.735
10	24159	41161	14.105

Average	37290.100	59709.100	16.403

Table 4.

Network properties for memes in Kos.

Meme ID	# Nodes	#Edges	Density (E-05)
1	6819	6681	28.740
2	7713	7480	25.150
3	4003	3901	48.701
4	7462	7222	25.943
5	7887	7645	24.583
6	6945	6741	27.955
7	4977	4780	38.602
8	5395	5256	36.122
9	6406	6228	30.358
10	7291	7117	26.780

Average	6489.800	6305.100	31.293

Based on the constructed social networks, we then execute the PageRank algorithm. The popularity score of meme m is calculated as the average PageRank value of each user:

\mathrm{Pop\_PageRank}_{m} = \frac{1}{\# U_{m}} \sum_{u \in U_{m}} \mathrm{PageRank}(u)

(18)

where, U_m represents the collection of users engaged in the diffusion process of meme m, PageRank(u) is the corresponding PageRank value of user u, and the operator ‘#’ measures the volume size of the set next to it. Again, a meme’s popularity score is moderated by an averaging procedure.
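The pipeline described above (behavior-based edges, PageRank, then averaging as in (18)) can be sketched in Python as below. This is a plain power-iteration PageRank written for illustration. The damping factor of 0.85, the fixed number of iterations, and the uniform redistribution of rank from users without outgoing edges are our assumptions, since the source does not state its PageRank settings.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list of (u, v) pairs.

    An edge (u, v) means that u reposted or commented on a message of v,
    so rank flows from u to v.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def pop_pagerank(edges, users):
    """Average PageRank value over the users engaged in one meme."""
    rank = pagerank(edges)
    return sum(rank.get(user, 0.0) for user in users) / len(users)


# Toy network (hypothetical): b and c repost a; a reposts d.
toy_edges = [("b", "a"), ("c", "a"), ("a", "d")]
toy_rank = pagerank(toy_edges)
print({user: round(value, 3) for user, value in toy_rank.items()})
print(round(pop_pagerank(toy_edges, ["a", "b"]), 3))
```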

Benchmark 3: dynamic information-based approach (Dynamic)

Recent studies suggest that static structural measures alone reveal very little about social influence [65, 66], while more accurate quantification requires characterizing dynamic processes [67]. Thus, we measure user influence by utilizing dynamic user behavior information. According to Romero et al. [66], action rates vary widely across users, and a relatively small portion of users plays a key role in meme diffusion. This finding highlights the need to quantify user activity level for more accurate influence identification. As such, we use the number of repost (in Weibo) and comment (in Kos) behaviors to measure user activity level⁷. The popularity score of meme m is then formulated as:

\mathrm{Pop\_BehaviorNr}_{m} = \frac{1}{\# U_{m}} \sum_{u \in U_{m}} \# \mathrm{Behavior}(u)

(19)

where, U_m represents the collection of users engaged in the diffusion of meme m, Behavior(u) is the set of a given type of behavior of user u, and the operator ‘#’ measures the volume size of the set next to it.
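As an illustration of (19), the Python sketch below counts behaviors per user from a flat event log and averages the counts. The log format (one user ID per repost or comment) and the choice of U_m as the set of users appearing in the log are assumptions made for this example.

```python
from collections import Counter


def pop_behavior_num(events):
    """Average number of behaviors per engaged user.

    events holds one user ID per repost (Weibo) or comment (Kos)
    observed within a single meme.
    """
    counts = Counter(events)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


# Toy log (hypothetical): user a acts three times, b and c once each.
toy_events = ["a", "a", "b", "c", "a"]
print(round(pop_behavior_num(toy_events), 3))
```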

Apart from user activity, some researchers also employ user passivity [66] (or susceptibility [68] from the opposite point of view) to measure the resistance in influencing others. Aral and Walker [68] suggest that influential users, usually active in information diffusion, are less susceptible to influence. However, user passivity should not be considered as a directly opposite perspective to user activity, as they are totally different metrics for depicting the same user. Further research is needed to determine whether influence based on user passivity will rank memes differently from that based on user activity. This issue is beyond the current research scope and will be considered in our future work.

Benchmark 4: diffusion-based approach (Diffusion)

Following Bonchi et al. [27] and Goyal et al. [69], we also quantify the influence Influ(u, v) exerted by user u on user v based on the probability that each post generated by u will be further consumed by v:

\mathrm{Influ}(u, v) = \frac{\left| \{ p \in m \mid \exists t \in T : c(v, u, p, t) \} \right|}{\left| \{ p \in m \mid \exists t \in T : g(u, p, t) \} \right|}

(20)

where, g(u, p, t) represents that user u generates a message p at timestamp t, while c(v, u, p, t) indicates that user v further consumes (repost in Weibo and comment in Kos) message p generated by user u at timestamp t.

Then, the popularity score of meme m is given by:

\mathrm{Pop\_Influ}_{m} = \frac{1}{\# U_{m}} \sum_{u \in U_{m}} \mathrm{Influ}(u)

(21)

where, U_m represents the collection of users engaged in the diffusion of meme m, Influ(u) is the total influence of user u wielded on others, and the operator ‘#’ measures the volume size of the set next to it.
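A hedged Python sketch of (20) and (21) follows. Timestamps are dropped because both definitions only require that some timestamp exists. Taking Influ(u) as the sum of Influ(u, v) over all v, and U_m as every user who generates or consumes a post, are assumptions of this sketch, as are the function names and the tuple layout of the inputs.

```python
from collections import defaultdict


def pairwise_influence(generated, consumed):
    """Influ(u, v): share of u's posts that v further consumed.

    generated: list of (author, post_id) pairs
    consumed:  list of (consumer, author, post_id) triples
    """
    posts_by_author = defaultdict(set)
    for author, post in generated:
        posts_by_author[author].add(post)

    consumed_by_pair = defaultdict(set)
    for consumer, author, post in consumed:
        if post in posts_by_author.get(author, ()):
            consumed_by_pair[(author, consumer)].add(post)

    return {
        (author, consumer): len(posts) / len(posts_by_author[author])
        for (author, consumer), posts in consumed_by_pair.items()
    }


def pop_influ(generated, consumed):
    """Average total influence over all users seen in the meme."""
    users = {a for a, _ in generated} | {c for c, _, _ in consumed}
    if not users:
        return 0.0
    total = defaultdict(float)
    for (author, _consumer), value in pairwise_influence(generated, consumed).items():
        total[author] += value  # assumed aggregation: sum over consumers
    return sum(total[user] for user in users) / len(users)


# Toy data (hypothetical): a writes p1 and p2, b writes p3.
toy_generated = [("a", "p1"), ("a", "p2"), ("b", "p3")]
toy_consumed = [("b", "a", "p1"), ("c", "a", "p1"), ("c", "a", "p2")]
print(pairwise_influence(toy_generated, toy_consumed))
print(pop_influ(toy_generated, toy_consumed))
```

In the toy data, b consumes one of a's two posts and c consumes both, so Influ(a, b) = 0.5 and Influ(a, c) = 1.0, and the average over the three users is 0.5.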

Traits of the above benchmark approaches are summarized in Table 5.

Table 5.

Traits of the benchmark approaches.

Benchmark Approach	Information Type	Advantages	Disadvantage
Follower_Num	static information (follower number)	computational simplicity	not very accurate for dynamic data
PageRank	static information (network structure)	accounting for the number of links and their qualities	requiring explicit knowledge of the underlying network
Dynamic	dynamic information (user behaviors)	revealing social dynamics	incapable of characterizing social interactions among users
Diffusion	dynamic information (diffusion potential)	requiring no explicit causal knowledge about user interactions	heuristic and cannot capture nonlinear relationships

4.4 Results

4.4.1 Issue 1: ranking performance

In designing the meme ranking task, we attempt to evaluate whether the proposed ranking scheme can predict meme popularity (as indicated by the original website) merely based on partial data available. This task is meaningful as it informs the possibility of either ranking memes with limited data samples at a specific timestamp, or predicting future meme popularity based on historical data samples. In this paper, we mainly focus on the former situation and experimental results are shown in Table 6 and Table 7.

Table 6.

Performance of different ranking approaches on Weibo.

Meme ID	Follower_Num	PageRank (E-6)	Dynamic	Diffusion	MF
1	100.325	30.492	2.128	0.393	3.701
2	154.308	201.803	2.814	0.256	3.002
3	102.148	216.381	2.140	0.136	2.736
4	122.958	28.868	2.146	0.129	2.500
5	110.127	8.433	2.210	0.263	1.956
6	139.055	90.148	2.119	0.244	1.851
7	120.585	17.688	2.263	0.168	1.116
8	77.262	33.883	4.242	0.193	3.395
9	122.821	61.771	2.352	0.099	2.813
10	85.099	41.573	2.638	0.169	2.473

Kendall-tau_Sim	0.578	0.533	0.289	0.689	0.689

Edit_Sim	0.200	0.100	0.100	0.200	0.400

Note: Meme ID is consistent with meme position in the gold standard; ‘MF’ indicates the experiment group of our model-free ranking scheme. ‘Kendall-tau_Sim’ and ‘Edit_Sim’ correspond to the evaluation criteria based on Kendall tau Distance and Edit Distance, respectively.

Table 7.

Performance of different ranking approaches on Kos.

Meme ID	Follower_Num	PageRank (E-6)	Dynamic	Diffusion	MF
1	31.543	146.904	39.372	0.080	148.582
2	30.378	129.571	16.655	0.067	12.622
3	48.671	249.691	47.368	0.068	23.089
4	34.453	133.842	13.989	0.073	20.162
5	29.653	126.569	13.681	0.079	5.558
6	32.749	144.133	15.647	0.068	10.449
7	42.962	200.624	9.132	0.125	12.495
8	41.082	185.536	20.18	0.045	6.324
9	37.226	156.337	16.861	0.071	8.129
10	33.785	137.092	22.72	0.052	2.186

Kendall-tau_Sim	0.578	0.467	0.533	0.578	0.822

Edit_Sim	0.1	0.1	0.1	0.2	0.5

We first notice two results that contradict our empirical expectations. First, the PageRank group fails to outperform the Follower_Num group. Since PageRank is structure dependent, we conjecture that this result may be attributed to the reconstructed social network, which is highly biased due to limited data samples. The superiority of Follower_Num over PageRank implies that structural information is more reliable in meme ranking with limited data samples.

Second, the Dynamic group fails to perform better than the Follower_Num group in both datasets. This result is inconsistent with previous work that assumes dynamic information is more reliable than structural information in depicting information dynamics [63]. In turn, it indicates that meme popularity, at least in the current situation, is reflected more by the long-term status of a user’s social network geometry [70] and less by their short-term communication relationships. This seemingly counterintuitive result can be understood by delving into the underlying interaction network. An individual's social network mainly consists of two parts, i.e., explicit following relationships and implicit communication avenues [71]. Explicit social relations reflect one's long-term status in a community, whereas implicit communication avenues reflect one's short-term prestige, which is highly event dependent (e.g., spreading a specific meme). When data samples are incomplete, dynamic information turns out to be unreliable, while explicit social networks become more predictive of social influence and, ultimately, of meme popularity.

Apart from the above contradictions, we also notice that the proposed scheme performs at least as well as the four benchmark approaches under both evaluation criteria; only the Diffusion group ties with it, on Kendall-tau_Sim in the Weibo dataset (Table 6). Specifically, we notice that most of the benchmark approaches do not consider the meme "Yong lovers" (ID=1) as the most popular meme in the Weibo dataset (Table 1). This meme corresponds to the untimely death of a lovelorn teenager. Though the absolute number of messages generated in diffusing this meme is not the highest, it does exert intense influence among online users, where fierce discussions develop. While the benchmark approaches merely rely on network structure (constructed either statically or dynamically) for ranking, our proposed scheme quantifies influence among users in a model-free manner with high accuracy. This capability enables our scheme to capture complex nonlinear social interactions through both explicit social networks and implicit communication avenues. A similar situation occurs for the meme "Margaret Thatcher Dies" (ID=8) in the Weibo dataset (Table 1). Though this meme receives a low ranking position from both the official website and the four benchmark approaches, it is found to have been gaining momentum in the following several days, when other memes had already lost popularity. Our ranking scheme 'predicts' this meme's future popularity, while the other approaches only rank memes at the current moment without any forecast. Similar findings are also obtained in the Kos dataset, further validating our conclusions.

Finally, one point about the evaluation criteria is worth mentioning. Although Kendall-tau_Sim generally gives higher scores (p<0.01 according to a two-tailed paired t-test), Edit_Sim is more sensitive to meme positions in a ranking. This is indicated by the higher relative difference between the maximal and the minimal ranking scores given by Edit_Sim (about 2 times higher than that given by Kendall-tau_Sim). This disparity enables us to investigate the meme ranking task from different perspectives. The consistency of the ranking results under these two criteria strengthens the soundness of the conclusions above.

4.4.2 Issue 2: ranking robustness

As sampling has been adopted in the randomized trials, we next explore whether our scheme is robust with respect to the sampling rate. By comparing ranking performance at different sampling rates, we can validate the possibility of predicting meme popularity with only partial data available. In this subsection, we run several simulations with distinct sampling rates and record the corresponding performance (Table 8 and Table 9). The maximum sampling rate in this experiment is set to 20%. We believe this is the upper allowable bound for sampling, as larger values may impair the representativeness of the sampled dataset.

Table 8.

Meme ranking performance with different sampling rates for Weibo.

Meme ID	SR=1%	SR=5%	SR=10%	SR=15%	SR=20%
1	3.613	3.701	2.943	3.786	2.654
2	3.310	3.002	2.796	3.38	2.466
3	2.517	2.736	2.813	3.224	2.509
4	2.466	2.500	2.515	2.961	2.218
5	1.751	1.956	2.511	2.623	2.267
6	1.316	1.851	2.192	2.630	2.192
7	1.483	1.116	2.105	3.537	2.186
8	4.191	3.395	2.733	3.504	2.609
9	3.180	2.813	2.671	2.297	2.011
10	2.397	2.473	2.578	2.239	2.384

Kendall-tau_Sim	0.644	0.689	0.711	0.756	0.711

Edit_Sim	0.3	0.4	0.3	0.5	0.4

Note: Meme ID is consistent with meme position in the gold standard; the popularity score of each meme is calculated according to (1); ‘SR’ represents ‘sampling rate’; ‘Kendall-tau_Sim’ and ‘Edit_Sim’ correspond to the evaluation criteria based on Kendall tau Distance and Edit Distance, respectively.

Table 9.

Meme ranking performance with different sampling rates for Kos.

Meme ID	SR=1%	SR=5%	SR=10%	SR=15%	SR=20%
1	117.813	148.582	91.534	106.756	79.638
2	14.754	12.622	24.231	18.521	21.879
3	26.230	23.089	17.264	9.524	34.477
4	5.682	20.162	15.876	13.207	18.934
5	14.948	5.558	19.906	6.733	21.171
6	5.796	10.449	13.147	12.738	9.222
7	3.156	12.495	7.850	15.616	13.180
8	12.131	6.324	9.387	5.621	16.815
9	6.742	8.129	4.295	7.529	3.631
10	3.667	2.186	8.432	5.440	3.130

Kendall-tau_Sim	0.755	0.822	0.888	0.8	0.888

Edit_Sim	0.3	0.5	0.4	0.4	0.5

Results from Table 8 and Table 9 suggest that the ranking results are insensitive to the sampling rate. In the Weibo dataset, the average scores of Kendall-tau_Sim and Edit_Sim are 0.702 (± 0.041) and 0.380 (± 0.084), respectively. In the Kos dataset, the standard deviation is only slightly larger, with average Edit_Sim 0.420 (± 0.083) and Kendall-tau_Sim 0.830 (± 0.057). These results support the robustness of our ranking scheme, which is insensitive to changes in the underlying network structure. This trait distinguishes our scheme from benchmark approaches that are structure dependent. The results also validate the possibility of ranking memes reliably with only finite data samples.
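The Weibo summary statistics quoted above can be re-derived from the last two rows of Table 8. The Python sketch below does so with the sample standard deviation; that choice of estimator is an inference on our part, since the text does not name it, but it is the one that matches the reported spreads.

```python
from statistics import mean, stdev

# Similarity scores for Weibo at SR = 1%, 5%, 10%, 15%, 20% (Table 8).
weibo_scores = {
    "Kendall-tau_Sim": [0.644, 0.689, 0.711, 0.756, 0.711],
    "Edit_Sim": [0.3, 0.4, 0.3, 0.5, 0.4],
}


def summarize(values):
    """Return (mean, sample standard deviation), rounded to 3 decimals."""
    return round(mean(values), 3), round(stdev(values), 3)


for name, values in weibo_scores.items():
    avg, spread = summarize(values)
    print(f"{name}: {avg:.3f} (± {spread:.3f})")
```

This gives 0.702 (± 0.041) for Kendall-tau_Sim and 0.380 (± 0.084) for Edit_Sim, in line with the Weibo figures in the text.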

4.4.3 Issue 3: popularity factors

Popular memes are assumed to possess certain competitive advantages. In this subsection, we attempt to explore two factors contributing to meme popularity.

Bakshy et al. [37] suggest that information diffusion is driven by influential users. Thus, we first study how such users contribute to a meme’s popularity. Specifically, we are interested in the smallest portion of users (α_m) whose combined influence reaches a given threshold θ of the total influence, which is defined as:

\alpha_{m} = \frac{\min \left\{ N : \sum_{i = 1}^{N} \mathrm{Influ}(u_{i}) \geq \theta \sum_{u \in U_{m}} \mathrm{Influ}(u) \right\}}{\# U_{m}}, \quad \mathrm{Influ}(u_{i}) \geq \mathrm{Influ}(u_{i + 1})

(22)

where U_m represents the collection of users engaged in the diffusion of meme m, Influ(u) is the total influence of user u wielded on others, and the operator ‘#’ measures the volume size of the set next to it.

In the results presented here, we use a value of θ= 70%. We have also experimented with other values of θ and observed similar qualitative results.

Besides, we notice that there are users whose total influence is zero. Then, we intend to examine how the portion of such users (β_m) affects meme popularity:

\beta_{m} = \frac{\max \left\{ N : \sum_{i = 1}^{N} \mathrm{Influ}(u_{i}) \leq 0 \right\}}{\# U_{m}}, \quad \mathrm{Influ}(u_{i}) \leq \mathrm{Influ}(u_{i + 1})

(23)

where, U_m represents the collection of users engaged in diffusing of meme m, Influ(u) is the total influence of user u wielded on others, and the operator ‘#’ measures the volume size of the set next to it.
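The two ratios in (22) and (23) can be sketched in Python as follows, taking a list of per-user total influence values as input. The function names and the toy values are hypothetical. When all influence values are non-negative, β_m reduces to the share of zero-influence users.

```python
def alpha_ratio(influences, theta=0.7):
    """Smallest share of top users whose summed influence reaches theta of the total."""
    total = sum(influences)
    if not influences or total <= 0:
        return 0.0
    running = 0.0
    for count, value in enumerate(sorted(influences, reverse=True), start=1):
        running += value
        if running >= theta * total:
            return count / len(influences)
    return 1.0


def beta_ratio(influences):
    """Largest share of bottom users whose summed influence is at most zero."""
    if not influences:
        return 0.0
    running, count = 0.0, 0
    for n, value in enumerate(sorted(influences), start=1):
        running += value
        if running <= 0:
            count = n
    return count / len(influences)


# Toy influence values (hypothetical) for ten users of one meme.
toy_influences = [10, 5, 3, 1, 1, 0, 0, 0, 0, 0]
print(alpha_ratio(toy_influences))  # 0.2: the top two users hold 15 of 20
print(beta_ratio(toy_influences))   # 0.5: five users have zero influence
```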

In the following experiments, we analyze the respective correlations of α_m and β_m with meme popularity. Experimental results are shown in Table 10 and Table 11.

Table 10.

Statistical results for Weibo.

Meme ID	MF	α_m (%)	β_m (%)
1	3.701	27.184	1.942
2	3.002	27.692	6.154
3	2.736	4.108	2.528
4	2.500	44.204	2.180
5	1.956	9.614	6.801
6	1.851	2.666	8.984
7	1.116	17.038	7.268
8	3.395	45.127	0.479
9	2.813	54.946	0.860
10	2.473	42.804	0.747

Note: the second column represents results given by our scheme (sampling rate=5%).

Table 11.

Statistical results for Kos.

Meme ID	MF	α_m (%)	β_m (%)
1	148.582	47.761	0.021
2	12.622	40.247	0.087
3	23.089	40.702	0.409
4	20.162	41.953	0.373
5	5.558	32.919	0.149
6	10.449	42.699	0.262
7	12.495	42.447	0.848
8	6.324	40.368	0.549
9	8.129	44.726	6.04
10	2.186	40.491	2.578

Note: the second column represents results given by our scheme (sampling rate=5%).

It turns out that memes with higher popularity scores tend to be associated with larger α_m. The Pearson correlation coefficient between them is 0.453 in Weibo and 0.609 in Kos⁸. This means that the popularity of a meme is achieved by involving a wide range of users with moderate influence. Put another way, a small number of highly influential users alone is insufficient to guarantee a meme’s prevalence. Take meme 6 and meme 8 in the Weibo dataset as an example⁹. Meme 6 (Fig. 2-a) includes a group of highly influential users, with the highest influence exceeding 20. However, this group is relatively small (α_m = 2.666%), and there is a sharp drop in user influence over the remainder of the distribution. On the other hand, the influence distribution of meme 8 (Fig. 2-b) is much smoother (α_m = 45.127%), with few users possessing exceptionally high influence. This difference makes meme 8 more popular among the public. Thus, users are more likely to be exposed to and influenced by it, and are more willing to spread it. This finding (finding 1) coincides with the previous postulation that information diffusion can also be realized by moderately or less influential users; indeed, sometimes they even do a better job [37].

Fig. 2 — Influence distribution of users over different memes in Weibo.

Note: User influence is listed in descending order, and the left side of blue line corresponds to 70% of the total influence.

By comparing columns 2 and 4 in Table 10, we observe that popular memes are likely to contain fewer zero-influence users (finding 2). The Pearson correlation coefficient between them is −0.687, which suggests a relatively strong negative relationship. This phenomenon can be explained partly by the concept of re-diffusion intention [72] in marketing. Under its theoretical framework, if a meme is really popular, then the re-diffusion intention of users is high, and other users are more likely to be exposed to this meme. Hence, the probability that users exert no influence by spreading the meme is low.

Likewise, we find a similar phenomenon in Table 11, but the Pearson correlation coefficient is comparatively low, only −0.254. As memes in Kos generally have a much higher influence level than those in Weibo, the existence of a relatively small portion of zero-influence users might not affect meme popularity much. Thus, the correlation between meme popularity and the ratio of zero-influence users is low.
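For reference, the Weibo correlations discussed above can be recomputed directly from the three columns of Table 10. The Python sketch below uses a plain implementation of the Pearson correlation coefficient; the helper name is ours.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Columns of Table 10 (Weibo), in meme ID order 1..10.
mf = [3.701, 3.002, 2.736, 2.500, 1.956, 1.851, 1.116, 3.395, 2.813, 2.473]
alpha = [27.184, 27.692, 4.108, 44.204, 9.614, 2.666, 17.038, 45.127, 54.946, 42.804]
beta = [1.942, 6.154, 2.528, 2.180, 6.801, 8.984, 7.268, 0.479, 0.860, 0.747]

print(round(pearson(mf, alpha), 3))
print(round(pearson(mf, beta), 3))
```

On these inputs the coefficients come out close to the 0.453 and −0.687 reported for Weibo.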

The above findings are obtained because of our scheme’s fine-grained modeling of user dynamics. These results may not be found if we only utilize the existing approaches. As the scheme makes no domain-specific assumptions, our work can be readily generalized to analyze other new types of memes, such as innovation [15], rumor [19], and viral marketing [14, 28].

5. Conclusions and future work

In this paper, we proposed a novel model-free scheme for meme ranking. Empirical studies on two large-scale real-world datasets validate its effectiveness and robustness. Owing to its fine-grained modeling of user dynamics, this scheme can provide significant insights into meme popularity. By analyzing two key factors regarding user influence, we uncover two significant findings: (1) meme popularity is achieved by a wide range of users along its diffusion trace, while a small number of highly influential users alone is insufficient; (2) more popular memes tend to contain fewer zero-influence users.

In our future work, we intend to investigate whether other types of dynamic information also contribute to meme ranking task, such as user passivity and adoption rate. As only user behaviors are considered in quantifying user dynamics, we wonder whether the patterns of user behaviors bear some relationship with meme ranking results. Further, we will investigate whether involving contextual information and the behavioral and cognitive factors of online users would predict the meme popularity more accurately.

Highlights.

We develop a model-free scheme for the meme ranking task.
We have evaluated the proposed scheme on two large-scale real-world datasets.
Our work presents fine-grained modeling of dynamic information and can be readily generalized to analyze other new types of memes.

Acknowledgement

We would like to thank each member of the SMILES group at the Institute of Automation, Chinese Academy of Sciences. In particular, we thank Kainan Cui, Zhu Zhang, and Chuan Luo for useful discussions. We also appreciate the kind help of Yufang Wu and Kainan Cui with data collection.

Biographies

Saike He received the B.S. degree in automation and M.S. degree in computer science in 2007 and 2010, respectively, from Beijing University of Posts and Telecommunications, Beijing, China. He is currently pursuing the Ph.D. degree in computer science at Institute of Automation, Chinese Academy of Sciences.

His research interests include sentiment analysis, behavior modeling, information diffusion, and synchronization in complex networks.

Xiaolong Zheng is currently an Associate Professor at the Institute of Automation, Chinese Academy of Sciences. He received the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2009, the M.S. degree from Beijing Jiaotong University in 2006, and the B.S. degree from China Jiliang University in 2003. He has published over 80 peer-reviewed academic papers in journals, magazines, and conferences, including Journal of Medical Internet Research, Scientific Reports, Plos One, IEEE Intelligent Systems, and Physica A, as well as 6 academic books. He has served as the Program Co-chair of the International Conference of Smart Health (ICSH2014), the Pacific Asia Workshop on Intelligence and Security Informatics 2013 (PAISI2013), and the Pacific Asia Workshop on Intelligence and Security Informatics 2011 (PAISI2011). He was also the Academic Secretary of the ACM Social and Economic Computing Chapter from 2012 to 2013, a reviewer for IEEE Transactions on Systems, Man, and Cybernetics: Systems, Information & Management, Knowledge-based Systems, IEEE Intelligent Systems, Journal of Combinatorial Optimization, and ACM Transactions on Management Information Systems, and a Technical Program Committee member of over 10 international conferences.

Daniel Zeng (M’04–SM’07) received the Ph.D. degree in industrial administration from Carnegie Mellon University in 1998.

He is a Research Professor at the Institute of Automation, Chinese Academy of Sciences, Beijing, China, and is also affiliated with the University of Arizona, Tucson, AZ, USA. His current research interests include software agents and multi-agent systems, intelligence and security informatics, social computing, and recommendation systems.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

¹ http://www.weibo.com

² http://www.dailykos.com/

³ http://en.wikipedia.org/wiki/Sina_Weibo. Accessed on Apr. 1st, 2014.

⁴ In 2009, Time magazine readers nominated this site as the second best blog platform.

⁵ In randomized trials, we did not conduct randomized selection on the threads for each meme, for threads ranked higher tend to have more reposts and engaged users. Randomized selection on meme threads may undermine information abundance for tracking memes and measuring their popularity.

⁶ In our datasets, this setting for bin widths fixes the finest temporal resolution for recent records of user behaviors and coarser for historical ones.

⁷ Within each meme, it is possible for a user to repost or comment multiple times as there are many different variations of one original message.

⁸ Considering the limited data samples we used, these correlation values are relatively high.

⁹ We also experimented with Kos data, and observed similar results.

Contributor Information

Saike He, Email: saike.he@ia.ac.cn.

Xiaolong Zheng, Email: xiaolong.zheng@ia.ac.cn.

Daniel Zeng, Email: dajun.zeng@ia.ac.cn.

References

1.Dawkin R. The selfish gene. New York City: Oxford University Press; 1976. [Google Scholar]
2.Weng L, Flammini A, Vespignani A, Menczer F. Competition among memes in a world with limited attention. Scientific Reports. 2012;2 doi: 10.1038/srep00335. 03/29/online. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Crane R, Sornette D. Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences. 2008;105(41):15649–15653. doi: 10.1073/pnas.0803685105. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Leskovec J, Backstrom L, Kleinberg J. Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining; Paris, France. 2009. pp. 497–506. [Google Scholar]
5.Lerman K, Ghosh R. Information Contagion: An Empirical Study of the Spread of News on Digg and Twitter Social Networks. ICWSM. 2010;10:90–97. [Google Scholar]
6.De Choudhury M, Monroy-Hernandez A, Mark G. Narco emotions: affect and desensitization in social media during the mexican drug war. :3563–3572. [Google Scholar]
7.Qu Y, Huang C, Zhang P, Zhang J. Microblogging after a major disaster in China: a case study of the 2010 Yushu earthquake. :25–34. [Google Scholar]
8.Salathé M, Vu DQ, Khandelwal S, Hunter DR. The dynamics of health behavior sentiments on a large online social network. EPJ Data Science. 2013;2(1):1–12. [Google Scholar]
9.Liang Y, Zhou X, Zeng DD, Guo B, Zheng X, Yu Z. An integrated approach of sensing tobacco-oriented activities in online participatory media. 2014 [Google Scholar]
10.Choy M, Cheong M, Laik MN, Shung KP. Us presidential election 2012 prediction using census corrected twitter model. arXiv preprint arXiv:1211.0938. 2012 [Google Scholar]
11.Pasek J, Kenski K, Romer D, Jamieson KH. America's Youth and Community Engagement How Use of Mass Media Is Related to Civic Activity and Political Awareness in 14-to 22-Year-Olds. Communication Research. 2006;33(3):115–135. [Google Scholar]
12.Tumasjan A, Sprenger TO, Sandner PG, Welpe IM. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. ICWSM. 2010;10:178–185. [Google Scholar]
13.Bauckhage C. Insights into Internet Memes. ICWSM. 2011 [Google Scholar]
14.Richardson M, Domingos P. Mining knowledge-sharing sites for viral marketing; Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining; 2002. pp. 61–70. [Google Scholar]
15.Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network; Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining; 2003. pp. 137–146. [Google Scholar]
16.Bikhchandani S, Hirshleifer D, Welch I. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of political Economy. 1992:992–1026. [Google Scholar]
17.Budak C, Agrawal D, Abbadi AE. Limiting the spread of misinformation in social networks. Proceedings of the 20th international conference on World wide web; Hyderabad, India. 2011. pp. 665–674. [Google Scholar]
18.Myers SA, Leskovec J. On the convexity of latent social network inference. arXiv preprint arXiv:1010.5504. 2010 [Google Scholar]
19.Shah D, Zaman T. Rumors in a Network: Who's the Culprit? IEEE Transactions on Information Theory. 2011;57(8):5163–5181. [Google Scholar]
20.Bailey NT. The mathematical theory of infectious diseases and its applications; Charles Griffin & Company Ltd; 5a Crendon Street, High Wycombe, Bucks HP13 6LE. 1975. [Google Scholar]
21.Anderson RM, May RM, Anderson B. Infectious diseases of humans: dynamics and control. Wiley Online Library; 1992. [Google Scholar]
22.Kubo M, Naruse K, Sato H, Matubara T. Intelligent Data Engineering and Automated Learning-IDEAL 2007. Springer; 2007. The possibility of an epidemic meme analogy for web community population analysis; pp. 1073–1080. [Google Scholar]
23.Gruhl D, Guha R, Liben-Nowell D, Tomkins A. Information diffusion through blogspace; Proceedings of the 13th international conference on World Wide Web; 2004. pp. 491–501. [Google Scholar]
24.Li Y-M, Shiu Y-L. A diffusion mechanism for social advertising over microblogs. Decision Support Systems. 2012;54(1):9–22. [Google Scholar]
25.Shifman L, Thelwall M. Assessing global diffusion with Web memetics: The spread and evolution of a popular joke. Journal of the American society for information science and technology. 2009;60(12):2567–2576. [Google Scholar]
26. Nguyen QH, Ong YS, Lim MH. Non-genetic transmission of memes by diffusion; Proceedings of the 10th annual conference on Genetic and evolutionary computation; 2008. pp. 1017–1024.
27. Bonchi F, Castillo C, Ienco D. Meme ranking to maximize posts virality in microblogging platforms. Journal of Intelligent Information Systems. 2013;40(2):211–239.
28. Goldenberg J, Libai B, Muller E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters. 2001;12(3):211–223.
29. Cheung CM, Lee MK. What drives consumers to spread electronic word of mouth in online consumer-opinion platforms. Decision Support Systems. 2012;53(1):218–225.
30. Gloor PA. Coolhunting for trends on the web; Collaborative Technologies and Systems, 2007. CTS 2007. International Symposium on; 2007. pp. 1–8.
31. Adar E, Zhang L, Adamic LA, Lukose RM. Implicit structure and the dynamics of blogspace. Workshop on the weblogging ecosystem. 2004.
32. Gordevicius J, Estrada FJ, Lee HC, Andritsos P, Gamper J. Ranking of evolving stories through meta-aggregation; Proceedings of the 19th ACM international conference on Information and knowledge management; 2010. pp. 1909–1912.
33. Allen LJ. Some discrete-time SI, SIR, and SIS epidemic models. Mathematical Biosciences. 1994;124(1):83–105. doi: 10.1016/0025-5564(94)90025-6.
34. Xuetao W, Valler NC, Prakash BA, Neamtiu I, Faloutsos M, Faloutsos C. Competing Memes Propagation on Networks: A Network Science Perspective. IEEE Journal on Selected Areas in Communications. 2013;31(6):1049–1060.
35. Ienco D, Bonchi F, Castillo C. The Meme Ranking Problem: Maximizing Microblogging Virality; 2010 IEEE International Conference on Data Mining Workshops; 2010. pp. 328–335.
36. Kwak H, Lee C, Park H, Moon S. What is Twitter, a social network or a news media?; Proceedings of the 19th international conference on World wide web; 2010. pp. 591–600.
37. Bakshy E, Hofman JM, Mason WA, Watts DJ. Everyone's an influencer: quantifying influence on twitter; Proceedings of the fourth ACM international conference on Web search and data mining; 2011. pp. 65–74.
38. Myers SA, Zhu C, Leskovec J. Information diffusion and external influence in networks. :33–41.
39. Van den Bulte C, Lilien GL. Medical Innovation Revisited: Social Contagion versus Marketing Effort. American Journal of Sociology. 2001;106(5):1409–1435.
40. Manski CF. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies. 1993;60(3):531–542.
41. Saito Y, Harashima H. Tracking of information within multichannel EEG record: causal analysis in EEG. In: Yamaguchi N, Fujisawa K, editors. Recent advances in EEG and EMG data processing. Elsevier; 1981. pp. 133–146.
42. Latora V, Baranger M. Kolmogorov-Sinai entropy rate versus physical entropy. Physical Review Letters. 1999;82(3):520.
43. Kaiser A, Schreiber T. Information transfer in continuous processes. Physica D: Nonlinear Phenomena. 2002;166(1):43–62.
44. Hlaváčková-Schindler K, Paluš M, Vejmelka M, Bhattacharya J. Causality detection based on information-theoretic approaches in time series analysis. Physics Reports. 2007;441(1):1–46.
45. Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Physical Review E. 2004;69(6):066138. doi: 10.1103/PhysRevE.69.066138.
46. Victor JD. Approaches to information-theoretic analysis of neural activity. Biological Theory. 2006;1(3):302. doi: 10.1162/biot.2006.1.3.302.
47. Hildebrand FB. Introduction to numerical analysis. Courier Corporation; 1987.
48. He S, Zheng X, Zeng D, Cui K, Zhang Z, Luo C. Intelligence and Security Informatics. Springer; 2013. Identifying Peer Influence in Online Social Networks Using Transfer Entropy; pp. 47–61.
49. Aral S, Walker D. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science. 2011;57(9):1623–1639.
50. Xiang Z, Gretzel U. Role of social media in online travel information search. Tourism Management. 2010;31(2):179–188.
51. Cheng X, Dale C, Liu J. Statistics and social network of youtube videos; Quality of Service, 2008. IWQoS 2008. 16th International Workshop on; 2008. pp. 229–238.
52. Hogg T, Lerman K, Smith LM. Using Stochastic Models to Predict User Response in Social Media; Social Computing (SocialCom), 2013 International Conference on; 2013. pp. 63–68.
53. Atallah MJ. Algorithms and theory of computation handbook. CRC Press; 2002.
54. Hubert LJ, Golledge RG. Measuring association between spatially defined variables: Tjøstheim's index and some extensions. Geographical Analysis. 1982;14(3):273–278.
55. Teevan J, Dumais ST, Horvitz E. Personalizing search via automated analysis of interests and activities; Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval; 2005. pp. 449–456.
56. Teevan J, Dumais ST, Horvitz E. Characterizing the value of personalizing search; Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval; 2007. pp. 757–758.
57. Fagin R, Kumar R, Sivakumar D. Efficient similarity search and classification via rank aggregation; Proceedings of the 2003 ACM SIGMOD international conference on Management of data; 2003. pp. 301–312.
58. Fagin R, Kumar R, Mahdian M, Sivakumar D, Vee E. Comparing and aggregating rankings with ties. :47–58.
59. Godfrey-Smith P, Martínez M. Communication and common interest. PLoS Computational Biology. 2013;9(11):e1003282. doi: 10.1371/journal.pcbi.1003282.
60. Barg A, Mazumdar A. Codes in permutations and error correction for rank modulation. Information Theory, IEEE Transactions on. 2010;56(7):3158–3165.
61. Goldenberg J, Han S, Lehmann DR, Hong JW. The role of hubs in the adoption process. Journal of Marketing. 2009;73(2):1–13.
62. Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: Bringing order to the web. 1999.
63. Cha M, Haddadi H, Benevenuto F, Gummadi KP. Measuring user influence in twitter: The million follower fallacy; 4th international AAAI conference on weblogs and social media (ICWSM); 2010. p. 8.
64. Newman M, Barabasi A-L, Watts DJ. The structure and dynamics of networks. Princeton University Press; 2006.
65. Cha M, Haddadi H, Benevenuto F, Gummadi PK. Measuring User Influence in Twitter: The Million Follower Fallacy. ICWSM. 2010;10(10–17):30.
66. Romero D, Galuba W, Asur S, Huberman B. Influence and passivity in social media. Machine Learning and Knowledge Discovery in Databases. 2011:18–33.
67. Ghosh R, Lerman K. Predicting influential users in online social networks. arXiv preprint arXiv:1005.4882. 2010.
68. Aral S, Walker D. Identifying influential and susceptible members of social networks. Science. 2012;337(6092):337–341. doi: 10.1126/science.1215842.
69. Goyal A, Bonchi F, Lakshmanan LV. Learning influence probabilities in social networks; Proceedings of the third ACM international conference on Web search and data mining; 2010. pp. 241–250.
70. Young HP. The diffusion of innovations in social networks. Economy as an Evolving Complex System. Proceedings volume in the Santa Fe Institute studies in the sciences of complexity. 2002;3:267–282.
71. Roth M, Ben-David A, Deutscher D, Flysher G, Horn I, Leichtberg A, Leiser N, Matias Y, Merom R. Suggesting friends using the implicit social graph. :233–242.
72. Min T. Influence of eWOM Message on Receivers' Re-diffusion Intention. Journal of Intelligence. 2012;4:028.
