Abstract
Thanks to the mass adoption of the internet and mobile devices, users of social media can seamlessly and spontaneously connect with their friends, followers and followees. Consequently, social media networks have gradually become the major venue for broadcasting and relaying information, and they exert great influence on many aspects of people's daily lives. Locating influential users in social media has therefore become crucially important for the success of many viral marketing, cyber security, political, and safety-related applications. In this study, we address the problem by solving the tiered influence and activation thresholds target set selection problem, which is to find the seed nodes that can influence the most users within a limited time frame. Both the minimum influential seeds and the maximum influence within budget problems are considered. In addition, this study proposes several models exploiting different requirements on seed node selection, such as maximum activation, early activation and dynamic threshold. These time-indexed integer program models suffer from computational difficulties due to the large number of binary variables needed to model influence actions at each time epoch. To address this challenge, this paper designs and leverages several efficient algorithms, i.e., Graph Partition, Nodes Selection, Greedy Algorithm, Recursive Threshold Back Algorithm and Two-stage Approach in Time, especially for large-scale networks. Computational results show that it is beneficial to apply either the breadth-first search or the depth-first search greedy algorithm for large instances. In addition, algorithms based on node selection methods perform better in long-tailed networks.
Keywords: Networks, Integer programming, Target set selection, Greedy algorithm, Influence maximization, Social media, Linear threshold model
Disclaimer: comparisons and improvements to the previous conference article
This paper is an extension of an article in the proceedings of the International Conference on Computational Data and Social Networks (CSoNet), November 2021, titled “Target Set Selection in Social Networks with Influence and Activation thresholds (2021).” The previous paper introduces the minimum influence and activation thresholds target set selection model and corresponding computational methods. This paper adds the maximum influence and activation thresholds target set selection with budget model. In addition, we propose models exploring different requirements of seed node selection, namely maximum activation, early activation and dynamic threshold. These additional models extend the scope and offer insights into the application of dual thresholds target set selection in social media. Furthermore, the addition of the DFS greedy algorithm, the recursive threshold back algorithm and the two-stage approach in time algorithm allows for better solutions to the proposed target set selection models. Together, the tiered influence and activation thresholds models and the corresponding computational algorithms provide a comprehensive discussion of the target set selection problem in social networks with tiered influence and activation thresholds.
Introduction
Nowadays, the use of social media networks has become a necessary daily activity for people to interact with family and friends, access news and information, and make decisions. Besides being a handy means of keeping in touch with friends and family, social media is a platform that spreads tremendous influence. Users of social media tend to follow and adopt the thoughts and behaviors of their friends or followers. Thus businesses pay incentives to social influencers on social media platforms such as YouTube and Instagram to introduce their products and stimulate demand. Politicians communicate their policy views and humanize themselves through their social media accounts. Political campaigns also boost their investment in social media ads to win votes. During the period of social isolation due to COVID-19, people relied on social media for health and safety updates, entertainment and virtual interaction with family and friends. Social media thus has a great impact on business, politics, disease control and other areas. To this end, researchers have studied various practical problems in social media to better understand how social media behaves and propagates information. The problems include buzz prediction (Chen et al. 2013), volume prediction (Tsur and Rappoport 2012), infection prediction (Bourigault et al. 2016; Qiang et al. 2019), source prediction (Shah and Zaman 2010), link detection (Rodriguez et al. 2011), target set selection (Kempe et al. 2003; Chen 2009; Chen et al. 2019; Yun et al. 2019; Chen et al. 2020) and the firefighter problem (Anshelevich et al. 2009).
Target set selection problem
Here we focus on investigating the problem of target set selection. Formally, we define a social media network as a directed graph G = (V, E), where the users are the nodes V and their friendships are the edges E. Users are active when they repost the messages.
The target set selection problem asks for a subset S ⊆ V of seed nodes such that at least l nodes are activated by S. The problem has two optimization variants: minimum target set and maximum target set. The minimum target set problem is to find the smallest set S ⊆ V such that at least l nodes are activated by S. The maximum target set problem is to find a set of size k that activates the largest number of nodes among all subsets of size k. The maximum target set problem is also called the social influence maximization problem in social media (Kempe et al. 2003). The target set selection problem can be applied in the areas of viral marketing (Domingos 2005) and cyber security (Budak et al. 2011).
Motivation and problem description
In the target set selection model, the seed nodes spread the influence until the diffusion process stops. The goal is to influence all the nodes or as many nodes as possible. In current target set selection models in social media, influenced users normally refer to the users who repost the message, so these models focus on maximizing the number of reposts. In reality, influenced users sometimes repost the messages, but in most cases, even if they are convinced or influenced by the message, users do not repost it for various reasons. In this case, influence should refer not only to activation (reposting) but also to believing or liking the message. We therefore build tiered influence and activation thresholds target set selection models to describe the situation. We introduce two thresholds: an activation threshold and an influence threshold. Users are influenced before they are activated, so we require the influence threshold to be no larger than the activation threshold. The goal of the model is to influence all the nodes or as many nodes as possible.
Our models are time-indexed integer program models, which can be divided into two parts. The first part is the information propagation. There are two widely used propagation models, namely the Independent Cascade (IC) model (Kempe et al. 2003) and the Linear Threshold (LT) model (Granovetter 1978). The Independent Cascade model assumes every node has a single chance to activate its neighbors. In the Linear Threshold model, each node is influenced by each neighbor according to a weight, and the node is activated when the total weight from its neighbors is larger than the node's threshold. In this paper, we base all the mathematical models on the Linear Threshold propagation model. Here we set the weights to 1 for all the nodes. Thus an inactive node becomes active if at least the activation-threshold fraction of its neighbors are active in the previous step. The second part is the influence dominating part, which means that at time T every user should be either active or influenced by having at least the influence-threshold fraction of its neighbors activated.
Similar to the variants of the target set selection problem, we define the Dual Threshold Minimum Influential Seeds problem and its variants, and the Dual Threshold Maximum Influence with Budget problem and its variants. The Minimum Influential Seeds problem is to select the fewest seed nodes that influence all the nodes at time T. The Maximum Influence with Budget problem is to select seed nodes within the budget so as to influence as many nodes as possible at time T.
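The unit-weight threshold propagation described above can be illustrated with a minimal Python sketch. The adjacency-dictionary representation, the undirected toy graph, the function name `propagate` and the small tolerance `EPS` are assumptions made for illustration, not part of the original formulation.

```python
EPS = 1e-9  # tolerance when comparing integer counts with fractional thresholds


def propagate(adj, seeds, theta_act, steps):
    """Synchronous linear-threshold spread with unit edge weights.

    adj       -- dict: node -> set of neighbours (assumed representation)
    seeds     -- nodes active at time 0
    theta_act -- fraction of neighbours that must be active to activate a node
    steps     -- number of time steps T to simulate
    """
    active = set(seeds)
    for _ in range(steps):
        # A node activates when enough of its neighbours were active
        # at the previous step; active nodes stay active.
        newly = {
            v for v in adj
            if v not in active and adj[v]
            and len(adj[v] & active) + EPS >= theta_act * len(adj[v])
        }
        if not newly:
            break
        active |= newly
    return active


# Hypothetical toy instance: the path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(propagate(path, {0}, 0.5, 3))
```

With an activation threshold of 0.5 the single seed at one end of the path reaches every node within three steps, whereas a threshold of 1.0 stops the spread immediately because node 1 would need both neighbours active.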
Literature review
To investigate the tiered influence and activation thresholds target set selection problem, prior research on the target set selection problem offers useful insights. Kempe et al. (2003) study the maximum active set problem (under the name target set selection) and show that it is NP-hard. They also provide a greedy algorithm with provable approximation guarantees based on the submodularity of the objective function. Chen (2009) studies the minimum target set selection problem and shows that the problem is hard to approximate within a polylogarithmic factor. In addition, he gives a polynomial-time algorithm that finds an optimal solution when the underlying graph is a tree. Ackerman et al. (2010) propose a combinatorial model for minimum target set selection and prove combinatorial bounds for the perfect target set selection problem. Shakarian et al. (2013) present a time-indexed formulation to find the minimum seeds for target set selection and develop a scalable heuristic based on the idea of shell decomposition. Spencer and Howarth (2013) consider the problem of how to target individuals in the network with subsidies in order to promote pro-environmental behavior. This is also a target set selection problem, and they tackle it with a time-indexed integer program formulation that has as many time periods as there are nodes in the network. Günneç et al. (2019) study a variation of the target set selection problem called least cost target set selection on social networks, and they propose a greedy algorithm and a dynamic programming algorithm to solve the problem on tree-structured networks. Raghavan and Zhang (2019) develop and implement a branch-and-cut approach to solve the weighted target set selection problem on arbitrary graphs.
Contribution and organization
To our knowledge, previous research on target set selection focuses on single-threshold (activation threshold) target set selection. In this paper, we propose several practical tiered influence and activation thresholds target set selection mathematical models. The detailed contributions of the paper are summarized below:
We propose tiered influence and activation thresholds time-indexed integer program models for selecting seed nodes, which can be easily applied in practice. Both minimum and maximum target set selection are explored here. In addition, we propose models exploiting different requirements of seed node selection, including maximum activation, early activation and dynamic threshold.
We conduct a sensitivity analysis to determine how the objective function changes if specified parameters, i.e., the influence and activation thresholds, deviate from their anticipated values.
We compare the different mathematical models, namely the Minimum Influential Seeds model and its variants and the Maximum Influence with Budget model and its variants, to draw conclusions regarding their differences and connections.
We solve these novel models exactly with Gurobi for small datasets. In addition, we compare several efficient computational algorithms, i.e., Graph Partition, Nodes Selection, Greedy Algorithm, Recursive Threshold Back Algorithm and Two-stage Approach in Time.
The rest of the paper is organized as follows. Section 2 presents the novel tiered influence and activation thresholds target set selection models. In Sect. 3, we propose several computational algorithms for solving large scale networks. Section 4 shows the experimental results of the proposed models and their corresponding computational algorithms for both synthetic dataset and real dataset. Section 5 concludes the article.
Tiered influence and activation thresholds target set selection models
In this section, we propose eight time-indexed integer program models for identifying the seed nodes for the influence and activation thresholds target set selection problem. Both minimum dual thresholds and maximum dual thresholds target set selection with budget are considered here. In addition, the proposed models explore different requirements of seed nodes selection, which are maximum activation, early activation and dynamic threshold.
Minimum influential seeds
First, we introduce a time-indexed integer program to find the minimum influential seeds for the influence and activation thresholds target set selection problem. An artificial time index t taking values from 0 to T is introduced to model the order in which nodes become active. Messages can propagate over varying distances in different forms of social media. Cha et al. (2009) observe that even for popular photos, only 19 percent of fans are more than 2 hops away from the uploaders on Flickr.com. Ye and Wu (2010) find that, on Twitter, 37.1 percent of message flows spread more than 3 hops away from the originators. Thus we set T to 0, 1, 2 or 3, which means we only consider cascades of at most three time steps. The formulation uses a binary variable to represent the status of node i at time t, which is 1 if node i reposts the message at time t and 0 otherwise. The model has two threshold parameters, the influence threshold and the activation threshold. Nodes should always be influenced (like the message) before being activated (repost the message), so we require the influence threshold to be no larger than the activation threshold. N(i) represents the neighborhood of node i. The Minimum Influential Seeds model is as follows:
| 1 |
| 2 |
| 3 |
| 4 |
where, the following parameters are assumed to be given,
The objective function (1) minimizes the number of seed nodes activated at time 0. Constraints (2) are the influence constraints, which ensure that at time T every node is either active or has at least the influence-threshold fraction of its neighbors active. Constraints (3) state that a node i stays inactive at time t+1 when it is not activated at time t. Constraints (4) require that a node stays active once it is active, which means that when its status variable is 1 at time t, it is 1 at time t+1 as well. In addition, the constraints make sure that a node becomes active at time t+1 when it is activated at time t, which means that when the influence from its neighbors at time t is larger than or equal to the activation threshold, the node is active at time t+1. Here we introduce two constants: the first ensures that node i is active at time t+1 even if the influence from its neighbors only just reaches the activation threshold, and the second ensures that, when node i is already active and all the neighbors of i are active, node i being active at time t+1 still holds.
When we set different values for the threshold parameters, the influence constraints (2) have different interpretations. When the influence threshold is small enough that a single active neighbor suffices, constraints (2) amount to finding a dominating set at time T, which means each node is either active or has at least one active neighbor. When the influence threshold is 1, constraints (2) amount to finding a vertex cover at time T, which means each edge has at least one active endpoint.
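As a hedged illustration, a formulation of the following shape is consistent with the description above. The symbols $x_{i,t}$ (status of node $i$ at time $t$), $\theta_1$ (influence threshold), $\theta_2$ (activation threshold) and the integer count $k_i = \lceil \theta_2 |N(i)| \rceil$ are notational assumptions made for this sketch, and the linearization through $k_i$ is only one possible choice. The sketch assumes $0 < \theta_1 \le \theta_2 \le 1$ and that every node has at least one neighbor.

```latex
\begin{align*}
\min \quad & \sum_{i \in V} x_{i,0} \\
\text{s.t.} \quad
& \sum_{j \in N(i)} x_{j,T} + |N(i)|\, x_{i,T} \;\ge\; \theta_1 |N(i)|,
  && \forall i \in V, \\
& k_i\, x_{i,t+1} \;\le\; \sum_{j \in N(i)} x_{j,t} + k_i\, x_{i,t},
  && \forall i \in V,\ t = 0, \dots, T-1, \\
& |N(i)|\, x_{i,t+1} \;\ge\; \sum_{j \in N(i)} x_{j,t} - k_i + 1,
  && \forall i \in V,\ t = 0, \dots, T-1, \\
& x_{i,t+1} \;\ge\; x_{i,t},
  && \forall i \in V,\ t = 0, \dots, T-1, \\
& x_{i,t} \in \{0, 1\},
  && \forall i \in V,\ t = 0, \dots, T.
\end{align*}
```

In this sketch the first constraint family expresses the influence requirement at time $T$, the second keeps a node inactive when it is neither active nor sufficiently influenced, and the last two families force activation and persistence. Because the neighbor count is an integer, requiring at least $k_i$ active neighbors is equivalent to requiring at least $\theta_2 |N(i)|$ of them.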
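The two special cases can be checked on a toy instance with a brute-force search over seed sets. This is a minimal sketch for very small graphs only, not the integer program itself; the function names, the undirected adjacency dictionary and the tolerance `EPS` are illustrative assumptions.

```python
from itertools import combinations

EPS = 1e-9  # tolerance when comparing integer counts with fractional thresholds


def propagate(adj, seeds, theta_act, steps):
    """Spread activation for `steps` synchronous rounds (unit edge weights)."""
    active = set(seeds)
    for _ in range(steps):
        newly = {v for v in adj
                 if v not in active and adj[v]
                 and len(adj[v] & active) + EPS >= theta_act * len(adj[v])}
        if not newly:
            break
        active |= newly
    return active


def all_influenced(adj, active, theta_inf):
    """True if every node is active or has enough active neighbours."""
    return all(v in active
               or len(adj[v] & active) + EPS >= theta_inf * len(adj[v])
               for v in adj)


def min_influential_seeds(adj, theta_inf, theta_act, horizon):
    """Smallest seed set found by exhaustive search (tiny graphs only)."""
    nodes = sorted(adj)
    for k in range(len(nodes) + 1):
        for seeds in combinations(nodes, k):
            active = propagate(adj, seeds, theta_act, horizon)
            if all_influenced(adj, active, theta_inf):
                return set(seeds)


# Hypothetical toy instance: the path 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(min_influential_seeds(path, 1.0, 1.0, 0))  # vertex-cover case
print(min_influential_seeds(path, 0.2, 1.0, 0))  # dominating-set case
```

With a horizon of 0 no propagation takes place, so an influence threshold of 1.0 returns a minimum vertex cover of the path and an influence threshold of 0.2 returns a minimum dominating set.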
Minimum influential seeds with maximum activation
Sometimes we want not only everyone to be convinced by the message but also the message to be reposted by as many users as possible. We therefore propose a time-indexed integer program to find the minimum influential seeds that simultaneously maximize activation at time T. The objective function differs from that of Subsect. 2.1: instead of only minimizing the number of seed nodes, we also try to maximize the number of nodes reposting the message at time T. In objective function (5), we put more weight on minimizing the influential seeds and less weight on maximizing activation at time T. The weight assigned to maximum activation is defined in terms of the total number of nodes. The Minimum Influential Seeds with Maximum Activation model is as follows:
| 5 |
| 6 |
Minimum influential seeds with early activation
Instead of activating the maximum number of nodes at time T, in some cases it is also necessary to activate nodes as early as possible. In health and disaster applications in particular, users should learn the information and repost the message as early as possible to avoid risks. Thus we propose an integer program to find the minimum influential seeds with early activation. The objective function (7) aims to minimize the number of seed nodes activated at time 0 and simultaneously maximize the number of activated nodes at each time step t. Here we can drop constraints (4), because the objective function ensures that a node prefers to stay or become active at each time step. Constraints (3) require that a node i stays inactive at time t+1 when it is not activated at time t. The Minimum Influential Seeds with Early Activation model is as follows:
| 7 |
| 8 |
Minimum influential seeds with dynamic activation threshold
In the above models, we assume the activation threshold stays constant over time. However, as time passes, it becomes more difficult to convince someone of another belief, i.e., the longer someone has held a negative opinion, the more difficult it is to change it to a positive one. Thus we propose the Minimum Influential Seeds with Dynamic Activation Threshold model, where the activation threshold increases linearly over time. In the model, the activation threshold starts from its initial value at time 0 and increases linearly at each later time step. The Minimum Influential Seeds with Dynamic Activation Threshold model is as follows:
| 9 |
| 10 |
| 11 |
| 12 |
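To make the effect of a linearly increasing activation threshold concrete, the following Python sketch simulates propagation with a time-dependent threshold. The interpolation between a start value `theta_start` and an end value `theta_end`, the choice of applying the threshold of step t to the transition from t to t+1, and all names are assumptions for illustration; the exact schedule of the model is not reproduced here.

```python
EPS = 1e-9  # tolerance when comparing integer counts with fractional thresholds


def dynamic_threshold(theta_start, theta_end, t, horizon):
    """Assumed linear schedule from theta_start (t = 0) towards theta_end."""
    return theta_start + (theta_end - theta_start) * t / horizon


def propagate_dynamic(adj, seeds, theta_start, theta_end, horizon):
    """Threshold spread in which activation gets harder at each step."""
    active = set(seeds)
    for t in range(horizon):
        theta = dynamic_threshold(theta_start, theta_end, t, horizon)
        newly = {v for v in adj
                 if v not in active and adj[v]
                 and len(adj[v] & active) + EPS >= theta * len(adj[v])}
        active |= newly
    return active


# Hypothetical toy instance: the path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(propagate_dynamic(path, {0}, 0.5, 0.5, 2))  # constant threshold
print(propagate_dynamic(path, {0}, 0.5, 1.0, 2))  # increasing threshold
```

On this toy path the constant threshold lets the message travel two hops in two steps, while the increasing threshold stops it after the first hop, which matches the intuition that later activations are harder.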
Maximum influence with budget
The problems discussed above find the minimum influential seeds under different circumstances, making sure all the nodes are influenced at the end. However, in real cases, when the social network is large, it is not practical to select enough seed nodes to influence all the nodes. It is common to have a budget for selecting the seed nodes. Here we propose the Maximum Influence with Budget model, which aims to find the seed nodes within the budget that maximize the number of influenced nodes at the end. As before, the formulation uses a binary variable to represent the status of node i at time t, which is 1 if node i is active at time t and 0 otherwise. In addition, it uses a binary variable to represent whether node i is influenced at time T. The Maximum Influence with Budget model is as follows:
| 13 |
| 14 |
| 15 |
| 16 |
| 17 |
The objective function (13) maximizes the number of nodes influenced at time T. Constraints (14) set the budget on the number of nodes activated initially. Constraints (15) make sure that a node is influenced when it is already active or has at least the influence-threshold fraction of its neighbors active at time T. Constraints (16) and (17) are cascade constraints. Constraints (16) state that a node i stays inactive at time t+1 when it is not activated at time t. Constraints (17) require that a node stays active if it is already active and that a node becomes active at time t+1 when it is activated at time t.
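For intuition, the budgeted problem can be solved by exhaustive search on a very small graph. The sketch below is an illustrative stand-in for the integer program, not the model itself; the function names, the undirected adjacency dictionary, the tolerance `EPS` and the toy instance are assumptions.

```python
from itertools import combinations

EPS = 1e-9  # tolerance when comparing integer counts with fractional thresholds


def propagate(adj, seeds, theta_act, steps):
    """Spread activation for `steps` synchronous rounds (unit edge weights)."""
    active = set(seeds)
    for _ in range(steps):
        newly = {v for v in adj
                 if v not in active and adj[v]
                 and len(adj[v] & active) + EPS >= theta_act * len(adj[v])}
        if not newly:
            break
        active |= newly
    return active


def count_influenced(adj, active, theta_inf):
    """Number of nodes that are active or have enough active neighbours."""
    return sum(
        1 for v in adj
        if v in active
        or (adj[v] and len(adj[v] & active) + EPS >= theta_inf * len(adj[v]))
    )


def max_influence_with_budget(adj, theta_inf, theta_act, horizon, budget):
    """Best seed set of size <= budget by exhaustive search (tiny graphs only)."""
    nodes = sorted(adj)
    best_seeds, best_count = set(), -1
    for k in range(budget + 1):
        for seeds in combinations(nodes, k):
            active = propagate(adj, seeds, theta_act, horizon)
            count = count_influenced(adj, active, theta_inf)
            if count > best_count:
                best_seeds, best_count = set(seeds), count
    return best_seeds, best_count


# Hypothetical toy instance: the path 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(max_influence_with_budget(path, 0.5, 1.0, 3, budget=1))
print(max_influence_with_budget(path, 0.5, 1.0, 3, budget=2))
```

On this path a budget of one seed influences three of the five nodes, while a budget of two seeds is enough to influence all of them.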
Maximum influence and activation with budget
Next we propose a time-indexed integer program to find a limited set of influential seeds with maximum influence and activation at time T. The model differs from the Maximum Influence with Budget model in the objective function, which maximizes not only the number of influenced nodes but also the number of nodes activated at time T. Here we put more weight on influenced nodes, which have weight 1, than on activated nodes, which receive a smaller weight. The Maximum Influence and Activation with Budget model is as follows:
| 18 |
| 19 |
Maximum influence and early activation with budget
Here we propose a time-indexed integer program to find seed nodes within the budget that achieve both maximum influence and early activation. The objective function chooses the seed nodes within the budget so as to maximize the number of nodes influenced at time T and the number of activated nodes at each time step t, which forces the nodes to be activated as early as possible. The Maximum Influence and Early Activation with Budget model is as follows:
| 20 |
| 21 |
Maximum influence with dynamic activation threshold
As in Subsect. 2.4, we also propose a maximum influence model with a dynamic activation threshold. The dynamic activation threshold increases linearly over time, starting from its initial value at time 0. The Maximum Influence with Dynamic Activation Threshold model is as follows:
| 22 |
| 23 |
| 24 |
| 25 |
| 26 |
Computational algorithms
The time-indexed integer program models proposed in Sect. 2 are computationally intractable except for very small instances because of the large number of binary variables. However, social media networks are usually extremely large. Thus we apply multiple computational algorithms to tackle the influence and activation thresholds target set selection models for larger-scale networks. More details are discussed in the rest of this section.
Graph partition
When the social media network is large-scale, solving the models exactly through Gurobi is very difficult. The most intuitive way is to solve multiple smaller subgraphs instead of one large graph. Here we use techniques from Modularity and Community Structure (Clauset et al. 2004) in networks to divide the large graph into several smaller subgraphs. Then we solve the models exactly separately for each subgraph.
Nodes selection
When dealing with the influence and activation thresholds target set selection problem for a large-scale network, the large number of binary decision variables, one for each node at each time step, makes the problem difficult to solve. To accelerate the computation, we can reduce the number of free decision variables by adding constraints that prevent some nodes from being selected or force some nodes to be selected. Here we propose two methods: one deletes the leaf nodes, and the other chooses the nodes with high degree.
Leaf nodes deletion
Leaf nodes in a connected graph are poor seed candidates because they can influence or activate at most one neighbor directly. Thus we add constraints (27) to remove the option of activating leaf nodes. In other words, no leaf node is seeded under this method.
| 27 |
Degree centrality selection
Nodes with high degree have more potential to influence and activate other nodes. Therefore, we assume the high degree nodes must be seeded. Here a degree cutoff is defined as the criterion for choosing the seed nodes: when the number of neighbors of node i is larger than the cutoff, node i is seeded. Thus we add constraints (28) to the original models in order to choose the nodes with more neighbors than the cutoff as seed nodes.
| 28 |
The larger the , the nodes with higher degree will be selected as seed nodes. When is , the nodes having neighbors will all be selected. When is , only the node connecting to all the other nodes will be selected.
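The two node selection rules above can be sketched as a preprocessing step that identifies which seed variables would be fixed before the model is solved. The function names, the parameter name `min_degree` for the degree cutoff and the toy star graph are illustrative assumptions.

```python
def leaf_nodes(adj):
    """Nodes with exactly one neighbour; their seed variable is fixed to 0."""
    return {v for v, nbrs in adj.items() if len(nbrs) == 1}


def forced_seeds(adj, min_degree):
    """Nodes with more than `min_degree` neighbours; seed variable fixed to 1."""
    return {v for v, nbrs in adj.items() if len(nbrs) > min_degree}


# Hypothetical toy instance: a star with centre 0 and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

excluded = leaf_nodes(star)                 # never seeded
required = forced_seeds(star, min_degree=3)  # always seeded
free = set(star) - excluded - required       # still decided by the solver
print(excluded, required, free)
```

In an integer programming setting the two sets would translate into fixing the corresponding time-0 variables to 0 or 1, which leaves only the nodes in `free` as genuine seeding decisions.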
Greedy algorithm
We propose a greedy algorithm for the Minimum Influential Seeds problem and the Maximum Influence with Budget problem. In each iteration, the greedy algorithm selects the node with the largest number of inactive neighbors as a seed and adds it to the seed set S, until the stop conditions are met. We then update the accumulated influence on the nodes and the active set in each iteration according to the propagation process. We propose two ways to perform this update, one based on depth-first search (DFS) and the other based on breadth-first search (BFS).
The DFS greedy algorithm starts with an empty seed set S. At each iteration we select the node with the largest number of inactive neighbors as a seed and add it to the seed set S. Next, we carry out the propagation process from this newly activated seed node and update the accumulated influence (threshold) and activation time step (T) of each inactive neighbor node by adding the influence from the activated seed node. When the accumulated influence of a node is larger than or equal to its activation threshold, the node is activated and we add it to the active set A. We continue the same propagation process from each newly activated node for up to the allowed number of time steps. DFS starts at the root node and explores as far as possible along each branch before backtracking. For the Minimum Influential Seeds problem, the iteration terminates when all the inactive nodes have at least the influence-threshold fraction of their neighbors active. For the Maximum Influence with Budget problem, it terminates when the budget is used up or when all the inactive nodes have at least the influence-threshold fraction of their neighbors active. Algorithms 1 and 2 show the DFS greedy algorithm for the Minimum Influential Seeds problem.

The BFS greedy algorithm for the Minimum Influential Seeds problem is shown in Algorithm 3. First we choose the node with the largest number of inactive neighbors as a seed and add it to the seed set S. Then we update the accumulated influence (threshold) and activation time step (T) of each inactive neighbor node by adding the influence sent from the activated seed node. Unlike DFS, BFS starts at the tree root and explores all of the neighbor nodes at the present depth before moving on to the nodes at the next depth level. Here we use a queue Q to store the parent nodes that will spread the influence within propagation step P.
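The greedy selection rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions: instead of the incremental DFS or BFS updates of Algorithms 1 to 3, the spread is recomputed from scratch with a level-by-level synchronous pass after each new seed, ties are broken by node order, and the function names and the tolerance `EPS` are hypothetical.

```python
EPS = 1e-9  # tolerance when comparing integer counts with fractional thresholds


def propagate(adj, seeds, theta_act, steps):
    """Level-by-level (synchronous) threshold spread with unit weights."""
    active = set(seeds)
    for _ in range(steps):
        newly = {v for v in adj
                 if v not in active and adj[v]
                 and len(adj[v] & active) + EPS >= theta_act * len(adj[v])}
        if not newly:
            break
        active |= newly
    return active


def all_influenced(adj, active, theta_inf):
    """Stop condition: every node is active or has enough active neighbours."""
    return all(v in active
               or len(adj[v] & active) + EPS >= theta_inf * len(adj[v])
               for v in adj)


def greedy_min_seeds(adj, theta_inf, theta_act, horizon):
    """Repeatedly seed the inactive node with the most inactive neighbours."""
    seeds, active = set(), set()
    while not all_influenced(adj, active, theta_inf):
        candidates = sorted(v for v in adj if v not in active)
        best = max(candidates, key=lambda v: len(adj[v] - active))
        seeds.add(best)
        # Simplification: recompute the spread of all seeds from time 0.
        active = propagate(adj, seeds, theta_act, horizon)
    return seeds


# Hypothetical toy instances.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_min_seeds(path, 1.0, 1.0, 3))
print(greedy_min_seeds(star, 0.2, 0.6, 3))
```

A budgeted variant would add a second stop condition on the size of `seeds`. The greedy rule carries no optimality guarantee in general, so the result should be read as a heuristic solution.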
Recursive threshold back algorithm
We also propose the recursive threshold back algorithm, designed specifically for the minimum influential seeds problem. We decompose the problem and tackle it backwards in time using integer programs. It is required that at time T all the nodes are either active or have at least the influence-threshold fraction of their neighbors active. In other words, at time T the active nodes should dominate all the nodes. Here we take the active nodes at time T to be a minimum positive dominating set, that is, a minimum set of nodes such that every node is either in the set or has at least the influence-threshold fraction of its neighbors in the set. Then, in order to minimize the number of seed nodes activated at the beginning, we assume the number of nodes activated should be minimized at each time step. Thus we obtain the nodes activated at each time step recursively, going backwards in time. The pseudocode of the recursive threshold back algorithm is shown in Algorithm 4.
We use integer program models to solve both the minimum positive dominating set (MPDS) problem and the threshold back (TB) problem. The mathematical programming model for finding the minimum positive dominating set (MPDS) is shown below:
| 29 |
| 30 |
Objective function (29) minimizes the number of nodes active at time T. Constraints (30) make sure that every node is either active or has at least the influence-threshold fraction of its neighbors active at time T. We then obtain the status of the users at each earlier time t recursively by solving the following Threshold Back (TB) subproblem.
| 31 |
| 32 |
| 33 |
| 34 |
The input values are the solutions obtained from the previous Threshold Back iteration. The objective function (31) minimizes the number of nodes active in each time period, so that the initial seed set is minimized as well. Constraints (32) and (33) state that a node j is active in time period t+1 only if it is active in time period t or has at least the activation-threshold fraction of its neighbors active; otherwise, node j remains inactive. Constraints (34) allow more nodes to be active than in the solution obtained from the previous Threshold Back iteration, which guarantees the feasibility of the problem.
Two-stage approach in time
A two-stage approach in time algorithm is proposed for the minimum influential seeds problem. For the algorithm, we decompose the original minimum influential seeds model into two stages. The first stage is to find the minimum positive dominating set by solving the minimum positive dominating set (MPDS) problem. The mathematical programming model for finding Minimum Positive Dominating Set is the same as the MPDS model presented in Subsect. 3.4. The second stage is to obtain the minimum influential seeds through solving the following minimum target set selection problem (TS).
| 35 |
| 36 |
| 37 |
| 38 |
The objective function (35) minimizes the size of the selected seed set. Constraints (36) and (37) state that a node j is active in time period t+1 only if it is active in time period t or has at least the activation-threshold fraction of its neighbors active; otherwise, node j remains inactive. Constraints (38) require that a node is active at time T if it is a dominating node in the first stage.
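The two-stage decomposition can be illustrated on a tiny graph by replacing both integer programs with exhaustive search. This is a sketch under stated assumptions rather than the actual models: the function names, the undirected adjacency dictionary and the tolerance `EPS` are hypothetical, and ties between equally small sets are broken by enumeration order.

```python
from itertools import combinations

EPS = 1e-9  # tolerance when comparing integer counts with fractional thresholds


def propagate(adj, seeds, theta_act, steps):
    """Spread activation for `steps` synchronous rounds (unit edge weights)."""
    active = set(seeds)
    for _ in range(steps):
        newly = {v for v in adj
                 if v not in active and adj[v]
                 and len(adj[v] & active) + EPS >= theta_act * len(adj[v])}
        if not newly:
            break
        active |= newly
    return active


def all_influenced(adj, active, theta_inf):
    """True if every node is active or has enough active neighbours."""
    return all(v in active
               or len(adj[v] & active) + EPS >= theta_inf * len(adj[v])
               for v in adj)


def smallest_subset(nodes, is_feasible):
    """First feasible subset in order of increasing size."""
    for k in range(len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_feasible(set(subset)):
                return set(subset)


def two_stage(adj, theta_inf, theta_act, horizon):
    nodes = sorted(adj)
    # Stage 1: minimum positive dominating set required at time T.
    dominating = smallest_subset(
        nodes, lambda s: all_influenced(adj, s, theta_inf))
    # Stage 2: fewest seeds whose spread covers the stage-1 set by time T.
    seeds = smallest_subset(
        nodes, lambda s: dominating <= propagate(adj, s, theta_act, horizon))
    return dominating, seeds


# Hypothetical toy instance: the path 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(two_stage(path, 0.2, 0.5, 3))
```

Because the stage-one set is fixed before the seeds are chosen, the result is best treated as a heuristic: a different dominating set of the same size could in principle be reachable from fewer seeds.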
Computational experiments
We present computational results of the proposed influence and activation thresholds target set selection models and compare the performance of different computational algorithms in various sizes of datasets. The goals of the experiments are listed below:
Conduct the sensitivity analysis of the different threshold parameters, i.e., the influence threshold and the activation threshold, on synthetic datasets.
Compare the different models to see their differences and connections.
Test the performance and computational time of exact method and various computational algorithms.
Experimental setting and dataset
Our experiments are conducted on a macOS Catalina machine equipped with a 2.6 GHz Intel Core i7 processor, 16 GB of RAM and Gurobi Optimizer 9.0.0.
We consider two classes of datasets in our experiments. (1) Synthetic networks of 50 nodes and 80 nodes generated using the Barabási-Albert (BA) model (Barabási and Albert 1999), which is an algorithm for generating scale-free networks with heavy-tailed degree distributions. (2) A subset of real-life social networks: Karate Club (Zachary 1977), Hamster Friendships Network, Facebook Network Dataset (Leskovec and Mcauley 2012) and LastFM Social Network Dataset (Rozemberczki and Sarkar 2020). The detailed information of the networks is shown in Table 1.
Table 1. Real datasets
| Network | Nodes | Edges | Description |
|---|---|---|---|
| Karate | 34 | 78 | Contains social ties among members of a university karate club |
| Hamster | 1858 | 12534 | Contains friendships between users of hamsterster.com |
| Facebook1 | 2888 | 2981 | Contains Facebook user-user friendships |
| LastFM | 7624 | 27806 | Contains mutual follower relationships of LastFM Asian users |
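For readers who want to generate comparable synthetic instances, the following standard-library Python sketch builds a Barabási-Albert style graph by preferential attachment. The number of edges added per new node (`m`) and the random seed are assumptions, since these values are not stated here, and the generator is an illustration rather than the exact procedure used in the experiments.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Preferential-attachment graph with n nodes (requires 1 <= m < n).

    Starts from m isolated nodes; each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree.
    Returns a dict: node -> set of neighbours.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    targets = list(range(m))   # the first new node links to the initial nodes
    repeated = []              # every node appears once per incident edge
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Sample m distinct targets for the next node, degree-proportionally.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


# Hypothetical parameters for a BA-50 style instance.
ba50 = barabasi_albert(50, 2, seed=0)
num_edges = sum(len(nbrs) for nbrs in ba50.values()) // 2
print(len(ba50), num_edges)
```

Each of the n - m added nodes contributes exactly m edges, so the graph has m(n - m) edges regardless of the random seed, while the degree distribution varies from run to run.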
Sensitivity analysis
Changing the influence and activation thresholds changes the number of seeds required by the minimum dual threshold target selection models and the number of nodes influenced in the maximum dual threshold target selection models. Here we conduct numerous experiments on the BA-50 and BA-80 networks as a sensitivity analysis, to determine how the objective function changes when specified parameters, i.e., the influence and activation thresholds, are varied.
Sensitivity analysis of minimum influential seeds model
First, we conduct the sensitivity analysis of the influence and activation threshold parameters for both the BA-50 and BA-80 datasets for the minimum influential seeds model.
Changes of activation threshold:
Figure 1 shows the solutions for different activation thresholds with the same influence threshold of 0.2 for the Minimum Influential Seeds model. We can see that, as the activation threshold decreases, fewer seed nodes are selected as the time horizon grows. In other words, time contributes more to influence propagation when the activation threshold is small. The explanation is that when activation becomes easier, fewer seed nodes are required to influence all the nodes over time. Another interesting finding is that when the activation threshold is large enough, as observed for the BA-50 network, the number of seed nodes selected is constant over time, because when activation is very hard, additional time steps do not reduce the number of seed nodes required.
Fig. 1. Influence threshold = 0.2
Changes of influence threshold:
Figure 2 shows the solutions of different influence thresholds with the same activation threshold of 0.6 for Minimum Influential Seeds model. We could conclude that when the influence threshold increases, the more seed nodes will be selected, which can be explained by that when nodes are more difficult to be influenced, then more seed nodes will be selected at the beginning.
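Fig. 2.
Solutions of the Minimum Influential Seeds model for different influence thresholds (activation threshold = 0.6)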
Sensitivity analysis of maximum influence with budget model
Next, we conduct the sensitivity analysis of the influence and activation thresholds on both the BA-50 and BA-80 datasets for the Maximum Influence with Budget model.
Changes of activation threshold:
Figure 3 shows the maximum number of influenced nodes for different activation thresholds with the influence threshold fixed at 0.2 for the Maximum Influence with Budget model. When the activation threshold is smaller, more nodes are influenced over time, because easier activation means that more nodes are activated and, in turn, influenced. In addition, when the activation threshold is too large, meaning that activation is too hard, the number of influenced nodes stays constant.
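Fig. 3.
Maximum influenced nodes of the Maximum Influence with Budget model for different activation thresholds (influence threshold = 0.2)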
Changes of influence threshold:
Figure 4 shows the maximum number of influenced nodes for different influence thresholds with the activation threshold fixed at 0.6. When the influence threshold is larger, meaning that nodes are more difficult to influence, fewer nodes are influenced.
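Fig. 4.
Maximum influenced nodes of the Maximum Influence with Budget model for different influence thresholds (activation threshold = 0.6)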
Model comparison
For the minimum dual threshold target selection problem, we propose the Minimum Influential Seeds model and its variants, i.e., the Minimum Influential Seeds with Maximum Activation, Minimum Influential Seeds with Early Activation and Minimum Influential Seeds with Dynamic Activation Threshold models. For the maximum dual threshold target selection problem, we propose the Maximum Influence with Budget model and its variants, i.e., the Maximum Influence and Activation with Budget, Maximum Influence and Early Activation with Budget and Maximum Influence with Dynamic Threshold models. Each original model differs from its maximum activation, early activation and dynamic activation threshold variants, but there are also connections between the models. In this section, we test the proposed models and explore these connections through experiments on the BA-50 dataset.
Minimum target set selection models comparison
Here we compare the seed nodes required and the nodes activated for the different variations of the dual threshold minimum target set selection models. Figure 5 shows the performance of the models using different influence and activation thresholds on the BA-50 dataset. Model 1 refers to the Minimum Influential Seeds model, model 2 to the Minimum Influential Seeds with Maximum Activation model, model 3 to the Minimum Influential Seeds with Early Activation model and model 4 to the Minimum Influential Seeds with Dynamic Threshold model. We can verify that, for the same number of seed nodes, the Minimum Influential Seeds with Maximum Activation model activates at least as many nodes as the Minimum Influential Seeds model and the Minimum Influential Seeds with Early Activation model. In addition, the Minimum Influential Seeds with Dynamic Threshold model requires at most as many seed nodes as the other models to influence all the nodes, and it normally activates more nodes than the other models.
Fig. 5.
Minimum target selection models comparison of BA-50
Maximum target set selection models comparison
We also compare the activated and influenced nodes for the different variations of the maximum target set selection models with budget. Figure 6 shows the performance of the models using different influence and activation thresholds on the BA-50 dataset with a budget of 5 seed nodes. Model 1 represents the Maximum Influence with Budget model, model 2 the Maximum Influence and Activation with Budget model, model 3 the Maximum Influence and Early Activation with Budget model and model 4 the Maximum Influence with Dynamic Activation Threshold model. From the figure, we can conclude that the Maximum Influence and Activation with Budget model activates at least as many nodes as the Maximum Influence with Budget model and the Maximum Influence and Early Activation with Budget model. Furthermore, the Maximum Influence with Dynamic Threshold model influences at least as many nodes as the other maximum target selection models. However, it does not always activate more nodes than the other models, as the cases in Fig. 6a and b show.
Fig. 6.
Maximum target selection models comparison of BA-50
Computational algorithms comparison
In this subsection, we assess and compare the different computational algorithms introduced in Section 3. We implement these computational algorithms for both the Minimum Influential Seeds model and the Maximum Influence with Budget model.
Computational algorithms results comparison for minimum influential seeds model
First, we conduct experiments on different datasets to investigate the performance of the computational algorithms. For the Minimum Influential Seeds model, we implement the original mathematical model using the Gurobi solver and the computational algorithms Graph Partition, Leaf Node, Degree Centrality, DFS Greedy, BFS Greedy, Recursive Threshold and Two Stage in Time. We set the time limit of 3600 s for bold methods in the following experiments. For the Degree Centrality method, we set its parameter to 0.2.
For the BA-50 network (Table 2), the model can be solved directly using Gurobi, but the Degree Centrality method is preferable because it speeds up the computation without sacrificing solution quality. For the BA-80 network (Table 3), it is already hard for Gurobi to solve the model directly on a network of 80 nodes, whereas the Degree Centrality method provides a good solution in much less time. The Karate Club network is very small (Table 4) and the model can be solved directly; the Leaf Node method accelerates the computation slightly without sacrificing solution quality. For the larger Hamster dataset (Table 5), the Graph Partition, Recursive Threshold and Two Stage in Time methods could not generate a solution within one hour, so they are not included. The Leaf Node method is infeasible for this dataset, which means that not all leaf nodes can be excluded as seed nodes in the Minimum Influential Seeds model. Within the 3600 s time limit, the Original Model even performs better than the Degree Centrality method. In addition, both the DFS Greedy and BFS Greedy methods offer good solutions in much less time than the Original Model. Facebook 1 (Table 6) is a low-density dataset in which a large number of nodes have few friends, so it is easy for Gurobi to solve directly; the Leaf Node method nevertheless has the shortest computation time for this dataset. The results for the LastFM Asia dataset are shown in Table 7, where Leaf Node performs best compared with the Original Model and Degree Centrality methods. For this dataset, the BFS Greedy method performs better than the DFS Greedy method.
Table 6.
Facebook 1 dataset (minimum influential seeds model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 16.66 |
| Graph partition | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 20.17 |
| Leaf node | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 0.55 |
| Degree centrality | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 4.40 |
| DFS greedy | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 0.75 |
| BFS greedy | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 1.11 |
| Recursive threshold | 0.4 | 0.6 | 3 | 10 | NA | 2888 | 10 | 0.76 |
| Two stage in time | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 10 | 7.25 |
Table 2.
BA-50 network (minimum influential seeds model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 8 | 37 | 50 | 8 | 2.63 |
| Graph partition | 0.4 | 0.6 | 3 | 16 | 37 | 50 | 16 | 0.12 |
| Leaf node | 0.4 | 0.6 | 3 | 8 | 37 | 50 | 8 | 1.62 |
| Degree centrality | 0.4 | 0.6 | 3 | 8 | 37 | 50 | 8 | 0.22 |
| DFS greedy | 0.4 | 0.6 | 3 | 10 | 43 | 50 | 10 | 0.001 |
| BFS greedy | 0.4 | 0.6 | 3 | 10 | 43 | 50 | 10 | 0.002 |
| Recursive threshold | 0.4 | 0.6 | 3 | 13 | NA | 50 | 13 | 0.05 |
| Two stage in time | 0.4 | 0.6 | 3 | 12 | 50 | 50 | 12 | 0.59 |
Table 3.
BA-80 network (minimum influential seeds model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 12 | 63 | 80 | 12 | 3600.07 |
| Graph partition | 0.4 | 0.6 | 3 | 19 | 63 | 80 | 19 | 0.37 |
| Leaf node | 0.4 | 0.6 | 3 | 12 | 63 | 80 | 12 | 3600.11 |
| Degree centrality | 0.4 | 0.6 | 3 | 12 | 54 | 80 | 12 | 39.13 |
| DFS greedy | 0.4 | 0.6 | 3 | 14 | 71 | 80 | 14 | 0.004 |
| BFS greedy | 0.4 | 0.6 | 3 | 14 | 71 | 80 | 14 | 0.004 |
| Recursive threshold | 0.4 | 0.6 | 3 | 20 | NA | 80 | 20 | 0.15 |
| Two stage in time | 0.4 | 0.6 | 3 | 16 | 80 | 80 | 16 | 1585.26 |
Table 4.
Karate network (minimum influential seeds model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 6 | 18 | 34 | 6 | 0.34 |
| Graph partition | 0.4 | 0.6 | 3 | 7 | 32 | 34 | 7 | 0.09 |
| Leaf node | 0.4 | 0.6 | 3 | 6 | 18 | 34 | 6 | 0.32 |
| Degree centrality | 0.4 | 0.6 | 3 | 7 | 32 | 34 | 7 | 0.02 |
| DFS greedy | 0.4 | 0.6 | 3 | 7 | 33 | 34 | 7 | 0.001 |
| BFS greedy | 0.4 | 0.6 | 3 | 7 | 33 | 34 | 7 | 0.001 |
| Recursive threshold | 0.4 | 0.6 | 3 | 9 | NA | 34 | 9 | 0.05 |
| Two stage in time | 0.4 | 0.6 | 3 | 8 | 34 | 34 | 8 | 0.12 |
Table 5.
Hamster dataset (minimum influential seeds model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 282 | 1543 | 1858 | 282 | 3600.47 |
| Degree centrality | 0.4 | 0.6 | 3 | 291 | 1529 | 1858 | 291 | 3600.56 |
| DFS greedy | 0.4 | 0.6 | 3 | 330 | 1753 | 1858 | 330 | 16.22 |
| BFS greedy | 0.4 | 0.6 | 3 | 327 | 1766 | 1858 | 327 | 15.05 |
Table 7.
LastFM Asia dataset (minimum influential seeds model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 2850 | 6682 | 7624 | 2850 | 3601.25 |
| Leaf node | 0.4 | 0.6 | 3 | 1498 | 6453 | 7624 | 1498 | 3601.52 |
| Degree centrality | 0.4 | 0.6 | 3 | 3427 | 7624 | 7624 | 3427 | 3601.42 |
| DFS greedy | 0.4 | 0.6 | 3 | 1697 | 7276 | 7624 | 1697 | 879.06 |
| BFS greedy | 0.4 | 0.6 | 3 | 1675 | 7255 | 7624 | 1675 | 868.62 |
In summary, for small datasets we can solve the problem directly using Gurobi. For low-density networks, especially when a large portion of the nodes have few neighbors (long-tailed networks), the Leaf Node and Degree Centrality methods are worth considering. For larger datasets, DFS Greedy and BFS Greedy normally perform better. The Graph Partition, Recursive Threshold and Two Stage in Time algorithms perform poorly and have long computational times on the selected social media networks. For the Graph Partition method, this may result from a network structure that is hard to divide into subgraphs. Furthermore, even when the network is divided properly, the resulting subgraphs are sometimes still too large to solve directly. For the Recursive Threshold and Two Stage in Time methods, the problem lies in the first step, which solves the Minimum Positive Dominating Set problem; this is itself an NP-complete problem and very difficult for Gurobi to solve directly on large datasets.
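As a rough check of what "low density" means here, the average degree and edge density can be computed from the node and edge counts in the dataset table. This is a small worked sketch that assumes the two numeric columns of that table are node and edge counts and that both networks are undirected simple graphs; the helper name `degree_stats` is hypothetical.

```python
from typing import Tuple


def degree_stats(n_nodes: int, n_edges: int) -> Tuple[float, float]:
    """Average degree and edge density of an undirected simple graph."""
    avg_degree = 2 * n_edges / n_nodes
    density = 2 * n_edges / (n_nodes * (n_nodes - 1))
    return avg_degree, density


# (nodes, edges) as listed for the two datasets (assumed column meaning).
datasets = {"Facebook1": (2888, 2981), "LastFM": (7624, 27806)}

for name, (n, m) in datasets.items():
    avg_degree, density = degree_stats(n, m)
    print(f"{name}: average degree {avg_degree:.2f}, density {density:.5f}")
```

Under these assumptions Facebook1 has an average degree of about 2, compared with about 7 for LastFM, which fits the description of Facebook1 as a network where most nodes have few friends.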
Computational algorithms results comparison for maximum influence with budget model
We also conduct experiments on different datasets to investigate the performance of the computational methods for the Maximum Influence with Budget model. For this model, we implement the Original Model using the Gurobi solver, together with the Leaf Node, Degree Centrality, DFS Greedy and BFS Greedy computational methods. We set the time limit of 3600 s for bold methods in the following experiments. The budget of seed nodes is 10% of the total nodes. For the Degree Centrality method, we set its parameter to 0.2 for the BA-50, BA-80 and Facebook1 datasets and to 0.3 for the Karate, Hamster and LastFM Asia datasets.
For the BA-50 network, the results are shown in Table 8. The model can be solved by Gurobi directly, but the Degree Centrality method is faster with the same performance. The results for the BA-80 network in Table 9 show that it is already very hard to solve the original model on a network of 80 nodes, whereas the Degree Centrality method achieves the same performance in much less time. The results for the Karate network are shown in Table 10; solving the original model directly works well for this dataset. The results for the Hamster dataset in Table 11 show that DFS Greedy and BFS Greedy work much better than the other methods. Facebook 1 is a dataset with several users of high degree. In this case, once these leader users are activated, all the users are activated and influenced. As Table 12 shows, the DFS Greedy and BFS Greedy methods are much better because they select as few nodes as possible to activate and influence all the nodes without using the whole budget. The results for the LastFM Asia dataset in Table 13 show that BFS Greedy and DFS Greedy outperform the other methods in both computational time and performance.
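The seed budgets that appear in the Seeded columns of Tables 8 to 13 can be reproduced with a short calculation. The sketch below assumes the budget is one tenth of the node count rounded down, which is an inference from the reported numbers rather than a rule stated in the text; the helper name `seed_budget` is hypothetical.

```python
def seed_budget(n_nodes: int, percent: int = 10) -> int:
    """Seed budget as a percentage of the node count, rounded down (assumed rule)."""
    return n_nodes * percent // 100


# Node counts paired with the largest Seeded value reported in Tables 8 to 13.
reported = {
    "BA-50": (50, 5),
    "BA-80": (80, 8),
    "Karate": (34, 3),
    "Hamster": (1858, 185),
    "Facebook1": (2888, 288),
    "LastFM Asia": (7624, 762),
}

for name, (n_nodes, seeded) in reported.items():
    print(name, seed_budget(n_nodes), seeded)
```

For every dataset the computed budget matches the number of seeds used by the Original Model.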
Table 8.
BA-50 network (maximum influence with budget model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 5 | 18 | 38 | 38 | 9.27 |
| Leaf node | 0.4 | 0.6 | 3 | 5 | 18 | 38 | 38 | 9.70 |
| Degree centrality | 0.4 | 0.6 | 3 | 5 | 18 | 38 | 38 | 0.18 |
| DFS greedy | 0.4 | 0.6 | 3 | 5 | 21 | 36 | 36 | 0.001 |
| BFS greedy | 0.4 | 0.6 | 3 | 5 | 21 | 36 | 36 | 0.001 |
Table 9.
BA-80 network (maximum influence with budget model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 8 | 21 | 56 | 56 | 3600.01 |
| Leaf node | 0.4 | 0.6 | 3 | 8 | 21 | 56 | 56 | 3600.02 |
| Degree centrality | 0.4 | 0.6 | 3 | 8 | 27 | 56 | 56 | 4.05 |
| DFS greedy | 0.4 | 0.6 | 3 | 8 | 24 | 53 | 53 | 0.002 |
| BFS greedy | 0.4 | 0.6 | 3 | 8 | 24 | 53 | 53 | 0.002 |
Table 10.
Karate network (maximum influence with budget model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 3 | 9 | 24 | 24 | 1.15 |
| Leaf node | 0.4 | 0.6 | 3 | 3 | 9 | 24 | 24 | 2.04 |
| Degree centrality | 0.4 | 0.6 | 3 | 3 | 12 | 21 | 21 | 0.01 |
| DFS greedy | 0.4 | 0.6 | 3 | 3 | 12 | 21 | 21 | 0.001 |
| BFS greedy | 0.4 | 0.6 | 3 | 3 | 12 | 21 | 21 | 0.001 |
Table 11.
Hamster dataset (maximum influence with budget model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 185 | 999 | 1509 | 1509 | 3600.11 |
| Leaf node | 0.4 | 0.6 | 3 | 185 | 959 | 1476 | 1476 | 3600.08 |
| Degree centrality | 0.4 | 0.6 | 3 | 185 | 893 | 1419 | 1419 | 3600.08 |
| DFS greedy | 0.4 | 0.6 | 3 | 185 | 1147 | 1552 | 1552 | 11.72 |
| BFS greedy | 0.4 | 0.6 | 3 | 185 | 1178 | 1546 | 1546 | 10.20 |
Table 12.
Facebook1 dataset (maximum influence with budget model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 288 | 2888 | 2888 | 2888 | 2.84 |
| Leaf node | 0.4 | 0.6 | 3 | 12 | 2888 | 2888 | 2888 | 0.12 |
| Degree centrality | 0.4 | 0.6 | 3 | 60 | 2888 | 2888 | 2888 | 0.92 |
| DFS greedy | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 2888 | 0.78 |
| BFS greedy | 0.4 | 0.6 | 3 | 10 | 2888 | 2888 | 2888 | 1.13 |
Table 13.
LastFM Asia dataset (maximum influence with budget model)
| Method | Influence threshold | Activation threshold | T | Seeded | Activated | Influenced | Obj | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Original model | 0.4 | 0.6 | 3 | 762 | 1907 | 3220 | 3220 | 3600.21 |
| Leaf node | 0.4 | 0.6 | 3 | 762 | 2023 | 3542 | 3542 | 3600.16 |
| Degree centrality | 0.4 | 0.6 | 3 | 762 | 1907 | 3220 | 3220 | 3600.15 |
| DFS greedy | 0.4 | 0.6 | 3 | 762 | 3923 | 5673 | 5673 | 382.64 |
| BFS greedy | 0.4 | 0.6 | 3 | 762 | 3951 | 5697 | 5697 | 383.08 |
In summary, the advantages of DFS Greedy and BFS Greedy over the other computational algorithms are more pronounced for the Maximum Influence with Budget problem. On the one hand, DFS Greedy and BFS Greedy save budget when the budget does not need to be fully used. On the other hand, when the budget is completely used, DFS Greedy and BFS Greedy work much better in both solution quality and computing time than the other methods on larger networks. For small networks, we can solve the model directly, or use the Leaf Node and Degree Centrality methods to speed up the computation.
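The budget-saving behaviour can be illustrated with a generic greedy loop that stops as soon as every node is influenced. This is a hedged sketch only: it is not the DFS Greedy or BFS Greedy procedure of Section 3, and the propagation rule (only activated nodes spread, thresholds compared against the share of activated neighbors, synchronous updates) is an assumption. The names `influenced_after` and `greedy_seeds` and the star graph are hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def influenced_after(graph: Graph, seeds: Set[int], influence_thr: float,
                     activation_thr: float, horizon: int) -> Set[int]:
    """Influenced nodes after `horizon` synchronous steps (assumed rule)."""
    activated, influenced = set(seeds), set(seeds)
    for _ in range(horizon):
        snapshot = frozenset(activated)  # state at the start of the step
        for node, nbrs in graph.items():
            if not nbrs:
                continue
            share = len(nbrs & snapshot) / len(nbrs)
            if share >= influence_thr:
                influenced.add(node)
            if share >= activation_thr:
                activated.add(node)
    return influenced


def greedy_seeds(graph: Graph, budget: int, influence_thr: float,
                 activation_thr: float, horizon: int) -> Tuple[Set[int], Set[int]]:
    """Add the seed with the largest influenced set until the budget is
    spent or all nodes are influenced, whichever comes first."""
    seeds: Set[int] = set()
    covered: Set[int] = set()
    while len(seeds) < budget and len(covered) < len(graph):
        best = max(
            (n for n in graph if n not in seeds),
            # Ties are broken in favour of the smallest node id.
            key=lambda n: (len(influenced_after(graph, seeds | {n}, influence_thr,
                                                activation_thr, horizon)), -n),
        )
        seeds.add(best)
        covered = influenced_after(graph, seeds, influence_thr,
                                   activation_thr, horizon)
    return seeds, covered


# Star graph: hub 0 connected to leaves 1..5, budget of 3 seeds (toy data).
star: Graph = {0: {1, 2, 3, 4, 5}, **{leaf: {0} for leaf in range(1, 6)}}
seeds, covered = greedy_seeds(star, budget=3, influence_thr=0.4,
                              activation_thr=0.6, horizon=3)
print(sorted(seeds), len(covered))
```

On the star graph the loop selects only the hub and then stops, leaving two of the three budgeted seeds unused, which mirrors the Facebook 1 case where a few high-degree users are enough to influence the whole network.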
Conclusion
The increasing popularity of social media networks has created the need for businesses, politicians and organizations to find influential users in social media to spread their influence. In this work, we have addressed the problem by developing influence and activation thresholds target set selection models, including both the minimum and the maximum influence and activation thresholds target set selection models. Our models allow us to find the minimum set of seed nodes that influence all the nodes by time T, and to determine the seed nodes under a budget that maximize the influence. In addition, to suit various applications, we propose different forms of seed node selection models, namely maximum activation, early activation and dynamic activation threshold. We also provide different computational algorithms to tackle the various datasets: Graph Partition, Leaf Node, Degree Centrality, DFS Greedy, BFS Greedy, Recursive Threshold and Two Stage in Time. Experiments on various datasets show that DFS Greedy and BFS Greedy are much more efficient than the other methods for large datasets. In addition, leaf node deletion and degree centrality selection perform better on long-tailed networks.
While we already consider the maximum activation, early activation and dynamic threshold models in this manuscript, the dual threshold target set selection models could be further customized for different applications in future studies. Furthermore, the various computational algorithms could be investigated more comprehensively with regard to different network topologies.
Funding
This article is based on basic research works supported by AFRL Mathematical Modeling and Optimization Institute. The work was supported in part by the U.S. Air Force Research Laboratory (AFRL) award FA8651-16-2-0009.
Data Availability
Enquiries about data availability should be directed to the authors.
Declarations
Conflict of interest
All authors have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Ackerman E, Ben-Zwi O, Wolfovitz G. Combinatorial model and bounds for target set selection. Theoret Comput Sci. 2010;411(44–46):4017–4022. doi: 10.1016/j.tcs.2010.08.021.
- Anshelevich E, Chakrabarty D, Hate A, Swamy C (2009) Approximation algorithms for the firefighter problem: cuts over time and submodularity. In: International symposium on algorithms and computation, pp. 974–983. Springer
- Barabási A-L, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–512. doi: 10.1126/science.286.5439.509.
- Bourigault S, Lamprier S, Gallinari P (2016) Representation learning for information diffusion through social networks: an embedded cascade model. In: Proceedings of the ninth ACM international conference on web search and data mining. WSDM ’16, pp. 573–582. ACM, New York, NY, USA
- Budak C, Agrawal D, El Abbadi A (2011) Limiting the spread of misinformation in social networks. In: Proceedings of the 20th international conference on world wide web, pp. 665–674. ACM
- Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the Flickr social network. In: Proceedings of the 18th international conference on world wide web, pp. 721–730
- Chen C-L, Pasiliao EL, Boginski V (2020) A cutting plane method for least cost influence maximization. In: International conference on computational data and social networks, pp. 499–511. Springer
- Chen N. On the approximability of influence in social networks. SIAM J Discrete Math. 2009;23(3):1400–1415. doi: 10.1137/08073617X.
- Chen GH, Nikolov S, Shah D. A latent source model for nonparametric time series classification. Adv Neural Inf Process Syst. 2013;26:1088–1096.
- Chen M, Zheng QP, Boginski V, Pasiliao EL (2019) Reinforcement learning in information cascades based on dynamic user behavior. In: International conference on computational data and social networks, pp. 148–154. Springer
- Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E. 2004;70(6):066111. doi: 10.1103/PhysRevE.70.066111.
- Domingos P. Mining social networks for viral marketing. IEEE Intell Syst. 2005;20(1):80–82.
- Granovetter M. Threshold models of collective behavior. Am J Sociol. 1978;83(6):1420–1443. doi: 10.1086/226707.
- Günneç D, Raghavan S, Zhang R (2019) Least-cost influence maximization on social networks. INFORMS J Comput
- Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’03, pp. 137–146. ACM, New York, NY, USA
- Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. Adv Neural Inf Process Syst pp. 539–547
- Qiang Z, Pasiliao EL, Zheng QP. Model-based learning of information diffusion in social media networks. Appl Netw Sci. 2019;4(1):111. doi: 10.1007/s41109-019-0215-3.
- Qiang Z, Pasiliao EL, Zheng QP (2021) Target set selection in social networks with influence and activation thresholds. In: International conference on computational data and social networks, pp. 371–380. Springer
- Raghavan S, Zhang R. A branch-and-cut approach for the weighted target set selection problem on social networks. INFORMS J Optim. 2019;1(4):304–322. doi: 10.1287/ijoo.2019.0012.
- Rodriguez MG, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. arXiv preprint arXiv:1105.0697
- Rozemberczki B, Sarkar R (2020) Characteristic functions on graphs: birds of a feather, from statistical descriptors to parametric models
- Shah D, Zaman T. Detecting sources of computer viruses in networks: theory and experiment. SIGMETRICS Perform Eval Rev. 2010;38(1):203–214. doi: 10.1145/1811099.1811063.
- Shakarian P, Eyre S, Paulo D. A scalable heuristic for viral marketing under the tipping model. Soc Netw Anal Min. 2013;3(4):1225–1248. doi: 10.1007/s13278-013-0135-7.
- Spencer G, Howarth R (2013) Maximizing the spread of stable influence: Leveraging norm-driven moral-motivation for green behavior change in networks. arXiv preprint arXiv:1309.6455
- Tsur O, Rappoport A (2012) What’s in a hashtag?: content based prediction of the spread of ideas in microblogging communities. In: Proceedings of the fifth ACM international conference on web search and data mining, pp. 643–652. ACM
- Ye S, Wu SF (2010) Measuring message propagation and social influence on twitter.com. In: International conference on social informatics, pp. 216–231. Springer
- Yun G, Zheng QP, Boginski V, Pasiliao EL (2019) Information network cascading and network re-construction with bounded rational user behaviors. In: International conference on computational data and social networks, pp. 351–362. Springer
- Zachary WW. An information flow model for conflict and fission in small groups. J Anthropol Res. 1977;33(4):452–473. doi: 10.1086/jar.33.4.3629752.