Searching for associations between social media trending topics and organizations

João Henriques; João Ferreira

doi:10.1007/s11042-022-13438-2

2022 Aug 19;82(6):9277–9302. doi: 10.1007/s11042-022-13438-2

PMCID: PMC9388974 PMID: 35999845

Abstract

Trending topics are the topics most discussed at a given moment on social media platforms, particularly Twitter and Facebook. While access to trending topics is free and available to everyone, marketing specialists and specialized software are expensive, so some companies do not have the budget to support those costs. The main goal of this work is to search for associations between trending topics and companies on social media platforms, and the HotRivers prototype was developed to fill this gap. The approach was applied to Twitter and uses text mining techniques to process tweets, train personalized company models and deliver a list of the trending topics matched to the target company. Different text pre-processing techniques were tested, together with a tweet selection method called Centroid Strategy, which is applied to trending topics to avoid unwanted tweets. Three models were also tested: an embedding-vector approach with a Doc2Vec model, a probabilistic model with Latent Dirichlet Allocation, and a classification approach with a Convolutional Neural Network, which was used in the final architecture. The approach was validated with real cases: Adidas, Nike and Portsmouth Hospitals University. The results show that the trending topic Nike is associated with the company Nike and that #WorldPatientSafetyDay is associated with Portsmouth Hospitals University. The HotRivers prototype can be a new marketing tool that points the direction for the next campaign.

Keywords: Text mining, Text similarity, Text classification, Convolutional neural network, Doc2Vec, Latent Dirichlet allocation

Introduction

The number of Internet users is growing exponentially, and both the offer of new social media platforms and their number of users have increased substantially over the years [26, 32]. Social media has the power to make information, opinions and complaints accessible to everyone [10]. An example that illustrates this is the video that captured the death of George Floyd, which gained international attention after receiving an astonishing number of comments, likes and shares across multiple social media platforms. As a result, protests broke out across the world [36].

The following examples, given by the authors of [13] in 2010, illustrate the power that social media has to destroy a marketing campaign and a product's reputation. The first case study concerns Motrin, a medicine that was criticized so heavily that it climbed to the top of the trending topics on Twitter and gained enough visibility to reach mainstream media. Strikingly, all of this happened within 24 hours during a weekend. The other case study concerns a milk-based product, Raging Cow, and the failed attempt to build a good reputation around it. The campaign was not well received by the blogosphere: bloggers attacked and boycotted it, and the product disappeared from the market.

On the other hand, the next example is a mistake that could have damaged the image of Red Cross but turned into a successful blood-donation campaign. In 2011, a Red Cross employee posted a tweet about drinking beer, with an uncommon hashtag (#gettingslizzerd), from the organization’s account in the middle of the night. The tweet was noticed on Twitter and drew attention from users. To reverse the situation, Red Cross acknowledged the mistake and responded with humor [10]. Both Red Cross and Dogfish Head Brewery took advantage of the trending hashtag for their own benefit.

The examples above were triggered unintentionally and became discussion topics across multiple platforms. Therefore, knowing the social media environment can be a powerful way to avoid harm or to improve social media metrics [9, 10].

A trending topic is defined by Twitter as an emerging discussion topic that is popular at the present moment. To be considered trending, a topic needs to be discussed more than it usually is. The authors of [8] cite Twitter's official 2010 description of trending topics: “the hottest emerging topics (or the “most breaking” breaking news), rather than the most popular ones”.

Additionally, as the authors of [43] noted in 2011, trending topics became interesting to users, journalists, application developers and social media researchers. Besides being new and relevant to people at that moment, a trending topic is active only for a limited time [8]. Therefore, if it takes too long to evaluate trending topics, companies may not have time to do something effective [25].

Trending topics are relevant to companies’ marketing. As the authors of [8] put it, “(...)Trending Topics present a comparable visibility to other traditional advertisement channels and thus they can be considered a useful tool in marketing and advertisement contexts.” Instead of spending more time and money to increase the visibility of their products or brand, companies might take advantage of topics already being discussed on social media to reach their marketing goals.

Some companies already take advantage of current events in their communication. On 10 June, Control, a Portuguese brand, published a post on Instagram with the quote “it is day to raise the flag” (Fig. 1(a)) to take advantage of Portugal's national holiday.

Fig. 1 — Control Portugal post on Instagram and Super Bock and Sagres post on Facebook

On February 8, 2019, in the 21st round of the Portuguese first league, during a match between FC Porto and Vitória de Guimarães, Marega, an FC Porto player, was the target of racist chants and shouts from supporters of Vitória de Guimarães. The incident was widely discussed in social and traditional media. Nine days later, Super Bock and Sagres, two competing Portuguese beer brands, published a joint post on Facebook with the quote “Against racism, there are no rivals” (Fig. 1(b)).

Finally, having an account on a social media platform is free and trending topics are easily available to everyone. Therefore, even organizations that do not have the budget to invest in specialized human resources and software for social media marketing can take advantage of trending topics. The key is to analyse, in a timely manner, which trending topics are relevant to each company.

The goal of this work is to design and implement a prototype capable of using text mining techniques and training personalized models to find associations between trending topics and the social media account of an organization. Three different approaches are used: a text similarity approach, a probabilistic approach and a classification approach.

The prototype is called HotRivers and needs to be capable of:

Collecting data from a social media platform: using the name of the company's social media account (e.g. Adidas, Nike, Pull & Bear) and the desired location of the trending topics (e.g. United Kingdom, Lisbon, New York);
Preparing data: applying pre-processing techniques to clean and transform the data;
Modeling: using different approaches to measure the similarity between data from trending topics and data from companies' social media accounts.
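
The three capabilities above can be illustrated with a minimal sketch. It is not the HotRivers implementation: the collection step is replaced by hard-coded example tweets, the stop-word list is a tiny illustrative one, and a plain bag-of-words cosine similarity stands in for the Doc2Vec, LDA and CNN models used in the paper. All function names and sample tweets are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used by HotRivers).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on", "our"}


def preprocess(tweet: str) -> list[str]:
    """Preparing data: lowercase, drop URLs and @mentions, keep alphabetic tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tweets: list[str]) -> Counter:
    """Aggregate all tweets of one source into a single term-count vector."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(preprocess(tweet))
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Modeling (stand-in): cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Collecting data (stand-in): made-up tweets instead of calls to the Twitter API.
company_tweets = [
    "New running shoes are out now https://example.com",
    "Train hard in our new running collection",
]
topic_tweets = {
    "#marathon": ["Running the marathon in new shoes", "Marathon training starts"],
    "#election": ["Polls open for the election today", "Election results tonight"],
}

company_vec = bag_of_words(company_tweets)
scores = {
    topic: cosine_similarity(company_vec, bag_of_words(tweets))
    for topic, tweets in topic_tweets.items()
}
# Topics ordered from most to least similar to the company account.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy setting the sports-related topic shares vocabulary with the company tweets and ranks first, while the unrelated topic shares none and scores zero.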

It is important to note that studying the relationship between trending topics across multiple social media platforms is outside the scope of this work, because each platform has different features and distinct communities and can be classified into many categories [13, 15]. Section 2.2 explains which social media platform was chosen and why.

Finally, this work is structured as follows. Section 1 presented the problem, the motivations and the goals of this work. Section 2 defines social media, discusses the impact of marketing and the presence of companies on social media platforms, and explains which social media platform is used in this work and why; it also reviews works on trending topics and similar prototypes and systems. Section 3 introduces the architecture of HotRivers and the user requirements. In Section 4, three experiments are conducted. The first demonstrates the result of all phases and discusses what went wrong and which aspects need more testing. The second continues testing the techniques and models that were not abandoned in the first experiment. The third confirms that the chosen model is suitable for HotRivers and that the objectives are reachable. Finally, Section 6 presents the conclusions, future work and limitations of the current work.

Literature review

This section discusses what social media is, the impact of marketing on social media and why it is beneficial to companies. A systematic review was also conducted of works similar to HotRivers and of related works on trending topics.

Social media and marketing

In 2010, the authors of [15] defined social media as “a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of User Generated Content.” According to [7], in 2015, “Social media are Internet-based channels that allow users to opportunistically interact and selectively self-present, either in real-time or asynchronously, with both broad and narrow audiences who derive value from user-generated content and the perception of interaction with others.”

According to Tiago and Veríssimo [38], in 2014, marketers disseminated information related to the company or its products through e-mail blasts, direct marketing, telemarketing, informational websites, television, radio and other channels. Hence, if customers are on social media, then firms should be as well. Also, with the evolution and growth of social media, new challenges have appeared in improving social media services and user experience [16].

In 2014, an online survey was conducted among the managers of the largest companies in Portugal. The authors concluded that 87% of the managers agree that digital presence improves information gathering and feedback. Also, 85% of them acknowledged that digital presence increases knowledge and 82% admitted that digital presence promotes internal and external relationships [38].

In another study, from 2013, the authors of [1] conducted a case study in which they interviewed the staff of running events. They concluded that the gains from using social media in relationship marketing in sport were greater recognition from consumers, improved client-organization communication, better customer engagement and more efficient use of resources.

Why Twitter?

A set of conditions was defined to choose which social media platform should be used in this work. One condition was that the core of the publications on the platform is text (for example, messages, posts or micro-blogging publications) rather than video, photography or images. Even though chat and messaging services are text-based, they are not suitable for this work.

Another condition is that legal entities have visibility on the social media platform. Visibility, in this context, does not mean paid publicity, sponsorship or partnership, but a user account that represents a legal entity.

The language used must be English, and the platform must be used worldwide rather than in a specific part of the globe. English was chosen because many sophisticated linguistic models have been developed for it.

Additionally, this work uses the topics of the day, so it is important to choose a social media platform where the hottest topics are already filtered, because that is the focus of this work. Last but not least, the application process to access the data must be easy and fast.

Table 1 was constructed to compare the fifteen most used social media platforms and to see which conditions each one meets [14]. As noted above, messaging services such as Whatsapp, Facebook Messenger, WeChat and QQ were disregarded. Social networks based on videos or images, such as Youtube, Instagram, TIKTOK, Kuaishou, Snapchat and Pinterest, were also disregarded. Sina Weibo had a few companies represented, while for QZone it was not clear whether legal entities played a relevant role; both platforms were made for Mandarin speakers and were therefore disregarded. Reddit was excluded because it is a social news aggregator on which no legal entity has a presence [19]. The most suitable candidates were Facebook and Twitter. Facebook was excluded as well, because access to its data is difficult and it is not clear how its hottest-topic aggregator works. Twitter seemed to be the most accessible of all and fulfilled all requirements.

Table 1.

Characteristics of each social media platform

Top 15 most used social media platforms
Social media platforms	Text based	Legal entities visibility	Available in English	Worldwide platform	Application conditions difficulty³	Hottest topics filter
Facebook	Yes	Yes	Yes	Yes	Hard	Yes
Youtube	No	−	−	−	−	−
Whatsapp	Yes	N.A.²	−	−	−	−
Facebook Messenger	Yes	N.A.²	−	−	−	−
WeChat	Yes	N.A.²	−	−	−	−
Instagram	No	−	−	−	−	−
TIKTOK	No	−	−	−	−	−
QQ	Yes	N.A.²	−	−	−	−
QZone	Yes	N.K.¹	−	−	−	−
Sina Weibo	Yes	Yes	No	−	−	−
Reddit	Yes	No	−	−	−	−
Kuaishou	No	−	−	−	−	−
Snapchat	No	−	−	−	−	−
Twitter	Yes	Yes	Yes	Yes	Easy	Yes
Pinterest	No	−	−	−	−	−
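
The selection logic behind Table 1 can be expressed as a simple filter. The sketch below encodes only a subset of the rows, leaves out the cells that the table marks as not evaluated, and uses hypothetical field names; it is an illustration of the elimination process described in the text, not code from the paper.

```python
# A subset of Table 1; keys missing from a row correspond to cells that were
# not evaluated ("−") because the platform had already been excluded.
platforms = [
    {"name": "Facebook", "text": True, "entities": True, "english": True,
     "worldwide": True, "access": "Hard", "hot_topics": True},
    {"name": "Youtube", "text": False},
    {"name": "Sina Weibo", "text": True, "entities": True, "english": False},
    {"name": "Reddit", "text": True, "entities": False},
    {"name": "Twitter", "text": True, "entities": True, "english": True,
     "worldwide": True, "access": "Easy", "hot_topics": True},
]


def satisfies_all(platform: dict) -> bool:
    """True only when every condition from the text is explicitly met."""
    return (
        platform.get("text") is True
        and platform.get("entities") is True
        and platform.get("english") is True
        and platform.get("worldwide") is True
        and platform.get("access") == "Easy"
        and platform.get("hot_topics") is True
    )


selected = [p["name"] for p in platforms if satisfies_all(p)]
print(selected)
```

With the rows encoded here, Twitter is the only platform that passes every condition, which matches the choice made in the text.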

Filtering Criteria
Inclusion criteria	Exclusion criteria
Written exclusively in English	Not written exclusively in English
Work developed for the English language	Work developed for other languages
Publication after 2010	Publication before 2010
Free or inside ISCTE’s scientific license	Paid documents
Papers in conferences or journals	Papers or journals published in untrusted sources
Title, abstract or keywords related to association between trending topics and companies	Non-applicability to association between trending topics and companies

Filtering steps	Number of works
Search for query	7,729
Title, abstract or keywords related to association between trending topics and companies	330
Applied inclusion and exclusion criteria	5
Full-text analysis	0

Authors	Year	Approach used	Findings
Zubiaga et al. [43]	2011	SVM with 15 different features	Classified current events with an accuracy of 82.9% and memes with 73.1%
Lee et al. [20]	2011	- MNB with bag-of-words TF-IDF - C5.0 decision tree learner	MNB achieved 70% accuracy and the C5.0 decision tree 65%
Zhu [42]	2018	MNB with short text aggregation	Model achieved 73.33% accuracy and is built and classifies in 1.5 seconds
Shalini et al. [33]	2019	- Bag of Tricks classifier - CNN - Bi-LSTM	Bag of Tricks performed best, followed by a slightly worse CNN and then Bi-LSTM
Liu et al. [21]	2019	CNN-LSTM (a mix of a CNN and an LSTM)	Binary classification of offensive tweets attained an F1-score of 98%, and sentiment classification an F1-score of 67.9%

Authors	Year	Approach used	Findings
Bian et al. [6]	2013	Multimodal latent Dirichlet allocation	The framework output is textual and visual summaries of the trending topics
Aiello et al. [2]	2013	- LDA - Doc-p - GFeat-p - FPM - SFPM - BNgram	- The best topic recall: BNgram - The most complete topic description: SFPM and LDA - The most precise topic description: FPM - Stemming worsens the performance and tweet aggregation seems to improve topic recall - LDA is affected by noisy events
Peng et al. [27]	2015	SVM with unigram features	- Use of sentimental features - The model achieved the highest response time and accomplished 73.3% of F1-score
Sharma et al. [34]	2015	Proposed the algorithm TopicDetect	Approach effective and extensive to cover important topics
Melvin et al. [22]	2017	Phrase network model	The model accomplishes an F1-score of 54%
Singh and Shashi [35]	2019	- Bag-of-words with TF-IDF - Word2Vec - Doc2Vec - k-means	- Bag-of-words with TF-IDF had a purity score of 0.98, Doc2Vec of 0.95 and Word2Vec of 0.89 - Bag-of-words with TF-IDF had slightly better performance, but offered fewer options than Word2Vec and Doc2Vec - Doc2Vec delivered the highest-quality results

Authors	Year	Findings
Asur et al. [5]	2011	- Trending topics are driven by a log-normal distribution - Trending topics decay following a geometric distribution - The most important attribute is the retweet by other users - The number of followers and the tweet rate of users do not provoke trends - Most of the shared content is news from traditional media
Wilkinson and Thelwall [40]	2012	Twitter follows a pattern identical to media news
Annamoradnejad and Habibi [4]	2019	- Half of the trending topics were a single word - On average, trending topics had 30 characters and 2 words - A trending topic needed approximately 36.2 minutes to reach the top 10 and 91.5 minutes to reach rank 1 - 977 trending topics in 1 year reached rank 1 in less than 10 minutes - A trending topic emerges 1.5 times - The longest duration of a trending topic was 30 hours

Authors	Year	Approach used	Findings
Althoff et al. [3]	2013	- Nearest Neighbor - Forecast approach	The lifetime of a trending topic: - on Twitter was 1 day for 44% and 2 days for 24% of the trending topics - on Wikipedia, 50% of trends lasted 1 day and 16% lasted 2 days - on Google, 17% lasted seven days, 14% four days and 13% six days - The model forecast 9,000-48,000 views up to 14 days in advance with an error of 19-45%
Giummolè et al. [12]	2013	- TBG - Autoregressive model - DL model - Autoregressive - Distributed Lag model	- The DL model explained approximately 75% of the variance - When Google was the dependent variable and Twitter the explanatory variable, the DL model was significant 60% of the time - Twitter trending topics caused an identical Google Trend 43% of the time

Affected phase	HotRivers requirements
Collecting phase	Own a Twitter developer account
	Minimum of 1,500 posted tweets in an account¹
	Target account cannot be private
	Minimum of 100 tweets per topic
	Only hot-topic tweets from countries where English is an official language
	Maximum extraction time 4 hours
	Only words, abbreviations and different word spelling
Modeling Phase	Number of tweets repeated between 2 to 5²
	Minimum of 1,000 tweets after cleaning and transformation
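
The numeric requirements in the table above lend themselves to a simple pre-flight check. The sketch below is hypothetical: the thresholds come from the table, but the `AccountStats` structure, the function names and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements table.
MIN_ACCOUNT_TWEETS = 1_500
MIN_TWEETS_PER_TOPIC = 100
MIN_TWEETS_AFTER_CLEANING = 1_000


@dataclass
class AccountStats:
    """Hypothetical summary of a target account."""
    posted_tweets: int
    is_private: bool
    tweets_after_cleaning: int


def account_problems(stats: AccountStats) -> list[str]:
    """Return the list of unmet requirements (empty when the account is usable)."""
    problems = []
    if stats.posted_tweets < MIN_ACCOUNT_TWEETS:
        problems.append(f"fewer than {MIN_ACCOUNT_TWEETS} posted tweets")
    if stats.is_private:
        problems.append("target account is private")
    if stats.tweets_after_cleaning < MIN_TWEETS_AFTER_CLEANING:
        problems.append(f"fewer than {MIN_TWEETS_AFTER_CLEANING} tweets after cleaning")
    return problems


def topic_is_usable(tweet_count: int) -> bool:
    """A trending topic needs at least the minimum number of collected tweets."""
    return tweet_count >= MIN_TWEETS_PER_TOPIC


# Illustrative values only.
ok_account = AccountStats(posted_tweets=13_800, is_private=False, tweets_after_cleaning=3_000)
bad_account = AccountStats(posted_tweets=900, is_private=True, tweets_after_cleaning=500)
print(account_problems(ok_account))
print(account_problems(bad_account))
```

Returning the list of unmet requirements, rather than a single boolean, makes it easier to tell a user why an account cannot be processed.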

Name of the company	Number of tweets collected	Number of tweets published	Number of followers
Adidas	3,104	13,800	3,800,000
Nike	2,889	36,800	8,200,000
Portsmouth Hospitals University	2,261	17,000	8,790

Techniques		Average	Median	Rank	Number of Tweets
No STP	No treatment	0.357	0.360	[0: 1875, 1: 348, 2: 181, 3: 108, 4: 75, ...]	3,081
	Lemmatization	0.358	0.359	[0: 1928, 1: 355, 2: 168, 3: 104, 4: 77, ...]
	Stemming	0.359	0.361	[0: 1923, 1: 367, 2: 180, 3: 105, 4: 64, ...]
With STP	No treatment	0.356	0.354	[0: 1082, 1: 69, 2: 14, 3: 2, 831: 1, ...]	1,175
	Lemmatization	0.358	0.349	[0: 1078, 1: 59, 2: 6, 3: 4, 4: 1, ...]	1,151
	Stemming	0.357	0.349	[0: 1082, 1: 61, 2: 13, 3: 1, 4: 2, ...]	1,161

Technique	Number of unique tokens	Number of documents	Average	Standard deviation similarity	Median similarity
No treatment	513	3,082	0.800	0.092	0.816
Lemmatization	502		0.814	0.089	0.831
Stemming	507		0.811	0.087	0.820

Techniques		Loss	Accuracy	Recall	Precision	F-1 score	Quantity of Adidas tweets
No STP	No treatment	0.421	0.918	0.920	0.920	0.920	3,077
	Lemmatization	0.466	0.920	0.922	0.922	0.922
	Stemming	0.543	0.910	0.912	0.912	0.912
With STP	No treatment	0.479	0.901	0.903	0.903	0.903	2,886
	Lemmatization	0.447	0.903	0.903	0.903	0.903	2,874
	Stemming	0.543	0.900	0.901	0.901	0.901	2,886
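
As a consistency check on the table above, the recall, precision and F-1 score columns can be related through the standard definition of the F-1 score. This section does not state how the metrics were averaged, so the worked line below only assumes the usual harmonic-mean definition, with P denoting precision and R recall, and uses the first row (no STP, no treatment).

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
\qquad\Longrightarrow\qquad
F_1 = \frac{2 \cdot 0.920 \cdot 0.920}{0.920 + 0.920} = 0.920
```

Whenever precision and recall are equal, the F-1 score equals that common value, which is the pattern seen in every row of the table.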

Trending Topic Name	Average (Adidas)	Average (Nike)	Average (PHU)
#PepsiMaxTasteOneStop	0.263	0.323	0.234
Nike	0.560	0.986	0.382
#WorldPatientSafetyDay	0.013	0.036	0.816
#ps5preoder	0.222	0.566	0.373
#northeastlockdown	0.202	0.351	0.722
Amazon Uk	0.326	0.400	0.369
argos	0.382	0.514	0.297
#XboxSeriesX	0.251	0.603	0.392
#Covid_19	0.121	0.237	0.689
#NHSCOVID19app	0.054	0.200	0.683
iOS 13	0.130	0.284	0.526
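
One way to read the table above is to pick, for each trending topic, the company with the highest average score and to report an association only when that score is high enough. The sketch below uses four rows from the table; the cut-off of 0.8 and the function name are hypothetical and are not taken from the paper.

```python
# Average scores copied from four rows of the table (Adidas, Nike, PHU).
scores = {
    "Nike": {"Adidas": 0.560, "Nike": 0.986, "PHU": 0.382},
    "#WorldPatientSafetyDay": {"Adidas": 0.013, "Nike": 0.036, "PHU": 0.816},
    "#PepsiMaxTasteOneStop": {"Adidas": 0.263, "Nike": 0.323, "PHU": 0.234},
    "#Covid_19": {"Adidas": 0.121, "Nike": 0.237, "PHU": 0.689},
}

THRESHOLD = 0.8  # hypothetical cut-off, chosen only for illustration


def best_match(topic_scores: dict, threshold: float):
    """Return the top-scoring company, or None if it does not reach the threshold."""
    company = max(topic_scores, key=topic_scores.get)
    return company if topic_scores[company] >= threshold else None


matches = {topic: best_match(s, THRESHOLD) for topic, s in scores.items()}
for topic, company in matches.items():
    print(f"{topic}: {company}")
```

With this illustrative cut-off, the trending topic Nike is matched to Nike and #WorldPatientSafetyDay to PHU, the two associations highlighted in the abstract, while the other two topics remain unmatched.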
