PLOS ONE. 2023 Jan 31;18(1):e0270542. doi: 10.1371/journal.pone.0270542

What Tweets and YouTube comments have in common? Sentiment and graph analysis on data related to US elections 2020

Alexander Shevtsov 1,2, Maria Oikonomidou 1,2, Despoina Antonakaki 1,*, Polyvios Pratikakis 1,2, Sotiris Ioannidis 1,3
Editor: Daswin De Silva
PMCID: PMC9888715  PMID: 36719868

Abstract

Most studies analyzing political traffic on social networks focus on a single platform, while campaigns and reactions to political events produce interactions across different social media. Ignoring such cross-platform traffic may lead to analytical errors, missing important interactions across social media that, for example, explain the cause of trending or viral discussions. This work links the Twitter and YouTube social networks using cross-postings of video URLs on Twitter in order to discover the main tendencies and preferences of the electorate, distinguish users’ and communities’ favouritism towards an ideology or candidate, study the sentiment towards candidates and political events, and measure political homophily. This study shows that Twitter communities correlate with YouTube comment communities: that is, Twitter users belonging to the same community in the retweet graph tend to post links to YouTube videos whose comments come from YouTube users belonging to the same community in the YouTube comment graph. Specifically, we identify Twitter and YouTube communities, measure their similarities and differences, and show the interactions and the correlation between the largest communities on YouTube and Twitter. To achieve this, we gathered a dataset of approximately 20M tweets and the comments of 29K YouTube videos; we present the volume, the sentiment, and the communities formed in the YouTube and Twitter graphs, and publish a representative sample of the dataset, as allowed by the corresponding Twitter policy restrictions.

1 Introduction

Previous studies analyzing political content on social networks mainly focus on a single platform; however, online political campaigns and interactions between users take place across diverse online social networks. Online discourse is scattered across several social networks, sometimes among the same users and communities. The parallel study of the content of these social networks can reveal the political campaigns [1], comments, opinions, tendencies, and beliefs of the electorate [2, 3]. For instance, social media played a very important role during the 2016 US elections [1, 4–6]. However, only a few studies focus on more than one social network at a time [7, 8], thus possibly missing important interactions.

Users on Twitter and YouTube form tight communities. Twitter homophily has been observed in communities [9], in hashtags [10], in Twitter lists [11], and modelled in the follow and mention graphs [12, 13]. This phenomenon has also been studied through the prism of political context [14–19], as has support homophily between users and social media [20]. Homophily and the formation of common communities have been observed on YouTube as well, as seen in several background studies [21–23].

Social media users with political interests share and seek information about politics on Twitter, and there is a high probability that these users will also search for additional information on other social networks covering similar topics. This indicates potential connections across different social platforms, especially in the case of political discourse. Consequently, the analysis of a single social network cannot reveal the entire picture of the connections spanning several social networks.

As noted above, online social networks allow users to form groups and communities whose members share the same topic or interest. Based on that, we assume that communities are generated according to users’ preferences. Each social network has its own communities, whose users are already connected to each other through hidden layers of the network; in our study of Twitter and YouTube, these layers consist of features like retweets, replies, and mentions (in the case of Twitter) and comments and likes (in the case of YouTube). We already have information about the community structures of each social network, but the correlation of those communities across both platforms has not yet been covered. From our perspective, it is important to identify whether communities with similar interests are indeed connected across different social networks.

In this study, we reveal the connections between communities across YouTube and Twitter. Through community detection in the two social networks, we reveal the interactions between the comment graph and the retweet graph, respectively. To achieve this, we first obtain the most popular hashtags related to the US elections and gather a dataset of 19.8M tweets over a period of four months starting from July 19, 2020. We extract 28.343 unique YouTube video links contained in the Twitter dataset, as well as their metadata (comments, authors, etc.). We perform volume analysis, identify the main entities in the corpus, and perform per-entity sentiment analysis. We study the evolving retweet graph in six sequential time periods from July to November 2020 and show the diffusion of content regarding the two candidates. The sentiment analysis yields a higher positive sentiment towards Donald Trump on Twitter (35.2%) and YouTube (18%) than towards Joe Biden on Twitter (28%) and YouTube (12%). We perform community detection in the retweet graph and the YouTube comment graph using the Louvain method [24]. With PageRank, we identify the most engaging nodes in both graphs and name the communities containing these nodes after them. Finally, we show the interactions between the largest communities on both social media.

This study makes the following contributions:

  • Identifies the connections between the communities affiliated with the D. Trump and J. Biden campaigns on Twitter and YouTube.

  • Reveals the interactions between the YouTube communities in the YouTube comment graph.

  • Presents the Twitter retweet communities formed by user retweets.

  • Highlights the patterns in the comment graph that support linking the Twitter retweet graph to the YouTube comment graph.

Analyzing sentiment is an additional step in the processing of political content [25–29]. Sentiment analysis has been extensively applied to Twitter traffic, usually regarding a specific event. Through sentiment analysis, we can visualize the variation of the electorate’s sentiment during a political event [30, 31], a company event [32], or a product’s reviews [33], or model public mood and emotion and connect tweets’ sentiment features with fluctuations in real events [34]. The main task of these works is to predict elections, classify the electorate, and distinguish posts favoring one political party or ideology, although many works have argued that Twitter is not suitable for election prediction [35, 36].

1.1 Background

1.1.1 Homophily, communities on Twitter, YouTube

Existing studies of the interactions across different social networks during election periods pursue diverse goals. They mainly focus on the observation and investigation of the online discourse, highlighting the main activity of the candidates and the evolution of their online presence [37], evaluating how the candidates are influenced by social networks [38], or investigating whether social networks contribute to the democratization of our political systems, studying online propaganda, or the topics of YouTube videos related to campaigns [39].

There is a plethora of studies on homophily in political content on Twitter and YouTube, analyzing the communities formed on these platforms and the content they generate. For example, in [40] the authors analyse 8.9M tweets from the 2015 Spanish general elections; this study identifies a specific category of users, the so-called “party evangelists”, and explores the effect of their activity on the general political conversation. In [41] they analyze the communications of alt-right supporters during the 2018 US midterm elections: on a dataset of 52,903 tweets posted by 123 alt-right Twitter accounts (from 30/10 until 13/11/2018), they apply community detection (the Clauset-Newman-Moore greedy modularity maximization method) to the retweet graph, extract the topics of the communities, and present the communities categorized by the topics discussed within them. In [42] they study homophily in journalists’ interactions by analyzing a dataset of 600,000 tweets sent by 2,908 Australian journalists over a period of one year; they conclude that journalists mainly interact with other journalists of the same gender, working environment, and location, forming a bubble.

In [43] they investigate the structure of the “Occupy Wall Street” movement on Twitter and YouTube through a dataset of 328 identified users, including their posts, numbers of followers and followees, tweets, and favorites, as well as the corresponding YouTube videos matching the keyword “Occupy Wall Street”. The results show that both social networks were coordinated by US users: on Twitter, a loosely connected network coordinated around central hub users, while on YouTube the network was organized by anonymous users. In [44] they analyse a dataset of 238,967 tweets around #gamergate, associated with an episode of “Law & Order: Special Victims Unit” that initiated online activity and controversy, and explore the challenges of accounting for the cultural dynamics of Twitter and YouTube, focusing on key media objects like images, videos, and tags.

Community detection has been used to discover relations between different and heterogeneous social networks, as in [7], where a heterogeneous network with undirected and directed edges is built to simulate a social network and study overlapping community detection. Also, in [45] they apply a method for converting multi-relational networks to single-relational networks on synthetic and real-world data (DBLP). Background work on community detection across Twitter and other real social networks is limited.

1.1.2 Political content analysis on Twitter

Background work specifically on Twitter during election periods includes volume analysis of the number of tweets and comparison with the election results, together with topic analysis showing the co-occurrence of terms or candidates within tweets [3, 46–48]. Other works focus on classifying social media users into Democrats and Republicans using machine learning techniques on features derived from user information [49], or by automatically constructing user profiles from a dataset of scraped user lists in the Twitter directories WeFollow and Twellow [50]. For example, in [51], on a categorized dataset of four samples covering January 2013 until December 2014, they apply unsupervised learning to classify the topics of tweets posted by legislators and citizens, revealing that the former follow a discussion of public issues and potentially interact more with their supporters than with the general public. In [52], on a dataset of 13 million followers of @realDonaldTrump on 8 November 2016, they apply clustering analysis and show the following patterns of the electorate.

Studies focusing on network analysis on Twitter during election periods also study homophily [13, 14, 53, 54], apply stochastic link-structure analysis to show party interactions [55], apply community detection [56, 57], analyze the networks of mentions and retweets [57–60], and estimate users’ partisan preference, while [60] studies the activity, emotional content, and interactions of political parties and politicians. Finally, [61] studies the structure of conversations on Twitter between political, media, and citizen agents during elections in Belgium; the analysis shows a decentralized and loosely knit network.

Other studies focus on predicting the election outcome [62, 63], like [48], which also studies whether Twitter can be used for political deliberation and whether it mirrors offline political sentiment, during the German federal election.

There is a plethora of studies that have used sentiment analysis in the political domain [64, 65], either for group polarization [13, 66] or to study specific events or personalities, like the Arab Spring [67] or Hugo Chavez [68]. For example, [30] studies a dataset of 36 million tweets on the 2012 U.S. presidential candidates and applies a real-time analysis of public sentiment. Using Amazon Mechanical Turk, they label the dataset with the tweets’ sentiment (positive, negative, neutral, or unsure) in order to apply statistical classification (a Naive Bayes model on unigram features) to a training set of nearly 17.000 tweets (16% positive, 56% negative, 18% neutral, 10% unsure). Also, [69] attempts to understand the broader picture of how Twitter is used by party candidates, including the content and the level of interaction by followers. More detailed background on Twitter sentiment analysis methods can be found in surveys like [70–73].

Sentiment analysis on Twitter is not limited to one language: there are works studying multilingual datasets [74–80], and recent studies apply deep convolutional neural networks [74, 81–86]. Our work focuses on a single language (English), on which the dataset is based. Finally, some studies incorporate Twitter-specific features, like emoticons [87–90].

Sentiment in these analyses is represented by a variable with values like ‘Positive’, ‘Negative’, and ‘Neutral’, or even more specific ones (‘Happy’, ‘Angry’, etc.). Each word in the corpus can be assigned more than one sentiment (‘Positive’ and ‘Negative’). Other metrics that can be measured in this analysis are ‘subjectivity’ and ‘polarity’: the first is defined as the ratio of ‘positive’ and ‘negative’ tweets to ‘neutral’ tweets, while the second is defined as the ratio of ‘Positive’ to ‘Negative’ tweets.
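As a concrete reading of these definitions, the following minimal sketch computes both ratios from hypothetical daily counts of classified tweets (the counts and function names are ours, not the paper’s):

```python
def subjectivity(n_pos: int, n_neg: int, n_neu: int) -> float:
    # Ratio of opinionated (positive + negative) tweets to neutral tweets.
    return (n_pos + n_neg) / n_neu

def polarity(n_pos: int, n_neg: int) -> float:
    # Ratio of positive to negative tweets.
    return n_pos / n_neg

# Hypothetical counts for one day of traffic:
print(subjectivity(160, 560, 180))  # 4.0  -> a highly opinionated day
print(polarity(160, 560))           # 0.29 -> a negative-leaning day
```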

A necessary step of sentiment analysis is ‘text normalization’, an initial preprocessing of the corpus to extract the lexical features, which can significantly affect performance [91–93]. The preprocessing steps include tokenization, expansion of abbreviations, and removal of stop words and noise (URLs, mentions, etc.).

A very common additional step is to incorporate a lexicon specially made for the domain of the dataset [3, 94]. In [95] they obtain three different corpora of tweets and explore the usage of linguistic features for sentiment analysis. In [96] they adopt a lexicon-based method on diverse Twitter datasets. In [97] they conduct a sentiment analysis study on a dataset of 26,175 general Bulgarian tweets; through feature selection and classification (binary SVM), they show that negative sentiment predominated before and after the election period. In the current study, we do not compile a specially made lexicon, because we face no language barrier and do not analyze a plethora of entities; we focus only on the two main candidates.

1.1.3 Content analysis on YouTube

YouTube analysis has been used to apply sentiment analysis in the recent 2020 US elections, on a limited dataset of approximately 200 YouTube comments [98], to discover irrelevant and misleading metadata [99], to identify spam campaigns [100], to discover extremist videos and hidden communities [101], to propagate personalised video preference information [102], to estimate causality between user profiles [103], to spread political advertisement [39, 104, 105], and to apply opinion mining [106]. For example, [104] explores the 2008 Senate campaign; among other findings, they claim that not many people are exposed to television ads, so YouTube increases the accessibility of campaign messages, which favored one group in controlling the messages on the network. Background work on sentiment analysis on YouTube [107, 108] studies the sentiment of user comments [109, 110], identifies trends and demonstrates the influence of real events on user sentiment [111], implements models utilizing audio, visual, and textual modalities as sources of information [112], and studies the popularity indicators and audience sentiment of videos [113].

Previous studies covering both Twitter and YouTube during election periods investigate online propaganda strategies, as in [8], or evaluate the influence of candidates via social media [38].

In our study, we focus on discovering the relation between YouTube and Twitter via community detection and on linking the Twitter retweet graph with the YouTube comment graph through tweets of posted videos, measuring the communities’ similarities and differences, and examining the interactions and correlations between the largest communities on YouTube and Twitter. As seen above, previous work on the analysis of social media corpora during election periods, like [98], provides neither a dataset of this size, covering almost three months of political conversation before the elections on two social media, nor such an extended analysis covering sentiment, volume, and graph analysis, together with the comparison and correlation of two social networks.

2 Dataset

We acknowledge that Twitter and YouTube users do not fully represent the electorate during an election period, which introduces a bias into the dataset. Previous studies have shown that Twitter users belong to specific age [36], social, and ideological demographic groups [114], meaning that public opinion is not fully expressed through social media. Regarding YouTube, usage penetration in the United States in 2020 is more balanced across age groups than Twitter’s, since popularity differs only slightly between them, although the platform attracts a younger audience; the minimum age for using the service is 13 years in most countries [115], an age group not necessarily part of the audience of our dataset.

Taking this bias into consideration, the task of election prediction is challenging. However, the analysis of this specific corpus can shed light on the relationship between Twitter and YouTube. As shown in section 5, community detection on both social graphs can associate the communities in the YouTube comment graph with the communities in the Twitter retweet graph and measure their similarities and differences.

2.1 Twitter

In this study, we searched for all the prevalent hashtags regarding the US elections of 3 November 2020 and obtained the Twitter corpus through the streaming Twitter API. The acquisition of the dataset started on 19 July 2020 and finished on 3 November 2020.
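The paper does not name its collection tooling; purely as an illustration, a stream collector along these lines could be written with the tweepy library (v4) against the v1.1 statuses/filter endpoint. The credentials and the persistence step are placeholders:

```python
import tweepy

class ElectionStream(tweepy.Stream):
    def on_status(self, status):
        # Persist each matching tweet for the later volume/sentiment analysis.
        print(status.id_str, status.created_at)

stream = ElectionStream("CONSUMER_KEY", "CONSUMER_SECRET",
                        "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
# Track the prevalent election hashtags (full list in Table 5, S1 Appendix).
stream.filter(track=["#Trump2020", "#vote", "#Election2020"])
```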

This resulted in a dataset of approximately 19.8M tweets from 4.5M users, with the most prevalent hashtags being #Trump2020, #vote, and #Election2020. Table 1 lists the most popular hashtags, sorted by the number of tweets containing them, and Table 5 in S1 Appendix gives the whole list of hashtags used in our analysis. In Fig 1 we show the number of daily tweets and comments on YouTube, where we notice a peak on September 29, explained by the first debate [116], and of course the second, higher peak for Twitter on the day of the elections.

Table 1. The 10 most popular hashtags in our dataset.

Hashtag Tweets count
#vote 7.196.981
#trump2020 3.913.969
#election2020 3.535.323
#biden 1.569.981
#trump 959.048
#bidenharris2020 866.120
#debate2020 855.543
#votebluetosaveamerica 837.089
#maga 697.381
#trumphascovid 581.456

Fig 1. Number of daily tweets and comments on YouTube videos.


We considered 19 July 2020 a reasonable starting point for collecting our dataset, since by then the corresponding hashtags had begun to accumulate a significant number of tweets and the semantics of the content had become more relevant to the election conversation. Regarding the completeness of the content covered by our dataset, we acknowledge that additional minor hashtags may have existed during that period that were not crawled and included in our corpus. Nevertheless, we consider our dataset representative of the online discourse, since we collected the majority of the hashtags available in the pre-election period. Additionally, there was overlap between the hashtags in the election discourse, and these hashtags were cross-referenced in the tweets: for example, a tweet that mentioned minor hashtags alongside a popular hashtag such as #Vote is still included in our dataset because of #Vote. We included only the general hashtags, plus two hashtags in favor of each candidate, in order to keep a balance between them and not introduce a bias towards either one. Finally, the bulk of the political conversation was gathered under the popular hashtags, and the remaining ones do not contain a significant number of tweets. Table 5 in S1 Appendix shows the 20 most popular hashtags in our dataset; the entire list contains 585.486 unique hashtags, which is too long to be added to the manuscript.

Fig 1 shows the total number of tweets per day, for every hashtag contained in our dataset.

2.2 YouTube

From the election tweets, we extracted all the contained YouTube video links. We followed these links and ended up with 39.203 videos. Through the YouTube Data API, we obtained all the publicly available data regarding these videos. The accessible data used in this study for each video are:

  1. The category it belongs to (e.g. News & Politics, Entertainment, Music, etc.),

  2. The text and author of each comment, and

  3. The YouTube channel that posted the video.

In Fig 2 we see how many videos belong to each category. In this study, we focus on the election topic, so we filtered out the videos that do not belong to the following categories: News & Politics, People & Blogs, and Entertainment. The filtering led to a dataset of 28.343 videos.

Fig 2. Number of videos per YouTube category.


From the 28.343 videos, we gathered all the comments and their replies generated between 19/7/20 and 3/11/20. This resulted in a dataset of 8.476.193 unique commenters and 39.266.355 comments and replies. Fig 1 shows the total number of comments and replies per day related to the elections. We notice an increase in the number of comments from July to November, with diurnal patterns, a peak on September 29, the day of the first debate [116], and of course the second peak for Twitter on the day of the elections.
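For reference, per-video metadata of this kind can be retrieved with the official google-api-python-client; the sketch below is our illustration, with a placeholder API key, and fetches the snippet that carries the owning channel and the category of a video:

```python
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")  # placeholder key

def video_snippet(video_id: str) -> dict:
    # The snippet part carries the owning channel and a numeric category id;
    # category names are resolved via youtube.videoCategories().list(...).
    resp = youtube.videos().list(part="snippet", id=video_id).execute()
    item = resp["items"][0]["snippet"]
    return {"channel_id": item["channelId"],
            "category_id": item["categoryId"],
            "title": item["title"]}
```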

2.3 Data availability

To obtain the dataset used for the analysis described in this study, we follow the Twitter and YouTube API restrictions and do not violate any terms of the Developer Agreement and Policies. According to Twitter policy, we are not allowed to share the entire dataset, but only 100K user IDs. This dataset is available at https://zenodo.org/record/4618233#.YGGJU2Qzada. We provide the directed retweet graph from the Twitter network, all user IDs from the provided retweet graph (89.479 users), all video IDs extracted from the election-related tweets (39.203 video IDs), and the directed comment graph.

The public dataset is composed of the collected data, split into two separate parts, for Twitter and YouTube. Each part contains limited information, in order to protect the privacy of users and respect the platforms’ policy limitations.

2.3.1 Twitter

The Twitter part contains the user IDs from the retweet graph of the users in our dataset, in accordance with Twitter policy and limits. This retweet graph can be used to identify the retweet relationships between users during the 2020 US elections. In this graph, nodes represent users, while edges represent the retweet relationship between two users, with an edge weight giving the number of retweets between that particular pair of users. The nodes of the retweet graph are also used to collect user information, such as user posts and profiles. For this purpose, we used a crawler that respects the Twitter API policy, which is available here: tWawler. This crawler retrieves user posts from the user timeline and profile information from the Twitter API, based on the provided user IDs. In order to keep only the posts correlated with the US election topic, the collection should be filtered based on the hashtags listed in Table 5 in S1 Appendix.

According to Twitter policy, we are not permitted to store and distribute user objects and information that have been removed or suspended from Twitter. Consequently, the shared dataset contains only the tweets that contain the specified hashtags. We do not provide the user information (user objects) or the timeline posts for the accounts, since this information has been, or may in the future be, removed by Twitter. Thus, the shared dataset includes only the user IDs, which allow the collection of the publicly available information through the Twitter API without violating Twitter policy. For each user ID we also provide a YouTube video ID that was posted by the user during the election period. Based on these video IDs, it is possible to collect the users who posted the comments and replies to each video using the YouTube API.

2.3.2 YouTube

The YouTube dataset contains a list of video IDs extracted from the related tweets described above. For this part, we provide the graph that represents the relationship of YouTube users commenting on a specific channel. To gather the data needed for this part while following the YouTube API’s policy restrictions, we used the scripts available at YouTube Data API. Each video on YouTube is categorized based on its content; from the videos obtained from the tweets, we exclude the ones that do not belong to the categories News & Politics, People & Blogs, and Entertainment. For the resulting filtered videos, we gathered all their comments and authors from 19 July 2020 to 3 November 2020, along with the channel that owns them. To construct the comment graph for each channel, we first gather, for each filtered video ID, all the YouTube user IDs that have commented on it; we then convert the video IDs to the channel IDs of the channels that own the videos. The result is the graph containing the users and their comments on each channel.
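To make the construction concrete, a minimal sketch of the user-to-channel comment graph follows; fetch_comment_authors and video_to_channel are hypothetical helpers wrapping youtube.commentThreads().list(...) and youtube.videos().list(...) respectively:

```python
from collections import defaultdict

# (commenter_id, channel_id) -> number of comments, i.e. the edge weight.
edges = defaultdict(int)

for video_id in filtered_video_ids:          # videos kept after category filtering
    channel_id = video_to_channel[video_id]  # hypothetical video -> channel lookup
    for author_id in fetch_comment_authors(video_id):  # hypothetical API wrapper
        edges[(author_id, channel_id)] += 1
```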

3 Methodology

3.1 Text preprocessing

In order to apply sentiment analysis, we follow a prerequisite set of preprocessing steps on the corpus (tweets and YouTube comments): we remove punctuation symbols, text emojis, and URLs, and we modify mentions and hashtags (removing the leading ‘@’ and ‘#’ characters). This procedure removes textual noise and ultimately allows the identification of the entity discussed by the users. The next step is the transformation of the text to lower case and the tokenization of the collected tweets. Then we lemmatize each token, a technique that normalizes inflected word forms. As a result of the previous steps, sentiment analysis is performed on lower-cased, normalized sentences.
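The pipeline above can be illustrated with a short NLTK-based sketch (our own rendering of the described steps; the regular expressions are illustrative, not the paper’s exact rules):

```python
import re
from nltk.stem import WordNetLemmatizer  # requires nltk.download('wordnet')
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

lemmatizer = WordNetLemmatizer()

def normalize(text):
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]", "", text)           # keep mention/hashtag words, drop markers
    text = re.sub(r"[^\w\s]", " ", text)       # drop punctuation and text emojis
    tokens = word_tokenize(text.lower())       # lower-case and tokenize
    return [lemmatizer.lemmatize(t) for t in tokens]

print(normalize("RT @user: #Vote today! https://t.co/xyz"))
# ['rt', 'user', 'vote', 'today']
```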

3.2 Sentiment analysis

As mentioned in section 1.1, there is a plethora of previous studies that analyse sentiment on Twitter. Vader is broadly used in this domain, as shown in [117–121].

In our implementation of sentiment analysis, we utilize the Vader [122] sentiment analysis model from the Python NLTK library [123]. We choose this particular implementation because it is especially attuned to the sentiment expressed on social media. Vader uses a list of lexical features labeled according to their semantic orientation. Thanks to its implementation simplicity, the sentiment analysis execution time remains low. Using this off-the-shelf solution, we only need to filter the text and pass our data to the SentimentIntensityAnalyzer function, which returns the scores for the positive, negative, and neutral sentiment types. In our analysis, we compute such sentiment for each collected tweet and comment in our database and summarize the scores daily for each user. We compute two daily sentiment values per user: a summary score, the sum of the scores of the tweets created by the user that day; and an average score, the summary score divided by the number of tweets in which a particular entity was mentioned by the user.
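A minimal sketch of this step with NLTK’s VADER follows; user_tweets_today is a hypothetical list of one user’s tweets on a given day:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # nltk.download('vader_lexicon')

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this candidate, but the debate was a mess."))
# -> {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

# Daily per-user aggregation as described above.
user_tweets_today = ["Great rally today!", "This policy is a disaster."]  # hypothetical
scores = [sia.polarity_scores(t)["compound"] for t in user_tweets_today]
summary = sum(scores)                                # daily 'summary' score
average = summary / len(scores) if scores else 0.0   # daily 'average' score
```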

The sentiment analysis of the corpus yields only the emotion vector of a particular sentence, not the emotion towards a particular entity. Since our pipeline requires the sentiment of each entity, the next subsection describes how we identify the entities.

3.3 The entities and their sentiment

In order to identify the entity contained in each tweet, we generate a set of keywords, shown in Table 2, for the dataset entities (‘Trump’ and ‘Biden’). The keywords are searched in lower case in order to avoid mismatches caused by upper/lower-case writing styles. We match these keywords against each tweet in the Twitter dataset and each comment in the YouTube dataset to identify whether the entity appears in the specific tweet/comment text. The entities of J. Biden and D. Trump were identified with a curated set of keywords, including hashtags enriched with the candidates’ names and the parties they represent. For example, the hashtag #VoteBlue does not contain the word Biden as text but is associated with the J. Biden campaign; we recognize it thanks to the keywords shown in Table 2. For this reason, we use as keywords the hashtags correlated with each candidate. This approach increases the accuracy of associating a particular tweet/user with the described entities.

Table 2. The complete list of all entities with the corresponding keywords that were used for each one.

Entity Key words
Trump trump, donald, donaldtrump, trump2020, votetrumpout, trumpislosing, dumptrump, nevertrump, republican
Biden biden, joseph, bluewave2020, votebluetosaveamerica, voteblue, ridinwithbiden, neverbiden, democrat

Each time a keyword is found in a particular tweet or comment, we assign the sentiment values for that entity to the user who posted it. These entity sentiments are assigned to users daily, since we are interested in identifying the user/community dynamics and in presenting how entity sentiment evolves day by day.
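A sketch of the keyword match against the normalized, lower-cased text, using the lists of Table 2 (the function name is ours):

```python
ENTITY_KEYWORDS = {
    "Trump": {"trump", "donald", "donaldtrump", "trump2020", "votetrumpout",
              "trumpislosing", "dumptrump", "nevertrump", "republican"},
    "Biden": {"biden", "joseph", "bluewave2020", "votebluetosaveamerica",
              "voteblue", "ridinwithbiden", "neverbiden", "democrat"},
}

def entities_in(tokens):
    # An entity counts as mentioned if any of its keywords occurs in the text.
    return {entity for entity, keys in ENTITY_KEYWORDS.items() if keys & set(tokens)}

print(entities_in(["voteblue", "today"]))  # {'Biden'}
```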

4 Results

4.1 Volume analysis

In this Section, we include several volume measures derived from our dataset. Initially, we perform a volume analysis on the whole corpus of the tweets. In Fig 3 we plot the cumulative distribution of the daily tweets.

Fig 3. CDF distribution of tweets per day.


In Fig 1 we plot the number of daily tweets and video comments, where we notice the diurnal variation of posts and the massive increase in the number of tweets on the day of the elections. In Fig 4 we can see the cumulative distribution of the daily tweets per hashtag, where the prevailing hashtags are #Debate2020, #Trump, #BidenHarris2020, #VoteBlueToSaveAmerica, and #Biden.

Fig 4. CDF distribution of the daily tweets per hashtag.


We plot the daily active users per entity for both Twitter and YouTube in Fig 5, for each of the entities ‘Trump’ and ‘Biden’. We notice that the total number of users increases from July to November, that the top hashtag is #Debate2020, and that the users posting on Twitter and commenting on YouTube about Trump exceed those for Biden. User activity peaks on the 29th of September and on the 2nd of October; these anomalies are explained in more detail in Section 4.4, where the increased user interest is correlated with election events. We also notice that the total Twitter traffic exceeds the total YouTube traffic.

Fig 5. Number of daily users posting YouTube comments and tweets in which the particular entities are discussed.

4.2 The retweet graph

In Figs 6–11 we plot the retweet graph. This graph is built from the retweet relationships between all users in the collected dataset. We represent a retweet relation as a directed edge between two users (nodes): the destination node is the original author of the tweet and the source node is the user who retweeted that particular tweet. Such a representation allows the identification of user communities that share the same information source. To show how tight the relations are, we assign the number of observed retweets towards a particular destination node as the edge weight. We also filter out the non-significant edges with weights less than 8; this reduces the noise of non-significant relations and brings the number of edges down to a volume manageable for graph visualization. Before filtering, our retweet graph consisted of 2.874.090 nodes and 9.993.122 edges; after filtering, it was reduced to 56.853 nodes and 86.668 edges. For the entity visualization, we use two colors: red for the entity Trump and blue for the entity Biden.
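A sketch of this construction and filtering with the networkX library follows; retweets is a hypothetical iterable of (retweeter, original author) pairs parsed from the collected tweets:

```python
import networkx as nx

G = nx.DiGraph()
for src, dst in retweets:  # src retweeted a tweet originally posted by dst
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1
    else:
        G.add_edge(src, dst, weight=1)

# Drop non-significant edges (weight < 8) and the nodes they leave isolated.
weak = [(u, v) for u, v, w in G.edges(data="weight") if w < 8]
G.remove_edges_from(weak)
G.remove_nodes_from(list(nx.isolates(G)))
```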

Fig 6. Retweet graph relation between users in our dataset.

Colours represent entity activity by users: red for the entity Trump, blue for the entity Biden.

Fig 11. Retweet graph relation between users in our dataset.

Colours represent entity activity by users: red for the entity Trump, blue for the entity Biden.

Fig 7. Retweet graph relation between users in our dataset.

Colours represent entity activity by users: red for the entity Trump, blue for the entity Biden.

Fig 8. Retweet graph relation between users in our dataset.

Colours represent entity activity by users: red for the entity Trump, blue for the entity Biden.

Fig 9. Retweet graph relation between users in our dataset.

Colours represent entity activity by users: red for the entity Trump, blue for the entity Biden.

We apply sentiment analysis for each day and measure the number of tweets a user posts containing a particular entity. We use this counter to color each user node according to the user’s most popular entity on each particular date. We also provide a plot (Fig 5) of the volume of users, which allows us to compare the daily volume per entity. Users that do not mention any entity in their tweets remain uncolored.

The graph plots in Figs 6–11 were generated with the Gephi Fruchterman-Reingold layout [124]; we export the Gephi-generated positions and reuse them to render a daily graph with the networkX Python library [125] (a sketch of this step follows Tables 3 and 4). In Figs 6–11 we notice that the largest connected component acquires color in the period closer to the elections, while other cores form and the mentions of the two main candidates increase or decrease, potentially influenced by real events like the debates (see Section 4.4). We also show how the discussion around these entities gains resonance in the period closer to the elections. Finally, in Tables 3 and 4 we show the most retweeted users, with the highest in-degree and out-degree respectively, with anonymized usernames.

Table 3. Top retweeted users (highest indegree).

User Number of Followings Number of Followers
User 1 1.03K 826.4K
User 2 5.1K 2.7M
User 3 97 248.3K

Table 4. Top retweeted users (highest out degree).

User Number of Followings Number of Followers
User 1 4.7K 5K
User 2 3.6K 3.6K
User 3 5K 8.1K
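Returning to the rendering of Figs 6–11, here is a sketch of the layout-reuse and coloring step, purely as our illustration; the positions CSV, the per-day dominant-entity map top_entity, and the daily graph G_day are hypothetical names:

```python
import csv
import networkx as nx
import matplotlib.pyplot as plt

# Reuse node positions exported from Gephi (hypothetical id,x,y CSV) so
# that every daily snapshot shares the same layout.
with open("gephi_positions.csv") as f:
    pos = {r["id"]: (float(r["x"]), float(r["y"])) for r in csv.DictReader(f)}

# Color each node by the user's dominant entity for the day (see Fig 5).
colors = ["red" if top_entity.get(n) == "Trump"
          else "blue" if top_entity.get(n) == "Biden"
          else "lightgrey" for n in G_day]
nx.draw_networkx(G_day, pos=pos, node_color=colors, node_size=5,
                 with_labels=False, arrows=False)
plt.savefig("retweet_graph_day.png", dpi=300)
```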

The retweet graphs show the volume of tweets and the relations between the nodes, which represent users. The juxtaposition of the graphs across the whole pre-election period demonstrates how the volume of communication and interaction between the involved users evolves. As expected, the density of the graph increases, meaning that the volume of tweets grows and the discourse intensifies as we approach the elections.

The discussion around each candidate also evolves through time. The corresponding hashtag for each candidate is mentioned as long as the electorate references them in the discourse. Of course, this does not necessarily indicate a preference towards Trump or Biden. As mentioned in section 2, this analysis does not aim to predict the preference of the electorate or the outcome of the elections; we instead observe how users engage in the online conversation. As we get closer to the election date, we notice components forming in the graph that reflect the conversations about each candidate, and about both of them in the last bottom plot (Fig 11).

Additionally, we see two main graph components throughout the whole period that change colour according to the daily events. For example, in the bottom-left subgraph (Fig 10) we notice that both components are red, while in the bottom-right one (Fig 11) one component is painted blue (for J. Biden). This happens because the left graph reflects the fact that Trump was diagnosed with COVID-19 on 2/10, so the conversation about Trump was more intense and the number of tweets increased. Also, the middle-right plot shows the day after the debate, whose mixed red-blue colour makes sense if we consider the conversations this particular debate initiated. The colours, as expected, return to normal in the last bottom-right plot (3/11), which shows the two main components, blue and red, one for each candidate, since on this date neither candidate attracted any peculiar attention.

Fig 10. Retweet graph relation between users in our dataset.

Colours represent entity activity by users: red for the entity Trump, blue for the entity Biden.

4.3 Sentiment analysis

In this Section, we present the results of the sentiment analysis of our corpus, as described in 3.2. In Fig 12 we present the daily average sentiment for the entities ‘Biden’ and ‘Trump’. The solid line is the average sentiment in YouTube comments and the dotted line is the average sentiment in the corpus of tweets; values below zero indicate negative sentiment and values above zero positive sentiment, for each social medium. In Fig 13 we plot the daily overall sentiment for the entities ‘Biden’ and ‘Trump’.

Fig 12. Daily average sentiment per entity for the collected Twitter and YouTube datasets.

Fig 13. Result of overall sentiment analysis for Twitter and YouTube.

The sentiment values present the overall sentiment posted by users on the given dates.

Additionally, in Fig 14 we plot the positive sentiment time series for the two sets of hashtags for each state. In order to identify the location of the user accounts, we extract this information from the corresponding ‘location’ field in the Twitter user object and supplement it with the location field of the tweet object from the Twitter API. We acknowledge that the first field may not be kept up to date by the user and that the second may be missing when the user does not share a location, so the location of many users may be unknown; nevertheless, this is the closest information Twitter provides about user location. We notice the daily fluctuations for every state per entity (blue for the entity ‘Biden’, red for ‘Trump’). The juxtaposition of the time series, in a form resembling an EEG, makes it easier to discern localized events from nationwide Twitter traffic. The list of state abbreviations can be found in [126].

Fig 14. Positive sentiment time series for the two sets of hashtags for each state.


4.4 Event consequences

Since our work analyses the 2020 US presidential elections, we monitor real-world events that may trigger significant user interest on social media. Interesting examples of such events are the candidates’ TV debates (September 29 and October 22) and the date when President Donald Trump was diagnosed with COVID-19 (October 2). The online conversations regarding these events are clearly visible within our analysis. Analyzing the user engagement in these specific periods, one day before and one day after, shows whether such real-world events are reflected in the virtual world of social networks.

We apply sentiment analysis at these specific points of our dataset timeline to identify how social media users react to these occasions, to capture the fluctuations of sentiment, and to measure the volume around each entity topic.

Our results are presented in Figs 15 and 16, where it is noticeable that the first debate and Donald Trump’s COVID-19 announcement generated a high volume of user interest on social media. In the first case (the debates), both entities soared in comparison with previous dates. During the second event, the entity ‘Trump’ took first place in user discussions on social media, with an increased volume of tweets for the entity ‘Trump’ and a high dissonance in sentiment values.

Fig 15. Summary of user tweet sentiment values per entity, together with the number of tweets.

In this plot, we mark 3 important dates in the dataset: 29/09, the date of the first debate; 2/10, the date when Donald Trump tested positive for COVID-19; and 22/10, the date of the last election debate.

Fig 16. Normalized sentiment values of user tweets per entity, together with the number of tweets.

In this plot, we mark 3 important dates in the dataset: 29/09, the date of the first debate; 2/10, the date when Donald Trump tested positive for COVID-19; and 22/10, the date of the last election debate.

5 Linking Twitter and YouTube data

In this section, we explore the differences in the discussions and communities between the two social networks. We perform Louvain community detection on both social graphs, associate the communities in the YouTube comment graph with the communities in the Twitter retweet graph, and measure their similarities and differences. Fig 17 shows the 3-core of the YouTube comment graph and Fig 18 the 3-core of the retweet graph. In both graphs, we color-code the communities and label the top-PageRank nodes of each community.

Fig 17. The 3-core of the YouTube comment graph, color-coded by community.

Fig 18. The 3-core of the retweet graph, color-coded by community.


We used Gephi [124] for the analysis and visualization of Figs 17 and 18.

Fig 17 illustrates the 3-core of the YouTube comment graph, which represents which channel has commented on which channel. The differently colored areas (purple, light grey, green, blue) in the graph represent different communities formed between the YouTube channels, produced by the Louvain algorithm [24]. For example, purple indicates the community of “Blaze TV”, “The Officer Tatum”, and “Fox News”. However, “Fox News” also lies between the purple and the grey areas, the latter indicating the next community (formed by “Fox News” and “Donald J Trump”). These communities do not necessarily correspond to real-life communities; they are the output of the Louvain algorithm in Gephi. Regarding the labels, this figure also shows the 13 most popular channels after running PageRank.

Similarly, in Fig 18 we show the 3-core of the retweet graph, which represents the relations between users who have retweeted each other. In the graph we notice seven colored areas of different sizes, indicating different communities that, as in Fig 17, do not necessarily correspond to real-life communities. Also, text labels that appear close together, for example “MaryLTrump” and “DrDenaGrayson”, carry no semantic information: proximity in the layout is coincidental, whereas membership in the same color area (community) is not.

In both node diagrams (Figs 17 and 18), the high density of nodes makes it impossible to keep the name of each node readable. The nodes in Fig 18 are the Twitter users who posted the tweets included in the plot; similarly, the nodes in Fig 17 are the YouTube users, which are channels. In order to highlight only the important nodes, we provide the names of the most significant nodes based on their PageRank score.
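A sketch of this community-detection and labeling step with the python-louvain package and networkX (G stands for either graph; Louvain operates on its undirected view):

```python
import networkx as nx
import community as community_louvain  # pip install python-louvain

partition = community_louvain.best_partition(G.to_undirected(), weight="weight")
pagerank = nx.pagerank(G, weight="weight")

# Name each community after its highest-PageRank member.
labels = {}
for node, comm in partition.items():
    if comm not in labels or pagerank[node] > pagerank[labels[comm]]:
        labels[comm] = node
```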

For illustration purposes, nodes with high degrees in both plots (Figs 17 and 18) are shaped like circles (a lightly highlighted circle-like scheme); these shapes are Gephi rendering variables that cannot be explained semantically in this context and should be ignored.

Fig 19 shows the interactions between the 3 largest YouTube communities (top half) in the YouTube comment graph (YT) and the 6 largest Twitter communities in the retweet graph (RT). Each community is named after its highest-PageRank member (or the second highest, when clearer) in the corresponding graph. The size of each relation depicts the number of users in the RT community who tweeted video URLs from any channel in the YT community. As expected, communities affiliated with the Trump campaign on Twitter mostly post video links from channels in the Fox News YouTube community, whereas members of the Twitter community affiliated with the Biden campaign mostly post video links from YouTube channels in the CNN YouTube community.

Fig 19. Interactions between largest communities on YouTube and Twitter.


One Twitter community that follows an unexpected pattern is mostly formed by Arabic speakers who, in this setting, tend to use Twitter more as a news medium and less as a social network for disseminating political opinions. This observation supports linking the Twitter retweet graph with the YouTube comment graph using tweets of posted videos.
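As an illustration of how the relation sizes in Fig 19 can be tallied: rt_community and yt_community are hypothetical node-to-community maps from the two Louvain runs, tweeted_videos maps each Twitter user to the video IDs they tweeted, and video_to_channel resolves a video to its owning channel:

```python
from collections import defaultdict

links = defaultdict(set)  # (RT community, YT community) -> Twitter users
for user, rt_comm in rt_community.items():
    for video_id in tweeted_videos.get(user, []):
        yt_comm = yt_community.get(video_to_channel.get(video_id))
        if yt_comm is not None:
            links[(rt_comm, yt_comm)].add(user)

# Relation size = number of distinct RT-community users posting YT links.
sizes = {pair: len(users) for pair, users in links.items()}
```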

6 Conclusions

The purpose of this study is to shed light on the online discourse taking place on social media during the US presidential elections of November 2020. We focus on two main social media: Twitter and YouTube. Having obtained the tweets for the most popular hashtags regarding the 2020 US elections, as well as the unique YouTube videos extracted from them, this work studies the connection between these two social networks through a series of steps, including volume analysis of tweets and users, identification of entities, and correlation with the features of the YouTube videos. Next, we apply sentiment analysis to the Twitter corpus and the YouTube metadata and show that positive sentiment is higher for Donald Trump than for Joe Biden. We also identify how real-world events trigger user discussions on social media around the elections.

The next step is the study of the retweet graph at six different time points in the dataset, from July to November 2020, highlighting the two main entities (‘Biden’ and ‘Trump’) and showing how the main connected component becomes denser through time. Finally, the results of the previous steps allow us to link the Twitter retweet graph with the YouTube comment graph and show the interactions between the 3 largest YouTube communities in the YouTube comment graph and the 6 largest Twitter communities in the retweet graph. We include sarcasm detection in our future plans; it will require a crowd-sourcing effort after the election period in order to form an adequate ground-truth dataset for training.

Supporting information

S1 Appendix. List of all hashtags.

(PDF)

S1 Text

(TXT)

S1 Data

(XLSX)

Data Availability

In order to obtain the dataset used for the analysis described in this study, we follow the Twitter API restrictions and do not violate any terms of the Twitter Developer Agreement and Policy. According to Twitter policy, we are not allowed to share the entire dataset, but only 100K user IDs. This dataset is available here: https://zenodo.org/record/4618233#.YGGJU2Qzada. Access is open and no approval is required. We provide the directed retweet graph from the Twitter network, all user IDs from the provided retweet graph (89.479 users), all video IDs extracted from the election-related tweets (39.203 video IDs), and the directed comment graph.

Funding Statement

This document is the result of research projects co-funded by the European Commission (project CONCORDIA, grant number 830927, European Commission Directorate-General for Communications Networks, Content and Technology) and by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE (project codes T1EDK-02857 and T1EDK-01800). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Howard PN, Woolley S, Calo R. Algorithms, bots, and political communication in the US 2016 election: The challenge of automated political communication for election law and administration. Journal of information technology & politics. 2018;15(2):81–93. doi: 10.1080/19331681.2018.1448735 [DOI] [Google Scholar]
  • 2.Baumgartner JC, Mackay JB, Morris JS, Otenyo EE, Powell L, Smith MM, et al. Communicator-in-chief: How Barack Obama used new media technology to win the White House. Lexington Books; 2010.
  • 3. Antonakaki D, Spiliotopoulos D, Samaras V C, Pratikakis P, Ioannidis S, Fragopoulou P. Social media analysis during political turbulence. PloS one. 2017;12(10):e0186836. doi: 10.1371/journal.pone.0186836 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Yaqub U, Chun SA, Atluri V, Vaidya J. Analysis of political discourse on twitter in the context of the 2016 US presidential elections. Government Information Quarterly. 2017;34(4):613–626. doi: 10.1016/j.giq.2017.11.001 [DOI] [Google Scholar]
  • 5. Enli G. Twitter as arena for the authentic outsider: exploring the social media campaigns of Trump and Clinton in the 2016 US presidential election. European Journal of Communication. 2017;32(1):50–61. doi: 10.1177/0267323116682802 [DOI] [Google Scholar]
  • 6. Gong Z, Cai T, Thill JC, Hale S, Graham M. Measuring relative opinion from location-based social media: A case study of the 2016 US presidential election. Plos one. 2020;15(5):e0233660. doi: 10.1371/journal.pone.0233660 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Huang M, Zou G, Zhang B, Liu Y, Gu Y, Jiang K. Overlapping community detection in heterogeneous social networks via the user model. Information Sciences. 2018;432:164–184. doi: 10.1016/j.ins.2017.11.001 [DOI] [Google Scholar]
  • 8. Golovchenko Y, Buntain C, Eady G, Brown MA, Tucker JA. Cross-platform state propaganda: Russian trolls on twitter and youtube during the 2016 US presidential election. The International Journal of Press/Politics. 2020;25(3):357–389. doi: 10.1177/1940161220912682 [DOI] [Google Scholar]
  • 9.Faralli S, Stilo G, Velardi P. Large scale homophily analysis in twitter using a twixonomy. In: Twenty-Fourth International Joint Conference on Artificial Intelligence; 2015.
  • 10. Hashtag homophily in twitter network: Examining a controversial cause-related marketing campaign. Computers in Human Behavior. 2020;102:87–96. doi: 10.1016/j.chb.2019.08.006 [DOI] [Google Scholar]
  • 11.Kang JH, Lerman K. Using lists to measure homophily on twitter. In: Workshops at the twenty-sixth AAAI conference on artificial intelligence; 2012.
  • 12.Barbieri N, Bonchi F, Manco G. Who to follow and why: link prediction with explanations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining; 2014. p. 1266–1275.
  • 13. Colleoni E, Rozza A, Arvidsson A. Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data. Journal of communication. 2014;64(2):317–332. doi: 10.1111/jcom.12084 [DOI] [Google Scholar]
  • 14. Himelboim I, Sweetser KD, Tinkham SF, Cameron K, Danelo M, West K. Valence-based homophily on Twitter: Network analysis of emotions and political talk in the 2012 presidential election. New media & society. 2016;18(7):1382–1400. doi: 10.1177/1461444814555096 [DOI] [Google Scholar]
  • 15.Just MR, Crigler AN, Metaxas P, Mustafaraj E.’It’s Trending on Twitter’-An Analysis of the Twitter Manipulations in the Massachusetts 2010 Special Senate Election. In: APSA 2012 Annual Meeting Paper; 2012.
  • 16. Guo L, Rohde A J, Wu HD. Who is responsible for Twitter’s echo chamber problem? Evidence from 2016 US election networks. Information, Communication & Society. 2020;23(2):234–251. doi: 10.1080/1369118X.2018.1499793 [DOI] [Google Scholar]
  • 17. Vergeer M. Twitter and political campaigning. Sociology compass. 2015;9(9):745–760. doi: 10.1111/soc4.12294 [DOI] [Google Scholar]
  • 18. Plotkowiak T, Stanoevska-Slabeva K. German politicians and their Twitter networks in the Bundestag election 2009. First Monday. 2013;. doi: 10.5210/fm.v18i5.3816 [DOI] [Google Scholar]
  • 19. Nahon K. Where there is social media there is politics. In: The Routledge companion to social media and politics. Routledge; 2015. p. 39–55. [Google Scholar]
  • 20. Vaccari C, Valeriani A, Barberá P, Jost JT, Nagler J, Tucker JA. Of echo chambers and contrarian clubs: Exposure to political disagreement among German and Italian users of Twitter. Social media+ society. 2016;2(3):2056305116664221. [Google Scholar]
  • 21. Rosenbusch H, Evans AM, Zeelenberg M. Multilevel emotion transfer on YouTube: Disentangling the effects of emotional contagion and homophily on video audiences. Social Psychological and Personality Science. 2019;10(8):1028–1035. doi: 10.1177/1948550618820309 [DOI] [Google Scholar]
  • 22. Ladhari R, Massa E, Skandrani H. YouTube vloggers’ popularity and influence: The roles of homophily, emotional attachment, and expertise. Journal of Retailing and Consumer Services. 2020;54:102027. doi: 10.1016/j.jretconser.2019.102027 [DOI] [Google Scholar]
  • 23.Wattenhofer M, Wattenhofer R, Zhu Z. The YouTube social network. In: Sixth international AAAI conference on weblogs and social media; 2012.
  • 24. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008;2008(10):P10008. doi: 10.1088/1742-5468/2008/10/P10008 [DOI] [Google Scholar]
  • 25. Martínez-Cámara E, Martín-Valdivia MT, Urena-López LA, Montejo-Ráez AR. Sentiment analysis in Twitter. Natural Language Engineering. 2014;20(1):1–28. doi: 10.1017/S1351324912000332 [DOI] [Google Scholar]
  • 26. Giachanou A, Crestani F. Like it or not: A survey of twitter sentiment analysis methods. ACM Computing Surveys (CSUR). 2016;49(2):1–41. doi: 10.1145/2938640 [DOI] [Google Scholar]
  • 27. Go A, Huang L, Bhayani R. Twitter sentiment analysis. Entropy. 2009;17:252. [Google Scholar]
  • 28.Mittal A, Goel A. Stock prediction using twitter sentiment analysis. Standford University, CS229 (2011 http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf). 2012;15:2352.
  • 29.Saif H, He Y, Alani H. Alleviating data sparsity for twitter sentiment analysis. In: CEUR Workshop proceedings. CEUR Workshop Proceedings (CEUR-WS. org). Lyon, France.: CEUR; 2012. p. 297–312.
  • 30.Wang H, Can D, Kazemzadeh A, Bar F, Narayanan S. A system for real-time twitter sentiment analysis of 2012 US presidential election cycle. In: Proceedings of the ACL 2012 system demonstrations. Jeju Island, Korea: ACL; 2012. p. 115–120.
  • 31.Diakopoulos NA, Shamma DA. Characterizing debate performance via aggregated twitter sentiment. In: Proceedings of the SIGCHI conference on human factors in computing systems; 2010. p. 1195–1198.
  • 32. Daniel M, Neves RF, Horta N. Company event popularity for financial markets using Twitter and sentiment analysis. Expert Systems with Applications. 2017;71:111–124. doi: 10.1016/j.eswa.2016.11.022 [DOI] [Google Scholar]
  • 33. Mukherjee S, Bhattacharyya P. Feature specific sentiment analysis for product reviews. In: International Conference on Intelligent Text Processing and Computational Linguistics. Springer; 2012. p. 475–487.
  • 34. Bollen J, Pepe A, Mao H. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. arXiv preprint arXiv:0911.1583; 2009.
  • 35. Gayo-Avello D. A meta-analysis of state-of-the-art electoral prediction from Twitter data. Social Science Computer Review. 2013;31(6):649–679. doi: 10.1177/0894439313493979 [DOI] [Google Scholar]
  • 36. Gayo-Avello D, Metaxas P, Mustafaraj E. Limits of Electoral Predictions Using Twitter. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM); 2011.
  • 37. Aparaschivei PA, et al. The use of new media in electoral campaigns: Analysis on the use of blogs, Facebook, Twitter and YouTube in the 2009 Romanian presidential campaign. Journal of Media Research-Revista de Studii Media. 2011;4(10):39–60. [Google Scholar]
  • 38. Effing R, van Hillegersberg J, Huibers T. Social media indicator and local elections in The Netherlands: Towards a framework for evaluating the influence of Twitter, YouTube, and Facebook. In: Social media and local governments. Springer; 2016. p. 281–298. [Google Scholar]
  • 39. Vesnic-Alujevic L, Van Bauwel S. YouTube: A political advertising tool? A case study of the use of YouTube in the campaign for the European Parliament elections. Journal of Political Marketing. 2014;13(3):195–212. doi: 10.1080/15377857.2014.929886 [DOI] [Google Scholar]
  • 40. Baviera T, Sampietro A, García-Ull FJ. Political conversations on Twitter in a disruptive scenario: The role of “party evangelists” during the 2015 Spanish general elections. The Communication Review. 2019;22(2):117–138. doi: 10.1080/10714421.2019.1599642 [DOI] [Google Scholar]
  • 41. Panizo-LLedot A, Torregrosa J, Bello-Orgaz G, Thorburn J, Camacho D. Describing alt-right communities and their discourse on Twitter during the 2018 US mid-term elections. In: International Conference on Complex Networks and Their Applications. Springer; 2019. p. 427–439.
  • 42. Hanusch F, Nölleke D. Journalistic Homophily on Social Media: Exploring journalists’ interactions with each other on Twitter. Digital Journalism. 2018;7:1–23. [Google Scholar]
  • 43. Park SJ, Lim YS, Park HW. Comparing Twitter and YouTube networks in information diffusion: The case of the “Occupy Wall Street” movement. Technological forecasting and social change. 2015;95:208–217. doi: 10.1016/j.techfore.2015.02.003 [DOI] [Google Scholar]
  • 44. Burgess J, Matamoros-Fernández A. Mapping sociocultural controversies across digital media platforms: One week of# gamergate on Twitter, YouTube, and Tumblr. Communication Research and Practice. 2016;2(1):79–96. doi: 10.1080/22041451.2016.1155338 [DOI] [Google Scholar]
  • 45. Wu Z, Yin W, Cao J, Xu G, Cuzzocrea A. Community detection in multi-relational social networks. In: International Conference on Web Information Systems Engineering. Springer; 2013. p. 43–56.
  • 46. Caldarelli G, Chessa A, Pammolli F, Pompa G, Puliga M, Riccaboni M, et al. A multi-level geographical study of Italian political elections from Twitter data. PloS one. 2014;9(5):e95809. doi: 10.1371/journal.pone.0095809 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Eom YH, Puliga M, Smailović J, Mozetič I, Caldarelli G. Twitter-based analysis of the dynamics of collective attention to political parties. PloS one. 2015;10(7):e0131184. doi: 10.1371/journal.pone.0131184 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Tumasjan A, Sprenger T, Sandner P, Welpe I. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In: Proceedings of the International AAAI Conference on Web and Social Media. vol. 4; 2010. p. 22.
  • 49. Pennacchiotti M, Popescu AM. A machine learning approach to Twitter user classification. In: Proceedings of the International AAAI Conference on Web and Social Media. vol. 5; 2011. p. 560–734.
  • 50. Pennacchiotti M, Popescu AM. Democrats, republicans and starbucks afficionados: user classification in Twitter. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2011. p. 430–438.
  • 51. Barbera P, Casas A, Nagler J, Egan P, Bonneau R, Jost J, et al. Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data. American Political Science Review. 2019;113(4):883–901. doi: 10.1017/S0003055419000352 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Zhang Y, Wells C, Wang S, Rohe K. Attention and amplification in the hybrid media system: The composition and activity of Donald Trump’s Twitter following during the 2016 presidential election. New Media & Society. 2018;20(9):3161–3182. doi: 10.1177/1461444817744390 [DOI] [Google Scholar]
  • 53. Caetano JA, Lima HS, Santos MF, Marques-Neto HT. Using sentiment analysis to define twitter political users’ classes and their homophily during the 2016 American presidential election. Journal of internet services and applications. 2018;9(1):1–15. doi: 10.1186/s13174-018-0089-0 [DOI] [Google Scholar]
  • 54. Halberstam Y, Knight B. Homophily, group size, and the diffusion of political information in social networks: Evidence from Twitter. Journal of public economics. 2016;143:73–88. doi: 10.1016/j.jpubeco.2016.08.011 [DOI] [Google Scholar]
  • 55. Dokoohaki N, Zikou F, Gillblad D, Matskin M. Predicting Swedish elections with Twitter: A case for stochastic link structure analysis. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2015); 2015. p. 1269–1276.
  • 56. Chamberlain JM, Spezzano F, Kettler JJ, Dit B. A Network Analysis of Twitter Interactions by Members of the US Congress. ACM Transactions on Social Computing. 2021;4(1):1–22. doi: 10.1145/3439827 [DOI] [Google Scholar]
  • 57. Guerrero-Solé F. Community detection in political discussions on Twitter: An application of the retweet overlap network method to the Catalan process toward independence. Social science computer review. 2017;35(2):244–261. doi: 10.1177/0894439315617254 [DOI] [Google Scholar]
  • 58. Baviera T. Influence in the political Twitter sphere: Authority and retransmission in the 2015 and 2016 Spanish General Elections. European journal of communication. 2018;33(3):321–337. doi: 10.1177/0267323118763910 [DOI] [Google Scholar]
  • 59. Davidson I, Gourru A, Velcin J, Wu Y. Behavioral differences: insights, explanations and comparisons of French and US Twitter usage during elections. Social Network Analysis and Mining. 2020;10(1):1–27. doi: 10.1007/s13278-019-0611-9 [DOI] [Google Scholar]
  • 60. Aragon P, Kappler KE, Kaltenbrunner A, Laniado D, Volkovich Y. Communication dynamics in twitter during political campaigns: The case of the 2011 Spanish national election. Policy & internet. 2013;5(2):183–206. doi: 10.1002/1944-2866.POI327 [DOI] [Google Scholar]
  • 61. D’heer E, Verdegem P. Conversations about the elections on Twitter: Towards a structural understanding of Twitter’s relation with the political and the media field. European journal of communication. 2014;29(6):720–734. doi: 10.1177/0267323114544866 [DOI] [Google Scholar]
  • 62. Bekafigo MA, McBride A. Who tweets about politics? Political participation of Twitter users during the 2011 gubernatorial elections. Social Science Computer Review. 2013;31(5):625–643. doi: 10.1177/0894439313490405 [DOI] [Google Scholar]
  • 63. Jungherr A, Jürgens P, Schoen H. Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T.O., Sander, P.G., & Welpe, I.M. “Predicting elections with Twitter: What 140 characters reveal about political sentiment”. Social Science Computer Review. 2012;30(2):229–234. [Google Scholar]
  • 64. O’Connor B, Balasubramanyan R, Routledge BR, Smith NA. From tweets to polls: Linking text sentiment to public opinion time series. Tepper School of Business. 2010;344:559. [Google Scholar]
  • 65. Antonakaki D, Spiliotopoulos D, Samaras CV, Ioannidis S, Fragopoulou P. Investigating the complete corpus of referendum and elections tweets. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). San Francisco, CA, USA: IEEE; 2016. p. 100–105.
  • 66. Conover MD, Ratkiewicz J, Francisco MR, Gonçalves B, Menczer F, Flammini A. Political polarization on Twitter. ICWSM. 2011;133(26):89–96. [Google Scholar]
  • 67. Weber I, Garimella VRK, Batayneh A. Secular vs. Islamist polarization in Egypt on Twitter. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. Niagara, Ontario, Canada: IEEE; 2013. p. 290–297.
  • 68. Morales AJ, Borondo J, Losada JC, Benito RM. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2015;25(3):033114. doi: 10.1063/1.4913758 [DOI] [PubMed] [Google Scholar]
  • 69. Christensen C. Wave-riding and hashtag-jumping. Information, Communication & Society. 2013;16(5):646–666. doi: 10.1080/1369118X.2013.783609 [DOI] [Google Scholar]
  • 70. Bakshi RK, Kaur N, Kaur R, Kaur G. Opinion mining and sentiment analysis. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). New Delhi, India: IEEE; 2016. p. 452–455.
  • 71. Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems. 2015;89:14–46. doi: 10.1016/j.knosys.2015.06.015 [DOI] [Google Scholar]
  • 72. Serrano-Guerrero J, Olivas JA, Romero FP, Herrera-Viedma E. Sentiment analysis: A review and comparative analysis of web services. Information Sciences. 2015;311:18–38. doi: 10.1016/j.ins.2015.03.040 [DOI] [Google Scholar]
  • 73. Antonakaki D, Fragopoulou P, Ioannidis S. A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks. Expert Systems with Applications. 2020;164:114006. doi: 10.1016/j.eswa.2020.114006 [DOI] [Google Scholar]
  • 74. Wehrmann J, Becker W, Cagnini HE, Barros RC. A character-based convolutional neural network for language-agnostic Twitter sentiment analysis. In: 2017 International Joint Conference on Neural Networks (IJCNN). Anchorage, AK, USA: IEEE; 2017. p. 2384–2391.
  • 75. Narr S, Hulfenhaus M, Albayrak S. Language-independent Twitter sentiment analysis. Knowledge Discovery and Machine Learning (KDML), LWA. 2012;89898:12–14. [Google Scholar]
  • 76. Davies A, Ghahramani Z. Language-independent Bayesian sentiment mining of Twitter. In: The 5th SNA-KDD Workshop’11 (SNA-KDD’11). University of California: SNA-KDD; 2011. p. 56–58.
  • 77. Guthier B, Ho K, Saddik AE. Language-independent data set annotation for machine learning-based sentiment analysis. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Banff, AB, Canada: IEEE; 2017. p. 2105–2110.
  • 78. Saroufim C, Almatarky A, Hady MA. Language independent sentiment analysis with sentiment-specific word embeddings. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Brussels, Belgium: ACL; 2018. p. 14–23.
  • 79. Ptáček T, Habernal I, Hong J. Sarcasm detection on Czech and English Twitter. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland: ACL; 2014. p. 213–223.
  • 80. Zhang S, Zhang X, Chan J. A Word-Character Convolutional Neural Network for Language-Agnostic Twitter Sentiment Analysis. In: Proceedings of the 22nd Australasian Document Computing Symposium. ADCS 2017. New York, NY, USA: Association for Computing Machinery; 2017. Available from: https://doi.org/10.1145/3166072.3166082. [DOI]
  • 81. Severyn A, Moschitti A. Twitter Sentiment Analysis with Deep Convolutional Neural Networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR’15. New York, NY, USA: Association for Computing Machinery; 2015. p. 959–962. Available from: https://doi.org/10.1145/2766462.2767830. [DOI]
  • 82. Jianqiang Z, Xiaolin G, Xuejun Z. Deep convolution neural networks for twitter sentiment analysis. IEEE Access. 2018;6:23253–23260. doi: 10.1109/ACCESS.2017.2776930 [DOI] [Google Scholar]
  • 83. You Q, Luo J, Jin H, Yang J. Robust image sentiment analysis using progressively trained and domain transferred deep networks. arXiv preprint arXiv:1509.06041. 2015;3:270–279.
  • 84. Dos Santos C, Gatti M. Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland: COLING; 2014. p. 69–78.
  • 85. Alharbi ASM, de Doncker E. Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information. Cognitive Systems Research. 2019;54:50–61. doi: 10.1016/j.cogsys.2018.10.001 [DOI] [Google Scholar]
  • 86. Severyn A, Moschitti A. Unitn: Training deep convolutional neural network for Twitter sentiment classification. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Denver, Colorado: SIGLEX/SIGSEM; 2015. p. 464–469.
  • 87. Liu KL, Li WJ, Guo M. Emoticon smoothed language models for Twitter sentiment analysis. In: AAAI. vol. 12; 2012. p. 22–26.
  • 88. Wang H, Castanon JA. Sentiment expression via emoticons on social media. In: 2015 IEEE International Conference on Big Data (Big Data). Santa Clara, CA, USA: IEEE; 2015. p. 2404–2408.
  • 89. Zhao J, Dong L, Wu J, Xu K. MoodLens: An Emoticon-Based Sentiment Analysis System for Chinese Tweets. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD’12. New York, NY, USA: Association for Computing Machinery; 2012. p. 1528–1531.
  • 90. Yamamoto Y, Kumamoto T, Nadamoto A. Role of Emoticons for Multidimensional Sentiment Analysis of Twitter. In: Proceedings of the 16th International Conference on Information Integration and Web-Based Applications and Services. iiWAS’14. New York, NY, USA: Association for Computing Machinery; 2014. p. 107–115.
  • 91. Kolchyna O, Souza TT, Treleaven P, Aste T. Twitter sentiment analysis: Lexicon method, machine learning method and their combination. arXiv preprint arXiv:1507.00955. 2015;5656:33–38.
  • 92. Pak A, Paroubek P. Twitter as a corpus for sentiment analysis and opinion mining. In: LREC. vol. 10. Valletta, Malta; 2010. p. 1320–1326. [Google Scholar]
  • 93. Jianqiang Z, Xiaolin G. Comparison research on text pre-processing methods on twitter sentiment analysis. IEEE Access. 2017;5:2870–2879. doi: 10.1109/ACCESS.2017.2672677 [DOI] [Google Scholar]
  • 94. Ghiassi M, Lee S. A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach. Expert Systems with Applications. 2018;106:197–216. doi: 10.1016/j.eswa.2018.04.006 [DOI] [Google Scholar]
  • 95. Kouloumpis E, Wilson T, Moore JD. Twitter sentiment analysis: The good the bad and the omg! Icwsm. 2011;11(538-541):164. [Google Scholar]
  • 96. Zhang L, Ghosh R, Dekhil M, Hsu M, Liu B. Combining lexicon-based and learning-based methods for Twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011. 2011;89. [Google Scholar]
  • 97. Smailović J, Kranjc J, Grčar M, Žnidaršič M, Mozetič I. Monitoring the Twitter sentiment during the Bulgarian elections. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). Paris: IEEE; 2015. p. 1–10.
  • 98. Singh S, Sikka G. YouTube Sentiment Analysis on US Elections 2020. In: 2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC). IEEE; 2021. p. 250–254.
  • 99. Bajaj P, Kavidayal M, Srivastava P, Akhtar MN, Kumaraguru P. Disinformation in multimedia annotation: Misleading metadata detection on YouTube. In: Proceedings of the 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion; 2016. p. 53–61.
  • 100. O’Callaghan D, Harrigan M, Carthy J, Cunningham P. Network analysis of recurring YouTube spam campaigns. arXiv preprint arXiv:1201.3783; 2012.
  • 101. Sureka A, Kumaraguru P, Goyal A, Chhabra S. Mining YouTube to discover extremist videos, users and hidden communities. In: Asia Information Retrieval Symposium. Springer; 2010. p. 13–24. [Google Scholar]
  • 102. Baluja S, Seth R, Sivakumar D, Jing Y, Yagnik J, Kumar S, et al. Video suggestion and discovery for YouTube: taking random walks through the view graph. In: Proceedings of the 17th International Conference on World Wide Web; 2008. p. 895–904.
  • 103. Jang HJ, Sim J, Lee Y, Kwon O. Deep sentiment analysis: Mining the causality between personality-value-attitude for analyzing business ads in social media. Expert Systems with applications. 2013;40(18):7492–7503. doi: 10.1016/j.eswa.2013.06.069 [DOI] [Google Scholar]
  • 104. Klotz RJ. The sidetracked 2008 YouTube senate campaign. Journal of Information Technology & Politics. 2010;7(2-3):110–123. doi: 10.1080/19331681003748917 [DOI] [Google Scholar]
  • 105. Ridout TN, Franklin Fowler E, Branstetter J. Political advertising in the 21st century: The rise of the YouTube ad. In: APSA 2010 Annual Meeting Paper; 2010.
  • 106. Severyn A, Moschitti A, Uryupina O, Plank B, Filippova K. Multi-lingual opinion mining on YouTube. Information Processing & Management. 2016;52(1):46–60. doi: 10.1016/j.ipm.2015.03.002 [DOI] [Google Scholar]
  • 107. Susarla A, Oh JH, Tan Y. Social networks and the diffusion of user-generated content: Evidence from YouTube. Information Systems Research. 2012;23(1):23–41. doi: 10.1287/isre.1100.0339 [DOI] [Google Scholar]
  • 108. Wang X, Wei F, Liu X, Zhou M, Zhang M. Topic sentiment analysis in Twitter: a graph-based hashtag sentiment classification approach. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management; 2011. p. 1031–1040.
  • 109. Thelwall M, Sud P, Vis F. Commenting on YouTube videos: From Guatemalan rock to el big bang. Journal of the American Society for Information Science and Technology. 2012;63(3):616–629. doi: 10.1002/asi.21679 [DOI] [Google Scholar]
  • 110. Lindgren S. ‘It took me about half an hour, but I did it!’ Media circuits and affinity spaces around how-to videos on YouTube. European Journal of Communication. 2012;27(2):152–170. doi: 10.1177/0267323112443461 [DOI] [Google Scholar]
  • 111. Krishna A, Zambreno J, Krishnan S. Polarity Trend Analysis of Public Sentiment on YouTube. In: Proceedings of the 19th International Conference on Management of Data. COMAD’13. Mumbai, Maharashtra, IND: Computer Society of India; 2013. p. 125–128.
  • 112. Poria S, Cambria E, Howard N, Huang GB, Hussain A. Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing. 2016;174:50–59. doi: 10.1016/j.neucom.2015.01.095 [DOI] [Google Scholar]
  • 113. Amarasekara I, Grant WJ. Exploring the YouTube science communication gender gap: A sentiment analysis. Public Understanding of Science. 2019;28(1):68–84. doi: 10.1177/0963662518786654 [DOI] [PubMed] [Google Scholar]
  • 114. Preoţiuc-Pietro D, Volkova S, Lampos V, Bachrach Y, Aletras N. Studying user income through language, behaviour and affect in social media. PloS one. 2015;10(9):e0138717. doi: 10.1371/journal.pone.0138717 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Araújo CS, Magno G, Meira W, Almeida V, Hartung P, Doneda D. Characterizing videos, audience and advertising in YouTube channels for kids. In: International Conference on Social Informatics. Springer; 2017. p. 341–359.
  • 116. Wikipedia. 2020 United States presidential debates; 2020 (accessed September 30, 2020). Available from: https://en.wikipedia.org/wiki/2020_United_States_presidential_debates.
  • 117. Pano T, Kashef R. A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19. Big Data and Cognitive Computing. 2020;4(4):33. Available from: https://www.mdpi.com/2504-2289/4/4/33. doi: 10.3390/bdcc4040033 [DOI] [Google Scholar]
  • 118. Elbagir S, Yang J. Twitter sentiment analysis using natural language toolkit and VADER sentiment. In: Proceedings of the International MultiConference of Engineers and Computer Scientists. vol. 122; 2019. p. 16.
  • 119. Zahoor S, Rohilla R. Twitter Sentiment Analysis Using Lexical or Rule Based Approach: A Case Study. In: 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO); 2020. p. 537–542.
  • 120. Ramteke J, Shah S, Godhia D, Shaikh A. Election result prediction using Twitter sentiment analysis. In: 2016 International Conference on Inventive Computation Technologies (ICICT). vol. 1. IEEE; 2016. p. 1–5.
  • 121. Shelar A, Huang CY. Sentiment analysis of Twitter data. In: 2018 International Conference on Computational Science and Computational Intelligence (CSCI). IEEE; 2018. p. 1301–1302.
  • 122. Hutto CJ, Gilbert E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In: Eighth International Conference on Weblogs and Social Media (ICWSM-14). Available at (20/04/16) http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf. vol. 81; 2014. p. 82.
  • 123. Bird S, Klein E, Loper E. Natural Language Processing with Python. O’Reilly Media; 2009. [Google Scholar]
  • 124. Bastian M, Heymann S, Jacomy M. Gephi: An Open Source Software for Exploring and Manipulating Networks. In: Third International AAAI Conference on Weblogs and Social Media; 2009. Available from: http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154.
  • 125. Hagberg A, Schult D, Swart P. NetworkX: Python library for graph creation and visualization; 2005. Available from: https://networkx.github.io/.
  • 126. Wikipedia. List of U.S. state and territory abbreviations; 2020 (accessed September 29, 2020). Available from: https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations.

Decision Letter 0

Davide Bacciu

3 Mar 2021

PONE-D-20-37937

What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020

PLOS ONE

Dear Dr. Antonakaki,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 08 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Davide Bacciu

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please include additional information about your dataset and ensure that you have included a statement specifying whether the collection method complied with the terms and conditions for the website.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

4. Please ensure that you refer to Figure state pred wave.png in your text as, if accepted, production will need this reference to link the reader to the figure.

Additional Editor Comments:

The paper has been well received and the use of multiple social networks has been appreciated. Methodology appears adequate, though not innovative.

The manuscript requires a round of revision to better tune the claims in the paper and better substantiate the expected contributions with what is realistically demonstrated in the empirical part.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: -------

General summary

-------

The authors analyze the online discourse on the 2020 presidential elections in the United States. After collecting Twitter and YouTube data via a supervised procedure, the authors present the results of various data-driven analyses, including daily traffic, hashtag trends, volume, sentiment, and graph analysis. The work also focuses on studying the sentiment towards the two candidates and the effect of offline events on online social networks.

-------

Main successes and main problems

-------

The analyses represent a twofold snapshot of a brief and well-defined historical moment at a fine-grained resolution. The work follows a logical and clear procedure, though perhaps at times a little superficial.

The promises have been fulfilled and the results are what could be expected.

I appreciated the joint use of two different social networks, Twitter and YouTube, to observe differences in discussions and communities. Perhaps a brief consideration of the user bias of the two platforms could be added.

The methodological part reserved for sentiment analysis and community detection consists of the application of already implemented and documented methods, i.e., VADER and Louvain. Consequently, the methodological contribution is limited.

-------

Major changes

-------

I would suggest that the authors balance the presentation and contextualization of the arguments, in favor of a correct valorisation of the work and distribution of expectations.

Section 3.5 appears not to be adequately supplemented, argued and defined. In addition to the Introduction, I would ask authors to contextualize the topic in Section 1.1. Check terminology.

The state of the art (Section 1.1) mentions numerous recent articles. However, the literature is limited to sentiment analysis, with hints of the consequences of real events on Twitter and YouTube data. I think it is necessary to contextualize network analysis, e.g., the theory/use of (characteristics of) networks and communities.

-------

Minor changes

-------

There is plenty of food for thought that could enrich Section 3.2. This includes emoticons, irony, swear words, bad language, youth language, and acronyms.

Please remove the claim of precedence from the abstract. In this case, the claim is unnecessary and hard to confirm.

-------

Small corrections

-------

- Capitalize “figure”, “table” and “section” when followed by a reference number, e.g., lines 106 and 155.

- Some missing spaces, e.g., lines 71, 208, 203, and the captions of Tables 3 and 4.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jan 31;18(1):e0270542. doi: 10.1371/journal.pone.0270542.r002

Author response to Decision Letter 0


31 May 2021

The response to the review comments has been included in the rebuttal letter, which has been uploaded as a separate file.

Attachment

Submitted filename: response_to_reviewers_round2.pdf

Decision Letter 1

Davide Bacciu

30 Jul 2021

PONE-D-20-37937R1

What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020

PLOS ONE

Dear Dr. Antonakaki,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper has been reviewed by an additional expert who provides a quite detailed assessment of the key strengths and weaknesses of the work. These should be carefully taken into consideration by the authors should they decide to send a revised manuscript, as the work will be checked against these open points. Namely, the review highlights that the paper tackles an interesting cross-platform approach and a potentially impactful application to political discourse. The main criticisms are instead directed towards an unsatisfactory consideration of the background work and the lack of a clear, homogeneous, and strong articulation of the key points of innovation in the analysis proposed by this paper.

Please submit your revised manuscript by Sep 13 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Davide Bacciu

Academic Editor

PLOS ONE



Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: # Summary

This paper presents a primarily descriptive analysis on a pair of datasets collected from Twitter and YouTube during the 2020 US presidential election. Analyses include descriptions of the time series data extracted from both platforms, plots of several retweet-based interaction networks over time, sentiment analysis of messages and comments on Twitter and YouTube respectively, event identification, and Twitter- and YouTube-based community analysis. Results show temporal variation in volume and senitment and suggest some consistency across Twitter and YouTube.

# Strengths

This paper's major strength comes from its potential for cross-platform analysis. Extracting YouTube videos from Twitter content and comparing temporal and sentiment patterns across platforms can provide interesting insight about cross-platform and event-based influence.

# Weaknesses

The most significant weaknesses in this paper are its lack of clear contributions to the field and the (relatedly) limited connection it has with prior work studying political discourse in the lead-up to elections on social media. For example, there's a huge body of literature on Twitter and politics in general [1], and the 2016 US presidential election specifically [2-3], so how does that work inform what we might expect to find in Twitter and YouTube during the 2020 election? Are any of the results identified in the four months leading up to Election Day in 2020 surprising given what we know about 2016? Even for 2020, other work has been done on YouTube and sentiment (e.g., Singh and Sikka [4]), so how consistent are these results with those from other papers in this area?

- While the paper's related work section does identify a number of related efforts in this area, more needs to be done to explain what *important* questions this paper answers that have not already been answered. That is, we need to know not just how this paper investigates a slightly different context than the other papers, but how those other papers inform this work. Does this work contradict existing results or expectations? Does the existing body of work contradict itself, and this paper resolves this contradiction? These are crucial questions that need to be answered to connect this work to a particular community and demonstrate what the most important result in this paper really is.

- Related to above, the paper lacks a clear set of hypotheses about what we might expect in this context. This work is not exploratory given the volume of work in this area, and providing a set of hypotheses based on the literature would help us understand the important results in this work.

Beyond this contextualization, data collection and validity of the data collected in this effort are unclear. How exactly was this dataset collected from Twitter? I understand the API was used, but was the streaming API used or the retrospective API? If the collection was built around particular keywords, what were those keywords, and how were they selected? The paper mentions a list of hashtags, but it appears this list is extracted from the collected dataset, not a set of tags used to develop the dataset. We need additional detail about how this dataset was collected to assess what is missing. For example, popular hashtags like #StopTheSteal, #HidingBiden, or other partisan hashtags do not appear in this data.

- To address this issue, the paper should clearly explain exactly how the data was collected and why this timeframe is reasonable.

The paper also claims to present an analysis of sentiment per US state, but no information is provided about *how* this analysis maps messages to US states. Geolocation in social media is a known and open problem, and inclusion of this analysis necessitates a description of how this geolocation is performed.

Other comments:

- The analysis of retweet graphs is primarily qualitative and lacks a compelling frame. What did we expect to see here? How do we know the differences observed between the two candidates are meaningful?

- Sentiment analysis in Vader is not specifically built for social media; the paper should spend some effort demonstrating its assessment of sentiment towards the presidential candidates is valid.

- More detail is needed for entity extraction. Is the search for trump/biden a direct token match or a substring match? E.g., would it match a tweet with #TeachersForBiden? Also, tweets with tags like #VoteBlue may be specific to Biden but not directly mention him; how does that possibility impact the analyses?

- In general, node-edge diagrams are not particularly useful visualizations. Additional effort is needed to describe what is interesting about the communities in these figures, what size means, why certain nodes are labeled and others are not, etc.

- The paper is missing a "Data Availability Statement" in the text.

# Final Comments

I applaud the authors for engaging with an important set of issues (namely political communications in online spaces), and I especially like the potential for cross-platform analyses and connections.

That said, the paper presents an exploratory and descriptive piece about a topic that has a lot of background, and the paper does not sufficiently engage with this background. Given the volume of work in this area, some of which overlaps with this work, the current version of the paper does not have a clear path toward a major contribution. Instead, it feels like the paper is primarily a collection of different, unconnected analyses that lack clear motivation.

I encourage the authors to continue this work and find ways to build on the prior background in this space.

# Related Work

[1] BARBERÁ, P., CASAS, A., NAGLER, J., EGAN, P., BONNEAU, R., JOST, J., & TUCKER, J. (2019). Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data. American Political Science Review, 113(4), 883-901. doi:10.1017/S0003055419000352

[2] Sahly A, Shao C, Kwon KH. Social Media for Political Campaigns: An Examination of Trump’s and Clinton’s Frame Building and Its Effect on Audience Engagement. Social Media + Society. April 2019. doi:10.1177/2056305119855141

[3] Zhang Y, Wells C, Wang S, Rohe K. Attention and amplification in the hybrid media system: The composition and activity of Donald Trump’s Twitter following during the 2016 presidential election. New Media & Society. 2018;20(9):3161-3182. doi:10.1177/1461444817744390

[4] S. Singh and G. Sikka, "YouTube Sentiment Analysis on US Elections 2020," 2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC), 2021, pp. 250-254, doi: 10.1109/ICSCCC51823.2021.9478128.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jan 31;18(1):e0270542. doi: 10.1371/journal.pone.0270542.r004

Author response to Decision Letter 1


12 Oct 2021

Response to the reviewers

Review comment 1) The most significant weaknesses in this paper are its lack of clear contributions to the field and the (relatedly) limited connection it has with prior work studying political discourse in lead-up to elections on social media. For example, there's a huge body of literature on Twitter and politics in general [1], and the 2016 US presidential election specifically [2-3], so how does that work inform what we might expect to find in Twitter and YouTube during the 2020 election? Are any of the results identified in the four months leading up to Election Day in 2020 surprising given what we know about 2016? Even for 2020, other work has been done on YouTube and sentiment (e.g., Singh and Sikka [4]), so how consistent are these results with those from other papers in this area?

The Abstract, Introduction and Background sections have been reworked, and we have included many cross-platform studies.

We have also included the papers mentioned in the review comment, among other new background work that we have added. [1] is referenced in the subsection “Political content analysis on Twitter” of the “Background” section, [2] is referenced in the first paragraph of “Background”, [3] is referenced in the first paragraph of the subsection “Political content analysis on Twitter” in “Background”, and [4] is referenced in the first paragraph of the subsection “Content analysis on YouTube” in “Background”.

Regarding the results in the four months leading to Election Day 2020, our analysis cannot be directly correlated with previous work done for 2016. The nature of the two analyses is different, and the datasets in the previous studies (on 2016) seem to be limited, in terms of the number of tweets as well as in covering multiple social media. In these terms we cannot perform a comparison between the two studies. More specifically, in [1, Barberá, Casas et al.] they use a dataset of tweets sent by the members of the 113th House and Senate of the US Congress (2013–14), while our dataset contains tweets from the popular hashtags available, which means they could be sent by any users (including the electorate) and potentially by members of the House and Senate. Additionally, they do not include data from other social networks (like YouTube in our study), and the pipeline of the analysis is completely different (e.g. they apply the probabilistic model LDA, while we do not apply topic analysis at all).

The same applies to the related studies on 2020 (e.g., Singh and Sikka [4]). Specifically, they apply their analysis to 200 YouTube comments, while our analysis includes 20M tweets and the comments of 29K YouTube videos. The nature of the two analyses differs in many respects, since we apply a comparative analysis between the two social networks (Twitter and YouTube), and sentiment analysis is just a necessary step of the pipeline towards this goal, while the authors in [4] focus solely on the sentiment analysis of YouTube comments.

Finally, regarding [2], their study is closer to our analysis, except that they compare Twitter and Facebook, while our analysis focuses on Twitter and YouTube. Also, they focus on emotional frames, while we demonstrate how the communities correlate between Twitter and YouTube.

[The last two paragraphs are not included in the manuscript, since they are a response to the review and out of the context of the study.]

-Review comment 2) While the paper's related work section does identify a number of related efforts in this area, more needs to be done to explain what *important* questions this paper answers that have not already been answered. That is, we need to know not just how this paper investigates a slightly different context than the other papers, but how those other papers inform this work. Does this work contradict existing results or expectations? Does the existing body of work contradict itself, and this paper resolves this contradiction? These are crucial questions that need to be answered to connect this work to a particular community and demonstrate what the most important result in this paper really is.

Related to above, the paper lacks a clear set of hypotheses about what we might expect in this context. This work is not exploratory given the volume of work in this area, and providing a set of hypotheses based on the literature would help us understand the important results in this work.

We have reformed the paper including the abstract, the introduction, and the background work, highlighting our reshaped contributions that focus on the correlation and interactions between Twitter and YouTube communities, which actually is our novelty.

-Review comment 3) Beyond this contextualization, data collection and validity of the data collected in this effort are unclear. How exactly was this dataset collected from Twitter? I understand the API was used, but was the streaming API used or the retrospective API? If the collection was built around particular keywords, what were those keywords, and how were they selected? The paper mentions a list of hashtags, but it appears this list is extracted from the collected dataset, not a set of tags used to develop the dataset. We need additional detail about how this dataset was collected to assess what is missing. For example, popular hashtags like #StopTheSteal, #HidingBiden, or other partisan hashtags do not appear in this data.

To address this issue, the paper should clearly explain exactly how the data was collected and why this timeframe is reasonable.

We address this comment in the “Dataset” section, where the Twitter and YouTube dataset collection procedure is described.

Considering the hashtags used for our analysis, we only show in the manuscript the list of the 20 most popular hashtags in our dataset. The entire list contains 585,486 unique hashtags, which is too long to be added to the manuscript; it is included in the submission as a separate file (all_hashtags.txt), which contains all the hashtags used and the number of tweets corresponding to each hashtag in our dataset.

We added in the manuscript the following paragraph:

We considered this date a reasonable starting point for collecting our dataset, since the number of tweets in the corresponding hashtags began to accumulate significantly, and the semantics of the content became more relevant to the conversation around the elections.

Regarding the completeness of the content covered by our dataset, we acknowledge that additional minor hashtags may have existed during that period that were not crawled and included in our corpus. We consider our dataset a complete record of the online discourse, since we collected the majority of the hashtags available in the period before the elections. Additionally, there was an overlap between the hashtags in the election discourse, and these hashtags were cross-referenced in the tweets. For example, some tweets included both the popular hashtags (\#Vote) and the minor hashtags; such a tweet is included in our dataset because of \#Vote. Additionally, we included only the hashtags that were general and not in favor of a particular candidate. We tried to include only two hashtags in favor of each candidate, keeping a balance between them and not introducing a bias towards either of them. Finally, the bulk of the political conversation was gathered under the popular hashtags, and the rest do not contain a significant number of tweets. In the appendix, in table \ref{table:allhashtags}, we show the list of the 20 most popular hashtags in our dataset. The entire list contains 585,486 unique hashtags, which is too long to be added to the manuscript.

We also included in our submission the whole list of hashtags crawled for this dataset [filename: all_hashtags.txt].
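To make the file format concrete, the per-hashtag tweet counts behind a file like all_hashtags.txt could be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the authors' actual pipeline; it assumes tweets are dictionaries in the Twitter API v1.1 JSON format, where hashtags are listed under entities.hashtags.

from collections import Counter

def hashtag_counts(tweets):
    """Count, per hashtag, how many tweets mention it (case-insensitive)."""
    counts = Counter()
    for tweet in tweets:
        tags = {h["text"].lower()
                for h in tweet.get("entities", {}).get("hashtags", [])}
        counts.update(tags)  # each hashtag is counted once per tweet
    return counts

# Example: hashtag_counts(tweets).most_common(20) would reproduce a
# "20 most popular hashtags" table like the one in the manuscript.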

-Review comment 4) The paper also claims to present an analysis of sentiment per US state, but no information is provided about *how* this analysis maps messages to US states. Geolocation in social media is a known and open problem, and the inclusion of this analysis necessitates a description of how this geolocation is performed.

This comment is addressed in the “Sentiment Analysis” section.

As it appears in the manuscript:

In order to identify the location of the user accounts, we extract this information from the corresponding field named 'location' in the Twitter user object. Additionally, we supplement this information with the location field in the tweet object of the Twitter API. We acknowledge that the first field may not be kept up to date by the user, and that the second field could be missing because the user did not consent to sharing their location. This means that the location for many users could be missing. Nevertheless, this is the closest information we have from Twitter about the user location. We notice the daily fluctuations for every state per entity (blue is the entity 'Biden' and red is for 'Trump'). The juxtaposition of the time series, in a form resembling an EEG, makes it easier to discern localized events from nation-wide Twitter traffic.

The list of state abbreviations can be found here: \cite{states_abbr}.
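
As an illustration only, a minimal sketch of how a tweet could be mapped to a US state from these two fields follows; the state table is truncated, and the matching heuristic is an assumption for illustration, not the authors' exact procedure.

    # Minimal sketch (assumed heuristic, not the exact pipeline): resolve a
    # US state abbreviation from the tweet 'place' object and, failing that,
    # from the free-text profile 'location' field.
    from typing import Optional

    US_STATES = {  # truncated for brevity; the full table has 50 entries
        "alabama": "AL", "california": "CA", "florida": "FL",
        "new york": "NY", "texas": "TX",
    }

    def resolve_state(user_location: str,
                      tweet_place: Optional[dict] = None) -> Optional[str]:
        # 1. Prefer tweet-level geolocation, when the user shared it.
        if tweet_place and tweet_place.get("country_code") == "US":
            # 'full_name' is typically "City, ST", e.g. "Austin, TX".
            abbr = tweet_place.get("full_name", "").rsplit(",", 1)[-1].strip().upper()
            if abbr in US_STATES.values():
                return abbr
        # 2. Fall back to the self-reported profile location.
        text = (user_location or "").lower()
        for name, abbr in US_STATES.items():
            if name in text or text.endswith(f", {abbr.lower()}"):
                return abbr
        return None  # location missing or unresolvable
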

-Review comment 5) The analysis of retweet graphs are primarily qualitative and lack a compelling frame. What did we expect to see here? How do we know the differences observed between the two candidates is meaningful?

As written also in the manuscript:

The retweet graphs show the volume of the tweets and the relations between the nodes, which represent the users. This apposition of the graphs over the whole pre-election period demonstrates how the volume of communication and interaction between the involved users evolves. The results were expected: the density of the graph increases, meaning that the volume of tweets increases and the discourse grows more intense as we approach the election date.

The discussion around each candidate evolves through time as well. The corresponding hashtag for each candidate is mentioned as long as the electorate references them in the discourse. Of course, this does not necessarily indicate a preference towards Trump or Biden. As also mentioned in section \ref{dataset}, this analysis does not aim to predict the preference of the electorate or the outcome of the elections; rather, we examine how users engage in the online conversation. As we get closer to the election date, we notice components forming in the graph that show the conversations about each candidate, and the conversation about both of them, in figure 11 [or in \ref{retweet_graph}], in the last bottom sub-graph (f).

Additionally, we see two main graph components throughout the whole period that change colour according to the daily events. For example, in the bottom left subgraph (e) we notice that the colour is red in both components, while in the bottom right subgraph (f) one component is painted blue (for J. Biden). This happens because the left graph reflects the fact that Trump was infected with COVID-19 on 2/10, so the conversation about Trump was more intense and increased in terms of tweets. Also, the middle right plot shows the day after the debate, whose mixed red-blue colour makes sense given the conversations this particular debate initiated. The colours, as expected, return to normal in the last bottom right subgraph (3/11), which shows the two main components, blue and red, one for each candidate, since on this date neither candidate seems to attract any particular attention.

-Review comment 6) Sentiment analysis in Vader is not specifically built for social media; the paper should spend some effort demonstrating its assessment of sentiment towards the presidential candidates is valid.

As justified in the text as well:

There is a plethora of previous studies, as mentioned in section \ref{Background}, that analyze sentiment on Twitter. VADER is broadly used in this domain, as shown in \cite{zahoor2020TWSent, elbagir2019twitter, pano2020Complete, ramteke2016election, shelar2018sentiment, park2018sentiment, mustaqim2020twitter, bose2021survey, al2020suspicious, yaqub2020tweeting, alharbi2019twitter}.
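
For concreteness, a minimal example of scoring tweet text with VADER (via the vaderSentiment Python package) is shown below; the example tweets and the conventional ±0.05 thresholds on the compound score are illustrative.

    # Minimal VADER example; requires `pip install vaderSentiment`.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    for text in ["Great debate performance tonight! #Elections2020",
                 "This is a disaster for the country."]:
        scores = analyzer.polarity_scores(text)
        # 'compound' is a normalized score in [-1, 1]; the conventional
        # thresholds are >= 0.05 positive, <= -0.05 negative, else neutral.
        label = ("positive" if scores["compound"] >= 0.05 else
                 "negative" if scores["compound"] <= -0.05 else "neutral")
        print(f"{label:8s} {scores['compound']:+.3f}  {text}")
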

-Review comment 7) More detail is needed for entity extraction. Is the search for trump/biden a direct token match or a substring match? E.g., would it match a tweet with #TeachersForBiden? Also, for tweets with tags like #VoteBlue, they may be specific to Biden but not directly mention him; how does that potentiality impact analyses?

As described in the section “The Entities and their sentiment” of the manuscript, the entities of J. Biden and D. Trump were identified with a compiled set of keywords (including the hashtags, enriched with the candidates' names and the parties they represent). The selected keywords are shown in Table \ref{table:allentities} [Table 2: The complete list of all entities with the corresponding keywords that were used for each one]. For example, the hashtag #VoteBlue does not contain the word Biden in its text, but it is associated with J. Biden's campaign, and we recognize it through the keywords shown in Table 2.

For this reason, we use as keywords the hashtags correlated with each candidate. This approach increases the accuracy of associating a particular tweet/user with the described entities.

The keywords are matched in lower case, in order to be robust to misspellings and to users' upper/lower-case writing styles.
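
The matching can be summarized by the following sketch, written as a lower-case substring match; the keyword lists here are abbreviated examples and do not reproduce Table 2 in full.

    # Sketch of the keyword-based entity matching described above, as a
    # lower-case substring match; keyword lists are abbreviated examples.
    ENTITY_KEYWORDS = {
        "Biden": ["biden", "#voteblue", "democrats"],
        "Trump": ["trump", "#maga", "republicans"],
    }

    def match_entities(tweet_text: str) -> set:
        """Return the set of entities whose keywords occur in the tweet."""
        text = tweet_text.lower()
        return {entity for entity, keywords in ENTITY_KEYWORDS.items()
                if any(kw in text for kw in keywords)}

    print(match_entities("Watching Trump and Biden debate"))  # {'Biden', 'Trump'}
    print(match_entities("#VoteBlue no matter who"))          # {'Biden'}
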

-Review comment 8) In general, node-edge diagrams are not particularly useful visualizations. Additional effort is needed to describe what is interesting about the communities in these figures, what size means, why certain nodes are labeled and others not, etc.

This is explained in detail in the manuscript, in section 5, Linking Twitter and YouTube data:

Fig 17 (Fig17.eps) illustrates the 3-core YouTube comment graph, which represents which channel has commented on which channel. Different colored areas (purple, light grey, green, blue) in the graph represent different communities formed between the YouTube channels, as produced by the Louvain algorithm. For example, purple indicates the community of “Blaze TV”, “The Officer Tatum” and “Fox News”. However, “Fox News” also lies between the purple and the grey areas, the latter showing the next community (formed by “Fox News” and “Donald J Trump”). These communities do not necessarily correspond to real-life communities but are a result of the Louvain algorithm in Gephi.

Regarding the labels, in this figure we also show the 13 most popular channels after running PageRank.

Similarly, in plot 18 we show the 3-core Retweet graph, which represents which users have retweeted whom. In the graph we notice seven colored areas of different sizes, indicating different communities that, as in plot 17, do not necessarily correspond to real-life communities. Also, text labels that appear close to each other, for example “MaryLTrump” and “DrDenaGrayson”, cannot be translated into semantic information: proximity between nodes is coincidental, whereas placement in the same colored area (community) is not.

In both plots (node diagrams Fig17.eps and Fig18.eps) it is not possible to keep readable names for each particular node, due to the high density of nodes. The nodes in Fig18.eps are the Twitter users who posted the tweets included in this plot; similarly, the nodes in Fig17.eps are the YouTube users, which are channels. In order to highlight only the important nodes, we provide the names of the most significant nodes based on their PageRank score.

For illustration purposes, high-degree nodes in both plots (17, 18) are rendered as lightly highlighted circle-like shapes; these are Gephi rendering variables that, unfortunately, cannot be given a semantic interpretation in this context and should be ignored.
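
Although the figures themselves were produced in Gephi, the underlying pipeline (3-core extraction, Louvain community detection, PageRank-based labeling) can be sketched in networkx as follows; the toy edge list is purely illustrative and does not reproduce the study's data.

    # Rough networkx equivalent of the Gephi pipeline behind Figs 17-18:
    # 3-core extraction, Louvain community detection, PageRank labeling.
    import networkx as nx

    G = nx.DiGraph()  # retweet graph: edge (u, v) means u retweeted v
    G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c"),
                      ("b", "a"), ("c", "b"), ("d", "a")])

    G.remove_edges_from(nx.selfloop_edges(G))  # k_core disallows self-loops
    core3 = nx.k_core(G, k=3)                  # keep nodes of degree >= 3

    # Louvain (the algorithm Gephi uses) is run on the undirected view.
    communities = nx.community.louvain_communities(core3.to_undirected(), seed=42)

    # Label only the most central nodes, as done for Figs 17 and 18.
    pagerank = nx.pagerank(core3)
    top_labels = sorted(pagerank, key=pagerank.get, reverse=True)[:13]
    print(communities, top_labels)
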

-Review comment 9) The paper is missing a "Data Availability Statement" in the text.

The dataset is available through Zenodo, as was already described in the subsection “How to obtain the dataset”, which has been renamed “Data Availability” and moved into the Dataset section.

https://zenodo.org/record/4618233#.YGGJU2Qzada

Attachment

Submitted filename: Response_to_the_reviewers_third_round.pdf

Decision Letter 2

Ayan Seal

7 Mar 2022

PONE-D-20-37937R2

What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020

PLOS ONE

Dear Dr. Antonakaki,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we have decided that your manuscript does not meet our criteria for publication and must therefore be rejected.

Specifically:

I am sorry that we cannot be more positive on this occasion, but hope that you appreciate the reasons for this decision.

Yours sincerely,

Ayan Seal, Ph.D

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The authors need to identify the novelty in the paper, as sentiment analysis is a vast field. A lot of work has already been done in this field. The work is not up to the mark, lacking novelty.

Also, there are statistical measures to show that your work is better than other works, which are basically lacking in the authors' work.

No comparative study is done to show that your work is better than other works.

Above all, the paper seems to lack novelty, and hence it is not accepted.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]


PLoS One. 2023 Jan 31;18(1):e0270542. doi: 10.1371/journal.pone.0270542.r006

Author response to Decision Letter 2


13 May 2022

This submission is a response to "PLOS ONE: Your Appeal for the manuscript (PONE-D-20-37937R2) - [EMID:87b2b482bf78e2ed]".

After the final round of reviews, the manuscript was handed to a new Editor and, as addressed in the response to reviewers (file: PLOS Fourth review (rejection).pdf), we responded with a request for the decision to be revisited.

In this submission we include the cover letter, all the necessary LaTeX files (plos_latex_third_review.tex), and the images needed to produce the manuscript, as well as:

The manuscript in PDF with highlighted changes from the first submission (Revised Manuscript with Track Changes_25_august_2021.pdf) and the second submission (Revised Manuscript with Track Changes_12_October_.pdf).

Attachment

Submitted filename: Response_to_the_reviewers_third_round.pdf

Decision Letter 3

Daswin De Silva

14 Jun 2022

What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020

PONE-D-20-37937R3

Dear Dr. Antonakaki,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

I acknowledge that the authors faced several major challenges in terms of the number of revisions and the disparity of review comments across the three versions of this manuscript. I commend the authors for their resilience and diligence in fully addressing all relevant comments with suitable content while also maintaining the main focus of this paper. Specifically, the challenges of homophily and echo chambers are increasingly prominent on social media platforms, and this analysis of such behaviors within and across platforms during a "social media significant" event such as the US Elections of 2020 is a practical and meaningful contribution to the research literature. I also acknowledge that this manuscript aligns with the PLOS ONE publication principles in conveying this contribution to the wider community.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Daswin De Silva

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments: 

No further reviewer comments. I commend the authors for addressing all reviewer comments effectively across the past three review cycles. The manuscript has been significantly revised and is now much more relatable to the PLOS ONE readership.

Acceptance letter

Daswin De Silva

27 Jun 2022

PONE-D-20-37937R3

What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020

Dear Dr. Antonakaki:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Daswin De Silva

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. List of all hashtags.

    (PDF)

    S1 Text

    (TXT)

    S1 Data

    (XLSX)

    Attachment

    Submitted filename: response_to_reviewers_round2.pdf

    Attachment

    Submitted filename: Response_to_the_reviewers_third_round.pdf

    Attachment

    Submitted filename: Response_to_the_reviewers_third_round.pdf

    Data Availability Statement

    In order to obtain the dataset used for the analysis described in this study, we follow the Twitter API restrictions and do not violate any terms of the Twitter Developer Agreement and Policy. According to Twitter policy, we are not allowed to share the entire dataset, but only 100K user IDs. This dataset is available here: https://zenodo.org/record/4618233#.YGGJU2Qzada. Access is open and no approval is required. We provide the directed retweet graph from the Twitter network, all user IDs from the provided retweet graph (89,479 users), all video IDs (vid) extracted from the election-related tweets (39,203 video IDs), and the directed comment graph.


    Articles from PLOS ONE are provided here courtesy of PLOS
