2022 Mar 31;5(1):86–99. doi: 10.2478/dim-2020-0023

Exploring Public Response to COVID-19 on Weibo with LDA Topic Modeling and Sentiment Analysis

Runbin Xie a,*, Samuel Kai Wah Chu b, Dickson Kak Wah Chiu b, Yangshu Wang b
PMCID: PMC8975181  PMID: 35402850

Abstract

Understanding public responses to crises, including disease outbreaks, is important. Traditionally, surveys have played an essential role in collecting public opinion; with the increasing popularity of social media, mining social media data has become another popular tool in opinion mining research. To understand the public response to COVID-19 on Weibo, this research collects 719,570 Weibo posts through a web crawler and analyzes the data with text mining techniques, including Latent Dirichlet Allocation (LDA) topic modeling and sentiment analysis. The results show that, in response to the COVID-19 outbreak, people learn about COVID-19, show their support for frontline warriors, encourage each other spiritually, discuss preventive measures, and express concerns about the recovery of the economy and of everyday life. Analysis of sentiments and semantic networks further reveals that country media, together with influential individuals and “self-media,” contribute to the spread of positive-sentiment information.

Keywords: COVID-19, Weibo, web crawling, LDA, sentiment analysis

1. Introduction

In recent decades, the world has encountered several disease outbreaks, such as SARS in 2003 and MERS in 2012. It now faces another: COVID-19. As of July 26, 2020, more than 16 million confirmed COVID-19 cases had been reported around the world (JHU COVID-19 Resource Center, 2020). Since the first confirmed case was officially reported in Wuhan, China, in late December 2019, the global COVID-19 outbreak has severely disrupted normal life around the world for over half a year and, as of July 2020, was unfortunately still worsening.

During and after a disease outbreak, public opinion is commonly collected because it helps governments and the public communicate about public concerns, crisis management, health knowledge promotion, and so on (Holmes, Henrich, Hancock, & Lestou, 2009; Mollema et al., 2015). Traditionally, surveys have played an important role in gathering public opinion, and they have also been applied to research on public opinion about disease outbreaks. Table 1 summarizes some frequently investigated themes in survey-based research on the COVID-19 outbreak.

Table 1.

Popular Themes in Survey-Based Research of COVID-19 Outbreak

| Themes | Sample research |
|---|---|
| Knowledge, attitudes & practices (KAP) | Wolf et al. (2020); Zhong et al. (2020) |
| Psychological stress | Huang and Zhao (2020); Mazza et al. (2020); Qiu et al. (2020) |
| Information seeking | Ebrahim et al. (2020); Liu (2020) |
| Misinformation (fake news) | Greene and Murphy (2020); Motta, Stecula, and Farhart (2020) |
| Sensitive individuals, including front-line hospital staff and recovered patients | Bhagavathula, Aldhaleei, Rahmani, Mahabadi, and Bandari (2020); Huang, Han, Luo, Ren, and Zhou (2020) |
| Attitudes towards government actions on disease control | Atchison et al. (2020) |

While surveys play an irreplaceable role, there is another way of collecting public opinion about disease outbreaks: the increasing popularity of social media and the development of text analysis techniques have given rise to social media data mining. Surveys can usually collect only a limited volume of data and may not reflect the real thoughts of the public because of factors that influence the survey process, such as environmental distractions and the subject's psychological pressure (Hridoy, Ekram, Islam, Ahmed, & Rahman, 2015). Social media data, by contrast, are collections of short texts that people post online at any time to express their feelings, and they are generated on a large scale. Thus, the efficient use of social media data can contribute substantially to research on public opinion.

Reported as the first epicenter of COVID-19, China went through an extremely hard time at the very beginning, encountering difficulties such as city lockdowns and shortages of medical supplies and hospital resources. By July 2020, however, China had brought the outbreak under control with aggressive approaches (Campbell, 2020; Clinch, 2020). Weibo, the biggest social media platform in mainland China, serves the functions of information sharing and communication in the country. Data collected from Weibo can be analyzed to understand the Chinese people's reaction to the disease outbreak and the characteristics of semantic networks, which leads to the research questions (RQs) of this research.

  • RQ1: What topics can be detected from Weibo posts on COVID-19?

  • RQ2: How do sentiments change over time, and what are the characteristics of different semantic networks?

The rest of this article is organized as follows. The second part reviews related work in the context of this research, after which the methodologies used to conduct this research are introduced, and the results are presented and discussed. Finally, conclusions are drawn, together with the contribution of this work to the research context and suggestions for further research directions.

2. Related Work

2.1. Disease Outbreaks on Social Media

Table 2 lists some exemplary research on disease outbreaks on social media. Three observations can be made. First, social media data contribute to several types of research on disease outbreaks, including public opinion mining, sentiment analysis, semantic network analysis, and disease outbreak detection. Second, the methods applied to social media data analysis vary from manual coding to machine learning techniques. Third, Twitter is a worldwide platform for the collection and analysis of social media data, while for collecting social media data from residents in mainland China, Weibo is a more popular choice.

Table 2.

Research on Disease Outbreaks on Social Media

| Author | Disease outbreak | Social media | Methods | Findings |
|---|---|---|---|---|
| Mollema et al. (2015) | Measles | Twitter and others | Thematic analysis | People on Twitter cared about disease transmission, preventive actions, and vaccination; governments needed to promote vaccination acceptability. |
| Fung et al. (2013) | MERS-CoV & H7N9 | Weibo | Statistical analysis on the number of Weibo posts | Weibo users reacted to the disease outbreak significantly, and people paid more attention to the H7N9 outbreak. |
| Ye, Li, Yang, and Qin (2016) | Dengue | Weibo | Analysis on the numbers of posts and spatial information | Spatially and temporally, there was a correlation between the number of posts and disease development trends. |
| Chew and Eysenbach (2010) | H1N1 | Twitter | Manual and automated coding | Several sentiments, including confusion, humor, risk, and so on, were discovered, among which humor was the most popular sentiment. |
| Ye et al. (2016) | Influenza | Twitter | Modeling | A prediction model built on Twitter data could be used for influenza outbreak alerts. |
| Zhang et al. (2015) | H7N9 | Weibo | Analysis on the number of Weibo posts and the number of new confirmed cases | There was a positive correlation between discussion and disease outbreak level, and Weibo served as a good medium to promote communications of public health. |
| Li et al. (2020) | COVID-19 | Weibo | Machine learning algorithms | Weibo posts were classified into seven categories of situational information. Useful text features should be helpful in building an emergency response system. |

2.2. Topic Modeling Using LDA

Latent Dirichlet Allocation (LDA) assumes that a document is generated based on a certain number of topics, and each word in the document is randomly selected from its corresponding topic vocabulary (Blei, Ng, & Jordan, 2003; Gruber, Weiss, & Rosen-Zvi, 2009). It is an excellent probabilistic model that performs well in topic modeling and has been widely applied in research (Hu et al., 2017). For example, Barua, Thomas, and Hassan (2012) utilized LDA to discover topics and topic trends from a popular Q&A website in the programming field and found that the developer community discussed a wide range of topics and discussions of different topics are interconnected. Similarly, Hu et al. (2017) studied email corpora with the LDA model.
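The generative assumption described above can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library, not the model fitted in this research; the two toy topics, the English vocabulary, and the Dirichlet parameter are made-up values chosen purely for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word_probs, vocabulary, alpha, n_words, rng):
    """Generate one document following the LDA generative story."""
    # 1. Draw this document's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word position according to the mixture.
        topic = rng.choices(range(len(theta)), weights=theta, k=1)[0]
        # 3. Pick a word from the chosen topic's word distribution.
        word = rng.choices(vocabulary, weights=topic_word_probs[topic], k=1)[0]
        words.append(word)
    return theta, words

# Hypothetical vocabulary and topic-word distributions (illustration only).
vocabulary = ["virus", "mask", "hospital", "economy", "market", "fund"]
topic_word_probs = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "health" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # an "economics" topic
]
rng = random.Random(0)
theta, words = generate_document(topic_word_probs, vocabulary, [0.5, 0.5], 8, rng)
```

Fitting LDA runs this story in reverse: given only the observed words, it estimates the topic mixtures and the topic-word distributions that most plausibly produced them.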

LDA has been applied to extract topics from various kinds of corpora, including but not limited to microblogs. For example, Huang, Yang, Mahmood, and Wang (2012) applied LDA to web usage data; Xu, Zhang, and Yi (2018) and Lim and Buntine (2014) studied tweets with LDA modeling. Therefore, it is reasonable to expect that exploring topics in Weibo datasets with LDA topic modeling will also yield satisfactory results.

2.3. Sentiment Analysis

Sentiment of text data mainly refers to the emotions hidden within the text. Sentiment analysis has been widely applied to a large volume of opinion mining research, including product review analysis, public response to the stock market, and so on, as valuable information can be discovered if emotions in texts are well analyzed (Bakshi, Kaur, Kaur, & Kaur, 2016). In general, sentiments are classified into three categories: positive, neutral, and negative, and there are mainly two ways of sentiment tagging (Ray & Chakrabarti, 2017).

Lexicon-based sentiment tagging analyzes the words in a sentence and harvests the overall score by adding up the scores of each word, for which sentiment dictionaries are used (Bhonde, Bhagwat, Ingulkar, & Pande, 2015). This method was widely applied in the sentiment tagging of social media posts. For example, Ray and Chakrabarti (2017) used the R language to tag Twitter posts for product reviews with lexicon vocabularies and completed sentiment analysis at three levels: document, sentence, and aspect; Pérez-Pérez, Pérez-Rodríguez, Fdez-Riverola, and Lourenço (2019) used a lexicon-based tagger to analyze tweets' sentiments under each topic discovered in the Human Bowel Disease community.

Machine learning approaches to sentiment tagging are also gaining popularity. For example, Chen and Sokolova (2018) adopted an unsupervised approach to cluster sentiments of clinical discharge summaries with word embeddings generated from Word2Vec and Doc2Vec models and compared the results. Salathé and Khandelwal (2011) compared three machine learning techniques in terms of sentiment classification performance and adopted both naive Bayes and maximum entropy algorithms for the supervised classification of vaccination sentiments.

3. Methodologies

3.1. Methodological Framework

Figure 1 shows the methodological framework of this work. This research starts with data collection of Weibo posts with a user-simulation-like web crawler. Posts are then collected and processed to remove text noises and stop words, after which text segmentation is also performed. Then topic modeling is conducted using the LDA model with the cleaned dataset. To answer the second research question, lexicon-based sentiment analysis is conducted on data, including the number of posts, network information, and so on, to reveal sentiment trends and discover the characteristics of semantic networks.

Figure 1.

Methodological framework of the proposed research

3.2. Data Collection

This work examines public opinion related to COVID-19 on Weibo, for which theme-related posts must be collected. Both web crawlers and APIs are widely used to collect data from social media, including Weibo (Liu & Hu, 2019; Liu, Wu, Wang, & Li, 2014). Although the Weibo API, being the official gateway to Weibo data, is easy to use, it has many constraints, such as providing only a limited number of posts (Zeng, Zheng, Chen, & Yu, 2014). To avoid these constraints, this work uses Selenium to develop a simulation-based web crawler in Python. The web crawler simulates human logins and searches Weibo posts based on given keywords. The HTML of the result pages is then collected and parsed to extract the post content and relevant information such as the username.

To fulfill the research purposes, this work uses six keywords selected from Hu, Huang, Chen, and Mao (2020), as listed in Table 3. Hot posts on each day over the period from January 1 to June 30, 2020, are collected.

Table 3.

Selected Keywords for Data Collection

Keyword Translation
?? Virus
?? (Confirmed/suspicious) case
?? Pneumonia
?? COVID
???? Coronavirus
?? Disease outbreak

3.3. Data Preprocessing

3.3.1. Removing Noises

Text noise is generally removed in text mining research, which benefits the experimental outcomes (Celardo, Iezzi, & Vichi, 2016). In the context of this research, text noise includes emoji codes, punctuation marks, symbols, non-Chinese words, and so on. To remove it, the “re” regular expression package in Python is used to strip all non-Chinese components from the dataset.
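A minimal sketch of this kind of filtering is shown below. The exact pattern used in this research is not given, so the character range here is an assumption: it keeps only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and drops everything else, including rarer ideographs in extension blocks.

```python
import re

# Matches any run of characters outside the basic CJK Unified Ideographs block.
# Assumption: this range is a common choice, not necessarily the authors' pattern.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")

def keep_chinese(text: str) -> str:
    """Remove emoji, punctuation, symbols, digits, and Latin words from a post."""
    return NON_CHINESE.sub("", text)

# Example post: Chinese text mixed with punctuation, a hashtag, and an emoji.
cleaned = keep_chinese("武汉加油!!! #COVID19# 😷")
```

Because the pattern removes whitespace and punctuation too, sentence splitting (if needed later) has to happen before this step.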

3.3.2. Stop Word List

In Chinese text, there are many “meaningless” words like “?” (I/me), “?” (you), “?” (have done something), and so on, which are normally removed in information retrieval and text mining tasks so as to improve the experimental outcomes (Zou, Wang, Deng, & Han, 2006). There are also some Chinese stop word lists built by university NLP labs and companies. To construct a stop word list, this work integrates three public stop word lists (Baidu stop word, SCU stop word, and HIT stop word) that are widely used (Xie et al., 2019), together with some domain stop words such as “????” (repost) that appear frequently in most of the posts collected.
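Merging several public lists with domain-specific words amounts to a set union followed by a filter over the segmented tokens. The sketch below assumes each list has already been read into memory as a sequence of strings; the tiny lists and the helper names `build_stop_words` and `remove_stop_words` are hypothetical stand-ins, not the actual Baidu, SCU, or HIT lists.

```python
def build_stop_words(*word_lists, domain_words=()):
    """Union several stop word lists and add domain-specific stop words."""
    stop_words = set()
    for words in word_lists:
        # Strip surrounding whitespace and skip empty entries.
        stop_words.update(w.strip() for w in words if w.strip())
    stop_words.update(domain_words)
    return stop_words

def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not stop words, preserving order."""
    return [t for t in tokens if t not in stop_words]

# Toy stand-ins for the public lists (illustration only).
list_a = ["我", "你"]
list_b = ["你", "了", " "]
stop_words = build_stop_words(list_a, list_b, domain_words=["转发微博"])
tokens = remove_stop_words(["我", "转发微博", "口罩", "了"], stop_words)
```

Using a set makes each membership check constant time, which matters when filtering hundreds of thousands of posts.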

3.3.3. Chinese Text Segmentation

Unlike English text, Chinese text is written without spaces between words and therefore needs to be segmented before analysis. Among Chinese text segmentation tools, the “jieba” package in Python is widely used and has many advantages, such as support for adding customized words (Day & Lee, 2016; Peng, Liou, Chang, & Lee, 2015). In this research, the “jieba” package is adopted to perform the Chinese text segmentation task.

3.4. Topic Modeling

As mentioned above, LDA is a powerful tool in topic modeling. LDA can extract a given number of topics from a corpus that contains a certain number of documents. This research applies LDA to extract topics from the cleaned dataset with the Python package “gensim.” To determine the number of topics, both perplexity and coherence scores are taken into consideration. The former measures how well the model predicts the documents in the corpus (the lower the better), while the latter measures the semantic similarity among the high-scoring words of each topic (the higher the better) (Blei et al., 2003; Xie, Qin, & Zhu, 2018). After the optimal model is determined, another Python package, “pyLDAvis,” is adopted to visualize the topic extraction results, with a coordinate graph showing the distribution of topics and lists of the top 30 most salient words in each topic. Topic labeling is completed manually, based on the given salient words.
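For reference, the perplexity of a set of M documents as defined in Blei et al. (2003) can be written as follows, where w_d denotes the words of document d and N_d is its length. How the scores reported in this research were computed from the fitted model is not stated, so this is given only as the standard definition.

```latex
\mathrm{perplexity}(D) \;=\; \exp\!\left\{ -\,\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```

A lower value means the model assigns higher probability to the observed words, which is why perplexity tends to keep falling as topics are added and has to be read together with coherence.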

3.5. Sentiment Analysis

3.5.1. Lexicon-based Sentiment Tagging

The sentiment tagging task in this research adopts a lexicon-based approach recorded in https://www.cnblogs.com/qiaoyanlin/p/6891437.html. In short, the process of calculating the sentiment score of a post mainly contains four steps: (1) sentences in a post are split based on the punctuation marks; (2) each sentence is then segmented and meaningful words remain; (3) each remaining word (including negation words that might reverse the sentiment of a sentence) provides its sentiment and/or weight score based on the lexical dictionaries; and (4) the overall score of the post is calculated based on the scores of each sentence, which are calculated from the scores of each word.
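The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation from the referenced blog post: the lexicons are tiny hypothetical English dictionaries, whitespace splitting stands in for Chinese word segmentation, and a negation or degree word is assumed to modify only the next sentiment word.

```python
import re

# Toy lexicons (hypothetical); the dictionaries used in the research are not reproduced.
SENTIMENT = {"good": 1.0, "hope": 1.0, "bad": -1.0, "fear": -1.0}
DEGREE = {"very": 2.0, "slightly": 0.5}
NEGATION = {"not", "never"}

def score_sentence(tokens):
    """Step 3: accumulate word scores, applying negation and degree weights."""
    score, weight = 0.0, 1.0
    for tok in tokens:
        if tok in NEGATION:
            weight *= -1.0          # flip the polarity of the next sentiment word
        elif tok in DEGREE:
            weight *= DEGREE[tok]   # strengthen or weaken the next sentiment word
        elif tok in SENTIMENT:
            score += weight * SENTIMENT[tok]
            weight = 1.0            # modifiers apply to one sentiment word only
    return score

def score_post(post):
    """Steps 1, 2, and 4: split into sentences, tokenize, and sum sentence scores."""
    sentences = [s for s in re.split(r"[.!?;,]+", post.lower()) if s.strip()]
    return sum(score_sentence(s.split()) for s in sentences)

def label(score):
    """Map a numeric score to one of the three sentiment categories."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

example_score = score_post("Very good news! Not bad, I hope.")
```

A post whose words do not appear in any lexicon scores 0 and is tagged neutral, which matches how neutral posts are described in the results.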

3.5.2. Statistics and Semantic Network Analysis

After sentiment tagging, descriptive statistics of the results are reported. Time series analysis is then performed to discover interesting phenomena from the number of positive and negative posts over time. To take a closer look at the sentiments of public opinion, positive and negative semantic networks are constructed to identify important roles in and the characteristics of the respective networks, for which network visualization and statistical analysis are performed.

4. Results and Discussion

4.1. Data Collected

Figure 2 is a screenshot of some collected raw data. In total, 719,570 posts are collected over the period from January 1 to June 30, 2020, based on the given keywords. After data cleaning, which removes duplicate posts, blanks, and “NA” values that probably come from parsing failures, 374,225 posts remain.

Figure 2.

A screenshot of some collected raw data
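The cleaning step described above (dropping duplicates, blanks, and “NA” placeholders) could look like the following. This is a minimal sketch that treats each post as a plain string; the function name `clean_posts` is hypothetical, and exact-match deduplication after trimming whitespace is an assumption about how duplicates were identified.

```python
def clean_posts(posts):
    """Drop blank posts, "NA" placeholders, and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for post in posts:
        text = (post or "").strip()  # treat None like an empty post
        if not text or text == "NA" or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

sample = clean_posts(["a", " a ", "", "NA", None, "b"])
```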

The dataset is then further processed with Chinese text segmentation and removing stop words, after which it is ready for text mining analysis. Figure 3 visualizes the top 500 frequent terms with a word cloud, and Table 4 records the top 50 frequent words in the dataset, from which some interesting preliminary findings can be observed: (1) undoubtedly, term frequencies of each keyword selected for data collection rank top in the vocabulary and (2) discussions of COVID-19 on Weibo might cover a wide range of topics, including local disease outbreak, international relations, prevention measures, global pandemic, hospital staff, vaccination, and so on, which are in line with some previous research findings on public opinions about disease outbreak (Jalloh et al., 2017; Nickell et al., 2004; Rubin, Amlôt, Page, & Wessely, 2009). The preliminary findings can be further confirmed with the following analysis.

Figure 3.

Word cloud of top 500 terms in the cleaned dataset

Table 4.

Top 50 Words in the Cleaned Dataset

No. Term Translation Frequency No. Term Translation Frequency No. Term Translation Frequency
1 ?? Disease outbreak 153,855 18 ?? Country 23,856 35 ?? Import 15,642
2 ?? Pneumonia 126,294 19 ?? Quarantine 23,311 36 ?? Vaccine 15,560
3 ?? COVID 98,978 20 ?? Court 23,168 37 ?? Treatment 15,506
4 ?? Case 97,597 21 ?? Work 22,905 38 ?? Genuine 15,478
5 ?? Virus 71,299 22 ?? Hope 22,836 39 ?? Staff 14,722
6 ?? Novel 66,986 23 ?? Death 21,772 40 ?? Situation 14,243
7 ?? Infected 65,884 24 ?? Hospital 21,209 41 ?? Hubei 13,338
8 ???? Coronavirus 60,636 25 ?? Test 21,195 42 ?? Beijing 13,018
9 ?? America 52,716 26 ?? Add oil 20,350 43 ?? Health 13,005
10 ?? China 50,935 27 ?? Globe 19,263 44 ?? Report 12,769
11 ?? Wuhan 45,959 28 ?? Time 19,247 45 ?? Period 12,223
12 ?? Infected 38,556 29 ?? Accumulate 18,367 46 ?? abroad 12,162
13 ?? Disease control 33,447 30 ?? Fight 17,996 47 ?? Discharged 11,678
14 ?? Video 32,489 31 ?? Folk 17,381 48 ?? Doctor 1,111
15 ?? Patients 31,036 32 ?? News 17,000 49 ?? World 11,558
16 ?? New case 28,674 33 ?? Discover 16,799 50 ??? Yuhua district 11,391
17 ?? Face mask 23,942 34 ?? Nation 15,650

4.2. Topic Modeling

4.2.1. Perplexity and Coherence Scores of LDA Models

Figures 4 and 5 show, respectively, the perplexity and coherence scores of models trained with different numbers of topics. Overall, as the number of topics increases, the perplexity score decreases, that is, the model performs better in predicting the samples. However, a larger number of topics may lead to overfitting, so the number of topics must be chosen with care. Another way to decide the number of topics is to compare the coherence scores of different models. In this case, coherence scores fluctuate as the number of topics increases, and peaks are observed when the number of topics is set to 4, 8, 12, and 15. Based on the perplexity and coherence scores, the candidates are narrowed to 12 or 15 topics. After inspecting the keywords generated for each topic, the number of topics is finally set to 12, because the topics are easier to code from the given keywords.

Figure 4.

Perplexity scores of models under different settings of number of topics

Figure 5.

Coherence scores of models under different settings of number of topics
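Locating coherence peaks such as those read off Figure 5 can be automated with a simple local-maximum check. The sketch below is illustrative only: the scores are made-up numbers, not the values from this research, and the helper `local_peaks` is a hypothetical name.

```python
def local_peaks(scores):
    """Return the topic counts whose coherence exceeds that of both neighbours."""
    ks = sorted(scores)
    return [
        k
        for prev, k, nxt in zip(ks, ks[1:], ks[2:])
        if scores[k] > scores[prev] and scores[k] > scores[nxt]
    ]

# Hypothetical coherence scores keyed by number of topics (illustration only).
coherence = {2: 0.30, 3: 0.35, 4: 0.42, 5: 0.38, 6: 0.36, 7: 0.40, 8: 0.45, 9: 0.41}
peaks = local_peaks(coherence)
```

The first and last settings are never reported as peaks because they have only one neighbour, so the range of topic counts tried should extend beyond the values of interest.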

4.2.2. Discovered Topics

Figure 6 is a visualization of the selected LDA model mentioned above. While the left panel shows the distribution of each topic, the word list on the right panel shows the top 30 most salient terms of the selected topic, in which the blue bar shows the overall term frequency in the dataset, and the red bar represents the estimated term frequency in the selected topic. Table 5 records the discovered topics, each with 10 representative words selected from the top 30 most salient terms.

Figure 6.

LDA model visualization

Table 5.

Top 30 Most Salient Terms of Each Topic and Topic Coding Results

Topic ID Topic Label 10 representative words selected from the top 30 most salient words
1 Fight the virus together ?? (power), ?? (add oil), ?? (folk), ?? (hope), ?? (China), ?? (fight), ?? (fight the virus), ?? (front line), ?? (success), ?? (work hard)
2 Knowledge ?? (research), ?? (expert), ?? (reason), ?? (popular science), ?? (transmission), ?? (disease), ?? (science), ?? (prevention), ?? (exam), ?? (discover)
3 Assistance ?? (work), ?? (fight), ?? (come back), ?? (nation), ???? (united), ?? (on site), ?? (pay attention to), ?? (support), ?? (news), ?? (situation)
4 Economics ?? (globe), ?? (economics), ?? (influence), ??? (fund), ?? (world), ?? (internationality), ?? (control), ?? (cooperation), ?? (society), ?? (market)
5 Global pandemic ?? (Iran), ?? (case), ?? (UK), ?? (disease outbreak), ?? (new case), ?? (Japan), ??? (Italy), ?? (accumulate), ?? (urgent), ?? (abroad)
6 Prevention ?? (disease outbreak), ?? (face mask), ?? (promotion), ?? (prevention), ?? (do well in), ?? (prevent), ?? (fight the virus), ?? (risk), ?? (health), ?? (measure)
7 Treatment ?? (quarantine), ?? (discharged), ??? (no symptoms), ???? (medical observation), ???? (close contact), ?? (cure), ?? (fever), ?? (severely ill), ?? (confirmed affection), ?? (treatment)
8 Stay at home ?? (period), ?? (life), ?? (this time), ?? (happy), ?? (go back home), ?? (at home), ?? (like), ?? (thing), ?? (friend), ?? (home)
9 Law ?? (court), ?? (judge), ?? (folk), ?? (breach the law), ?? (case), ?? (defendant), ?? (proof), ?? (audio recording), ?? (according to the law), ?? (truth)
10 Study ?? (school), ?? (children), ?? (start school), ?? (student), ?? (start school), ?? (university), ?? (major), ?? (parents), ?? (homework), ?? (college entrance examination)
11 Celebrity and charity ??? (Xukun Cai), ?? (Zi Yang), ?? (charity), ?? (super topic), ?? (fan), ?? (dawn), ?? (charity), ?? (Zhan Xiao), ?? (protect), ?? (defeat)
12 People ?? (live stream), ?? (son), ?? (younger brother), ?? (grandmother), ?? (vocation), ?? (husband), ?? (idol), ?? (value), ?? (boost popularity), ?? (meet)

It can be seen from Figure 6 that, although some areas overlap, the extracted topics are in general evenly distributed. From Table 5, it can be inferred that users on Weibo discuss a wide range of topics, among which 10 topics are related to the COVID-19 outbreak, while 2 topics, “Law” and “People,” seem irrelevant to the research context. Seven topics, including “Fight the virus together,” “Knowledge,” “Assistance,” “Prevention,” “Treatment,” “Global pandemic,” and “Stay at home,” are directly related to the disease outbreak, while three topics, “Economics,” “Study,” and “Celebrity and charity,” could be regarded as topics derived from COVID-19, because they are either fields affected by the disease outbreak or tightly associated with it. The results show that people encourage each other in fighting the disease, share disease-related knowledge such as prevention and transmission, pay attention to the development of the global outbreak, and discuss how their lives have been affected, which is consistent with findings of previous research (Chung, He, & Zeng, 2015; Corley, Cook, Mikler, & Singh, 2010; Lazard, Scheinfeld, Bernhardt, Wilcox, & Suran, 2015; Signorini, 2014).

4.3. Sentiment Tagging Results and Trend Analysis

After data preprocessing and text segmentation, 207,323 posts in total remained for sentiment tagging, with 79,861 positive posts and 33,049 negative posts, while the rest were all tagged as neutral posts (sentiment score is 0). Figure 7 shows the trends of the number of positive and negative posts over the data collection period. It can be observed that the number of positive posts exceeds that of negative ones over the whole period.

Figure 7.

Number of positive and negative posts over time

Specifically, in terms of positive posts, three peaks can be observed, as marked in Figure 7. The top 10 most frequent terms extracted from posts at each peak are listed in Table 6. The first peak is recorded on January 24, Chinese New Year's Eve, right after the lockdown of Wuhan on January 23. From the top 10 terms, it can be inferred that the posts are mostly related to wishes for the coming New Year as well as the COVID-19 outbreak in Wuhan. The second peak comes on March 3, when the central government announced the preliminary success in fighting COVID-19, for which relevant words, for example, ?? (protect), can also be observed. The third peak comes on May 23, when the success of the phase 1 vaccine trial was announced.

Table 6.

Top 10 Frequent Terms Extracted from Posts at Each Peak

| No. | Peak 1 top term | Translation | Freq. | Peak 2 top term | Translation | Freq. | Peak 3 top term | Translation | Freq. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ?? | Hope | 470 | ?? | Protect | 767 | ?? | Disease outbreak | 533 |
| 2 | ?? | Wuhan | 388 | ?? | Disease outbreak | 374 | ?? | Case | 395 |
| 3 | ?? | Disease outbreak | 320 | ?? | Hope | 364 | ?? | COVID-19 | 287 |
| 4 | ?? | Pneumonia | 260 | ?? | Defeat | 351 | ?? | Coronavirus | 263 |
| 5 | ?? | Add oil | 235 | ?? | This | 350 | ?? | Pneumonia | 193 |
| 6 | ?? | Novel | 204 | ?? | Fight the virus | 344 | ?? | Confirmed Affection | 189 |
| 7 | ?? | Safe and sound | 179 | ??? | Fight the virus | 313 | ?? | China | 185 |
| 8 | ???? | Coronavirus | 173 | ?? | Information | 306 | ?? | Vaccine | 173 |
| 9 | ?? | Virus | 157 | ??? | Good kids | 288 | ?? | Accumulate | 148 |
| 10 | ?? | 1 year | 154 | ???? | Eliminate the false and retain the true | 263 | ?? | Hope | 141 |
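The daily counts behind a trend plot like Figure 7, and the days with the most posts of a given sentiment, can be derived with a counter over (date, label) pairs. This is a minimal sketch with made-up records; the tuple layout and the helper names are assumptions, and ranking days by raw count is a simplification of reading peaks off the plotted curve.

```python
from collections import Counter

def daily_counts(posts, sentiment):
    """Count posts per date for one sentiment label."""
    return Counter(date for date, label in posts if label == sentiment)

def top_days(counts, n=3):
    """Return the n dates with the most posts, ties broken by earlier date."""
    return sorted(counts, key=lambda d: (-counts[d], d))[:n]

# Hypothetical tagged posts as (ISO date, sentiment label) pairs.
posts = [
    ("2020-01-23", "negative"),
    ("2020-01-24", "positive"),
    ("2020-01-24", "positive"),
    ("2020-01-25", "positive"),
    ("2020-03-03", "positive"),
    ("2020-03-03", "positive"),
    ("2020-03-03", "positive"),
]
positive = daily_counts(posts, "positive")
busiest = top_days(positive, n=2)
```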

4.4. Semantic Network Analysis

Figures 8 and 9 show the semantic networks of the positive and negative sentiments, respectively. In the network visualization, each modularity class has its own color, and the size of a node reflects its interaction frequency: the more interactions a node has, the bigger it is. In both semantic networks, it can easily be observed that country media, including ???? (People's Daily), ???? (CCTV News), ???? (Global Times), and so on, are leading the discussions, and each of them forms a relatively independent community. As for the differences, there are more medium-sized nodes around each leading actor in the semantic network of positive sentiment. This means that mainstream media and influential KOLs (key opinion leaders), including entrepreneurs such as ????????? (Jack Ma), ??? (Hu Xijin), self-media such as ????? (Things in the UK), and so on, play an important role in leading the information spread and discussions of positive sentiment, while in the semantic network of negative sentiment, the discussions, also led by country media, are more scattered.

Figure 8.

Visualization of semantic network of positive sentiment

Figure 9.

Visualization of semantic network of negative sentiment

To take a closer look, the reposting frequencies of each node are calculated for each semantic network and then normalized between 0 and 1 for comparison purposes. Quartiles of the normalized data are shown in Table 7, from which it can be seen that Q1, the median, and Q3 of the reposting frequencies of the positive semantic network are all greater than those of the negative semantic network. This means that there are more influential nodes in the semantic network of positive sentiment, which is consistent with the previous findings.

Table 7.

Quartiles of Normalized Reposting Frequencies of Each Semantic Network

Sentiment Min Q1 Median Q3 Max
Positive 0.000 0.002 0.013 0.109 1.000
Negative 0.000 0.000 0.000 0.002 1.000
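The normalization and quartile summary behind Table 7 can be sketched as below. Min-max scaling is assumed as the normalization, and the "inclusive" quartile method is an arbitrary choice, since neither is specified in the text; the repost counts are made-up numbers.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero for constant input
    return [(v - lo) / (hi - lo) for v in values]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) of the given values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical reposting frequencies for the nodes of one network.
reposts = [0, 10, 20, 30, 40]
normalized = min_max_normalize(reposts)
summary = five_number_summary(normalized)
```

Because a single heavily reposted node fixes the maximum at 1, most other nodes are compressed toward 0, which is why the quartiles in Table 7 are small in both networks.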

5. Conclusion and Future Work

Weibo serves as a social media platform for people in mainland China to share information and communicate with each other. Using 719,570 collected posts and the LDA model, this work discovers a wide range of topics discussed in relation to COVID-19 on Weibo. In response to the COVID-19 outbreak, people gain knowledge about COVID-19, show their support for frontline warriors, encourage each other spiritually, discuss preventive measures, and express concerns about the recovery of the economy and of everyday life. Sentiment analysis further reveals that country media lead the discussions on Weibo in both semantic networks, while country media, influential individuals, and “self-media” together contribute to the spread of positive-sentiment information. This indicates that the government could better fulfill its role as crisis communicator by making use of this kind of media network.

Although there have been studies of public opinion on COVID-19 using surveys, few studies focus on COVID-19 opinion mining based on social media data. By combining LDA topic modeling with sentiment analysis to take a closer look at people's feelings, this work contributes to the understanding of people's response to COVID-19 on Weibo and may serve as an example of preliminary research applying LDA and sentiment analysis to a COVID-19 social media dataset. Further investigation of this topic can be done in different ways. One direct way is to extend the research scope, such as by tracing relevant posts of each topic, analyzing peaks of negative posts, and revealing the relationships between positive and negative trends. In addition, identifying topic trends and analyzing the correlation between the number of posts and disease development trends can also lead to meaningful findings; a similar approach can be seen in some previous research (Fung et al., 2013; Hu et al., 2017).

References

  1. Atchison C., Bowman L., Eaton J.W., Imai N., Redd R., Pristera P., Vrinten C., Ward H. Public response to UK government recommendations on COVID-19: Population survey. 2020. (Report No. 10). [DOI]
  2. Bakshi R.K., Kaur N., Kaur R., Kaur G. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) Hoda M.N., editor. IEEE; Piscataway, NJ: 2016. Opinion mining and sentiment analysis; pp. 452–455. [Google Scholar]
  3. Barua A., Thomas S.W., Hassan A.E. What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering. 2014;19(3):619–654. [Google Scholar]
  4. Bhagavathula A.S., Aldhaleei W.A., Rahmani J., Mahabadi M.A., Bandari D.K. MedRxiv. 2020. Novel coronavirus (COVID-19) knowledge and perceptions: A survey of healthcare workers; pp. 1–15. [DOI] [Google Scholar]
  5. Bhonde R., Bhagwat B., Ingulkar S., Pande A. Sentiment analysis based on dictionary approach. International Journal of Emerging Engineering Research and Technology. 2015;3(1):51–55. [Google Scholar]
  6. Blei D.M., Ng A.Y., Jordan M.I. Latent dirichlet allocation. Journal of Machine Learning Research. 2003;3(Jan):993–1022. [Google Scholar]
  7. Campbell C. China appears to have tamed a second wave of coronavirus in just 21 days with no deaths. Time. 2020. Retrieved from https://time.com/5862482/china-beijing-coronavirus-second-wave-covid19-xinfadi/
  8. Celardo L., Iezzi D.F., Vichi M. Proceedings of the 13th International Conference on Statistical Analysis of Textual Data. Les Press de Fac Imprimeur; Nice: 2016. Multi-mode partitioning for text clustering to reduce dimensionality and noises; pp. 181–192.
  9. Chen Q., Sokolova M. Word2Vec and Doc2Vec in unsupervised sentiment analysis of clinical discharge summaries. 2018. arXiv:1805.00352 ArXiv Preprint.
  10. Chew C., Eysenbach G. Pandemics in the age of Twitter: Content analysis of Tweets during the 2009 H1N1 outbreak. PLoS One. 2010;5(11):1–13. doi: 10.1371/journal.pone.0014118.
  11. Chung W., He S., Zeng D. eMood: Modeling emotion for social media analytics on Ebola disease outbreak. 2015. Paper presented at the ICIS2015, Beijing, China.
  12. Clinch M. CNBC; 2020. Beijing's coronavirus outbreak is under control, Chinese health expert says. Retrieved from https://www.cnbc.com/2020/06/18/beijings-coronavirus-outbreak-under-control-china-health-expert-says.html
  13. Corley C.D., Cook D.J., Mikler A.R., Singh K.P. Text and structural data mining of influenza mentions in web and social media. International Journal of Environmental Research and Public Health. 2010;7(2):596–615. doi: 10.3390/ijerph7020596.
  14. Day M.Y., Lee C.C. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) IEEE; Piscataway, NJ: 2016. Deep learning for financial sentiment analysis on finance news providers; pp. 1127–1134.
  15. Ebrahim A.H., Saif Z.Q., Buheji M., AlBasri N., Al-Husaini F.A., Jahrami H. COVID-19 information-seeking behavior and anxiety symptoms among parents. Journal of Health Care and Medicine. 2020;1(1):1–9.
  16. Fung I.C.H., Fu K.W., Ying Y., Schaible B., Hao Y., Chan C.H., Tse Z.T.H. Chinese social media reaction to the MERS-CoV and avian influenza A(H7N9) outbreaks. Infectious Diseases of Poverty. 2013;2(1):1–12. doi: 10.1186/2049-9957-2-31.
  17. Greene C.M., Murphy G. PsyArXiv. 2020. Can fake news really change behaviour? Evidence from a study of COVID-19 misinformation; pp. 1–32.
  18. Gruber A., Weiss Y., Rosen-Zvi M. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics. 2009. Hidden topic Markov models; pp. 163–170.
  19. Holmes B.J., Henrich N., Hancock S., Lestou V. Communicating with the public during health crises: Experts’ experiences and opinions. Journal of Risk Research. 2009;12(6):793–807. doi: 10.1080/13669870802648486.
  20. Hridoy S.A.A., Ekram M.T., Islam M.S., Ahmed F., Rahman R.M. Localized Twitter opinion mining using sentiment analysis. Decision Analytics. 2015;2(1):1–19. doi: 10.1186/s40165-015-0016-4.
  21. Hu X., Choi K., Hao Y., Cunningham S.J., Lee J.H., Laplante A., Downie J.S. In: Proceeding of 18th International Society for Music Information Retrieval Conference. Hu X., Cunningham S.J., Turnbull D., Duan Z., editors. 2017. Exploring the music library association mailing list: A text mining approach; pp. 302–308.
  22. Hu Y., Huang H., Chen A., Mao X.L. Weibo-COV: A large-scale COVID-19 social media dataset from Weibo. 2020. arXiv:2005.09174 ArXiv Preprint.
  23. Huang B., Yang Y., Mahmood A., Wang H. Microblog topic detection based on LDA model and single-pass clustering. 2012, August. Paper presented at the International Conference on Rough Sets and Current Trends in Computing, Chengdu, China.
  24. Huang J.Z., Han M.F., Luo T.D., Ren A.K., Zhou X.P. Mental health survey of medical staff in a tertiary infectious disease hospital for COVID-19. Chinese Journal of Industrial Hygiene and Occupational Diseases. 2020;38(3):192–195. doi: 10.3760/cma.j.cn121094-20200219-00063.
  25. Huang Y., Zhao N. Generalized anxiety disorder, depressive symptoms and sleep quality during COVID-19 outbreak in China: A Web-based cross-sectional survey. Psychiatry Research. 2020;288:1–6. doi: 10.1016/j.psychres.2020.112954.
  26. Jalloh M.F., Sengeh P., Monasch R., Jalloh M.B., DeLuca N., Dyson M., Bunnell R. National survey of Ebola-related knowledge, attitudes and practices before the outbreak peak in Sierra Leone: August 2014. BMJ Global Health. 2017;2(4):1–10. doi: 10.1136/bmjgh-2017-000285.
  27. JHU COVID-19 Resource Center COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) 2020. Retrieved from https://coronavirus.jhu.edu/map.html
  28. Lazard A.J., Scheinfeld E., Bernhardt J.M., Wilcox G.B., Suran M. Detecting themes of public concern: A text mining analysis of the Centers for Disease Control and Prevention's Ebola live Twitter chat. American Journal of Infection Control. 2015;43(10):1109–1111. doi: 10.1016/j.ajic.2015.05.025.
  29. Li L., Zhang Q., Wang X., Zhang J., Wang T., Gao T.L., Wang F.Y. Characterizing the propagation of situational information in social media during COVID-19 epidemic: A case study on Weibo. IEEE Transactions on Computational Social Systems. 2020;7(2):556–562.
  30. Lim K.W., Buntine W. Proceedings of the 23rd ACM international conference on conference on information and knowledge management. 2014. Twitter opinion topic model: Extracting product opinions from tweets by leveraging hashtags and sentiment lexicon; pp. 1319–1328.
  31. Liu P.L. COVID-19 information seeking on digital media and preventive behaviors: The mediation role of worry. Cyberpsychology, Behavior, and Social Networking. 2020;23(10):677–682. doi: 10.1089/cyber.2020.0250.
  32. Liu X., Hu W. Attention and sentiment of Chinese public toward green buildings based on Sina Weibo. Sustainable Cities and Society. 2019;44:550–558. doi: 10.1016/j.scs.2018.10.047.
  33. Liu Y., Wu B., Wang B., Li G. Paper presented at the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014) IEEE; Piscataway, NJ: 2014, August. SDHM: A hybrid model for spammer detection in Weibo.
  34. Mazza C., Ricci E., Biondi S., Colasanti M., Ferracuti S., Napoli C., Roma P. A nationwide survey of psychological distress among Italian people during the COVID-19 pandemic: Immediate psychological responses and associated factors. International Journal of Environmental Research and Public Health. 2020;17(9):1–14. doi: 10.3390/ijerph17093165.
  35. Mollema L., Harmsen I.A., Broekhuizen E., Clijnk R., De Melker H., Paulussen T., Das E. Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013. Journal of Medical Internet Research. 2015;17(5):1–12. doi: 10.2196/jmir.3863.
  36. Motta M., Stecula D., Farhart C. How right-leaning media coverage of COVID-19 facilitated the spread of misinformation in the early stages of the pandemic in the US. Canadian Journal of Political Science/Revue Canadienne de Science Politique. 2020;53(2):335–342. doi: 10.1017/S0008423920000396.
  37. Nickell L.A., Crighton E.J., Tracy C.S., Al-Enazy H., Bolaji Y., Hanjrah S., Upshur R.E. Psychosocial effects of SARS on hospital staff: Survey of a large tertiary care institution. Canadian Medical Association Journal. 2004;170(5):793–798. doi: 10.1503/cmaj.1031077.
  38. Peng K.H., Liou L.H., Chang C.S., Lee D.S. Paper presented at the 2015 24th Wireless and Optical Communication Conference (WOCC) IEEE; Piscataway, NJ: 2015, October. Predicting personality traits of Chinese users based on Facebook wall posts.
  39. Pérez-Pérez M., Pérez-Rodríguez G., Fdez-Riverola F., Lourenço A. Using Twitter to understand the human bowel disease community: Exploratory analysis of key topics. Journal of Medical Internet Research. 2019;21(8):1–16. doi: 10.2196/12610.
  40. Qiu J., Shen B., Zhao M., Wang Z., Xie B., Xu Y. A nationwide survey of psychological distress among Chinese people in the COVID-19 epidemic: Implications and policy recommendations. General Psychiatry. 2020;33(2):1–3. doi: 10.1136/gpsych-2020-100213.
  41. Ray P., Chakrabarti A. Paper presented at the 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI) IEEE; Piscataway, NJ: 2017, February. Twitter sentiment analysis for product review using lexicon method.
  42. Rubin G.J., Amlôt R., Page L., Wessely S. Public perceptions, anxiety, and behaviour change in relation to the swine flu outbreak: Cross sectional telephone survey. BMJ (Clinical Research Ed.). 2009;339(jul02 3). doi: 10.1136/bmj.b2651.
  43. Salathé M., Khandelwal S. Assessing vaccination sentiments with online social media: Implications for infectious disease dynamics and control. PLoS Computational Biology. 2011;7(10):1–7. doi: 10.1371/journal.pcbi.1002199.
  44. Signorini A. Use of social media to monitor and predict outbreaks and public opinion on health topics. 2014. (Doctoral dissertation, University of Iowa, Iowa).
  45. Wolf M.S., Serper M., Opsasnick L., O’Conor R.M., Curtis L., Benavente J.Y., Bailey S.C. Awareness, attitudes, and actions related to COVID-19 among adults with chronic conditions at the onset of the US outbreak: A cross-sectional survey. Annals of Internal Medicine. 2020;173(2):100–109. doi: 10.7326/M20-1239.
  46. Xie T., Qin P., Zhu L. Study on the topic mining and dynamic visualization in view of LDA model. Modern Applied Science. 2018;13(1):204–213.
  47. Xu G., Zhang Y., Yi X. Paper presented at the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. IEEE; Piscataway, NJ: 2008, December. Modelling user behaviour for web recommendation using LDA model.
  48. Ye X., Li S., Yang X., Qin C. Use of social media for the detection and analysis of infectious diseases in China. International Journal of Geo-Information. 2016;5(9):1–17. doi: 10.3390/ijgi5090156.
  49. Zeng Z., Zheng X., Chen G., Yu Y. Paper presented at the 2014 IEEE 6th International Conference on Cloud Computing Technology and Science. IEEE; Piscataway, NJ: 2014, December. Spammer detection on Weibo social network.
  50. Zhang E.X., Yang Y., Di Shang R., Simons J.J.P., Quek B.K., Yin X.F., Tey J.S. Leveraging social networking sites for disease surveillance and public sensing: The case of the 2013 avian influenza A (H7N9) outbreak in China. Western Pacific Surveillance and Response Journal: WPSAR. 2015;6(2):66–72. doi: 10.5365/WPSAR.2015.6.1.013.
  51. Zhong B.L., Luo W., Li H.M., Zhang Q.Q., Liu X.G., Li W.T., Li Y. Knowledge, attitudes, and practices towards COVID-19 among Chinese residents during the rapid rise period of the COVID-19 outbreak: A quick online cross-sectional survey. International Journal of Biological Sciences. 2020;16(10):1745–1752. doi: 10.7150/ijbs.45221.
  52. Zou F., Wang F.L., Deng X., Han S. Automatic identification of Chinese stop words. Research on Computing Science. 2006;18:151–162.

Articles from Data and Information Management are provided here courtesy of Elsevier
