Abstract
Objective
Vaccination is one of the most powerful and effective protective measures against Coronavirus disease 2019 (COVID-19). Blogs expressing attitudes toward vaccination are abundant on social media platforms, especially Sina Weibo, one of the largest social media platforms in China. Weibo is therefore a good data source for investigating public opinions about vaccination. In this paper, we aimed to effectively mine blogs to quantify the public's willingness to get the COVID-19 vaccine.
Materials and Methods
First, 144,379 Chinese-language blogs were collected from Weibo between March 24 and April 28, 2021. The data were cleaned and preprocessed to ensure their quality, yielding an experimental dataset of 72,496 blogs. Second, we employed a new fusion sentiment analysis model to analyze the sentiment of each blog. Third, the public's willingness to get the COVID-19 vaccine was quantified by combining the sentiment distribution with the information dissemination effect.
Results
(1) The intensity of bloggers' sentiment toward COVID-19 vaccines changed over time. (2) The extreme values of positive and negative sentiment intensity occurred when hot topics related to vaccines appeared. (3) The public's willingness to get the COVID-19 vaccine and the actual vaccination doses are linearly correlated.
Conclusion
We proposed a method for quantifying the public’s vaccination willingness from social media data. The effectiveness of the method was demonstrated by a significant consistency between the estimates of public vaccination willingness and actual COVID-19 vaccination doses.
Keywords: Vaccination willingness, Social media, COVID-19, Sentiment analysis, Information dissemination effect
1. Introduction
COVID-19 is a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], [2], [3]. Vaccination is widely regarded by health organizations as one of the most effective ways to protect people from infectious diseases [4], [5]. SARS-CoV-2 continues to mutate and may persist for a long time [6]. To effectively reduce the prevalence and incidence of vaccine-preventable diseases, it is necessary to achieve acceptable levels of protection and sustained herd immunity through widespread vaccination [7]. Although more than ten COVID-19 vaccines are currently available, understanding public vaccination willingness remains critical to understanding the global state of COVID-19 vaccination.
During the early stages of vaccination, there were compulsory vaccination measures for people working in various public places, but for the general public, vaccination had to follow the basic principles of being "informed, consenting, and voluntary". The development and marketing of vaccines typically require several years of clinical experience and strict regulatory review before they can be produced and used [8]. As COVID-19 spread globally, targeted vaccines had to be developed within a much shorter cycle to prevent viral infection. Because relatively few clinical trials had been conducted on COVID-19 vaccines, their safety and effectiveness were not widely recognized by the public, and people could therefore be hesitant about the new vaccines [9]. A cross-sectional study in China found that 9% of participants refused COVID-19 vaccination, whereas 35.5% reported vaccine hesitancy [10]. In the United Kingdom, 27% of participants reported vaccine hesitancy and 9% were resistant to getting the COVID-19 vaccine [11]. Meanwhile, vaccination willingness is a key factor for vaccination coverage, and it is typically measured through surveys [12], [13]. However, surveys are costly and time-consuming, and because vaccination willingness changes with events and over time, survey results may quickly become outdated [14]. Therefore, it is crucial to develop a novel, timely, and automated method for mining the public's vaccination willingness.
Social media platforms now spread information with high speed and wide penetration [15]. Sina Weibo is one of the largest social media platforms in China, and the growing number of discussions on COVID-19 vaccines there offers a promising data source for mining the public's vaccination willingness. Furthermore, the accessibility, interactivity, and spontaneity of such data make them well suited to this study.
In this paper, we propose a quantitative vaccination willingness (QVW) model to mine the vaccination willingness of the Chinese public; the model consists of a sentiment analysis model and an information dissemination effect index. First, we developed a fusion sentiment analysis model based on a bidirectional encoder representations from transformers-bidirectional long short-term memory network-convolutional neural network (BERT-Bi-LSTM-CNN) architecture and obtained the sentiment probability of each blog. Second, we used an information dissemination effect index to measure the dissemination effect of each blog. Third, we combined the positive sentiment probability of each blog with its information dissemination effect to quantify the public's vaccination willingness.
To validate the effectiveness of the proposed method, we used it to calculate daily estimates of the public's vaccination willingness for the 36 days from March 24 to April 28, 2021, and collected the actual vaccination doses for the same period. The correlation coefficient between the estimated willingness and the actual vaccination doses is 0.67, indicating that the proposed model provides a reliable signal of the public's vaccination willingness.
Our proposed method is motivated by a wealth of research in the social and computational sciences suggesting that influence intensity in a social network is reflected in social activities among users, and that content posted on social media can influence other people's decisions. The core contribution of this paper is a new methodological tool for quantifying the public's vaccination willingness. Our method is a real-time, low-cost alternative to existing methods, and researchers can use it to quantify public willingness toward other vaccines, such as HIV, HPV, and H7N9 vaccines [16], [17], [18]. The results of this study can help governments, policymakers, and healthcare providers take effective steps to drive a successful COVID-19 vaccination campaign.
2. Related works
At present, online social media such as Weibo, Facebook, and Twitter spread a great deal of information about COVID-19 vaccines, including the public's attitudes and opinions as well as vaccine knowledge and popularization [19]. The wide use of social media provides a good data source for research [20]. Previous studies have not only applied sentiment analysis to social media data to analyze the public's attitudes and opinions about COVID-19 vaccines, but have also studied COVID-19 vaccination willingness [21], [22].
Sentiment analysis on COVID-19 vaccines. As vaccines were developed and rolled out, a large and growing body of literature has performed sentiment analysis on vaccine-related social media data to understand people's attitudes toward vaccination [23]. Several studies have compared sentiment across different vaccines [24], [25], aiming to mine the public's attitudes and opinions toward two or more vaccines. Other studies have investigated changes in public attitudes toward vaccines in one country [26] or across multiple countries [27]. Collectively, the existing literature mainly infers vaccination willingness from public attitudes toward vaccines. Methodologically, scholars have used different sentiment analysis methods, such as dictionary- and rule-based tools [25], [27] and machine-learning-based models [24], [26]. However, these methods often require a large number of complex features, and their performance is susceptible to sparse data.
Vaccination willingness for COVID-19. Vaccination willingness has recently attracted extensive attention in medicine, management, psychology, and other fields. The existing literature has studied vaccination willingness from two perspectives. (1) Several studies investigated vaccination willingness and its potential predictors within small groups through questionnaires [28], [29]; building on these, several scholars have performed systematic reviews and meta-analyses to estimate the vaccination willingness of the broader population [30], [31]. (2) A few researchers have proposed social network models to study vaccination willingness and its influencing factors, such as an epidemiological social network model that considers population heterogeneity and different vaccination strategies [32], and a model integrating multiple-criteria belief modeling with social network analysis [33]. However, from a data perspective, questionnaire samples are small and time-consuming to collect, while social network models rely on simulated data and lack real data. Online social media provide a wealth of real-world data and low-cost analytical tools for research. Therefore, using social media data to quantify public vaccination willingness represents a fundamental shift in measurement.
3. Materials and methods
This section introduces data acquisition and the proposed QVW model in detail. The overall steps of the proposed model are shown in Fig. 1.
Fig. 1.
The overall steps of the QVW model.
3.1. Data acquisition
The experimental data come from the Weibo data source pool. Weibo has more than 500 million registered users. A web crawler was developed in Python to obtain blogs from Weibo. The search keyword for the blogs was the Chinese term meaning "COVID-19 vaccine". All Chinese-language blogs matching the keyword and posted between March 24 and April 28, 2021 were obtained, totaling 144,379 blogs. Furthermore, the actual vaccination dose data were obtained from the website of the National Health Commission of the People's Republic of China ().
3.1.1. Data preprocessing
The collected blogs were preprocessed with Python 3.7 to ensure the quality of the experimental data. The main preprocessing steps were: (1) removing blogs with duplicate content, mainly to eliminate the influence of multiple duplicate blogs generated by fake accounts; (2) using the regular expression (re) library to delete blogs about other vaccines, namely those containing "HPV + vaccine", "rabies + vaccine", "H7N9 + vaccine", and "HIV + vaccine"; (3) deleting reposts and keeping original blogs, as the latter better express bloggers' real thoughts; (4) filtering out URLs and HTML tags; (5) removing stop words using the stop word list published by the Harbin Institute of Technology (); and (6) converting traditional Chinese characters to simplified characters.
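Steps (1) to (5) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: each blog is assumed to be a dict with hypothetical keys `text` and `is_repost`, text is assumed to be already word-segmented and space-separated (raw Chinese text would first need a segmenter), the other-vaccine filter uses hypothetical English placeholder terms rather than the actual Chinese keywords, and step (6) is omitted because traditional-to-simplified conversion needs a dedicated converter such as the OpenCC library.

```python
import re

# Hypothetical placeholder terms; the real filter would use the Chinese keywords.
OTHER_DISEASES = ("HPV", "rabies", "H7N9", "HIV")
VACCINE_TERM = "vaccine"

HTML_TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
# Word boundaries keep e.g. "HIV" from matching inside "archive".
DISEASE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(d) for d in OTHER_DISEASES) + r")\b",
    re.IGNORECASE,
)


def mentions_other_vaccine(text):
    """Step (2): True if the blog pairs another disease with the vaccine term."""
    return bool(DISEASE_RE.search(text)) and VACCINE_TERM in text.lower()


def preprocess(blogs, stop_words):
    """Apply steps (1) to (5) to a list of blog dicts; return the cleaned texts."""
    seen = set()
    cleaned = []
    for blog in blogs:
        text = blog["text"]
        if blog.get("is_repost", False):   # step (3): keep original posts only
            continue
        if text in seen:                   # step (1): drop duplicate content
            continue
        seen.add(text)
        if mentions_other_vaccine(text):   # step (2): drop other-vaccine blogs
            continue
        # Step (4): strip HTML tags first, then URLs (so tag attributes go too).
        text = URL_RE.sub(" ", HTML_TAG_RE.sub(" ", text))
        tokens = [t for t in text.split() if t not in stop_words]  # step (5)
        cleaned.append(" ".join(tokens))
    return cleaned
```

Checking duplicates on the raw text before cleaning mirrors the order of the steps as listed; deduplicating after cleaning would additionally catch blogs that differ only in URLs or markup.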
3.1.2. Experimental datasets
After data preprocessing, we obtained an experimental dataset of 72,496 blogs. A random sample of 12,000 blogs was manually annotated by four doctoral students majoring in information systems and management, each of whom annotated 3,000 blogs. The 3,000 blogs annotated by each annotator were then split into three parts and assigned to the other three annotators for re-annotation. The annotation consistency ratio was 84.2%, and 10,000 consistently annotated blogs were retained as the dataset, with three sentiment labels: positive (2), neutral (1), and negative (0). This annotated dataset, referred to as the PNN dataset, was divided into training, test, and validation sets in the ratio 8:1:1. Table 1 lists detailed statistics of the PNN dataset, and examples of "positive", "neutral", and "negative" blogs are listed in Table 2.
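For illustration, the consistency filtering and the 8:1:1 split can be sketched as below. The helper names are hypothetical, and the simple shuffled (non-stratified) split is an assumption, since the text does not state how the split was drawn; labels follow the coding above (0 negative, 1 neutral, 2 positive).

```python
import random


def consistent_subset(blog_ids, first_pass, second_pass):
    """Keep blogs whose two annotations agree; also return the agreement ratio."""
    if not (len(blog_ids) == len(first_pass) == len(second_pass)):
        raise ValueError("inputs must have equal length")
    kept = [(b, a) for b, a, r in zip(blog_ids, first_pass, second_pass) if a == r]
    return kept, len(kept) / len(blog_ids)


def split_8_1_1(items, seed=0):
    """Shuffle, then split into training, test, and validation sets (8:1:1)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10      # integer arithmetic avoids float rounding
    n_test = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])
```

With 10,000 retained blogs, this yields sets of 8,000, 1,000, and 1,000, matching the totals in Table 1.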
Table 1.
Statistics of the PNN dataset.
| Label | Training | Test | Validation | Total (%) |
|---|---|---|---|---|
| Positive | 3,824 | 483 | 493 | 4,800 (48.00) |
| Neutral | 316 | 38 | 33 | 387 (3.87) |
| Negative | 3,860 | 479 | 474 | 4,813 (48.13) |
| Total | 8,000 | 1,000 | 1,000 | 10,000 (100) |
Table 2.
Examples of "positive", "neutral", and "negative" blogs.
3.2. The proposed QVW model
Given a set of blogs, each with its numbers of likes, comments, and retweets, and the corresponding bloggers, each with their numbers of followers and posted blogs and their registration time, our goal is to develop an automated method that quantifies a public vaccination willingness score. A sentiment analysis model is first used to obtain the sentiment probability and sentiment class of each blog. Subsequently, we calculate the information dissemination effect of each blog. Finally, the public vaccination willingness score is quantified by combining the sentiment probabilities of the blogs with their information dissemination effects. The specific steps are described in detail in this subsection.
3.2.1. Sentiment analysis model
Sentiment analysis refers to the process of determining the sentiment of text content and classifying its polarity [34]. We classified blogs into three categories: positive, negative, and neutral. Positive blogs indicate that bloggers are optimistic about vaccination, neutral blogs present basic information or knowledge on vaccination, and negative blogs reflect bloggers' adverse opinions about vaccination. The proposed sentiment analysis model consists of three parts: (1) bidirectional encoder representations from transformers (BERT), (2) a bidirectional long short-term memory network (Bi-LSTM), and (3) a convolutional neural network (CNN). Fig. 2 shows the overall architecture.
Fig. 2.
Architecture of our proposed sentiment analysis model for analyzing the blogs.
BERT. The BERT pretrained language model [35], proposed by Google in 2018, is used to dynamically generate character embedding vectors in semantic space. It uses a "masked language model" objective to pretrain multiple bidirectional Transformer encoders, so the deep bidirectional representation of each character is obtained from both the preceding and the following text.
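Given a blog $S = \{c_1, c_2, \dots, c_n\}$ of $n$ characters, the character embeddings can be obtained using the pretrained BERT model as follows:

$$E = \{e_1, e_2, \dots, e_n\} = \mathrm{BERT}(c_1, c_2, \dots, c_n) \tag{1}$$

where $e_i$ is the character embedding of $c_i$.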
Bi-LSTM. The Bi-LSTM layer uses two LSTMs [36] running in opposite directions and connected to the same output layer to extract the contextual features of the text, which provides the output layer with complete contextual information. The forward LSTM reads the blog from left to right, learning the preceding (historical) context and producing the forward hidden state $\overrightarrow{h_i}$ for the $i$-th character; the backward LSTM reads from right to left, learning the following (future) context and producing the backward hidden state $\overleftarrow{h_i}$. The final state is calculated as follows:
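$$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}\left(e_i, \overrightarrow{h_{i-1}}\right) \tag{2}$$

$$\overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}\left(e_i, \overleftarrow{h_{i+1}}\right) \tag{3}$$

$$h_i = \left[\overrightarrow{h_i}; \overleftarrow{h_i}\right] \tag{4}$$

where $e_i$ is the BERT embedding of the $i$-th character and the final state $h_i$ is the concatenation of $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$. Therefore, the feature representation of a blog of $n$ characters generated by the Bi-LSTM layer is:

$$H = \{h_1, h_2, \dots, h_n\} \tag{5}$$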
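To make Eqs. (2) to (5) concrete, here is a dependency-free sketch of a standard LSTM cell run in both directions, with the two hidden states concatenated at each character position. Assumptions: the BERT embeddings are replaced by small toy vectors, the weights are random rather than trained, and the hidden size is tiny (Table 3 uses 128). A real implementation would use a framework layer such as `tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True))`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, d_in, d_h, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.d_h = d_h
        # One (W, U, b) triple per gate: W acts on the input, U on the previous h.
        self.params = {gate: (mat(d_h, d_in), mat(d_h, d_h), [0.0] * d_h)
                       for gate in "ifog"}

    def step(self, x, h, c):
        pre = {gate: [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
               for gate, (W, U, b) in self.params.items()}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Process the sequence in the given order; return one hidden state per step."""
        h, c = [0.0] * self.d_h, [0.0] * self.d_h
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def bilstm(embeddings, forward_cell, backward_cell):
    """Return H = [h_1, ..., h_n], where h_i = [forward h_i ; backward h_i]."""
    forward = forward_cell.run(embeddings)                 # left to right
    backward = backward_cell.run(embeddings[::-1])[::-1]   # right to left, re-aligned
    return [hf + hb for hf, hb in zip(forward, backward)]  # list concatenation
```

Reversing the backward outputs before concatenation is what aligns position $i$ of the backward pass with character $i$, so that $h_i$ combines the context on both sides of the same character.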
CNN. The blogs were posted by different bloggers, who expressed sentiment tendencies through adjectives, adverbs, and other sentiment words. The CNN layer uses convolution [37] to extract the local semantic features expressed by these sentiment words from the Bi-LSTM output $H$:

$$F = \mathrm{CNN}(H) \tag{6}$$

where $F$ is the local feature vector of the blog.
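A dependency-free sketch of the multi-window convolution and max pooling configured in Table 3 (window sizes 3, 4, and 5; max pooling) follows. Assumptions: a ReLU activation (the text does not name one), random untrained filters, and only 2 filters per window instead of 128. In TensorFlow this corresponds roughly to one `tf.keras.layers.Conv1D` plus `tf.keras.layers.GlobalMaxPooling1D` per window size, with the pooled outputs concatenated.

```python
import random


def conv1d_max_pool(seq, kernels, biases):
    """Slide each kernel (window x dim) over seq, apply ReLU, keep the max response."""
    pooled = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        if len(seq) < width:
            raise ValueError("sequence is shorter than the convolution window")
        responses = []
        for start in range(len(seq) - width + 1):
            s = bias
            for offset in range(width):
                s += sum(k * v for k, v in zip(kernel[offset], seq[start + offset]))
            responses.append(max(0.0, s))   # ReLU
        pooled.append(max(responses))       # max-over-time pooling
    return pooled


def text_cnn_features(seq, windows=(3, 4, 5), n_filters=2, seed=0):
    """Concatenate pooled features from every window size into one vector F."""
    rng = random.Random(seed)
    dim = len(seq[0])
    features = []
    for width in windows:
        kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
                   for _ in range(n_filters)]
        features.extend(conv1d_max_pool(seq, kernels, [0.0] * n_filters))
    return features
```

Max pooling keeps only the strongest response of each filter, so a sentiment phrase contributes to $F$ regardless of where it appears in the blog; the feature length is the number of windows times the number of filters.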
Output. The feature vector $F$ obtained from the CNN layer is fed into a fully connected layer followed by a softmax classifier:

$$P = \mathrm{softmax}(W F + b) \tag{7}$$

where $P$ represents the sentiment probability set of the blog, $W$ denotes the weight coefficient matrix, and $b$ is the corresponding bias. Model parameters were adjusted by minimizing the cross entropy:

$$\mathcal{L} = -\sum_{i=1}^{D} y_i^{\top} \log \hat{y}_i \tag{8}$$

where $D$ denotes the number of training samples, $y_i$ is the one-hot ground-truth label of the $i$-th blog, and $\hat{y}_i$ is its predicted probability vector.
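Eqs. (7) and (8) can be illustrated with a small pure-Python sketch. Labels use the PNN coding (0 negative, 1 neutral, 2 positive); the loss is summed over blogs as in Eq. (8), whereas framework losses such as Keras's typically average over the batch.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(features, weights, bias):
    """Eq. (7): P = softmax(W F + b), with one row of W per sentiment class."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def cross_entropy(labels, probabilities):
    """Eq. (8): summed negative log-probability of each blog's true class.

    With one-hot y_i, the dot product y_i^T log(y_hat_i) picks out a single entry.
    """
    return -sum(math.log(p[y]) for y, p in zip(labels, probabilities))
```

For example, a blog whose true label is positive (2) and whose predicted distribution is `[0.25, 0.25, 0.5]` contributes $\log 2$ to the loss.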
3.2.2. Information dissemination effect
In social networks, different communicators hold diverse views on the same information, and the large-scale dissemination of these views affects individuals' risk perception and produces corresponding behaviors. For example, blogs expressing a positive attitude toward vaccination published by influential individuals can leverage their efficient dissemination ability to promote vaccination. The dissemination effect of a blog is mainly reflected by the quality of its content and the influence of its publisher. Fig. 3 shows the main components of the blog information dissemination effect. Therefore, combining blog quality (BQ) and the corresponding blogger influence (BI), the information dissemination effect (IDE) of a blog is calculated as follows:
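$$IDE_i = \lambda \, BQ_i + (1 - \lambda) \, BI_i \tag{9}$$

where $IDE_i$, $BQ_i$, and $BI_i$ are the dissemination effect, quality, and blogger influence of blog $i$, and $\lambda$ is a hyper-parameter balancing the contributions of $BQ_i$ and $BI_i$. The value of $\lambda$ is empirically set to 0.4.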
Fig. 3.
The main composition of blog information dissemination effect.
Blog quality. Blog quality is mainly reflected by the numbers of comments, retweets, and likes: the more comments, retweets, and likes a blog obtains, the more attention it receives and the greater its influence. Therefore, blog quality is calculated by combining these metrics with different contributions as follows:
| (10) |
where the left-hand side is the quality of the blog posted by blogger $i$, and the right-hand terms are the numbers of comments, retweets, and likes of that blog, respectively.
Blogger influence. First, growth in a blogger's attention is a manifestation of expanded influence, so we use the PageRank (PR) algorithm [38] to calculate blogger popularity. Second, a blogger's activity level reflects their activity status on the social platform; the frequencies with which the blogger follows others, posts blogs, and gains followers are regarded as the influence factors of the blogger activity level. Therefore, we define the influence of a blogger as:
$$BI_i = BP_i + \sum_{j=1}^{k} \omega_{ij} \, BAL_{ij} \tag{11}$$

where $BI_i$ is the influence of blogger $i$, $BP_i$ represents the popularity of blogger $i$, $BAL_{ij}$ denotes the frequency of the $j$-th activity factor of blogger $i$ (following others, posting blogs, and gaining followers), $k$ is the number of influence factors, and $\omega_{ij}$ denotes the weight of the $j$-th influence factor of blogger $i$. In Eq. (11), $BP_i$ and $BAL_{ij}$ are calculated as follows:

$$BP_i = (1 - d) + d \sum_{j \in F_i} \frac{BP_j}{N_j} \tag{12}$$

$$BAL_{ij} = \frac{n_{ij}}{T - t_i} \tag{13}$$

where $F_i$ is the fan set of blogger $i$, $N_j$ is the follower number of blogger $j$, the damping factor $d$ is set to 0.85 according to the traditional PageRank algorithm [38], $T$ denotes the time of data acquisition, $t_i$ is the registration time of blogger $i$, and $n_{ij}$ is the count of the $j$-th activity factor of blogger $i$.
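A minimal pure-Python sketch of the iterative PageRank computation for blogger popularity, using the classic form $(1-d) + d\sum_{j} BP_j/N_j$ with $d = 0.85$. Assumptions: the follow graph is a dict mapping each account to the set of accounts it follows, and, following the classic PageRank convention, each fan's score is divided by the number of accounts that fan follows (its out-degree); if $N_j$ is meant literally as a follower count, the divisor would need to change. Like the original formula, the sketch does not redistribute the scores of accounts that follow nobody.

```python
def blogger_popularity(follows, d=0.85, iterations=50):
    """Iterative PageRank over a follow graph.

    follows maps each account to the set of accounts it follows, so a fan j of
    blogger i has i in follows[j]. Each fan passes BP_j / out_degree(j) to i.
    """
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    fans = {node: [] for node in nodes}      # fans[i] plays the role of F_i
    for fan, targets in follows.items():
        for target in targets:
            fans[target].append(fan)
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Jacobi update: the whole new dict is built from the previous scores.
        scores = {i: (1 - d) + d * sum(scores[j] / len(follows[j]) for j in fans[i])
                  for i in nodes}
    return scores
```

For instance, if three accounts each follow only a blogger `x`, the accounts settle at 0.15 and `x` at 0.15 + 0.85 × 0.45 = 0.5325.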
When determining the weights of the influence factors in Eq. (11), we used the analytic hierarchy process (AHP) [39] to calculate the weights of the blogger activity level factors. The weight determination steps are described in supplementary material A.
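The authors' exact AHP steps are in supplementary material A. The sketch below uses the common row-geometric-mean approximation of the AHP priority vector, together with Saaty's consistency ratio, and then plugs the weights into additive readings of Eqs. (11), (13), and (9): popularity plus weighted activity frequencies, activity counted as events per day since registration, and a $\lambda$-weighted mix with $\lambda = 0.4$. These functional forms are assumptions, and the comparison matrix and counts in the example are illustrative, not the paper's.

```python
import math
from datetime import date

# Saaty's random consistency index for comparison matrices of order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP priority weights (row geometric means) and consistency ratio."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("matrix order must be between 1 and 9")
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    weights = [g / sum(geo) for g in geo]
    # Estimate the principal eigenvalue from A w ~= lambda_max w.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1) if n > 2 else 0.0
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return weights, cr


def activity_frequency(count, registered, collected):
    """Assumed reading of Eq. (13): events per day between registration and collection."""
    days = (collected - registered).days
    if days <= 0:
        raise ValueError("collection date must be after registration")
    return count / days


def blogger_influence(popularity, frequencies, weights):
    """Assumed additive reading of Eq. (11): popularity plus weighted frequencies."""
    return popularity + sum(w * f for w, f in zip(weights, frequencies))


def dissemination_effect(blog_quality, influence, lam=0.4):
    """Assumed reading of Eq. (9): lambda-weighted mix of quality and influence."""
    return lam * blog_quality + (1 - lam) * influence


# Illustrative inputs only: a consistent 3 x 3 comparison matrix and made-up counts.
example_weights, example_cr = ahp_weights([[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]])
example_bal = [activity_frequency(n, date(2020, 1, 1), date(2021, 4, 28))
               for n in (120, 300, 900)]
example_ide = dissemination_effect(0.8, blogger_influence(1.2, example_bal, example_weights))
```

A consistency ratio below about 0.1 is the usual AHP threshold for accepting the pairwise judgments; the perfectly consistent example matrix gives a ratio of 0.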
3.2.3. Vaccination willingness
To quantify a vaccination willingness score, the model first calculates the sentiment probability of each blog using Eq. (7), retaining only the positive sentiment probability for the subsequent stage. Next, the model quantifies the corresponding IDE using Eq. (9). Finally, a score is calculated by combining the positive sentiment probability with the corresponding IDE; we denote this score as the public's vaccination willingness (PVW). The core hypothesis is that the higher the PVW, the more vaccination doses are administered. The specific calculation formulas are as follows:
| (14) |
| (15) |
| (16) |
where the formulas use the positive sentiment probability of the $i$-th blog and the numbers of positive and negative blogs.
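As a purely hypothetical illustration of how the pieces combine, the sketch below aggregates one day's blogs into a single score by taking an IDE-weighted mean of positive sentiment probabilities over the positive and negative blogs. This aggregation rule is an assumption chosen for illustration and should not be read as the exact form of Eqs. (14) to (16).

```python
def daily_pvw(blogs):
    """Hypothetical aggregation of one day's blogs into a PVW score.

    blogs: iterable of (label, p_positive, ide) tuples, where label is the
    predicted class, p_positive the positive probability from Eq. (7), and
    ide the dissemination effect from Eq. (9). Neutral blogs are skipped
    because the formulas refer only to positive and negative blogs (assumption).
    """
    weighted_sum = 0.0
    total_ide = 0.0
    for label, p_positive, ide in blogs:
        if label == "neutral":
            continue
        weighted_sum += p_positive * ide
        total_ide += ide
    if total_ide == 0:
        raise ValueError("no positive or negative blogs with non-zero IDE")
    return weighted_sum / total_ide
```

Weighting by IDE lets a widely disseminated positive blog move the daily score more than a blog that few users saw, while the normalization keeps the score between 0 and 1.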
3.3. Experimental parameter settings
Experimental parameter settings can directly affect the experimental results. The proposed QVW model was implemented in Python with the TensorFlow framework on Ubuntu 20.10. The relevant parameter settings are shown in Table 3.
Table 3.
Experimental parameter settings.
| Parameter | Value |
|---|---|
| (BERT) Character embedding size | 128 |
| Bi-LSTM layer | 1 |
| Bi-LSTM hidden size | 128 |
| CNN sliding window size | 3, 4, 5 |
| CNN sliding window number | 128 |
| CNN pooling method | Max pooling |
| Initial learning rate | 3e-4 |
| Optimization | Adam |
| Dropout | 0.15 |
| Batch size | 128 |
|  | 0.1919 |
|  | 0.1744 |
|  | 0.6337 |
4. Results
This section begins with an analysis of blogs collected from March 24 to April 28, 2021, to understand public sentiments on vaccination. Furthermore, PVW estimates are quantified using the proposed QVW model.
4.1. Sentiment analysis
4.1.1. Model evaluation
In order to evaluate the performance of the proposed sentiment analysis model, accuracy and F1-score are used during the test stage as described in Eqs. (17) to (20).
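$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{17}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{18}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{19}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{20}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. To verify the effectiveness and necessity of each module of the sentiment analysis model, we designed three ablation variants, described below: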
BERT: This model is the baseline.
BERT + Bi-LSTM: This model consists of BERT pre-trained model and Bi-LSTM layer.
BERT + Bi-LSTM + CNN: This is our proposed model.
We performed the ablation study on the PNN dataset, and the results are shown in Table 4. The baseline BERT model already achieves good performance. Adding the Bi-LSTM layer to the baseline improves accuracy from 0.8542 to 0.8915, which we attribute to the Bi-LSTM layer extracting bidirectional text information and thereby improving the feature representation. Adding the CNN layer further raises accuracy to 0.9129, and our proposed model achieves the best performance on both metrics.
Table 4.
The experimental results of the ablation study.
| Model | Accuracy | F1-Score |
|---|---|---|
| BERT (Baseline) | 0.8542 | 0.8537 |
| BERT + BiLSTM | 0.8915 | 0.8907 |
| BERT + BiLSTM + CNN | 0.9129 | 0.9128 |
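The metrics of Eqs. (17) to (20) can be computed for the three-class setting as sketched below. How the per-class F1 scores are averaged is not stated in the text; macro-averaging over the labels 0, 1, and 2 is an assumption here, and a support-weighted average would give different numbers.

```python
def accuracy(y_true, y_pred):
    """Eq. (17) generalized to multiple classes: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """Eqs. (18) to (20) for one class, treating it as positive (one-vs-rest)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)
```

Because neutral blogs are rare in the PNN dataset (3.87%), macro-averaging gives the neutral class the same weight as the two large classes, which tends to pull the score down relative to accuracy.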
4.1.2. Numbers of sentiment criteria
The proposed sentiment analysis model classified the collected blogs into three categories: positive, neutral, and negative. The proportion of blogs in each category is shown in Fig. 4. We found 33,462 positive blogs (46.2%), 34,663 negative blogs (47.8%), and 4,371 neutral blogs (6.0%). Positive and negative blogs make up the majority and occur in roughly equal numbers, indicating mixed public attitudes toward COVID-19 vaccines and vaccination in general. Neutral blogs, which merely contained information related to the COVID-19 vaccine, were much less frequent.
Fig. 4.

The proportion of sentiment criteria.
4.1.3. Timeline of sentiments
Fig. 5 shows how sentiments changed over time. Most of the sentiment expressed by the public falls into the positive and negative categories, with neutral sentiment forming a very small minority. The highest share of positive sentiment was roughly 61% on April 10, 2021, and the lowest was approximately 33% on April 3 and April 11, 2021. To understand what the public discussed on these three days, we performed topic modeling on the blogs from these days and selected the hottest topic of each day, as shown in Table 5. Topic modeling is introduced in Appendix A, which also reports topic modeling performed separately on positive, neutral, and negative blogs.
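The daily percentages in Fig. 5 and the extreme days discussed above can be derived by grouping predicted labels by posting date. A minimal sketch, assuming each blog has been reduced to a hypothetical `(day, label)` pair; the records in the test are toy data.

```python
from collections import Counter, defaultdict


def daily_positive_share(records):
    """records: iterable of (day, label) pairs; return {day: share of positive blogs}."""
    counts = defaultdict(Counter)
    for day, label in records:
        counts[day][label] += 1
    return {day: c["positive"] / sum(c.values()) for day, c in counts.items()}


def extreme_days(shares):
    """Return the day with the highest share and the sorted day(s) with the lowest."""
    high = max(shares, key=shares.get)
    low_value = min(shares.values())
    lows = sorted(day for day, share in shares.items() if share == low_value)
    return high, lows
```

Returning every day that ties for the minimum matters here, since the text reports the same lowest share on two different days.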
Fig. 5.

The percentage of positive, neutral and negative sentiment in blogs.
Table 5.
The hottest topics discussed in blogs on April 3, 10, and 11.
As shown in Table 5, the hottest topic on April 3 was "Pregnant and lactating women can be vaccinated against COVID-19". Most bloggers expressed strong skepticism about this news from some organizations and conveyed negative sentiment. On April 10, the hottest topic was "Wenhong Zhang called on people to take the COVID-19 vaccine in response to the national policy". Wenhong Zhang is an infectious disease expert, and most blogs expressed positive sentiment about this topic, along with willingness to get vaccinated or reports of having done so. On April 11, the hottest topic was "Force everyone to be vaccinated against COVID-19": many businesses and communities made it mandatory for all employees to be vaccinated against COVID-19. Most blogs opposed this, thereby conveying negative sentiment.
4.1.4. Sentiment of segmented population
To better understand the sentiment of different population segments toward COVID-19 vaccines and vaccination, we segmented all bloggers along three dimensions: gender, certification, and age. As shown in Table 6, first, the proportion of positive blogs posted by males (50.60%) was higher than that posted by females (42.16%), whereas the proportion of negative blogs was higher among females (51.44% vs. 43.78%). Second, the proportion of positive blogs was higher among official bloggers (60.24%) than among public bloggers (44.47%). Neutral sentiment was most common among official bloggers (13.75%), as some official blogs conveyed knowledge about vaccines or vaccination. Third, among the three age groups, the "40-" group had the highest proportion of positive blogs (50.22%), while the "19–39" group had the highest proportion of negative blogs (49.39%), followed by the "9–18" group (47.38%).
Table 6.
The number and proportion of sentiment criteria in the segmented population.
| Variable | Group | n (%) | Positive (%) | Neutral (%) | Negative (%) |
|---|---|---|---|---|---|
| Gender | Male | 34,327 (47.35) | 17,370 (50.60) | 1,929 (5.62) | 15,028 (43.78) |
|  | Female | 38,169 (52.65) | 16,092 (42.16) | 2,442 (6.40) | 19,635 (51.44) |
| Certification | Official | 7,763 (10.71) | 4,677 (60.24) | 1,067 (13.75) | 2,019 (26.01) |
|  | Public | 64,733 (89.29) | 28,785 (44.47) | 3,304 (5.10) | 32,644 (50.43) |
| Age | 9–18 | 4,267 (5.89) | 1,972 (46.21) | 273 (6.41) | 2,022 (47.38) |
|  | 19–39 | 17,867 (24.64) | 8,097 (45.32) | 945 (5.29) | 8,825 (49.39) |
|  | 40- | 6,505 (8.97) | 3,267 (50.22) | 445 (6.84) | 2,793 (42.94) |
|  | Missing data | 43,857 (60.50) | - | - | - |
Note: Official certification includes government, enterprise, organization, and media certification. Public certification refers to users verified on Weibo with an ID card or other credentials.
4.2. Public’s vaccination willingness
4.2.1. Timeline of public’s vaccination willingness
The trends of the PVW estimates quantified from blogs and of the actual vaccination doses (AVD) from March 24 to April 28 are shown in Fig. 6, in red and green, respectively. The PVW estimate peaked at about 0.70 on April 10 and reached its lowest value, about 0.34, on April 2; the average PVW estimate is about 0.53. The PVW estimates and AVD appear to be correlated.
Fig. 6.
The change tendency of the PVW estimates quantified by blogs and the actual vaccination doses (AVD).
4.2.2. Validation results
To evaluate the overall accuracy of PVW, we computed the Pearson correlation coefficient between the daily actual vaccination doses (AVD) and the daily PVW estimates. Fig. 7 shows a scatter plot of PVW versus AVD. The correlation coefficient between PVW and AVD over the 36 days is 0.67, and the points follow a positively skewed distribution. This indicates a moderately strong positive correlation between them.
Fig. 7.

Scatter plots of PVW versus AVD.
To demonstrate the effectiveness of the QVW model, we compared several ways of measuring IDE to determine how they affect the correlation. Furthermore, we computed the Pearson correlation coefficients between the PVW estimates of day t and the AVD of days t + 1, t + 2, and t + 3 to observe the delayed impact of blogs. The baseline considers only the positive sentiment probability. Table 7 displays the correlations between PVW and AVD for these IDE measurement methods. All four methods that incorporate IDE outperform the baseline at every lag, indicating the effectiveness of IDE. The PVW estimates quantified by our proposed method have the highest correlations with AVD on days t and t + 1 (r = 0.67 and 0.86). The PVW estimates quantified with blog quality (BQ) have the highest correlation with AVD on day t + 2 (r = 0.62). In aggregate, the correlations appear robust to these algorithmic choices, indicating that the value of this method is not limited to one particular implementation. For our method, the correlation between the PVW estimates and AVD is highest at day t + 1 (r = 0.86); beyond that, it decreases as the delay increases, and only a weak correlation remains at a three-day delay (r = 0.23). The same pattern holds for the other methods, suggesting that a day's blogs are most strongly associated with vaccination doses on the following day and that this association weakens over time.
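A minimal sketch of the lagged analysis behind Table 7: the PVW series of day t is paired with the AVD series shifted by k days, and Pearson's r is computed on the overlapping days. The series are assumed to be plain Python lists of daily values in date order; with 36 days, a lag of k leaves 36 - k pairs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(pvw, avd, lag):
    """Correlate PVW on day t with AVD on day t + lag (lag >= 0)."""
    if len(pvw) != len(avd):
        raise ValueError("series must cover the same days")
    if lag < 0 or lag > len(pvw) - 2:
        raise ValueError("lag leaves fewer than 2 overlapping days")
    return pearson(pvw[:len(pvw) - lag], avd[lag:])
```

Shortening both series rather than padding them keeps every pair a genuine observation, at the cost of slightly fewer points as the lag grows.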
Table 7.
Pearson correlation coefficients (r) for PVW estimates of day t and AVD of day t + 1, t + 2, and t + 3 across a range of information dissemination effect measurement methods.
| Method | t | t + 1 | t + 2 | t + 3 |
|---|---|---|---|---|
| Senti (Baseline) | 0.49 | 0.69 | 0.43 | 0.16 |
| Senti + BQ | 0.53 | 0.71 | *0.62* | *0.27* |
| Senti + BP | 0.58 | 0.72 | 0.47 | 0.18 |
| Senti + BI (BP + BAL) | 0.61 | 0.79 | 0.53 | 0.26 |
| **Ours (Senti + BQ + BI)** | ***0.67*** | ***0.86*** | **0.58** | **0.23** |
Note: Our proposed method is highlighted in bold. The largest values in each column are in italics. Senti is the positive sentiment probability. BP represents the blogger popularity calculated by PageRank algorithm. BAL refers to blogger activity level.
5. Discussion
Our analysis reveals diverse perceptions of the COVID-19 vaccine and vaccination in blogs. Our results not only help explain individual perceptions, but also have important implications for understanding the public's stance on current public health knowledge about COVID-19 vaccines. Furthermore, we developed a novel method to mine the public's vaccination willingness, and the experimental results demonstrate the feasibility of the proposed approach. We encourage future researchers to verify the overall accuracy of PVW using different validation methods, such as continued questionnaire surveys.
Rich textual data from social media can be used to better identify public attitudes and views on COVID-19 vaccines. As described in Section 3.2, we proposed a QVW model and used it to analyze 72,496 blogs posted from March 24 to April 28, 2021, to mine the public's vaccination willingness. Vaccination doses also depend on health conditions, socioeconomic conditions, culture, and other factors, which are beyond the scope of this study and will be considered in future work. Nevertheless, a significant consistency was observed between AVD and the public's willingness quantified from the blogs (see Section 4.2), illustrating the effectiveness of the proposed QVW model.
The intensity of bloggers' sentiment toward COVID-19 vaccines changed over time (see Fig. 5). Notably, positive and negative sentiment intensities were roughly equal on most days. However, the intensity of positive sentiment differed markedly on some days: for example, the highest positive sentiment was about 61% on April 10, whereas the lowest was about 33% on April 3 and April 11, a range of nearly 30 percentage points. Bloggers' sentiment intensity about the COVID-19 vaccine therefore fluctuates.
Furthermore, there is a meaningful correlation between PVW and AVD (see Fig. 7), with a correlation coefficient of 0.67. The scatter plot shows a positively skewed pattern: while most days have low-to-moderate AVD, a few days have unusually low or high PVW estimates, including April 3 and April 10 (see Fig. 6). We encourage future researchers to explore further relations between PVW and AVD, such as causality, and to examine whether AVD can be predicted from PVW estimates.
5.1. Implications for theory
This study contributes to theory in at least two important ways. First, we explored sentiments and perceptions toward vaccination in China using social media analytics during a pandemic. A few researchers have conducted social media-based sentiment analyses of COVID-19, vaccines, and vaccination in different countries [10], [40], [41]. To our knowledge, however, this study is the first to analyze public sentiments and perceptions on vaccination in China using social media data. Furthermore, we used the LDA algorithm to extract the main topics of positive, neutral, and negative blogs during the study period, and we analyzed the reasons for the extreme values. We encourage future researchers to explore fine-grained topics across different aspects, such as healthcare advisories, anxiety, entertainment, industry, politics, social support, and the economy.
Second, we mined public vaccination willingness in China from social media data using the proposed QVW model. Social media has become an integral part of society, with millions of users voicing opinions that previously went unheard. This opens up a wide field of analysis that was not possible before. The findings indicate that social media data are a good source for mining public opinion and that vaccination willingness can be quantified from such data during unprecedented times such as a pandemic. We combined sentiment analysis with the information dissemination effect (IDE) index to quantify the PVW estimates. A few researchers have explored vaccination willingness using questionnaires. Compared with those methods, our approach is timely, convenient, and accurate. We likewise encourage future researchers to propose better methods for quantifying PVW and to explore more direct connections between the PVW estimates and AVD.
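The exact QVW formulation and IDE index are defined in Section 3.2 and are not reproduced here. The code below is a purely illustrative sketch of the general idea of combining per-blog sentiment with a dissemination measure. It averages blog polarity (+1, 0, -1) weighted by a hypothetical engagement-based weight, log(1 + reposts + comments + likes). Both the weighting and the function names are our assumptions, not the authors' method.

```python
import math


def engagement_weight(reposts, comments, likes):
    """Hypothetical IDE-like proxy: log-damped engagement, so viral blogs do not dominate entirely."""
    return math.log1p(reposts + comments + likes)


def dissemination_weighted_sentiment(blogs):
    """Hypothetical fusion of sentiment and dissemination.

    `blogs` is a list of (polarity, weight) pairs, where polarity is +1
    (positive), 0 (neutral) or -1 (negative) and weight >= 0.
    Returns the weighted mean polarity in [-1, 1], or 0.0 if all weights are zero.
    """
    total = sum(w for _, w in blogs)
    if total == 0:
        return 0.0
    return sum(p * w for p, w in blogs) / total


# Toy example: one widely shared positive blog, one negative, one neutral.
blogs = [
    (+1, engagement_weight(reposts=120, comments=30, likes=900)),
    (-1, engagement_weight(reposts=5, comments=2, likes=40)),
    (0, engagement_weight(reposts=0, comments=1, likes=3)),
]
print(round(dissemination_weighted_sentiment(blogs), 3))
```

The log damping is a design choice made only for this sketch. A linear weight would let a single viral blog determine the day's score almost on its own.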
5.2. Implications for practice
The findings of this study have several important implications for future practice. Although this study focuses on mining COVID-19 vaccination willingness in China, the proposed method may generalize in the following ways. First, it can be used to explore and analyze public attitudes and opinions on vaccination in preparation for vaccination promotion. Researchers can also apply it to mine public vaccination willingness in other countries using other social media platforms, such as Twitter. Second, the method can monitor public willingness to receive new vaccines in a timely manner. As SARS-CoV-2 continues to mutate and never-before-seen viruses emerge, targeted new vaccines will be rolled out.
The results of this study can also help governments, policymakers, and healthcare providers better understand the dynamic interrelationship between overall societal sentiment and AVD. This understanding may inform better strategies for a successful COVID-19 vaccination campaign. Specifically, policymakers and governments can devise appropriate policies to manage public sentiment and reduce the effects of adverse vaccine events on social sentiment. Healthcare providers can better understand the public’s attitudes and willingness toward vaccination and use this to improve the quality of their services. Influential bloggers should actively advocate COVID-19 vaccination, and the public should engage more constructively with health information on social media.
6. Conclusion
The COVID-19 pandemic is one of the most significant global public health problems and has disrupted the lives of millions of people in many countries. Governments and researchers around the world are trying to reduce the adverse effects of the disease, and vaccination is currently the most effective way to fight the pandemic. In this paper, we proposed a QVW model to mine the public’s vaccination willingness in China from social media data. To verify the effectiveness of the proposed method, we calculated the correlation coefficient between the public’s willingness and AVD (r = 0.67). This result suggests that the PVW estimates provide a reliable signal of the public’s vaccination willingness.
We hope these findings will help drive a successful COVID-19 vaccination campaign. Specifically, decision-makers and policy developers can use social media data to strengthen vaccination willingness and reduce vaccine hesitancy and opposition. Public health authorities may be able to work with Weibo and other platforms, in collaboration with official media outlets, to increase positive information and reduce negative information about vaccines. When PVW estimates remain persistently high, public health authorities may increase vaccine supply capacity and the number of medical workers available for vaccination. Conversely, when PVW estimates remain low, they may investigate the reasons and develop effective strategies and interventions. Furthermore, understanding public attitudes and vaccination willingness could help public health authorities reinforce optimistic comments in positive blogs. It could also help them refute aggressive language that spreads false information in negative blogs.
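The decision rules above depend on judging when PVW is “persistently” high or low. One simple way to operationalize this is sketched below under assumptions of our own. It flags the series only when every value in the most recent window lies beyond a threshold. The thresholds and the window length are hypothetical parameters that an authority would need to calibrate; they are not values from this study.

```python
def persistent_signal(pvw_series, high, low, window):
    """Hypothetical alert rule on a daily PVW series.

    Returns "high" if the last `window` values are all >= `high`,
    "low" if they are all <= `low`, and None otherwise, including
    when fewer than `window` values are available.
    """
    if window <= 0 or len(pvw_series) < window:
        return None
    recent = pvw_series[-window:]
    if all(v >= high for v in recent):
        return "high"
    if all(v <= low for v in recent):
        return "low"
    return None


# Toy series: the last three days are all above the hypothetical 0.7 threshold.
print(persistent_signal([0.5, 0.7, 0.72, 0.75], high=0.7, low=0.3, window=3))
```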
One limitation of our study is that it ignores the effects of COVID-19 vaccine news and statistics on overall sentiment. Another is that it does not consider government policies, vaccine supply capacity, individuals’ perceived risk of disease, or other factors that influence vaccination willingness. In future work, we will consider more factors influencing vaccination willingness and estimate vaccination willingness for population segments defined by gender, age, education level, region, and so on.
7. Summary points
What was already known on the topic?
- Vaccination is one of the most powerful and effective protective measures against COVID-19; however, a timely, inexpensive, and automated method for mining public vaccination willingness is urgently needed.
- Social media has become an integral part of society, with millions of users voicing opinions that previously went unheard. It provides a good source for mining public opinion.
- The intensity of influence in a social network is reflected in social activities among users. Furthermore, content posted on social media can influence other people’s decisions.
What has this study added to our knowledge?
- Public vaccination willingness can be quantified from social media data, suggesting that estimates of vaccination willingness need not rely solely on questionnaire surveys.
- The proposed methodological tool for quantifying vaccination willingness is novel and may guide future improvements.
8. CRediT authorship contribution statement
Jiaming Ding: Conceptualization, Data Collection, Methodology, Software, Writing - Original Draft. Anning Wang: Conceptualization, Writing - Review & Editing, Investigation. Qiang Zhang: Supervision, Project administration, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (Nos. 72101078 and 72171069) and the Fundamental Research Funds for the Central Universities (No. JZ2021HGTA0131).
Footnotes
Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.ijmedinf.2022.104941.
Appendix A. Topic modeling of COVID-19 vaccine-related blogs
Topic modeling is an effective text-mining tool for discovering hidden semantic structures in texts. It is an unsupervised method that can extract thematic structures from large amounts of textual content. Its main advantage is that large volumes of blog content can be processed efficiently to identify correlations between words belonging to the same topic [42]. Many topic models are available, and we chose Latent Dirichlet Allocation (LDA), the most representative model, for this study [43]. LDA requires the number of topics as input; we selected the optimal number by evaluating the perplexity of models with different numbers of topics. We ran LDA on the positive, neutral, and negative blogs using the gensim library in Python 3.7. This procedure yielded five, three, and four topics for positive, neutral, and negative blogs, respectively, as shown in Table A1.
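The perplexity-based choice of topic number described above can be illustrated with a dependency-free sketch that involves two substitutions. First, gensim's `LdaModel` fits LDA by online variational Bayes, whereas the code below uses collapsed Gibbs sampling in plain Python. Second, for brevity it evaluates perplexity on the training documents. The function names, hyperparameters (alpha, beta, iterations), and toy documents are our assumptions, not the study's settings.

```python
import math
import random


def fit_lda(docs, k, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (theta, phi, word_index): theta[d][t] = P(topic t | doc d),
    phi[t][v] = P(word v | topic t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(k)
    ndk = [[0] * k for _ in docs]      # tokens of topic t in document d
    nkw = [[0] * V for _ in topics]    # tokens of word v assigned to topic t
    nk = [0] * k                       # total tokens assigned to topic t
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):     # random initialization
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][word_index[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = word_index[w], z[d][i]
                # Remove the token's current assignment, then resample from the full conditional.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in topics]
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][v] + beta) / (nk[j] + V * beta) for v in range(V)] for j in topics]
    return theta, phi, word_index


def perplexity(docs, theta, phi, word_index):
    """exp(-mean per-token log-likelihood); lower is better. Needs at least one token."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            v = word_index[w]
            log_lik += math.log(sum(theta[d][j] * phi[j][v] for j in range(len(phi))))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy tokenized blogs (hypothetical, English for readability).
docs = [
    ["vaccine", "dose", "appointment", "clinic"],
    ["vaccine", "dose", "clinic", "queue"],
    ["fever", "arm", "soreness", "vaccine"],
    ["fever", "allergy", "arm", "soreness"],
]
for k in (2, 3):
    theta, phi, idx = fit_lda(docs, k)
    print(k, round(perplexity(docs, theta, phi, idx), 2))
```

Training-set perplexity usually keeps falling as the number of topics grows. When choosing the number of topics, perplexity is therefore more commonly computed on held-out documents, or the elbow of the perplexity curve is used.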
Table A1.
Top words and example blogs for each positive, neutral, and negative topic.
Note: The example blogs (fourth column) are paraphrased to protect users’ privacy.
Positive topics. The positive topics reflect bloggers’ optimistic opinions about vaccination and new advances in vaccines, including “positive emotion around vaccination,” “global cooperation and support,” “vaccine progress around the world,” “progress on vaccination,” and “opinion leader effect.” First, topic 1 focuses on positive public attitudes and actions toward COVID-19 vaccination that relate to personal experiences, received information, and personal values. These attitudes have a positive impact on other people’s vaccine perceptions, which in turn promotes their vaccination behavior. Second, topics 2 and 3 treat COVID-19 vaccines as a global issue. These topics involve active blogs about vaccine progress and global cooperation and support; these blogs share research progress on vaccines with the public, which may partly relieve public concerns. Third, topic 4 centers on the progress of vaccination, including reports on the number of vaccinations and vaccine supply. Fourth, topic 5 covers active blogs by opinion leaders on public vaccination. Opinion leaders can greatly speed up information dissemination, which in turn affects the public’s awareness of vaccines and vaccination behavior.
Neutral topics. The neutral topics focus on promoting public knowledge about vaccines and vaccination, including “educating communities,” “instruction on getting vaccines,” and “vaccine rollout.” First, topic 6 disseminates important information and knowledge about vaccines and answers related questions through blogs, some of which share vaccine knowledge through webinars. Second, topic 7 provides guidance on access to vaccines, including information from health authorities at all levels that guides the public on vaccination. Third, as vaccine development advances, many blogs report on the vaccine rollout, as described in topic 8.
Negative topics. The negative topics reflect bloggers’ adverse opinions about vaccination and compulsory measures, including “negative emotion around vaccination,” “special population restrictions,” “compulsory measure,” and “adverse reactions.” First, topic 9 focuses on negative public attitudes and actions toward the COVID-19 vaccine, partly due to concerns about its safety and efficacy and partly due to a lack of vaccine knowledge. Second, topics 10 and 11 treat vaccines as a controversial issue. These topics express confusion about, and dissatisfaction with, organizations or institutions that impose mandatory vaccination policies. Related blogs spread primarily negative attitudes about vaccines to the public, hindering the progress of vaccination. Third, topic 12 describes adverse reactions after vaccination, such as arm soreness, drowsiness, allergies, and fever. These first-hand accounts can cause worry and fear in the unvaccinated population, which can change their vaccination behavior.
References
- 1. Huang C., Wang Y., et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet. 2020;395(10223):497–506. doi: 10.1016/S0140-6736(20)30183-5.
- 2. Goldman R.D., Yan T.D., et al. Caregiver willingness to vaccinate their children against COVID-19: Cross sectional survey. Vaccine. 2020;38(48):7668–7673. doi: 10.1016/j.vaccine.2020.09.084.
- 3. Leigh J.P., Moss S.J., et al. Factors affecting COVID-19 vaccine hesitancy among healthcare providers in 23 countries. Vaccine. 2022;40(31):4081–4089. doi: 10.1016/j.vaccine.2022.04.097.
- 4. Yaqub O., Castle-Clarke S., Sevdalis N., Chataway J. Attitudes to vaccination: a critical review. Soc. Sci. Med. 2014;112:1–11. doi: 10.1016/j.socscimed.2014.04.018.
- 5. Biswas N., Mustapha T., Khubchandani J., Price J.H. The nature and extent of COVID-19 vaccination hesitancy in healthcare workers. J. Commun. Health. 2021;46(6):1244–1251. doi: 10.1007/s10900-021-00984-3.
- 6. Callaway E. The coronavirus is mutating — does it matter? Nature. 2020;585(7824):174–177. doi: 10.1038/d41586-020-02544-6.
- 7. Yoda T., Katsuyama H. Willingness to receive COVID-19 vaccination in Japan. Vaccines. 2021;9(1):48. doi: 10.3390/vaccines9010048.
- 8. Mellet J., Pepper M.S. A COVID-19 vaccine: big strides come with big challenges. Vaccines. 2021;9(1):39. doi: 10.3390/vaccines9010039.
- 9. Trent M., Seale H., Chughtai A.A., Salmon D., MacIntyre C.R. Trust in government, intention to vaccinate and COVID-19 vaccine hesitancy: a comparative survey of five large cities in the United States, United Kingdom, and Australia. Vaccine. 2021;40(17):2498–2505. doi: 10.1016/j.vaccine.2021.06.048.
- 10. Wang C., Han B., et al. Vaccination willingness, vaccine hesitancy, and estimated coverage at the first round of COVID-19 vaccination in China: A national cross-sectional study. Vaccine. 2021;39(21):2833–2842. doi: 10.1016/j.vaccine.2021.04.020.
- 11. Sherman S.M., Smith L.E., et al. COVID-19 vaccination intention in the UK: Results from the COVID-19 Vaccination Acceptability Study (CoVAccS), a nationally representative cross-sectional survey. Human Vaccines & Immunotherapeutics. 2021;17(6):1612–1621. doi: 10.1101/2020.08.13.20174045.
- 12. Zhu X.-M., Yan W., et al. Patterns and influencing factors of COVID-19 vaccination willingness among college students in China. Vaccine. 2022;40(22):3046–3054. doi: 10.1016/j.vaccine.2022.04.013.
- 13. Kessels R., Luyten J., Tubeuf S. Willingness to get vaccinated against Covid-19 and attitudes toward vaccination in general. Vaccine. 2021;39(33):4716–4722. doi: 10.1016/j.vaccine.2021.05.069.
- 14. Culotta A., Cutler J. Mining brand perceptions from Twitter social networks. Marketing Science. 2016;35(3):343–362. doi: 10.1287/mksc.2015.0968.
- 15. Martí P., Serrano-Estrada L., Nolasco-Cirugeda A. Social media data: Challenges, opportunities and limitations in urban studies. Comput. Environ. Urban Syst. 2019;74:161–174. doi: 10.1016/j.compenvurbsys.2018.11.001.
- 16. Connochie D., Tingler R.C., Bauermeister J.A. Young men who have sex with men’s awareness, acceptability, and willingness to participate in HIV vaccine trials: Results from a nationwide online pilot study. Vaccine. 2019;37(43):6494–6499. doi: 10.1016/j.vaccine.2019.08.076.
- 17. Mascaro V., Pileggi C., Currà A., Bianco A., Pavia M. HPV vaccination coverage and willingness to be vaccinated among 18–30 year-old students in Italy. Vaccine. 2019;37(25):3310–3316. doi: 10.1016/j.vaccine.2019.04.081.
- 18. Wu S., Su J., et al. Willingness to accept a future influenza A(H7N9) vaccine in Beijing, China. Vaccine. 2018;36(4):491–497. doi: 10.1016/j.vaccine.2017.12.008.
- 19. Muric G., Wu Y., Ferrara E., et al. COVID-19 vaccine hesitancy on social media: building a public Twitter data set of antivaccine content, vaccine misinformation, and conspiracies. JMIR Public Health and Surveillance. 2021;7(11):e30642. doi: 10.2196/30642.
- 20. Salathé M., Khandelwal S. Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. PLoS Computational Biology. 2011;7(10):e1002199. doi: 10.1371/journal.pcbi.1002199.
- 21. Yousefinaghani S., Dara R., Mubareka S., Papadopoulos A., Sharif S. An analysis of COVID-19 vaccine sentiments and opinions on Twitter. International Journal of Infectious Diseases. 2021;108:256–262. doi: 10.1016/j.ijid.2021.05.059.
- 22. Lim L.J., Lim A.J., Fong K.K., Lee C.G. Sentiments regarding COVID-19 vaccination among graduate students in Singapore. Vaccines. 2021;9(10):1141. doi: 10.3390/vaccines9101141.
- 23. Monselise M., Chang C.-H., Ferreira G., Yang R., Yang C.C., et al. Topics and sentiments of public concerns regarding COVID-19 vaccines: social media trend analysis. Journal of Medical Internet Research. 2021;23(10):e30765. doi: 10.2196/30765.
- 24. Nurdeni D.A., Budi I., Santoso A.B. Sentiment analysis on Covid19 vaccines in Indonesia: from the perspective of Sinovac and Pfizer. In: 2021 3rd East Indonesia Conference on Computer and Information Technology (EIConCIT). IEEE; 2021. pp. 122–127.
- 25. Marcec R., Likic R. Using Twitter for sentiment analysis towards AstraZeneca/Oxford, Pfizer/BioNTech and Moderna COVID-19 vaccines. Postgrad. Med. J. 2022;98(1161):544–550. doi: 10.1136/postgradmedj-2021-140685.
- 26. Zhang Z., Feng G., Xu J., Zhang Y., Li J., Huang J., Akinwunmi B., Zhang C.J., Ming W.-K., et al. The impact of public health events on COVID-19 vaccine hesitancy on Chinese social media: national infoveillance study. JMIR Public Health and Surveillance. 2021;7(11):e32936. doi: 10.2196/32936.
- 27. Yin H., Song X., Yang S., Li J. Sentiment analysis and topic modeling for COVID-19 vaccine discussions. World Wide Web. 2022;25(3):1067–1083. doi: 10.1007/s11280-022-01029-y.
- 28. Brailovskaia J., Schneider S., Margraf J. To vaccinate or not to vaccinate!? Predictors of willingness to receive Covid-19 vaccination in Europe, the US, and China. PLoS ONE. 2021;16(12):e0260230. doi: 10.1371/journal.pone.0260230.
- 29. Chen M., Li Y., Chen J., Wen Z., Feng F., Zou H., Fu C., Chen L., Shu Y., Sun C. An online survey of the attitude and willingness of Chinese adults to receive COVID-19 vaccination. Human Vaccines & Immunotherapeutics. 2021;17(7):2279–2288. doi: 10.1080/21645515.2020.1853449.
- 30. Nehal K.R., Steendam L.M., Campos Ponce M., van der Hoeven M., Smit G.S.A. Worldwide vaccination willingness for COVID-19: a systematic review and meta-analysis. Vaccines. 2021;9(10):1071. doi: 10.3390/vaccines9101071.
- 31. Shao W., Chen X., Zheng C., Wang G., Zhang B., Zhang W. Pneumococcal vaccination coverage and willingness in mainland China. Tropical Medicine & International Health. 2022;27(10):864–872. doi: 10.1111/tmi.13809.
- 32. Markovič R., Šterk M., Marhl M., Perc M., Gosak M. Socio-demographic and health factors drive the epidemic progression and should guide vaccination strategies for best COVID-19 containment. Results in Physics. 2021;26:104433. doi: 10.1016/j.rinp.2021.104433.
- 33. Ni L., Chen Y.-W., de Brujin O. Towards understanding socially influenced vaccination decision making: An integrated model of multiple criteria belief modelling and social network analysis. Eur. J. Oper. Res. 2021;293(1):276–289. doi: 10.1016/j.ejor.2020.12.011.
- 34. Medhat W., Hassan A., Korashy H. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal. 2014;5(4):1093–1113. doi: 10.1016/j.asej.2014.04.011.
- 35. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- 36. Hochreiter S., Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.173.
- 37. Albawi S., Mohammed T.A., Al-Zawi S. Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET). IEEE; 2017. pp. 1–6. doi: 10.1109/ICEngTechnol.2017.8308186.
- 38. Page L., Brin S., Motwani R., Winograd T. The PageRank citation ranking: bringing order to the Web. Tech. rep. Stanford InfoLab; 1999.
- 39. Vaidya O.S., Kumar S. Analytic hierarchy process: An overview of applications. European Journal of Operational Research. 2006;169(1):1–29. doi: 10.1016/j.ejor.2004.04.028.
- 40. Manguri K.H., Ramadhan R.N., Amin P.R.M. Twitter sentiment analysis on worldwide COVID-19 outbreaks. Kurdistan Journal of Applied Research. 2020;5(3):54–65. doi: 10.24017/covid.8.
- 41. Praveen S., Ittamalla R., Deepak G. Analyzing the attitude of Indian citizens towards COVID-19 vaccine – A text analytics study. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2021;15(2):595–599. doi: 10.1016/j.dsx.2021.02.031.
- 42. Vayansky I., Kumar S.A. A review of topic modeling methods. Inform. Syst. 2020;94:101582. doi: 10.1016/j.is.2020.101582.
- 43. Jelodar H., Wang Y., Yuan C., Feng X., Jiang X., Li Y., Zhao L. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools Appl. 2019;78(11):15169–15211. doi: 10.1007/s11042-018-6894-4.