Digital Health. 2023 Feb 19;9:20552076231158033. doi: 10.1177/20552076231158033

Anti-vaccination attitude trends during the COVID-19 pandemic: A machine learning-based analysis of tweets

Quyen G To 1, Kien G To 2, Van-Anh N Huynh 2, Nhung TQ Nguyen 3, Diep TN Ngo 2, Stephanie Alley 1, Anh NQ Tran 2, Anh NP Tran 2, Ngan TT Pham 2, Thanh X Bui 2, Corneel Vandelanotte 1
PMCID: PMC9941594  PMID: 36825077

Abstract

Objective

Vaccine hesitancy has been ranked by the World Health Organization among the top 10 threats to global health. With a surge in misinformation and conspiracy theories against vaccination observed during the COVID-19 pandemic, attitudes toward vaccination may be worsening. This study investigates trends in anti-vaccination attitudes during the COVID-19 pandemic and within the United States, Canada, the United Kingdom, and Australia.

Methods

Vaccine-related English tweets published between 1 January 2020 and 27 June 2021 were used. A deep learning model using a dynamic word embedding method, Bidirectional Encoder Representations from Transformers (BERT), was developed to identify anti-vaccination tweets. The classifier achieved a micro F1 score of 0.92. Time series plots and country maps were used to examine vaccination attitudes globally and within countries.

Results

Among 9,352,509 tweets, 232,975 (2.49%) were identified as anti-vaccination tweets. The overall number of vaccine-related tweets increased sharply after the implementation of the first vaccination round since November 2020 (daily average of 6967 before vs. 31,757 tweets after 9 November 2020). The number of anti-vaccination tweets increased after conspiracy theories spread on social media. Percentages of anti-vaccination tweets were 3.45%, 2.74%, 2.46%, and 1.86% for the United States, the United Kingdom, Australia, and Canada, respectively.

Conclusions

Strategies and information campaigns targeting vaccination misinformation may need to be specifically designed for regions with the highest anti-vaccination Twitter activity and when new vaccination campaigns are initiated.

Keywords: vaccine hesitancy, stance analysis, neural network, deep learning, twitter, social media

Introduction

Since the declaration of the COVID-19 pandemic by the World Health Organization (WHO) on 11 March 2020,1 over 429 million cases and 5.9 million deaths were confirmed in 224 countries/territories worldwide (24 February 2022).2 In many countries, social distancing and restrictions have been imposed to prevent the spread of the virus, relaxed when the situation improved and re-imposed when cases increased again. This cycle has disrupted life and caused significant negative impacts on health, social well-being, and the economy.3 For life to go back to normal, mass vaccination is an important strategy. A vaccine coverage above the herd immunity threshold could help control the spread of the virus.4 However, vaccine hesitancy has been a concern even before the COVID-19 pandemic, being ranked by WHO among the top 10 threats to global health.5

Attitudes toward vaccination vary by region and by exposure to vaccine-related information. For example, studies have shown that willingness to vaccinate declined in Australia except in lockdown areas, where strict restrictions and/or the presence of the virus itself had positive effects on people's willingness to vaccinate.6,7 Additionally, exposure to misinformation and conspiracy theories was found to be a major reason for vaccine hesitancy.8,9 Beliefs in conspiracy theories and mistrust in official sources of information were also associated with vaccine hesitancy.10 Therefore, having information at regional levels is necessary and useful for building effective intervention strategies.

Twitter is an important source of information for studying vaccine hesitancy because, while social media, including Twitter, have been considered effective platforms for communication among individuals and organizations, they have also been used as tools to quickly spread falsified information and conspiracy theories about vaccines.11 A report found that millions of people were following the Facebook and YouTube accounts of those against vaccination in 2019; since then, millions more have been subscribing to these accounts.12 Studies also found that the use of social media for health information was associated with a lower willingness to vaccinate.12,13 A survey among 1663 people found that people were more hesitant to vaccinate if they relied on social media for information about the pandemic.12

In an effort to prevent a wide spread of misinformation, social media companies have been trying to remove false claims and put warning labels on misinformation posts.14,15 However, it is unclear whether these actions are sufficient to control the spread of misinformation.16 Examining the trend of anti-vaccination attitude over time may provide some indications of the effectiveness of these interventions and insight into how people reacted to major vaccine-related news on social media.

Previous studies have applied sentiment analysis to examine vaccination-related tweets17–19; however, there is a lack of studies applying stance analysis to examine anti-vaccination attitudes on Twitter. In stance analysis, a tweet is classified as being in favor of or against vaccination, whereas in sentiment analysis, a tweet is classified as having a positive or negative sentiment.20 While the two are quite similar, a key difference is that a negative tweet may not always be against vaccination, and a positive tweet may not always be in favor of vaccination. This is particularly relevant during the COVID-19 pandemic, when people have expressed negative views of the vaccination rollout towards governments without expressing anti-vaccination opinions.

Given the large number of tweets and the text-based format, an effective analysis approach is to use machine learning (ML) techniques. Advanced ML techniques such as deep learning (or deep neural networks) have been shown to perform well on natural language processing tasks.21,22 A few studies have used deep learning to conduct stance analyses using tweets related to HPV vaccination.23,24 These models had F1-scores (calculated based on the precision or positive predictive value and recall or sensitivity) of between 0.70 and 0.81,23 and F1-scores of less than 0.77.24 Studies on vaccination stance and sentiment analysis were also conducted during the COVID-19 pandemic. One study using stance analysis found that Bidirectional Encoder Representations from Transformers (BERT) outperformed LSTM and XGBoost, achieving an F1-score of 0.98.25 Another study also used a BERT-based model to detect tweets with misinformation and achieved an F1-score of 0.92.26 In addition, sentiment analysis on vaccination tweets was conducted using a Twitter Modified BERT based model that reduced the training time from 64 hr to 17 hr and achieved an accuracy of 0.89.27 One study also used four models, i.e., mBERT-base, BioBERT, ClinicalBERT, and BERTurk, to classify sentiments of tweets for four aspects of policy, health, media, and other; performance of these models was high with F1-scores between 0.84 and 0.88 for each country dataset and total accuracy of 0.87.28

Given the importance of having information about vaccination attitudes at regional levels and the limited number of studies using deep learning to conduct stance analysis on anti-vaccination attitudes among Twitter users, this study aims to investigate trends in anti-vaccination attitudes of Twitter users during the COVID-19 pandemic and within four English-speaking countries: Australia, Canada, the United Kingdom, and the United States. The findings of this study can help inform intervention strategies and policies to effectively suppress misinformation, improve vaccination attitudes, and eventually increase vaccination rates.

Methods

Data source

This study used a Twitter COVID-19-related dataset (version 68) collected by Banda et al. (2021), details of which were published elsewhere.29 Briefly, a Twitter Stream API, which allows access to public daily tweets, was used to collect tweets between 01 January 2020 and 27 June 2021. The full dataset includes 1,122,879,197 tweets and retweets; the cleaned version without retweets includes 285,064,012 unique tweets. Non-English tweets were removed. English tweets were hydrated using the Tweepy library in Python 3. A total of 9,352,509 vaccine-related tweets containing the terms “vaccin,” “vaxx,” or “inocul” were extracted, processed, and used in the analysis. Texts were changed to lowercase. Twitter handles, URLs, hyphens, hashtags (with attached words), numbers, special characters, and stop words were removed. Lemmatization was implemented for words in all tweets. A total of 20,484 tweets were labeled by 10 annotators working in pairs. Annotators were instructed to read the text and decide whether a tweet was anti-vaccination, pro-vaccination, neutral, or not sure, without looking at the contents of any link present in the tweet. As annotators had difficulty distinguishing pro-vaccination from neutral tweets, these tweets were combined into one label, i.e., not anti-vaccination tweets. A third annotator then looked at tweets with inconsistent labels and decided which label to use for model training. The average agreement between the two raters was 91.04%. These tweets, with duplicates removed, were split into three separate collections: training (70%), development (15%), and test (15%) sets. The threshold for classifying a tweet as anti-vaccination was 0.5.
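The keyword filter, text cleaning, and 70/15/15 split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the regular expressions, the tiny stop-word list, and the random seed are all hypothetical, and lemmatization is omitted because the text does not say which lemmatizer was used.

```python
import random
import re

# Illustrative stop-word list only; the list used in the study is not given.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for", "on", "it", "this"}

# Substrings named in the text as the vaccine-related search terms.
VACCINE_TERMS = ("vaccin", "vaxx", "inocul")


def is_vaccine_related(text):
    """Return True if the raw tweet contains any of the three search terms."""
    lowered = text.lower()
    return any(term in lowered for term in VACCINE_TERMS)


def clean_tweet(text):
    """Lowercase, then strip URLs, handles, hashtags, non-letters, and stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # handles, hashtags with attached words
    text = re.sub(r"[^a-z\s]", " ", text)               # numbers, hyphens, special characters
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def split_70_15_15(items, seed=0):
    """De-duplicate, shuffle, and split into training, development, and test sets."""
    unique = sorted(set(items))
    random.Random(seed).shuffle(unique)
    n_train = len(unique) * 70 // 100  # integer arithmetic avoids float rounding
    n_dev = len(unique) * 15 // 100
    train = unique[:n_train]
    dev = unique[n_train:n_train + n_dev]
    test = unique[n_train + n_dev:]
    return train, dev, test
```

In this sketch the keyword filter is applied to the raw text before cleaning, since the cleaning step would otherwise delete hashtags such as "#vaxx" before they could be matched. Whether the original pipeline used the same order is not stated.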

Anti-vaccination tweet identification

A deep learning model was developed with the use of transfer learning to identify anti-vaccination tweets. Details of how the model was developed have been published elsewhere.22 Briefly, this model was trained using a dynamic word embedding method named BERT. BERT was developed and is used by Google for natural language processing tasks.30 BERT uses an attention mechanism to learn the relationships among words in a sentence. It takes multiple words as input at the same time instead of processing them word by word sequentially. Unlike static word embeddings such as GloVe and Word2vec (i.e., one vector representation is generated for each word regardless of the different meanings of the word in sentences), dynamic embedding methods take into account the order of words in the sentence and the different meanings of a word depending on the context of the sentence. The BERT-pretrained uncased model with 12 hidden layers (transformer blocks), a hidden size of 768, and 12 attention heads was used. Different learning rates and numbers of epochs were also tested. The BERT-based model used in this study achieved a micro F1-score of 0.92. The present study provided additional analyses on anti-vaccination attitude for four English-speaking countries and examined the potential links between anti-vaccination attitude and specific events during the pandemic.
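To make the reported metric concrete, the following sketch shows how a micro-averaged F1-score can be computed from predicted probabilities and a 0.5 decision threshold. It is an illustration only: the function name is hypothetical, and treating a probability exactly equal to the threshold as anti-vaccination is an assumption, since the text gives the threshold but not the comparison.

```python
def micro_f1(y_true, y_prob, threshold=0.5):
    """Micro-averaged F1 for binary labels (1 = anti-vaccination, 0 = not).

    Assumption: probabilities >= threshold are classified as anti-vaccination.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = fp = fn = 0
    # Micro-averaging pools true/false positives and false negatives
    # over both classes before computing precision and recall.
    for label in (0, 1):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

When every tweet receives exactly one label, as here, the pooled precision and recall are equal, so the micro F1-score coincides with plain accuracy.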

Analysis

The BERT model was used to classify tweets as anti-vaccination or not anti-vaccination. Although the BERT model's performance was excellent, an additional check was conducted by plotting the 7-day moving averages of the percentages of anti-vaccination and not anti-vaccination tweets classified by the BERT model (excluding manually labeled tweets) against those calculated from the 20,484 manually labeled tweets (which covered dates only up to September 2020). The 7-day moving averages were used to smooth out daily fluctuations. Figure 1 shows that there was a discrepancy in trends between the two datasets before March 2020 (the 7-day moving averages for the two lines move in opposite directions). This is likely due to the very small number of COVID-19 related tweets at the beginning of the pandemic. However, after March 2020, the trends were similar, although the percentages of anti-vaccination tweets classified by the BERT model were smaller and fluctuated less than those of the manually labeled tweets (average across days was 4.9% vs. 8.5%, and standard deviation was 2.1% vs. 5.8%).

Figure 1.

Anti-vaccination tweets (7-day moving averages) classified by the ML model vs. manually labeled. ML: machine learning.

Time series plots were used to show trends in vaccination attitudes. A 7-day moving average was calculated for the number of tweets, the number of anti-vaccination tweets, and percentage of anti-vaccination tweets over time. Events were also plotted (Table 1). While these events were arbitrarily selected across the study period, they were expected to have positive or negative effects on vaccination attitudes.
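A 7-day moving average of a daily series can be sketched as follows. The text does not state whether the window was trailing or centered; this sketch assumes a trailing window (each day averaged with the six days before it), and the function name is hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average of a daily series.

    Days that do not yet have a full window of history yield None.
    """
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        result.append(running / window if i >= window - 1 else None)
    return result
```

For example, a series of seven days at 7 tweets followed by days at 14 tweets gives a first defined average of 7.0 on day seven and 8.0 on day eight, so a sudden jump is spread over the following week rather than appearing as a single spike.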

Table 1.

Events that may affect anti-vaccination attitude.

No. | Event | Ref. | Date
1 | “We have it under control” (Donald Trump) | 31 | 22 Jan 2020
2 | Moderna's announcement of the first vaccine trial with humans in April | 32 | 24 Feb 2020
3 | Bill Gates microchip conspiracy | 33 | 19 Mar 2020
4 | Plandemic documentary posted | 34 | 07 May 2020
5 | Interview with Andrew Kaufman about vaccine modification on human DNA on YouTube | 35 | 13 May 2020
6 | #Scamdemic hashtag peaked | 36 | 19 Jul 2020
7 | Antivaxxers attempted to discredit Pfizer within hours of the promising results announced | 37 | 09 Nov 2020
8 | Twitter starts to remove and label misinformation tweets on COVID-19 vaccines | 14 | 16 Dec 2020
9 | Multiple countries halted the use of AstraZeneca due to risk of blood clots | 38 | 28 Feb 2021
10 | Link between AstraZeneca vaccine and blood clots identified | 39 | 07 Apr 2021

Geographic information associated with tweets was used to create country maps for Australia, Canada, the United Kingdom, and the United States. These countries were selected as they had the largest volumes of tweets with geographic information and also have English as the first language so that the majority of the tweets in the countries were captured, not just a small subset of Twitter users who could tweet in English while the majority tweeted in their native languages. Areas within these countries with more than five anti-vaccination tweets were plotted on the country maps. The same color scheme was used to allow comparison across areas and countries.
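The per-area percentages and the "more than five anti-vaccination tweets" plotting rule can be sketched as below. The input format (pairs of an area name and a boolean classifier output) and the function name are assumptions made for illustration; the text does not describe how geographic fields were mapped to areas.

```python
from collections import Counter


def regional_percentages(tweets, min_anti=5):
    """Percentage of anti-vaccination tweets per area.

    tweets: iterable of (area, is_anti) pairs, where is_anti is True when
    the classifier labeled the tweet as anti-vaccination.
    Only areas with MORE than min_anti anti-vaccination tweets are returned;
    the rest would be shown in gray on the map.
    """
    totals = Counter()
    anti = Counter()
    for area, is_anti in tweets:
        totals[area] += 1
        if is_anti:
            anti[area] += 1
    return {
        area: 100.0 * anti[area] / totals[area]
        for area in totals
        if anti[area] > min_anti
    }
```

An area with 6 anti-vaccination tweets out of 200 would be reported at 3.0%, whereas an area with exactly 5 would be left out regardless of its total.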

Results

Trend in anti-vaccination tweets worldwide

Among 9,352,509 tweets, 232,975 (2.49%) were identified as anti-vaccination tweets. Figure 2 shows the number of vaccine-related tweets, the number of anti-vaccination tweets, and the percentage of anti-vaccination tweets over time. In general, the number of vaccine-related tweets increased sharply after the implementation of the first vaccination round since November 2020 (daily average of 6967 before vs. 31,757 tweets after 9 November 2020), and so did the number of anti-vaccination tweets (daily average of 304 vs. 624 tweets). In particular, increases in anti-vaccination tweets were observed after conspiracy theories spread on social media. For example, around mid-March 2020, a conspiracy theory about Bill Gates trying to put a microchip into the human body through vaccines spread on social media and was followed by an increase in the daily number of anti-vaccination tweets from 75 to a maximum of 567 tweets in 46 days. The increase continued after a documentary about the “plandemic” and an interview about the possibility of vaccines modifying human DNA were posted on social media in May: the number of anti-vaccination tweets rose further, from 691 to a maximum of 1199 tweets in four days. Another sharp increase, from 659 to a maximum of 1281 tweets in four days, was observed after the Twitter hashtag “#Scamdemic” peaked in July. Additionally, there were large increases in both the number of vaccine-related tweets and anti-vaccination tweets after the announcement of the promising results for Pfizer in November 2020 (949 to a maximum of 1478 tweets in 24 days). After Twitter started removing and labeling vaccine-related misinformation tweets from mid-December 2020, a decrease was observed in the number of anti-vaccination tweets (944 to a minimum of 284 tweets in nine days). The halt of AstraZeneca use in multiple countries due to the risk of blood clots was also followed by an increase in the number of anti-vaccination tweets from 421 to a maximum of 1117 tweets in 30 days.

Figure 2.

Trends over time (7-day moving average) by the BERT model for (A) the number of anti-vaccination and not anti-vaccination tweets; (B) the number of anti-vaccination tweets; and (C) the percentage of anti-vaccination tweets. Key events: (1) “We have it under control” (Donald Trump); (2) Moderna's announcement of the first vaccine trial in humans; (3) Bill Gates microchip conspiracy; (4) Plandemic documentary posted; (5) Interview with Andrew Kaufman about vaccine modification on human DNA on YouTube; (6) #Scamdemic hashtag peaked; (7) Antivaxxers attempted to discredit Pfizer within hours of the promising results announced; (8) Twitter starts to remove and label misinformation tweets on COVID-19 vaccines; (9) Multiple countries halted the use of AstraZeneca due to risk of blood clots; (10) Link between AstraZeneca vaccine and blood clots identified.

BERT: Bidirectional Encoder Representations from Transformers.

Anti-vaccination in four English-speaking countries

The number of tweets with geographic data worldwide was 188,809 (2%), of which 4,290, 14,401, 22,813, and 89,793 tweets were from Australia, Canada, the United Kingdom, and the United States, respectively. Percentages of anti-vaccination tweets were 3.45%, 2.74%, 2.46%, and 1.86% for the United States, the United Kingdom, Australia, and Canada, respectively. Figure 3 shows percentages of anti-vaccination tweets within each country. In Australia, Queensland (3.48%) had the highest while South Australia (1.52%) had the lowest percentage of anti-vaccination tweets. In Canada, Alberta (3.83%) had the highest and Quebec (0.65%) had the lowest percentage of anti-vaccination tweets. In the United Kingdom, Norfolk (9.2%) had the highest and Surrey (1.46%) had the lowest percentage of anti-vaccination tweets. In the United States, Rhode Island (8.64%) had the highest whereas Iowa (1.74%) had the lowest percentage of anti-vaccination tweets.

Figure 3.

Percentages of anti-vaccination tweets by the ML model within (A) Australia, (B) the United States, (C) Canada, and (D) the United Kingdom. (Gray areas had ≤5 anti-vaccination tweets).

Discussion

The findings show that the percentage of anti-vaccination tweets (2.49%) was similar to the 1.8% found by Yiannakoulias et al. (2022) using English tweets collected over 4 months of 2021,40 and the 3% found by Van Der Weide (2022) using Arabic tweets.41 The small percentage of anti-vaccination tweets is likely due to a large number of tweets being news or announcements tweeted by many organizations on Twitter.41

Moreover, the number of vaccine-related tweets increased sharply from November 2020, when the results of vaccine trials were becoming available, and on average remained about 4.5 times higher than in the period before November 2020. Despite a large increase in the number of anti-vaccination tweets after November 2020 (on average, about two times higher than in the previous period), the percentage of anti-vaccination tweets has declined since August 2020. This is likely due to the vaccination campaigns launched in the United Kingdom and Canada in December 2020,42,43 and in Australia and the United States in January 2021,44,45 which saw a large increase in not anti-vaccination tweets. Nevertheless, as the absolute number of anti-vaccination tweets did not decline compared to the period before November 2020, it casts doubt on the effectiveness of Twitter's intervention introduced in mid-December 2020 to identify and remove anti-vaccination tweets.14 It is also worth noting that social media companies benefit from anti-vaccination activities on their platforms (up to one billion USD in annual revenue).12
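The "about 4.5 times" and "about two times" figures follow from the daily averages reported in the Results, and the same numbers show why the percentage can fall while the count rises. The short calculation below is a back-of-the-envelope check using those rounded daily averages; the derived shares are approximations computed here, not values reported by the authors, and the variable names are illustrative.

```python
# Daily averages reported in the Results (before vs. after 9 November 2020).
before_all, after_all = 6967, 31757    # all vaccine-related tweets
before_anti, after_anti = 304, 624     # anti-vaccination tweets

ratio_all = after_all / before_all     # growth factor of all vaccine-related tweets
ratio_anti = after_anti / before_anti  # growth factor of anti-vaccination tweets

# Approximate share of anti-vaccination tweets implied by the averages.
share_before = 100 * before_anti / before_all
share_after = 100 * after_anti / after_all

print(f"all tweets: x{ratio_all:.2f}, anti-vaccination tweets: x{ratio_anti:.2f}")
print(f"implied share: {share_before:.1f}% before vs. {share_after:.1f}% after")
```

Because the total volume grew by a larger factor than the anti-vaccination volume, the implied share after 9 November 2020 is lower than before, even though the daily count of anti-vaccination tweets roughly doubled.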

Previous reports showed that vaccine-related events and misinformation could trigger discussions on social media.18,46 In this study, we found that anti-vaccination attitudes were strongly influenced by conspiracy theories, as large increases in anti-vaccination tweets were observed after conspiracy materials were posted on social media. However, the sharp increases in the number of anti-vaccination tweets were followed by sharp declines within a short period of time. Possible explanations are that (1) the tweeting of the conspiracy theories was most intense around the time when these theories were first posted and (2) many of these tweets may have been identified and removed by Twitter.14 However, as the number of anti-vaccination tweets remained high even after Twitter's intervention on vaccination misinformation was introduced, more resources for this task may be needed. Allocating these resources when conspiracy theories first appear could minimize their spread on Twitter.

After reports on blood clot incidence relating to the AstraZeneca vaccine, there was also an increase in the number of anti-vaccination tweets. However, the increase was less sharp. As COVID-19 vaccines were developed rapidly, with reports of adverse events and changes in health advice regarding for whom the vaccines are safe, safety concerns are understandable, even among those not against vaccination. Therefore, providing accurate, clear, and timely explanations to reassure the public about vaccine safety is critical.

This study found that the United States had the highest percentage of anti-vaccination tweets compared to the United Kingdom, Australia, and Canada. This might be due to the politicization of vaccination in the United States.47–49 Within these countries, the percentages of anti-vaccination tweets reported in this study were smaller than previously reported percentages of vaccine hesitancy; however, as previously mentioned, anti-vaccination attitude is not the same as vaccine hesitancy, which includes delaying acceptance of vaccination.50 In the United States, a survey between May and June 2021 estimated an average vaccine hesitancy of about 16% across states.51 In the United Kingdom, vaccine hesitancy declined from 9% in January to 4% in June 2021,52 and in Canada from 22% in July 2020 to 16% in April 2021.53 An Australian study showed that unwillingness to be vaccinated increased from about 13% in April 2020 to 21% in December 2020;6,7 however, this survey was not nationally representative and was conducted prior to mass vaccine rollout, and, to date, Australia has had extremely low rates of COVID infections compared to the United States, United Kingdom, and Canada.

Within the United States, percentages of anti-vaccination tweets were quite heterogeneous among the states. This is consistent with data about vaccine hesitancy in the United States, which ranged widely between 7% and 29% across states.51 However, despite the relative similarity between data on anti-vaccination attitude and vaccine hesitancy for some states (e.g., Oklahoma, Arkansas, and Michigan), discrepancies were observed for others. For example, Oregon, California, and New York are Democratic states, where vaccine hesitancy is traditionally lower, but percentages of anti-vaccination tweets were similar to or higher than those in Southern Republican states such as Alabama, Texas, and Florida that typically have higher levels of vaccine hesitancy.48,49 Given the complexity of the U.S. political system and the lack of data within each state, it is difficult to provide a valid explanation.

In Canada, except Quebec where people mainly speak French, the ranking for anti-vaccination attitudes observed in this study and vaccine hesitancy observed in a national survey was similar with Alberta being the province having the highest percentage of anti-vaccination tweets (3.83%) followed by Manitoba province (2.42%). British Columbia and Ontario had similar percentages of anti-vaccination tweets.53 For Australia and the United Kingdom, unfortunately, regional data either on anti-vaccination tweets or vaccine hesitancy were unavailable.

To the best of our knowledge, this is one of the first studies conducting a BERT-based analysis of anti-vaccination attitudes with Twitter data during the COVID-19 pandemic. The performance of the classifier in this study was comparable to other BERT-based models (F1-scores ranging between 0.87 and 0.98). These findings show that advanced ML techniques are capable and could be used for analyzing the very large text data generated from social media that traditional techniques may not be able to handle. However, given the complexity of natural language (e.g. irony, sarcasm, slang, and abbreviation), it is still a challenge for a machine to outperform humans on these classification tasks.

The study has limitations. First, only English tweets were selected and therefore, the findings may not be generalizable for other non-English-speaking countries. Second, Twitter users may also include Twitter bots and therefore are likely to be different from the general population. Third, the sample may be over-representative toward more active Twitter users, especially those with anti-vaccination attitude. Fourth, as the level of analysis was based on tweets, users’ characteristics that may influence vaccination attitudes such as gender, age, and ethnicity were not taken into account.7,54,55 Fifth, information bias is possible, for example, people may like or dislike a tweet not due to their real attitude but social desirability. Sixth, geographic data were only available for a small percentage of tweets (2%). Given the potential geographic and population biases that may occur due to lack of data outside of metropolitan regions, the findings at sub-country level should be interpreted cautiously. Finally, as accounts and tweets that are anti-vaccination were likely to be removed by Twitter, anti-vaccination attitude was likely to be underestimated.

Conclusion

Since the onset of COVID-19, anti-vaccination tweets have accounted for a small percentage of vaccine-related tweets. The percentages of anti-vaccination tweets were higher before August 2020 and have declined since then. However, the number of anti-vaccination tweets increased sharply in November 2020 and remained high compared to the previous period. Conspiracy theories and news about the side effects of vaccines were often followed by an increase in the number of anti-vaccination tweets. The percentage of anti-vaccination tweets was highest in the United States (3.45%), followed by the United Kingdom (2.74%), Australia (2.46%), and Canada (1.86%).

The results of this study provide necessary information about anti-vaccination attitude at regional levels and trends of anti-vaccination attitude during the COVID pandemic that can inform intervention strategies and policies to fight against misinformation and improve attitude toward vaccination among Twitter users specifically and social media users in general.

Acknowledgements

Not applicable.

Footnotes

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: CV is supported by an ARC Future Fellowship (ID FT210100234), NHMRC Ideas Grant (ID 2012704), and NHF Vanguard Grant (ID 105816), and SA by a fellowship from the National Heart Foundation of Australia (ID 102609).

Ethical approval: This study used public tweets from Twitter that could be downloaded at https://zenodo.org/record/5637848. No patient was recruited.

Guarantor: QGT.

Author contributions: concept and design: QGT and CV; acquisition, analysis, or interpretation of data: all authors; drafting of the manuscript: QGT; critical revision of the manuscript for important intellectual content: all authors; statistical analysis: QGT; administrative, technical, or material support: KGT, VAHN, NTQN, DTNN, SA, ANQT, ANPT, NTTP, and TXB; supervision: CV. All authors have read and agreed to the published version of the manuscript.

References


Articles from Digital Health are provided here courtesy of SAGE Publications
