Journal of the American Medical Informatics Association: JAMIA. 2023 Feb 23;30(5):923–931. doi: 10.1093/jamia/ocad029

Patterns of diverse and changing sentiments towards COVID-19 vaccines: a sentiment analysis study integrating 11 million tweets and surveillance data across over 180 countries

Hanyin Wang, Yikuan Li, Meghan R Hutch, Adrienne S Kline, Sebastian Otero, Leena B Mithal, Emily S Miller, Andrew Naidech, Yuan Luo
PMCID: PMC10114113  PMID: 36821435

Abstract

Objectives

Vaccines are crucial components of pandemic responses. Over 12 billion doses of coronavirus disease 2019 (COVID-19) vaccines had been administered at the time of writing. However, public perceptions of vaccines have been complex. We integrated social media and surveillance data to unravel the evolving perceptions of COVID-19 vaccines.

Materials and Methods

Applying human-in-the-loop deep learning models, we analyzed sentiments towards COVID-19 vaccines in 11 211 672 tweets of 2 203 681 users from 2020 to 2022. The diverse sentiment patterns were juxtaposed against user demographics, public health surveillance data of over 180 countries, and worldwide event timelines. A subanalysis was performed targeting the subpopulation of pregnant people. Additional feature analyses based on user-generated content suggested possible sources of vaccine hesitancy.

Results

Our trained deep learning model demonstrated performance comparable to that of educated humans, yielding an accuracy of 0.92 in sentiment analysis against our manually curated dataset. Despite fluctuations, sentiments became more positive over time, followed by a subsequent upswing in population-level vaccine uptake. Distinguishable patterns were revealed among subgroups stratified by demographic variables. Encouraging news or events were detected surrounding crests of positive sentiment. Sentiments in pregnancy-related tweets demonstrated a lagged pattern compared with the general population, with delayed vaccine uptake trends. Feature analysis detected hesitancy stemming from clinical trial logic, risks and complications, and the urgency of scientific evidence.

Discussion

Integrating social media and public health surveillance data, we associated individual-level sentiments with observed population-level vaccination patterns. By unraveling the distinctive patterns across subpopulations, the findings provide evidence to inform strategies for improving vaccine promotion during pandemics.

Keywords: COVID-19, vaccine, sentiment analysis, deep learning

INTRODUCTION

To fight against the SARS-CoV-2 pandemic, vaccines from multiple pharmaceutical companies began to receive FDA Emergency Use Authorization at the end of 2020.1 As of fall 2022, more than 12 billion doses of vaccines had been administered across over 180 countries or regions (https://www.bloomberg.com/graphics/covid-vaccine-tracker-global-distribution/). Discussion regarding the vaccines emerged even earlier, especially on social media platforms.2–4 Individuals hold different attitudes and opinions towards vaccines and getting vaccinated during such an unprecedented pandemic.5,6 Social media, which centers on user-generated content, plays an increasingly prominent role in people's lives. Information about coronavirus disease 2019 (COVID-19) vaccines, including news, commentaries, anecdotes, and personal feelings, travels further on the internet and social media. Surveillance data, the systematic collection of ongoing evidence, aim to provide objective and comprehensive public health perspectives.

In previous studies, researchers have attempted to understand attitudes toward COVID-19 vaccination from various perspectives. Many studies used survey-based methods and examined local distribution strategies in specific regions.7–9 However, survey-based studies are frequently limited in sample size and time frame. Other groups sought more generalizable perspectives using user-generated data, such as social media.4,10,11 Compared with surveys, social media analyses have many advantages, including timeliness, easy accessibility, objectivity, diversity, and generalizability. However, a gap remains in understanding the sources of delays and hesitancy in vaccine uptake. Previous studies focused exclusively on 1 type of data source, which limited the interpretability and applicability of their findings.

In this study, we took advantage of the granularity of user-generated data on one of the mainstream social media platforms, Twitter, and the rigor of comprehensive public health surveillance data from health authorities to investigate sentiments towards COVID-19 vaccines. We further applied informatics approaches to dissect the disparities among subpopulations. Subanalyses were conducted focusing on pregnant people, as they were excluded from the initial COVID-19 vaccine trials and administering a novel vaccine during pregnancy involves unique considerations. In addition to providing interpretable evidence for understanding vaccination trends in the current pandemic, we also proposed a set of scalable informatics strategies for characterizing immunization promotion in future pandemics and public health emergencies.

MATERIALS AND METHODS

Data extraction

We used user-generated posts from a mainstream social media platform, Twitter. Tweet objects were retrieved using the Twitter API (https://developer.twitter.com/en/docs/twitter-api) based on a set of identifiers for COVID-19-related tweets provided by the Panacea Lab.12 The number of retrievable tweets may vary over time, since tweets can be deleted by users or removed by the platform at any time. We extracted tweets from March 1, 2020 to March 1, 2022, a period spanning from when COVID-19 vaccines first drew public attention to when billions of vaccine doses had been administered worldwide. Vaccine-related tweets were identified by regular expressions (details in Supplementary Material). All tweets analyzed in this study are original tweets; retweets were removed. Only English-language tweets were considered. Vaccine administration data were extracted from Our World in Data (OWID),13 an integrated database summarizing COVID-19 surveillance data from official sources in over 180 countries, including the Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), and local governments. Notably, data for each country are updated at different frequencies. Pregnancy-related tweets were identified from the pool of all COVID-19 vaccine-related tweets by regular expressions (details in Supplementary Material). Data on COVID-19 cases (https://covid.cdc.gov/covid-data-tracker/#pregnant-population [November 22, 2022]) and vaccinations among pregnant people (https://covid.cdc.gov/covid-data-tracker/#vaccinations-pregnant-women [November 22, 2022]) were obtained from the CDC. Because the pregnancy-related data provided by the CDC are only available for the United States, this subanalysis focuses only on the United States.
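A minimal Python sketch of the tweet-level filters described above (dropping retweets, keeping English-language tweets, and matching vaccine-related text) is shown below. It is not the authors' pipeline: the study's actual regular expressions are given in its Supplementary Material, so `VACCINE_PATTERN` is a hypothetical stand-in, and the sketch assumes hydrated tweet objects in the Twitter API v1.1 format, where retweets carry a `retweeted_status` field and the detected language is stored in `lang`.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern for illustration only; the study's actual regular
# expressions are given in its Supplementary Material.
VACCINE_PATTERN = re.compile(
    r"\b(vaccin\w*|vax\w*|pfizer|moderna|astrazeneca|booster\w*)\b",
    re.IGNORECASE,
)


def tweet_text(tweet: dict) -> str:
    """Return the tweet body, preferring the untruncated 'full_text' field."""
    return tweet.get("full_text") or tweet.get("text", "")


def is_retweet(tweet: dict) -> bool:
    """In v1.1 tweet objects, only retweets carry 'retweeted_status'."""
    return "retweeted_status" in tweet


def filter_vaccine_tweets(tweets: Iterable[dict]) -> Iterator[dict]:
    """Yield original, English-language tweets that match the vaccine pattern."""
    for tweet in tweets:
        if is_retweet(tweet) or tweet.get("lang") != "en":
            continue
        if VACCINE_PATTERN.search(tweet_text(tweet)):
            yield tweet
```

Because the filter is a generator, it can stream over millions of hydrated tweets without holding them all in memory.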

Sentiment analysis

In sentiment analysis, we labeled each tweet as having positive, neutral, or negative sentiment toward the vaccine. To automatically recognize the sentiment of each tweet in a large-scale dataset, we finetuned a supervised deep learning classifier on a subset of 7700 randomly selected, expert-annotated tweets. We followed the sentiment annotation protocol of our prior study14 (details in Supplementary Material). The 7700 tweets were split into 6160 (80%) for training, 308 (4%) for validation, and 1232 (16%) for testing. The test set was held out until the final evaluation. A state-of-the-art deep learning model for sentiment analysis, XLNet,15 was finetuned for the task. After initializing from the general-domain pretrained XLNet, we first finetuned the model on a general-domain Twitter sentiment analysis dataset, SemEval,16 to familiarize the model with Twitter-specific expressions. Subsequently, the model was finetuned a second time on the training portion of the annotated dataset and evaluated on its held-out test set. Extensive feature analysis and visualization were further conducted using BertViz17 to illustrate keywords and cue phrases (details in Supplementary Material).
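The 80/4/16 partition of the 7700 annotated tweets can be reproduced with a seeded random shuffle, as in the sketch below. The paper does not state whether the split was stratified by sentiment class, so this sketch assumes a simple random split; the function name and seed are illustrative. The two-stage finetuning itself (SemEval first, then the annotated training data) is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_annotations(
    items: Sequence[T],
    train_frac: float = 0.80,
    val_frac: float = 0.04,
    seed: int = 0,  # illustrative seed, not taken from the paper
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into train / validation / test partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (16%) held out for final evaluation
    return train, val, test
```

For 7700 items this yields partitions of 6160, 308, and 1232, matching the counts in the text.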

In the analysis of sentiment dynamics, a sentiment change was defined as a user switching between positive and negative sentiment, in either direction, across consecutive tweets; switches to or from neutral were not counted. The number of users switching to positive (negative) sentiment on a given day was defined as the number of users who posted a tweet with positive (negative) sentiment on that day and whose previous tweet had negative (positive) sentiment.
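The switching definition above can be made concrete with a short sketch. It assumes each tweet has already been reduced to a hypothetical `(user_id, date, sentiment)` record and that records are processed in posting order; a user is counted at most once per day and direction, and any transition involving a neutral tweet is ignored.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (user_id, tweet date, sentiment label).
Record = Tuple[str, date, str]


def count_daily_switches(records: Iterable[Record]) -> Dict[date, Dict[str, int]]:
    """Count users per day whose sentiment flips between positive and negative.

    Records must arrive in posting order. A user counts toward 'to_positive'
    on day d if they post a positive tweet on d and their immediately
    preceding tweet was negative, and toward 'to_negative' in the reverse
    case. Transitions to or from neutral are not counted.
    """
    last_sentiment: Dict[str, str] = {}
    switchers: Dict[date, Dict[str, Set[str]]] = defaultdict(
        lambda: {"to_positive": set(), "to_negative": set()}
    )
    for user, day, sentiment in records:
        previous = last_sentiment.get(user)
        if previous == "negative" and sentiment == "positive":
            switchers[day]["to_positive"].add(user)
        elif previous == "positive" and sentiment == "negative":
            switchers[day]["to_negative"].add(user)
        # Neutral tweets also become the "previous" tweet for the next comparison.
        last_sentiment[user] = sentiment
    return {
        day: {direction: len(users) for direction, users in groups.items()}
        for day, groups in sorted(switchers.items())
    }
```

The resulting daily counts would then be smoothed with the 7-day averaging described under Statistical methods before plotting.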

Structured features

Integrating multiple informatics approaches, we obtained structured features in an automated pipeline. The pipeline comprises “deepface” (V 0.0.68),18 a hybrid face recognition framework wrapping multiple state-of-the-art models, and “mordecai” (V 2.1.0),19 a package to parse the locations from free-text form to structured geographic information.

Three demographic variables, gender, age, and race or ethnicity (we use “race” hereafter for simplicity), were identified from profile images. Demographics could not be detected for users whose profile images were unavailable. Gender was considered dichotomously, that is, female and male. Race was classified into Asian, Black, Hispanic, and White. Age was grouped into 0–19, 20–39, 40–59, and 60 and above. These classifications were chosen to reflect the distinct social determinants and experiences of each subgroup that may be associated with vaccine attitudes.
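Binning the estimated ages into the 4 groups can be sketched as follows. The sketch assumes a numeric age estimate per user (for example, the apparent age that a face-analysis tool such as deepface reports) and places ages of 60 or more in the oldest group, since 59 is the upper bound of the preceding group. Users without a usable profile image are represented by `None`.

```python
from typing import Optional


def age_group(age: Optional[float]) -> Optional[str]:
    """Map an estimated age to the study's age bins; None means undetectable."""
    if age is None:
        return None  # e.g., no usable profile image for this user
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    if age < 20:
        return "0-19"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60+"  # assumption: 60 itself falls in the oldest group
```

Using half-open thresholds (`age < 20`, and so on) keeps fractional estimates such as 19.5 in a well-defined bin.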

Locations for each tweet were parsed from the “location” attribute in the “user” dictionary of the tweet object. The “location” attribute is an optional, user-generated free-text field, so its content varies widely. The population estimates for calculating per-capita metrics for each country were obtained from OWID (https://ourworldindata.org/grapher/population-past-future [November 21, 2022]). Countries with fewer than 1000 tweets in total were excluded.
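The exclusion rule and a per-capita metric could be combined as in the sketch below. The paper does not list which per-capita metrics were computed, so this sketch assumes cumulative vaccine doses per 100 people; the country names and figures in any example are made up for illustration.

```python
from typing import Dict, Mapping

MIN_TWEETS = 1000  # countries below this total are excluded, as in the text


def doses_per_hundred(
    tweet_counts: Mapping[str, int],
    doses: Mapping[str, float],
    population: Mapping[str, float],
) -> Dict[str, float]:
    """Return doses per 100 people for countries with enough tweets and data."""
    result: Dict[str, float] = {}
    for country, n_tweets in tweet_counts.items():
        if n_tweets < MIN_TWEETS:
            continue
        if country not in doses or not population.get(country):
            continue  # skip countries missing surveillance or population data
        result[country] = doses[country] / population[country] * 100
    return result
```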

Statistical methods

All daily counts are shown as 7-day averages to smooth local fluctuations. Proportion tests were conducted to compare percentages between groups when appropriate, and P-values of pairwise tests were adjusted for multiple comparisons using the false discovery rate (FDR) method. Student's t tests were conducted to compare normally distributed groups. Implementation details can be found in the Supplementary Material.
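The statistical procedures above can be sketched with the standard library. The paper does not name the specific proportion test, so this sketch assumes a pooled two-sample z-test for proportions; the FDR adjustment is the Benjamini-Hochberg procedure. In practice, statsmodels provides `proportions_ztest` and `multipletests(..., method="fdr_bh")` for the same purposes.

```python
import math
from typing import List, Sequence


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided P-value for H0: p1 == p2, using the pooled z statistic."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both groups all-0 or all-1: no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted P-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Taking the running minimum from the largest P-value downward keeps the adjusted values monotone in the original ranking, which is what distinguishes Benjamini-Hochberg from simply multiplying each P-value by m/rank.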

RESULTS

Data

In total, 11 211 672 vaccine-related tweets from 2 203 681 unique users, posted between March 1, 2020 and March 1, 2022, together with surveillance data from the same period, were included in the analysis. Demographic information was detectable for 753 998 users; the distributions of their demographic features, counted by users and by tweets, are shown in Figure 1. Distributions by users and by tweets showed similar patterns. The distribution of race was unbalanced, with most users being White. The age distribution was centered on the middle-aged group, slightly right-skewed, with thin tails on both sides. Male users predominated, accounting for approximately 89% of the group, in line with a recent Statista survey (https://www.statista.com/statistics/678794/united-states-twitter-gender-distribution/ [November 18, 2022]) showing that men account for a greater share of Twitter users.

Figure 1.

Demographic distributions for the number of users (A, B, C) and the number of tweets (D, E, F). (A) Distribution of race among all users. (B) Distribution of age of all users. (C) Distribution of gender of all users. (D) Distribution of race among all tweets. (E) Distribution of age among all tweets. (F) Distribution of gender of all tweets.

Sentiment analysis

On the test set of the annotated dataset, the finetuned model achieved an accuracy of 0.92, with consistent performance across classes (Table 1).

Table 1.

Performance of sentiment classification by class

Class | Precision | Recall | F1-score | Support
Negative | 0.88 | 0.91 | 0.90 | 216
Neutral | 0.94 | 0.90 | 0.92 | 518
Positive | 0.93 | 0.95 | 0.94 | 498
Macro average | 0.92 | 0.92 | 0.92 | 1232
Weighted average | 0.92 | 0.92 | 0.92 | 1232
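For reference, the quantities in Table 1 follow the usual per-class definitions, sketched below in the same convention as scikit-learn's `classification_report`: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. A useful check on Table 1 is that weighted recall always equals overall accuracy, which matches the reported 0.92.

```python
from collections import Counter
from typing import Sequence

LABELS = ("negative", "neutral", "positive")


def classification_report(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS
) -> dict:
    """Per-class precision/recall/F1 plus macro and weighted averages."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    tp, n_pred, n_true = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        n_true[truth] += 1
        n_pred[pred] += 1
        if truth == pred:
            tp[truth] += 1
    report: dict = {}
    for label in labels:
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_true[label] if n_true[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1,
                         "support": n_true[label]}
    total = sum(n_true[label] for label in labels)
    for average in ("macro", "weighted"):
        # Macro: equal weight per class; weighted: weight by class support.
        weights = {
            label: (1 / len(labels)) if average == "macro" else n_true[label] / total
            for label in labels
        }
        report[f"{average} avg"] = {
            metric: sum(weights[label] * report[label][metric] for label in labels)
            for metric in ("precision", "recall", "f1")
        }
        report[f"{average} avg"]["support"] = total
    report["accuracy"] = sum(tp[label] for label in labels) / total
    return report
```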

Tweet trends

Temporal trends of sentiments

The number of vaccine-related tweets stratified by sentiment is shown in Figure 2A, overlaid with daily vaccination counts. Sentiments were mixed during the early part of the studied timeframe. Shortly after the vaccine rollout, positive sentiments surpassed neutral and negative sentiments in early January 2021 (after the dashed green line) and remained dominant for the rest of the studied period. Vaccine uptake also showed an upswing, which lagged behind the rise in positive sentiments. Four crests for both positive sentiments and vaccine administration are annotated in Figure 2A. In addition to the overall trend, each of the 4 peaks of positive sentiments preceded a peak in vaccine administration. Superimposing the trends on the vaccine development timeline, we detected encouraging news and motivational events occurring globally and propagating on social media around the peak days for positive sentiments (Table 2). Note that the numbers in Table 2 are unaveraged daily counts of tweets or vaccinations, which can differ from the 7-day averages in Figure 2A.
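The difference between the smoothed curves in Figure 2A and the raw daily counts in Table 2 comes from the 7-day averaging. The sketch below assumes a trailing window (the paper does not say whether the window was trailing or centered); it behaves like pandas' `Series.rolling(7).mean()`, which also leaves the first 6 days undefined. With made-up counts, a one-day spike to 80 tweets appears as an average of only 20.

```python
from typing import List, Optional, Sequence


def trailing_mean(values: Sequence[float], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average; None until a full window of days is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed
```

A trailing window spreads a one-day spike over that day and the following 6 days, so smoothed peaks sit about 3 days later than with a centered window; this matters when matching smoothed peaks to the event dates in Table 2.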

Figure 2.

Temporal trends of tweets and fluctuations. (A) 7-day averaged daily counts of tweets (left y-axis) and vaccinations (right y-axis) from March 1, 2020 to March 1, 2022. Daily counts of tweets with negative, positive, and neutral sentiments are shown by color. The number of vaccinations administered daily is illustrated by the shaded area. “1st,” “2nd,” “3rd,” and “4th” on the trend line and shaded area denote the 4 peaks of positive tweets and vaccination counts. The green vertical dashed line marks when positive sentiment started to dominate after largely trending with neutral and negative sentiments. (B) Fluctuations of sentiments in tweets over time. The number of users who switched to positive sentiment on a given day is shown above the axis, while the number of users who switched to negative sentiment is shown below the axis.

Table 2.

Dates and events with the greatest number of positive tweets around the 4 peaks

Peak | Date | Positive tweet count | Event
1st | December 14, 2020 | 20 681 | Sandra Lindsay, a nurse in New York, became the first person in the United States to get the COVID-19 shot.
2nd | March 2, 2021 | 17 788 | The single-dose vaccine from Johnson & Johnson received FDA Emergency Use Authorization in the United States.
3rd | May 13, 2021 | 20 367 | The Pfizer COVID-19 vaccine was authorized for adolescents 12–15 years old.
4th | August 23, 2021 | 31 987 | The FDA granted full approval to the first COVID-19 vaccine.

Fluctuation of sentiments in tweets

Trends in sentiment switching are shown in Figure 2B. Two prominent peaks of switches towards positive sentiment can be observed, corresponding to November 9, 2020, when Pfizer-BioNTech announced that its COVID-19 vaccine candidate had achieved success in the first interim analysis of a phase 3 study, and August 23, 2021, when the FDA granted full approval to the first COVID-19 vaccine. Meanwhile, the highest peak of switches towards negative sentiment occurred on April 13, 2021, when use of the Johnson & Johnson vaccine was paused after reports of rare blood-clotting cases emerged.

Sentiment distribution by race or ethnicity, gender, and age group

The distributions of sentiment towards the vaccines by race, gender, and age are shown as proportions in Figure 3. Each sentiment panel shows the percentage of tweets from the given subpopulation. For example, “22%” for the Asian column in the negative subplot can be interpreted as “22% of all tweets posted by Asian individuals were of negative sentiment.” The White subpopulation had significantly fewer negative sentiments than the other 3 racial subpopulations. While the proportion of negative tweets was the same for female and male users, female users showed a significantly higher proportion of positive sentiments towards the vaccines. Users aged 20–39 had significantly more positive sentiments towards the vaccine than users aged 40–59. Significant differences were detected only between these 2 age groups, which might be due to the small sample sizes of the younger and older groups.
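The proportions plotted in Figure 3 are computed per subgroup with tweets as the unit, as in the 22% example for Asian users. A minimal sketch, assuming each tweet has been reduced to a hypothetical `(subgroup, sentiment)` pair, is shown below; the counts behind these shares are also the natural inputs to the pairwise proportion tests.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def sentiment_shares(rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Share of each sentiment among the tweets of each subgroup.

    rows: one (subgroup, sentiment) pair per tweet, so users who tweet more
    contribute more, matching tweet-level (not user-level) proportions.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, sentiment in rows:
        counts[group][sentiment] += 1
    return {
        group: {s: n / sum(c.values()) for s, n in c.items()}
        for group, c in counts.items()
    }
```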

Figure 3.

Distribution of proportions of sentiments by race, gender, and age group. The first column shows negative sentiments, the second column neutral sentiments, and the last column positive sentiments. ns: not significant (P > .05); *: .01 < P ≤ .05; **: .001 < P ≤ .01; ***: 1e−4 < P ≤ .001; ****: P ≤ 1e−4. All P-values are adjusted for multiple tests using the FDR method.

Distribution by geographical locations

Among all vaccine-related tweets, locations could be parsed for 6 524 086 tweets. The percentages of positive and negative COVID-19 vaccine-related tweets in each country or region are shown in Figure 4. Countries or regions in the upper and lower 5 percentiles of each sentiment's percentage were removed, since those with a limited number of users can yield extreme percentages. The data are divided into 10 quantiles (deciles), illustrated by different shades. Most areas with no data available (gray) are non-English-speaking countries. Among regions with data available, some showed a high percentage of positive sentiments and a low percentage of negative sentiments, such as China, India, and Zimbabwe. In contrast, some regions, such as Australia, Sweden, and Colombia, displayed predominantly negative sentiments towards the vaccines. Meanwhile, other regions, such as the United States, Canada, and Brazil, demonstrated mixed attitudes towards the vaccines, with comparable percentages of positive and negative vaccine-related tweets (medium shade in both panels of Figure 4). The numbers of positive, negative, and neutral tweets for each country are listed in Supplementary Table S1 in the Supplementary Materials.
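The trimming and shading steps for Figure 4 can be sketched as follows. The sketch assumes that the 5th and 95th percentiles are computed over the country-level percentages of one sentiment, that countries outside that range are dropped, and that the remaining countries are assigned to deciles; the quantile method (`statistics.quantiles` with its default `exclusive` method) is also an assumption.

```python
import bisect
import statistics
from typing import Dict


def trim_and_bin(shares: Dict[str, float], n_bins: int = 10) -> Dict[str, int]:
    """Drop countries outside the 5th-95th percentile range, then assign bins.

    shares maps a country to its percentage of tweets with one sentiment.
    Returns a bin index from 0 (lowest decile) to n_bins - 1 (highest decile).
    """
    cuts = statistics.quantiles(shares.values(), n=20)  # 5%, 10%, ..., 95%
    low, high = cuts[0], cuts[-1]
    kept = {country: v for country, v in shares.items() if low <= v <= high}
    bin_edges = statistics.quantiles(kept.values(), n=n_bins)  # 9 interior edges
    return {country: bisect.bisect_right(bin_edges, v) for country, v in kept.items()}
```

Trimming before binning keeps a handful of low-volume countries with extreme percentages from stretching the color scale for everyone else.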

Figure 4.

Percentages of COVID-19 vaccine-related tweets (with lower and upper 5 percentiles removed). (A) Positive tweets; (B) Negative tweets. Countries or regions excluded or with fewer than 1000 vaccine-related tweets are shown in gray.

Pregnancy-related tweets and data compared to nonpregnancy-related tweets and data

Among all vaccine-related tweets, we identified 105 039 pregnancy-related tweets, of which 23 325 had a parsed location in the United States. In Figure 5, we compared the pregnancy-related trends with those in the general population of the United States. Because pregnant people were initially excluded from clinical trials of COVID-19 vaccines, vaccines were for a period available to pregnant people without robust supporting evidence. For the pregnancy-related subanalysis, we divided the studied period into 4 epochs, as indicated in Figure 5A: (1) pre-evidence-based recommendation stage (until March 7, 2021),20 (2) efficacy publication stage (until July 29, 2021) (https://www.acog.org/news/news-releases/2021/07/acog-smfm-recommend-covid-19-vaccination-for-pregnant-individuals [November 26, 2022]), (3) recommendation stage (until September 28, 2021) (https://www.cdc.gov/media/releases/2021/s0929-pregnancy-health-advisory.html [November 26, 2022]), and (4) urgent action stage. Similarly, the studied period can be divided into 4 epochs for the general population with regard to vaccine development and recommendation, as indicated in Figure 5B: (1) the period with no authorized COVID-19 vaccine (until December 11, 2020), (2) the period when vaccines had been issued Emergency Use Authorization (EUA) (until August 22, 2021), (3) the period when the US Food and Drug Administration (FDA) started to grant full approval to vaccines (until November 29, 2021), and (4) the period when booster shots were recommended by the Centers for Disease Control and Prevention (CDC).
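The epoch boundaries listed above can be encoded as a small lookup, sketched below. It assumes that each "until" date is the last day of its epoch (inclusive); the epoch labels paraphrase the text.

```python
import bisect
from datetime import date
from typing import Sequence

# Last day of each epoch except the final, open-ended one (assumed inclusive).
PREGNANCY_EPOCH_ENDS = [date(2021, 3, 7), date(2021, 7, 29), date(2021, 9, 28)]
PREGNANCY_EPOCHS = [
    "pre-evidence-based recommendation",
    "efficacy publication",
    "recommendation",
    "urgent action",
]
GENERAL_EPOCH_ENDS = [date(2020, 12, 11), date(2021, 8, 22), date(2021, 11, 29)]
GENERAL_EPOCHS = [
    "no authorized vaccine",
    "emergency use authorization",
    "full FDA approval",
    "booster recommendation",
]


def epoch_of(day: date, ends: Sequence[date], labels: Sequence[str]) -> str:
    """Return the label of the epoch containing `day`."""
    # bisect_left counts the epoch end dates strictly before `day`.
    return labels[bisect.bisect_left(ends, day)]
```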

Figure 5.

Daily counts of pregnancy-related tweets, vaccinations, and COVID-19 cases among pregnant people in the United States, compared with the general population in the United States. Orange vertical dashed lines: EUA dates for Pfizer-BioNTech, Moderna, and Johnson & Johnson COVID-19 vaccines. Abbreviations: EUA: Emergency Use Authorization; FDA: US Food and Drug Administration.

A delayed rise in positive sentiment was observed in pregnancy-related tweets. For the general population, positive sentiments have remained dominant since vaccines were granted EUA near the end of 2020. In contrast, the major peak in positive sentiments among pregnancy-related tweets occurred around August 11, 2021, when the CDC recommended COVID-19 vaccination for pregnant people based on safety data (https://www.cdc.gov/media/releases/2021/s0811-vaccine-safe-pregnant.html [November 3, 2022]). The upswing in vaccination uptake among pregnant people was also delayed. For the general population, daily vaccinations peaked around mid-2021, after all 3 vaccines had been granted EUA, whereas daily vaccinations among pregnant people increased gradually until early 2022. Furthermore, we observed 2 peaks of COVID-19 cases in both populations, with both peaks among pregnant people preceding those in the general population. Among the general population, the second peak was much higher than the first, whereas among pregnant people the second peak was comparable to the first.

DISCUSSION

In this study, we systematically characterized trends in sentiment toward COVID-19 vaccines along multiple dimensions by superimposing large-scale data from a mainstream social media platform, Twitter, on worldwide public health surveillance data. Integrating the 2 types of data increases the information gained from the analysis, as they complement each other well.

The XLNet model we finetuned was capable of identifying sentiment towards the vaccine automatically and efficiently. Moreover, because XLNet is a pretrained language model intended to be finetuned for downstream tasks, it can be adapted to new tasks with relatively little additional training; the resulting model could therefore feasibly be reapplied to similar tasks. During the analysis, we noticed that discussions regarding the vaccine began on social media even before the official vaccine rollout. Although neutral and negative sentiments persisted, positive sentiments have dominated since early 2021. The increase in positive sentiments indicated growing acceptance of the vaccine, and a subsequent upswing in vaccine uptake was observed in the surveillance data (Figure 2). Although we do not claim causal relationships, the timelines of encouraging developments were superimposable on the crests of positive sentiment. Compared with generic slogans, reports of such events may bridge information gaps, share peer experiences, and promote confidence in the vaccine. Meanwhile, social media's information propagation patterns can rapidly amplify sympathetic responses from peer users. Therefore, in addition to characterizing trends, social media could also be a powerful tool for health education.

Taking advantage of the openly accessible nature of social media data, we could fill in some gaps in the surveillance data. Analyses of race, gender, and age groups revealed distinctive patterns among subpopulations (Figure 3). White users demonstrated a lower percentage of negative sentiments, which is in line with a recent study showing that White individuals have lower vaccine hesitancy in multiple countries.21 To understand Asian users’ higher negative and lower positive sentiments, we conducted further feature analysis probing the keywords among negative tweets generated by Asian users (Supplementary Figure S1), which included “Trial Questioned,” “Admits A Mistake (in the trial),” “dangerous,” “untested vaccine,” “unnecessary,” and “unethical.” These keywords suggested concerns regarding the validity and rationale of clinical trials, which, in future pandemics, could be alleviated by more explicit clarification through vaccine education. Male users were found to have a significantly lower percentage of positive sentiments. Multiple studies have shown higher rates of vaccine side effects among men than women,22,23 which may contribute to the lower proportion of positive sentiments among male users. However, previous global studies showed that COVID-19 fatality rates are higher among men than among women in most countries,24 which suggests a more urgent need for vaccination among men. Feature analysis among negative tweets generated by male users further identified the following keywords (Supplementary Figure S2): “fatal,” “allergy patients,” “related deaths,” “blood clotting,” and “terrible covid vaccine story,” which demonstrated concerns and hesitancy due to side effects and complications. Promoting the vaccine based on gender-specific scientific evidence could also raise vaccination rates and protect the population.
In the age group analysis, we detected significant differences only between the 20–39 and 40–59 groups, since the 0–19 and 60–79 groups had small sample sizes. Feature analysis for the 40–59 group, which had a high percentage of negative sentiments and a low percentage of positive sentiments, identified the following keywords among negative tweets (Supplementary Figure S3): “sore arm,” “Do not trust anything (Fauci says),” “epic failure (around COVID-19 vaccine),” “bad science,” and “incomplete (covid-19 vaccine study).” Negative news and false information undermined these users' trust in the vaccine, which is where targeted vaccine promotion plans should come into play. Although the differences were not significant, senior users displayed the lowest percentage of negative sentiments and the highest percentage of positive sentiments. Senior citizens were prioritized in vaccine administration in many countries, which may imply that the accessibility and availability of vaccines also play important roles in shaping users' sentiments.

In the geographical analysis, the patterns of positive and negative sentiments among countries fall into 3 categories: high positive and low negative sentiments, low positive and high negative sentiments, and comparable percentages of positive and negative sentiments (Figure 4). COVID-19 vaccination policies differ among countries and regions, as do the accompanying vaccine-promotion strategies. Because every country is in a unique situation, with distinctive cultures, religions, economic status, and vaccine availability, vaccine plans tailored to each situation would likely improve vaccine uptake.

Vulnerable populations require extra attention regarding timely evidence-based recommendations. The pregnant population in this study displayed distinct patterns compared with the general population (Figure 5). The 2 earlier peaks in COVID-19 cases among pregnant people illustrate the heightened susceptibility during pregnancy, which has been found to be a high-risk condition for COVID-19 severity and complications.25 The delayed upswing in vaccination suggests more substantial hesitancy among pregnant people. Feature analysis probed the keywords and cue phrases in tweets of negative sentiment (Supplementary Figure S4), which suggested possible sources of hesitancy. For the pregnancy-related analysis, we identified keywords including “complication associated,” “no clinical evidence,” and “mRNA shots not safe”; for the general population in the United States, we detected phrases including “FDA pauses (one of the vaccines),” “risk of worsening clinical disease,” and “not effective.” In particular, “no clinical evidence” was highlighted among negative pregnancy-related tweets, which is consistent with delays likely due to the initial trials not including pregnant women. A significant surge in positive sentiments was associated with the recommendation for pregnant people released on August 11, 2021 based on safety data, demonstrating the importance of tailored guidelines; this surge was also followed by an increase in vaccine uptake. Furthermore, the second peak of COVID-19 cases among the general population was considerably higher than the first, while the second peak among pregnant people was comparable with the first, which may reflect the increased vaccination uptake. Had recommendations for pregnant people been released earlier, the first peak of cases among pregnant people might have been lower. Therefore, timely evidence-based recommendations are essential in protecting vulnerable populations.

Limitations

We only considered English tweets, which limited coverage of non-English-speaking regions. Future studies could address this limitation by developing natural language processing algorithms that work across multiple languages. The algorithms we used to detect demographic variables and geographical locations were adopted from open-source repositories,18,19 and their outputs are not guaranteed to be accurate. Younger and older people were underrepresented among the users in our study; therefore, the findings for those age groups might be weaker than for other populations. Additionally, we acknowledge our singular focus on 1 source of data, Twitter. However, this focus helps avoid redundancy, since people likely have accounts on multiple platforms where they may post similar content. Although pregnancy-related data are important for every country, they were only available for the United States. These data were obtained directly from the CDC website, and we did not have access to the raw data for validation. These limitations all present opportunities for future research.


ACKNOWLEDGMENTS

We thank Northwestern University Quest High Performance Computing for supporting the computation in this study.

Contributor Information

Hanyin Wang, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.

Yikuan Li, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.

Meghan R Hutch, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.

Adrienne S Kline, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.

Sebastian Otero, Department of Pediatrics, Feinberg School of Medicine, Ann & Robert H. Lurie Children's Hospital of Chicago, Northwestern University, Chicago, Illinois, USA.

Leena B Mithal, Department of Pediatrics, Feinberg School of Medicine, Ann & Robert H. Lurie Children's Hospital of Chicago, Northwestern University, Chicago, Illinois, USA.

Emily S Miller, Department of Obstetrics & Gynecology, Northwestern Medicine, Chicago, Illinois, USA.

Andrew Naidech, Department of Neurology, Northwestern Medicine, Chicago, Illinois, USA.

Yuan Luo, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.

Funding

National Institute of Neurological Disorders and Stroke grant numbers R01NS110779 and U01NS110772 (to AN); National Library of Medicine grant number R01LM013337 and National Center for Advancing Translational Sciences grant number UL1TR001422 (to YL); National Institute of Allergy and Infectious Diseases grant number NIH/NIAID-K23 AI139337 (to LBM); Miller was site PI for the Pfizer phase 2/3 randomized trial of the COVID vaccine in pregnant people.

AUTHOR CONTRIBUTIONS

Conceptualization of the study: HW, YL, AN, and YL. Data annotation: HW, YL, MH, and AK. Data analysis: HW and YL. Paper writing: HW. Pregnancy-related analysis consultation: SO, LM, and EM. Critical revision: HW, MH, YL, AK, SO, LM, EM, AN, and YL.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

CONFLICT OF INTEREST STATEMENT

None declared.

DATA AVAILABILITY

All Twitter data were extracted using the Twitter API (https://developer.twitter.com/en/docs/twitter-api), and identifiers of COVID-19-related tweets were provided by the Panacea Lab at Georgia State University.12 Vaccination data13 and world population data (https://ourworldindata.org/grapher/population-past-future [November 21, 2022]) were taken from Our World in Data. Pregnancy-related data were obtained from the Centers for Disease Control and Prevention (https://covid.cdc.gov/covid-data-tracker/#vaccinations-pregnant-women [November 22, 2022]). All data are publicly available. The annotated data for fine-tuning the XLNet model can be found on GitHub: https://github.com/luoyuanlab/twitter_vaccine_analysis.
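Because vaccination counts and population figures both come from Our World in Data, one plausible way to compare uptake across countries of very different sizes is to normalize doses by population. The sketch below assumes per-country cumulative dose counts and population totals are already loaded into dictionaries. The `doses_per_hundred` helper and the toy figures are hypothetical and do not reproduce the study's code or data.

```python
def doses_per_hundred(doses, population):
    """Return vaccine doses per 100 people for each country with a known, non-zero population."""
    rates = {}
    for country, n_doses in doses.items():
        pop = population.get(country)
        if not pop:
            # Skip countries whose population is missing or zero.
            continue
        rates[country] = 100.0 * n_doses / pop
    return rates


# Toy figures for illustration only; not real surveillance data.
doses = {"Country A": 150, "Country B": 40, "Country C": 10}
population = {"Country A": 100, "Country B": 80}
print(doses_per_hundred(doses, population))  # {'Country A': 150.0, 'Country B': 50.0}
```

Countries with a missing or zero population are skipped rather than assigned a rate. This avoids division errors, but it also means such countries silently drop out of any cross-country comparison.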

CODE AVAILABILITY

The code for the pipeline that derives all variables used in the analysis can be found on GitHub: https://github.com/luoyuanlab/twitter_vaccine_analysis.

REFERENCES

  • 1. Andreadakis Z, Kumar A, Román RG, Tollefsen S, Saville M, Mayhew S. The COVID-19 vaccine development landscape. Nat Rev Drug Discov 2020; 19 (5): 305–6.
  • 2. Lyu JC, Le Han E, Luli GK. COVID-19 vaccine-related discussion on Twitter: topic modeling and sentiment analysis. J Med Internet Res 2021; 23 (6): e24435.
  • 3. Yin F, Wu Z, Xia X, Ji M, Wang Y, Hu Z. Unfolding the determinants of COVID-19 vaccine acceptance in China. J Med Internet Res 2021; 23 (1): e26089.
  • 4. Puri N, Coomes EA, Haghbayan H, Gunaratne K. Social media and vaccine hesitancy: new updates for the era of COVID-19 and globalized infectious diseases. Hum Vaccin Immunother 2020; 16 (11): 2586–93.
  • 5. Roy B, Kumar V, Venkatesh A. Health care workers’ reluctance to take the Covid-19 vaccine: a consumer-marketing approach to identifying and overcoming hesitancy. NEJM Catalyst Innov Care Deliv 2020; 1 (6). doi: 10.1056/CAT.20.0676.
  • 6. Viswanath K, Bekalu M, Dhawan D, Pinnamaneni R, Lang J, McLoud R. Individual and social determinants of COVID-19 vaccine uptake. BMC Public Health 2021; 21 (1): 1–10.
  • 7. Largent EA, Persad G, Sangenito S, Glickman A, Boyle C, Emanuel EJ. US public attitudes toward COVID-19 vaccine mandates. JAMA Netw Open 2020; 3 (12): e2033324.
  • 8. Ward JK, Alleaume C, Peretti-Watel P; COCONEL Group. The French public's attitudes to a future COVID-19 vaccine: the politicization of a public health issue. Soc Sci Med 2020; 265: 113414.
  • 9. Freeman D, Loe BS, Chadwick A, et al. COVID-19 vaccine hesitancy in the UK: the Oxford coronavirus explanations, attitudes, and narratives survey (Oceans) II. Psychol Med 2022; 52 (14): 3127–41.
  • 10. Wilson SL, Wiysonge C. Social media and vaccine hesitancy. BMJ Glob Health 2020; 5 (10): e004206.
  • 11. Benis A, Seidmann A, Ashkenazi S. Reasons for taking the COVID-19 vaccine by US social media users. Vaccines 2021; 9 (4): 315.
  • 12. Banda JM, Tekumalla R, Wang G, et al. A large-scale COVID-19 Twitter chatter dataset for open scientific research—an international collaboration. Epidemiologia (Basel) 2021; 2 (3): 315–24.
  • 13. Mathieu E, Ritchie H, Ortiz-Ospina E, et al. A global database of COVID-19 vaccinations. Nat Hum Behav 2021; 5: 1–7.
  • 14. Wang H, Li Y, Hutch M, Naidech A, Luo Y. Using tweets to understand how COVID-19–related health beliefs are affected in the age of social media: Twitter data analysis study. J Med Internet Res 2021; 23 (2): e26302.
  • 15. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV. XLNet: generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems 2019; 32.
  • 16. SemEval-2017 task 4: sentiment analysis in Twitter. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017); 2017; Vancouver, Canada.
  • 17. Vig J. A multiscale visualization of attention in the transformer model. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations; Association for Computational Linguistics; July 2019: 37–42; Florence, Italy. doi: 10.18653/v1/P19-3007.
  • 18. HyperExtended LightFace: a facial attribute analysis framework. In: 2021 International Conference on Engineering and Emerging Technologies (ICEET); IEEE; 2021; Istanbul, Turkey.
  • 19. Halterman A. Mordecai: full text geoparsing and event geocoding. J Open Source Softw 2017; 2 (9): 91.
  • 20. Gill L, Jones CW. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibodies in neonatal cord blood after vaccination in pregnancy. Obstet Gynecol 2021; 137 (5): 894–6.
  • 21. Nguyen LH, Joshi AD, Drew DA, et al. Self-reported COVID-19 vaccine hesitancy and uptake among participants from different racial and ethnic groups in the United States and United Kingdom. Nat Commun 2022; 13 (1): 1–9.
  • 22. Saeed BQ, Al-Shahrabi R, Alhaj SS, Alkokhardi ZM, Adrees AO. Side effects and perceptions following Sinopharm COVID-19 vaccination. Int J Infect Dis 2021; 111: 219–26.
  • 23. Abu-Hammad O, Alduraidi H, Abu-Hammad S, et al. Side effects reported by Jordanian healthcare workers who received COVID-19 vaccines. Vaccines 2021; 9 (6): 577.
  • 24. Dehingia N, Raj A. Sex differences in COVID-19 case fatality: do we know enough? Lancet Glob Health 2021; 9 (1): e14–15.
  • 25. Wastnedge EA, Reynolds RM, Van Boeckel SR, et al. Pregnancy and COVID-19. Physiol Rev 2021; 101 (1): 303–18.


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
