Key Points
Question
Can public perceptions of the human papillomavirus (HPV) vaccine be assessed from the perspective of behavior change theories by mining social media data with machine learning algorithms?
Findings
This cohort study included 1 431 463 English-language posts about the HPV vaccine from 486 116 unique usernames from a social media platform. An increase in HPV vaccine–related discussions was found, and the results suggest temporal and geographic variations in public perceptions of the HPV vaccine.
Meaning
The findings of this study suggest that social media and machine learning algorithms can serve as a complementary approach to inform public health surveillance and understanding and help to design targeted educational and communication programs that increase HPV vaccine acceptance.
This cohort study develops and validates deep learning models to understand public perceptions of human papillomavirus (HPV) vaccines from the perspective of behavior change theories using data from social media.
Abstract
Importance
Human papillomavirus (HPV) vaccine hesitancy or refusal is common among parents of adolescents. An understanding of public perceptions from the perspective of behavior change theories can facilitate effective and targeted vaccine promotion strategies.
Objective
To develop and validate deep learning models for understanding public perceptions of HPV vaccines from the perspective of behavior change theories using data from social media.
Design, Setting, and Participants
This retrospective cohort study, conducted from April to August 2019, included longitudinal and geographic analyses of public perceptions regarding HPV vaccines, using sampled HPV vaccine–related Twitter discussions collected from January 2014 to October 2018.
Main Outcomes and Measures
The prevalence of social media discussions related to the constructs of the health belief model (HBM) and the theory of planned behavior (TPB), categorized by deep learning algorithms. Locally estimated scatterplot smoothing (LOESS) revealed trends of constructs. Social media users’ US state–level home location information was extracted from their profiles, and geographic analyses were performed to identify the clustering of public perceptions of the HPV vaccine.
Results
A total of 1 431 463 English-language posts from 486 116 unique usernames were collected. Deep learning algorithms achieved F-1 scores ranging from 0.6805 (95% CI, 0.6516-0.7094) to 0.9421 (95% CI, 0.9380-0.9462) in mapping discussions to the constructs of behavior change theories. LOESS revealed trends in constructs; for example, prevalence of perceived barriers, a construct of HBM, decreased from its apex in July 2015 (56.2%) to its lowest prevalence in October 2018 (28.4%; difference, 27.8%; P < .001); positive attitudes toward the HPV vaccine, a construct of TPB, increased from early 2017 (30.7%) to 41.9% at the end of the study (difference, 11.2%; P < .001), while negative attitudes decreased from 42.3% to 31.3% (difference, 11.0%; P < .001) during the same period. Interstate variations in public perceptions of the HPV vaccine were also identified; for example, the states of Ohio and Maine showed a relatively high prevalence of perceived barriers (11 531 of 17 106 [67.4%] and 1157 of 1684 [68.7%]) and negative attitudes (9655 of 17 197 [56.1%] and 1080 of 1793 [60.2%]).
Conclusions and Relevance
This cohort study characterized public perceptions of the HPV vaccine on social media and their evolving trends across multiple behavioral constructs. The interstate variations of public perceptions could be associated with the rise of local antivaccine sentiment. The methods described in this study represent an early contribution to using existing empirically and theoretically based frameworks that describe human decision-making in conjunction with deep learning algorithms. Furthermore, these data demonstrate the ability to collect large-scale HPV vaccine perception and intention data that can inform public health communication and education programs designed to improve immunization rates at the community, state, or even national level.
Introduction
Human papillomavirus (HPV) is the most common sexually transmitted disease in the United States.1 HPV infections cause approximately 33 700 cases of cancer every year in the United States, including cervical, vaginal, vulvar, penile, and anal cancers.2,3 The HPV vaccine has been available since 2006 to protect against HPV-associated cancers and is recommended for adolescents starting at age 9 years through age 26 years if not vaccinated, and, for some people, up to age 45 years.4 Unfortunately, compared with other adolescent vaccines (eg, tetanus, diphtheria, pertussis [Tdap] and meningococcal B [MenB]), HPV vaccine rates remain low, with approximately 51% of adolescents not completing the HPV vaccination series.5 The most common reasons for parental declination of HPV vaccine include safety concerns, perceived lack of necessity, and lack of knowledge about the vaccine and HPV.6 For this reason, knowledge about the prevalence of these concerns can inform tailored strategies to mitigate them and improve immunization rates.
Behavior change theories provide a conceptual framework to understand the determinants of and methods for influencing specific health behaviors.7 The health belief model (HBM) and the theory of planned behavior (TPB) are among the most popular behavior change theories that have been widely adopted to explain health behaviors. The HBM assumes that motivation to adopt preventive health behaviors, such as screening and vaccination, is primarily due to the following constructs: perceived susceptibility, perceived severity, perceived benefits, perceived barriers, cues to action, and self-efficacy.8 The TPB assumes that constructs, including attitudes, subjective norms, and perceived behavioral control, drive people’s intention to perform a healthy behavior.9 Associations have been established between the theoretical constructs of HBM10,11,12,13 and TPB14,15,16 and HPV vaccination intention and uptake.
Improving understanding of the public perceptions of HPV and the HPV vaccine is essential to developing tailored educational efforts and increasing HPV vaccination rates. Furthermore, understanding these perceptions at the community, state, and national levels over time can provide detailed data useful in designing targeted approaches to improving immunization education programs and public health campaigns. Social media platforms offer a unique opportunity to examine the unfiltered opinions, comments, and discussions of large populations, while mitigating the limitations of traditional surveys, which include resource costs and the difficulties of tracking changes in real-time.17,18,19 Our objective was to use machine learning (ML) methods to examine HPV vaccine discussions on Twitter, which has been recognized as 1 of the major sources for accessing public opinions on various topics, from politics20 to public health.21 Compared with other social media platforms, Twitter has fewer privacy restrictions (ie, easy access to large-scale public discussions) and has younger users than the general population,22 which makes it an important resource to study adolescent vaccine-related discussions.
Semiautomatic methods to understand social media vaccine discussions have included manual coding and hashtag or keyword analysis,23,24,25,26 but these are limited by lack of scalability and inaccuracies, respectively. Given the unique characteristics of the tweet as a social media post (eg, short text, occurrences of cyber slang), obtaining an accurate understanding of these discussions is challenging.27 ML methods emerged to address these limitations and to improve the precision of understanding the public perception of vaccines,28,29 particularly the HPV vaccine.30,31,32,33,34,35 As a subset of ML algorithms, deep learning (DL) algorithms have been applied to natural language processing (NLP) tasks on social media data,36,37,38 and they have been found to outperform traditional ML approaches.39,40 DL also reduces the need for feature engineering (ie, the process of extracting numeric features that represent the meaning of the text, which is crucial to the effectiveness of learning algorithms), which is typically required by traditional ML algorithms. A glossary of ML-relevant concepts in this study is provided in the eTable in the Supplement.
Methods
Ethics Approval and Consent to Participate
This study received an institutional review board exemption from the Committee for the Protection of Human Subjects at The University of Texas Health Science Center at Houston. A waiver of informed consent was granted due to the retrospective design of the study. This study follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
Study Overview
An overview of the study design can be seen in Figure 1. We first collected HPV vaccine–related sampled discussions, using keywords, and then manually categorized (ie, annotated, in the language of ML) a subset of the posts with regard to the theoretical constructs of HBM and TPB. The initial human-categorized posts were the gold-standard corpus41 (ie, posts with human-assigned labels) that were used to train and evaluate the ML and DL algorithms. The models that performed most successfully (ie, had the highest F-1 scores) were selected and applied to the remaining unlabeled posts. The analyses, including time-series analysis and geographic analysis, were then performed on the DL-categorized behavioral constructs to identify variations of public perceptions toward the HPV vaccine.
Figure 1. Overview of Study Design.
API indicates application programming interface; HBM, health belief model; HPV, human papillomavirus; TPB, theory of planned behavior.
Data Collection and Initial Human Categorization
We used a set of keywords to collect HPV vaccine–related posts through the Twitter streaming application programming interface (approximately 1% of the entire stream volume) from January 1, 2014, to October 26, 2018. Keywords included hpv, human papillomavirus, gardasil, and cervarix. Only English-language posts were included in the study. For the HBM, we focused on the 4 primary constructs: perceived susceptibility, perceived severity, perceived benefits, and perceived barriers. For the TPB, we focused on an amalgamated construct of attitude. Several other constructs also influence HPV vaccination behavior. However, considering the low prevalence of these constructs in our data set, this study focused only on the major constructs noted earlier.
The human categorizations of the social media platform discussions to HBM or TPB were acquired from our previous studies.40,42 The included constructs, definitions, and examples of posts to the social media platform are shown in Table 1. Three reviewers were trained and then categorized a subset of 6000 posts based on their relevance to the HBM constructs. Each post was assigned to none (not related to HBM), 1, or multiple HBM constructs. For TPB constructs, 3 reviewers categorized the same 6000 posts based on their attitude toward the HPV vaccine. The reviewer first decided whether the post was related to the attitude toward the HPV vaccine. If it was related, the reviewer further decided whether it was positive, negative, or neutral. This gold-standard corpus was then used to train and evaluate a variety of ML and DL algorithms. The annotated corpus is available online.43
Table 1. Definitions and Examples of Key Constructs of Behavior Change Theories Found in a Social Media Platform.
Behavior change theory | Construct | Definition | Sample raw posts |
---|---|---|---|
Health belief model | Perceived susceptibility | The assessment of the risk of getting an HPV infection | |
Health belief model | Perceived severity | The assessment of whether an HPV infection is a sufficient health concern | |
Health belief model | Perceived benefits | Benefits of the HPV vaccine in protecting against HPV infection, HPV infection-induced cancers, and so forth | |
Health belief model | Perceived barriers | Adverse effects of the HPV vaccine, cost of getting an HPV vaccine, negative news and reports on the HPV vaccine, and so forth | |
Theory of planned behavior | Positive attitude | Shows a positive opinion of or promotes the HPV vaccine | |
Theory of planned behavior | Neutral attitude | Related to HPV vaccine topic but contains no sentiment or sentiment is unclear | |
Theory of planned behavior | Negative attitude | Concerns or doubts about the HPV vaccine | |
DL-Based Categorization of Discussions
We framed the automated understanding of content from the social media platform as a set of text classification tasks, which aimed to classify the content of posts into predefined categories. We built separate ML and DL classifiers for the constructs of HBM and TPB. These classifiers were trained using the human-categorized posts described earlier. For the 4 primary HBM constructs, we first categorized the post based on its relevance to any of the HBM constructs and then categorized the relevant posts to the primary HBM constructs, using binary classification (1 classifier for 1 construct). For TPB constructs, we first categorized the post based on its relevance to attitude toward the HPV vaccine and then categorized the relevant posts into 1 of 3 attitudes: positive, negative, or neutral. Each post was categorized to HBM and TPB separately.
To select the best classifiers for our tasks, we performed an evaluation of multiple ML and DL algorithms and configurations. Those algorithms and the experimental details are described in the eAppendix in the Supplement. We evaluated the algorithms on HBM and TPB categorization (ie, classification). For each categorization task, we divided task-relevant labeled posts into training, validation, and test sets with a proportion of 7:1:2. We trained the models on the training set, performed hyperparameter selection on the validation set, and evaluated the performance of classifiers on the test set. We repeated random sampling of the posts 10 times (with replacement) with the same proportion and calculated metrics for each model at each time.
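The repeated 7:1:2 partitioning described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the seeding scheme are hypothetical, and the sketch assumes that each repetition is an independent reshuffle of the full labeled set, which is one possible reading of the resampling procedure.

```python
import random


def split_7_1_2(items, seed):
    """Shuffle items and split them into train/validation/test at 7:1:2."""
    rng = random.Random(seed)          # seeded so each repetition is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10              # 70% for training
    n_val = n // 10                    # 10% for hyperparameter selection
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remaining ~20% for evaluation
    return train, val, test


def repeated_splits(items, repeats=10):
    """Return one (train, val, test) partition per repetition (assumed scheme)."""
    return [split_7_1_2(items, seed) for seed in range(repeats)]
```

With 6000 labeled posts, each repetition yields 4200 training, 600 validation, and 1200 test posts; metrics computed on the 10 test sets can then be averaged.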
For all binary classifiers (ie, classifying the post as HBM related, TPB related, or to each HBM construct), we calculated sensitivity, specificity, accuracy, precision, recall, and F-1 score. For the multiclass classifier (ie, to classify the post into 1 of 3 attitudes), we calculated overall accuracy as well as precision, recall, and F-1 score for each attitude (ie, positive, negative, or neutral).
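As a worked illustration of the binary metrics listed above, the following sketch derives them from confusion-matrix counts. The function name and the 0/1 label encoding are assumptions made for the example. Note that sensitivity and recall are the same quantity, TP / (TP + FN), which is why those two columns carry identical values in Table 2.

```python
def binary_metrics(y_true, y_pred):
    """Compute binary classification metrics from 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # identical to sensitivity
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

For the multiclass attitude classifier, the same precision, recall, and F-1 formulas can be applied per class by treating that class as positive and the other 2 as negative.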
Statistical Analysis
All statistical tests were 2-tailed, and statistical significance was set at P < .05. Time-series and geographic analyses were conducted in R version 3.6.2 (R Project for Statistical Computing).
Temporal and Geographic Analyses of Social Media Discussions
We performed time-series analysis on the predicted constructs to extract the evolving trends and geographic analysis to identify the US interstate variations of public perceptions of the HPV vaccine. We selected the best-performing model (ie, attentive recurrent neural network [Att-RNN] with fastText [FT] HPV embedding; eAppendix in the Supplement) for prediction of the unlabeled data in our collection. To reduce the variance of the DL model,44 we repeated random sampling and the training of the Att-RNN model 10 times; the final prediction for each unlabeled post was based on the majority vote of the 10 models.
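The majority-voting step can be sketched as below. This is an illustrative sketch only: the function name and input layout (one list of predicted labels per model) are hypothetical, and the tie-breaking rule is an assumption because the text does not state how ties among the 10 models were resolved. Here a tie goes to the label that was encountered first.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per post by majority vote.

    predictions_per_model[m][i] is the label model m assigned to post i.
    Ties are resolved in favor of the label seen first (an assumption).
    """
    n_posts = len(predictions_per_model[0])
    final = []
    for i in range(n_posts):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        final.append(votes.most_common(1)[0][0])
    return final
```

For example, if 3 models label a post 1, 1, and 0, the ensemble label is 1.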
Time-Series Analysis
We defined the prevalence of each theoretical construct as the number of posts classified to that construct divided by the total number of posts classified to the corresponding theory. We calculated the prevalence of each construct for each week. To extract the trend of the constructs, we applied time-series analyses to the weekly prevalence data. Specifically, we decomposed the prevalence into seasonal, trend, and random noise components using locally estimated scatterplot smoothing (LOESS).45 The decomposition was performed with the R function stl. Seasonal-trend decomposition via LOESS smoothing is a common time series analysis method in various disciplines.45 We tested the increasing or decreasing trend of each construct using a 2-sample proportion test.
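The prevalence definition and the comparison of 2 proportions can be illustrated with the short sketch below. It is not the study's analysis code: the function names are hypothetical, and the test shown is a pooled 2-sample z test without continuity correction, which is an assumption (the text does not specify which variant of the 2-sample proportion test was used).

```python
import math


def prevalence(construct_count, theory_count):
    """Share of theory-related posts that were classified to one construct."""
    return construct_count / theory_count if theory_count else 0.0


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed pooled z test for the difference between x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

As a usage example, `prevalence(387049, 948501)` reproduces the overall barriers prevalence of about 40.8% reported in the Results. Any weekly counts passed to `two_proportion_z_test` would need to come from the classified data; none are assumed here.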
Geographic Analysis
Users could self-report their home location in their profiles. Because it is optional for users to complete their profiles, the home location information is often sparse. Given that the available home location is in the free-text format, we leveraged an open-source lexicon-based script46 to map the home location string to a US state. For example, “Miami, FL” was mapped to Florida, “Texas” to Texas. After excluding the posts for which we could not map users’ home location to a US state, we calculated the count and prevalence of theory and construct-related posts for each US state.
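A lexicon-based mapping of free-text locations to states might look like the sketch below. It is a simplified stand-in for the open-source script cited above, not a reproduction of it: the 5-state lexicon, the function name, and the matching rules (full state name anywhere in the string, or a 2-letter abbreviation after a trailing comma) are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny lexicon; a real one would cover all states.
STATE_NAMES = {
    "california": "California",
    "florida": "Florida",
    "maine": "Maine",
    "ohio": "Ohio",
    "texas": "Texas",
}
STATE_ABBREVIATIONS = {
    "CA": "California",
    "FL": "Florida",
    "ME": "Maine",
    "OH": "Ohio",
    "TX": "Texas",
}


def map_location_to_state(location):
    """Map a free-text profile location to a US state name, or None."""
    if not location:
        return None
    text = location.strip()
    lowered = text.lower()
    # Rule 1: a full state name appearing as a whole word.
    for name, state in STATE_NAMES.items():
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            return state
    # Rule 2: "City, ST" form; requiring the comma and uppercase letters
    # avoids matching ordinary words such as "me" or "oh".
    match = re.search(r",\s*([A-Z]{2})\s*$", text)
    if match:
        return STATE_ABBREVIATIONS.get(match.group(1))
    return None
```

Posts whose location returns `None` would be excluded before the per-state counts and prevalences are computed.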
Results
A total of 1 431 463 English-language posts from 486 116 unique usernames were collected as our study cohort. A total of 6000 posts were selected for the initial human-categorization of discussions on the social media platform. The κ interannotator agreement for each HBM construct ranged from 0.727 to 0.834. The overall κ interannotator agreement for TPB categorization was 0.851.
Performance of Classification Algorithms
The comparison of various word-embedding techniques and classification algorithms can be seen in the eAppendix in the Supplement. The DL model Att-RNN with FT HPV word embedding provided the best performance on most tasks and was thus selected for prediction purposes. The performance of Att-RNN with FT HPV word embedding on the gold-standard corpus can be seen in Table 2. The model achieved a mean accuracy of 0.8018 (95% CI, 0.7924-0.8113) and 0.9226 (95% CI, 0.9171-0.9281) for identifying HBM-related and TPB-related posts, respectively. For HBM-related constructs, the model achieved a mean accuracy between 0.8721 (95% CI, 0.8614-0.8828) and 0.9063 (95% CI, 0.8977-0.9149) and a mean F-1 score between 0.6805 (95% CI, 0.6516-0.7094) and 0.8999 (95% CI, 0.8906-0.9091). For identifying TPB-related posts, the model achieved a mean F-1 score of 0.9421 (95% CI, 0.9380-0.9462); for TPB attitude, it achieved a mean F-1 score between 0.6996 (95% CI, 0.6841-0.7151) and 0.8103 (95% CI, 0.8011-0.8196).
Table 2. Metrics of Att-RNN With FT HPV Embedding in Mapping Discussions on a Social Media Platform to the Theoretical Constructs.
Theory and construct | Sensitivity, mean (95% CI) | Specificity, mean (95% CI) | Accuracy, mean (95% CI) | Precision, mean (95% CI) | Recall, mean (95% CI) | F-1 score, mean (95% CI) |
---|---|---|---|---|---|---|
Health belief model | | | | | | |
Related | 0.8072 (0.7823-0.8321) | 0.7954 (0.7727-0.8181) | 0.8018 (0.7924-0.8113) | 0.8254 (0.8120-0.8389) | 0.8072 (0.7823-0.8321) | 0.8156 (0.8049-0.8263) |
Susceptibility | 0.6889 (0.6489-0.7289) | 0.9396 (0.9256-0.9536) | 0.9015 (0.8906-0.9125) | 0.6784 (0.6276-0.7293) | 0.6889 (0.6489-0.7289) | 0.6805 (0.6516-0.7094) |
Severity | 0.7620 (0.7194-0.8047) | 0.9419 (0.9286-0.9552) | 0.9063 (0.8977-0.9149) | 0.7681 (0.7322-0.8040) | 0.7620 (0.7194-0.8047) | 0.7626 (0.7405-0.7847) |
Benefits | 0.7305 (0.6801-0.7808) | 0.9197 (0.9043-0.9350) | 0.8721 (0.8614-0.8828) | 0.7564 (0.7281-0.7846) | 0.7305 (0.6801-0.7808) | 0.7407 (0.7154-0.7661) |
Barriers | 0.8890 (0.8682-0.9098) | 0.9219 (0.9017-0.9420) | 0.9063 (0.8975-0.9150) | 0.9123 (0.8929-0.9317) | 0.8890 (0.8682-0.9098) | 0.8999 (0.8906-0.9091) |
Theory of planned behavior | | | | | | |
Related | 0.9487 (0.9429-0.9546) | 0.8710 (0.8568-0.8851) | 0.9226 (0.9171-0.9281) | 0.9357 (0.9291-0.9422) | 0.9487 (0.9429-0.9546) | 0.9421 (0.9380-0.9462) |
Positive | NA | NA | NA | 0.7425 (0.7144-0.7705) | 0.7500 (0.7201-0.7798) | 0.7447 (0.7307-0.7587) |
Negative | NA | NA | NA | 0.7987 (0.7842-0.8132) | 0.8235 (0.8007-0.8464) | 0.8103 (0.8011-0.8196) |
Neutral | NA | NA | NA | 0.7172 (0.6992-0.7351) | 0.6843 (0.6579-0.7106) | 0.6996 (0.6841-0.7151) |
Abbreviations: Att-RNN, attentive recurrent neural network; FT, fastText; HPV, human papillomavirus; NA, not applicable.
Temporal Trends for Theoretical Constructs
After applying the models to classify the unlabeled posts, 948 501 and 920 486 posts were classified as HBM related and TPB related, respectively. For HBM-related posts, 125 516 (13.2%), 215 964 (22.8%), 239 835 (25.3%), and 387 049 (40.8%) were classified into susceptibility, severity, benefits, and barriers, respectively. For TPB attitude–related posts, 331 836 (36.1%), 341 281 (37.1%), and 247 369 (26.9%) were classified into positive, negative, and neutral, respectively.
There were dramatic fluctuations in the prevalence of each construct (eFigure 1 and eFigure 2 in the Supplement). In addition, there were increasing trends for the total number of theory-related posts (ie, HBM related and TPB related) during the study period. Time-series analysis further extracted smooth trends for each construct (Figure 2). Among HBM-related constructs, there was a decreasing trend for the prevalence of barriers, from its highest peak in July 2015 (56.2%) to the lowest prevalence in October 2018 (28.4%; difference, 27.8%; P < .001). We also found an increasing trend for the prevalence of severity, with the lowest prevalence in March 2015 (8.8%) and the highest prevalence in October 2018 (31.3%; difference, 22.5%; P < .001). The prevalence of benefits decreased from early in 2015 to the middle of 2016 and remained relatively stable afterward; susceptibility demonstrated an opposite trend, as the prevalence increased from early 2015, with the lowest prevalence in March 2015 (1.9%) and highest prevalence in September 2018 (16.8%; difference, 14.9%; P < .001). Among the attitudes toward the HPV vaccine, neutral attitude stayed stable over the years; since early 2017, positive attitude toward the HPV vaccine demonstrated an increasing trend, from 30.7% to 41.9% (difference, 11.2%; P < .001), while negative attitude demonstrated a decreasing trend, from 42.3% to 31.3% (difference, 11.0%; P < .001).
Figure 2. Trend of Theoretical Constructs After Removing Seasonal Effect and Random Noise.
HBM indicates health belief model; TPB, theory of planned behavior.
Interstate Variations of HPV Vaccine Perceptions
There were 486 116 unique usernames derived from 1 431 463 contributions to the platform. Among these users, 128 812 profiles (26.5%) (369 181 posts [25.8%] in total) had home locations that could be mapped to US states. The geographical analyses of HPV vaccine perceptions were based on these 369 181 posts. Figure 3 shows the clustering of HPV vaccine discussions. HPV vaccine–related discussions were clustered in US states with large populations. California had the largest proportion of HPV vaccine–related discussions on the site (54 764 of 369 181 [14.8%]). Other large US states, such as Texas, New York, Ohio, and Florida, also show clustered discussions related to the HPV vaccine.
Figure 3. Interstate Variations on the Count of HPV Vaccine–Related Discussions on Social Media.
HBM indicates health belief model; TPB, theory of planned behavior.
We further examined the interstate variations on the prevalence of theoretical constructs regarding the HPV vaccine (eFigure 3 and eFigure 4 in the Supplement). For HBM constructs, states in the central US, including South Dakota, Nebraska, and Kansas, showed a relatively higher prevalence of discussions related to perceived benefits (144 of 357 [40.3%], 434 of 996 [43.6%], and 1191 of 3033 [39.3%], respectively) and a relatively lower prevalence of perceived barriers (79 [22.1%], 178 [17.9%], and 650 [21.4%], respectively). In particular, Ohio and Maine showed a high prevalence of discussions related to perceived barriers (11 531 of 17 106 [67.4%] and 1157 of 1684 [68.7%], respectively) and low prevalence of discussions related to perceived benefits (2057 [12.0%] and 212 [12.6%], respectively). For TPB attitude, similar to HBM constructs, states in the central United States showed a relatively higher prevalence of discussions related to positive attitudes toward the HPV vaccine and a relatively lower prevalence of discussions related to negative attitudes. In particular, Ohio and Maine demonstrated a high prevalence of negative attitudes (9655 of 17 197 [56.1%] and 1080 of 1793 [60.2%], respectively).
Discussion
Vaccine hesitancy is listed among the top 10 global health threats by the World Health Organization (WHO).47 Existing studies have found that HPV vaccine refusal or hesitancy may be motivated by theoretical constructs of behavior change theories.11,48 In this study, we examined social media trends related to HPV and the HPV vaccine in connection with the HBM and TPB constructs. We found an increase in the number of theory-related posts (ie, HBM related and TPB related) during the years of our study, demonstrating an increased interest in discussing the HPV vaccine on social media. Overall, our findings suggest public perception of the HPV vaccine may be improving. We found that attitudes toward the HPV vaccine became more positive in recent years. This may be attributable to the substantial efforts put forth by the medical and public health community regarding HPV and the HPV vaccine. We also found an increase in users’ perception of HPV severity, which suggests that a shift to a focus on cancer prevention in regard to HPV may have been effective. After the licensure of the HPV vaccine and the slow acceptance among parents of adolescents, the US Centers for Disease Control and Prevention (CDC) shifted their educational efforts and messaging to focus primarily on HPV vaccination as cancer prevention. They also encouraged health care professionals to issue strong presumptive recommendations and bundle all recommended adolescent vaccines together rather than singling out HPV, even though the HPV vaccine is not required by many states for school entry. Our study demonstrates that the surveillance of social media discussions regarding vaccines could support timely communication in response to the rise of antivaccine sentiment, inform educational efforts, and gauge national opinion regarding the HPV vaccine.
In addition, our approach enables us to understand an individual’s health beliefs and attitudes toward the vaccine, which could facilitate further innovative and customized vaccination promotion strategies.
The analyses of public perception variations in certain states could help public health professionals mitigate the influence of local antivaccine movements, examine vaccine policy, and inform vaccine-promotion campaigns. The clustering of antivaccine sentiment (eg, high prevalence in perceived barriers and negative attitudes) could relate to the rise of local antivaccine sentiment. For example, in the present study, Ohio was identified as having a higher prevalence of antivaccine sentiment on social media. A review of contributions to the social media platform from Ohio residents found that most discussions from this state regarding the HPV vaccine related to rumors and misinformation about the injuries and risks associated with the HPV vaccine. The clustering of antivaccine sentiment regarding the HPV vaccine in states such as Ohio also could be promoted by the local antivaccine movement49 and antivaccine thought influencers50 who reside in these states. Health care professionals have increasingly reported social media as a major source of information cited by parents against HPV vaccination.51 For health care professionals, trends in social media discussions within specific communities, states, or regions could help to predict future patient sentiment and alert a practice to expect and prepare for potentially increased levels of vaccine hesitancy. This would give health care professionals the opportunity to engage in training and education for vaccines, if needed, and to establish practices within their offices to address vaccine hesitancy and/or refusal.
This study is important in the context of population-level and individual-level vaccination decision-making. Social media surveillance can assist in understanding popular trends in opinion, alerting public health practitioners to the pulse of public sentiment. Significant for public health is the potential to intelligently process social media messaging, categorizing the sender’s motivations (ie, HBM and TPB construct–related perceptions) and interjecting salient, tailored commentary as a counterfactual to misinformation. The methods described in this study may enable persuasive messaging to be injected into social media streams to mitigate vaccine hesitancy in the general public and, more pointedly, among parents of vaccine-eligible children. This intervention strategy offers the potential for future research and may assist in reducing vaccine hesitancy and thus contribute to the mission of pediatric and adolescent practices in achieving HPV vaccination goals. The methods described in this study represent an early contribution to using existing empirically and theoretically based frameworks to develop more intelligent artificial intelligence and DL algorithms that may positively influence HPV vaccine decision-making.
At the current time, public health departments rely on slow, expensive, and time-limited methods, such as paper or electronic surveys or occasional large-scale studies designed for other reasons but that collect vaccine-related data (eg, national behavioral health surveys). Such methods are characterized by long lag times between survey and vaccine decision-making, rely on respondent recall, and provide only gross summary metrics without targeted and regionally actionable information. This study demonstrated the feasibility of methods that benefit vaccine-promotion programs. It provided a method to automatically understand population-level and individual-level health beliefs and attitudes toward the HPV vaccine. This can then inform rational and directed programmatic efforts to improve actual immunization coverage rates by allowing for real-time monitoring of beliefs and intentions and adjustment of educational and public health campaigns and messaging as warranted. Such data-enabled real-time information is invaluable to the design of such efforts and can assist in realizing the benefits of increased population vaccine coverage levels.
Limitations
Public health surveillance using social media has several limitations.52 The first is population bias (ie, social media users may not be representative of the general population). Thus, findings based on social media data should be interpreted with caution. However, because users of the social media platform we studied tend to be younger than the general population,53 and younger people are the target population for HPV vaccine promotion, we believe the public opinions on this platform can be very valuable and complementary to traditional survey-based findings. Another limitation of this study is that the treatment of predicted labels as true labels for the time-series analysis could lead to information bias due to misclassification.54,55 Given that the models achieved high accuracy on most tasks, we believe that the general trends are reliable. A further limitation is that the gold-standard corpus was limited to 6000 posts. This corpus may not fully represent the unlabeled collection (approximately 1.4 million posts), and the shift in the data distribution between labeled and unlabeled data might bring additional bias to the prediction. To mitigate this, we recommend that future studies add more representative posts to the gold-standard corpus.
Conclusions
This study evaluated DL algorithms for mapping HPV vaccine–related social media discussions to the constructs of behavior change theories. DL algorithms outperformed ML algorithms on our tasks. In particular, the study provided data demonstrating several important parameters useful to designing strategies that could improve immunization coverage rates. First, time-series analysis on the predicted constructs revealed the evolving trends of public perception in regard to the HPV vaccine. Second, geographical analyses identified state-level clustering of public perceptions in regard to the HPV vaccine. This is important in terms of understanding the epidemiology of vaccine misinformation and disinformation and in targeting geographic areas that need data-informed educational and other programmatic efforts to counter such concerns. Third, this study’s innovation in categorizing messages informed by theory-based constructs to differentiate and fine tune attitudes provided a sound theoretical basis for future public health messaging and for rapidly measuring and assessing the effects of such messaging and programs.
eFigure 1. Prevalence of Constructs From HBM
eFigure 2. Prevalence of Attitudes From TPB
eFigure 3. Interstate Variations on the Prevalence of HBM Constructs on Social Media
eFigure 4. Interstate Variations on the Prevalence of TPB Attitude on Social Media
eTable. Glossary on Artificial Intelligence and ML-Relevant Concepts Used in This Study
eAppendix. Supplemental Methods
References
- 1. Satterwhite CL, Torrone E, Meites E, et al. Sexually transmitted infections among US women and men: prevalence and incidence estimates, 2008. Sex Transm Dis. 2013;40(3):187-193. doi: 10.1097/OLQ.0b013e318286bb53
- 2. US Centers for Disease Control and Prevention. Reasons to get vaccinated. Reviewed March 26, 2020. Accessed October 16, 2020. https://www.cdc.gov/hpv/parents/vaccine/six-reasons.html
- 3. Saraiya M, Unger ER, Thompson TD, et al; HPV Typing of Cancers Workgroup. US assessment of HPV types in cancers: implications for current and 9-valent HPV vaccines. J Natl Cancer Inst. 2015;107(6):djv086. doi: 10.1093/jnci/djv086
- 4. US Centers for Disease Control and Prevention. Vaccinating boys and girls. Reviewed August 15, 2019. Accessed June 23, 2020. https://www.cdc.gov/hpv/parents/vaccine.html
- 5. US Centers for Disease Control and Prevention. Understanding HPV coverage. Reviewed August 23, 2018. Accessed October 16, 2020. https://www.cdc.gov/hpv/hcp/vacc-coverage/index.html
- 6. Beavis A, Krakow M, Levinson K, Rositch AF. Reasons for lack of HPV vaccine initiation in NIS-Teen over time: shifting the focus from gender and sexuality to necessity and safety. J Adolesc Health. 2018;63(5):652-656. doi: 10.1016/j.jadohealth.2018.06.024
- 7. Patel VL, Arocha JF, Ancker JS, eds. Cognitive Informatics in Health and Biomedicine: Understanding and Modeling Health Behaviors. Springer; 2017. doi: 10.1007/978-3-319-51732-2
- 8. Champion VL, Skinner CS. The health belief model. In: Glanz K, Rimer BK, Viswanath K, eds. Health Behavior and Health Education: Theory, Research, and Practice. Jossey-Bass; 2008:45-65.
- 9. Ajzen I. The theory of planned behavior. Organ Behav Hum Decis Process. 1991;50(2):179-211. doi: 10.1016/0749-5978(91)90020-T
- 10. Reiter PL, Brewer NT, Gottlieb SL, McRee AL, Smith JS. Parents’ health beliefs and HPV vaccination of their adolescent daughters. Soc Sci Med. 2009;69(3):475-480. doi: 10.1016/j.socscimed.2009.05.024
- 11. Donadiki EM, Jiménez-García R, Hernández-Barrera V, et al. Health belief model applied to non-compliance with HPV vaccine among female university students. Public Health. 2014;128(3):268-273. doi: 10.1016/j.puhe.2013.12.004
- 12. Abraham C, Sheeran P. The health belief model. In: Ayers S, Baum A, McManus C, et al, eds. Cambridge Handbook of Psychology, Health, and Medicine. Cambridge University Press; 2014:97-102. doi: 10.1017/CBO9780511543579.022
- 13. Mehta P, Sharma M, Lee RC. Designing and evaluating a health belief model-based intervention to increase intent of HPV vaccination among college males. Int Q Community Health Educ. 2013-2014;34(1):101-117. doi: 10.2190/IQ.34.1.h
- 14. Gerend MA, Shepherd JE. Predicting human papillomavirus vaccine uptake in young adult women: comparing the health belief model and theory of planned behavior. Ann Behav Med. 2012;44(2):171-180. doi: 10.1007/s12160-012-9366-5
- 15. Askelson NM, Campo S, Lowe JB, Smith S, Dennis LK, Andsager J. Using the theory of planned behavior to predict mothers’ intentions to vaccinate their daughters against HPV. J Sch Nurs. 2010;26(3):194-202. doi: 10.1177/1059840510366022
- 16. Kahn JA, Rosenthal SL, Hamann T, Bernstein DI. Attitudes about human papillomavirus vaccine in young women. Int J STD AIDS. 2003;14(5):300-306. doi: 10.1258/095646203321605486
- 17. Mitra T, Counts S, Pennebaker JW. Understanding anti-vaccination attitudes in social media. Published 2016. Accessed October 16, 2020. https://www.microsoft.com/en-us/research/uploads/prod/2019/05/antivax-icwsm16.pdf
- 18. Sadaf A, Richards JL, Glanz J, Salmon DA, Omer SB. A systematic review of interventions for reducing parental vaccine refusal and vaccine hesitancy. Vaccine. 2013;31(40):4293-4304. doi: 10.1016/j.vaccine.2013.07.013
- 19. Chan B, Lopez A, Sarkar U. The canary in the coal mine tweets: social media reveals public perceptions of non-medical use of opioids. PLoS One. 2015;10(8):e0135072. doi: 10.1371/journal.pone.0135072
- 20. O’Connor B, Balasubramanyan R, Routledge BR, Smith NA. From tweets to polls: linking text sentiment to public opinion time series. Published 2010. Accessed October 16, 2020. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
- 21. Paul MJ, Dredze M. You are what you tweet: analyzing Twitter for public health. Published 2011. Accessed October 16, 2020. http://www.cs.jhu.edu/~mpaul/files/2011.icwsm.twitter_health.pdf
- 22. Wojcik S, Hughes A. Sizing up Twitter users. Pew Research Center. Published April 24, 2019. Accessed October 16, 2020. https://www.pewresearch.org/internet/2019/04/24/sizing-up-twitter-users/
- 23. Becker BFH, Larson HJ, Bonhoeffer J, van Mulligen EM, Kors JA, Sturkenboom MCJM. Evaluation of a multinational, multilingual vaccine debate on Twitter. Vaccine. 2016;34(50):6166-6171. doi: 10.1016/j.vaccine.2016.11.007
- 24. Radzikowski J, Stefanidis A, Jacobsen KH, Croitoru A, Crooks A, Delamater PL. The measles vaccination narrative in Twitter: a quantitative analysis. JMIR Public Health Surveill. 2016;2(1):e1. doi: 10.2196/publichealth.5059
- 25. Love B, Himelboim I, Holton A, Stewart K. Twitter as a source of vaccination information: content drivers and what they are saying. Am J Infect Control. 2013;41(6):568-570. doi: 10.1016/j.ajic.2012.10.016
- 26. Keelan J, Pavri V, Balakrishnan R, Wilson K. An analysis of the human papilloma virus vaccine debate on MySpace blogs. Vaccine. 2010;28(6):1535-1540. doi: 10.1016/j.vaccine.2009.11.060
- 27. Shah H. Twitter sentiment analysis. Int J Adv Res Comput Sci Softw Eng. 2018;7(12):15-21. doi: 10.23956/ijarcsse.v7i12.493
- 28. Brooks B. Using Twitter data to identify geographic clustering of anti-vaccination sentiments. Dissertation. University of Washington. 2014.
- 29. Salathé M, Khandelwal S. Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. PLoS Comput Biol. 2011;7(10):e1002199. doi: 10.1371/journal.pcbi.1002199
- 30. Shapiro GK, Surian D, Dunn AG, Perry R, Kelaher M. Comparing human papillomavirus vaccine concerns on Twitter: a cross-sectional study of users in Australia, Canada and the UK. BMJ Open. 2017;7(10):e016869. doi: 10.1136/bmjopen-2017-016869
- 31. Du J, Xu J, Song HY, Tao C. Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data. BMC Med Inform Decis Mak. 2017;17(2)(suppl 2):69. doi: 10.1186/s12911-017-0469-6
- 32. Dunn AG, Leask J, Zhou X, Mandl KD, Coiera E. Associations between exposure to and expression of negative opinions about human papillomavirus vaccines on social media: an observational study. J Med Internet Res. 2015;17(6):e144. doi: 10.2196/jmir.4343
- 33. Surian D, Nguyen DQ, Kennedy G, Johnson M, Coiera E, Dunn AG. Characterizing Twitter discussions about HPV vaccines using topic modeling and community detection. J Med Internet Res. 2016;18(8):e232. doi: 10.2196/jmir.6045
- 34. Dunn AG, Surian D, Leask J, Dey A, Mandl KD, Coiera E. Mapping information exposure on social media to explain differences in HPV vaccine coverage in the United States. Vaccine. 2017;35(23):3033-3040. doi: 10.1016/j.vaccine.2017.04.060
- 35. Zhou X, Coiera E, Tsafnat G, Arachi D, Ong MS, Dunn AG. Using social connection information to improve opinion mining: identifying negative sentiment about HPV vaccines on Twitter. Stud Health Technol Inform. 2015;216(c):761-765. doi: 10.3233/978-1-61499-564-7-761
- 36. Nakov P, Ritter A, Rosenthal S, Sebastiani F, Stoyanov V. SemEval-2016 task 4: sentiment analysis in Twitter. Published 2016. Accessed October 16, 2020. https://www.aclweb.org/anthology/S16-1001.pdf
- 37. Mohammad S, Bravo-Marquez F, Salameh M, Kiritchenko S. SemEval-2018 task 1: affect in tweets. Proc 12th Int Workshop on Semantic Eval. 2018:1-17. doi: 10.18653/v1/s18-1001
- 38. Rosenthal S, Farra N, Nakov P. SemEval-2017 task 4: sentiment analysis in Twitter. Proc 11th Int Workshop on Semantic Eval. 2017:502-518. doi: 10.18653/v1/S17-2088
- 39. Du J, Tang L, Xiang Y, et al. Public perception analysis of tweets during the 2015 measles outbreak: comparative study using convolutional neural network models. J Med Internet Res. 2018;20(7):e236. doi: 10.2196/jmir.9413
- 40. Du J, Cunningham RM, Xiang Y, et al. Leveraging deep learning to understand health beliefs about the human papillomavirus vaccine from social media. NPJ Digit Med. 2019;2(1):27. doi: 10.1038/s41746-019-0102-4
- 41. Wissler L, Almashraee M, Monett D, Paschke A. The gold standard in corpus annotation. Paper presented at: 5th IEEE Germany Student Conference; June 26-27, 2014; Passau, Germany.
- 42. Du J, Xu J, Song H, Liu X, Tao C. Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets. J Biomed Semantics. 2017;8(1):9. doi: 10.1186/s13326-017-0120-6
- 43. Center for Biomedical Semantics and Data Intelligence. HPV Twitter Corpus. Accessed October 20, 2020. https://github.com/UT-Tao-group/HPV_Twitter_Corpus
- 44. Valentini G, Masulli F. Ensembles of learning machines. Lecture Notes in Computer Science. 2002;2486:3-20. doi: 10.1016/b978-0-12-417295-1.00020-5
- 45. Cleveland RB, Cleveland WS, McRae JE, Terpenning I. STL: a seasonal-trend decomposition procedure based on Loess. J Off Stat. 1990;6(1):3-73. Accessed October 21, 2020. https://www.wessa.net/download/stl.pdf
- 46. GitHub. twitter-user-geocoder. Accessed October 21, 2020. https://github.com/bianjiang/twitter-user-geocoder
- 47. World Health Organization. Ten threats to global health in 2019. Accessed May 11, 2020. https://www.who.int/news-room/spotlight/ten-threats-to-global-health-in-2019
- 48. Gilkey MB, Calo WA, Marciniak MW, Brewer NT. Parents who refuse or delay HPV vaccine: differences in vaccination behavior, beliefs, and clinical communication preferences. Hum Vaccin Immunother. 2017;13(3):680-686. doi: 10.1080/21645515.2016.1247134
- 49. Rouan R. Robert F. Kennedy Jr., 120 others at Statehouse blast vaccinations. Updated June 27, 2019. Accessed October 19, 2020. https://www.dispatch.com/news/20190626/robert-f-kennedy-jr-120-others-at-statehouse-blast-vaccinations
- 50. Smith TC. Vaccine rejection and hesitancy: a review and call to action. Open Forum Infect Dis. 2017;4(3):ofx146. doi: 10.1093/ofid/ofx146
- 51. Teague TA, Shay A, Healy CM, et al. Social media and HPV vaccine hesitancy: an emergent concern for pediatric providers. Poster presentation at: National Foundation for Infectious Diseases; Washington, DC; November 16-17, 2019.
- 52. Fung ICH, Tse ZTH, Fu KW. The use of social media in public health surveillance. Western Pac Surveill Response J. 2015;6(2):3-6. doi: 10.5365/WPSAR.2015.6.1.019
- 53. Hughes A, Wojcik S. 10 facts about Americans and Twitter. Pew Research Center. Published August 2, 2019. Accessed July 2, 2020. https://www.pewresearch.org/fact-tank/2019/08/02/10-facts-about-americans-and-twitter/
- 54. Duan R, Cao M, Wu Y, et al. An empirical study for impacts of measurement errors on EHR based association studies. AMIA Annu Symp Proc. 2017;2016:1764-1773.
- 55. Chen Y, Wang J, Chubak J, Hubbard RA. Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: empirical illustration using breast cancer recurrence. Pharmacoepidemiol Drug Saf. 2019;28(2):264-268. doi: 10.1002/pds.4680