Comparing COVID-19 vaccine passports attitudes across countries by analysing Reddit comments

Muhammet Mücahit Enes Yurtsever; Muhammad Shiraz; Ekin Ekinci; Süleyman Eken

doi:10.1177/01655515221148356

2023 Feb 3:01655515221148356.

PMCID: PMC9899678

Abstract

Topic mining and sentiment polarity analysis together can adequately represent the topics and attitudes of users. The goal of this article is to use Reddit’s location-based subreddits to examine country-level differences in attitudes towards COVID-19 vaccine passports. We applied sentiment analysis and latent topic modelling to comments about COVID-19 vaccine passports collected from 18 Reddit communities between 1 January 2021 and 28 February 2022. To detect changes in sentiment and latent topics, 11,168 comments were aggregated and examined by month. The number of comments on posts in country-specific subreddits was positively correlated with the number of new COVID-19 cases reported each day. Comments became more subjective and more strongly positive or negative after July 2021. Polarity analysis showed that communities expressed more positive than negative sentiment towards vaccine passport-related topics. Topic modelling found that community members raised a variety of concerns related to their socioeconomic status, and keywords across the topics pointed to privacy concerns and to acceptance of various COVID-19 control measures. Using public opinion and topic modelling to analyse vaccine passports could help address important global health informatics concerns associated with socioeconomic status.

Keywords: Reddit, sentiment analysis, topic modelling, vaccine certificate, vaccine passports

1. Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has wreaked havoc on individuals, businesses and governments across the world. As of this writing, COVID-19 has caused approximately 493 million cases and nearly 6.2 million deaths worldwide,¹ making it the most significant pandemic of our lifetime. The global spread and long-term persistence of the virus have had far-reaching social, economic and health consequences. Efforts to manage and control the virus, and to cope with its ramifications, have been complicated by its unpredictable spread and its re-emergence in new forms. At the same time, policymakers are increasingly reluctant to rely primarily on containment and mitigation measures and must balance economic and social recovery against containment and mitigation activities.

A vaccine passport (VP) is one of the risk management solutions that countries and enterprises are considering to prevent COVID-19 transmission and mitigate the social and economic effects of the disease [1]. VPs are a form of authorised identification, often digital, that contains the bearer’s health and immunisation information. States, businesses and countries might use VPs to allow people to travel internationally or attend large gatherings such as concerts, conferences and sporting events [2]. A standardised VP format could be used across countries to identify people who have been vaccinated against the virus, potentially reducing confusion and fraud and cutting transaction and information costs. The fundamental goals are to reduce the risk of transmission while making travel easier, which would help countries that rely heavily on tourism and regions where concerts or other large gatherings occur frequently. Historically, the VP concept is not wholly new. For years, people travelling to certain parts of the world have had to show papers, such as the medical passport known as the yellow card, to verify that they had been immunised against diseases such as yellow fever, cholera and rubella. However, not all countries support the VP concept, and public acceptance does not appear to be universal [3–5].

The pandemic highlighted the need for health information technology (HIT) to help recognise, prevent and manage the pandemic, ensuring well-informed, data-driven decisions and facilitating evidence-based policymaking. Preparedness for and response to the pandemic may vary with countries’ income levels, and the importance of HIT infrastructure in low- and middle-income countries (LMICs) has been emphasised. Beyond HIT systems, global health informatics and data governance in LMICs are very important during and after the pandemic. In this context, key global health informatics issues related to the implementation of VPs are critical, including the standardisation of certification and the worldwide compatibility and acceptance of particular vaccines as verifiable credentials. We therefore compare public sentiment towards COVID-19 VPs across countries by analysing comments on the Reddit social media platform.

Various agencies and countries have begun to implement COVID-19 vaccine certification schemes, such as the World Health Organization’s smart vaccination certificate, the European Union’s (EU) Digital COVID Certificate, the IATA Travel Pass, the IBM Digital Health Pass and green pass-like schemes. In parallel, researchers have investigated people’s privacy concerns and whether COVID-19 control mechanisms were effective, often using social media data. Khan et al. [6] examined how the COVID-19 passport was debated and perceived on Twitter, as well as the key players in the early debate. They found that the media might play a critical role in raising awareness about a vaccine certificate or passport, which could help people return to normal. Melton et al. [7] conducted sentiment analysis and latent Dirichlet allocation (LDA) topic modelling on textual data to analyse COVID-19 vaccine-related debate on social media; their topic modelling suggested that community members were more concerned with adverse effects than with bizarre conspiracy theories. Gulati [8] aimed to address a large research gap in the tourism literature by focusing on a new type of tourism that emerged from COVID-19. Using Twitter data, he investigated the social mood and emotion surrounding vaccine tourism and categorised them into eight basic emotions: joy, disgust, fear, anger, anticipation, sadness, trust and surprise. Following scientific disclosures about vaccines, Kumar et al. [9] examined COVID-19 disinformation on Reddit, using topic modelling to track changes in word prevalence within topics and social network analysis to determine the relationships between Reddit communities (subreddits). Wu et al. [10] used LDA and Linguistic Inquiry and Word Count (LIWC) text analysis to characterise the relevant conversation about COVID-19 vaccines and combined quantitative and qualitative comparisons to develop insights.

Aygün et al. [11] conducted aspect-based sentiment analysis for the United States, the United Kingdom, Canada, Turkey, France, Germany, Spain and Italy to examine Twitter users’ attitudes towards vaccines and vaccine types during the COVID-19 period, using four aspects (policy, health, media and others) and four Bidirectional Encoder Representations from Transformers (BERT) models (mBERT-base, BioBERT, ClinicalBERT and BERTurk). Yan et al. [12] studied city-level variations in attitudes towards vaccine-related subjects using Reddit’s location-based subreddits. They showed how data from city-specific subreddits can provide a better understanding of local concerns and sentiments about COVID-19 vaccines, which could lead to more targeted and publicly acceptable policies based on social media postings. Using a collection of geo-located COVID-19 vaccine-related tweets from London, UK, Lanyi et al. [13] applied a natural language processing (NLP) tool to quickly identify and analyse important barriers to vaccine adoption. By applying text mining with LDA to survey a large body of coronavirus literature, Cheng et al. [14] showed how information specialists could benefit the health and medical community. Chamorro-Padial et al. [15] described an information retrieval system that uses latent information to select relevant works associated with particular concepts; by applying LDA models to COVID-19-related articles, they identified important ideas connected to a given query and corpus. Avcı and Yıldız Durak [16] examined digital citizenship knowledge, online information-seeking methods and differences in information literacy levels according to digital technology use, both before and during the COVID-19 pandemic, and used multivariate analysis of variance (MANOVA) to test for relationships among the study’s variables. Zhao et al. [17] compared the number of publications with the number of reported COVID-19 cases, examined scientific collaboration and described academic response patterns at global, regional and national levels. They also contrasted the academic activity linked to this newly discovered infection with that linked to persistent infections.

This study explores the following research questions:

RQ1. What is the general public’s opinion or sentiment towards COVID-19 VPs on Reddit, a popular social media platform?
RQ2. What are the main topics related to COVID-19 identified from Reddit comments?
RQ3. What is the correlation between engagement level on Reddit and daily new COVID-19 cases?
RQ4. Which countries show stronger positive sentiment in VPs comments?
RQ5. How does the polarity change over time for each country?
RQ6. How does the subjectivity change over time for each country?
RQ7. What is the change in latent topics on a polarity basis?
RQ8. What is the change in latent topics on a monthly basis?
RQ9. How do the latent topics change between countries?

The contributions of this work to the literature are as follows:

We use location-based subreddits on Reddit to study country-level variations in sentiments towards COVID-19 VP-related topics.
We characterise differences in COVID-19 VP-related topics and sentiments among countries.

The remainder of this article is organised as follows. Section ‘Material and methods’ describes our data set and the proposed approach. Section ‘Results’ presents the sentiment classification performance and the sentiment, correlation and topic modelling results. Section ‘Discussion’ summarises the key findings and highlights the strengths and limitations of the study. Finally, the ‘Conclusion’ section summarises this work and states its significance.

2. Material and methods

This section presents the materials and methods of the proposed system. Figure 1 shows the research flow.

2.1. Data acquisition

To analyse public attitudes towards the COVID-19 VP, we gathered data from Reddit, an information-sharing social media network that is currently viewed by around 430 million users. Reddit is a massive online community in which users post content to topic-specific sub-communities called ‘subreddits’, where it can be viewed by others both inside and outside the subreddit. Users can ‘upvote’ or ‘downvote’ posts to shift their rankings on the website; in this way, the community collectively determines which posts are more easily seen by the wider community. Users can also comment on and discuss posted content under a chosen pseudonym in each post’s ‘thread’, and comments can likewise be upvoted, downvoted or ‘gilded’ to show the community’s reaction to them. Reddit also records other comment attributes, such as ‘controversiality’ and ‘distinguished’.

Reddit comments were acquired using the Pushshift Reddit API [18]. We harvested approximately 11,200 comments from 18 subreddits (India, Pakistan, Philippines, Ukraine, Vietnam, China, Lebanon, Malaysia, Russia, Thailand, Turkey, Canada, New Zealand, UK, Australia, Germany, CoronavirusUS and Norway). Relevant comments were retrieved by searching for the keywords ‘vaccine passport’, ‘covid passport’ and ‘vaccine certificate’. The countries were selected from the income groups (upper-income, upper-middle-income and lower-middle-income) defined in the World Bank’s gross national product per capita report.
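The collection step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors’ script: it assumes the historical Pushshift comment-search endpoint and its `q`, `subreddit`, `after`, `before` and `size` parameters (public access to Pushshift has since been restricted, so the URLs are illustrative only), uses three hypothetical subreddit names, and introduces a hypothetical helper, `matched_keyword`, that assigns each comment to the first search phrase it contains, mirroring the keyword columns of Table 1. No network request is made.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed historical Pushshift comment-search endpoint (illustrative only;
# public access has since been restricted).
PUSHSHIFT_COMMENTS = "https://api.pushshift.io/reddit/search/comment/"

# Search phrases from the article, in the order used for keyword assignment.
KEYWORDS = ["vaccine passport", "covid passport", "vaccine certificate"]

# Hypothetical subreddit names standing in for three of the 18 communities.
SUBREDDITS = ["canada", "newzealand", "india"]


def to_epoch(year: int, month: int, day: int) -> int:
    """Convert a UTC calendar date to a Unix timestamp in seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_query_urls(subreddits, keywords, after, before, size=100):
    """Build one search URL per (subreddit, keyword) pair."""
    urls = []
    for sub in subreddits:
        for kw in keywords:
            params = {
                "q": f'"{kw}"',      # quoted phrase search
                "subreddit": sub,
                "after": after,      # Unix timestamp, start of the window
                "before": before,    # Unix timestamp, end of the window
                "size": size,        # results per request
            }
            urls.append(f"{PUSHSHIFT_COMMENTS}?{urlencode(params)}")
    return urls


def matched_keyword(body: str):
    """Hypothetical helper: first search phrase contained in a comment, else None."""
    text = body.lower()
    for kw in KEYWORDS:
        if kw in text:
            return kw
    return None


# Study window: 1 January 2021 to the end of February 2022.
urls = build_query_urls(SUBREDDITS, KEYWORDS,
                        after=to_epoch(2021, 1, 1),
                        before=to_epoch(2022, 3, 1))
print(len(urls))   # 3 subreddits x 3 keywords
print(urls[0])
print(matched_keyword("Is a COVID passport really needed?"))
```

Because the same comment can contain more than one phrase, a fixed priority order like the one in `KEYWORDS` is one way to keep the per-keyword counts from double-counting; whether the original data set deduplicated in this way is not stated.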

We also used daily COVID-19 case statistics for these countries, based on the date and new-case fields, obtained from the Our World in Data (OWID)² COVID-19 data set. OWID is a scientific online publication that focuses on large global problems such as poverty, disease, hunger, climate change, war, existential risks and inequality.

2.2. Data pre-processing

Several pre-processing steps were applied to the raw data. First, the body column of the Reddit comments was converted to string type, and unnecessary columns and rows with missing values (NaN and NaT) were dropped. Then, all comments were converted to lowercase, and non-alphabetic characters, URLs, hyperlinks, emojis,³ special characters, excess new lines and references to other users were removed using regular expressions. The texts were then lemmatised using spaCy, and stop words were removed using Gensim.⁴ After the data were cleaned and structured, we used Python NLP tools for sentiment analysis and non-negative matrix factorisation (NMF) topic modelling.
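The regular-expression cleaning described above could look like the following sketch. The patterns are assumptions rather than the authors’ exact expressions, and the tiny stop-word set is a hypothetical stand-in for Gensim’s list; spaCy lemmatisation is omitted so that the example runs with the standard library alone.

```python
import re

# Illustrative patterns; the article's exact regular expressions are not given.
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # markdown link -> keep anchor text
URL_RE = re.compile(r"https?://\S+|www\.\S+")      # bare URLs and hyperlinks
USER_RE = re.compile(r"(?<!\w)/?u/[\w-]+")         # Reddit user references (u/name)
NON_ALPHA_RE = re.compile(r"[^a-z\s]")             # digits, punctuation, emojis, symbols
                                                   # (accented letters are removed too)
WS_RE = re.compile(r"\s+")                         # runs of spaces and new lines

# Hypothetical stand-in for Gensim's stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "it", "my", "for", "in"}


def clean_comment(body) -> str:
    """Lowercase a comment and strip links, user references and non-letters."""
    text = str(body).lower()          # body column cast to string
    text = MD_LINK_RE.sub(r"\1", text)
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def tokens(body) -> list:
    """Split a cleaned comment into tokens and drop stop words.
    (The article lemmatised with spaCy before this step; omitted here.)"""
    return [w for w in clean_comment(body).split() if w not in STOP_WORDS]


sample = "Check https://example.com, u/someone: my VACCINE passport is ready!! 💉\n\n[link](http://x.y)"
print(clean_comment(sample))
print(tokens(sample))
```

Markdown links are unwrapped before bare URLs are removed so that the visible anchor text survives while the target address is discarded.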

2.3. Data overview

Between January 2021 and February 2022, 11,615 comments were collected from the 18 subreddits; after pre-processing, 11,168 valid comments remained. Table 1 shows descriptive statistics for each subreddit. The subreddits are grouped into upper-income, upper-middle-income and lower-middle-income countries, in that order.

Table 1.

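Descriptive statistics for extracted comments: number of comments per subreddit matching each search keyword.

Subreddit	Income group	‘vaccine passport’	‘covid passport’	‘vaccine certificate’	Total	Share (%)
Canada	Upper-income	4246	1215	184	5645	50
New Zealand	Upper-income	1063	358	370	1791	16
UK	Upper-income	736	359	80	1175	10.52
Australia	Upper-income	327	156	85	568	5.14
Germany	Upper-income	93	92	146	331	2.9
CoronavirusUS	Upper-income	152	48	9	209	1.8
Norway	Upper-income	28	33	94	155	1.4
China	Upper-middle-income	19	62	6	87	0.77
Lebanon	Upper-middle-income	10	8	11	29	0.26
Malaysia	Upper-middle-income	79	52	69	200	1.8
Russia	Upper-middle-income	6	18	9	33	0.29
Thailand	Upper-middle-income	111	90	92	293	2.6
Turkey	Upper-middle-income	13	14	5	32	0.28
India	Lower-middle-income	437	35	286	358	3.2
Pakistan	Lower-middle-income	8	15	13	36	0.32
Philippines	Lower-middle-income	65	56	64	185	1.65
Ukraine	Lower-middle-income	13	24	14	51	0.45
Vietnam	Lower-middle-income	28	20	22	70	0.62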


2.4. Topic modelling

Topic models are built on the simple assumption that documents are mixtures of topics, with each topic representing a probability distribution over words. The major goal of topic models is to find the principal themes in a corpus of documents that are supposed to be thematically comparable, cohesive and self-contained [19]. The most prominent topic models are probabilistic graphical models, in which topics are represented as distributions over words and documents as distributions over topics; LDA [20] is a well-known example. NMF, a topic mining method that departs from this probabilistic view, takes a geometric perspective: it models the underlying components as coordinate axes, with each document corresponding to a unique point in the latent linear space [21].

Lee and Seung [22] proposed NMF, a subspace approach named for its non-negative, purely additive (non-subtractive) representation, which has been used in dimensionality reduction and factor analysis. NMF is a statistical model that factorises a non-negative input matrix into two non-negative factor matrices. The non-negativity constraint reflects the natural representation of data in many application fields. Low-rank NMF not only lets users work with reduced-dimensional models but also improves the efficiency of statistical classification, clustering and data organisation in general. As a commonly used dimensionality reduction technique, standard NMF performs well for text clustering and topic modelling. Because the squared ℓ2 (Frobenius) loss is not robust to outliers and noise, robust NMF variants have been proposed, for example using robust error functions or half-quadratic minimisation.

Applying NMF to an input term-document matrix yields a factorisation that can be interpreted as a topic model. More specifically, the term-document matrix $M \in \mathbb{R}_{\geq 0}^{m \times n}$, with $m$ terms and $n$ documents, is approximated by two lower-dimensional non-negative matrices: a term-topic matrix $U \in \mathbb{R}_{\geq 0}^{m \times k}$ and a topic-document matrix $V \in \mathbb{R}_{\geq 0}^{k \times n}$. Each column of $U$ can be interpreted as a topic, and each column of $V$ as a compact embedding of a document in the latent topic space.

We write the matrix $M$ in terms of the matrices $U$ and $V$ according to equation (1):

$$M \approx UV \qquad (1)$$

Each data point, represented by a column of $M$, is approximated by an additive combination of non-negative basis vectors, namely the columns of $U$. Because the purpose of dimensionality reduction is to find a compact representation, $k$ must satisfy $k < m$ and $k < n$.
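Written column by column, equation (1) makes the additive interpretation explicit: the $j$th document vector $\mathbf{m}_j$ is approximated by a non-negative weighted sum of the topic vectors $\mathbf{u}_1, \dots, \mathbf{u}_k$ (the columns of $U$), with weights taken from the $j$th column $\mathbf{v}_j$ of $V$. The column notation is introduced here for illustration only.

```latex
\mathbf{m}_j \approx U\mathbf{v}_j = \sum_{l=1}^{k} V_{lj}\,\mathbf{u}_l,
\qquad V_{lj} \geq 0, \quad j = 1, \dots, n
```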

NMF is an unsupervised machine learning algorithm. Quantifying distance is at the heart of unsupervised learning; for NMF, this means measuring how far the product $UV$ is from $M$. Generic choices include the Frobenius norm and the Kullback–Leibler divergence. This article uses NMF based on the Frobenius norm, whose objective is given in equation (2):

$$\min_{U \geq 0,\, V \geq 0} f(U, V) = \lVert M - UV \rVert_F^2 \qquad (2)$$
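To make equation (2) concrete, the sketch below builds a raw-count term-document matrix and minimises the Frobenius objective with the multiplicative update rules of Lee and Seung. It is an illustrative NumPy implementation, not the code used in this article; the toy corpus, the use of raw counts rather than TF-IDF weights and the parameter values are assumptions. In practice, a library implementation such as scikit-learn’s `NMF` with `beta_loss="frobenius"` would normally be used.

```python
import numpy as np


def term_document_matrix(token_lists):
    """Raw-count term-document matrix M (m terms x n documents) and its vocabulary."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    M = np.zeros((len(vocab), len(token_lists)))
    for j, toks in enumerate(token_lists):
        for tok in toks:
            M[index[tok], j] += 1
    return M, vocab


def nmf_frobenius(M, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate M ~ U @ V with U, V >= 0 by minimising ||M - UV||_F^2
    using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, k))   # term-topic matrix
    V = rng.random((k, n))   # topic-document matrix
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms, so U and V
        # stay non-negative; eps guards against division by zero.
        V *= (U.T @ M) / (U.T @ U @ V + eps)
        U *= (M @ V.T) / (U @ V @ V.T + eps)
    return U, V


def frobenius_loss(M, U, V):
    """Objective of equation (2)."""
    return float(np.linalg.norm(M - U @ V, "fro") ** 2)


# Hypothetical toy corpus of cleaned comments (one token list per comment).
docs = [
    ["vaccine", "passport", "mandate", "vaccine"],
    ["passport", "mandate", "freedom"],
    ["travel", "flight", "sinovac"],
    ["travel", "hotel", "flight"],
]
M, vocab = term_document_matrix(docs)
U0, V0 = nmf_frobenius(M, k=2, n_iter=0)    # random initialisation only
U, V = nmf_frobenius(M, k=2, n_iter=300)    # same seed, after 300 updates
print(frobenius_loss(M, U0, V0), frobenius_loss(M, U, V))
```

With $k = 2$ here ($k < m = 8$ and $k < n = 4$), the two columns of $U$ would be expected to separate the vaccine and mandate terms from the travel terms, although the exact weights depend on the random initialisation.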

2.5. Sentiment analysis

Sentiment analysis is an NLP task that identifies user sentiment expressed online in reviews, comments and other types of content. Social media platforms such as Facebook, Twitter and Reddit are now frequently used to share user evaluations on a variety of topics, including movies, news, food, fashion, politics and COVID-19. Such reviews and opinions are important for determining users’ level of satisfaction with a given business, and their polarity (positive, negative or neutral) can be determined from the text. Models of sentiment analysis focus on polarity (positive, negative and neutral), on feelings and emotions (angry, happy, sad and so on) and even on intentions (e.g. interested vs not interested). When finer polarity distinctions matter, the categories can be expanded to a fine-grained, five-level scale; in this article, we used coarse-grained (three-level) polarity categories.

TextBlob [23] is a Python package for processing textual data. It offers a simple API for standard NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification and translation. TextBlob’s sentiment analysis yields two scores. Subjectivity, in the range [0, 1], measures how subjective the text is; higher values indicate more subjective expressions. Polarity, in the range [−1, 1], measures how positive or negative the text is: 0 represents neutral, values above 0 indicate a positive interpretation and values below 0 a negative one. In this article, the texts were labelled as positive, negative or neutral using the polarity value. Across all comments, positive comments outnumber both neutral and negative ones, suggesting that users are generally positive about COVID-19 VPs.
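The three-way labelling rule can be sketched as follows, assuming, as described above, that a polarity of exactly 0 is neutral and any positive or negative value gives the corresponding label. `label_from_polarity` and `score_comment` are hypothetical helper names, and the TextBlob import is guarded so that the labelling logic runs even where TextBlob is not installed.

```python
from collections import Counter

try:
    from textblob import TextBlob  # pip install textblob
except ImportError:                # keep the labelling rule usable without TextBlob
    TextBlob = None


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a coarse three-level label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def score_comment(text: str):
    """Return (polarity, subjectivity, label) from TextBlob's default analyser."""
    if TextBlob is None:
        raise RuntimeError("TextBlob is not installed")
    sentiment = TextBlob(text).sentiment   # namedtuple: polarity, subjectivity
    return sentiment.polarity, sentiment.subjectivity, label_from_polarity(sentiment.polarity)


# Hypothetical polarity scores for four comments.
scores = [0.2, 0.0, -0.4, 0.6]
label_counts = Counter(label_from_polarity(p) for p in scores)
print(label_counts)
```

Counting the labels over the whole corpus in this way is what supports a statement such as “positive comments outnumber neutral and negative ones”.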

3. Results

All analyses were performed using Python (version 3.7). Our code can be found on GitHub at https://github.com/meyurtsever/public-sentiment-over-vaccine-passports.

3.1. Sentiment-based results

One goal of this article is to build a high-performing classifier that accurately identifies the sentiment of Reddit comments about COVID-19 VPs. After labelling the data set, we explored traditional machine learning techniques, deep learning models and contextualised word embedding models for sentiment analysis. Table 2 shows the performance results of the different methods. The contextualised word embedding models BERT and DistilBERT performed best of all the models (accuracy 0.94, F-measure 0.95), with BERT achieving the highest recall; we attribute this to such models taking context into account when representing words.

Table 2.

Performance results of different methods for sentiment analysis.

Methods	Accuracy	F-measure	Precision	Recall
SVM	0.84	0.82	0.83	0.81
CNN (1D)	0.89	0.85	0.87	0.82
MLP	0.80	0.76	0.81	0.71
LSTM	0.88	0.86	0.87	0.86
BERT	0.94	0.95	0.95	0.96
mBERT	0.94	0.94	0.93	0.93
RoBERTa	0.90	0.89	0.89	0.89
DistilBERT	0.94	0.95	0.95	0.94
GPT-2	0.85	0.84	0.86	0.83
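For reference, the four metrics in Table 2 can be computed as in the sketch below. The averaging scheme is not stated in this article, so macro averaging over the three sentiment classes is an assumption here, and the labels in the usage example are hypothetical. scikit-learn’s `precision_recall_fscore_support(..., average="macro")` should give the same per-class-averaged values.

```python
def classification_metrics(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Accuracy plus macro-averaged precision, recall and F-measure."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f_scores = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f)
    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f_measure": sum(f_scores) / k,   # mean of per-class F-measures
    }


# Hypothetical gold (TextBlob-derived) labels and model predictions.
y_true = ["positive", "positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "neutral", "positive", "positive"]
print(classification_metrics(y_true, y_pred))
```

Because the macro F-measure averages per-class F-scores, it need not equal the harmonic mean of the averaged precision and recall, which is one reason the F-measure column in a table like Table 2 can differ slightly from a value recomputed from its precision and recall columns.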


Another research question (RQ4) asks which countries show stronger positive sentiment in VP comments. Figure 2 shows the number of positive and negative comments for each country. Canada has more positive comments than any other country, although it also contributes about half of all comments (Table 1). We also examined how polarity and subjectivity change over time for each country (RQ5 and RQ6); Figure 3(a) and (b) shows these changes, respectively. Both indicate that comments became more subjective and more strongly positive or negative after July 2021.
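The monthly polarity and subjectivity curves amount to grouping scored comments by country and calendar month and averaging. A minimal standard-library sketch follows; it assumes that each comment carries a Unix `created_utc` timestamp (as Pushshift records do) and already has TextBlob scores, and the example rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_key(created_utc: int) -> str:
    """Format a Unix timestamp (seconds, UTC) as 'YYYY-MM'."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")


def monthly_means(rows):
    """rows: iterable of (country, created_utc, polarity, subjectivity).
    Returns {(country, 'YYYY-MM'): (mean polarity, mean subjectivity, n comments)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for country, ts, pol, subj in rows:
        acc = sums[(country, month_key(ts))]
        acc[0] += pol
        acc[1] += subj
        acc[2] += 1
    return {key: (p / n, s / n, n) for key, (p, s, n) in sorted(sums.items())}


# Hypothetical scored comments (1625097600 = 2021-07-01 00:00 UTC).
rows = [
    ("Canada", 1625097600, 0.2, 0.5),
    ("Canada", 1625097600 + 86400, 0.4, 0.7),
    ("Canada", 1627776000, -0.1, 0.3),   # 2021-08-01
    ("India", 1625097600, 0.0, 0.0),
]
for key, value in monthly_means(rows).items():
    print(key, value)
```

Keeping the comment count next to each mean makes it easier to spot months where a country’s average rests on only a handful of comments.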

Figure 3. Sentiment-based analysis over time: (a) polarity of comments over time and (b) subjectivity of comments over time.

3.2. Data visualisation

Data visualisation (charts, infographics, etc.) is an effective way to convey significant information from a data set. We used Wordcloud,⁵ a Python library, for text-based visualisation. A word cloud displays words at different sizes, where the size of each word indicates its frequency or importance: the larger the word, the more often it is repeated. The word clouds for different countries are shown in Figure 4(a)–(c).
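A word cloud needs only term frequencies. The sketch below counts tokens with the standard library and, if the `wordcloud` package is installed, renders them with `WordCloud.generate_from_frequencies`; the canvas size, background colour and example tokens are illustrative assumptions rather than the settings used for Figure 4.

```python
from collections import Counter

try:
    from wordcloud import WordCloud  # pip install wordcloud
except ImportError:                  # frequencies can still be computed without it
    WordCloud = None


def word_frequencies(token_lists, top_n=None):
    """Token counts across all comments, most frequent first (top_n=None keeps all)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return dict(counts.most_common(top_n))


def render_cloud(freqs, path):
    """Draw a word cloud whose word sizes reflect frequency and save it as an image."""
    if WordCloud is None:
        raise RuntimeError("wordcloud is not installed")
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(path)


# Hypothetical cleaned comments from one country subreddit.
docs = [["vaccine", "passport", "travel"], ["vaccine", "sinovac"], ["vaccine", "travel"]]
freqs = word_frequencies(docs)
print(freqs)
```

Passing precomputed frequencies, rather than raw text, lets the cloud reuse exactly the same cleaned and stop-word-filtered tokens as the topic model.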

Figure 4. Visualisation-based analysis: (a) word cloud for India, (b) word cloud for Thailand and (c) word cloud for Canada.

3.3. Correlation-based analysis

Pearson correlation was used to measure the relationship between the number of Reddit comments and the number of new COVID-19 cases. In general, the number of comments on daily update posts follows the same pattern over time as the number of new COVID-19 cases and is positively correlated with it in all countries. Figure 5(a) and (b) illustrates this for New Zealand, showing the case and comment counts over time and the best-fit line (R = 0.30436, a weak-to-moderate positive correlation).
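The correlation step can be sketched as follows: count comments per UTC day, align the counts with OWID’s daily `new_cases` values on the same dates (treating days without comments as zero, which is an assumption), and compute Pearson’s r. The case and comment figures below are hypothetical, and `scipy.stats.pearsonr` could replace the hand-written `pearson_r` if a p-value is also needed.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def daily_comment_counts(created_utcs):
    """Number of comments per UTC calendar day, keyed 'YYYY-MM-DD'."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        for ts in created_utcs
    )


def align(new_cases_by_date, comments_by_date):
    """Pair daily new cases with comment counts on the same dates.
    Days with no comments are counted as zero (an assumption)."""
    dates = sorted(new_cases_by_date)
    cases = [new_cases_by_date[d] for d in dates]
    comments = [comments_by_date.get(d, 0) for d in dates]
    return cases, comments


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical OWID-style new_cases values and comment timestamps.
DAY = 86400
start = 1635724800  # 2021-11-01 00:00 UTC
new_cases = {"2021-11-01": 100, "2021-11-02": 150, "2021-11-03": 200, "2021-11-04": 120}
stamps = [start, start + DAY, start + DAY + 60,
          start + 2 * DAY, start + 2 * DAY + 60, start + 2 * DAY + 120,
          start + 3 * DAY]
cases, comments = align(new_cases, daily_comment_counts(stamps))
print(cases, comments, round(pearson_r(cases, comments), 3))
```

Because r only measures linear association, a value around 0.3, as reported for New Zealand, indicates that case counts explain only a small share (about 9%, since r² ≈ 0.09) of the day-to-day variation in comment volume.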

Figure 5. Correlation-based analysis: (a) correlation between COVID-19 case counts and comment counts for New Zealand and (b) best-fit line.

3.4. Polarity-based comparison

Across all country groups (developed, middle-income and less developed countries), the NMF model extracts 10 latent topics separately from positive, negative and neutral comments. Among the positive comments, topics 1, 3, 4 and 5 concern society’s view of the COVID-19 vaccine. Keywords in these topics suggest that debates over individual rights and freedoms, set against compulsory vaccination to build immunity and combat a disease that also affects vulnerable groups, may bring anti-vaccine opposition to the fore. Users also discuss whether mandatory vaccination violates fundamental rights and freedoms and whether the vaccine is necessary at all. Under the herd immunity topic, some users claim that natural immunity provides stronger protection. We drew these inferences from major keywords such as vaccinated, unvaccinated, spread, herd, immunity, freedom, rights, charter, law, anti, vax and vaxxer under the related topics. Topics 2 and 6 concern epidemic measures in Canada. The remaining topics cover risk status monitoring, travel restrictions, vaccination rates and social distancing, and include keywords related to preventing disease and preventing the spread of contagion. The topics extracted from positive comments are given in Figure 6(a).

In negative comments, the topics mention COVID-19 in society, anti-vax people and vaccine rejection. Two positions emerge here: first, that opposing the vaccine is wrong, and second, that forcing vaccination causes problems such as mandates, restrictions of freedom and contempt for anti-vaxxers. Other frequent topics in negative comments are contamination and international travel, vaccine cards and their application. In particular, people who were vaccinated only because a vaccine card was required for international travel post negative comments. Imposing sanctions on unvaccinated people, who are seen as spreading infection, is also discussed under these topics. For example, one negative comment states, ‘I think government benefits should be withdrawn from un-vaccinated people who can’t provide a medical certificate (similar to how un-vaccinated children don’t receive the childcare benefit)’.
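Splitting comments by sentiment label before topic modelling, and reading each topic’s keywords off the term-topic matrix $U$, can be sketched as below. The helper names and the small weight matrix are hypothetical; in practice, $U$ would come from an NMF fit on each labelled subset of comments.

```python
import numpy as np


def split_by_label(comments):
    """Group (text, label) pairs into one corpus per sentiment label."""
    corpora = {}
    for text, label in comments:
        corpora.setdefault(label, []).append(text)
    return corpora


def top_keywords(U, vocab, n_top=5):
    """For each topic (column of the term-topic matrix U), the n_top highest-weighted terms."""
    topics = []
    for l in range(U.shape[1]):
        order = np.argsort(U[:, l])[::-1][:n_top]   # indices by descending weight
        topics.append([vocab[i] for i in order])
    return topics


# Hypothetical labelled comments and a hypothetical 4-term x 2-topic matrix.
corpora = split_by_label([("mandates violate rights", "negative"),
                          ("passport makes travel easier", "positive"),
                          ("happy to show my certificate", "positive")])
U_example = np.array([[0.9, 0.0],
                      [0.7, 0.1],
                      [0.0, 0.8],
                      [0.1, 0.6]])
vocab = ["vaccine", "mandate", "travel", "sinovac"]
print(sorted(corpora))
print(top_keywords(U_example, vocab, n_top=2))
```

Fitting a separate model per label, rather than one model on all comments, keeps topics that dominate one polarity from absorbing the smaller themes of another.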

Figure 6. Topic extraction: (a) topics for positive comments, (b) topics from comments of developed countries and (c) topics in March 2021.

Neutral comments mostly concern the vaccine card; other neutral comments address the QR code, COVID-19 in society, the necessity of vaccines, opposition to the vaccine and business life.

3.5. Comparison across countries

In this section, we compare countries grouped by development level. For developed countries, the extracted topics are COVID-19 in society, VPs, international travel, government plans, the Canadian government’s pandemic precautions, the vaccine check strategy, indoor rules, vaccine doses, anti-vaccine sentiment and the COVID-19 truck blockade in Canada. These topics reflect the strategies that developed countries implemented in the fight against the virus and their citizens’ attitudes towards those strategies. The extracted topics for developed countries are given in Figure 6(b). In developing countries, the topics are mostly about travel; Sinovac, dose, vaccine, certificate, flight, hotel, tourist and EU are among the most frequent topic words. In particular, travel restrictions within and between countries, and which vaccines are accepted under these restrictions, are the most frequently mentioned subjects. Several comments concern the travel restrictions faced by people who received the Sinovac vaccine; for example, one comment states, ‘Saudi still won’t accept Sinovac vaccinated person for now’. Because many people in developing countries received the Sinovac vaccine, they faced travel restrictions, and their vaccine cards were not recognised by some destinations. For less developed countries, the topics are vaccine cards, VPs, travel, government policies, COVID-19 tests, vaccine administration and herd immunity. The main concerns in these comments are vaccine apprehension, insufficient doses and a low vaccination rate compared with developed countries. Commenters emphasise their governments’ inability to vaccinate and the resulting sharp increase in case numbers. Some commenters, affected by the negative consequences of COVID-19, express a wish to emigrate to developed countries.

3.6. Monthly analysis

We examined the topics discussed in the 13-month period from February 2021 to February 2022. In February 2021, half of the extracted topics concerned vaccines, with VPs, vaccine status follow-up and vaccine obligations as the main topics, likely because February 2021 coincided with the intensification of VP discussions. In March 2021, as shown in Figure 6(c), the topics related to the Oxford-AstraZeneca vaccine, because this vaccine had been declared safe at that time. Countries’ vaccine policies, health information technologies, survival, vaccine cards and the COVID-19/flu epidemic were also among the most frequently discussed topics, as were quarantine and the economy. In April 2021, the focus was on vaccine certificates, vaccine companies, vaccines for the young population, cold/COVID-19 confusion, deaths and infectious diseases. In May 2021, as people began to make travel plans and restrictions were relaxed slightly, vaccines, travel, VPs, vaccination rates, vaccine information, anti-vaccine views and travel conditions emerged as the most frequently discussed topics. The June 2021 topics can be summarised as travel, the relationship between travel and contagion, and preventive measures that governments should take, such as vaccine card/certificate applications, the vaccination status of the population, vaccine follow-up applications and polymerase chain reaction (PCR) testing at country entry points. The topics discussed in July 2021 were vaccines, VPs/certificates, testing, travel to EU countries and travel-related registration. The topics in August 2021 were public health, risk, workplaces, mandatory vaccination, anti-vaccine views and personal rights.

In September 2021, the topics concerned anti-vaccine activity in the Canadian province of Quebec, including the actions of anti-vax protesters and related events reported in the newspapers. In October 2021, the topics related to Israel’s vaccination policy: Israel required a booster dose for people to be deemed fully immunised, and more than a million people’s VPs could soon be revoked if they did not receive a booster or provide proof of current COVID-19 immunisation. November 2021 again saw comments on VPs/certificates, vaccine obligations and the vaccines accepted by different countries. In December 2021, comments frequently addressed the lockdown period, masks, vaccines and COVID-19 tests. In January 2022, unlike in other months, vaccine expenditure was one of the most frequently discussed topics. From the February 2022 comments, the extracted topics were COVID-19, prohibitions, vaccine obligations, booster doses, vaccine effects, protection, removal of restrictions and contagion.

4. Discussion

4.1. Principal findings

Using NMF topic modelling, correlation analysis and sentiment polarity analysis, this study examined topic features and sentiment distribution for distinct communities across multiple dimensions. We identified 10 key themes, of which the travel and vaccine card categories accounted for the largest proportion. Sentiment varied across topics, suggesting that user attitudes and circumstances differed.

4.2. Implications

As COVID-19 vaccination expanded worldwide, many countries considered implementing proof-of-vaccination documents such as passports. As a public health measure, these documents were intended to help reopen the economy and lift other public health restrictions, including masking and physical distancing.

As, to our knowledge, the first characterisation of COVID-19 vaccination passport discussions on Reddit, our analysis has three important implications. First, according to our sentiment analysis and topic modelling of Reddit data, many people support VPs, viewing them as assurance of a low risk of infection from others, while some people have a negative attitude towards them, viewing these tools as restricting their freedom. Second, topic modelling found that community members raised a variety of concerns related to their socioeconomic status. Third, by obtaining a more detailed picture of online debates such as the one our findings provide, social media platforms and policymakers could devise more successful strategies for gaining public acceptance of and trust in vaccination passports.

Sentiment analysis (i.e. opinion mining) is an NLP technique that aims to identify and classify the sentiment (positive, negative or neutral) expressed in a text. Because it allows people's comments on an important issue to be identified and analysed automatically, it is especially helpful for decision-makers and authorities. We observed that government authorities and policies affect community members' vaccination decisions. However, many people still expressed negative views, regardless of rules and pressures. Overall, we found that people feel a strong duty to protect themselves and others, but some question the imperative to be vaccinated and are concerned about COVID-19 vaccines because their long-term safety has not yet been established.
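As an illustration of the labelling and aggregation step, the Python sketch below maps polarity scores to positive, negative or neutral labels and counts them per month. It assumes scores on TextBlob's polarity scale, a float in [-1.0, 1.0] such as `TextBlob(text).sentiment.polarity` returns; the zero threshold, the optional neutral band, the sample months and scores, and the function names are hypothetical and are not taken from this study.

```python
from collections import Counter, defaultdict


def label(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    Scores strictly above +neutral_band are positive, strictly below
    -neutral_band are negative, and everything else is neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def monthly_distribution(comments, neutral_band=0.0):
    """Count sentiment labels per month.

    comments: iterable of (month, polarity) pairs, with month as 'YYYY-MM'.
    Returns a dict mapping each month (in sorted order) to a Counter of labels.
    """
    by_month = defaultdict(Counter)
    for month, polarity in comments:
        by_month[month][label(polarity, neutral_band)] += 1
    return dict(sorted(by_month.items()))


# Hypothetical (month, polarity) pairs; real scores would come from a sentiment scorer.
comments = [
    ("2021-09", 0.35), ("2021-09", -0.20), ("2021-09", 0.0),
    ("2021-10", 0.50), ("2021-10", 0.10), ("2021-10", -0.60),
]

dist = monthly_distribution(comments)
for month, counts in dist.items():
    print(month, dict(counts))
```

Widening `neutral_band` (for example to 0.05) would treat weakly polarised comments as neutral; where to draw that line is a design choice that changes the reported positive/negative balance, so it should be stated alongside any results.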

4.3. Limitations and future directions

Although our results are grounded in the analysis, the study has several limitations that may motivate further research. Our conclusions depend on the accuracy of the data gathered using our search phrases. We searched Reddit for posts about the COVID-19 VP and found text fragments typical of VP perceptions, so we are confident in the accuracy of our data. Nevertheless, any use of Reddit data has drawbacks and limits. The demographic mix of Reddit users is unknown because no personal information is collected. Some evidence suggests that Reddit users are more likely to be male and younger than the general population, so our findings should be evaluated in light of the data's likely gender and age skew.

Our present analysis may have overlooked some viewpoints or concerns underlying the comments. Future work should therefore extend the data to other social media platforms to gain a more complete view of the conceptual features of the COVID-19 VP discussion.

Although Reddit user interactions (i.e. upvotes and downvotes) may help us better understand trending topics, we did not include them in this work; this can be addressed in future work. While this study offers some insight into popular opinion, more research is needed on how to reach populations opposed to the COVID-19 VP and how to counter false news.

5. Conclusion

Topic mining and sentiment polarity analysis together can adequately represent the topics and attitudes of Reddit users regarding VPs. This study offers new perspectives on the key global health informatics issues related to the implementation of vaccination passports. Overall, our findings show that social media data can help explain pandemic-related issues and sentiments at the national level, supporting more tailored and publicly acceptable policies.

^1. https://www.worldometers.info/coronavirus/coronavirus-death-toll/

^2. https://ourworldindata.org/

^3. https://pypi.org/project/emoji/

^4. https://pypi.org/project/gensim/

^5. https://pypi.org/project/wordcloud/

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD: Süleyman Eken https://orcid.org/0000-0001-9488-908X

Contributor Information

Muhammet Mücahit Enes Yurtsever, Department of Information Systems Engineering, Kocaeli University, Turkey.

Muhammad Shiraz, Department of Information Systems Engineering, Kocaeli University, Turkey.

Ekin Ekinci, Department of Computer Engineering, Sakarya University of Applied Sciences, Turkey.

Süleyman Eken, Department of Information Systems Engineering, Kocaeli University, Turkey.

References

[1]. Sharif A, Botlero R, Hoque N, et al. A pragmatic approach to COVID-19 vaccine passport. BMJ Glob Health 2021; 6(10): e006956.
[2]. Goel RK, Jones JR. Managing the risk of COVID-19 via vaccine passports: modeling economic and policy implications. Manag Decis Econ 2022; 43: 2578–2586.
[3]. Hall MA, Studdert DM. ‘Vaccine passport’ certification—policy and ethical considerations. New Engl J Med 2021; 385(11): e32.
[4]. Wilford SH, McBride N, Brooks L, et al. The digital network of networks: regulatory risk and policy challenges of vaccine passports. Eur J Risk Regul 2021; 12(2): 393–403.
[5]. Pavli A, Maltezou HC. COVID-19 vaccine passport for a safe resumption of travel. J Travel Med 2021; 18: 8800.
[6]. Khan ML, Malik A, Ruhi U, et al. Conflicting attitudes: analyzing social media data to understand the early discourse on COVID-19 passports. Technol Soc 2022; 68: 101830.
[7]. Melton CA, Olusanya OA, Ammar N, et al. Public sentiment analysis and topic modeling regarding COVID-19 vaccines on the Reddit social media platform: a call to action for strengthening vaccine confidence. J Infect Publ Health 2021; 14(10): 1505–1512.
[8]. Gulati S. Decoding the global trend of ‘vaccine tourism’ through public sentiments and emotions: does it get a nod on Twitter? Glob Knowl Mem Commun 2022; 71: 899–915.
[9]. Kumar N, Corpus I, Hans M, et al. COVID-19 vaccine perceptions: an observational study on Reddit, 2021, https://www.medrxiv.org/content/10.1101/2021.04.09.21255229v1.full.pdf
[10]. Wu W, Lyu H, Luo J. Characterizing discourse about COVID-19 vaccines: a Reddit version of the pandemic story. Health Data Sci 2021; 2021: 9837856.
[11]. Aygün İ, Kaya B, Kaya M. Aspect based Twitter sentiment analysis on vaccination and vaccine types in COVID-19 pandemic with deep learning. IEEE J Biomed Health Inform 2022; 26: 2360–2369.
[12]. Yan C, Law M, Nguyen S, et al. Comparing public sentiment toward COVID-19 vaccines across Canadian cities: analysis of comments on Reddit. J Med Internet Res 2021; 23(9): e32685.
[13]. Lanyi K, Green R, Craig D, et al. COVID-19 vaccine hesitancy: analysing Twitter to identify barriers to vaccination in a low uptake region of the UK. Front Digit Health 2021; 3: 804855.
[14]. Cheng X, Cao Q, Liao SS. An overview of literature on COVID-19, MERS and SARS: using text mining and latent Dirichlet allocation. J Inform Sci 2022; 48(3): 304–320.
[15]. Chamorro-Padial J, Rodrigo-Ginés FJ, Rodríguez-Sánchez R. Finding answers to COVID-19-specific questions: an information retrieval system based on latent keywords and adapted TF-IDF. J Inform Sci 2022; 2022: 1110995.
[16]. Avcı Ü, Yıldız Durak H. Examination of digital citizenship, online information searching strategy and information literacy depending on changing state of experience in using digital technologies during COVID-19 pandemic. J Inform Sci 2022; 2022: 1114455.
[17]. Zhao W, Zhang L, Wang J, et al. How has academia responded to the urgent needs created by COVID-19? A multi-level global, regional and national analysis. J Inform Sci 2022; 2022: 1084646.
[18]. Baumgartner J, Zannettou S, Keegan B, et al. The Pushshift Reddit dataset. In: Proceedings of the international AAAI conference on web and social media, vol. 14, pp. 830–839, https://ojs.aaai.org/index.php/ICWSM/article/view/7347/7201
[19]. Ekinci E, Omurca SI. NET-LDA: a novel topic modeling method based on semantic document similarity. Turk J Electr Eng Comput Sci 2020; 28(4): 2244–2260.
[20]. Ekinci E, Omurca SI. Concept-LDA: incorporating Babelfy into LDA for aspect extraction. J Inform Sci 2020; 46(3): 406–418.
[21]. Chen Y, Zhang H, Liu R, et al. Experimental explorations on short text topic mining between LDA and NMF based schemes. Knowl Based Syst 2019; 163: 1–13.
[22]. Lee DD, Seung HS. Learning the parts of objects by nonnegative matrix factorization. Nature 1999; 401(6755): 788–791.
[23]. Loria S. TextBlob documentation. Release 015 2018; 2: 269.
