F1000Research. 2024 Apr 16;12:1007. Originally published 2023 Aug 21. [Version 4] doi: 10.12688/f1000research.130610.4

Sentiment analysis of Indonesian tweets on COVID-19 and COVID-19 vaccinations

Viskasari Pintoko Kalanjati 1,a, Nurina Hasanatuludhhiyah 1,b, Annette d'Arqom 1, Danial H Arsyi 2, Ancah Caesarina Novi Marchianti 3, Azlin Muhammad 4, Diana Purwitasari 5
PMCID: PMC11007366  PMID: 38605817

Version Changes

Revised. Amendments from Version 3

We have made minor revisions to the introduction part as suggested by the reviewer. We added an explanation of previous studies.

Abstract

Background

Sentiments and opinions regarding COVID-19 and COVID-19 vaccination on Indonesian-language Twitter have rarely been reported in a single comprehensive study; addressing this gap was the aim of our study. We also analyzed fake news and facts, and Twitter engagement, to understand the perceptions and beliefs that determine public health literacy.

Methods

We collected 3,489,367 tweets posted from January 2020 to August 2021. We distinguished factual from fake news using a string comparison method, with the difflib library used to measure similarity. User engagement was analyzed by averaging the engagement metrics of tweets, retweets, favorites, replies, and shared posts that carried sentiments and opinions regarding COVID-19 and COVID-19 vaccination.

Results

Positive sentiments on COVID-19 and COVID-19 vaccination dominated; however, negative sentiments increased at the beginning of the implementation of restrictions on community activities (PPKM). The tweets were dominated by the importance of health protocols (washing hands, keeping distance, and wearing masks). Several vaccine types topped the word count in the vaccine subtopic. Acceptance of vaccination increased during the study period, and fake news was outweighed by facts. The tweets were dynamic: the engaged topics shifted from the nature of COVID-19 to vaccination and virus mutation, which peaked in early and mid-2021. Public sentiment and engagement shifted from hesitancy to anxiety about the safety and effectiveness of the vaccines, and then to wariness about the surge of the delta variant.

Conclusion

Understanding public sentiment and opinion can help policymakers plan the best strategy to cope with the pandemic. Positive sentiments and fact-based opinions on COVID-19 and COVID-19 vaccination predominated. However, whether health literacy levels were sufficient could not yet be determined and warrants further study.

Keywords: COVID-19, COVID-19 vaccination, Tweets, Sentiment analysis, Vaccine, social media

Introduction

Since being declared a global pandemic by the World Health Organization (WHO) in March 2020, COVID-19 has been a foremost issue challenging all aspects of human life worldwide. 1 , 2 At the beginning of the crisis, little was known about the pathogen, its detection, or its management, and the disease steeply increased morbidity and mortality rates, overwhelming the health care systems of many countries, especially during the pre-vaccination period. 3

One of the keys to controlling the increase in morbidity and mortality rates is vaccination, combined with the implementation of WHO-recommended efforts to reduce transmission and with strong epidemiological surveillance. 4 These measures call for public literacy and engagement on COVID-19 and COVID-19 vaccination. The sentiments that represent public opinion and beliefs on these issues can be read on various social media platforms, including Twitter, and might affect public acceptance of the vaccine and the vaccination program. 3 , 4 Although robust evidence supports the efficacy and safety of various types of COVID-19 vaccines, public skepticism concerning vaccine effectiveness and side effects has become a significant obstacle to achieving wide vaccination coverage. 5

Analysis of social media content is a reliable way of mining people's opinions and beliefs, including those toward COVID-19, COVID-19 vaccines, and vaccination, and might therefore help decision-makers develop policies related to COVID-19 and COVID-19 vaccination. 6 The use of social media has been rising drastically, and analysis of public opinion posted on social media can be an effective means of capturing real-time public sentiment. 7 Previous studies have examined public opinion regarding COVID-19 vaccines and vaccinations in various countries. Sentiment analysis of tweets successfully portrayed the public's communication and perception of COVID-19 vaccines across India at the beginning of the vaccine rollout, showing 44.65% positive sentiment among Indian people towards ‘COVID-19 vaccines’. 8 Analysis of global tweets also captured the greater impact of tweets expressing positive sentiment compared with those expressing neutral or negative sentiments. 9

In Indonesia, hesitancy towards the vaccine and low literacy levels affect the acceptance of COVID-19 vaccination or of certain vaccine brands. 10 An early survey of vaccine acceptance among 1359 Indonesian respondents, conducted from the end of March to April 2020, showed an acceptance rate as high as 95% for a free vaccine with a reported efficacy of approximately 95%. However, acceptance dropped to 67% if the reported efficacy of the COVID-19 vaccine was only 50%. 11 The Ministry of Health of the Republic of Indonesia stated that the COVID-19 Symptom Survey, conducted by the University of Maryland Joint Survey Methodology Program in partnership with Facebook, showed that several factors cause doubts about vaccination among Indonesian people, e.g. concern about side effects and about comorbidities that may affect post-vaccination health. 12 Although Indonesia has the fourth-highest number of social network users worldwide, 13 studies using social media data to identify public sentiments and engagement towards COVID-19 and COVID-19 vaccination remain limited; addressing this gap is the aim of the current study.

Method

Ethics

This study received approval from the Health Research Ethics Committee (KEPK), Faculty of Medicine, Universitas Airlangga, Indonesia (approval no. 145/EC/KEPK/FKUA/2021).

Several stages were carried out in this study to identify and analyze public opinion (Figure 1). Preprocessing started with data preparation, which included determining the search terms for COVID-19 and COVID-19 vaccination, then crawling Indonesian-language tweets based on the language meta-tag stored on Twitter ( https://twitter.com/?lang=en-id) during the period from 1 January 2020 to 31 August 2021.

Figure 1. Processing stages for the dataset of Indonesian tweets from 1 January 2020 to 31 August 2021.

The collected data were clustered into categories according to the search terms listed in Table 1. Subsequently, the data were cleaned of several elements such as emoticons, hashtags, non-alphanumeric characters, and URLs. All tweet text was converted to lowercase, and punctuation was then removed. Samples were taken from the cleaned data and manually labeled in three categories, namely positive, neutral, and negative. We developed the sentiment model using the manually labeled data and IndoBERT as the tokenizer. The sentiment model was then applied to label all remaining data (Figure 1).
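The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `clean_tweet`, the regular expressions, the order of the steps, and the choice to drop a hashtag together with its word are all assumptions.

```python
import re

# Assumed patterns; the paper does not give the exact rules it used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Remove URLs and hashtags, lowercase, then drop everything that is
    not a lowercase ASCII letter, digit, or whitespace (this also removes
    emoticons and punctuation)."""
    text = URL_RE.sub(" ", text)       # URLs first, while their punctuation is intact
    text = HASHTAG_RE.sub(" ", text)   # assumption: the whole hashtag is removed
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())      # collapse repeated whitespace
```

For instance, `clean_tweet("Vaksin Sinovac 95% EFEKTIF")` yields a lowercase string with the percent sign removed.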

Table 1. Search terms for COVID-19 and for COVID-19 vaccine and vaccination.

Category | Subcategory | Search terms
COVID-19 | - | COVID-19, corona, corona virus, corona virus, SARS COV-2, COVID, masks, keep your distance, physical distancing, social distancing, washing hands, PPKM, PSBB, lock down, WFH, LFH, online learning, self-isolation, swab, PCR, 3M, 5M, 6M, tracing, comorbid COVID-19.
COVID-19 Vaccine and Vaccination | Vaccine Type | COVID-19 vaccine, Sinovac, Chinese vaccine, Nusantara vaccine, red and white vaccine, biopharmaceutical vaccine, inactivated vaccine, mRNA vaccine, Zeneca Astra vaccine, Pfizer vaccine, Moderna vaccine.
COVID-19 Vaccine and Vaccination | Vaccine Effectiveness | Immunity, prevent COVID transmission, COVID positivity rate, positive COVID, herd immunity, death rate; delta variant, delta strain, variant of concern, prevent severe COVID, prevent ICU needs, prevent MRS needs.
COVID-19 Vaccine and Vaccination | Vaccine Side Effect | KIPI, post-immunization co-occurrence, paralysis, blood clots, blood clots, vaccine death, death after the vaccine, allergy, positive for COVID-19, drowsiness, hunger, sexual disorders, vaccine side effects, vaccine hazard, stroke, Guillain-Barre syndrome, pain, swelling, dizziness, headache, fever, muscle aches, vaccine chip.
COVID-19 Vaccine and Vaccination | Vaccination | National COVID vaccination, national COVID immunization, vaccination acceleration, mass vaccination, health worker vaccination, elderly vaccination, child and adolescent vaccination, third dose vaccination, booster, vaccination stage 1, vaccination stage 2, vaccination stage 3, vaccination of BUMN, vaccination certificate, vaccination fake, self-vaccination, vaccination age 12-17, vaccination age 18-59, cooperation vaccination, paid vaccination, vaccination comorbid.

The data were subjected to exploratory data analysis (EDA) to gain insight into the data, discover patterns, spot anomalies, and check assumptions with the help of statistics and graphical presentations. 14 The resulting data were presented as the distribution of sentiment. Tweet exposure was presented as word clouds and as graphs depicting monthly changes within the captured period. Tweet engagement was measured from the metadata accompanying posts, including the number of retweets, likes, and replies. We also identified the facts and the fake news circulating on Twitter (Figure 1). Python visualization was used for data presentation. 15
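The engagement measurement can be illustrated with a short sketch that averages retweets, likes, and replies per month. The record layout (a dict with `date`, `retweets`, `likes`, and `replies` keys) and the function name are hypothetical; the paper does not describe its data schema or grouping code.

```python
from collections import defaultdict
from statistics import mean

METRICS = ("retweets", "likes", "replies")


def monthly_engagement(tweets):
    """Average each engagement metric per calendar month.

    `tweets` is an iterable of dicts with a 'date' string in YYYY-MM-DD
    form plus integer 'retweets', 'likes', and 'replies' (assumed schema).
    """
    buckets = defaultdict(lambda: {m: [] for m in METRICS})
    for tweet in tweets:
        month = tweet["date"][:7]  # 'YYYY-MM'
        for metric in METRICS:
            buckets[month][metric].append(tweet[metric])
    # Sort by month so the result can be plotted as a time series.
    return {
        month: {metric: mean(values) for metric, values in metrics.items()}
        for month, metrics in sorted(buckets.items())
    }
```

Grouping by the `YYYY-MM` prefix keeps the sketch free of date parsing; real timestamps would likely need normalizing first.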

Determination of the fact and fake news

A string comparison method was used to classify tweets as fact or fake. This method compared each tweet with the list of fake tweets in a dictionary issued by Turn Back Hoax. 16 The difflib library was used to obtain the similarity value between tweets. Labeling a tweet as fake news requires an additional parameter, a threshold on the similarity value that separates tweets considered fake from tweets still considered factual. Similarity values range from 0 to 1, where a value closer to 1 indicates a closer match to an entry in the fake-tweet dictionary. Based on a sample test and observation of the tweet data, the threshold was set at 0.7. Tweets with a similarity value above 0.7 were categorized as fake tweets (Table 2).
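A minimal sketch of this threshold rule is shown below. The paper names the difflib library but not the specific call, so the use of `difflib.SequenceMatcher(...).ratio()`, the lowercasing, and the names `similarity`, `classify_tweet`, and `hoax_entries` are assumptions; only the 0.7 threshold and the "above the threshold means fake" rule come from the text.

```python
from difflib import SequenceMatcher

FAKE_THRESHOLD = 0.7  # threshold reported in the text


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_tweet(tweet: str, hoax_entries: list[str],
                   threshold: float = FAKE_THRESHOLD):
    """Label a tweet by its best match against the hoax dictionary.

    `hoax_entries` stands in for the Turn Back Hoax list (hypothetical
    representation as plain strings).
    """
    best = max((similarity(tweet, entry) for entry in hoax_entries), default=0.0)
    label = "fake news" if best > threshold else "fact"
    return label, best
```

Because the rule matches on surface similarity alone, a tweet that quotes a hoax in order to debunk it can score as high as the hoax itself, so flagged tweets would still need manual review.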

Table 2. Tweets flagged as fake news cited invalid sources that could not be traced to valid references (e.g. official government releases or peer-reviewed scientific articles).

Username | Tweet | Content classification
infocovid19_id | #HoaxBuster [SALAH] 21% Pasien Mengalami Efek Samping Setelah Memakai Vaksin Moderna Selengkapnya: https://t.co/Nh3DxlJIbr (English: [FALSE] 21% of patients experienced side effects after using the Moderna vaccine. Read more) | fake news
cirtbuleleng | Dunia Setujui Vaksin Nusantara https://t.co/iWBlfFBTXM (English: World approves Nusantara vaccine) | fake news

Analysis of public opinion and sentiment

Datasets from public and private accounts that had gone through the preprocessing stage were labeled as positive, negative, or neutral. Tweets on the vaccination program were additionally marked by their pattern of agreement as positive (pro-vaccination), negative (anti-vaccination or doubtful toward vaccination), or neutral, following a previous study. 17 The pro-vaccine category covers tweets with a positive tendency towards vaccination. These tweets show that the public accepts the existence of vaccines and invitations to participate in vaccination, even when giving an opinion about their condition after vaccination, commonly known as an adverse event following immunization (AEFI). The anti-vaccine category covers tweets that reject vaccination, accompanied by arguments against it. The doubt category covers tweets that tend to be confused about the purpose of vaccination or that doubt the effectiveness of certain vaccine brands, such as wanting only Pfizer vaccines and disparaging other brands. Neutral tweets were usually dominated by news accounts and only conveyed facts or narratives without expressing an opinion for or against vaccination. 5 , 6 , 11

A total of 3000 tweets were taken randomly from the dataset and used as training data for the IndoBERT model, which reached an accuracy of 75%. IndoBERT is a self-contained deep learning model for natural language processing (NLP) based on the Transformer architecture. Each output element in this model is connected to every input element, with weights computed dynamically from the relationships between elements. The performance of the IndoBERT model was validated in a previous study, which reported an average accuracy of 92.07%. 18 BERT is designed to help computers resolve ambiguous language in a text by using the surrounding text to establish context. The model was trained on over 220 million words in the Indonesian language. In this study, IndoBERT was employed to tokenize the text before the classification task. After tokenization by IndoBERT, the data proceeded directly to the classifier layer, which discerned patterns in word tendencies and classified tweets into three categories: positive, neutral, and negative. 19
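The random sampling and the accuracy figure can be illustrated with the sketch below. It does not reproduce the IndoBERT tokenizer or the classifier layer, and the paper does not say how the 75% accuracy was measured; the helper names, the fixed seed, and the assumption that accuracy is the share of predictions matching the manual labels are all illustrative.

```python
import random


def draw_sample(items, k=3000, seed=0):
    """Draw up to k items at random without replacement.

    The seed is an assumption, added only to make the sketch reproducible.
    """
    items = list(items)
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the manual labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

With three of four predictions matching the manual labels, `accuracy` returns 0.75, the same proportion as the accuracy reported for the model.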

Results and discussion

A total of 3,489,367 Indonesian tweets were collected from 1 January 2020 to 31 August 2021. The total dataset obtained comprised 324,358 tweets, of which 24,579 were related to COVID-19 and the rest were related to COVID-19 vaccine and vaccination.

Figure 2 shows that tweet counts peaked three times during this period: in June 2020, in January 2021, and again in July 2021. The word cloud of these tweets, shown in the same figure, included terms such as corona virus, health protocol, COVID-19 spread, preventing COVID-19, PPKM (the abbreviation of the Indonesian government program to control COVID-19), hand washing, and physical distancing. Several keywords, i.e. PCR (polymerase chain reaction), antigen swab, and transportation protocols, dominated the tweets.

Figure 2. Distribution of tweets on COVID-19 during 2020-2021. The counts peaked in June 2020, January 2021, and July 2021 (graph). The word cloud on COVID-19 was dominated by keywords about the virus, health protocols, detection methods, and travelling protocols (words).

During 2020, when the vaccine and vaccination programs were not yet known, the trending keywords concerned the COVID-19 virus and preventive steps, i.e. PPKM (pemberlakuan pembatasan kegiatan masyarakat, restrictions on community activities) and working from home (WFH) (Figure 3). The word count graph shows that many Twitter users discussed the COVID-19 virus along with tweets containing education on preventing exposure to the virus. During that period, the discussion still centered on education to suppress the spread of COVID-19. Preventive actions that were still being promoted, such as staying at home, avoiding direct contact with other people, avoiding non-essential travel, social distancing, and frequent hand washing, 20 , 21 remained hot topics among the public. The word count also shows the topics that became popular after PPKM was implemented: once the government had established and implemented the PPKM policy, Twitter users discussed related issues such as working from home. 22–24 PPKM was intended as a response to the increase in COVID-19 cases, so the virus itself was still widely discussed on Twitter. The government's implementation of PPKM levels 1-4, which brought pros and cons to the community, also became a topic of discussion. Even so, this policy is considered effective in suppressing the surge in COVID-19 cases. 22

Figure 3. Trending keywords in the tweets showed that public engagement and exposure during 2020 were predominantly on the virus, corona, COVID-19, PPKM, and working from home (WFH), respectively.

In general, retweets dominated engagement, followed by replies (Figure 4). This supports the evidence that the public was more focused on sharing information, such as the rise or fall of cases, as well as education and preventive measures implemented by the government to suppress the spread of COVID-19. 23

Figure 4. Engagement on COVID-19 was dominated by retweets during 2020-2021, except at the end of 2021 and in July 2021, when replies were higher than retweets and likes.

On the other hand, the count of tweets on COVID-19 and COVID-19 vaccination was highest in July 2021 (Table 3), arguably due to the rise of the SARS-CoV-2 delta variant, which significantly increased morbidity and mortality rates and thus called for more definitive preventive action. 23–25

Table 3. Count of tweets on COVID-19 vaccine and vaccination during 2020-2021. The count was highest in July 2021, followed by the months from the start of 2021 until March 2021.

Month | Count of tweets in 2020 | Count of tweets in 2021
January | 1 | 1392
February | 0 | 1066
March | 2 | 1837
April | 4 | 558
May | 1 | 830
June | 4 | 1347
July | 13 | 13599
August | 73 | 7015
September | 101 | N/A
October | 186 | N/A
November | 216 | N/A
December | 1682 | N/A
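As a quick arithmetic check of Table 3, the monthly counts can be tallied to confirm the peak month. The sketch below only re-enters the values from the table; the variable and function names are illustrative.

```python
# Monthly tweet counts on COVID-19 vaccine and vaccination, copied from Table 3.
# September-December 2021 are N/A in the table and are therefore omitted.
COUNTS = {
    2020: {"January": 1, "February": 0, "March": 2, "April": 4,
           "May": 1, "June": 4, "July": 13, "August": 73,
           "September": 101, "October": 186, "November": 216,
           "December": 1682},
    2021: {"January": 1392, "February": 1066, "March": 1837, "April": 558,
           "May": 830, "June": 1347, "July": 13599, "August": 7015},
}


def peak_month(counts):
    """Return (year, month, count) for the month with the most tweets."""
    return max(
        ((year, month, n)
         for year, months in counts.items()
         for month, n in months.items()),
        key=lambda item: item[2],
    )


def yearly_totals(counts):
    """Sum the monthly counts for each year."""
    return {year: sum(months.values()) for year, months in counts.items()}
```

Here `peak_month(COUNTS)` returns `(2021, "July", 13599)`, in line with the table caption.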

Engagement on the COVID-19 vaccine was topped by likes during 2020-2021, followed by retweets and replies (Figure 5). In Indonesia, a country with one of the largest Muslim populations in the world, vaccine brands that have passed safety and halal tests adapted to local conditions are popular topics of public discussion. 24 For example, the most popular brand in Indonesia is Sinovac, 25 because it was the first vaccine brand to enter Indonesia. 26 The data show that Sinovac continued to dominate, especially at the end of 2020 and in early 2021, when the Sinovac vaccine entered Indonesia. The Moderna vaccine also saw an increase in the number of tweets because, during this period, many of these vaccines were distributed in Indonesia. 27 The Moderna vaccine produces more side effects than other vaccines, 28 and was thus increasingly discussed on Twitter. The AstraZeneca vaccine also carries more frequent side effects than the Sinovac vaccine (Figure 5). 29

Figure 5. Engagement on the COVID-19 vaccine during 2020-2021 was highest in July 2021, led by likes and followed by retweets and replies, respectively (graph). The word cloud of this topic was dominated by vaccine brands (words).

Engagement on the side effects of the COVID-19 vaccine was topped by likes, followed by retweets and replies, and was higher in December 2020, March 2021, and July 2021 than in other months of this period. Engagement on vaccine type was higher in February 2021, May 2021, and July 2021 than in other months (Figure 6).

Figure 6. The engagement ratio on the side effects of the COVID-19 vaccine was highest in June-July 2021, followed by March 2021 and the turn of the year from 2020 to 2021 (top). The bottom chart shows the engagement level for vaccine type in this period, topped by likes and followed by retweets and replies, respectively.

The graphs and word cloud show that, among the recorded vaccine side effect subtopics, the terms most frequently discussed in tweets were the COVID-19 vaccine and its side effects (Figure 7).

Figure 7. Word cloud of the vaccine side effect subtopics, dominated by trending keywords such as vaccine, COVID-19, positive COVID-19, and side effect.

On the other hand, the trending keywords on vaccine effectiveness were topped by disease transmission, management guidelines, and virus variants (e.g. the delta variant), along with the immune system, prevention, and the health system (Figure 8).

Figure 8. Word cloud of the vaccine effectiveness subtopic based on the search terms, dominated by the transmission of COVID-19, the delta variant, management of the disease, and preventive strategy.

Figure 9 shows that paid vs. free vaccination programs became trending keywords, along with the government vaccination program and herd immunity. People were wary of paid vaccination, although in reality the government had run the vaccination program nationally since February 2021, with healthy people aged 18-59 years as the primary target and health providers prioritized, to bring about the herd immunity that was discussed as the ideal condition once wide vaccination coverage was achieved. 23–25

Figure 9. Word cloud of the vaccination subtopic, where paid vaccination, vaccination stage, government vaccination program, and free vaccination became the trending keywords.

Based on the similarity analysis, 15 tweets were flagged as fake tweets with misleading content. One claimed worldwide approval of a dendritic cell-based vaccine candidate developed by a group of Indonesian researchers, which in fact had not even been authorized to enter a phase 2 clinical trial by the Indonesian regulatory authority. There was also content about the affiliation of several vaccine manufacturers with certain companies. Another claimed that up to 21% of vaccine trial participants experienced adverse effects after receiving the Moderna vaccine. Tweets claiming that vaccine manufacturers had been developing COVID-19 vaccines even before the pandemic started were also found to be misleading. Across the entire dataset, there was extremely low activity in spreading fake news with a negative context, such as seeking supporters for fake news through replies.

The sentiment discussed here includes users agreeing with and understanding the conditions of COVID-19. Support for the government's preventive actions also remained a hotly debated topic. In addition, Twitter users continued their activities as usual and provided education on how to prevent exposure to the virus, increasing positive sentiment about the issue. In the second period, positive sentiment centered on Twitter users' responses to the increasing number of COVID-19 cases in Indonesia, accompanied by campaigns from all parties to carry out vaccination aggressively. However, negative sentiment increased in the second period compared with the first. This was when the government began to issue PPKM policies, which drew both support and opposition from the community. PPKM, which has levels 1-4, was arguably considered to harm the community's economy because it limited people's activities at work. Under PPKM levels 3-4, the rules included closing shopping centers at 20.00 (GMT+7) and limiting visitor capacity to a maximum of 50%. 30 Thus, many parties complained about the policy. The monthly sentiment distribution is presented in Figure 10.

Figure 10. Public sentiment about COVID-19 was dominated by positive opinions, followed by neutral and negative sentiments, each at approximately half of the preceding sentiment (graph). The chart shows that positive sentiment outweighed negative sentiment and peaked in June 2020, January 2021, and July 2021.

Sentiment analysis of the public's positive and negative sentiments towards the vaccination program in 2020 shows that positive tweets dominated all sentiments. Analysis of the dataset indicates that tweets generally came from news accounts, and tweets originating from these accounts were classified as neutral. In 2021, although the number of tweets was higher than in the first period, public opinion still tended towards positive sentiment. This might be due to rising public awareness of the importance of vaccination in helping to reduce the spread of the COVID-19 virus. However, negative sentiment remains a significant problem because people expect only certain types of vaccines to work, and/or are still doubtful and do not believe in vaccination programs to tackle the pandemic (Figure 11).

Figure 11. (A) Comparison of vaccination sentiment between 2020 and 2021 (graph). The pie charts show the sentiment on the COVID-19 vaccination program in 2020 (B) and 2021 (C), respectively; in both periods, positive sentiment outweighed negative sentiment by approximately 5 times.

Figure 12 shows that in both 2020 and 2021, positive sentiment dominated at above 50%. In 2021, the proportion of positive sentiment decreased considerably, negative sentiment decreased only slightly, and the percentage of neutral sentiment doubled compared with 2020. Negative sentiment may have arisen because in 2020 vaccine research was still at the development stage, while during the first half of 2021 COVID-19 vaccines were still being rolled out to a limited number of people in Indonesia.

Figure 12. Sentiment on the side effects of the COVID-19 vaccine in 2020 (left pie chart) and 2021 (right pie chart). Positive sentiment was higher than negative or neutral sentiment in both years.

In 2021, various vaccine products had gone through trial stages, and study results on their efficacy and side effects had been released to the public by vaccine manufacturers. With increasingly clear reports on the effect of vaccines in stimulating the body's immunity against the virus, the public had more confidence in the role of vaccines in accelerating recovery from the pandemic. This is reflected in the slightly increased positive sentiment towards the vaccination program in 2021 (Figure 11). However, fear of vaccine side effects persisted, possibly contributing to the decrease in positive sentiment in 2021 (Figure 12).

In Figure 13, the sentiment on the vaccination program in Indonesia was mostly positive (59.1%), whilst the sentiment on the various available types of COVID-19 vaccines was generally dominated by positive opinions in both years (approximately five times higher than negative sentiment). We compared sentiment polarities toward vaccines in 2020 and 2021 and found increases in both positive and negative sentiments in 2021, after the vaccination program had started. Although positive sentiment showed a greater increase, the negative sentiment, which remained at 15.2%, deserves attention, as it may reflect a proportion of individuals with vaccine hesitancy and/or rejection.

Figure 13. Sentiment on the various types of COVID-19 vaccines in 2020 (A) and 2021 (B), showing that positive sentiment outweighed negative and neutral sentiments. (C) Pie chart of public sentiment on the effectiveness of the vaccines in the government vaccination program, again dominated by positive sentiment, followed by neutral and negative sentiments.

This study shows a tendency towards positive sentiment on COVID-19 topics and the vaccine and vaccination subtopics from 2020 until mid-2021, somewhat in contrast with other sentiment analyses of Indonesian tweets that found predominantly neutral sentiment. Those studies captured tweets over shorter periods. Sumertajaya et al. chose the time frame of January 15, 2021 to January 28, 2021 because it was the first period after the COVID-19 vaccination program was launched in Indonesia, 31 while Agustiningsih et al. captured tweets during September 2021. 32 The two studies also used different learning frameworks: Sumertajaya et al. implemented a support vector machine (SVM) and random forest, while Agustiningsih et al. employed a bidirectional Long Short-Term Memory (LSTM) network combined with word embedding. 31 , 32

Social media studies have been used over the past decade to identify public opinion and sentiment toward particular health issues, for instance the 2009 H1N1 outbreak. 33 Vaccination is an important issue to analyze with this approach, since individual decisions on vaccine uptake are modulated by opinions from social networks. 34 Disinformation narratives spread on social media, which are sometimes hostile and cause anxiety, fear, and distrust toward vaccination, could contribute substantially to vaccine hesitancy and refusal. 35 The rapid and vast dissemination of disinformation should be addressed with effective strategies for vaccine promotion. Surveillance of the real-time flow of social media information could be a valuable source of timely data for adjusting those strategies. 34 Interestingly, a strong correlation has been shown between sentiments expressed online and estimated vaccination rates. 36

Recent studies have confirmed the impact of vaccination on several public health parameters, which holds promise for its role in achieving herd immunity and ultimately ending the pandemic. Vaccination has remarkably reduced COVID-19 cases and hospitalizations, 37 COVID-19-related morbidity and mortality, 38 and incidence due to variants of concern. 39 Additionally, rapid vaccine rollout is believed to boost economic recovery. 40 Therefore, accelerating the pace of vaccination is imperative for countries worldwide. 37 Efforts to lower vaccine hesitancy should be prioritized, as it is the greatest threat to achieving high vaccine coverage. 41 Social media is perceived to be a major source of misinformation and is able to amplify and disseminate it without temporal or spatial limits, fostering vaccine hesitancy and lowering vaccine uptake. 42

Our study identified several pieces of vaccine-related misinformation circulating on social media, which were categorized as fake tweets. This finding accords with a study by Islam et al. that analyzed rumors and conspiracy theories related to COVID-19 vaccines on various online platforms. They found that the majority of this content was false and/or misleading, and reported that Indonesia was among the countries with a high number of online rumors related to COVID-19 vaccines. 43 Our study found 15 vaccine-related fake tweets. Even though this number is low considering that the dataset spanned 20 months, their impact on vaccine hesitancy should not be underestimated. 44 We found fake tweets about the adverse effects of the Moderna vaccine, which were said to affect 21% of trial participants. Such exaggerated misinformation may raise public concern about vaccine safety, which may lead to vaccine hesitancy. 45 Another fake tweet topic was that vaccine manufacturers had been developing COVID-19 vaccines long before the pandemic emerged. This claim appeared to arise from doubt about the fast pace of vaccine development. Some individuals may further link it to the conspiracy theory that COVID-19 is a bioweapon designed by particular countries or parties, and belief in conspiracy theories is a driving factor for an individual to reject vaccination. 46 , 47

Twitter has set policies and made efforts to counteract misinformation. It has published detailed criteria and examples of false or misleading information about COVID-19 vaccines posted by users. Actions taken against violations include content removal, tweet labeling, and adding corrective information. Twitter may also disable retweets, quoting, or any other form of engagement with false tweets if they pose potential harm to the public. The user's account may be temporarily locked or permanently suspended. 48 It is therefore likely that Twitter had already removed much of the circulating misinformation. In addition, the Ministry of Health and the Ministry of Communication and Informatics have provided platforms to continuously inform the public about identified fake news or hoaxes, and they cooperate with social media companies to monitor misinformation-related content and provide authoritative information from trusted sources. 49 However, social media remains a battleground where the anti-vaccine movement is difficult to combat. 50

We examined user-generated Twitter posts in depth to elaborate on factors associated with vaccine hesitancy. Beyond the spread of misinformation through social media, we identified the public's perceptions of and concerns about vaccine effectiveness and side effects. The negative sentiment toward these topics may be associated with the public's doubt about the vaccine as an effective and safe means to manage the pandemic. Our result is in line with a study of medical students, which reported that most participants had concerns regarding the adverse effects and ineffectiveness of vaccines. 51 We identified a potential misconception regarding side effects, namely that the vaccine itself may cause COVID-19 infection. This was one of the myths identified and clarified by health authorities, yet it remains popular according to our findings. In this study, we found that people's opinions shown on this particular social media platform are arguably correlated with their literacy on the issue. 52 They might be affected by the social environment and the information perceived at a given period.

Despite the limitations that the collected opinions of Indonesian Twitter users might not fully reflect those of the whole Indonesian population and that the search terms used might cause overlap between the “COVID-19” and the “COVID-19 Vaccine and Vaccination” subsets, this study confirmed the usefulness of social media studies in providing insights into the public's attention, discussion, concerns, and sentiments about COVID-19 and COVID-19 vaccines and vaccination. 53 We demonstrated that public attention changed dynamically over time and peaked in July 2021, during the second surge of COVID-19 cases. It can be extrapolated that the public responded well to the government campaign of accelerating vaccine rollout as a means to curb the disease. 54 The top trending discussion topics may reflect a public concern, for instance, “paid vaccination.” Even though we did not perform sentiment analysis on this specific topic, it can be proposed that the public highly disagreed with this policy. Since we did not perform time series analysis of public sentiment, our results depict only the change in sentiment by year, which hardly represents how sentiments evolved from month to month. Therefore, future studies should implement time series analysis to capture the dynamic nature of public opinion as it is affected by the disease, circulating fake news, or government actions. Beyond their potential usefulness, social media studies focusing on health issues raise several ethical concerns, such as privacy, informed consent, and anonymity, that remain subjects of dispute. 55 Given the size of the dataset, it was not possible to obtain informed consent in this study. However, anonymity is secured in this study, and therefore the underlying data are not published.
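As an illustration of the month-by-month view suggested above for future studies, the following minimal Python sketch aggregates sentiment labels into monthly shares. It is not the analysis performed in this study: the function name `monthly_sentiment_share`, the record layout (ISO date string, label), and the toy records are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def monthly_sentiment_share(records):
    """Return {"YYYY-MM": {label: share}} from (ISO date string, label) pairs."""
    counts = defaultdict(Counter)
    for date_str, label in records:
        month = date_str[:7]  # "YYYY-MM" prefix of an ISO "YYYY-MM-DD" date
        counts[month][label] += 1
    shares = {}
    for month in sorted(counts):  # chronological order for ISO month strings
        total = sum(counts[month].values())
        shares[month] = {label: n / total for label, n in counts[month].items()}
    return shares


# Hypothetical toy records, NOT data from the study.
sample = [
    ("2021-06-03", "positive"),
    ("2021-06-17", "negative"),
    ("2021-07-02", "negative"),
    ("2021-07-09", "negative"),
    ("2021-07-21", "positive"),
    ("2021-07-30", "neutral"),
]
shares = monthly_sentiment_share(sample)
for month, dist in shares.items():
    print(month, dist)
```

Plotting or modelling such monthly shares, rather than yearly totals, would be one way to relate shifts in sentiment to specific events such as policy announcements.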

Conclusions

Public opinion and sentiment analysis on social media using natural language processing (NLP), an artificial intelligence technique, can rapidly provide timely data reflecting real-world public opinion. Thus, it should be part of the basis for developing public health response strategies, particularly in the critical period of a disease outbreak. Considering the high value of social media analysis, more robust analytical methods should be used in future studies, allowing for a clearer understanding of trends and patterns in public opinion on various health matters.

Acknowledgements

Thank you to the Faculty of Medicine, Universitas Airlangga, Surabaya Indonesia for their support.

Funding Statement

This project was supported by Universitas Airlangga, Surabaya, Indonesia (SK. Rektor Unair no. 390/UN3/2021).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 4; peer review: 1 approved

Data availability

The underlying data to this research cannot be shared due to the ethical and copyright restrictions surrounding social media data. The Methods section contains detailed information to allow replication of the study. Any queries about the methodology should be directed to the corresponding author.

References

  • 1. Chou W-YS, Budenz A: Considering emotion in COVID-19 vaccine communication: Addressing vaccine hesitancy and fostering vaccine confidence. Health Commun. 2020 Dec 5;35(14):1718–1722. 10.1080/10410236.2020.1838096
  • 2. El Keshky MES, Basyouni SS, Al Sabban AM: Getting through COVID-19: The pandemic’s impact on the psychology of sustainability, quality of life, and the global economy – A systematic review. Front. Psychol. 2020 Nov 12;11. 10.3389/fpsyg.2020.585897/full
  • 3. Tangcharoensathien V, Bassett MT, Meng Q, et al.: Are overwhelmed health systems an inevitable consequence of COVID-19? Experiences from China, Thailand, and New York State. BMJ. 2021 Jan 22;372:n83. 10.1136/bmj.n83
  • 4. Van Kerkhove MD: COVID-19 in 2022: controlling the pandemic is within our grasp. Nat. Med. 2021 Dec 14;27(12):2070–2070. 10.1038/s41591-021-01616-y
  • 5. Troiano G, Nardi A: Vaccine hesitancy in the era of COVID-19. Public Health. 2021 May;194:245–251. 10.1016/j.puhe.2021.02.025
  • 6. Han X, Wang J, Zhang M, et al.: Using social media to mine and analyze public opinion related to COVID-19 in China. Int. J. Environ. Res. Public Health. 2020 Apr 17;17(8):2788. 10.3390/ijerph17082788
  • 7. Sinnenberg L, Buttenheim AM, Padrez K, et al.: Twitter as a tool for health research: A systematic review. Am. J. Public Health. 2017 Jan;107(1):e1–e8. 10.2105/AJPH.2016.303512
  • 8. Mir AA, Sevukan R: Sentiment analysis of Indian Tweets about Covid-19 vaccines. J. Inf. Sci. 2022 Sep 16;016555152211180. 10.1177/01655515221118049
  • 9. Mir AA, Rathinam S, Gul S: Public perception of COVID-19 vaccines from the digital footprints left on Twitter: analyzing positive, neutral and negative sentiments of Twitterati. Libr. Hi Tech. 2022 Mar 29;40(2):340–356. 10.1108/LHT-08-2021-0261
  • 10. Gerretsen P, Kim J, Caravaggio F, et al.: Individual determinants of COVID-19 vaccine hesitancy. PLoS One. 2021 Nov 17;16(11):e0258462. 10.1371/journal.pone.0258462
  • 11. Harapan H, Wagner AL, Yufika A, et al.: Acceptance of a COVID-19 vaccine in Southeast Asia: A cross-sectional study in Indonesia. Front Public Heal. 2020 Jul 14;8. 10.3389/fpubh.2020.00381/full
  • 12. The Ministry of Health of Republic of Indonesia: 80,8% orang Indonesia bersedia menerima vaksin COVID-19. Kementerian Kesehatan Republik Indonesia. 2021.
  • 13. Statistae: Number of social network users in selected countries in 2021 and 2026. Statista;2021.
  • 14. Patil P: What is exploratory data analysis? Towards Data Science. 2018.
  • 15. Nantasenamat C: How to build a machine learning model. 2020.
  • 16. Satyawati NP, Utari P, Hastjarjo S: Fact checking of hoaxes by masyarakat antifitnah Indonesia. Int. J. Multicult. Multireligious Underst. 2019;6(6).
  • 17. Yousefinaghani S, Dara R, Mubareka S, et al.: An analysis of COVID-19 vaccine sentiments and opinions on Twitter. Int. J. Infect. Dis. 2021 Jul;108:256–262. 10.1016/j.ijid.2021.05.059
  • 18. Sinapoy MIK, Sibaroni Y, Prasetyowati SS: Comparison of LSTM and IndoBERT Method in Identifying Hoax on Twitter. J. RESTI (Rekayasa Sistem dan Teknologi Informasi). 2023 June;7(3):657–662. 10.29207/resti.v7i3.4830
  • 19. Saadah S, Auditama KM, Fattahila AA, et al.: Implementation of BERT, IndoBERT, and CNN-LSTM in classifying public opinion about COVID-19 vaccine in Indonesia. J. RESTI (Rekayasa Sist dan Teknol Informasi). 2022 Aug 30;6(4):648–655. 10.29207/resti.v6i4.4215
  • 20. Lotfi M, Hamblin MR, Rezaei N: COVID-19: Transmission, prevention, and potential therapeutic opportunities. Clin. Chim. Acta. 2020 Sep;508:254–266. 10.1016/j.cca.2020.05.044
  • 21. Tfi MR, Hamblin MR, Rezaei N: COVID-19: Transmission, prevention, and potential therapeutic opportunities. Clin. Chim. Acta. 2020;508(January):254–266.
  • 22. Saraswati I, Muzdalifah A, Herawati AR, et al.: Implementation of restrictions on community activities (PPKM) policy analysis level 1-4 in dealing with the covid-19 outbreak in Indonesia. Int. J. Soc. Serv. Res. 2021 Nov 14;1(3):203–210. 10.46799/ijssr.v1i3.34
  • 23. Hasanah NA, Suciati N, Purwitasari D: Identifying degree-of-concern on COVID-19 topics with text classification of twitters. Regist. J. Ilm. Teknol. Sist. Inf. 2021 Feb 16;7(1):50. 10.26594/register.v7i1.2234
  • 24. Pristiyono RM, Ihsan MA, Anjar A, et al.: Sentiment analysis of COVID-19 vaccine in Indonesia using Naïve Bayes Algorithm. IOP Conf. Ser. Mater. Sci. Eng. 2021 Feb 1;1088(1):012045. 10.1088/1757-899X/1088/1/012045
  • 25. Nurdeni DA, Budi I, Santoso AB: Sentiment analysis on Covid19 vaccines in Indonesia: From the perspective of sinovac and pfizer. 2021 3rd East Indonesia Conference on Computer and Information Technology (EIConCIT). IEEE. 2021; pp.122–127.
  • 26. Yeremia AE, Raditio KH: Indonesia-China Vaccine Cooperation and South China Sea Diplomacy. ISEAS Yusof Ishak Inst.;2021; p.55.
  • 27. U.S. Embassy & Consulates in Indonesia: United States provides 4 million moderna COVID-19 vaccine doses to Indonesia through COVAX 2021.
  • 28. Meo SA, Bukhari IA, Akram J, et al.: COVID-19 vaccines: comparison of biological, pharmacological characteristics and adverse effects of Pfizer/BioNTech and Moderna Vaccines. Eur. Rev. Med. Pharmacol. Sci. 2021 Feb;25(3):1663–1669. 10.26355/eurrev_202102_24877
  • 29. Bralianti PD, Akbar FN: Covid-19 vaccines and its Adverse Events Following Immunization (AEFI). Avicenna Med. J. 2021 Jul 15;2(1):19–28. 10.15408/avicenna.v2i1.19832
  • 30. Miharja M, Salim E, Nachrawi G, et al.: Implementation of emergency public activity restrictions (PPKM) in accordance with human rights and Pancasila principles. BIRCI-Journal. 2021;15:6855–6866.
  • 31. Sumertajaya IM, Angraini Y, Harahap JR, et al.: Sentiment Analysis on Covid-19 Vaccination in Indonesia Using Support Vector Machine and Random Forest. JUITA: Jurnal Informatika. 2022 May 31;10(1):1–8. 10.30595/juita.v10i1.12394 https://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/12394
  • 32. Agustiningsih KK, Utami E, Alsyaibani MA: Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings. Jurnal Ilmu Komputer dan Informasi. 2022 Feb 27;15(1):39–46. 10.21609/jiki.v15i1.1044
  • 33. Chew C, Eysenbach G: Pandemics in the age of twitter: Content analysis of tweets during the 2009 H1N1 outbreak. Sampson M, editor. PLoS One. 2010 Nov 29;5(11):e14118. 10.1371/journal.pone.0014118
  • 34. Du J, Xu J, Song H-Y, et al.: Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data. BMC Med. Inform. Decis. Mak. 2017 Jul 5;17(S2):69. 10.1186/s12911-017-0469-6
  • 35. Pulido C, Ruiz-Eugenio L, Redondo-Sama G, et al.: A new application of social impact in social media for overcoming fake news in health. Int. J. Environ. Res. Public Health. 2020 Apr 3;17(7):2430. 10.3390/ijerph17072430
  • 36. Salathé M, Khandelwal S: Assessing vaccination sentiments with online social media: Implications for infectious disease dynamics and control. Meyers LA, editor. PLoS Comput. Biol. 2011 Oct 13;7(10):e1002199. 10.1371/journal.pcbi.1002199
  • 37. Chen X, Huang H, Ju J, et al.: Impact of vaccination on the COVID-19 pandemic in U.S. states. Sci. Rep. 2022 Jan 28;12(1):1554. 10.1038/s41598-022-05498-z
  • 38. Moghadas SM, Vilches TN, Zhang K, et al.: The impact of vaccination on coronavirus disease 2019 (COVID-19) outbreaks in the United States. Clin. Infect. Dis. 2021 Dec 16;73(12):2257–2264. 10.1093/cid/ciab079
  • 39. Suthar AB, Wang J, Seffren V, et al.: Public health impact of covid-19 vaccines in the US: observational study. BMJ. 2022 Apr 27;377:e069317. 10.1136/bmj-2021-069317
  • 40. Boediono L: Indonesia: Rapid mass vaccination, effective pandemic control and strong fiscal and monetary support are critical to boost the economic recovery. The World Bank;2021.
  • 41. Nossier SA: Vaccine hesitancy: the greatest threat to COVID-19 vaccination programs. J. Egypt. Public Health Assoc. 2021 Dec 5;96(1):18. 10.1186/s42506-021-00081-2
  • 42. Wilson SL, Wiysonge C: Social media and vaccine hesitancy. BMJ Glob. Heal. 2020 Oct;5(10):e004206. 10.1136/bmjgh-2020-004206
  • 43. Islam MS, Kamal A-HM, Kabir A, et al.: COVID-19 vaccine rumors and conspiracy theories: The need for cognitive inoculation against misinformation to improve vaccine adherence. Lavorgna L, editor. PLoS One. 2021 May 12;16(5):e0251605. 10.1371/journal.pone.0251605
  • 44. Pierri F, Perry BL, DeVerna MR, et al.: Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal. Sci. Rep. 2022 Apr 26;12(1):5966. 10.1038/s41598-022-10070-w
  • 45. Wagner AL, Huang Z, Ren J, et al.: Vaccine hesitancy and concerns about vaccine safety and effectiveness in Shanghai, China. Am. J. Prev. Med. 2021 Jan;60(1):S77–S86. 10.1016/j.amepre.2020.09.003
  • 46. Bertin P, Nera K, Delouvée S: Conspiracy beliefs, rejection of vaccination, and support for hydroxychloroquine: A conceptual replication-extension in the COVID-19 pandemic context. Front. Psychol. 2020 Sep 18;11. 10.3389/fpsyg.2020.565128/full
  • 47. Douglas KM: COVID-19 conspiracy theories. Gr. Process Intergr. Relations. 2021 Feb 4;24(2):270–275. 10.1177/1368430220982068
  • 48. Twitter help centre: COVID-19 misleading information policy. Twitter.
  • 49. Ministry of Communication and Informatics of Republic of Indonesia: Pers release No. 116/HM/KOMINFO/09/2020. Pemerintah RI nyatakan komitmen tangani Infodemic.
  • 50. Germani F, Biller-Andorno N: The anti-vaccination infodemic on social media: A behavioral analysis. PLoS One. 2021 Mar 3;16(3):e0247642. 10.1371/journal.pone.0247642 [Retracted]
  • 51. Saied SM, Saied EM, Kabbash IA, et al.: Vaccine hesitancy: Beliefs and barriers associated with COVID-19 vaccination among Egyptian medical students. J. Med. Virol. 2021 Jul 25;93(7):4280–4291. 10.1002/jmv.26910
  • 52. Kalanjati VP, Hasanatuludhhiyah N, d’Arqom A, et al.: Health literacy on COVID-19 and COVID-19 vaccinations in Indonesia [version 2; peer review: 2 approved]. F1000Research. 2022;11:1296. 10.12688/f1000research.125551.2
  • 53. Chen J, Wang Y: Social media use for health purposes: Systematic review. J. Med. Internet Res. 2021 May 12;23(5):e17917. 10.2196/17917
  • 54. Nugraha RR, Miranda AV, Ahmadi A, et al.: Accelerating Indonesian COVID-19 vaccination rollout: a critical task amid the second wave. Trop. Med. Health. 2021 Dec 22;49(1):76. 10.1186/s41182-021-00367-3
  • 55. Al-Zaman MS, Khemka A, Zhang A, et al.: The Defining Characteristics of Ethics Papers on Social Media Research: A Systematic Review of the Literature. J. Acad. Ethics. 2023 Nov 6:1–27. 10.1007/s10805-023-09491-7
F1000Res. 2024 Apr 17. doi: 10.5256/f1000research.164939.r266874

Reviewer response for version 4

Giuseppe Porro 1

No further comments from my end

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

No

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

Reviewer Expertise:

sentiment analysis; subjective well-being evaluation; statistical methods for causal inference

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2024 Apr 10. doi: 10.5256/f1000research.163487.r256878

Reviewer response for version 3

Aasif Ahmad Mir 1

The paper analyses Twitter communication about the sentiments of Indonesian tweets on COVID-19 and COVID-19 vaccinations to gain some insights into overall public communication about the topic. It also shows the degree of positive, negative, and neutral tweets, giving insight into dominant public opinion. Overall, it is a worthwhile contribution to the field. I would recommend the paper with some minor modifications.

Comments to the Author

Abstract

  • Abstract is comprehensive and effectively summarizes the key points of the study.

Introduction

  • Introduction is well written and provides detailed background of the study.

Previous studies

  • The section describing the previous literature is missing.

  • Some important related studies are missing, e.g., see references 1 and 2.

Methodology

The article contains detailed methodology and presents everything clearly.

Findings and Discussion

  • Findings are clearly explained and presented.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Social media analytics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

  • 1. Mir AA, Rathinam S, Gul S: Public perception of COVID-19 vaccines from the digital footprints left on Twitter: analyzing positive, neutral and negative sentiments of Twitterati. Library Hi Tech. 2022;40(2):340–356. 10.1108/LHT-08-2021-0261
  • 2. Mir AA, Sevukan R: Sentiment analysis of Indian Tweets about Covid-19 vaccines. Journal of Information Science. 2022. 10.1177/01655515221118049
F1000Res. 2024 Mar 18. doi: 10.5256/f1000research.163487.r255020

Reviewer response for version 3

Giuseppe Porro 1

Approved

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

No

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

Reviewer Expertise:

sentiment analysis; subjective well-being evaluation; statistical methods for causal inference

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2023 Dec 13. doi: 10.5256/f1000research.159111.r224744

Reviewer response for version 2

Giuseppe Porro 1

The issues raised have been properly addressed by the Authors.

I have no further comments.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

No

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

Reviewer Expertise:

sentiment analysis; subjective well-being evaluation; statistical methods for causal inference

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2024 Feb 27. doi: 10.5256/f1000research.143380.r206421

Reviewer response for version 1

Annisa Ristya Rahmanti 1,2

  • The work presents a clear and structured overview of the study, which focuses on analyzing sentiments regarding COVID-19 and vaccinations expressed in Indonesian-language tweets. However, while the author briefly mentions previous studies, the current research would benefit from a more comprehensive literature review. This would better contextualize the research within the broader scope of existing findings, particularly those focusing on studies conducted in similar contexts or regions. 

  • The study design employs relevant tools such as the IndoBERT model for clustering and the difflib library for string comparison. However, the technical soundness is difficult to fully assess without more detail on the validation of these methods and their appropriateness for the analysis of Indonesian-language tweets. Could you provide the performance model?

  • Although the results were presented using pie charts to illustrate sentiment trends, a time series analysis would provide a more accurate representation of how sentiments have evolved over time. Time series analysis can capture the dynamic nature of public opinion affected by fake news or government actions, allowing for a clearer understanding of trends and patterns that may not be immediately apparent in static visual representations such as pie charts.

  • It would be pertinent to adjust the conclusion to acknowledge the methodological limitations and suggest more robust analytical methods, like time series analysis, for future studies. Additionally, addressing the ethical considerations involved in using social media data is also crucial.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Partly

Are all the source data underlying the results available to ensure full reproducibility?

Partly

Is the study design appropriate and is the work technically sound?

Partly

Are the conclusions drawn adequately supported by the results?

Partly

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

health informatics, NLP, social media data analytics, AI

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2024 Mar 7.
Viskasari Kalanjati 1

  • Comment and suggestion: The work presents a clear and structured overview of the study, which focuses on analyzing sentiments regarding COVID-19 and vaccinations expressed in Indonesian-language tweets. However, while the author briefly mentions previous studies, the current research would benefit from a more comprehensive literature review. This would better contextualize the research within the broader scope of existing findings, particularly those focusing on studies conducted in similar contexts or regions. 

  • Answer: Thank you for the insightful comment. We have added an explanation of our result in comparison with other studies in Indonesia. Please find it in the result and discussion chapter.

  • The study design employs relevant tools such as the IndoBERT model for clustering and the difflib library for string comparison. However, the technical soundness is difficult to fully assess without more detail on the validation of these methods and their appropriateness for the analysis of Indonesian-language tweets. Could you provide the performance model?

  • Answer: Thank you for the suggestion. We have added an explanation of the validation of the IndoBERT model and cited previous studies that have shown its good performance and validity. Difflib is an established Python module for string comparison that is often used to examine the similarity of texts. Please find further explanation at https://docs.python.org/3/library/difflib.html . 

  • Although the results were presented using pie charts to illustrate sentiment trends, a time series analysis would provide a more accurate representation of how sentiments have evolved over time. Time series analysis can capture the dynamic nature of public opinion affected by fake news or government actions, allowing for a clearer understanding of trends and patterns that may not be immediately apparent in static visual representations such as pie charts.

  • Answer: Thank you for the suggestion. We highly appreciate it. We recognize it as one of the limitations of our study and put it in a suggestion for future studies. 

  • It would be pertinent to adjust the conclusion to acknowledge the methodological limitations and suggest more robust analytical methods, like time series analysis, for future studies. Additionally, addressing the ethical considerations involved in using social media data is also crucial.

  • Answer: Thank you for the suggestion. We have made revisions accordingly. Please find it in the paragraph about the limitations of our study and also in the conclusion. 
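To make the difflib-based string comparison mentioned in the answers above more concrete, here is a minimal Python sketch of how a tweet could be compared against a list of known false claims with `difflib.SequenceMatcher`. It is not the authors' pipeline: the helper names `similarity` and `best_match`, the example claims (loosely paraphrased from the fake tweets described in the discussion), and the 0.8 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(tweet, known_claims):
    """Return (claim, score) for the known claim most similar to the tweet."""
    scored = ((claim, similarity(tweet, claim)) for claim in known_claims)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference claims; the study's actual list of hoaxes and its
# similarity cut-off are not given in this text.
known_claims = [
    "vaccine side effects affected 21% of trial participants",
    "vaccines were developed long before the pandemic",
]
tweet = "Vaccine side effects affected 21% of trial participants!"

claim, score = best_match(tweet, known_claims)
THRESHOLD = 0.8  # assumed cut-off, for illustration only
is_flagged = score >= THRESHOLD
print(claim, round(score, 3), is_flagged)
```

`SequenceMatcher.ratio()` works at the character level, so it captures near-verbatim copies well but not paraphrases; any threshold would need to be validated against manually labelled examples.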

F1000Res. 2023 Nov 15. doi: 10.5256/f1000research.143380.r219774

Reviewer response for version 1

Giuseppe Porro 1

The paper offers a sentiment and opinion analysis about the reactions to Covid-19 pandemic, vaccination strategies and measures of social restrictions in Indonesia. The data source is a collection of Twitter messages posted between January 2020 and August 2021.

The topic is interesting, its relevance is well argued and the results are properly presented and discussed.

My main remarks concern the opportunity of providing more detailed information about the methodology the Authors apply in the study. 

The unavailability of the data (due to privacy constraints) makes it necessary to be clear about the analytical procedures illustrated in Fig.1, in order to ensure the replicability of the study, albeit with different data and to avoid, as far as possible, any "black box" effect.

In particular:

a) I assume that the data were labeled into the categories indicated in Table 1 in the downloading stage. Then they were further classified into "classes, which were tweet exposure and engagement analysis" (p.3): how was this classification stage performed?

b) before classifying data, a cluster analysis was applied using IndoBERT model. What is the role of cluster analysis in the study? Is there any relationship between cluster analysis and classification stages? Or, alternatively, is cluster analysis only aimed at visual inspection of data (in this case, it should be described before the classification stage, in order to avoid misunderstanding)? 

c) please, provide details on how the sentiment analysis is performed: how is a tweet evaluated as positive, negative or neutral? At p.5 (point 2) the manual classification of a 3000-texts-size random sample is described and indicated as a training set for the IndoBERT model. How does the IndoBERT model work? Is the 75% accuracy rate the result of a test by the Authors themselves?

d) p.2, point 2: the description of the content of positive, negative and neutral tweets is accompanied by a sort of opinion analysis: is it the result of a reading of the training sample or has an opinion analysis been actually extended to the whole dataset?

e) despite the low incidence of fake tweets, the Authors invite not to undervalue the impact of fake news. Is there any way for identifying a fake news diffusion pattern, following retweets, replies and "like"s, as these seem to be available pieces of information (see p.6)?

In order not to move the focus of the paper from a public health to an artificial intelligence one, going into a more detailed description of the methodology may require a methodological appendix, where some examples could also be provided.

Minor remarks:

- as usual, when a sentiment or opinion analysis is performed via social network sites data, some words should be spent to comment the lack of representativeness of these data with respect to the whole country population. 

- Table 1: I'm assuming possible overlap between the "Covid-19" and the "Covid-19 Vaccine and Vaccination" subsets. This, of course, does not undermine the validity of the analysis in any sense, but it should be pointed out to the reader.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

No

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

Reviewer Expertise:

sentiment analysis; subjective well-being evaluation; statistical methods for causal inference

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2023 Nov 19.
Viskasari Kalanjati 1

The paper offers a sentiment and opinion analysis about the reactions to Covid-19 pandemic, vaccination strategies and measures of social restrictions in Indonesia. The data source is a collection of Twitter messages posted between January 2020 and August 2021.

The topic is interesting, its relevance is well argued and the results are properly presented and discussed.

Answer: We highly appreciate your comment.

My main remarks concern the opportunity of providing more detailed information about the methodology the Authors apply in the study.

The unavailability of the data (due to privacy constraints) makes it necessary to be clear about the analytical procedures illustrated in Fig.1, in order to ensure the replicability of the study, albeit with different data and to avoid, as far as possible, any "black box" effect.

In particular:

a) I assume that the data were labeled into the categories indicated in Table 1 in the downloading stage. Then they were further classified into "classes, which were tweet exposure and engagement analysis" (p.3): how was this classification stage performed?

Answer: Thank you for the question. We have clarified the explanation of the stages of analysis. Please find it in the Methods section of the revised manuscript.

b) before classifying data, a cluster analysis was applied using IndoBERT model. What is the role of cluster analysis in the study? Is there any relationship between cluster analysis and classification stages? Or, alternatively, is cluster analysis only aimed at visual inspection of data (in this case, it should be described before the classification stage, in order to avoid misunderstanding)? 

Answer: We highly appreciate your question. We have clarified the explanation of the method. Please find it in the revised manuscript.

We reworded the term "cluster analysis". We used the IndoBERT model to label all cleaned data as positive, neutral, or negative in sentiment.

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model for natural language processing (NLP) based on the Transformer architecture. Each output element in this model is connected to every input element, with weights computed dynamically from the relationships between elements. BERT is designed to help computers resolve ambiguous language in a text by using the surrounding text to establish context.

IndoBERT is the Indonesian version of the BERT model, trained on over 220 million words of Indonesian text. By using the surrounding text to build context, it obtains improved word weighting values.

In this study, IndoBERT is used to tokenize the text before the classification task. Once IndoBERT has tokenized the Indonesian text, the data pass directly to the classifier layer, which learns patterns in word usage and assigns each text to one of three categories: positive, neutral, or negative.
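As a rough illustration of the final classification step described above, the sketch below applies a single linear layer followed by a softmax to a sentence vector and picks the most probable of the three sentiment classes. It is not the authors' implementation: the four-dimensional vector, the weights, and the function names are hypothetical stand-ins, and a real BERT-style encoder produces vectors with hundreds of dimensions and learned weights.

```python
import math

LABELS = ["positive", "neutral", "negative"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence_vector, weights, biases):
    """Apply one linear layer plus softmax to a sentence vector."""
    logits = [
        sum(w * x for w, x in zip(row, sentence_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs


# Hypothetical toy values, chosen only to make the example easy to follow.
toy_vector = [0.5, -1.0, 0.25, 2.0]
toy_weights = [
    [1.0, 0.0, 0.0, 1.0],   # row scoring the "positive" class
    [0.0, 1.0, 0.0, 0.0],   # row scoring the "neutral" class
    [0.0, 0.0, 1.0, -1.0],  # row scoring the "negative" class
]
toy_biases = [0.0, 0.0, 0.0]

label, probs = classify(toy_vector, toy_weights, toy_biases)
print(label, [round(p, 3) for p in probs])
```

In a trained model the weights are fitted on the labeled training texts rather than written by hand.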

c) please, provide details on how the sentiment analysis is performed: how is a tweet evaluated as positive, negative or neutral? At p.5 (point 2) the manual classification of a 3000-texts-size random sample is described and indicated as a training set for the IndoBERT model. How does the IndoBERT model work? Is the 75% accuracy rate the result of a test by the Authors themselves?

Answer: Thank you for the question.  

Training and testing of the IndoBERT model rely on a dataset prepared by the researcher, comprising 3000 randomly selected texts. The researcher labeled each text as one of three classes. After labeling, the data were split into training and testing sets at an 80%:20% ratio. The IndoBERT model uses the training data to learn word patterns that indicate whether a text is positive, neutral, or negative.
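The 80%:20% partition described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the placeholder texts, the fixed seed, and the function name `split_labeled_data` are assumptions made for the example, and the response does not say whether the authors shuffled or stratified the data.

```python
import random

LABELS = ["positive", "neutral", "negative"]


def split_labeled_data(samples, train_fraction=0.8, seed=42):
    """Shuffle labeled samples and split them into training and testing sets."""
    shuffled = list(samples)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 3000 manually labeled texts.
labeled = [(f"text {i}", LABELS[i % 3]) for i in range(3000)]

train_set, test_set = split_labeled_data(labeled)
print(len(train_set), len(test_set))
```

With 3000 labeled texts this yields 2400 training and 600 testing examples.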

After training, the model is tested. Specifically, the model predicts labels for the texts in the test set, and these predictions are compared with the labels assigned by the researcher. This comparison shows how well the model classifies text sentiment. Testing yielded an accuracy of 75%, computed by comparing the labels predicted by the model with those assigned by the researcher. This accuracy indicates that the model has learned patterns in the Indonesian language and performs well in sentiment classification.
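The comparison described above amounts to counting how often the predicted label matches the researcher's label. A minimal sketch follows; the four labels are invented for illustration and are not the study's data, and the function name `accuracy` is an assumption.

```python
def accuracy(predicted, reference):
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one label is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical toy labels: the model agrees with the researcher on 3 of 4 texts.
researcher_labels = ["positive", "neutral", "negative", "positive"]
model_labels = ["positive", "neutral", "positive", "positive"]

score = accuracy(model_labels, researcher_labels)
print(score)
```

Accuracy alone does not show which classes are confused with one another, so a per-class breakdown is often reported alongside it.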

d) p.2, point 2: the description of the content of positive, negative and neutral tweets is accompanied by a sort of opinion analysis: is it the result of a reading of the training sample or has an opinion analysis been actually extended to the whole dataset?

Answer: Thank you for the question. The opinion analysis was applied to the whole dataset.

e) despite the low incidence of fake tweets, the Authors urge readers not to undervalue the impact of fake news. Is there any way to identify a fake news diffusion pattern by following retweets, replies and "likes", as these seem to be available pieces of information (see p.6)?

Answer: Thank you for the question. The diffusion pattern of circulating fake news is an interesting topic, and understanding it would be useful for combating the spread of fake news. For further research, the idea is to capture the metadata of all users who interacted with a main post that the model has predicted to be fake news, by crawling all replies, likes, and retweets. This would enable us to track how far the influence of the original post extends. Another interesting direction for further study is an automated reply that notifies the audience of the main post that the tweet has been classified as fake news and provides a valid source of information.
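One possible way to quantify "how far the influence" of a flagged post extends is to treat the crawled retweets and replies as a tree rooted at that post and measure its size and depth. The sketch below is only an illustration of that idea under stated assumptions: the post identifiers, the `interactions` mapping, and the function name `cascade_reach` are hypothetical, and no real crawling API is used.

```python
from collections import deque


def cascade_reach(root_id, interactions):
    """Breadth-first walk over a post's interaction tree.

    `interactions` maps a post id to the ids of posts that retweeted or
    replied to it. Returns (number of downstream posts, maximum depth).
    """
    depth = {root_id: 0}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in interactions.get(current, []):
            if child not in depth:  # guard against duplicate records
                depth[child] = depth[current] + 1
                queue.append(child)
    return len(depth) - 1, max(depth.values())


# Hypothetical crawl result for one post predicted to be fake news.
toy_interactions = {
    "fake_post": ["retweet_1", "reply_1"],
    "reply_1": ["retweet_2"],
    "retweet_2": ["reply_2"],
}

reach, max_depth = cascade_reach("fake_post", toy_interactions)
print(reach, max_depth)
```

Comparing these two numbers across flagged and non-flagged posts would be one simple starting point for describing diffusion patterns.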

In order not to move the focus of the paper from a public health to an artificial intelligence one, going into a more detailed description of the methodology may require a methodological appendix, where some examples could also be provided.

Answer: Thank you for the suggestion. We highly appreciate it. Since this journal applies an open review system in which all readers can access the communication between authors and reviewers, details of the methods can easily be found through this communication. Therefore, we have not added an appendix.

Minor remarks:

- as usual, when a sentiment or opinion analysis is performed on social network data, a few words should be devoted to the lack of representativeness of these data with respect to the country's whole population.

Answer: Thank you for the suggestion. We are pleased to accommodate it.

- Table 1: I'm assuming possible overlap between the "Covid-19" and the "Covid-19 Vaccine and Vaccination" subsets. This, of course, does not undermine the validity of the analysis in any sense, but it should be pointed out to the reader.

Answer: Thank you for the suggestion. We are pleased to accommodate it.

Associated Data

    Data Availability Statement

    The data underlying this research cannot be shared due to the ethical and copyright restrictions surrounding social media data. The Methods section contains detailed information to allow replication of the study. Any queries about the methodology should be directed to the corresponding author.


    Articles from F1000Research are provided here courtesy of F1000 Research Ltd
