PLOS ONE. 2021 Jul 9;16(7):e0253569. doi: 10.1371/journal.pone.0253569

Evidence of disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy

Samantha Ajovalasit 1,2,*, Veronica Maria Dorgali 3, Angelo Mazza 1, Alberto d’Onofrio 4,5, Piero Manfredi 6
Editor: Alexandre Bovet
PMCID: PMC8270452  PMID: 34242253

Abstract

Background

In Italy, in recent years, vaccination coverage for key immunizations such as MMR has declined to worryingly low levels, accompanied by large measles outbreaks. In response, in 2017, the Italian government expanded the number of mandatory immunizations, introducing penalties for the families of unvaccinated children. During the 2018 general election campaign, immunization policy entered the political debate, with the government in charge blaming the opposition for fuelling vaccine scepticism. A new government (formed by the former opposition), established in 2018, temporarily relaxed the penalties and announced the introduction of forms of flexibility.

Objectives and methods

First, we supplied a definition of disorientation as the “lack of well-established and resilient opinions among individuals, therefore causing them to change their positions as a consequence of sufficient external perturbations”. Second, we proposed procedures for testing for the presence of both short- and longer-term collective disorientation in Twitter signals. Third, a sentiment analysis of tweets posted in Italian during 2018 on immunization topics, and the related polarity evaluations, were used to investigate whether the contrasting announcements at the highest political level might have generated disorientation amongst the Italian public.

Results

Vaccine-relevant tweeters’ interactions peaked in response to the main political events. Of the retained tweets, 70.0% were favourable to vaccination, 16.4% unfavourable, and 13.6% undecided. The smoothed time series of polarity proportions exhibits frequent large changes in the favourable proportion, superimposed on a clear up-and-down trend synchronized with the switch between governments in Spring 2018, suggesting evidence of disorientation among the public.

Conclusions

The reported evidence of disorientation in opinions expressed on online social media shows that critical health topics, such as vaccination, should never be used to achieve political consensus. The problem is worsened by the lack of a strong Italian institutional presence on Twitter, which calls for efforts to counter misinformation and the ensuing spread of hesitancy. It remains to be seen how this disorientation will affect future parents’ vaccination decisions.

Introduction

The dramatic success of immunization programs in industrialized countries, with decades of high vaccine uptake and the ensuing herd immunity, is suffering a setback, namely the generalized fall in the perceived risks arising from vaccine-preventable infectious diseases. This promotes the spread of resistance, or reluctance, to vaccination. This phenomenon, nowadays identified as “vaccine hesitancy” [1–3], is currently considered one of the top threats to global health because of its pervasive and complex nature [4]. Ensuring vaccination programs’ resilience to the hesitancy threat is a significant task of current public health systems.

In Italy, MMR (measles, mumps, and rubella) vaccination coverage at 24 months, which was in the region of 91% in 2010, fell to 85.3% in 2015 and remained low thereafter. In parallel, large measles outbreaks were observed, with 844 cases in 2016, 4,991 in 2017 (with four deaths), and 2,029 cases in the first six months of 2018 [5–7].

In response, in the Italian National Immunization Plan for 2017–2019, the Italian government acted to increase the number of mandatory immunizations [8–11] by introducing penalties for non-vaccinators in the form of fines and restrictions on admittance to kindergarten and school. The decree’s ethical implications, mainly the introduction of sanctions, were strongly challenged, especially on online social media (OSM). With the 2018 general elections, vaccination policy flooded the political debate, with the government accusing the opposition of fuelling scepticism about vaccination. The new government, established in June 2018 and composed of a coalition between an anti-establishment movement and a far-right party, allowed, after several contrasting announcements, unvaccinated children to be admitted to school.

Over the past fifteen years, OSM have emerged as a major popular source of information, including on health topics [12–14]. However, within OSM, anyone can express his or her own opinion, regardless of expertise in the particular topic considered. As a result, parents’ immunization decisions can be influenced by misconceptions and false information [15–17]. The massive misinformation pervading the OSM environment has been defined by the World Economic Forum as one of the main threats to current societies [15–19], in particular because of the emergence of echo chambers, i.e., “polarised groups of like-minded people who keep framing and reinforcing a shared narrative” [16].

Although opposition to vaccination, likewise favoured by misinformation, has existed since the very introduction of the smallpox vaccine [20], recently, because of increasing Internet access and the birth of new communication platforms, misinformation has been spreading at unprecedented rates [16, 21].

We focused our analysis on Twitter, a microblogging service which, like Facebook, is considered a public square where anyone can express and share opinions and participate in discussions. On Twitter, user A may see user B’s messages without being involved in a direct relationship (“follow”). Twitter thus represents a social network and an information network at the same time. This structure makes Twitter different from Facebook: for example, Facebook allows easier identification of echo chambers and homophily [22, 23].

For epidemiological purposes, Twitter data have been used for surveillance and descriptive studies, e.g., the spread of seasonal flu, the 2009 H1N1 pandemic outbreak, and the 2014 Western Africa Ebola outbreak. In all these examples, a clear correlation between the temporal spread of infections and social media interactions emerged [24].

Supported by steadily increasing Internet access, Twitter is currently one of the primary tools used by political leaders to communicate with their public [25–27]. However, this implies that when political leaders intervene on scientific subjects, such as immunization, they exert tremendous pressure on public opinion [28]. When health-related topics become the subject of political disputes, with contrasting information being massively delivered by persons without formal qualifications, some individuals may be induced to change their opinions compulsively, originating a condition of disorientation. Properly defining disorientation and testing for its presence in Twitter signals is a main task of this article. A preliminary literature search on the subject suggested that the issue of “disorientation”, though ubiquitous in many disciplines such as the medical and cognitive sciences, the spatial and information sciences, and social science [29–31], does not seem to have received systematic attention in the literature on information, opinions, and online social media. Generally speaking, “disorientation” can simply be a consequence of the lack of adequate information, of over-exposure to information, including misinformation, and, more generally, of information disorder [32]. All these factors can make it difficult for people to properly filter the mass of available information. To simplify things and develop simple tests for the presence of disorientation in data, we assumed that disorientation (towards vaccines) could be coarsely identified as the lack of well-established and resilient opinions among individuals, therefore causing individuals to change their opinions as a consequence of sufficient external perturbations. The question then shifts to which perturbations might be “sufficient”. Clearly, some perturbations, typically those arising as direct responses of the public to media news, can be very short-lasting.
In relation to this, we define “short-term disorientation” as a state in which people keep suddenly (and often) changing their opinion on the debated subject because of the overwhelming impact of multiple pieces of contrasting information. However, other perturbations, such as those following from non-scientific arguments persistently promoted or supported at the highest political level, e.g., by a political party or even a government, might generate longer-term effects that we term “long-term disorientation”.

Consistently, in this article we used sentiment analysis to describe the trend in communication about vaccines on Twitter in Italy throughout 2018 and to evaluate the polarity of opinions about immunization, as preliminary steps to bring evidence, by appropriate statistical tests, that the prolonged phase of contrasting political announcements on a sensitive topic such as mass immunization might have generated a condition of disorientation among the Italian public.

Materials and methods

Twitter is an online social media and micro-blogging service launched in 2006. Users (“tweeters”) write texts (“tweets”) of at most 280 characters, which are publicly visible by default unless users decide to protect their tweets. According to statista.com (accessed on March 13th, 2021), in 2021 Twitter had an estimated 340 million active users worldwide.

Data extraction, transformation, and cleaning

We collected tweets in Italian posted in 2018 and containing at least one of a set of keywords related to vaccination behaviour and vaccine-preventable infectious diseases, using the Twitter Advanced Search Tool. In total, we retrieved 443,167 tweets. Keywords were chosen from a review of the previous literature and appropriately expanded for our purposes. Subsequently, we applied supervised classification techniques to screen out irrelevant tweets and analyse the polarity proportions of the retained ones. Consistently, we deliberately chose a broad set of keywords in order to retrieve the largest possible set of tweets, and then applied finer tools to identify and filter out noise.

Data cleaning was performed using the Python programming language. A probabilistic approach was used to re-filter tweets written in Italian; then, possible duplicates were removed using the tweet ID field, with 318,371 tweets retained for the analysis. For each post, we tracked subsequent interactions by counting the number of re-tweets and likes.
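The deduplication step can be sketched as follows (a minimal illustration, not the authors' code; the field names are assumptions, and the probabilistic language filter is omitted):

```python
from typing import Iterable


def deduplicate_by_id(tweets: Iterable[dict]) -> list[dict]:
    """Drop duplicate posts, keeping the first occurrence of each tweet ID.

    Each tweet dict is assumed to carry a unique 'id' field (the tweet ID
    field mentioned in the text) plus whatever payload is used downstream.
    """
    seen: set[str] = set()
    unique = []
    for tweet in tweets:
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            unique.append(tweet)
    return unique
```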

Tweets classification, sentiment analysis, and training set

Sentiment analysis deals with the computational treatment of opinions, sentiments, and subjectivity within texts [33, 34]. Here, we used sentiment analysis methods to classify tweets. In our analysis, we identified four categories: (i) favourable (F), if the tweet unambiguously showed a convinced pro-vaccine position; (ii) contrary (C), if the tweet unambiguously showed a position contrary to vaccination; (iii) undecided (U), if the tweet was neither favourable nor unfavourable; and (iv) out of context (OOC), if the tweet was unrelated to immunization or did not fit any of the preceding categories (e.g., if it was merely spreading news or linking to another source, without expressing an opinion or a clear position). Tweets from the latter category were removed from subsequent analyses. Throughout the rest of the article, we will generically label the observed proportions of the three categories (F, C, U), computed over any time period, as the “polarity” proportions. The sum of the contrary and undecided proportions can be taken as an estimate of the hesitant proportion in the overall Twitter population during the period considered. Notably, this is a wide population, potentially including people not involved in vaccination decisions, either currently or in the future, and therefore not necessarily relevant to future vaccine coverage. Nonetheless, they represent a large population participating in a hot public debate and are therefore relevant to opinion formation.

A supervised classification procedure [35, 36] was used to classify tweets into the four categories defined above. First, a training set was created by manually tagging a random sample of 15,000 tweets out of the 318,371 retained for the analysis. Manual labelling was done by 15 trained university students. In particular, 15% of these 15,000 tweets were intentionally duplicated to measure the mutual (dis)agreement among annotators. The resulting accuracy was 0.6298 (CI 0.6034–0.6557), with a Fleiss’ kappa of 0.410, indicating fair agreement.
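For reference, the inter-annotator agreement statistic used above can be computed from a table of annotation counts as below (a self-contained sketch of the standard Fleiss' kappa formula, not the authors' implementation):

```python
import numpy as np


def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an (n_items x n_categories) table of rating counts.

    counts[i, j] = number of annotators assigning item i to category j;
    every item must be rated by the same number of annotators.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                    # annotators per item
    p_j = counts.sum(axis=0) / counts.sum()      # category prevalence
    # per-item agreement, averaged, vs. chance agreement
    P_i = (np.sum(counts**2, axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.sum(p_j**2)
    return (P_bar - P_e) / (1 - P_e)
```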

Next, we manually reviewed the duplicated tweets and those that showed invalid content (such as hashtag-only or URL-only tweets). Tweets labelled during a previous explorative analysis were added. Eventually, we obtained a set of 14,306 unique tweets that made up the training set. In the classification process, we used unigrams and bigrams; we kept hashtags (e.g., #vaccino) and removed mentions (e.g., @screenname).

Finally, the training set was used to compare five alternative classification models based on the following algorithms: Classification Tree, Random Forest, Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbours.
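A comparison of this kind might be set up, for instance, with scikit-learn pipelines feeding unigram+bigram TF-IDF features (as described above) into each candidate algorithm. This is an illustrative sketch under assumed defaults; the authors' actual feature engineering and hyperparameters are not specified here:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def _tfidf() -> TfidfVectorizer:
    # unigrams and bigrams, as in the text
    return TfidfVectorizer(ngram_range=(1, 2))


def build_candidates() -> dict:
    """One pipeline per algorithm compared in the text."""
    return {
        "tree": make_pipeline(_tfidf(), DecisionTreeClassifier()),
        "forest": make_pipeline(_tfidf(), RandomForestClassifier()),
        "nb": make_pipeline(_tfidf(), MultinomialNB()),
        "svm": make_pipeline(_tfidf(), LinearSVC()),
        "knn": make_pipeline(_tfidf(), KNeighborsClassifier(n_neighbors=3)),
    }
```

Each pipeline exposes the usual `fit`/`predict` interface, so the five candidates can be scored side by side with cross-validation on the training set.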

Seeking evidence of disorientation in Twitter data

Consistently with the proposed definition of disorientation, in what follows we propose a few procedures aimed at testing for the presence of disorientation about vaccination amongst tweeters in Italy. We distinguish between short- and long-term disorientation. The former refers to a condition in which people keep suddenly (and often) changing their opinion on the debated subject because of the overwhelming impact of short-term information disorder. The latter refers to longer-term opinion perturbations, as can be the case when the highest political actors, e.g., a political party or even a government, persistently promote or support non-scientific arguments, including forms of denialism, thereby generating longer-term disorientation effects, including disorientation waves.

Short-term disorientation

To seek short-term disorientation symptoms in the data, we applied a number of tests relying on the size of the deviations (measured through the variance) from appropriately defined average opinions. The tests were conducted considering all the tweets retained, assuming they represented a random sample of an appropriate underlying super population. In particular, we proposed three different tests.

A basic multinomial test of daily tweeting trends

We applied a simple multinomial test to separate changes in the polarity proportions resulting from randomness from those that did not. Our null hypothesis is that the (true) proportions of the categories (F, C, U) are the ones observed over the entire year. In practice, we computed, for every day, the probability (p-value) that the observed vector of opinion proportions is a (random) sample drawn from a multinomial population whose parameter vector is given by the overall yearly mean of the polarity proportions.
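In practice, the daily test can be approximated by a chi-square goodness-of-fit test against the yearly proportions. The sketch below assumes daily counts large enough for the asymptotic approximation; an exact multinomial test could be substituted on low-count days:

```python
from scipy.stats import chisquare


def daily_multinomial_pvalue(day_counts, yearly_props):
    """p-value for one day's (F, C, U) counts against the yearly proportions.

    day_counts: observed counts of favourable/contrary/undecided tweets;
    yearly_props: the yearly mean polarity proportions (summing to 1),
    which form the multinomial null hypothesis.
    """
    n = sum(day_counts)
    expected = [n * p for p in yearly_props]
    return chisquare(day_counts, f_exp=expected).pvalue
```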

A “running” multinomial test for fast-changing opinions

To further understand short-term changes in opinions, we tested whether each observed daily vector of polarity proportions represented a random sample drawn from a “running” multinomial population whose parameter vector is given by the average polarity proportions observed over the preceding 15 days. The figure of 15 days, representing our null hypothesis, was selected somewhat arbitrarily as a minimal duration representing a “stable” opinion (or “average preference persistence”) in the short term. However, a sensitivity analysis was conducted to check the robustness of this choice.
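The running variant can be sketched with a rolling 15-day mean of the daily proportions as the null (illustrative only; the column names and the chi-square approximation to the multinomial test are assumptions):

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare


def running_multinomial_pvalues(daily_counts: pd.DataFrame, window: int = 15) -> pd.Series:
    """For each day, test the observed (F, C, U) counts against the mean
    polarity proportions of the preceding `window` days (the running null).

    daily_counts: one row per day, with columns such as ['F', 'C', 'U']
    holding the number of tweets per polarity category.
    Returns one p-value per day (NaN for the first `window` days).
    """
    props = daily_counts.div(daily_counts.sum(axis=1), axis=0)
    # mean proportions over the preceding `window` days (today excluded)
    null_props = props.rolling(window).mean().shift(1)

    pvals = pd.Series(np.nan, index=daily_counts.index)
    for day in daily_counts.index[window:]:
        obs = daily_counts.loc[day].to_numpy(dtype=float)
        exp = null_props.loc[day].to_numpy() * obs.sum()
        pvals.loc[day] = chisquare(obs, f_exp=exp).pvalue
    return pvals
```

Re-running the function with different `window` values gives the sensitivity analysis mentioned above.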

A running-variance test

Furthermore, over the whole observation period, we computed a running 15-day variance of the proportion favourable to vaccination and tested (by the standard chi-square test) the null hypothesis that the 15-day variance equals the overall variance over the entire year.
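This test relies on the classical result that, under the null, (n−1)s²/σ₀² follows a chi-square distribution with n−1 degrees of freedom. A two-sided version might look like the following sketch (our reading of the test; the authors do not spell out the exact form):

```python
import numpy as np
from scipy.stats import chi2


def variance_chi2_pvalue(window_values, sigma2_null):
    """Two-sided chi-square test of H0: Var(window) == sigma2_null.

    window_values: the favourable proportion over a 15-day window;
    sigma2_null: the whole-year variance of the favourable proportion.
    """
    x = np.asarray(window_values, dtype=float)
    n = x.size
    stat = (n - 1) * x.var(ddof=1) / sigma2_null   # ~ chi2(n - 1) under H0
    cdf = chi2.cdf(stat, df=n - 1)
    return 2 * min(cdf, 1 - cdf)                   # two-sided p-value
```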

Longer-term disorientation

As for long-term perturbations, we applied a polynomial fit to the smoothed trend of polarity proportions over the entire year to look for possible evidence of long-term disorientation amongst the public. Smoothing was carried out using the discrete beta-kernel based procedure proposed by [37]; the use of beta kernels overcomes the problem of boundary bias commonly arising from the use of symmetric kernels. The finite support of the beta kernel function can be matched to our time interval, so that, when smoothing near the boundaries of the time interval, no weight is allocated outside the support. The smoothing bandwidth parameter was chosen by cross-validation.
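A beta-kernel smoother in the spirit of [37] can be written as below. This is a simplified, Nadaraya-Watson style sketch, with observation times rescaled to [0, 1] and the bandwidth fixed rather than cross-validated; note how the beta density's finite support keeps all weight inside the interval:

```python
import numpy as np
from scipy.stats import beta


def beta_kernel_smooth(times, values, bandwidth, grid):
    """Beta-kernel smoother on [0, 1].

    For each grid point t, observations are weighted by the density of a
    Beta(t/b + 1, (1 - t)/b + 1) distribution evaluated at their (rescaled)
    times, so no weight ever falls outside the observation interval.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    smoothed = np.empty(len(grid))
    for k, t in enumerate(np.asarray(grid, dtype=float)):
        w = beta.pdf(times, t / bandwidth + 1, (1 - t) / bandwidth + 1)
        smoothed[k] = np.sum(w * values) / np.sum(w)
    return smoothed
```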

Results

Automatic data classification and polarity proportions

Among the five classification algorithms tested, the Support Vector Machine (SVM) performed best (details in the online appendix) and was consequently adopted. The main summary results, based on standard measures [38], are reported in Table 1. These measures are consistent with the classification provided by human annotation (S5 Table in the S1 File). As mentioned in the previous section, by selecting a broad set of keywords we chose to retrieve a larger set of tweets and left to the supervised classification algorithms the task of identifying noise. Consistently, 57.8% of the total tweets were classified as out of context and discarded. Among the remaining tweets, the overall proportions classified as favourable, contrary, and undecided were F = 70.0% (CI: 61.5–74.0), C = 16.4% (CI: 12.7–25.2), and U = 13.6% (CI: 8.06–20.5), respectively.

Table 1. Results of the support vector classifier (the classifier eventually selected) for the four categories considered in this work (favourable, contrary, undecided and out of context).

                precision   recall   f1-score   support
Favorable            0.43     0.46       0.44       785
Contrary             0.24     0.19       0.21       318
Undecided            0.20     0.15       0.17       299
Out of Context       0.63     0.67       0.65      1460
Accuracy                                 0.50      2862
Macro avg            0.37     0.37       0.37      2862
Weighted avg         0.49     0.50       0.49      2862

Hesitancy

The proportion of hesitant individuals in our overall Twitter population, given by the sum of the contrary and undecided proportions, was 30.1%.

Institutional presence on Twitter

The Italian Ministry of Health’s use of Twitter is limited to press communications and the publication of statistics. Between 2013 and September 18th, 2019, the Italian Ministry of Health tweeted 2,454 times (172 of which included the word vaccin*), i.e., 25% of the figure observed in France for the Ministère des Solidarités et de la Santé. Essentially the same holds for the Italian National Institute of Health.

Temporal trends

The daily levels of Twitter interaction (including original tweets and subsequent likes or re-tweets) during 2018 (see Fig 1) show three prominent peaks, each accounting for hundreds of thousands of interactions. These three peaks represent users’ responses to well-identified triggering events. The first peak, recorded on June 22nd, 2018, is the second highest; it follows a public speech by the Italian Minister of the Interior that defined the number of mandatory immunizations in the National Immunization Plan as “intolerably excessive”. The polarity proportions observed on this day were F = 71.7%, C = 15.8%, and U = 12.5%, respectively. The second peak, recorded on August 4th, 2018, is the highest; it follows a government decree suspending the sanctions, such as non-admission to school, imposed by the previous government on unvaccinated children. Notably, whereas the number of tweets on this day exhibited a dramatic increase compared to the previous days, the underlying polarity proportions (F = 77.7%, C = 11.8%, and U = 10.5%) showed only moderate variations. The third peak (September 5th, 2018) follows the government’s change of position on easing the sanctions on unvaccinated children (F = 73.6%, C = 14.4%, U = 12.0%). The graph in Fig 1 shows a number of further, lower peaks, still attributable to interventions in the political debate, over a long-term background of low-level activity.

Fig 1. Tweeting about vaccines in Italy during 2018: Time series of total daily interaction counts (tweets, likes, and re-tweets) and exact dates at main triggering political events or speeches.


Testing the presence of disorientation

With the caveats reported above, the proportion of people “not favourable” to immunization, around 30%, is a worrying symptom of the complicated state of opinions about vaccination in Italy. The results of the various procedures proposed to investigate disorientation are reported below.

Short-term disorientation

Using as the null hypothesis the polarity proportions observed for the whole year (F = 70.0%, C = 16.4%, U = 13.6%), the basic multinomial test (Fig 2) is significant on 132 days (α = 5%), against an expectation of approximately 18 days (5% of 365).

Fig 2. Results of the basic multinomial test.


Blue circles, green squares, and purple diamonds denote the days when the null hypothesis was rejected at the significance levels of 10%, 5%, and 1%, respectively. For readability, we show the smoothed polarity proportions; the real (raw) proportions used in the multinomial tests are reported in S1 Fig in the online appendix.

The running multinomial test (Fig 3) is significant on 101 days (α = 5%), providing further evidence of instability in the polarity proportions.

Fig 3. Results of the running multinomial test at 15 days.


Blue circles, green squares, and purple diamonds denote the days when the null hypothesis was rejected at the significance levels of 10%, 5%, and 1%, respectively.

Last, the running-variance test (Fig 4) is significant on 80 days (α = 5%). In particular, significantly high variances appeared in February and March 2018, at the end of the electoral campaign and around the voting days. In contrast, significantly low variances appeared after the new government took office and before schools opened, suggesting a possible stabilization of opinions after the transition from one government to the next.

Fig 4. 15-day running variance of the proportion favourable to vaccination (black line).


Blue circles, green squares, and purple diamonds denote the days when the null hypothesis was rejected at the significance levels of 10%, 5%, and 1%, respectively.

Overall, the three tests agree in providing statistical evidence of rapid shifts in vaccination opinions, denoting the presence of short-term disorientation according to the first definition provided.

Smoothing and longer-term disorientation

The smoothed time series show that many of the sudden changes of the daily polarity proportions originate from a rather small number of more stable and longer-lasting fluctuations (Fig 5). For the proportion favourable to immunization, the amplitude of these more stable oscillations is remarkable (from 60% to 76%), suggesting a substantial size of the “non-resilient” component of the population favourable to vaccination.

Fig 5. Kernel smoothing of daily polarity proportion jointly with the corresponding linear and quadratic interpolations.


Panels (a),(b),(c) report the favourable, contrary and undecided proportions, respectively.

A stepwise polynomial fit of the smoothed polarity proportions (Fig 5) selected the parabolic function as the best one, allowing for a dramatic increase in the determination coefficient compared to the linear case, whereas higher-order functions increased R2 only negligibly (R2 values are reported in Fig 5).
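The stepwise selection described above can be illustrated as follows (a sketch: the 1% minimum R² gain threshold is an assumption for illustration, not the authors' stated criterion):

```python
import numpy as np


def poly_r2(x, y, degree):
    """R^2 of a least-squares polynomial of the given degree fitted to (x, y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    ss_res = np.sum((y - np.polyval(coeffs, x)) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


def select_degree(x, y, max_degree=4, min_gain=0.01):
    """Increase the degree while R^2 improves by at least `min_gain`."""
    best, r2_best = 1, poly_r2(x, y, 1)
    for d in range(2, max_degree + 1):
        r2 = poly_r2(x, y, d)
        if r2 - r2_best < min_gain:
            break
        best, r2_best = d, r2
    return best, r2_best
```

On a trend that is genuinely parabolic, the gain from degree 1 to 2 is large while higher degrees add almost nothing, so the procedure stops at the quadratic, mirroring the selection reported in the text.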

Between January and May, the parabolic trend exhibits a clear increase in the favourable proportion (and a parallel decline in the undecided and contrary proportions), possibly reflecting the “tail” of the positive impact of the “vaccine decree” issued by the previous government, followed by a marked decline after the new government took office, with the favourable proportion losing more than 7% by the end of the year. While we are not able to establish a direct causal link between the change of government and the variation in polarity proportions, the association remains of concern in light of its political context.

Discussion

The contribution’s main objective was to investigate whether the 2018 series of contrasting announcements on immunization policy at the highest Italian political level originated disorientation amongst the Italian public. We carried out a sentiment analysis on tweets posted in Italian during 2018 containing vaccine-related keywords.

Our results are as follows. A polarity analysis showed that the proportion of tweets favourable to vaccination was about 70%, the unfavourable proportion about 16%, and the “undecided” about 13%, in line with similar studies [39–41], yielding an estimate of the hesitant proportion in the region of 30%. As for the temporal trends of tweets, the relevant interactions showed clear peaks in correspondence with vaccine-related news and political speeches, indicating that this OSM “is used as an agora for matters of public interest” [42]. Finally, as for the key category of “disorientation”, we proposed a twofold definition: short-term disorientation, characterised by unstable, fast-changing opinions about vaccination, versus long-term disorientation. Our results documented the presence of short-term disorientation through a range of alternative tests. Additionally, a clear yearly trend emerged, showing that the proportion favourable to vaccination increased while the previous government, which strongly supported immunization in the media, was in charge (until May 2018), and started declining as soon as the new government, fostering a more ambiguous position, took office. We find it hard to believe that this association was unrelated to the new government’s continued and ambiguous series of announcements.

Compared to similar studies on vaccination opinions in online social media, we believe that the attempt to define the concept of disorientation, and to measure and document it, is a major strength of the present work. We remark that the available data only allowed us to test for disorientation at the “collective” (or aggregate) level. A different research design, tracking users over time (which would require data at the individual level), would in principle allow the investigation of disorientation at the individual (or micro) level.

The reported evidence of disorientation on vaccination is suggestive of the potentially harmful role played by the use of critical health topics for purposes of political consensus. Sadly, such events are not new; consider, e.g., the dramatic impact of the HIV denialism promoted by a former president of South Africa during a critical phase of the HIV epidemic [43]. However, these aspects can become especially important due to online social media’s increasing role as a source of information (mainly misinformation) [44], which might generate social pressures eventually harmful to vaccine uptake. Said otherwise, persistent disorientation can be inflated by online misinformation, finally drifting into hesitancy. From this viewpoint, we believe that the category of disorientation will deserve future inquiry in more focused studies.

In the Italian case, the effect of disorientation might have been worsened by the near absence, until the end of 2018, of a stable institutional presence on Twitter by Italian public health institutions. This fact, which appears in continuity with the traditional lack of communication between Italian public health institutions and citizens long before the digital era [14], calls for rapid public efforts towards an active presence on online social media, aimed at detecting and countering the spread of misinformation and the possible further spread of vaccine hesitancy [45, 46]. Clearly, in the current Italian context, with the ongoing COVID-19 third wave triggered by the new virus variants, the emergency situation has forced a temporary improvement. However, it cannot be disregarded that at the end of the first pandemic wave (June–July 2020), a strikingly large proportion (41%) of Italian adults declared themselves contrary to COVID-19 vaccination (https://www.cattolicanews.it/vaccino-anti-covid-italiani-poco-propensi).

Though not designed for this purpose, this analysis might provide valuable suggestions for vaccine decision-makers. Indeed, the large proportion of hesitant tweeters (in the region of 30%) should be carefully considered, if not for their potential impact on current coverage, at least for the social pressure they might exert within online social media, which might eventually feed back negatively on future coverage, as previously pinpointed.

Regarding the limitations of the present analysis, it must be acknowledged that the definitions adopted for the concept of disorientation, and its empirical investigation, were given in an ad hoc manner for the present analyses, due to the lack of literature support. From this viewpoint, the category of disorientation will deserve future inquiry in more focused studies, both conceptual and applied. We also have to mention the sub-optimal accuracy of the adopted classification, whose scores are only slightly above those that might result from a random classifier. This was mostly due to the agreement between human annotators, which was somewhat lower than expected. Nonetheless, we feel that these types of issues can often arise when dealing with controversial topics such as the one covered by this work, which can trouble even well-trained human annotators.

More generally, the intrinsic limitations of Twitter data (e.g., the maximum length of texts; the use of slang, abbreviations, and irony; the tendency to overcome the maximum length by subdividing a single thread into multiple tweets) have largely been acknowledged in the OSM and related social science and public health literature [32, 40].

A further possible limitation lies in the fact that a small proportion of users is highly active and therefore responsible for a substantial proportion of tweets. This can introduce a bias towards the most active users.

From a broader perspective, it must be recalled that the spread of vaccine hesitancy pairs with the widespread diffusion of the so-called “Post-Trust Society” [47] and the “Post-Truth Era” [48]. The present investigation can help public health policymakers better orient vaccine-related communication to mitigate the impact of vaccine hesitancy and refusal. This is, however, only part of the story. Indeed, it is fundamental for public health systems to develop real-time tools to identify fake news, as well as tweets hostile to immunization, which might have the largest impact, and to reply to them appropriately. This would require that public health communication agencies and institutions also be active in the real-time analysis of online media data, not just in the production of regular communication. On top of this, given the sensitive role of the immunization topic, it is surely urgent to develop a moral code preventing the use of such topics for purposes of political consensus and ensuring the avoidance of contradictions and ambiguities amongst government members.

A number of the previous points might be worth considering in future research, for example by comparing the language used by tweeters (regardless of their position towards vaccination) with the language of the tweets posted by public health institutions, which represents an important aspect of communication with the public, particularly with respect to “undecided” individuals, in order to enhance their vaccine confidence. A further point concerns the frequency of users spreading fake news. In this work, we took users as they were, without further control over their profiles. However, this is a key issue deserving careful investigation in future work. Also worth considering in future work is the quantitative importance of followers, who could represent a vehicle for the spread of misinformation, possibly distinguished by polarity, as well as that of the highly active tweeters that emerged in this study.

Supporting information

S1 Fig. Raw polarity proportions of tweets favourable (blue), contrary (purple), and undecided (green) with respect to the vaccination topic.

(TIF)

S1 File. This file contains the classification reports for all the classifiers tested (S1–S5 Tables in S1 File) and the keywords adopted to retrieve the tweets (S6 Table in S1 File).

(DOCX)

S2 File. This file contains the IDs of the tweets together with the class assigned to each tweet.

(ZIP)

Acknowledgments

We warmly thank three anonymous referees and an Editor of the Journal whose valuable comments allowed us to greatly improve the quality of the manuscript. We also thank Emanuele Del Fava and Alessia Melegaro for their valuable comments on a previous draft of this work.

Data Availability

Due to restrictions in the Twitter terms of service (https://twitter.com/en/tos) and the Twitter developer policy (https://developer.twitter.com/en/developer-terms/agreement-and-policy.html), we cannot provide the full text of the tweets used in this study. However, for replication purposes, we provide the ID of every tweet used, including those in the manually labelled training set; details on how to fetch tweets given their IDs are available at https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-lookup. The dataset of tweet IDs corresponds to the data available up to January 7, 2019. Furthermore, note that any sample of tweets containing the same set of keywords listed in the manuscript and posted over the same time period is likely to yield study findings similar to those reported in the article. We extracted the data available at that time from the Twitter public web interface; these data can also be purchased from Twitter via their Historical PowerTrack API (http://support.gnip.com/apis/historical_api2.0/). The authors provide the minimal dataset in the form of tweet IDs.
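As an illustration only (this is not the authors’ code), the shared tweet IDs could be re-hydrated through the v1.1 statuses/lookup endpoint referenced above, which accepts up to 100 IDs per request; the bearer token and the minimal error handling below are assumptions of this sketch.

```python
import json
import urllib.parse
import urllib.request

LOOKUP_URL = "https://api.twitter.com/1.1/statuses/lookup.json"

def chunk_ids(tweet_ids, size=100):
    """Split the ID list into batches of at most `size` (the endpoint limit)."""
    return [tweet_ids[i:i + size] for i in range(0, len(tweet_ids), size)]

def hydrate(tweet_ids, bearer_token):
    """Fetch full tweet objects for the given IDs; deleted tweets are simply absent."""
    tweets = []
    for batch in chunk_ids(tweet_ids):
        query = urllib.parse.urlencode(
            {"id": ",".join(batch), "tweet_mode": "extended"}
        )
        req = urllib.request.Request(
            f"{LOOKUP_URL}?{query}",
            headers={"Authorization": f"Bearer {bearer_token}"},
        )
        with urllib.request.urlopen(req) as resp:
            tweets.extend(json.load(resp))
    return tweets
```

Note that tweets deleted since January 2019 will not be returned, so a re-hydrated corpus may be somewhat smaller than the original.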

Funding Statement

The authors received no specific funding for this work.

References

  • 1. MacDonald NE, Eskola J, Liang X, Chaudhuri M, Dube E, Gellin B, et al. Vaccine hesitancy: Definition, scope and determinants. Vaccine. 2015;33(34):4161–4. doi: 10.1016/j.vaccine.2015.04.036
  • 2. Bedford H, Attwell K, Danchin M, Marshall H, Corben P, Leask J. Vaccine hesitancy, refusal and access barriers: The need for clarity in terminology. Vaccine. 2018;36(44):6556–8. doi: 10.1016/j.vaccine.2017.08.004
  • 3. MacDonald N, Dubé E, Butler R. Vaccine hesitancy terminology: A response to Bedford et al. Vaccine. 2019;37(30):3947–8. doi: 10.1016/j.vaccine.2017.11.060
  • 4. World Health Organization. Ten threats to global health in 2019. https://www.who.int/emergencies/ten-threats-to-global-health-in-2019
  • 5. Pezzotti P, Bellino S, Prestinaci F, Iacchini S, Lucaroni F, Camoni L, et al. The impact of immunization programs on 10 vaccine preventable diseases in Italy: 1900–2015. Vaccine. 2018;36(11):1435–43. doi: 10.1016/j.vaccine.2018.01.065
  • 6. Siani A. Measles outbreaks in Italy: A paradigm of the re-emergence of vaccine-preventable diseases in developed countries. Prev Med. 2019;121:99–104. doi: 10.1016/j.ypmed.2019.02.011
  • 7. Istituto Superiore di Sanità. Morbillo & Rosolia News. Rapporti N. 34/2017, N. 37/2018, N. 43/2018.
  • 8. Gualano MR, Bert F, Voglino G, Buttinelli E, D’Errico MM, De Waure C, et al. Attitudes towards compulsory vaccination in Italy: Results from the NAVIDAD multicentre study. Vaccine. 2018;36(23):3368–74. doi: 10.1016/j.vaccine.2018.04.029
  • 9. Italian Ministry of Health. Piano Nazionale Prevenzione Vaccinale (PNPV) 2017–2019. 2017.
  • 10. Italian Ministry of Health. The decree on vaccine-based prevention. 2017.
  • 11. http://www.salute.gov.it/portale/documentazione/p6_2_8_3_1.jsp?lingua=italiano&id=20
  • 12. Kata A. Anti-vaccine activists, Web 2.0, and the postmodern paradigm: An overview of tactics and tropes used online by the anti-vaccination movement. Vaccine. 2012;30(25):3778–89. doi: 10.1016/j.vaccine.2011.11.112
  • 13. Aquino F, Donzelli G, De Franco E, Privitera G, Lopalco PL, Carducci A. The web and public confidence in MMR vaccination in Italy. Vaccine. 2017;35(35):4494–8. doi: 10.1016/j.vaccine.2017.07.029
  • 14. Benelli E. The role of the media in steering public opinion on healthcare issues. Health Policy. 2003;63(2):179–86. doi: 10.1016/s0168-8510(02)00064-7
  • 15. Kata A. A postmodern Pandora’s box: Anti-vaccination misinformation on the Internet. Vaccine. 2010;28(7):1709–16. doi: 10.1016/j.vaccine.2009.12.022
  • 16. Schmidt AL, Zollo F, Scala A, Betsch C, Quattrociocchi W. Polarization of the vaccination debate on Facebook. Vaccine. 2018;36(25):3606–12. doi: 10.1016/j.vaccine.2018.05.040
  • 17. Zollo F, Novak PK, Del Vicario M, Bessi A, Mozetič I, Scala A, et al. Emotional dynamics in the age of misinformation. PLoS One. 2015;10(9):1–22. doi: 10.1371/journal.pone.0138740
  • 18. Howell L. Digital wildfires in a hyperconnected world. WEF Report; 2013.
  • 19. Del Vicario M, Bessi A, Zollo F, Petroni F, Scala A, Caldarelli G, et al. The spreading of misinformation online. Proc Natl Acad Sci U S A. 2016. doi: 10.1073/pnas.1517441113
  • 20. Salmon DA, Teret SP, MacIntyre CR, Salisbury D, Burgess MA, Halsey NA. Compulsory vaccination and conscientious or philosophical exemptions: Past, present, and future. Lancet. 2006;367(9508):436–42. doi: 10.1016/S0140-6736(06)68144-0
  • 21. Bessi A, Coletto M, Davidescu GA, Scala A, Caldarelli G, Quattrociocchi W. Science vs conspiracy: Collective narratives in the age of misinformation. PLoS One. 2015;10(2):1–17. doi: 10.1371/journal.pone.0118093
  • 22. Gottfried J, Shearer E. News Use Across Social Media Platforms 2016. 2016. http://assets.pewresearch.org/wp-content/uploads/sites/13/2016/05/PJ_2016.05.26_social-media-and-news_FINAL-1.pdf
  • 23. Colleoni E, Rozza A, Arvidsson A. Echo Chamber or Public Sphere? Predicting Political Orientation and Measuring Political Homophily in Twitter Using Big Data. J Commun. 2014;64(2):317–32.
  • 24. Paul MJ, Dredze M, Broniatowski D. Twitter Improves Influenza Forecasting. PLoS Curr. 2014;1–12.
  • 25. Becatti C, Caldarelli G, Lambiotte R, Saracco F. Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections. Palgrave Commun. 2019;5(1):1–16.
  • 26. Campanale M, Caldarola EG. Revealing political sentiment with Twitter: The case study of the 2016 Italian constitutional referendum. Proc 2018 IEEE/ACM Int Conf Adv Soc Networks Anal Mining (ASONAM 2018). 2018;861–8.
  • 27. Ju A, Jeong SH, Chyi HI. Will Social Media Save Newspapers? Journal Pract. 2014;8(1):1–17. doi: 10.1080/17512786.2013.794022
  • 28. Zhang EJ, Chughtai AA, Heywood A, MacIntyre CR. Influence of political and medical leaders on parental perception of vaccination: A cross-sectional survey in Australia. BMJ Open. 2019;9(3):e025866. doi: 10.1136/bmjopen-2018-025866
  • 29. Acemoğlu D, Como G, Fagnani F, Ozdaglar A. Opinion fluctuations and disagreement in social networks. Math Oper Res. 2013;38(1):1–27.
  • 30. Keller AM, Taylor HA, Brunyé TT. Uncertainty promotes information-seeking actions, but what information? Cogn Res Princ Implic. 2020;5(1):1–17. doi: 10.1186/s41235-020-00245-2
  • 31. Shi G, Proutiere A, Johansson M, Baras JS, Johansson KH. The evolution of beliefs over signed social networks. Oper Res. 2016;64(3):585–604.
  • 32. Wardle C, Derakhshan H. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Council of Europe; 2017. http://tverezo.info/wp-content/uploads/2017/11/PREMS-162317-GBR-2018-Report-desinformation-A4-BAT.pdf
  • 33. Sebastiani F. Machine Learning in Automated Text Categorization. www.ira.uka.de/bibliography/Ai/automated.text [cited 2020 Jan 14].
  • 34. Pang B, Lee L. Opinion mining and sentiment analysis. Found Trends Inf Retr. 2008;2.
  • 35. Tan PN, Steinbach M, Kumar V. Introduction to Data Mining. Addison Wesley; 2006. ISBN 0-321-32136-7.
  • 36. Gokulakrishnan B, Priyanthan P, Ragavan T, Prasath N, Perera A. Opinion mining and sentiment analysis on a Twitter data stream. Int Conf Adv ICT Emerg Regions (ICTer 2012), Conference Proceedings. 2012;182–8.
  • 37. Mazza A, Punzo A. DBKGrad: An R Package for Mortality Rates Graduation by Fixed and Adaptive Discrete Beta Kernel Techniques. 2012. http://arxiv.org/abs/1211.1184
  • 38. Pedregosa F, et al. Scikit-learn: Machine Learning in Python. JMLR. 2011;12:2825–30.
  • 39. Giambi C, Fabiani M, D’Ancona F, Ferrara L, Fiacchini D, Gallo T, et al. Parental vaccine hesitancy in Italy: Results from a national survey. Vaccine. 2018;36(6):779–87. doi: 10.1016/j.vaccine.2017.12.074
  • 40. Salathé M, Khandelwal S. Assessing vaccination sentiments with online social media: Implications for infectious disease dynamics and control. PLoS Comput Biol. 2011;7(10). doi: 10.1371/journal.pcbi.1002199
  • 41. Larson HJ, Smith DMD, Paterson P, Cumming M, Eckersberger E, Freifeld CC, et al. Measuring vaccine confidence: Analysis of data obtained by a media surveillance system used to analyze public concerns about vaccines. Lancet Infect Dis. 2013;13(7):606–13. doi: 10.1016/S1473-3099(13)70108-7
  • 42. Garimella K, De Francisci Morales G, Gionis A, Mathioudakis M. The Effect of Collective Attention on Controversial Debates on Social Media. 2017.
  • 43. Wang J. AIDS denialism and “The humanization of the African”. Race & Class. 2008;49(3):1–18.
  • 44. Lachlan KA, Spence PR, Edwards A, Reno KM, Edwards C. If you are quick enough, I will think about it: Information speed and trust in public health organizations. Comput Human Behav. 2014;33:377–80.
  • 45. Bello-Orgaz G, Hernandez-Castro J, Camacho D. Detecting discussion communities on vaccination in Twitter. Futur Gener Comput Syst. 2017;66:125–36.
  • 46. Karafillakis E, Larson HJ. The benefit of the doubt or doubts over benefits? A systematic literature review of perceived risks of vaccines in European populations. Vaccine. 2017;35(37):4840–50. doi: 10.1016/j.vaccine.2017.07.061
  • 47. Löfstedt RE. Risk Management in Post-Trust Societies. 2005.
  • 48. Keyes R. The Post-Truth Era: Dishonesty and Deception in Contemporary Life. Macmillan; 2004.

Decision Letter 0

Alexandre Bovet

22 Apr 2020

PONE-D-20-01302

Evidence of distrust and disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy.

PLOS ONE

Dear Ms. Ajovalasit,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The reviewers raised major concerns about the manuscript. We invite you to thoroughly revise your manuscript by clearly addressing each issue raised by the two reviewers, including the issues about the clarity of the manuscript, the methodology used, and the soundness of the results and discussion.

We hope that the reviewer reports will allow you to submit a considerably improved version of the manuscript for further consideration.

We would appreciate receiving your revised manuscript by May 29 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as a separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as a separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as a separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Alexandre Bovet, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please include additional information about your dataset and ensure that you have included a statement specifying whether the collection method complied with the terms and conditions for the website.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

4. Please ensure that you refer to Figure 2 and 3 in your text as, if accepted, production will need this reference to link the reader to the figure.

5. Please include a caption for figure 3.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors analyzed twitter data involving vaccination-related Italian-language tweets from 2018. They randomly selected 15,000 tweets, which were then manually labelled by 15 students. They found that most of these tweets were composed by “serial twitterers,” with tweets tending to peak around main political events related to vaccination in the Italian context. The majority of these tweets (75%) showed favorable opinion towards vaccination, 14% were undecided, and 11% were unfavorable. The authors argue that there was evidence of “disorientation” among the public.

Overall, the manuscript as it currently stands is difficult to follow. There are many grammatical mistakes throughout the paper, which is distracting. One example comes from the title of a section “Matherials and Methods” (line 97). Many sentences are too long and difficult to follow. The whole manuscript would benefit from a careful reread from the authors and from asking a native English speaker to read over the text to point to language-related issues.

The introduction overall was quite good and provided relevant background for understanding terms related to vaccine hesitancy, vaccination discussions in online formats, and the Italian context. It would have been useful to have more background about which political parties specifically were involved in these political developments in Italy.

The objectives stated in lines 88-93 do not match those provided in the abstract. I have copied and pasted them below. In the abstract, there are 3. In the manuscript, there are 4. These objectives could be tightened up and clarified further. For example, what do the authors mean by “the trend of communication on vaccines on online social media”? This is a broad statement, it is not specific to Italy, and the authors do not consider social media sites outside of Twitter in their analysis. Are authors seeking to establish the prevalence of vaccine hesitancy on Twitter as a proxy for vaccine hesitancy among the actual population in Italy? In my view, these objectives/aims merit further clarification. This would help them better structure the results section.

Objectives and Methods. By a sentiment analysis on tweets posted in Italian during 2018, we attempted at (i) characterising the temporal flow of communication on vaccines over Twitter and underlying triggering events, (ii) evaluating the usefulness of Twitter data for estimating vaccination parameters, and (iii) investigating whether the contrasting announcements at the highest political level might have originated disorientation amongst the public.

(i) describe the trend of communication on vaccines on online social media, (ii) evaluate the potential usefulness of current Twitter data to estimate key epidemiological parameters such as e.g., the hesitant proportion in the population, (iii) evaluating the effectiveness of institutional communication as a tool to contrast misinformation, and (iv) showing evidence that the recent prolonged phase of contrasting announcements at the highest political level on a sensible topic such as mass immunization might have originated a distrust potentially seeding future coverage decline.

I found it difficult to follow the results section because there was not a clear structure in place. It might be helpful for the authors to provide a couple sentences in the introduction and results section that give the reader a sense of knowing what the paper is covering and how it is organized. I was surprised that the concept of “disorientation” was explained in the results section (line 185). If this is an important concept for the authors' analysis, it would have been helpful to have an explanation of it in the introduction.

Some general comments:

I would like to know how the authors determined what was “out of context” (line 148). It would be helpful if the authors provided an example or two.

The authors use the term “serial twitterers.” Would “serial tweeters” be more appropriate? How frequently were these users tweeting? The authors state that they tweet about essentially everything. This is quite vague.

In line 153, it would be more helpful for the reader if the authors state: favorable (F), contrary (c), undecided (U), etc. instead of providing a list of concepts and then using their abbreviations later.

In line 157, the authors’ explanation of “hesitants” left me confused. What are these two sentences about? This merits clarification and more information.

The authors use the term “misinformation” quite a bit. It would be useful to know if they actually examined if the tweets they examined included misinformation. In other words, did they consider that tweets showing unfavorable opinions about vaccination were examples of misinformation?

In line 277, the authors assert that a precondition to establishing trust would be to have more frequent presence of public health authorities in online media. I find such a statement to be quite strong and needs to be backed up with additional data. It might be helpful, but I’m doubtful that the Italian minister of health simply tweeting more about vaccination is a precondition for establishing trust in the public.

Reviewer #2: (also uploaded as PDF)

Review for manuscript "Evidence of distrust and disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy."

In this work the authors are analyzing vaccination-related data retrieved from Twitter from 2018 in Italian language and put into the political context during this time. A subset of the data was annotated into 4 categories, those being "favorable", "contrary", "undecided" and "out of context" and a Machine Learning classifier was trained on this data. Predicted data by this classifier was subsequently analyzed, particularly with respect to the absolute counts in each category and their temporal trends. Overall, most tweets were categorized "out of context". Among the relevant category, most tweets were determined to be "favorable" and the rest was subdivided into the categories "contrary" and "undecided". Polynomial fitting was applied to the sentiment trends showing a decline of the "favorable" group towards the end of the year, as well as a slight increase in "contrary" and especially "undecided". The authors then discuss a possible relation between the change of the government to the way vaccination is discussed on Twitter. One of the general conclusions is an increase in "disorientation" due to the ambiguous announcements made by the new government.

The work proposed is interesting and focuses on a relevant topic. However, there is a mismatch between the presented results and the discussion section. The conclusion of there being a direct link between the change of government and the decline in vaccination sentiment and increase in "disorientation" needs to be discussed more clearly. There are several parts of the paper which are unclear and need to be rewritten. I therefore suggest a major revision of this manuscript before publication.

Note that the comments are not given in a specific order. Also, I have not corrected any grammatical mistakes.

Methods

• (minor) The authors mention a total of 4 classes ("favorable", "contrary", "undecided" and "out of context"). It is unclear whether the algorithm was trained on 4 classes or only on 3 classes. If the "out of context" class was simply removed then it means that the predicted data will come from a different underlying distribution than the training data (which could be problematic and should at least be mentioned).

• (minor) Precision, recall and F1 were given for the classifiers. It would be helpful to know the F1 scores for each subclass. Furthermore, it should be mentioned whether these scores are micro or macro averages.

• (minor) Lines 139-144 need better explanation and phrasing. What test was used to determine the degree of freedom for the smoothing? What kernel smoothing procedure?

• (minor) It is not mentioned whether the data was collected through the Twitter API (if so, which endpoint was used?) or via the website. If data was collected via the website it should be written (potentially in the discussion) that the search is not exhaustive and the returned data is filtered by Twitter in terms of relevance/trendingness, which might bias the analysis.

• (minor) It would be very much appreciated if the tweet IDs were published together with the code. This would allow other researchers to reproduce these results. Additionally, given the effort in collecting the annotation data, releasing this data would increase the impact of the work significantly.

Results

• (minor) Figure 1 lacks y-axis labels and legend for the color bar

• (major) It is unclear how the "disorientation" was measured and how it relates to the observed signal. If disorientation is simply a result of the up-and-down trend then one could e.g. plot the variance of the signal over time and see if it increases "sharply" when the government changed. The term "disorientation" is only mentioned in the abstract, title and the beginning of the results section but not in the discussion.

Discussion

• (minor) "After removing noise, the population appeared to be mostly composed by “serial- twitterers” i.e., people tweeting about everything “on top”, including also vaccines, regardless of their awareness of the topic." (Lines 234-236)

What do the authors mean by "serial-twitterers", a group of normal twitter users which also tweet about other things than vaccines? If so, how do the authors know since not all tweets from the timelines of these users were collected? It is also not clear what the term "on top" means in this context. I would recommend to not use the term "serial twitterer" and instead describe this group in another way. Also authors should provide some sort of quantitative reasoning/support for how they allocated users to this group.

• (major) Lines 247-258 discuss how the MMR vaccine coverage relates to the sentiment observed. This should be either moved to the results section or (as the authors state) if not part of the main message of this manuscript it should not be discussed at all. The question of correlation between sentiment and vaccine coverage is an important one, but should be analyzed in more detail and by contrasting e.g. with data from opinion polls before a clear link can be made between Twitter sentiment and vaccination coverage. There is also important literature on this topic which would need to be included in this type of analysis.

"As for the limitations of this work, the main critical point lies in the general relevance of opinion-based information from OSM for predicting trends of vaccine uptake." (Lines 295-296)

The authors mention this as the main limitation of this study. However, as mentioned above vaccine uptake was not properly studied. Therefore, this caveat doesn't apply here.

• (minor) "A key problem is the appropriate modulation of the “language style” to be used by public health communication on online social media." (Lines 280-281)

Since no analysis on language style was performed this should be either left out or rephrased. If kept, authors should include appropriate literature on this topic.

• (minor) "We plan to deep(en) this in future research [...]", (Line 281)

The mentioned research sounds important, but a bit misplaced in the middle of the discussion of the results. Future research should be summarized in a general sense (what is the future research needed to be done by the community as a whole?) at the end and discussed together with caveats.

• (major) "A specific search was therefore carried out over the set of retained tweets by further keywords specifically targeting this situation [...]" (Lines 120-121)

It is unclear which fields of the tweets were searched (user description, text, etc.)? It is also unclear how (if a tweet matched any of the provided keywords) this would directly identify said tweeter as a parent with children in the age of childhood immunization. Later in the discussion it is mentioned that the number of tweets matching the criteria was really small (line 244), therefore it was not analysed further. Although I appreciate the inclusion of negative results, it would be better to move most of it to the results section. Furthermore, as this approach was not successful what was the reason for this? Have the authors tried to expand the search to other keywords? Was the total body of tweets not large enough? The discussion should also involve issues related to identifying demographic subgroups by simple keyword matching (which is obviously problematic).

• (major) "In relation to the growing literature on sentiment analyses and vaccines this is, to the best of our knowledge, the first work on the subject documenting a clear medium-term distrust effect towards immunization arising from persistently ambiguous positions at the highest political level." (lines 291-293)

"Resulting from" is a strong statement, implying direct causation just by observing minor correlations (R2 values are relatively low). This seems to be the main hypothesis of this work but it is not properly discussed. One possible way to discuss causality would be using the Bradford Hill criteria (strength, consistency, temporality, etc.) Some of these criteria might match better, others worse.

• (major) Lines 303-309 are contrasting Twitter to Facebook data and the observation of echo chambers. No Facebook data was analyzed in this study, hence I don't see the need to contrast the collected data with Facebook data. Furthermore, no analysis was conducted with regards to the effects of echo chambers. It is important to address the issues of Twitter data, but it should be limited with respect to the analysis & conclusions in the manuscript.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michael J. Deml

Reviewer #2: Yes: Martin Müller

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: review.pdf

PLoS One. 2021 Jul 9;16(7):e0253569. doi: 10.1371/journal.pone.0253569.r002

Author response to Decision Letter 0


6 Aug 2020

Response to the comment of reviewer 1.

• The authors analyzed twitter data involving vaccination-related Italian-language tweets from 2018. They randomly selected 15,000 tweets, which were then manually labelled by 15 students. They found that most of these tweets were composed by "serial twitterers," with tweets tending to peak around main political events related to vaccination in the Italian context. The majority of these tweets (75%) showed favorable opinion towards vaccination, 14% were undecided, and 11% were unfavorable. The authors argue that there was evidence of "disorientation" among the public.

We thank the reviewer for her/his careful reading. We have done our best to improve the paper by strictly following the reviewer points. In this response, we have reported all the reviewer points followed by our responses encapsulated into text boxes. Details on how the referee’s points were incorporated into either the manuscript or the Supplementary Materials are also reported.

• Overall, the manuscript as it currently stands is difficult to follow. There are many grammatical mistakes throughout the paper, which is distracting. One example comes from the title of a section "Matherials and Methods" (line 97). Many sentences are too long and difficult to follow. The whole manuscript would benefit from a careful reread from the authors and from asking a native English speaker to read over the text to point to language-related issues.

Many thanks for the constructive criticism. We have revised the entire manuscript to correct mistakes and to split, or simplify, long or complicated sentences. We have made an effort to substantially improve the English. The misprint in the title of the “Materials and Methods” section has been corrected. Additionally, we have attempted to make the titles of the subsections more informative.

The introduction overall was quite good and provided relevant background for understanding terms related to vaccine hesitancy, vaccination discussions in online formats, and the Italian context. It would have been useful to have more background about which political parties specifically were involved in these political developments in Italy.

We thank the reviewer for the appreciation. In the revised draft (Lines 61-65) we have reported a few more details about the political position of the parties involved in the “hot” political debate on vaccination in Italy during 2018 (and related policies).

• The objectives stated in lines 88-93 do not match those provided in the abstract. I have copied and pasted them below. In the abstract, there are 3. In the manuscript, there are 4. These objectives could be tightened up and clarified further. For example, what do the authors mean by "the trend of communication on vaccines on online social media"? This is a broad statement, it is not specific to Italy, and the authors do not consider social media sites outside of Twitter in their analysis. Are authors seeking to establish the prevalence of vaccine hesitancy on Twitter as a proxy for vaccine hesitancy among the actual population in Italy? In my view, these objectives/aims merit further clarification. This would help them better structure the results section.

“Objectives and Methods. By a sentiment analysis on tweets posted in Italian during 2018, we attempted at (i) characterising the temporal flow of communication on vaccines over Twitter and underlying triggering events, (ii) evaluating the usefulness of Twitter data for estimating vaccination parameters, and (iii) investigating whether the contrasting announcements at the highest political level might have originated disorientation amongst the public.

(i) describe the trend of communication on vaccines on online social media, (ii) evaluate the potential usefulness of current Twitter data to estimate key epidemiological parameters such as e.g., the hesitant proportion in the population, (iii) evaluating the effectiveness of institutional communication as a tool to contrast misinformation, and (iv) showing evidence that the recent prolonged phase of contrasting announcements at the highest political level on a sensible topic such as mass immunization might have originated a distrust potentially seeding future coverage decline.”

We apologise for the inconsistencies in the exposition. We have amended both the abstract (Lines >24) and the main text (Lines >92) to align the number of objectives throughout the manuscript, by dropping objective (iii), “evaluating the effectiveness of institutional communication as a tool to contrast misinformation”. We have also clarified that Italy during 2018 is the chosen spatio-temporal context of the analysis. As for the sentence "the trend of communication on vaccines on online social media", we acknowledge it was vague, since the manuscript focuses on the trend of communication on vaccines on Twitter in Italy during 2018. Consequently, we have rewritten the corresponding sentences in the abstract and introduction.

As for the referee’s question on whether we were seeking to establish the prevalence of vaccine hesitancy on Twitter as a proxy for vaccine hesitancy among the actual population in Italy, our answer is somewhat nuanced.

As a rule, we can hardly take our Twitter evaluations as a statistical estimate of the hesitant proportion in Italy. Indeed, as explained in the subsection “True hesitant” of the Results section, the proportion of tweeters identifiably involved in immunization decisions, among all those tweeting about the broad subject of vaccines and immunization during 2018 in Italy, was negligible. This surprised us, given the large age band covered by compulsory immunizations (from 0 to 15 years of age) and therefore the possibly large number of parents’ cohorts involved. This might be because parents tend to avoid using Twitter to speak specifically about their children. On the other hand, the fact that many people posted tweets with generic content on immunization suggests that, in the period considered, immunization had become a topic of general interest among the public, to the point that it was used by politicians for purposes of political consensus rather than in the general interest.

From this viewpoint, we therefore wanted to suggest that in such situations Twitter might act as a sort of large-scale “echo chamber” (Cinelli et al. (2020), Echo Chambers on Social Media: A Comparative Analysis), eventually generating social pressures potentially harmful for vaccine uptake (as reported in the Discussion).

In the revised Discussion we have made an effort to better expose our viewpoint.

• I found it difficult to follow the results section because there was not a clear structure in place. It might be helpful for the authors to provide a couple sentences in the introduction and results section that give the reader a sense of knowing what the paper is covering and how it is organized. I was surprised that the concept of "disorientation" was explained in the results section (line 185). If this is an important concept for the authors' analysis, it would have been helpful to have an explanation of it in the introduction.

Many thanks for the important point. We have made an effort to improve the structure of the manuscript along the indications of the reviewer. In particular, as suggested, we have made an effort to give more structure to the “Materials and Methods” and “Results” sections. First of all, we have systematically created, in these two sections, parallel subsections dealing with the same topics, which should much improve readability. Moreover, we added a number of sentences to better guide the reader throughout the paper.

We apologise for having been loose in the presentation of the concept of “disorientation” (as also remarked by another referee), which clearly is one of the key concepts of the manuscript, and made an effort to improve the manuscript on this point.

Motivated by the referee’s remark, during the revision we further investigated the social media literature, looking for existing definitions matching the concept of “disorientation” we had in mind and applicable to our investigation, but we were not successful.

Therefore, in the revised “Material and Methods” section we aimed at making our twofold definition of disorientation clearer and have re-organized the manuscript to give this concept appropriate focus. In particular, we carefully distinguished between (i) “short-term” disorientation, i.e., a state in which people suddenly and frequently change their opinion on the debated subject, possibly as a consequence of being overwhelmed by multiple pieces of contrasting information (Lines >150), and (ii) “longer-term” disorientation (Lines >196), i.e., longer-term trends in opinions caused by persistent ambiguous communication at the highest political level.

In particular, to detect short-term disorientation, we proposed three statistical tests based on the variability of polarity opinions: (i) a general multinomial test, (ii) a running multinomial test, and (iii) a running variance test. These tests are presented at Lines >174 of the revised manuscript. As regards longer-term disorientation, which was already discussed in the original draft, we have made an effort to improve the readability.

Consistently, new results and figures have been added in the Results section to present our new findings on the characterization of short-term disorientation.

• Some general comments:

• I would like to know how the authors determined what was "out of context" (line 148). It would be helpful if the authors provided an example or two.

In the revised manuscript, in subsection “Data Extraction, transformation and cleaning” (Line 106), we have provided a more accurate description of the procedure used to separate the relevant tweets from those we classified as out-of-context. In subsection “Tweets Classification, sentiment analysis, and training set” (Line 118), we added further details on the out-of-context category. Finally, in the online appendix, we have reported a few duly anonymised tweets for each category.

• The authors use the term "serial twitterers." Would "serial tweeters" be more appropriate? How frequently were these users tweeting? The authors state that they tweet about essentially everything. This is quite vague.

Definitely, the wording "serial tweeters" was more appropriate. However, as also remarked by another referee, in the revised manuscript we preferred to avoid the use of this term due to some possible ambiguities, and decided instead to report a more detailed description of the characteristics of tweeters, which is reported in the subsection “Tweeters” of the Results section.

First, we have made an analysis of the concentration of the distribution of tweets among tweeters and found that 1% (5%) of all users posted 30% (50%) of all tweets in the dataset. Moreover, in the online appendix we have reported a Lorenz curve to provide an overall view of the phenomenon.
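For illustration, the concentration analysis underlying the Lorenz curve can be sketched as follows (a minimal sketch; the function name and the toy counts are ours, not from the study's data):

```python
import numpy as np

def lorenz(counts):
    """Cumulative share of tweets versus cumulative share of users,
    with users sorted from most to least active (illustrative sketch)."""
    c = np.sort(np.asarray(counts, dtype=float))[::-1]  # most active first
    tweet_share = np.cumsum(c) / c.sum()                # cumulative tweet share
    user_share = np.arange(1, len(c) + 1) / len(c)      # cumulative user share
    return user_share, tweet_share
```

With counts such as `[70, 10, 10, 10]`, the top 25% of users account for 70% of tweets, mirroring the kind of concentration reported above.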

Finally, to deepen the point raised by the reviewer, we added details on the behaviour of the top 40 users. We found that the majority of them tweeted about everything, especially in highly polarized debates, most often regardless of whether they had an appropriate awareness or background of the topic. Thus, tweeting about vaccines seemed to be more representative of this social hyper-activism, driven by the polarized nature of the subject, than of full awareness of the debate.

• In line 153, it would be more helpful for the reader if the authors state: favorable (F), contrary (c), undecided (U), etc. instead of providing a list of concepts and then using their abbreviations later.

The point was fixed in the subsection “Tweets Classification, sentiment analysis, and training set”.

• In line 157, the authors' explanation of "hesitants" left me confused. What are these two sentences about? This merits clarification and more information.

We apologise. The original sentence (Line 117 of the original draft) referred to the agreed concept of hesitancy in relation to childhood immunization, which was explained at the beginning of the Introduction. This definition clearly applies only to parents of children involved in actual immunization decisions, not to everyone in the general population (and therefore it possibly applies only to a subset of tweeters). This was explained, though perhaps too briefly, at Lines 117-120 of the Methods of the original manuscript (and the sentence at Line 157 cited by the reviewer was based on the idea of identifying, by appropriate keywords, tweets from parents actually involved in vaccination decisions).

We have therefore made an effort to clarify the point by amending both parts of the manuscript. In particular the amended subsection ““True” hesitant parents” should now be much clearer to follow.

• The authors use the term "misinformation" quite a bit. It would be useful to know if they actually examined if the tweets they examined included misinformation. In other words, did they consider that tweets showing unfavorable opinions about vaccination were examples of misinformation?

This is an important issue. We apologise if we unwittingly overused the term “misinformation”. Our answer is “no”: we deliberately did not carry out any analysis aimed at determining whether some tweets (particularly those unfavourable to vaccination) included, or were sources of, misinformation. Our main goal was to develop a polarity analysis, studying the proportions of opinions (favourable, contrary, undecided) and whether they showed evidence of disorientation according to the definitions provided above (where misinformation can obviously trigger disorientation), regardless of the specific content and of the information sources used (or shared) by tweeters.

• In line 277, the authors assert that a precondition to establishing trust would be to have more frequent presence of public health authorities in online media. I find such a statement to be quite strong and needs to be backed up with additional data. It might be helpful, but I'm doubtful that the Italian minister of health simply tweeting more about vaccination is a precondition for establishing trust in the public.

Many thanks. The reviewer is correct on the fact that “mere quantitative presence” is not a precondition to generate trust in the public.

However, in our manuscript we only intended to point out that the traditional lack of adequate communication between Italian public health institutions and citizens, dating back to long before the digital era, seems to have extended into the more recent epoch of diffusion of online social media (as reported in reference 23). This is hardly a productive attitude on a sensitive subject such as immunization, where national public health institutions should put critical effort into contrasting the online misinformation and infodemics (see Lachlan et al., 2014, "If you are quick enough, I will think about it: information speed and trust in public health organizations") that trigger hesitancy. Therefore, in the manuscript we only wanted to report simple instances of best practices that public health institutions might adopt on online social media.

We have amended the manuscript (Lines 352-361) in order to make our point clear and avoid misunderstandings in this sense.

• Response to reviewer 2.

• Review for manuscript "Evidence of distrust and disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy."

• In this work the authors are analyzing vaccination-related data retrieved from Twitter from 2018 in Italian language and put into the political context during this time. A subset of the data was annotated into 4 categories, those being "favorable", "contrary", "undecided" and "out of context" and a Machine Learning classifier was trained on this data. Predicted data by this classifier was subsequently analyzed, particularly with respect to the absolute counts in each category and their temporal trends. Overall, most tweets were categorized "out of context". Among the relevant category, most tweets were determined to be "favorable" and the rest was subdivided into the categories "contrary" and "undecided". Polynomial fitting was applied to the sentiment trends showing a decline of the "favorable" group towards the end of the year, as well as a slight increase in "contrary" and especially "undecided". The authors then discuss a possible relation between the change of the government to the way vaccination is discussed on Twitter. One of the general conclusions is an increase in "disorientation" due to the ambiguous announcements made by the new government.

The work proposed is interesting and focuses on a relevant topic.

We thank the reviewer for her/his careful reading and for the appreciation of our work, as well as for the very useful and constructive criticism. We have done our best to improve the paper by strictly following the reviewer points. In this response, we have reported all the reviewer points followed by our responses encapsulated into text boxes. Details on how the referee’s points were incorporated into either the manuscript or the Supplementary Materials are also reported.

However, there is a mismatch between the presented results and the discussion section. The conclusion of there being a direct link between the change of government and the decline in vaccination sentiment and increase in "disorientation" needs to be discussed more clearly. There are several parts of the paper which are unclear and need to be rewritten. I therefore suggest a major revision of this manuscript before publication. Note that the comments are not given in a specific order. Also, I have not corrected any grammatical mistakes.

In the revised manuscript, we have made a serious effort to clarify the point by rewriting both the Introduction and the Discussion. We have removed sentences suggesting a direct causal link between government changes and the decline in the proportion favourable to vaccination: we only wanted to point out the existence of an association, which, however, deserves further investigation in view of its potentially harmful implications for vaccine coverage. We have proposed definitions for the concept of “disorientation” (in the Methods section) and improved the related analyses (please see later responses). Additionally, we have thoroughly revised the entire manuscript and made an effort to increase the clarity of the exposition.

Methods

• (minor) The authors mention a total of 4 classes ("favorable", "contrary", "undecided" and "out of context"). It is unclear whether the algorithm was trained on 4 classes or only on 3 classes. If the "out of context" class was simply removed then it means that the predicted data will come from a different underlying distribution than the training data (which could be problematic and should at least be mentioned).

We actually trained the algorithm with 4 classes because early exploratory inspection of tweets soon revealed a disproportionate number of tweets belonging to the category we eventually labelled as "out of context". For simplicity, in the Results section we reported only figures on the three “main” categories F-C-U. In the revised manuscript, we have clarified the point through several amendments in the subsection “Tweets Classification, sentiment analysis, and training set” of the “Material and Methods” section.

• (minor) Precision, recall and F1 were given for the classifiers. It would be helpful to know the F1 scores for each subclass. Furthermore, it should be mentioned whether these scores are micro or macro averages.

In the online appendix of the revised manuscript we have reported the classification metrics for each classifier.

We used 10-fold cross-validation on 80% of the labelled sample and selected the best classifier according to validation on the remaining 20%. All validation metrics were in favour of the SVM (see revised appendix).

Since different classifiers yielded in some cases quite similar results, we used the weighted F1 score to discriminate between them. This score first computes the metric for each class and then averages the per-class scores, weighting each by its support (the number of true instances of that class). Note that "weighted" averaging alters the macro average to account for label imbalance; as a consequence, the resulting F-score need not lie between the overall precision and recall.
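The support-weighted average described above corresponds to scikit-learn's `f1_score(..., average="weighted")`; a toy check (the labels below are purely illustrative, not from the study's data):

```python
from sklearn.metrics import f1_score

# Toy F/C/U labels, purely illustrative
y_true = ["F", "F", "F", "C", "C", "U"]
y_pred = ["F", "F", "C", "C", "U", "U"]

# Per-class F1: F = 0.80 (support 3), C = 0.50 (support 2), U = 2/3 (support 1)
macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean ≈ 0.656
weighted = f1_score(y_true, y_pred, average="weighted")  # support-weighted ≈ 0.678
```

Here the weighted score exceeds the macro score because the largest class (F) is also the best-classified one.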

• (minor) Lines 139-144 need better explanation and phrasing. What test was used to determine the degree of freedom for the smoothing? What kernel smoothing procedure?

We used the discrete beta kernel-based smoothing procedure proposed by Mazza and Punzo (2014) in order to overcome the problem of boundary bias, which commonly arises from the use of symmetric kernels. The support of the beta kernel function can in fact be matched to our time interval so that, when smoothing near the boundaries, no weight is allocated outside the support. The smoothing bandwidth parameter was chosen using cross-validation.

This part was amended and moved to the subsection on “Longer-term disorientation”.
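The boundary-correction idea can be illustrated with a minimal sketch (Chen-type continuous beta kernels rather than the exact discrete estimator of Mazza and Punzo (2014); the function name and bandwidth value are ours):

```python
import numpy as np
from scipy.stats import beta

def beta_kernel_smooth(y, h=0.05):
    """Boundary-corrected smoothing of a daily series via beta kernels.

    Illustrative sketch only: time is rescaled to [0, 1] so the kernel
    support matches the observation window and no weight spills outside
    it near the boundaries, unlike with symmetric kernels.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.linspace(0, 1, n)  # rescaled observation times
    smoothed = np.empty(n)
    for i, x in enumerate(t):
        # Beta kernel whose shape adapts to the evaluation point x
        w = beta.pdf(t, x / h + 1, (1 - x) / h + 1)
        smoothed[i] = np.sum(w * y) / np.sum(w)
    return smoothed
```

Because the weights are renormalized over the observed support, a constant series is reproduced exactly, including at the boundaries.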

• (minor) It is not mentioned whether the data was collected through the Twitter API (if so, which endpoint was used?) or via the website. If data was collected via the website it should be written (potentially in the discussion) that the search is not exhaustive and the returned data is filtered by Twitter in terms of relevance/trendingness, which might bias the analysis.

We didn’t use APIs. Data were collected via the website by a scraper using the advanced-search features. Scraping was performed in the best manner we could achieve to avoid filtering (in particular, filtering induced by our own accounts).

In the revised manuscript we have rephrased the corresponding subsection on “Data extraction” in the “Material and Methods” to adequately clarify the point.

• (minor) It would be very much appreciated if the tweet IDs were published together with the code. This would allow other researchers to reproduce these results. Additionally, given the effort in collecting the annotation data, releasing this data would increase the impact of the work significantly.

This is an important point. Tweet IDs are now attached as an external file provided with the appendix.

To retrieve the original dataset we used the following git repository and its user guide: https://github.com/Jefferson-Henrique/GetOldTweets-python.

Tweets were downloaded through monthly searches and then merged to construct the whole dataset, removing duplicates on the tweet ID key and filtering out non-Italian tweets with a probabilistic language-detection approach.
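The merge-and-deduplicate step can be sketched as follows (an illustrative sketch with pandas; the function name is ours, and the subsequent probabilistic filtering of non-Italian tweets is not shown):

```python
import pandas as pd

def merge_monthly(frames):
    """Merge monthly scrapes into one dataset, dropping duplicate
    tweet IDs and keeping the first occurrence of each."""
    df = pd.concat(frames, ignore_index=True)
    return df.drop_duplicates(subset="id", keep="first").reset_index(drop=True)
```

Deduplicating on the ID key guards against the same tweet being returned by overlapping monthly search windows.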

Results

• (minor) Figure 1 lacks y-axis labels and legend for the color bar

The point has been fixed. In particular, in the revised manuscript we removed the bar.

• (major) It is unclear how the "disorientation" was measured and how it relates to the observed signal. If disorientation is simply a result of the up-and-down trend then one could e.g. plot the variance of the signal over time and see if it increases "sharply" when the government changed.

• (Major)The term "disorientation" is only mentioned in the abstract, title and the beginning of the results section but not in the discussion.

We apologise for having been loose in the presentation of the concept of “disorientation” (as also remarked by another referee), which clearly is one of the key concepts of the manuscript, and made an effort to improve the manuscript on this topic.

During the revision of the manuscript we have further investigated the literature on social media looking for definitions that could apply to the concept of “disorientation” we had in mind and that we could apply to our investigation but we weren’t successful.

Therefore, in the revised “Material and Methods” section, we aimed at making our twofold definition of disorientation clearer and have re-organized the manuscript to give this concept appropriate focus. In particular, we carefully distinguished between (i) “short-term” disorientation, defined as “a state in which people suddenly and frequently change their opinion on the debated subject, possibly as a consequence of being overwhelmed by multiple pieces of contrasting information” (Lines >150), and (ii) “longer-term” disorientation (Lines >196), i.e., longer-term trends in opinions caused by persistent ambiguous communication at the highest political level.

In particular, to detect short-term disorientation, we proposed three statistical tests based on the variability of polarity opinions: (i) a general multinomial test, (ii) a running multinomial test, and (iii) a running variance test, following the reviewer’s indication. These tests are presented at Lines >175 of the revised manuscript.
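Of the three tests, the running variance test (iii) can be sketched as follows (a hypothetical minimal sketch, not the exact test in the manuscript; window length is an assumption):

```python
import numpy as np

def running_variance(p, window=14):
    """Rolling sample variance of a daily polarity-proportion series.

    Illustrative sketch of the running variance idea: a sustained spike
    in the rolling variance of, e.g., the favourable proportion signals
    rapid opinion swings consistent with short-term disorientation.
    """
    p = np.asarray(p, dtype=float)
    out = np.full(len(p), np.nan)  # undefined until a full window is seen
    for i in range(window - 1, len(p)):
        out[i] = np.var(p[i - window + 1 : i + 1], ddof=1)
    return out
```

Comparing the rolling variance in windows before and after a political event then provides a simple way to probe for a change in opinion volatility.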

As regards longer-term disorientation, which was already discussed in the original draft, we have made an effort to improve the readability of the text.

Consistently, new results and figures have been added in the Results section to present our new findings on the characterization of short-term disorientation.

Discussion

• (minor) "After removing noise, the population appeared to be mostly composed by "serial- twitterers" i.e., people tweeting about everything "on top", including also vaccines, regardless of their awareness of the topic." (Lines 234-236)

What do the authors mean by "serial-twitterers", a group of normal twitter users which also tweet about other things than vaccines? If so, how do the authors know since not all tweets from the timelines of these users were collected? It is also not clear what the term "on top" means in this context. I would recommend to not use the term "serial twitterer" and instead describe this group in another way. Also authors should provide some sort of quantitative reasoning/support for how they allocated users to this group.

Following the reviewer’s suggestion, in the revised manuscript we preferred to avoid the use of the term “serial tweeters” and decided instead to report a more detailed description of the characteristics of tweeters, which is reported in the new subsection “Tweeters” of the Results section.

First, we have made an analysis of the concentration of the distribution of the number of tweets by tweeters and found that 1% (5%) of all users posted 30% (50%) of all tweets in the dataset. Moreover, in the online appendix we have reported a Lorenz curve to provide an overall view of the phenomenon.

Finally, to deepen the point raised by the reviewer, we examined in detail the behaviour of the top 40 users. We found that the majority of them tweeted about everything, especially in highly polarized debates, most often regardless of whether they had an appropriate awareness or background of the topic. Thus, tweeting about vaccines seemed to be more representative of this social hyper-activism, driven by the polarized nature of the subject, than of full awareness of the debate.

• (major) Lines 247-258 discuss how the MMR vaccine coverage relates to the sentiment observed. This should be either moved to the results section or (as the authors state) if not part of the main message of this manuscript it should not be discussed at all. The question of correlation between sentiment and vaccine coverage is an important one, but should be analyzed in more detail and by contrasting e.g. with data from opinion polls before a clear link can be made between Twitter sentiment and vaccination coverage. There is also important literature on this topic which would need to be included in this type of analysis.

To avoid ambiguities, we have removed this part.

"As for the limitations of this work, the main critical point lies in the general relevance of opinion-based information from OSM for predicting trends of vaccine uptake." (Lines 295-296)

The authors mention this as the main limitation of this study. However, as mentioned above vaccine uptake was not properly studied. Therefore, this caveat doesn't apply here.

As for the previous point, to avoid ambiguities, we have removed this part.

• (minor) "A key problem is the appropriate modulation of the "language style" to be used by public health communication on online social media." (Lines 280-281)

Since no analysis on language style was performed this should be either left out or rephrased. If kept, authors should include appropriate literature on this topic.

We followed the reviewer’s suggestion and left out this part (Line 343).

• (minor) "We plan to deep(en) this in future research [...]", (Line 281)

The mentioned research sounds important, but a bit misplaced in the middle of the discussion of the results. Future research should be summarized in a general sense (what is the future research needed to be done by the community as a whole?) at the end and discussed together with caveats.

We followed the reviewer's suggestion and moved the reported sentence on future research to the end of the manuscript.

• (major) "A specific search was therefore carried out over the set of retained tweets by further keywords specifically targeting this situation [...]" (Lines 120-121)

It is unclear which fields of the tweets were searched (user description, text, etc.)? It is also unclear how (if a tweet matched any of the provided keywords) this would directly identify said tweeter as a parent with children in the age of childhood immunization. Later in the discussion it is mentioned that the number of tweets matching the criteria was really small (line 244), therefore it was not analysed further. Although I appreciate the inclusion of negative results, it would be better to move most of it to the results section. Furthermore, as this approach was not successful what was the reason for this? Have the authors tried to expand the search to other keywords? Was the total body of tweets not large enough? The discussion should also involve issues related to identifying demographic subgroups by simple keyword matching (which is obviously problematic).

We searched the text.

We tried this approach because we wanted to investigate the "true" hesitant proportion among tweeters, which seemed to us worthwhile. We failed to do so. We conjecture that this failure depended primarily on the fact that parents in general (though not always) tend to avoid using Twitter for such purposes.

In the revised manuscript we have made an effort to improve the relevant parts of the Methods, Results, and Discussion to cope with the reviewer’s point.

• (major) "In relation to the growing literature on sentiment analyses and vaccines this is, to the best of our knowledge, the first work on the subject documenting a clear medium-term distrust effect towards immunization arising from persistently ambiguous positions at the highest political level." (lines 291-293)

"Resulting from" is a strong statement, implying direct causation just by observing minor correlations (R2 values are relatively low). This seems to be the main hypothesis of this work but it is not properly discussed. One possible way to discuss causality would be using the Bradford Hill criteria (strength, consistency, temporality, etc.) Some of these criteria might match better, others worse.

About the particular issue of the "low value" of R2, it should be recalled that we were dealing with strongly oscillating data, so that simple trend functions unavoidably yield low R2 values. For this reason, we found it quite encouraging that the R2 from the parabolic regression was four times higher than that from the linear regression, suggesting a substantial improvement in passing from a linear to a parabolic trend.
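
The effect invoked here, modest R2 values on strongly oscillating series but a clear gain when moving from a linear to a parabolic trend, can be sketched on synthetic data. Everything below (the series shape, coefficients, and noise levels) is an illustrative assumption, not the study's actual data.

```python
import numpy as np

def r2_for_trend(t, y, degree):
    """R^2 of a least-squares polynomial trend of the given degree."""
    coeffs = np.polyfit(t, y, degree)
    fitted = np.polyval(coeffs, t)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic daily series: an up-and-down (parabolic) trend, peaking mid-year,
# plus strong short-period oscillations and noise, loosely mimicking a noisy
# polarity-proportion series over one year.
rng = np.random.default_rng(0)
t = np.arange(365, dtype=float)
trend = 0.70 - 1.5e-6 * (t - 180) ** 2
y = trend + 0.05 * np.sin(t / 5) + rng.normal(0, 0.03, t.size)

r2_lin = r2_for_trend(t, y, 1)  # linear trend: near zero on a symmetric hump
r2_par = r2_for_trend(t, y, 2)  # parabolic trend: captures the up-and-down
# Both values stay modest because of the oscillations, yet the parabolic
# trend fits clearly better than the linear one.
```

Since the linear model is nested in the parabolic one, the parabolic R2 can never be lower; the point of the sketch is that on oscillating data both stay far from 1 even when the trend shape is right.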

As for the parabolic trend, we clearly only observed an association between opinions towards vaccination on Twitter and the political communication provided by the two governments involved during 2018. Therefore, no causation was proved: we apologise for the wrong wording in the original draft, and have carefully modified the corresponding subsection (“Smoothing and longer-term disorientation”) of the Results section along the referee’s indications.

Thanks for the very important point.

As for the Bradford Hill causation criteria (many thanks for the suggestion!), we thought that "specificity" and "temporality" (and perhaps "analogy") were somewhat supportive of our argument, while the other criteria were more difficult to apply. We therefore preferred not to pursue this approach.

• (major) Lines 303-309 are contrasting Twitter to Facebook data and the observation of echo chambers. No Facebook data was analyzed in this study, hence I don't see the need to contrast the collected data with Facebook data. Furthermore, no analysis was conducted with regards to the effects of echo chambers. It is important to address the issues of Twitter data, but it should be limited with respect to the analysis & conclusions in the manuscript.

We have removed this part.

Attachment

Submitted filename: Response_to_reviewers_and_editor_Ajovalasit_PONE.docx

Decision Letter 1

Alexandre Bovet

16 Sep 2020

PONE-D-20-01302R1

Evidence of disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy.

PLOS ONE

Dear Dr. Ajovalasit,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

In particular, please address the issue of the low classification scores shown in Tab 1 of the SI, which questions the validity of the results. In an email exchange, reviewer 2 mentioned that he had overlooked this issue and wrote "The scores are very low, this should at the very least be mentioned in the caveats. Especially considering that the work builds on the “undecided” category. "

Moreover, the methodology used for collecting, processing and classifying tweets is not explained in sufficient detail (see my additional comments below).

Please also address the issues about the clarity of the manuscript raised by Reviewer 1.

Please submit your revised manuscript by Oct 31 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Alexandre Bovet, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

I thank the authors for having addressed issues raised by the two referees, however there are still important issues with the manuscript, the methodology of the manuscript needs to be better explained and the classification scores are low, which need to be addressed.

Please explain clearly, in order to allow your results to be reproduced, the following points:

- how the Twitter scraper you used works and if there is some rate-limiting,

- what exactly the data filtering and cleaning do,

- how many tweets you collected in total and how many remain after filtering,

- how the smoothing works,

- p.11 line 239, define clearly the "surrounding days",

- what features of the tweets (unigrams, bigrams, trigrams, hashtags, mentions, emojis, ... ?) are used for the classification,

- how the cross-validation is done,

- how "polarity" is defined,

- how the tweet aggregation is done.

In general, please add all the clarifications already asked by the reviewers in the main manuscript.

It is not clear if you are aggregating tweets at the user level or not. If not, this is problematic as you mention that 1% of the users posted 30% of the tweets and therefore your results are strongly biased towards the most active users.

Moreover, since you are interested in users that change their opinions over time (disorientation), you could track specific users and measure how the opinions of their tweets change over time this would help you to validate your measure of disorientation.

Please remind the readers of what is the null hypothesis in the figure captions and clearly define each plot lines (blue, green, purple).

Please report the average training scores in the main manuscript.

The training scores are low, in particular for the undecided class upon which the results are built. Please comment on the validity of the results. Could you improve the classification by using a different set of features?

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed many of my initial concerns from the first round of reviews. Regarding the scientific rigor of the paper, the authors have provided interesting results. That said, the presentation of the manuscript continues to be difficult to follow for me. The paper includes many aspects that are not well linked together, in my view. The paper is very busy in the sense that it introduces many different items and does not adequately pull them together to give the reader a sense of why what they did is important.

If the authors wish to make a major point of the paper to define the concept of disorientation, as they state on p. 16, line 328, then I would expect this to be a clearer issue in the introduction section of the paper. If disorientation is an issue that the authors want to better define, the introduction of this concept and related literature should be presented in the introduction section, and not in the methods section (lines 145 and on, where it is first defined).

That said, I do not fully understand why the concept is introduced in the first place. On line 145, the authors state, "To the best of our knowledge, the concept of "disorientation" does not seem to have been well defined in the literature of online social media. Properly defining the concept of "disorientation" can be complicated, e.g., it can be simply a consequence of the lack of adequate information, but also of the over-exposition to information including misinformation."

If the topic has not been well-defined in this literature, it would be helpful for the reader to know if the term has been used at all, and in what papers/articles. To me, this was difficult to read because it sounds like the authors have decided at this point that disorientation was a concept they were interested in, it has not been covered in the literature, but they are going to use it anyways. This is not a problem, per se, but it could be presented in a much easier to follow and coherent fashion.

The authors mentioned having addressed language-related issues and long sentences throughout the paper, but I was able to identify language issues already in the abstract. Lines 25 - 28, "attempted at (i) characterizing...(ii)evaluating..." etc. This should be "attempted TO (i) characterizE...(ii) evaluatE..." etc. Line 37, "critical health topics, as immunization" --> this sentence should include "such" between "topics," and "as." There were other language related issues throughout the manuscript. Line 61: oppositions --> opponents. This sentence is also very long. Line 76. "troughs" --> "through." There were additional grammatical issues, but I have not outlined them all here. The authors again used many long sentences with multiple subordinate clauses. For example, the first paragraph of the discussion section is composed of 2 very long, confusing sentences (Lines 285-292).

The graphics could also be better explained with legends for the colors.

For readability, it would be helpful if the authors took a line-by-line reading to clarify all sentences and shorten them to make the paper easier to follow for the reader.

Reviewer #2: The work has now greatly been improved and all comments have been addressed. A minor comment: Authors may want to increase DPI on the figures (and if jpg was used to use the PNG format instead), in order to avoid blurriness.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Martin Müller

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 9;16(7):e0253569. doi: 10.1371/journal.pone.0253569.r004

Author response to Decision Letter 1


16 Mar 2021

#Reviewer 1

1 The authors have addressed many of my initial concerns from the first round of reviews. Regarding the scientific rigor of the paper, the authors have provided interesting results. That said, the presentation of the manuscript continues to be difficult to follow for me. The paper includes many aspects that are not well linked together, in my view. The paper is very busy in the sense that it introduces many different items and does not adequately pull them together to give the reader a sense of why what they did is important.

1A We thank the reviewer for her/his careful reading of the revised draft, and we apologize for having failed to achieve adequate clarity. In this further revised version of the manuscript, we have done our best to improve the paper by strictly following the reviewer's suggestions in her/his second report. In this response, we have reported all the reviewer's points, each followed by our response in a text box. Details on how the referee's points were incorporated into either the manuscript or the Supplementary Materials are also reported. The notation "see LXXX" specifies the line of the revised manuscript where the related information can be found.

2 If the authors wish to make a major point of the paper to define the concept of disorientation, as they state on p. 16, line 328, then I would expect this to be a clearer issue in the introduction section of the paper. If disorientation is an issue that the authors want to better define, the Introduction of this concept and related literature should be presented in the introduction section, and not in the methods section (lines 145 and on, where it is first defined).

That said, I do not fully understand why the concept is introduced in the first place. On line 145, the authors state, "To the best of our knowledge, the concept of "disorientation" does not seem to have been well defined in the literature of online social media. Properly defining the concept of "disorientation" can be complicated, e.g., it can be simply a consequence of the lack of adequate information, but also of the over-exposition to information including misinformation."

If the topic has not been well-defined in this literature, it would be helpful for the reader to know if the term has been used at all, and in what papers/articles. To me, this was difficult to read because it sounds like the authors have decided at this point that disorientation was a concept they were interested in, it has not been covered in the literature, but they are going to use it anyways. This is not a problem, per se, but it could be presented in a much easier to follow and coherent fashion.

2A Yes, we definitely believe that the definition of disorientation and its measurement /testing using data on the temporal path of polarity proportions was the key contribution of this work. We, therefore, thank the reviewer for her/his further suggestions about the presentation of the topic (and apologize for the "bad selling" of our idea). In the revised draft, we carefully followed the reviewer's suggestions, in particular:

• We introduced our key ideas about "disorientation" in the Introduction (moving there part of the content originally presented in the Methods, suitably revised). There we also motivate our choice to introduce a new definition, given the lack (to the best of our literature search) of equivalent concepts in the literature on OSM and in other relevant literatures. See main text from L-82.

• We left in the "Materials and Methods" the technical part on the statistical methodology used to document the presence of disorientation through appropriate tests on time series of polarity proportions. See main text from L-159

3 The authors mentioned having addressed language-related issues and long sentences throughout the paper, but I was able to identify language issues already in the abstract. Lines 25 - 28, "attempted at (i) characterizing...(ii)evaluating..." etc. This should be "attempted TO (i) characterizE...(ii) evaluatE..." etc. Line 37, "critical health topics, as immunization" --> this sentence should include "such" between "topics," and "as." There were other language related issues throughout the manuscript. Line 61: oppositions --> opponents. This sentence is also very long. Line 76. "troughs" --> "through." There were additional grammatical issues, but I have not outlined them all here. The authors again used many long sentences with multiple subordinate clauses. For example, the first paragraph of the discussion section is composed of 2 very long, confusing sentences (Lines 285-292).

3A We did our best to revise the new version of the manuscript thoroughly. Additionally, all the points raised by the reviewer have been carefully fixed.

4 The graphics could also be better explained with legends for the colors.

4A We added colour legends to all figures that use different colours.

5 For readability, it would be helpful if the authors took a line-by-line reading to clarify all sentences and shorten them to make the paper easier to follow for the reader.

5A We did our best to clarify sentences and improve the overall readability.

## Reviewer 2

The work has now greatly been improved and all comments have been addressed.

We thank the reviewer for her/his appreciation of our effort in responding to the reviewer's comments. Below you find our response to your last point.

In addition, the Editor mentioned an email exchange in which you suggested that "The scores are very low, this should at the very least be mentioned in the caveats. Especially considering that the work builds on the "undecided" category. "

In order to respond to this point, while preparing this revision of the manuscript we critically re-analyzed the entire research path and carefully re-checked all steps of data preparation and analysis. This allowed us to correct some problems in the data preparation and thereby fully resolve the issue of the "low classification scores shown in Tab 1 of the SI", achieving reasonable values for all classes considered. This can be appreciated from a new table (Table 1), added to the main text upon the Editor's request, reporting the training scores involved, including the average training scores, as requested.

In more detail, the detected problem in the classification was responsible for an excessively large proportion of out-of-context tweets in the total data. This, in turn, yielded low numbers of tweets for the two less represented categories ("contrary" and "undecided"), mechanically leading to low precision, a problem well acknowledged in the literature.
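
The mechanism at work here, a rare class being swamped by false positives leaking from the dominant class, can be illustrated with a deterministic toy confusion. The counts below are loosely inspired by the paper's reported class shares (roughly 70% favourable, 16% contrary, 14% undecided) but are otherwise assumed for illustration.

```python
def precision_for_class(true_labels, pred_labels, cls):
    """Precision for one class: true positives / all predicted positives."""
    tp = sum(1 for t, p in zip(true_labels, pred_labels) if t == cls and p == cls)
    pp = sum(1 for p in pred_labels if p == cls)
    return tp / pp if pp else 0.0

# 1000 tweets with an imbalanced class mix (illustrative counts).
true_labels = ["fav"] * 700 + ["con"] * 160 + ["und"] * 140
# A classifier that leaks 10% of each large class into "undecided"
# and recovers only 70% of the true "undecided" tweets.
pred_labels = (
    ["fav"] * 630 + ["und"] * 70      # favourable: 10% mislabelled "und"
    + ["con"] * 144 + ["und"] * 16    # contrary: 10% mislabelled "und"
    + ["und"] * 98 + ["fav"] * 42     # undecided: 70% recovered
)

p_fav = precision_for_class(true_labels, pred_labels, "fav")  # 630/672
p_und = precision_for_class(true_labels, pred_labels, "und")  # 98/184
# Identical 10% leak rates leave the dominant class with high precision,
# while the rare "undecided" class ends up barely above 0.5.
```

This is why shrinking the pool of out-of-context tweets (and thus raising the counts of the two small classes) mechanically improves their precision even without changing the classifier.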

Discovering this bug allowed us to solve all these problems at once, eventually providing very reasonable results.

1 A minor comment: Authors may want to increase DPI on the figures (and if jpg was used to use the PNG format instead), in order to avoid blurriness.

1A The point has been fixed. High-quality figures were provided by substantially increasing the DPI.

Decision Letter 2

Alexandre Bovet

9 Jun 2021

Evidence of disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy.

PONE-D-20-01302R2

Dear Dr. AJOVALASIT,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alexandre Bovet, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments:

Please take into account the issues raised by the Reviewer when preparing the final version.

Please also mention the imperfect classification as a limitation of the results in the discussion. The classification scores are only slightly above those of a random classifier.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: In this manuscript, the authors analyze the polarity of vaccine relevant tweets during a time period in which there were multiple changes in vaccine policy in Italy that may have shifted opinions on vaccination. Overall, the statistical analysis is clearly described and the results are interesting and relevant.

My only concern is minor and is with the introduction of the concept of disorientation – given the title of the paper I expected the focus to be on disorientation and yet in the abstract it is only briefly mentioned as one of three objectives and there is no clear definition of disorientation (which is not a concept that I was familiar with prior to this manuscript and many readers may not be familiar with). I would suggest adding definitions of disorientation to the abstract to orient readers to the concept as they start reading the manuscript. Second, I would suggest re-emphasizing the definitions of short- and long-term disorientation defined in the introduction in the methods section describing how short and long-term disorientation were detected in the data (starting at p.8, line 159).

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Acceptance letter

Alexandre Bovet

25 Jun 2021

PONE-D-20-01302R2

Evidence of disorientation towards immunization on online social media after contrasting political communication on vaccines. Results from an analysis of Twitter data in Italy.

Dear Dr. AJOVALASIT:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alexandre Bovet

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. The figure represents the real (raw) polarity proportions of favourable (blue), contrary (purple), and undecided (green) with respect to vaccination topic.

    (TIF)

    S1 File. This file contains the classification reports for all the classifiers tested (S1–S5 Tables in S1 File) and the keywords adopted to retrieve the tweets (S6 Table in S1 File).

    (DOCX)

    S2 File. This file contains the tweet IDs and the class of each tweet.

    (ZIP)

    Attachment

    Submitted filename: review.pdf

    Attachment

    Submitted filename: Response_to_reviewers_and_editor_Ajovalasit_PONE.docx

    Data Availability Statement

    Due to restrictions in the Twitter terms of service (https://twitter.com/en/tos) and the Twitter developer policy (https://developer.twitter.com/en/developer-terms/agreement-and-policy.html) we cannot provide the full text of the tweets used in this study. However, for replication purposes, we provide the ID of every tweet used, which also includes the manually labeled training set adopted; details on how to fetch tweets given their IDs are provided at https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-lookup. The dataset of tweet IDs corresponds to the data available up to January 7 2019. Furthermore, note that any sample of tweets containing the same set of keywords listed in the manuscript and posted over the same time period is likely to yield study findings similar to those reported in the article. We extracted the data available at that time from the Twitter public web interface; these data can also be purchased from Twitter via their Historical PowerTrack API, http://support.gnip.com/apis/historical_api2.0/. The authors provide a minimal set of requirements in the form of tweet IDs.

