PLoS One. 2022 Jan 12;17(1):e0261768. doi: 10.1371/journal.pone.0261768

Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics

David A Broniatowski 1,2,*, Daniel Kerchner 3, Fouzia Farooq 4, Xiaolei Huang 5, Amelia M Jamison 6,¤, Mark Dredze 7, Sandra Crouse Quinn 6, John W Ayers 8
Editor: Barbara Guidi
PMCID: PMC8754324  PMID: 35020727

Abstract

The COVID-19 pandemic brought widespread attention to an “infodemic” of potential health misinformation. This claim has not been assessed based on evidence. We evaluated whether health misinformation became more common during the pandemic. We gathered about 325 million posts sharing URLs from Twitter and Facebook during the beginning of the pandemic (March 8–May 1, 2020) and compared them to posts from the same period in 2019. We relied on source credibility as an accepted proxy for misinformation across this database. Human annotators also coded a subsample of 3000 posts with URLs for misinformation. Posts about COVID-19 were 0.37 times as likely to link to “not credible” sources and 1.13 times more likely to link to “more credible” sources than posts prior to the pandemic. Posts linking to “not credible” sources were 3.67 times more likely to include misinformation compared to posts from “more credible” sources. Thus, during the earliest stages of the pandemic, when claims of an infodemic emerged, social media contained proportionally less misinformation than expected based on the prior year. Our results suggest that widespread health misinformation is not unique to COVID-19. Rather, it is a systemic feature of online health communication that can adversely impact public health behaviors and must therefore be addressed.

Introduction

On February 15, 2020, the Director General of the World Health Organization declared that the coronavirus disease 2019 (COVID-19) pandemic spurred an “infodemic” of misinformation [1]. This claim quickly became accepted as a matter of fact among government agencies, allied health groups, and the public at large [2–10]. For instance, during the past year over 15,000 news reports archived on Google News refer to a COVID-19 “infodemic” in their title and about 5,000 scholarly research reports on Google Scholar refer to an infodemic in the title and/or abstract. Despite this widespread attention, the claim that online content about COVID-19 is more likely to be false than other topics has not been tested.

We seek to characterize the COVID-19 infodemic’s scale and scope in comparison to other health topics. In particular, we focus on the opening stages of the infodemic (March through May 2020), when case counts began to increase worldwide, vaccines were not yet available, and concerted collective action, such as social distancing, mask-wearing, and compliance with government lockdowns, was necessary to reduce the rate at which COVID-19 spread. Misinformation during this time period was especially problematic because of its potential to undermine these collective efforts. Our study therefore aims to answer the following question:

  • Were posts about COVID-19 more likely to contain links to misinformation when compared to other health topics?

Beyond the sheer volume of links shared, one might define an “infodemic” by the likelihood that a particular type of post might go viral. Thus, our second question:

  • When it comes to COVID-19, were links containing misinformation more likely to go viral?

To answer these questions, we must rely on a scalable method. One commonly used proxy for misinformation is source credibility. If the infodemic was indeed characterized by false content, one might expect a higher proportion of this content to come from low credibility sources that “lack the news media’s editorial norms and processes for ensuring the accuracy and credibility of information” [11]. Thus, our third question:

  • Does content from less credible sources include more misinformation?

Evidence before this study

Prior studies [12] found that low-credibility content was, in fact, rare on Twitter, albeit shared widely within concentrated networks [13]. We only found two studies comparing across multiple social media platforms [13, 14], with both studies concluding that the prevalence of low-credibility content varied significantly between platforms. None of these studies compared COVID-19 content to other health topics.

To our knowledge, this study is the first to evaluate the claim of an infodemic by comparing COVID-19 content to other health topics. We analyzed hundreds of millions of social media posts to determine if COVID-19 posts pointed to lower-credibility sources compared to other health content.

Materials and methods

Data collection

Data comprised all public posts made to Twitter, together with public posts made to Facebook pages (intended to represent brands and celebrities) with more than 100,000 likes and to Facebook groups (intended as venues for public conversation) with at least 95,000 members, or US-based groups with at least 2,000 members.

COVID-19 tweets

First, we collected English language tweets from Twitter matching keywords pertaining to COVID-19 [15] between March 8, 2020 and May 1, 2020. Next, we compared these to tweets containing keywords pertaining to other health topics [16] for the same dates in 2019.

We obtained COVID-19 tweets using the Social Feed Manager software [17], which collected English-language tweets from the Twitter API’s statuses/filter streaming endpoint (https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter) matching keywords of “#Coronavirus”, “#CoronaOutbreak”, and “#COVID19” posted between March 8, 2020 and May 1, 2020 [15].

Health tweets

We obtained tweets about other health topics using the Twitter Streaming API to collect English-language tweets containing keywords pertaining to generalized health topics posted between March 8, 2019 and May 1, 2019 (keywords are listed in reference [16]).
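Both streams were gathered from Twitter’s v1.1 statuses/filter endpoint (via Social Feed Manager for the COVID-19 stream), an endpoint that has since been retired. The following minimal sketch, using the tweepy library, illustrates how such a keyword-filtered collection could be reproduced under that older API; the credentials and output file name are placeholders, not part of the study’s actual pipeline.

import tweepy

class KeywordStream(tweepy.Stream):
    """Append each matching tweet's raw JSON payload to a local file."""
    def on_data(self, raw_data):
        with open("covid_stream.jsonl", "a", encoding="utf-8") as f:
            f.write(raw_data.decode("utf-8").strip() + "\n")

# Placeholder credentials; a Twitter developer account was required.
stream = KeywordStream("CONSUMER_KEY", "CONSUMER_SECRET",
                       "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# COVID-19 keywords from the paper; the 2019 health stream instead used
# the keyword list given in reference [16].
stream.filter(track=["#Coronavirus", "#CoronaOutbreak", "#COVID19"],
              languages=["en"])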

Facebook data

Next, we collected comparable data from Facebook for the same dates using CrowdTangle [18], a public insights tool owned and operated by Facebook. Specifically, we collected English-language posts from Facebook Pages matching keywords of:

  • “coronavirus”, “coronaoutbreak”, and “covid19” posted between March 8, 2020 and May 1, 2020, downloaded on June 2–3, 2020.

  • the same health-related keywords used in the health stream posted between March 8, 2019 and May 1, 2019, downloaded on July 13–14, 2020.
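As a rough illustration of this step (not the authors’ actual queries), a keyword search against the CrowdTangle API could resemble the sketch below. The endpoint and parameter names are assumptions based on CrowdTangle’s public API documentation, and an API token tied to a CrowdTangle account is required.

import requests

API_TOKEN = "YOUR_CROWDTANGLE_TOKEN"  # placeholder; issued per CrowdTangle account

# Assumed endpoint and parameters; see CrowdTangle's API documentation.
params = {
    "token": API_TOKEN,
    "searchTerm": "coronavirus OR coronaoutbreak OR covid19",
    "platforms": "facebook",
    "startDate": "2020-03-08",
    "endDate": "2020-05-01",
    "count": 100,
}
resp = requests.get("https://api.crowdtangle.com/posts/search", params=params)
resp.raise_for_status()
posts = resp.json().get("result", {}).get("posts", [])
print(f"Retrieved {len(posts)} posts")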

Ethics

The data used in this article are from publicly available online sources, the uses of which are deemed exempt by the George Washington University institutional review board (180804).

Credibility categorization

Our analysis draws upon an assumption that is widespread in prior work [11–14, 19, 20]: that “the attribution of ‘fakeness’ is … not at the level of the story but at that of the publisher” [19]. This assumption is attractive because it is scalable, allowing researchers to analyze vast quantities of posts by characterizing their source URLs. We therefore extracted all Uniform Resource Locators (URLs) in each post. We used the “tldextract” Python module [21] to identify each URL’s top-level domain (for example, the top-level domain for http://www.example.com/this-is-an-example-article.html is example.com), unshortening links (e.g., "bit.ly/x11234b") if necessary (see Appendix A in S1 File). We grouped these top-level domains into three categories reflecting their overall credibility using a combination of credibility scores from independent sources (NewsGuard; http://www.newsguard.com/, and MediaBiasFactCheck; http://www.MediaBiasFactCheck.com), as follows (see Appendix B in S2 File for details; a worked sketch follows the category definitions below):

More credible

This category contained the most credible sources. Top-level domains were considered “more credible” if they fit into one of the following two categories:

  • Government and Academic Sources, defined by Singh et al. [12] as “high quality health sources”, included official government sources such as a public health agency (e.g., if the top-level domain contained .gov), or academic journals and institutions of higher education (e.g., if the top-level domain contained .edu; see Appendix B in S2 File).

  • Other More Credible Sources, defined by Singh et al. [12] as “traditional media sources”, were given a credibility rating of at least 67% by NewsGuard, or rated as “very high” (coded as 100%) or “high” (80%) on the MediaBiasFactCheck factual reporting scale (NewsGuard and MediaBiasFactCheck scores are strongly correlated, r = 0.81, so we averaged these scores when both were available).

Less credible

Top-level domains were considered “less credible” if they were given a credibility rating between 33% and 67% by NewsGuard, or rated as “mostly factual” (60%) or “mixed” (40%) on the MediaBiasFactCheck factual reporting scale (averaging these when both were available).

Not credible

This category contained the least credible sources, such as conspiracy-oriented sites, but also government-sponsored sites that are generally considered propagandistic. Top-level domains were considered “not credible” if they:

  • Were given a credibility rating of 33% or less by NewsGuard or rated as “low” (20%) or “very low” (0%) on the MediaBiasFactCheck factual reporting scale.

  • Were rated as a “questionable source” by MediaBiasFactCheck.
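To make these rules concrete, the following is a minimal sketch, under simplifying assumptions, of the domain extraction and three-way categorization described above. The helper functions and the way the two raters’ scores are combined are illustrative; they do not reproduce the full rules in Appendix B, and the actual ratings came from NewsGuard and MediaBiasFactCheck.

import tldextract

def registered_domain(url: str) -> str:
    """Return the top-level (registered) domain, e.g. 'example.com'."""
    parts = tldextract.extract(url)
    return f"{parts.domain}.{parts.suffix}"

# MediaBiasFactCheck factual-reporting labels mapped to 0-100, per the text.
MBFC_SCALE = {"very high": 100, "high": 80, "mostly factual": 60,
              "mixed": 40, "low": 20, "very low": 0}

def categorize(domain, newsguard_score=None, mbfc_label=None, questionable=False):
    """Assign 'more credible', 'less credible', 'not credible', or 'unrated'."""
    if domain.endswith(".gov") or domain.endswith(".edu"):
        return "more credible"            # government and academic sources
    if questionable:                      # MediaBiasFactCheck "questionable source"
        return "not credible"
    scores = []
    if newsguard_score is not None:
        scores.append(newsguard_score)
    if mbfc_label is not None:
        scores.append(MBFC_SCALE[mbfc_label.lower()])
    if not scores:
        return "unrated"
    score = sum(scores) / len(scores)     # average when both ratings are available
    if score >= 67:
        return "more credible"
    if score > 33:
        return "less credible"
    return "not credible"

print(registered_domain("http://www.example.com/this-is-an-example-article.html"))
print(categorize("example.com", newsguard_score=75, mbfc_label="high"))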

Like prior work [12, 13, 19, 20, 22], our analysis draws upon a widespread simplifying assumption: that “the attribution of ‘fakeness’ is … not at the level of the story but at that of the publisher” [19]. This assumption is attractive because it is scalable. However, in the interest of evaluating it for health topics, we performed an additional validity check. To determine the content of each credibility category, we developed a codebook (Table 1) to assess the presence of false claims. We generated a stratified sample of 3000 posts by randomly selecting 200 posts from each COVID-19 dataset for each credibility category (More, Less, Not Credible, Unrated) and a set of 200 “in platform” posts (i.e., those linking to Twitter from Twitter or Facebook from Facebook). Three authors (DK, FF, and AMJ) manually labeled batches of 100 posts from each platform until annotators achieved high interrater reliability (Krippendorff’s α > 0.80), which we obtained on the second round (α = 0.81). Disagreements were resolved by majority vote, and ties were adjudicated by a fourth author (DAB). The remaining 2400 posts were then split equally between all three annotators. We also generated qualitative descriptions for each credibility category.

Table 1. Codebook for qualitative analysis.

Misinformation
  • Yes = the message contains egregious falsehoods, conspiracy theories, or misleading use of data related to COVID-19; also includes news reports repeating these claims and any story promoting misinformation.
  • No = anything else.

Uncertainty
  • Yes = the message expresses the idea that there is a lot that science still does not know about the coronavirus and/or casts doubt on science and scientists (e.g., “how do we know that masks work?”).
  • No = anything else.

Partisan bias
  • Conservative = expresses viewpoints supporting conservative politics or opposing liberal politics in the United States; includes key talking points.
  • Liberal = expresses viewpoints supporting liberal politics or opposing conservative politics in the United States; includes key talking points.
  • Other = expresses a political opinion, but not one aligned with the two major US parties; includes international politics.
  • None = no political content.

Content Area
  • Political = primary purpose of content is political.
  • Lifestyle = discusses non-medical aspects of the pandemic, including societal impacts: impacts on life such as school closures, cancellations, impacts on work, travel restrictions, and long lines.
  • Opportunistic = uses the popularity of COVID to market unrelated or semi-related content; hashtag hijacking.
  • Information Sharing = link sharing or news sharing (does not need to be accurate information).
  • Discussion = first-person discussion of experiences or of information.

Using this codebook, annotators achieved Krippendorff’s α1 = 0.742 on the first set of 100 posts. Annotators achieved α2 = 0.811 on the second set of 100 posts. The remaining 2400 posts were then split uniformly at random between the three annotators.
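A minimal sketch of this reliability check appears below, assuming the third-party krippendorff Python package (the paper does not state which implementation was used). Rows are annotators, columns are posts, and missing values mark posts an annotator did not label; the toy labels are invented for illustration.

import numpy as np
import krippendorff

# Toy binary misinformation labels (1 = misinformation, 0 = not) for 10 posts.
ratings = np.array([
    [0, 1, 0, 0,      1, 0, 0, np.nan, 1, 0],  # annotator 1
    [0, 1, 0, 0,      1, 0, 1, 0,      1, 0],  # annotator 2
    [0, 1, 0, np.nan, 1, 0, 0, 0,      1, 0],  # annotator 3
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha = {alpha:.3f}")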

Virality analysis

We conducted negative binomial regressions for each COVID dataset to predict the number of shares or retweets for each original post (Facebook and Twitter share counts were current as of June 2, 2020, and May 31, 2020, respectively). Following Singh et al. [12], we analyzed high-quality health sources separately from traditional media sources, separating the “more credible” category into two subcategories: “academic and government” and “other more credible” sources. For tweets with multiple URLs, we assigned each tweet with a lower-credibility URL (“not credible” or “less credible”) to its least credible category (see Appendix C in S3 File).
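A minimal sketch of such a regression is shown below, assuming a pandas DataFrame with one row per original post, a share count, and a credibility label. The column names, toy data, and the statsmodels GLM formulation (which holds the dispersion parameter fixed) are illustrative rather than the authors’ exact model specification.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy data: one row per original post.
df = pd.DataFrame({
    "shares":      [0, 3, 1, 42, 7, 0, 15, 2, 250, 5],
    "credibility": ["not credible", "less credible", "more credible (other)",
                    "academic/government", "more credible (other)",
                    "not credible", "academic/government", "less credible",
                    "academic/government", "unrated"],
})

# Negative binomial GLM predicting share counts from credibility category.
model = smf.glm("shares ~ C(credibility)", data=df,
                family=sm.families.NegativeBinomial()).fit()
print(model.summary())

# Predicted mean shares per credibility category (cf. Fig 3).
print(df.assign(mu=model.predict(df)).groupby("credibility")["mu"].mean())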

Results

We identified 305,129,859 posts on Twitter, 13,437,700 posts to Facebook pages, and 6,577,307 posts to Facebook groups containing keywords pertaining to COVID-19 and other health conditions. These posts contained 41,134,540 URLs (excluding in-platform links such as retweets and shares), including 554,378 unique top-level domains. Of these unique top-level domains, 14,609 (2.6%) were assigned a credibility rating; these top-level domains accounted for 19,294,621 (47%) of all URLs shared. The remaining URLs were unrated (see S1 Fig for raw counts).

Content of credibility categories

We conducted an inductive analysis of each credibility category to validate the use of credibility as a proxy for misinformation (see Table 2 for examples of URLs from each category).

Table 2. Examples for each credibility category.

Each example lists the credibility rating, the top-level domain, the article headline, and the link.

  • More Credible (Government or Academic): gov.uk, “Guidance on Social Distancing for Everyone in the UK”, https://www.gov.uk/government/publications/covid-19-guidance-on-social-distancing-and-for-vulnerable-people/guidance-on-social-distancing-for-everyone-in-the-uk-and-protecting-older-people-and-vulnerable-adults

  • More Credible (Other): thehindu.com, “Coronavirus | Lockdown only a pause button, testing is the only weapon, says Rahul Gandhi”, https://www.thehindu.com/news/national/ramp-up-testing-its-our-only-weapon-against-coronavirus-rahul-gandhi/article31354101.ece

  • Less Credible: rappler.com, “Lessons from South Korea: Transparency, Rapid Testing, No Lockdowns”, https://www.rappler.com/world/asia-pacific/interview-south-korean-ambassador-han-dong-man-coronavirus

  • Not Credible: afa.net, “Shutdowns Were Pointless All Along”, https://www.afa.net/the-stand/culture/2020/04/shutdowns-were-pointless-all-along/

To comply with NewsGuard’s terms of service, examples are drawn from websites that have been rated by MediaBiasFactCheck, but not by NewsGuard.

“Not credible” sources contained more misinformation than “more credible” sources

In our stratified random sample of 3000 posts, those with URLs rated as “not credible” were 3.67 (95% CI: 3.50–3.71) times more likely to contain false claims than posts linking to “more credible” sources (Fig 1). Results were comparable when comparing only those posts labeled as containing news or information (see S2 Fig), and we did not detect a significant difference between high-quality health sources (5.33% misinformation, 95% CI: 0.00–10.42, n = 75) and more credible traditional media sources (5.33% misinformation, 95% CI: 3.41–7.26, n = 525). None of the intermediate “less credible” sources (8.50% misinformation, 95% CI: 6.11–10.89), “unrated” sources (7.33% misinformation, 95% CI: 5.10–9.56), or “in platform” sources (5.17% misinformation, 95% CI: 4.26–6.07) were statistically significantly more likely to contain misinformation when compared to “more credible” sources (5.33% misinformation, 95% CI: 3.41–7.26, n = 600).

Fig 1. Proportions of misinformation for each credibility category.


Error bars represent 95% confidence intervals.
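For reference, the per-category misinformation proportions and intervals in Fig 1 can be approximated with a binomial proportion and a normal-approximation (Wald) 95% confidence interval, as in the sketch below; the paper does not state which interval method was used, so this is an assumption, and the counts shown are illustrative.

import math

def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Point estimate and Wald 95% CI for a binomial proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(p - half_width, 0.0), p + half_width

# Illustrative counts: 32 of 600 posts in a category coded as misinformation.
p, lo, hi = proportion_ci(successes=32, n=600)
print(f"{p:.2%} (95% CI: {lo:.2%}-{hi:.2%})")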

Beyond these misinformation ratings, we calculated the proportions of each content type in our codebook, for each credibility category (S2 Fig). A qualitative description of each category follows.

More credible

These sources primarily shared news and government announcements. Content was rarely political, although users sometimes editorialized, often with a liberal bias. Here, misinformation typically took the form of reporting on, and potentially amplifying, questionable content, such as explaining conspiracy theories or repeating claims that bleach cures COVID-19. Some content also expressed uncertainty around COVID-19 science, pointing out limitations of data and models and acknowledging that major questions could not yet be answered.

Less credible

These sources contained a wide variety of content. Non-US politics were common, especially from Indian, Chinese, and European sources. Misinformation in this category included some political conspiracy theories, but also more subtle falsehoods, such as suggesting that COVID-19 is less severe than the flu, promoting hydroxychloroquine as a cure, or claiming that “lockdowns” are an overreaction. This category also included content that inadvertently amplified questionable content while attempting to debunk it.

Not credible

Misinformation was more common in this category. Common themes included: blaming China for the virus, questioning its origins, rejecting vaccines, and framing COVID as undermining U.S. President Trump. These sources also tended to have a conservative political bias. Content emphasizing scientific uncertainty suggested that response measures were unjustified or that science was distorted for political ends. This category also included propaganda narratives, often extolling Russian and Chinese COVID responses.

Comparison to other health topics prior to the pandemic

Posts about COVID-19 were less likely to contain links to “not credible” sources and more likely to contain links to “more credible” sources when compared to other health topics prior to the pandemic. On average, URLs shared were more likely to be credible than non-credible during the pandemic (Fig 2). Among rated links, the proportion of “not credible” links shared during the pandemic in posts containing COVID-19 keywords was lower on Twitter (RR = 0.37; 95% CI: 0.37–0.37), Facebook Pages (RR = 0.41; 95% CI: 0.40–0.42), and Facebook Groups (RR = 0.37; 95% CI: 0.37–0.38). Additionally, the proportion of “more credible” links in posts containing COVID-19 keywords was higher on Twitter (RR = 1.13; 95% CI: 1.13–1.13), Facebook Pages (RR = 1.07; 95% CI: 1.07–1.07), and Facebook Groups (RR = 1.03; 95% CI: 1.02–1.03). These results replicated when focusing only on “high-quality health sources”—academic and government sources—for all three platforms: Twitter (RR = 3.52; 95% CI: 3.50–3.54), Facebook Pages (RR = 1.15; 95% CI: 1.14–1.17), and Facebook Groups (RR = 1.09; 95% CI: 1.06–1.11). URLs were also less likely to be unrated during the pandemic: Twitter RR = 0.67 (95% CI: 0.67 to 0.67), Facebook Pages RR = 0.74 (95% CI: 0.74 to 0.74), and Facebook Groups RR = 0.58 (95% CI: 0.58 to 0.58) (see Supplementary Material).

Fig 2. Proportions of COVID-19 and health URLs for each credibility category and social media platform.

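The relative risks reported above compare the share of links in a given credibility category during the COVID-19 period against the 2019 health baseline. A minimal sketch of that calculation, with a log-scale 95% confidence interval and made-up placeholder counts (not the study’s actual URL tallies), follows.

import math

def relative_risk(a, n1, b, n2, z=1.96):
    """Risk ratio of category membership in period 1 vs. period 2, with 95% CI."""
    p1, p2 = a / n1, b / n2
    rr = p1 / p2
    se_log = math.sqrt((1 - p1) / a + (1 - p2) / b)   # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Placeholder counts: "not credible" URLs out of all rated URLs in each period.
rr, lo, hi = relative_risk(a=12_000, n1=1_000_000, b=32_400, n2=1_000_000)
print(f"RR = {rr:.2f} (95% CI: {lo:.2f}-{hi:.2f})")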

The least credible posts are not the most viral

Even if low credibility content is less widespread on Twitter and Facebook, it can still be harmful if it garners more engagement. We therefore compared the average number of shares for each credibility category. We did not find that the least credible content was the most widely shared. Rather, on Twitter and Facebook Pages, the most viral posts contained links to government and academic sources, whereas intermediate “less credible” sources were the most viral in Facebook Groups (Fig 3).

Fig 3. Average number of shares for each credibility category by platform, estimated using negative binomial regression.


Discussion

Like prior studies [12, 14, 22], we find that there is indeed an overwhelming amount of content pertaining to COVID-19 online, making it difficult to discern truth from falsehood. Furthermore, we found that posts with URLs rated as “not credible” were indeed more likely to contain falsehoods than posts in other categories.

We are the first to compare this content to other health topics across platforms, adding much needed context. Upon comparison, we found that social media posts about COVID-19 were more likely to come from credible sources, and less likely to come from non-credible sources. Thus, available evidence suggests that misinformation about COVID-19 is proportionally quite rare, especially when compared to misinformation about other health topics.

Although sources rated as “not credible” were roughly 3.67 times more likely to share misinformation, Fig 2 shows that misinformation (i.e., explicitly false claims about COVID-19) was only present in a minority of posts. Thus, prior studies which used credibility as a proxy for misinformation may have overestimated the prevalence of explicitly false claims. Explicit falsehoods, although harmful, seem to be rare. To the extent that “more credible” sources shared misinformation, they did so to report on it or, in some cases, to attempt to debunk it. Thus, contrary to the claim of an “infodemic” of misinformation, posts about COVID-19 included less misinformation than other health-related posts prior to the pandemic.

Our results demonstrate that the volume of low-credibility content is much lower than the volume of high-credibility content on Twitter and Facebook. However, small volumes of harmful content could still be problematic if they garner a disproportionately large number of engagements. We found that this was not the case. To the contrary, content from the highest-quality sources–government and academic websites–was shared more often, on average, on both Twitter and Facebook. In Facebook Groups, where links to “not credible” sources were shared more often than links to high-quality sources, intermediate “less credible” sources were most frequently shared. However, we did not find that misinformation was significantly more prevalent in this category than in the “more credible” category.

Taken as a whole, these results suggest that misinformation about COVID-19 may largely be concentrated within specific online communities with limited spread overall. Online misinformation about COVID-19 remains problematic. However, our results suggest that the widespread reporting of false claims pertaining to COVID-19 may have been overstated at the start of the pandemic, whereas other health topics may be more prone to misinformation.

Limitations

Our inclusion criteria for social media data are based on keywords associated with COVID-19, vaccine-preventable illnesses, and other health conditions. This collection procedure might introduce some noise into our dataset, for example, if online actors exploited the virality of COVID-19 hashtags/keywords to promote their content. If so, we would expect to see more misinformation during the pandemic; in fact, we found that there was less (see S2 Fig, where we quantified proportions of “opportunistic” content). Furthermore, we used inclusion criteria that are comparable to prior studies, including those upon which the initial claim of an infodemic was based: a WHO/PAHO fact sheet from May 1, 2020 (https://iris.paho.org/bitstream/handle/10665.2/52052/Factsheet-infodemic_eng.pdf?sequence=14&isAllowed=y) defines the “infodemic” using keyword search terms that are similar to ours. Other studies of the “infodemic” have taken the same approach [12–14]. Thus, our findings contextualize previous work in this area, which has primarily focused on low-credibility sources rather than a more holistic picture.

Our inclusion criteria yielded many unrated URLs, comprising roughly half our sample. These URLs were not primarily misinformative (see S3 Fig). However, even if unrated URLs did contain large quantities of misinformation, COVID-19 data were statistically significantly less likely to contain this unrated content on all social media platforms studied compared to what would be expected prior to the pandemic.

Conclusions

Taken together, our findings suggest that the “infodemic” is, in fact, a general feature of health information online that is not restricted to COVID-19. Indeed, COVID-19 content seems less likely to contain explicitly false claims. This does not mean that misinformation about COVID-19 is absent; however, it does suggest that attempts to combat it might be better informed by comparison to the broader health misinformation ecosystem. Such a comparison would potentially engender a more dramatic response.

Health leaders who have focused on COVID-19 misinformation should acknowledge that this problem affects other areas of health even more. Beyond the COVID-19 infodemic, calls to action to address medical misinformation more broadly should be given higher priority.

Supporting information

S1 File. Appendix A. Unshortening URLs.

(PDF)

S2 File. Appendix B.

Measuring source credibility.

(PDF)

S3 File. Appendix C.

Categorizing tweets with multiple URLs.

(PDF)

S1 Fig. Raw counts of posts and URLs in each dataset.

URLs are segmented by whether they were rated, unrated, or “in platform” (e.g., pointing from Facebook to Facebook or from Twitter to Twitter).

(PDF)

S2 Fig. Content proportions in each dataset (n = 600 for each credibility category).

(PDF)

S3 Fig. Proportion of posts sharing information and also containing falsehoods (“misinformation”) broken down by credibility category.

Error bars reflect one standard error.

(PDF)

Data Availability

Data are available at Harvard’s Dataverse at this link: https://doi.org/10.7910/DVN/X6AF8I; however, there are some legal restrictions on sharing data, per the terms of service of social media platforms and NewsGuard's licensing terms, as follows:

  • Twitter
    ◦ Link to Terms of Service: https://developer.twitter.com/en/developer-terms/agreement-and-policy
    ◦ Availability: Tweet IDs are now available on Harvard’s Dataverse at https://doi.org/10.7910/DVN/X6AF8I
  • CrowdTangle
    ◦ Relevant Terms of Service: CrowdTangle prohibits providing raw data to anyone outside of a CrowdTangle user’s account. The user can share the findings, but not the data. If a journal asks for data to verify findings, the CrowdTangle user may send a .csv, but it cannot be posted publicly, and the journal must delete it after verification.
    ◦ Availability: Frequency counts of each top-level domain in this dataset are now available on Harvard’s Dataverse at https://doi.org/10.7910/DVN/X6AF8I. Additionally, CrowdTangle list IDs are provided in the references. Anyone with a CrowdTangle account may access these lists and the corresponding raw data. Researchers may request CrowdTangle access at https://help.crowdtangle.com/en/articles/4302208-crowdtangle-for-academics-and-researchers
  • Domain frequency counts for Twitter and Facebook
    ◦ Relevant Terms of Service: None
    ◦ Availability: csv files containing the frequencies of each domain in each dataset are now available at https://doi.org/10.7910/DVN/X6AF8I
  • MediaBiasFactCheck
    ◦ Link to Terms of Service: https://mediabiasfactcheck.com/terms-and-conditions/
    ◦ Availability: MediaBiasFactCheck ratings are currently publicly available at https://mediabiasfactcheck.com/
  • NewsGuard
    ◦ Relevant Terms of Service: https://www.newsguardtech.com/terms-of-service/
    ◦ Availability: These data were provided under license for a fee by a third-party provider (NewsGuard). Researchers seeking to use NewsGuard data in their own research may inquire about licensing the data directly from NewsGuard for a fee. Researcher licensing information can be found here: https://www.newsguardtech.com/newsguard-for-researchers/

All DOIs provided above are activated and publicly accessible. The authors did not receive any special privileges in accessing any of the third-party data that other researchers would not have.

Funding Statement

This work was supported in part by grant number R01GM114771 to D.A. Broniatowski and S.C. Quinn, and by the John S. and James L. Knight Foundation to the GW Institute for Data, Democracy, and Politics. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Decision Letter 0

Barbara Guidi

2 Sep 2021

PONE-D-21-17260

Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics

PLOS ONE

Dear Dr. Broniatowski,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper needs a MAJOR REVISION in order to be evaluated for a publication. In particular, reviewers highlighted that the contribution of the paper needs to be improved. Please follow the suggestions included in this email.

Please submit your revised manuscript by Oct 15 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Barbara Guidi

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following financial disclosure: 

This work was supported in part by grant number R01GM114771 to D.A. Broniatowski and S.C. Quinn, and by the James S. and John L. Knight Foundation to the GW Institute for Data, Democracy, and Politics.

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4.  Thank you for stating the following in the Competing Interests section: 

David A. Broniatowski received an honorarium from the United Nations Shot@Life Foundation – a non-governmental organization that promotes childhood vaccination. Mark Dredze holds equity in Sickweather Inc. and has received consulting fees from Bloomberg LP and Good Analytics Inc. None of these organizations had any role in the study design, data collection, and analysis, decision to publish, or preparation of the article. The remaining authors declare no competing interests.

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared. 

Please respond by return email with your amended Competing Interests Statement and we will change the online submission form on your behalf.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

6. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript. 

7. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper provides a study of the credibility of the news propagated on Twitter and Facebook about COVID-19. The authors used the URLs to define whether the news (posts, tweets) is credible or not. The results show that the news concerning COVID-19 is more credible than news on other health topics.

It would be more interesting if the authors could add another factor analyzing the text keywords or summary to measure the credibility of the text and not only the URL.

The citation should be put before the end of the sentence. “spurred an 5 “infodemic” of misinformation.(1)”

The paper should be formatted to respect the Plos One format.

Reviewer #2: In this paper, the authors present a study concerning misinformation spreading during the covid-19 pandemic and compare it with misinformation spreading regarding other health topics. The paper is well written and relatively easy to follow, but the contribution should be improved.

In detail, the authors have more than 318 million posts from Twitter and Facebook but only use 3000 for their analyses concerning misinformation spreading. I think the authors can improve on that. The 3000 manually annotated posts can be used as a basis, such as to train an AI classifier, and then use the classifier to classify the remaining posts in order to gain more insight concerning misinformation on a much bigger dataset. For instance, do people share news from unreliable sources to "tag" these sources and make other people aware that fake news is circulating on certain topics? Or are people sharing links that support their thesis? The topic is interesting, but the contribution is not enough for a journal publication.

I also recommend the authors restyle Figure 1 because it's very hard to understand and some labels are impossible to read. Maybe use a pie chart or a stacked histogram? I think you should also discuss and describe the results shown in figures 2 and 3 in more detail.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Jan 12;17(1):e0261768. doi: 10.1371/journal.pone.0261768.r002

Author response to Decision Letter 0


17 Oct 2021

COMMENT R1-1: The paper provides a study of the credibility of the news propagated on Twitter and Facebook about COVID-19. The authors used the URLs to define whether the news (posts, tweets) is credible or not. The results show that the news concerning COVID-19 is more credible than news on other health topics.

It would be more interesting if the authors could add another factor analyzing the text keywords or summary to measure the credibility of the text and not only the URL.

RESPONSE R1-1: We thank the reviewer for this comment, because it allows us to clarify the main construct of our study: credibility. In our study, credibility is defined on URLs and is a feature of publishers; not of content. Several prominent, and highly-cited, papers have been published using this assumption:

• Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., ... & Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094-1096.

• Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B., & Lazer, D. (2019). Fake news on Twitter during the 2016 US presidential election. Science, 363(6425), 374-378.

• Pennycook, G., & Rand, D. G. (2019). Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences, 116(7), 2521-2526.

• Singh, L., Bode, L., Budak, C., Kawintiranon, K., Padden, C., & Vraga, E. (2020). Understanding high-and low-quality URL Sharing on COVID-19 Twitter streams. Journal of computational social science, 3(2), 343-366.

• Yang, K. C., Pierri, F., Hui, P. M., Axelrod, D., Torres-Lugo, C., Bryden, J., & Menczer, F. (2021). The COVID-19 Infodemic: Twitter versus Facebook. Big Data & Society, 8(1), 20539517211013861.

• Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., Schmidt, A. L., ... & Scala, A. (2020). The COVID-19 social media infodemic. Scientific Reports, 10(1), 1-10.

Thus, the assumption that URLs are an adequate measure of credibility is widespread and accepted in top journals, including in Science, PNAS, and Scientific Reports.

Using credibility as our primary metric, this paper makes a major contribution to the existing, published literature: we are the first one to compare COVID credibility scores to those of other health topics. Prior papers analyzing the credibility of the COVID infodemic studied COVID in isolation and so there was no comparator. Our novel contribution is that we are able to quantify whether the credibility of COVID information was higher or lower than information about other health topics. This adds much needed context to the debate.

Broadly speaking, we agree with the reviewer that credibility is only a proxy measure of misinformativeness. We therefore conducted an additional analysis where we rated the misinformation content of posts that differ in credibility (lines 173-184). This analysis demonstrates clearly that content with low credibility URLs is significantly more misinformative than other types of content. We quantified how much of this content is explicitly false. Overall, our results are robust – we found that COVID content has a lower proportion of low credibility URLs than other health topics. We now include discussion about these points in our manuscript (lines 263-271), and we thank the reviewer for encouraging this clarification.

COMMENT R1-2: The citation should be put before the end of the sentence. “spurred an 5 “infodemic” of misinformation.(1)”

The paper should be formatted to respect the Plos One format.

RESPONSE R1-2: Per the reviewer’s recommendation we have put the referenced citation before the end of the sentence, and we have formatted the paper to respect the Plos One format. We thank the reviewer.

COMMENT R2-1: In this paper, the authors present a study concerning misinformation spreading during the covid-19 pandemic and compare it with misinformation spreading regarding other health topics. The paper is well written and relatively easy to follow, but the contribution should be improved.

RESPONSE R2-1: We thank the reviewer for the overall positive feedback and comments!

COMMENT R2-2: In detail, the authors have more than 318 million posts from Twitter and Facebook but only use 3000 for their analyses concerning misinformation spreading. I think the authors can improve on that. The 3000 manually annotated posts can be used as a basis, such as to train an AI classifier, and then use the classifier to classify the remaining posts in order to gain more insight concerning misinformation on a much bigger dataset.

RESPONSE R2-2: Our primary aim was to assess misinformation by examining the credibility of URLs shared on Facebook and Twitter so that we might compare COVID-19 posts to posts about other health topics. As we state in RESPONSE R1-1, we drew upon a widely accepted method for doing so. Despite the widespread use of this technique, which has been published in top journals, we provided an additional validity check by annotating 3000 posts by hand. This validity check confirms that low-credibility URLs are more likely to contain misinformation than high-quality URLs. Thus, our inferences both rely upon a solid body of literature and provide additional data to support our claims.

Nevertheless, we agree with the reviewer that the 3000 manually annotated posts can be used as a basis to make more refined inferences about the larger dataset. Although the reviewer suggests training an AI classifier to make these inferences, we respectfully point out that our sample of 3000 posts was stratified for each credibility category with 600 posts drawn uniformly at random from within each category. According to the Law of Large Numbers, the mean proportions of each observed label in our dataset will converge to the corresponding proportions of these labels in our populations. Therefore, a machine learning classifier is unnecessary for us to make strong statistically valid inferences regarding the underlying structure of the dataset. Furthermore, any classifier that we train would introduce sources of systematic error and bias, with these errors and biases depending on the specific choice of classifier used. On the other hand, the statistical approach that we use – stratified randomly sampling – relies on distribution-free chi-square tests and is therefore guaranteed to be free of systematic errors (i.e., biases). Furthermore, this statistical approach allows us to explicitly quantify the variance error (i.e., random error, as opposed to systematic error) in our measures, as is reflected in the confidence intervals and relative risk ratios on lines 221-231. We thank the reviewer for encouraging us to clarify the rigor of our exposition.

COMMENT R2-3: For instance, do people share news from unreliable sources to "tag" these sources and make other people aware that fake news is circulating on certain topics? Or are people sharing links that support their thesis? The topic is interesting, but the contribution is not enough for a journal publication.

RESPONSE R2-3: We thank the reviewer for encouraging us to more explicitly examine how content from different credibility categories are used. We have conducted a more in-depth analysis of our hand-annotated sample, which directly addresses the reviewer’s concern that people might be sharing news from unreliable sources to ‘tag’ these sources. We found that when people reported on questionable content, such as explaining conspiracy theories or reporting on claims that bleach cures COVID, they typically did so by referencing “more credible” and “less credible” sources. In fact, this was the principal source of misinformation discussed by “more credible sources”. In contrast, we did not find that sharing “not credible” content did so frequently. We now include this discussion in the manuscript (lines 259-262).

COMMENT R2-4: I also recommend the authors restyle Figure 1 because it's very hard to understand and some labels are impossible to read. Maybe use a pie chart or a stacked histogram? I think you should also discuss and describe the results shown in figures 2 and 3 in more detail.

RESPONSE R2-4: We have relabeled and restyled Figure 1 as a pie chart. We also now discuss and describe the results shown in Figures 2 and 3 in more detail. We thank the reviewer for this suggestion.

In conclusion, we have addressed all of the reviewers’ comments to the best of our abilities and we believe that our paper has significantly improved as a result. We thank the reviewers for their attention to detail, and we look forward to a positive response.

Attachment

Submitted filename: RESPONSE TO REVIEWERS 10-17-2021.docx

Decision Letter 1

Barbara Guidi

10 Dec 2021

Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics

PONE-D-21-17260R1

Dear Dr. Broniatowski,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Barbara Guidi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Barbara Guidi

28 Dec 2021

PONE-D-21-17260R1

Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics

Dear Dr. Broniatowski:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Barbara Guidi

Academic Editor

PLOS ONE
