2022 Dec 5;8:e1151. doi: 10.7717/peerj-cs.1151

Table 1. A summary of the existing COVID-19 datasets collected for misinformation detection or situational information classification.

The summary includes the size, the language, the sources of samples, the annotation method, and the class granularity. All listed datasets are designed for misinformation detection except the ones released for situational information classification.

| Dataset | Size | Main input | Language | Classification task | Annotation method |
|---|---|---|---|---|---|
| Patwa et al. (2021) | 10.7K | Twitter, Facebook, fact-checking websites | English | Binary (fake, real) | Manually + fact-checked claims |
| COVIDLies (Hossain et al., 2020) | 6.7K | Twitter | English | Binary (fake, real) | Manually |
| Helmstetter & Paulheim (2021) | 400K | Twitter | English | Binary (fake, non-fake) | Weak supervision |
| ReCOVery (Zhou et al., 2020) | 146K | News articles, Twitter | English | Binary (fake, real) | Distant supervision |
| CoAID (Cui & Lee, 2020) | 301.1K | Social posts, user engagements, news articles | English | Binary (fake, real) | Distant supervision |
| MM-COVID (Li et al., 2020b) | 11.1K | Twitter | Multi-lingual | Binary (fake, real) | Manually |
| ArCOV19-Rumors (Haouari et al., 2020b) | 9.4K | Twitter | Arabic | Three classes (false, true, other) | Manually |
| Alqurashi et al. (2021) | 8.7K | Twitter | Arabic | Binary (misleading, not misleading) | Manually |
| COVID-19-FAKES (Elhadad, Li & Gebali, 2020) | 0.4K | Twitter | Arabic + English | Binary (fake, real) | Automatically (13 ML algorithms) |
| Mahlous & Al-Laith (2021) (a) | 2.5K | Twitter | Arabic | Binary (fake, genuine) | Manually |
| Mahlous & Al-Laith (2021) (b) | 14.9K | Twitter | Arabic | Binary (fake, genuine) | Automatically |
| CLEF-2021 CheckThat! Lab (task 3A) (Shahi, Struß & Mandl, 2021) | 1.2K | News articles | Multi-lingual | Four classes (false, true, partially false, other) | Manually |
| FakeCovid (Shahi & Nandini, 2020) | 5.1K | Several social media platforms | Multi-lingual | Three classes (false, true, partially false) | Manually |
| Alsudias & Rayson (2020) | 2K | Twitter | Arabic | Three classes (false, true, unrelated) | Manually |
| CMU-MisCOV19 (Memon & Carley, 2020) | 0.5K | Twitter | English | Multi-class (17 classes) | Manually |
| Li et al. (2020a) | 3K | Weibo | English | Multi-class (eight situational classes) | Manually |
| ArCorona (Mubarak & Hassan, 2020) | 8K | Twitter | Arabic | Multi-class (17 classes, mixing situational and misinformation classes) | Manually |
| AraCOVID19-MFH (Ameur & Aliane, 2021) | 10.8K | Twitter | Arabic | Ten independent tasks (each with 2–4 classes) | Manually |
| Alam et al. (2020) | 722 | Twitter | Arabic | Ten independent tasks (each with 2–3 classes) | Manually |
| Our dataset (ArCOV19-MCM) | 6.7K | Twitter | Arabic | Multi-class (19 misinformation classes) | Manually |
| Our dataset (ArCOV19-MLM) | 6.7K | Twitter | Arabic | Multi-label (19 misinformation classes) | Manually |
| Our dataset (ArCOV19-Sit) | 4.2K | Twitter | Arabic | Multi-class (six situational classes) | Manually |