2022 Dec 5;8:e1151. doi: 10.7717/peerj-cs.1151

Table 1. A summary of the existing COVID-19 datasets collected for misinformation detection or situational information classification.

The summary includes the size, the language, the sources of samples, the annotation method, and the class granularity. All listed datasets are designed for misinformation detection except those released for situational information classification.

Dataset	Size	Source(s)	Language	Classification task	Annotation method
Patwa et al. (2021)	10.7K	Twitter, Facebook, fact-checking websites	English	Binary (fake, real)	Manually + fact-checked claims
COVIDLies (Hossain et al., 2020)	6.7K	Twitter	English	Binary (fake, real)	Manually
Helmstetter & Paulheim (2021)	400K	Twitter	English	Binary (fake, non-fake)	Weak supervision
ReCOVery (Zhou et al., 2020)	146K	News articles, Twitter	English	Binary (fake, real)	Distant supervision
CoAID (Cui & Lee, 2020)	301.1K	Social posts, user engagements, news articles	English	Binary (fake, real)	Distant supervision
MM-COVID (Li et al., 2020b)	11.1K	Twitter	Multilingual	Binary (fake, real)	Manually
ArCOV19-Rumors (Haouari et al., 2020b)	9.4K	Twitter	Arabic	Three classes (false, true, other)	Manually
Alqurashi et al. (2021)	8.7K	Twitter	Arabic	Binary (misleading, not misleading)	Manually
COVID-19-FAKES (Elhadad, Li & Gebali, 2020)	0.4K	Twitter	Arabic + English	Binary (fake, real)	Automatically (13 ML algorithms)
Mahlous & Al-Laith (2021) (a)	2.5K	Twitter	Arabic	Binary (fake, genuine)	Manually
Mahlous & Al-Laith (2021) (b)	14.9K	Twitter	Arabic	Binary (fake, genuine)	Automatically
CLEF-2021 CheckThat! Lab (task 3A) (Shahi, Struß & Mandl, 2021)	1.2K	News articles	Multilingual	Four classes (false, true, partially false, other)	Manually
FakeCovid (Shahi & Nandini, 2020)	5.1K	Several social media platforms	Multilingual	Three classes (false, true, partially false)	Manually
Alsudias & Rayson (2020)	2K	Twitter	Arabic	Three classes (false, true, unrelated)	Manually
CMU-MisCOV19 (Memon & Carley, 2020)	0.5K	Twitter	English	Multi-class (17 classes)	Manually
Li et al. (2020a)	3K	Weibo	Chinese	Multi-class (eight situational classes)	Manually
ArCorona (Mubarak & Hassan, 2020)	8K	Twitter	Arabic	Multi-class (17 classes, mixing situational and misinformation classes)	Manually
AraCOVID19-MFH (Ameur & Aliane, 2021)	10.8K	Twitter	Arabic	Ten independent tasks (each with 2–4 classes)	Manually
Alam et al. (2020)	722	Twitter	Arabic	Ten independent tasks (each with 2–3 classes)	Manually
Our dataset (ArCOV19-MCM)	6.7K	Twitter	Arabic	Multi-class (19 misinformation classes)	Manually
Our dataset (ArCOV19-MLM)	6.7K	Twitter	Arabic	Multi-label (19 misinformation classes)	Manually
Our dataset (ArCOV19-Sit)	4.2K	Twitter	Arabic	Multi-class (six situational classes)	Manually
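The ArCOV19-MCM and ArCOV19-MLM rows share the same 19 misinformation classes. They differ only in whether a tweet carries exactly one label (multi-class) or possibly several (multi-label). The minimal sketch below shows how the two settings are commonly encoded: a single class index versus a multi-hot vector. The class names are hypothetical placeholders, since the 19 actual classes are not listed in this table, and the functions are illustrative rather than the authors' code.

```python
from typing import List, Sequence

# Hypothetical placeholder class names; the real 19 misinformation classes
# of ArCOV19-MCM / ArCOV19-MLM are not listed in this table.
CLASSES: List[str] = ["class_a", "class_b", "class_c", "class_d"]


def encode_multiclass(label: str, classes: Sequence[str] = CLASSES) -> int:
    """Multi-class: each tweet has exactly one label, stored as a class index.

    Raises ValueError if the label is not one of the known classes.
    """
    return list(classes).index(label)


def encode_multilabel(labels: Sequence[str], classes: Sequence[str] = CLASSES) -> List[int]:
    """Multi-label: each tweet may have several labels, stored as a multi-hot vector."""
    chosen = set(labels)
    unknown = chosen - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # One position per class; 1 means the tweet carries that label.
    return [1 if c in chosen else 0 for c in classes]


if __name__ == "__main__":
    print(encode_multiclass("class_c"))               # 2
    print(encode_multilabel(["class_a", "class_c"]))  # [1, 0, 1, 0]
```

A multi-class label is the special case of a multi-hot vector with exactly one active position. For that reason, a model trained for the multi-label setting typically uses an independent sigmoid output per class rather than a single softmax over all classes.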
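Several datasets in the table (ReCOVery, CoAID) are annotated by distant supervision rather than by labeling each sample manually. The general idea is that a sample inherits a label from an external, coarser signal, such as the credibility of the site that published it. The minimal sketch below assumes a source-level credibility list. The domain names and the list itself are hypothetical, and this is not the exact procedure used by ReCOVery or CoAID.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical source-credibility list (illustrative domains only); real
# datasets derive such ratings from external sources not reproduced here.
SOURCE_CREDIBILITY: Dict[str, str] = {
    "reliable-news.example": "real",
    "unreliable-news.example": "fake",
}


def distant_label(url: str, credibility: Dict[str, str] = SOURCE_CREDIBILITY) -> Optional[str]:
    """Label an article with the rating of its source domain.

    Returns None when the domain is unrated. Such samples would typically be
    discarded rather than guessed.
    """
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return credibility.get(host)


if __name__ == "__main__":
    for u in ["https://www.reliable-news.example/story/1",
              "http://unreliable-news.example/post",
              "https://other.example/"]:
        print(u, "->", distant_label(u))
```

The trade-off is scale versus precision. Labels are cheap to produce for hundreds of thousands of samples, as the sizes of the distantly supervised datasets above suggest. However, an individual article from a "reliable" source can still be inaccurate, so the resulting labels are noisier than manual annotation.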