2021 Jun 18;7:e518. doi: 10.7717/peerj-cs.518

Table 2. Characteristics of the surveyed datasets for fake news detection.

| Dataset | News domain | Application purpose | Type of disinformation | Language | Size | News content type | Rating scale | Media platform | Spontaneity | Availability | Extraction time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Yelp dataset (Barbado, Araque & Iglesias, 2019) | Technology | Fake detection | Fake reviews | English | 18912 reviews | Text | 2 values | Mainstream | Yes | Yes | No |
| PHEME dataset (Zubiaga et al., 2016) | Society, politics | Rumor detection | Rumors | English and German | 330 rumorous conversations and 4842 tweets overall | Text | 3 values | Social media (Twitter) | Yes | Yes | No |
| CREDBANK (Mitra & Gilbert, 2015) | Society | Veracity classification | Rumors | English | 60 million streaming tweets | Text | 5 values | Social media (Twitter) | Yes | Yes | Yes (October 2014 - February 2015) |
| BuzzFace (Santia & Williams, 2018) | Politics, society | Veracity classification | Fake news articles | English | 2263 news | Text | 4 values | Social media (Facebook) | Yes | Yes | Yes (September 2016) |
| FacebookHoax (Tacchini et al., 2017) | Science | Fake detection | Hoaxes | English | 15500 posts | Text | 2 values | Social media (Facebook) | Yes | Yes | Yes (July 2016 - December 2016) |
| LIAR (Wang, 2017) | Politics | Fake detection | Fake news articles | English | 12836 short statements | Text | 6 values | Mainstream + social media (Facebook, Twitter) | Yes | Yes | Yes (2007-2016) |
| Fact checking dataset (Vlachos & Riedel, 2014) | Politics, society | Fact checking | Fake news articles | English | 221 statements | Text | 5 values | Mainstream | Yes | Yes | No |
| FEVER (Thorne et al., 2018) | Society | Fact checking | Fake news articles | English | 185,445 claims | Text | 3 values | Mainstream | No | Yes | No |
| EMERGENT (Ferreira & Vlachos, 2016) | Society, technology | Rumor detection | Rumors | English | 300 claims and 2,595 associated article headlines | Text | 3 values | Mainstream + social media (Twitter) | Yes | Yes | No |
| FakeNewsNet (Shu et al., 2018) | Society, politics | Fake detection | Fake news articles | English | 422 news | Text, images | 2 values | Mainstream + social media (Twitter) | Yes | Yes | No |
| Benjamin Political News Dataset (Horne & Adali, 2017) | Politics | Fake detection | Fake news articles | English | 225 stories | Text | 3 values | Mainstream | Yes | Yes | Yes (2014-2015) |
| Burfoot Satire News Dataset (Burfoot & Baldwin, 2009) | Politics, economy, technology, society | Fake detection | Satire | English | 4,233 news samples | Text | 2 values | Mainstream | Yes | Yes | No |
| BuzzFeed News dataset (Horne & Adali, 2017) | Politics | Fake detection | Fake news articles | English | 2,283 news samples | Text | 4 values | Social media (Facebook) | Yes | Yes | Yes (2016-2017) |
| MisInfoText dataset (Torabi & Taboada, 2019) | Society | Fact checking | Fake news articles | English | 1,692 news articles | Text | 4 values for BuzzFeed and 5 values for Snopes | Mainstream | Yes | Yes | No |
| Ott et al.’s dataset (Ott et al., 2011) | Tourism | Fake detection | Fake reviews | English | 800 reviews | Text | 2 values | Social media (TripAdvisor) | No | Yes | No |
| FNC-1 dataset (Riedel et al., 2017) | Politics, society, technology | Fake detection | Fake news articles | English | 49972 articles | Text | 4 values | Mainstream | Yes | Yes | No |
| Spanish fake news corpus (Posadas-Durán et al., 2019) | Science, sport, economy, education, entertainment, politics, health, security, society | Fake detection | Fake news articles | Spanish | 971 news | Text | 2 values | Mainstream | Yes | Yes | Yes (January 2018 - July 2018) |
| Fake_or_real_news (Dutta et al., 2019) | Politics, society | Fake detection | Fake news articles | English | 6,337 articles | Text | 2 values | Mainstream | Yes | Yes | No |
| TSHP-17 (Rashkin et al., 2017) | Politics | Fact checking | Fake news articles | English | 33,063 articles | Text | 6 values for PolitiFact and 4 values for unreliable sources | Mainstream | Yes | Yes | No |
| QProp (Barrón-Cedeno et al., 2019) | Politics | Fact checking | Fake news articles | English | 51,294 articles | Text | 2 values | Mainstream | Yes | Yes | No |
| NELA-GT-2018 (Nørregaard, Horne & Adalı, 2019) | Politics | Fake detection | Fake news articles | English | 713000 articles | Text | 2 values | Mainstream | Yes | Yes | Yes (February 2018 - November 2018) |
| TW_info (Jang, Park & Seo, 2019) | Politics | Fake detection | Fake news articles | English | 3472 articles | Text | 2 values | Social media (Twitter) | Yes | Yes | Yes (January 2015 - April 2019) |
| FCV-2018 (Papadopoulou et al., 2019) | Society | Fake detection | Fake news content | English, Russian, Spanish, Arabic, German, Catalan, Japanese, and Portuguese | 380 videos and 77258 tweets | Videos, text | 2 values | Social media (YouTube, Facebook, Twitter) | Yes | Yes | Yes (April 2017 - July 2017) |
| Verification Corpus (Boididou et al., 2018) | Society | Veracity classification | Hoaxes | English, Spanish, Dutch, French | 15629 posts | Text, images, videos | 2 values | Social media (Twitter) | Yes | Yes | Yes (2012-2015) |
| CNN / Daily Mail summarization dataset (Jwa et al., 2019) | Politics, society, crime, sport, business, technology, health | Fake detection | Fake news articles | English | 287000 articles | Text | 4 values | Mainstream | Yes | Yes | Yes (April 2007 - April 2015) |
| Zheng et al.’s dataset (Zheng et al., 2018) | Society | Clickbait detection | Clickbait | Chinese | 14922 headlines | Text | 2 values | Mainstream + social media (WeChat) | Yes | Yes | No |
| Tam et al.’s dataset (Tam et al., 2019) | Politics, technology, science, crime, fraud and scam, fauxtography | Rumor detection | Rumors | English | 1022 rumors and 4 million tweets | Text | 2 values | Social media (Twitter) | Yes | Yes | Yes (May 2017 - November 2017) |