2022 Jul 7;28(6):2391–2422. doi: 10.1007/s00530-022-00966-y

Table 9.

Multimodal Fake News datasets

| Dataset | Year of release | Statistics | Domain | Contents | Labels | Collected from | Used in |
|---|---|---|---|---|---|---|---|
| Twitter 15 [144] | 2015 | 361 (I); 7032 (F); 5008 (R) | Posts related to 11 events | Text, visual | 2 | Twitter | [4, 15, 26–28, 99] |
| Twitter 16 [89] | 2016 | 413 (I); 9596 (F); 6225 (R) | Posts related to 17 events | Text, visual | 2 | Twitter | [25, 91, 111, 129] |
| Weibo [25] | 2016 | 9528 (I); 4749 (F); 4779 (R) | Verified false-rumor posts crawled from May 2012 to January 2016 | Text, visual | 2 | Weibo (non-rumor tweets are verified by Xinhua News Agency, an authoritative news agency in China) | [4, 15, 25–28, 91, 99] |
| PHEME [12] | 2016 | 2672 (I); 1972 (F); 3830 (R) | 9 different events, including 5 cases of breaking news | Tweets, conversational threads | 3 | Twitter | [16, 92, 96, 99] |
| ALLData [100] | 2018 | 20,015 (I); 11,941 (F); 8074 (R) | 2016 US presidential elections | Title, text, image, author, and website | 2 | Fake news scraped from 240 websites; real news scraped from authoritative news websites, e.g., the New York Times and Washington Post | [100, 110, 111] |
| FakeNewsNet [120] | 2019 | 19,200 (I); 5367 (F); 17,222 (R) | Politics, entertainment | Text, image URL, conversational threads, location, and timestamp of engagement | 2 | Content crawled from PolitiFact, GossipCop, and E! Online; user engagements collected via the Twitter API | [94, 95, 97, 98] |

Note: I—Total Number of Images, F—Number of Fake claims, R—Number of Real claims
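
The F and R counts in Table 9 also indicate how balanced each dataset is between fake and real claims. The following is a minimal sketch of that arithmetic, not part of the original survey: the counts are copied from the table, while the names `CLAIM_COUNTS`, `fake_share`, and `summarize` are hypothetical helpers introduced here for illustration only.

```python
# Claim counts copied from Table 9 as (fake, real) pairs per dataset.
# The dictionary and helper names are illustrative assumptions.
CLAIM_COUNTS = {
    "Twitter 15": (7032, 5008),
    "Twitter 16": (9596, 6225),
    "Weibo": (4749, 4779),
    "PHEME": (1972, 3830),
    "ALLData": (11941, 8074),
    "FakeNewsNet": (5367, 17222),
}


def fake_share(fake: int, real: int) -> float:
    """Return the fraction of claims labelled fake."""
    return fake / (fake + real)


def summarize(counts: dict) -> dict:
    """Map each dataset name to its fake-claim share, rounded to 3 decimals."""
    return {name: round(fake_share(f, r), 3) for name, (f, r) in counts.items()}


if __name__ == "__main__":
    for name, share in summarize(CLAIM_COUNTS).items():
        print(f"{name}: {share:.1%} of claims are fake")
```

Under these counts, Weibo is close to balanced, while FakeNewsNet has the smallest share of fake claims, which may matter when comparing accuracy figures reported across datasets.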