
Table 1. English datasets used in cross-dataset generalisation studies.

Positive labels are listed with their original wording. The Expert annotator type includes authors and experts in social science and related fields. ?: the type of annotation was not stated in the original paper; the descriptions found there are given instead. Note that only datasets used in generalisation studies are listed; for comprehensive lists of hate speech datasets, see Vidgen & Derczynski (2020) and Poletto et al. (2020).

Dataset name | Publication | Source | Positive labels | Annotator type
Waseem | Waseem & Hovy (2016); Waseem (2016) | Twitter | Racism; Sexism | Expert; Expert and crowdsourcing
Davidson | Davidson et al. (2017) | Twitter | Hate speech; Offensive | Crowdsourcing
Founta | Founta et al. (2018) | Twitter | Hate speech; Offensive | Crowdsourcing
HatEval | Basile et al. (2019) | Twitter | Hateful | Expert and crowdsourcing
Kaggle | Jigsaw (2018) | Wikipedia | Toxic; Severe toxic; Obscene; Threat; Insult; Identity hate | Crowdsourcing
Gao | Gao & Huang (2017) | Fox News | Hateful | ? (Native speakers)
AMI | Fersini, Nozza & Rosso (2018); Fersini, Rosso & Anzovino (2018) | Twitter | Misogynous | Expert
Warner | Warner & Hirschberg (2012) | Yahoo!; American Jewish Congress | Anti-semitic; Anti-black; Anti-asian; Anti-woman; Anti-muslim; Anti-immigrant; Other-hate | ? (Volunteer)
Zhang | Zhang, Robinson & Tepper (2018) | Twitter | Hate | Expert
Stormfront | De Gibert et al. (2018) | Stormfront | Hate | Expert
Kumar | Kumar et al. (2018b) | Facebook, Twitter | Overtly aggressive; Covertly aggressive | Expert
Wulczyn | Wulczyn, Thain & Dixon (2017) | Wikipedia | Attacking | Crowdsourcing
OLID (OffensEval) | Zampieri et al. (2019a) | Twitter | Offensive | Crowdsourcing
AbuseEval | Caselli et al. (2020) | Twitter | Explicit (abuse); Implicit (abuse) | Expert
Kolhatkar | Kolhatkar et al. (2019) | The Globe and Mail | Very toxic; Toxic; Mildly toxic | Crowdsourcing
Razavi | Razavi et al. (2010) | Natural Semantic Module; Usenet | Flame | Expert
Golbeck | Golbeck et al. (2017) | Twitter | Harassing | Expert