2025 Sep 17;11:e3133. doi: 10.7717/peerj-cs.3133

Table 2. Benchmark Arabic hate speech datasets: sources, reference publications, collection platforms, sizes, and annotation labels.

| Dataset | Reference (year) | Platform | Size | Labels |
|---|---|---|---|---|
| Alakrot A. et al. dataset | Alakrot, Murray & Nikolov (2018a) | YouTube | 15,050 | Not offensive, Offensive |
| Religious Hate Speech Detection dataset | Albadi, Kurdi & Mishra (2018) | Twitter | 5,569 | Hate, Not hate |
| MLMA-hate-speech | Ousidhoum et al. (2021a) | Twitter | 3,353 | Disability, Gender, Religion, Sexual orientation |
| L-HSAB-First-Arabic-Levantine-Hate Speech-Dataset | Mulki et al. (2019) | Twitter | 5,846 | Normal, Hate, Abusive |
| OSACT4 Shared Task on Offensive Language Detection (Subtask A and B) | Hassan et al. (2020) | Twitter | 10,000 | Task A: OFF, NOT OFF / Task B: HS, NOT HS |
| COVID-19-Arabic Tweets-Dataset | Alshalan et al. (2020) | Twitter | 975,316 tweets, as used in Alshalan et al. (2020) | Hate (low, average, high), Non hate |
| Dataset Hate speech detection in Arabic Twittersphere | Alshalan & Al-Khalifa (2020) | Twitter | 9,316 | Abusive, Hateful, Normal |
| Multi Platforms Offensive Language Dataset (MPOLD) | Chowdhury et al. (2020) | Facebook, Twitter, YouTube | 4,000 | OFF, NOT OFF |
| Fine-Grained H.S Detection on Arabic Twitter | Ben Nessir et al. (2022); Shapiro, Khalafallah & Torki (2022) | Twitter | 13,000 | Subtask A: OFF, NOT OFF / Subtask B: HS, NOT HS / Subtask C: Ideology, Religion, Disability, Race, Social class, Gender |
| Arabic Hate Speech Dataset 2023 | Ahmad et al. (2024) | Twitter | 403,688 | Negative, Neutral, Positive, Very positive |