Tanase, Cercel & Chiru (2020)
Tokenization using a BERT-specific tokenizer, normalizing hashtags, and replacing emojis with textual descriptions.
OffensEval 2020 dataset/Arabic subset, 7,000 tweets. Classes: offensive, not offensive
BERT-based model F1 = 82.19%
The authors suggest that the limited availability of training data for non-English languages restricts the performance of multilingual models. They plan to use transfer learning to leverage similar tasks in the same language to enhance offensive language detection models.
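Several of the surveyed pipelines replace emojis with textual descriptions, as in the row above. A minimal sketch of that step using only Python's standard `unicodedata` module; the function name and the code-point threshold are assumptions, since the papers do not specify their tooling:

```python
import unicodedata

def emoji_to_text(tweet: str) -> str:
    """Replace each emoji character with its lowercase Unicode name."""
    out = []
    for ch in tweet:
        # Most emoji live above U+1F000 (an approximation, not a full emoji test)
        if ord(ch) >= 0x1F000:
            try:
                out.append(" " + unicodedata.name(ch).lower() + " ")
            except ValueError:
                out.append(" ")  # unnamed code point: drop it
        else:
            out.append(ch)
    return " ".join("".join(out).split())  # collapse extra whitespace
```

For example, `emoji_to_text("good morning \U0001F600")` yields "good morning grinning face".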
|
Socha (2020)
|
Multiple consecutive user mentions replaced with a single one. All tweets truncated or padded to a common length.
Dataset in Almaliki et al. (2023)/12,698 tweets. Classes: offensive (OFF)/not offensive (NOT) |
Monolingual: Arabic BERT BASE F1 = 86% |
The article does not mention any limitations, challenges, or future research directions. Observation: monolingual models outperform multilingual models for Arabic; only minimal preprocessing was performed.
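The two preprocessing steps in this row (collapsing consecutive mentions, then fixing a common length) can be sketched as follows. The `@USER` placeholder, `max_len`, and `pad_id` values are illustrative assumptions, not taken from the paper:

```python
import re

def collapse_mentions(text: str) -> str:
    """Replace a run of two or more consecutive @mentions with one placeholder."""
    return re.sub(r"(?:@\w+\s*){2,}", "@USER ", text)

def pad_or_truncate(token_ids, max_len=64, pad_id=0):
    """Force every token-id sequence to exactly max_len entries."""
    if len(token_ids) >= max_len:
        return token_ids[:max_len]                              # truncate long tweets
    return token_ids + [pad_id] * (max_len - len(token_ids))    # pad short ones
```

In practice the padding is usually done at the tokenizer level (e.g. a fixed `max_length`), but the effect is the same.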
|
Alami et al. (2020)
|
Substitute emojis with a special token and translate the emojis' meanings from English to Arabic, then concatenate the emoji-free tweets with their Arabic meanings. Tokenization.
The Arabic dataset used in OffensEval 2020/10,000 tweets. Classes: offensive (OFF), not offensive (NOT OFF)
AraBERT F1 = 90.17% |
The AraBERT model faced a problem with the [MASK] token not being included in the fine-tuning dataset. The issue was resolved by replacing Twitter emojis with [MASK] tokens. Future work includes using advanced word embeddings.
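The strip-then-append idea in this row can be illustrated as below; a sketch in which a tiny hypothetical emoji-to-Arabic lookup table stands in for the paper's English-to-Arabic translation step:

```python
import re

# Hypothetical glosses; the paper obtains these by translating
# English emoji descriptions into Arabic.
EMOJI_AR = {
    "\U0001F600": "وجه مبتسم",  # grinning face
    "\U0001F622": "وجه يبكي",   # crying face
}

def expand_emojis(tweet: str) -> str:
    """Remove emojis from the tweet, then append their Arabic meanings."""
    meanings = [EMOJI_AR[ch] for ch in tweet if ch in EMOJI_AR]
    stripped = "".join(ch for ch in tweet if ch not in EMOJI_AR)
    stripped = re.sub(r"\s+", " ", stripped).strip()
    return (stripped + " " + " ".join(meanings)).strip()
```

This keeps the emotional signal carried by emojis in a form the Arabic language model can attend to.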
|
Abdul-Mageed, Elmadany & Nagoudi (2021)
|
Removing diacritics and replacing URLs, user mentions, and hashtags with generic string tokens (URL, USER, HASHTAG). Tokenization. |
Shared-task dataset (subtasks A and B) in Masadeh, Davanager & Muaad (2022)/10,000 tweets. Classes: social meaning tasks of hate and offensive detection (hate, not-hate; offensive, not-offensive)
ARBERT F1 = 83% for hate, F1 = 90% for offensive. MARBERT F1 = 84.79% for hate, F1 = 92.41% for offensive. |
The authors aim to improve multilingual language models through self-training and by creating models that use less energy, as high inference costs and a lack of diversity in non-English pre-training data limit their effectiveness.
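Replacing URLs, mentions, and hashtags with generic string tokens, as in this row, is one of the most common steps across the surveyed papers. A minimal sketch; the regex patterns are assumptions, since the papers do not publish theirs:

```python
import re

def mask_entities(tweet: str) -> str:
    """Strip Arabic diacritics; replace URLs, mentions, hashtags with tokens."""
    tweet = re.sub(r"[\u064B-\u0652]", "", tweet)   # remove harakat (diacritics)
    tweet = re.sub(r"https?://\S+", "URL", tweet)
    tweet = re.sub(r"@\w+", "USER", tweet)
    tweet = re.sub(r"#\w+", "HASHTAG", tweet)
    return tweet
```

Note that Python's `\w` matches Arabic letters by default, so the same patterns cover Arabic mentions and hashtags.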
|
Hadj Ameur & Aliane (2021)
|
Removing diacritical marks, links, user references, and elongated and repeated characters. Normalization. Tokenization.
AraCOVID19-MFH dataset/10,828 tweets. Classes: Yes, No, Indeterminate |
AraBERTCov19 F1 = 98.58% |
The tweet preprocessing was used for model training but not released with the dataset. The authors plan to re-annotate using multiple annotators and to expand the annotated dataset with COVID-19 events and discussions.
|
Masadeh, Davanager & Muaad (2022)
|
Remove punctuation, slang, and stop words. Tokenization. Stemming and Lemmatization. |
Dataset used in Alghamdi et al. (2024)/6,164 tweets + Arabic Jordanian General Tweets (AJGT) corpus, with 900 tweets. Classes: hate, non-hate
BERT-AJGT Acc = 79% |
The study focuses on detecting religious hate speech in Arabic, addressing mixed language issues. The future plan is exploring methods for detecting racism, misogyny, and religious prejudice. |
|
Boulouard et al. (2022)
|
Remove emojis, punctuation, stop words, and extra letters used for emphasis. Some words stemmed and lemmatized. Tokenization.
11,268 Arabic YouTube comments. Classes: Hateful: 1, Non-hateful: 0 |
AraBERT Pre = 95%, F1 = 95%, Rec = 96%, Acc = 96% |
BERT models need to handle Arabic dialects, but the versatility required of multilingual models limits their performance. Future plans involve using more Levantine and North African dialect datasets, including "Arabizi".
|
Althobaiti (2022)
|
Remove HTML tags, hashtags, mentions, diacritics, punctuation, mathematical signs and symbols, URLs, retweet markers (RT), and symbols other than emojis. Normalization.
The Arabic dataset in Zaghouani, Mubarak & Biswas (2024), consisting of 12,698 Arabic tweets. Classes: Offensive language detection: OFF, NOT OFF. Hate speech detection: HS, NOT HS |
BERT-Based Offensive language detection: F1 = 84.3%. Hate speech detection: F1 = 81.8% |
The article does not mention any limitations, challenges, or future research directions. Observation: additional research is needed to properly understand the influence of emojis and their textual explanations, as the dataset used in the study may be too small and unbalanced.
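Many of the rows above and below list a bare "Normalization" step. Its exact rules vary by paper; a common minimal form, assumed here rather than taken from any one study, unifies alef variants and strips diacritics and tatweel:

```python
import re

def normalize_arabic(text: str) -> str:
    """Light Arabic normalization: drop diacritics, unify letter variants."""
    text = re.sub(r"[\u064B-\u0652]", "", text)             # remove harakat
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # آ / أ / إ -> ا
    text = text.replace("\u0649", "\u064A")                 # ى -> ي
    text = text.replace("\u0640", "")                       # remove tatweel
    return text
```

Normalization of this kind reduces sparsity: spelling variants of the same word collapse to one surface form before tokenization.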
|
Alzu’bi et al. (2022)
|
Remove URLs, mentions, diacritics, tatweel, punctuation, and noisy signals in the tweet. Emojis translated to Arabic using an English-to-Arabic model.
OSACT5 Arabic hate speech task/12,698 tweets. Classes: OFF, NOT OFF
AraBERTv0.2-Twitter-large Pre = 85.2%, F1 = 84.9%, Rec = 84.7%, Acc = 86.4% |
Dialect mismatch in pre-trained models makes normalizing tweet dialects, extracting relevant features like POS tags and NER, and recognizing offensive tweets challenging. Future research directions are not explicitly mentioned in the article. |
|
Ben Nessir et al. (2022)
|
Remove white spaces, non-Arabic tokens, USER, URL, and emojis. Normalize all hashtags by simply decomposing them.
Dataset in Zaghouani, Mubarak & Biswas (2024)/12,698 tweets. Classes: Subtask A: offensive, not offensive. Subtask B: hate, not hate. Subtask C: fine-grained type of hate speech |
MARBERT fine-tuned with QRNN Acc = 85.4% for Subtask A, Acc = 94.1% for Subtask B, Acc = 91.9% for Subtask C on the test dataset. |
Arabic language complexity; cultural, political, and religious dependence; and dialect differences contribute to unbalanced data and class proportions. Future research should explore meta-learning, focal loss, semi-supervised learning, and incorporating disabled and religious minorities.
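Hashtag "decomposition", as used in this row, can be sketched as below; a minimal version with underscore splitting plus camel-case splitting for Latin-script tags (the paper does not detail its exact rules):

```python
import re

def decompose_hashtag(tag: str) -> str:
    """Turn a hashtag into plain words, e.g. '#Stop_Hate' -> 'Stop Hate'."""
    tag = tag.lstrip("#").replace("_", " ")
    # Split camelCase boundaries (only relevant for Latin-script hashtags)
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", tag)
```

Decomposed hashtags expose their content words to the tokenizer instead of leaving one opaque out-of-vocabulary token.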
|
Shapiro, Khalafallah & Torki (2022)
|
Remove repeated characters, emojis, diacritics, and symbols. Normalization.
Dataset in Zaghouani, Mubarak & Biswas (2024)/12,698 tweets. Classes: Subtask A: offensive, not offensive. Subtask B: hate, not hate. Subtask C: fine-grained type of hate speech |
MARBERTv2 Subtask A F1 = 84.1%, Subtask B F1 = 81.7%, Subtask C F1 = 47.6%
Overfitting on small or unbalanced datasets; larger datasets degrade the contrastive loss. Future solutions include using a language-agnostic encoder with a contrastive objective and utilizing data from multiple languages for the same task to address data imbalance.
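Removing repeated (elongated) characters, listed in this and several other rows, is typically a one-line regex. A minimal sketch, where keeping at most two repeats is an assumed convention:

```python
import re

def collapse_repeats(text: str) -> str:
    """Collapse any character repeated three or more times down to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)
```

This maps elongated forms like "cooool" back toward their canonical spelling without destroying legitimate doubled letters.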
|
Almaliki et al. (2023)
|
Removing @username, URLs, hashtags, and punctuation. Tokenization. Normalization.
9,352 tweets. Classes: normal, abusive, hate speech |
ABMM (Arabic BERT-Mini Model) Pre, Rec, F1, and Acc all = 98.6%
The study suggests incorporating data from Facebook and exploring text representation methods like AraVec to improve neural network model training and enhance the dataset, despite hardware limitations. |
|
de Paula et al. (2023)
|
Removing punctuation, special characters, and stop words. Converting lowercase to uppercase. Stemming. Tokenization. Lemmatization.
CERIST NLP challenge dataset/10,828 tweets. Classes: Hateful, Not Hateful |
AraBERT F1 = 60%, Acc = 86% |
Dataset limited to COVID-19 disinformation domain. Small proportion of hate speech in the dataset (11%). The article does not mention any future research directions. |
|
Khezzar, Moursi & Al Aghbari (2023)
|
Removing hashtags and stop words; filtering out irrelevant symbols. Lemmatization. Normalization.
arHateDataset/34,107 tweets. Classes: hate, normal
AraBERT F1 = 93% |
Problems with data imbalances and Arabic dialect complexity. The article does not suggest future research directions. |
|
Chiker (2023)
|
Removing elongations, non-Arabic characters, numbers, symbols, emoticons, punctuation, hashtags, web addresses, empty lines, diacritics, and stop words. Normalization.
Provided by CERIST/10,278 comments from Twitter and other social media. Classes: hateful, not hateful
BERT + GRU and LSTM. With focal loss training F1 = 98.02%; with data augmentation F1 = 99.14%
Imbalance between “hateful” and “not hateful” classes. The article does not suggest future research directions. |
|
Alghamdi et al. (2024)
|
Removing diacritics, punctuation, repeated characters, symbols, special characters, URLs, English tokens, and emojis. Normalization.
AraTar corpus/11,219 tweets. Classes: Task 1: RH (religious hate), EH (ethnic hate), NH (nationality hate), GH (gender hate), UDH (undefined hate), CL (clean)
AraBERTv0.2-twitter (base) F1 = 84.5% |
Not all Arabic dialects are incorporated. The future plan is to improve the corpus representation for underrepresented hate targets with data augmentation. |
|
Zaghouani, Mubarak & Biswas (2024)
|
Removing unwanted characters, English words, and punctuation. |
15,965 tweets. Classes: multi-labels. For hate speech and offensive: Yes, No |
AraBERT F1 = 66% for hate speech detection. F1 = 65% for offensive language detection. |
Arabic regional backgrounds of annotators may affect labeling accuracy. The article does not suggest future research directions. |
|
Bensoltane & Zaki (2024)
|
Removing dates, times, numbers in both English and Arabic, URLs, and Twitter-specific symbols.
OSACT5 dataset/12,698 tweets. Classes: offensive, normal, hate (disability, social class, race, gender, religion, ideology).
MARBERT v2+BiGRU F1 = 61.68% |
Unbalanced dataset. The future plan is to combine BERT with different neural network designs, investigate transformer-based models, and find solutions for unbalanced datasets.
|
Eddine & Boualleg (2024)
|
Removing mentions, URLs, RT markers, hashtags, punctuation, special characters, numerical characters, repeated characters, Arabic stop words, non-Arabic letters, new lines, and diacritics. Normalization.
11,634 tweets + 6,853 tweets used for data augmentation. Classes: non-hate, general hate speech, sexism, racism, and religious hate speech.
Ensemble learning based on pre-trained models. F1 = 85.48% using majority voting and 85.10% using average voting. Data-augmented model F1 = 85.65% |
Confined to a specific dataset and time period. The future plan is to improve the contextual embedding model, classify Algerian hate speech, and track trends in hate speech.
|
Asiri & Saleh (2024)
|
Replace user mentions with "USER", URLs with "URL", and newlines with "NL".
24,500 tweets, augmented to over 35,000 tweets to address class imbalance. Classes: offensive, non-offensive. Multi-class: general insults, hate speech, or sarcasm
AraBERT with data augmentation techniques F1 = 91%
Limited regional coverage reduces model generalizability, while models such as AraBERT demand substantial computational resources. Future work should prioritize: (1) developing comprehensive dialect-specific datasets, (2) refining dialect-aware NLP tools, and (3) optimizing models for dialectal variations.
|
Mazari, Benterkia & Takdenti (2024)
|
Replace e-mail addresses and user mentions with <user>, URLs with <url>, numbers with <number>, etc. Remove Arabic diacritics and elongations, pictographs, symbols, flags, etc.
OSACT2020 dataset/10,000 posts Classes: Offensive, Not Offensive |
Ensemble learning models based on BERT, F1 = 94.56%
Class imbalance presents a significant challenge in these datasets. Future work will evaluate models for detecting other forms of offensive Arabic language and explore pretrained BERT variants and generative AI models to address the remaining detection challenges.
|
Mousa et al. (2024)
|
Cleaning, normalization, Farasa segmentation, and tokenization. |
13,000 tweets. Multi-class: racism, bullying, insult, obscene language, and non-offensive content.
ArabicBERT-BiLSTM-RBF with F1 = 98.4%
Limitations include computational complexity from cascaded models, extended training times due to large datasets, and reliance on combinations of multiple machine learning models. Future work will focus on: (1) adopting faster contextual models to replace BERT architectures, (2) optimizing parameters and feature extraction for efficiency, (3) integrating attention mechanisms for acceleration, and (4) evaluating cross-lingual performance.