| Aspect | CNN/RNN models | Transformer models |
|---|---|---|
| Architecture | Sequential processing (CNNs: local features; RNNs: temporal dependencies) | Self-attention mechanisms (context-aware embeddings) |
| Preprocessing complexity | High (tokenization, normalization, dialect handling) | Moderate (BERT tokenizers handle dialects better) |
| Data efficiency | Requires 10k+ labeled samples | Effective with 5k+ samples (transfer learning) |
| Key strengths | • Interpretable feature extraction<br>• Hardware efficiency<br>• Effective for short texts | • Contextual understanding<br>• Cross-dialect adaptability<br>• State-of-the-art performance |
| Limitations | • Struggles with long-range dependencies<br>• Dialect generalization issues<br>• Platform-specific bias | • Computational intensity<br>• Arabic-specific pretraining needed<br>• Data imbalance sensitivity |
| Top techniques | • Improved Bi-LSTM (92% F1)<br>• CNN-LSTM (83.65% F1)<br>• Bi-GRU (75.8% F1) | • AraBERT (95% F1)<br>• MARBERT (92.41% F1)<br>• AraBERTv0.2 (84.5% F1) |
| Dataset dependencies | YouTube/Twitter-centric (OSACT4, ArHS, OffensEval) | Multi-platform (Twitter, YouTube, Facebook) |
| Fine-tuning | N/A (fixed architectures) | Layer freezing, adapter modules, task-specific pretraining |
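The "Preprocessing complexity" row notes that CNN/RNN pipelines need heavy normalization and dialect handling. As a minimal sketch of what such normalization typically involves, the following applies three common rules (diacritic stripping, alef unification, ta-marbuta and alef-maqsura mapping); the exact rule set varies by pipeline and is illustrative only:

```python
import re

# Illustrative Arabic normalization rules; real pipelines may differ.
TASHKEEL = re.compile(r'[\u064B-\u0652]')  # common diacritic range

def normalize_arabic(text: str) -> str:
    text = TASHKEEL.sub('', text)        # strip tashkeel (diacritics)
    text = re.sub('[إأآ]', 'ا', text)     # unify alef variants
    text = text.replace('ى', 'ي')        # alef maqsura -> ya
    text = text.replace('ة', 'ه')        # ta marbuta -> ha
    return text

print(normalize_arabic('أَحْمَد'))  # → احمد
```

Transformer tokenizers (per the table's right-hand column) often absorb much of this variation via subword vocabularies, which is one reason their preprocessing burden is rated "Moderate".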
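The "Fine-tuning" row lists layer freezing as a standard transformer adaptation strategy. A minimal PyTorch sketch of the idea is below; `TinyEncoder` is a hypothetical stand-in for a pretrained model such as AraBERT (actual layer names in Hugging Face checkpoints differ):

```python
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained encoder with stacked layers."""
    def __init__(self, hidden=32, layers=4, classes=2):
        super().__init__()
        self.embed = nn.Embedding(100, hidden)
        self.encoder = nn.ModuleList(
            nn.Linear(hidden, hidden) for _ in range(layers)
        )
        self.classifier = nn.Linear(hidden, classes)

def freeze_lower_layers(model: TinyEncoder, n_frozen: int) -> None:
    """Freeze the embeddings and the first n_frozen encoder layers,
    leaving upper layers and the task head trainable."""
    for p in model.embed.parameters():
        p.requires_grad = False
    for layer in model.encoder[:n_frozen]:
        for p in layer.parameters():
            p.requires_grad = False

model = TinyEncoder()
freeze_lower_layers(model, n_frozen=2)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Freezing lower layers keeps the pretrained general-purpose features intact while only the upper layers and classifier adapt to the task, which also reduces the labeled-data requirement noted in the "Data efficiency" row.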