Subject area | Computer Science |
More specific subject area | Arabic Language, Text Classification, Machine Learning, Natural Language Processing |
Type of data | Text files |
How data was acquired | By scraping news websites |
Data format | Raw |
Experimental factors | Texts are not cleaned or stemmed. Texts are organized into files; each file is one news article. Text files are grouped in folders where each folder corresponds to a category. |
Experimental features | The dataset contains almost 200k articles, organized into a maximum of 7 categories. |
Data source location | N/A |
Data accessibility | Data is free, publicly available and can be downloaded from: |
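
The folder-per-category layout described under "Experimental factors" can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the function name `load_corpus`, the root path in the usage comment, and the UTF-8 encoding are all assumptions made for illustration, since the table specifies none of them.

```python
from collections import Counter
from pathlib import Path


def load_corpus(root, encoding="utf-8"):
    """Read a folder-per-category corpus and return (texts, labels).

    Assumes each immediate subfolder of `root` is one category and each
    file inside it is one raw article. The encoding is an assumption;
    undecodable bytes are replaced rather than raising an error.
    """
    texts, labels = [], []
    category_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for category_dir in category_dirs:
        articles = sorted(p for p in category_dir.iterdir() if p.is_file())
        for article in articles:
            texts.append(article.read_text(encoding=encoding, errors="replace"))
            labels.append(category_dir.name)  # folder name serves as the label
    return texts, labels


# Hypothetical usage; "path/to/dataset" is a placeholder, not a real location:
# texts, labels = load_corpus("path/to/dataset")
# print(Counter(labels))  # number of articles per category
```

Because the texts are neither cleaned nor stemmed, any normalization, tokenization, or stemming would have to be applied to `texts` as a separate step after loading.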