2019 Jun 4;25:104076. doi: 10.1016/j.dib.2019.104076

Specifications Table

Subject area: Computer Science
More specific subject area: Arabic Language, Text Classification, Machine Learning, Natural Language Processing
Type of data: Text files
How data was acquired: By scraping news websites
Data format: Raw
Experimental factors: Texts are not cleaned or stemmed. Texts are organized into files; each file is one news article. Text files are grouped in folders where each folder corresponds to a category.
Experimental features: The dataset contains almost 200k articles, organized into a maximum of 7 categories.
Data source location: N/A
Data accessibility: Data is free, publicly available and can be downloaded from: https://data.mendeley.com/datasets/57zpx667y9
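
The layout described in the table (one article per file, one folder per category) maps directly onto a labeled corpus. The following is a minimal sketch of how such a layout could be read in Python, not code shipped with the dataset. The function name `load_corpus`, the `*.txt` file pattern, and the UTF-8 encoding are assumptions and may need adjusting to the downloaded files.

```python
from pathlib import Path


def load_corpus(root, pattern="*.txt", encoding="utf-8"):
    """Return (texts, labels) from a folder-per-category layout.

    Assumptions (not stated in the source): articles match `pattern`
    and are encoded as `encoding`.
    """
    texts, labels = [], []
    root = Path(root)
    # Sort folders and files so the order is the same on every run.
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for article in sorted(category_dir.glob(pattern)):
            # errors="replace" substitutes undecodable bytes instead of raising.
            texts.append(article.read_text(encoding=encoding, errors="replace"))
            # The folder name serves as the category label.
            labels.append(category_dir.name)
    return texts, labels
```

Calling `load_corpus` on the folder that contains the category folders (the path depends on where the archive was extracted) would return two parallel lists. Because the texts are not cleaned or stemmed, any normalization would have to be applied after loading.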