SANAD: Single-label Arabic News Articles Dataset for automatic text categorization

2019 Jun 4;25:104076. doi: 10.1016/j.dib.2019.104076

Specifications Table

Subject area	Computer Science
More specific subject area	Arabic Language, Text Classification, Machine Learning, Natural Language Processing
Type of data	Text files
How data was acquired	By scraping news websites
Data format	Raw
Experimental factors	Texts are not cleaned or stemmed. Texts are organized into files; each file is one news article. Text files are grouped in folders where each folder corresponds to a category.
Experimental features	The dataset contains almost 200,000 articles, organized into at most 7 categories.
Data source location	N/A
Data accessibility	The data are free and publicly available, and can be downloaded from: https://data.mendeley.com/datasets/57zpx667y9
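
The folder layout described under "Experimental factors" (one folder per category, one file per article) lends itself to a simple loader. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that articles are stored as UTF-8 `.txt` files and that the downloaded archive has been extracted to a directory of the reader's choosing; the function name `load_labeled_articles` is hypothetical.

```python
from pathlib import Path


def load_labeled_articles(root):
    """Read a folder-per-category corpus into parallel lists.

    Assumptions (not confirmed by the specifications table):
    articles are UTF-8 encoded and use a ".txt" extension.
    """
    texts, labels = [], []
    # Each immediate subfolder of `root` is treated as one category.
    category_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for category_dir in category_dirs:
        # Each file inside a category folder is one news article.
        for article_path in sorted(category_dir.glob("*.txt")):
            texts.append(article_path.read_text(encoding="utf-8"))
            labels.append(category_dir.name)
    return texts, labels
```

Calling `load_labeled_articles` on the extracted directory returns the raw article texts alongside their category names. Because the texts are not cleaned or stemmed, any normalization would be applied after this step.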