2018 Mar 13;13(3):e0194317. doi: 10.1371/journal.pone.0194317

Table 1. Sentiment label distribution of Twitter datasets in 13 languages.

The last column is a qualitative assessment of the annotation quality, based on the levels of self- and inter-annotator agreement.

Language	Code	Negative	Neutral	Positive	Total	Quality
Albanian	alb	7,062	15,066	23,630	45,758	poor
Bulgarian	bul	14,374	28,961	19,932	63,267	fair
English	eng	23,250	38,457	25,721	87,428	very good
German	ger	19,039	52,166	26,743	97,948	fair
Hungarian	hun	9,062	17,833	30,410	57,305	good
Polish	pol	59,027	48,658	84,245	191,930	good
Portuguese	por	56,008	53,026	43,009	152,043	fair
Russian	rus	30,249	37,401	25,671	93,321	good
Ser/Cro/Bos	scb	58,796	61,265	73,766	193,827	fair
Slovak	slk	15,060	13,112	30,598	58,770	good
Slovenian	slv	34,164	48,458	30,210	112,832	good
Spanish	spa	27,675	88,481	117,048	233,204	poor
Swedish	swe	22,381	15,387	13,630	51,398	good
Total		376,147	518,271	544,613	1,439,031
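
The raw counts in Table 1 are easier to compare across languages as percentages. The following is a minimal sketch, not part of the original study, that re-derives the row totals and the grand total from the counts above and computes each language's label shares. The names `TABLE_1`, `label_shares`, `shares` and `grand_total` are hypothetical and introduced only for illustration.

```python
# Counts copied from Table 1: code -> (negative, neutral, positive, reported total).
TABLE_1 = {
    "alb": (7062, 15066, 23630, 45758),
    "bul": (14374, 28961, 19932, 63267),
    "eng": (23250, 38457, 25721, 87428),
    "ger": (19039, 52166, 26743, 97948),
    "hun": (9062, 17833, 30410, 57305),
    "pol": (59027, 48658, 84245, 191930),
    "por": (56008, 53026, 43009, 152043),
    "rus": (30249, 37401, 25671, 93321),
    "scb": (58796, 61265, 73766, 193827),
    "slk": (15060, 13112, 30598, 58770),
    "slv": (34164, 48458, 30210, 112832),
    "spa": (27675, 88481, 117048, 233204),
    "swe": (22381, 15387, 13630, 51398),
}


def label_shares(negative, neutral, positive):
    """Return the (negative, neutral, positive) shares as percentages, one decimal."""
    total = negative + neutral + positive
    return tuple(round(100 * n / total, 1) for n in (negative, neutral, positive))


# Sanity check: each reported row total should equal the sum of its three labels.
for code, (neg, neu, pos, reported_total) in TABLE_1.items():
    assert neg + neu + pos == reported_total, code

# Grand total over all 13 languages, for comparison with the "Total" row.
grand_total = sum(row[3] for row in TABLE_1.values())

# Per-language label shares in percent.
shares = {code: label_shares(*row[:3]) for code, row in TABLE_1.items()}

if __name__ == "__main__":
    for code, (neg, neu, pos) in shares.items():
        print(f"{code}: negative {neg}%, neutral {neu}%, positive {pos}%")
    print(f"grand total: {grand_total:,}")
```

Percentages make the class imbalance visible at a glance: in the Spanish set, for example, positive tweets are roughly half of all labels, whereas in the Swedish set negative tweets are the largest class.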