Table 1. Sentiment label distribution of Twitter datasets in 13 languages.
Language | Code | Negative | Neutral | Positive | Total | Quality |
---|---|---|---|---|---|---|
Albanian | alb | 7,062 | 15,066 | 23,630 | 45,758 | poor |
Bulgarian | bul | 14,374 | 28,961 | 19,932 | 63,267 | fair |
English | eng | 23,250 | 38,457 | 25,721 | 87,428 | very good |
German | ger | 19,039 | 52,166 | 26,743 | 97,948 | fair |
Hungarian | hun | 9,062 | 17,833 | 30,410 | 57,305 | good |
Polish | pol | 59,027 | 48,658 | 84,245 | 191,930 | good |
Portuguese | por | 56,008 | 53,026 | 43,009 | 152,043 | fair |
Russian | rus | 30,249 | 37,401 | 25,671 | 93,321 | good |
Ser/Cro/Bos | scb | 58,796 | 61,265 | 73,766 | 193,827 | fair |
Slovak | slk | 15,060 | 13,112 | 30,598 | 58,770 | good |
Slovenian | slv | 34,164 | 48,458 | 30,210 | 112,832 | good |
Spanish | spa | 27,675 | 88,481 | 117,048 | 233,204 | poor |
Swedish | swe | 22,381 | 15,387 | 13,630 | 51,398 | good |
Total | | 376,147 | 518,271 | 544,613 | 1,439,031 | |