2022 Oct 10;26(2):1425–1446. doi: 10.1007/s10586-022-03754-5

Table 2.

The short-text datasets and their respective percentages of positive and negative instances

S #  Dataset                       # of instances  % Positive  % Negative
1    Sentiment140 (S)              1.6 million     50          50
2    Jigsaw toxic comments (J)     312,735         7           65
3    Landslide (L)                 282,152         83          17
4    DrugsCom (D)                  215,063         75          25
5    Amazon baby reviews (A)       183,531         76          14
6    IMDB movie reviews (I)        50,000          50          50
7    Google playstore reviews (G)  64,295          74          26
8    Coronavirus_archive (C)       46,162          41          59
9    Reddit data (R)               37,249          43          35
10   Women clothing ecommerce (W)  23,486          82          18
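
The percentages in Table 2 can be turned into approximate per-class counts with simple arithmetic. The following is a minimal sketch, not part of the original study: the name `TABLE_2` and the helper `summarize` are hypothetical, the Sentiment140 size is taken as exactly 1,600,000 although the table only says "1.6 million", and the resulting counts are estimates because the table reports rounded percentages. For rows whose two percentages do not add up to 100, the sketch only reports the remaining share; the table does not say what that remainder represents.

```python
# Rows copied from Table 2: (label, instances, % positive, % negative).
# Sentiment140 is listed as "1.6 million"; 1_600_000 is an assumption.
TABLE_2 = [
    ("S", 1_600_000, 50, 50),
    ("J", 312_735, 7, 65),
    ("L", 282_152, 83, 17),
    ("D", 215_063, 75, 25),
    ("A", 183_531, 76, 14),
    ("I", 50_000, 50, 50),
    ("G", 64_295, 74, 26),
    ("C", 46_162, 41, 59),
    ("R", 37_249, 43, 35),
    ("W", 23_486, 82, 18),
]


def summarize(instances, pct_pos, pct_neg):
    """Estimate class counts and the share covered by neither percentage."""
    return {
        "positive": round(instances * pct_pos / 100),
        "negative": round(instances * pct_neg / 100),
        "unaccounted_pct": 100 - pct_pos - pct_neg,
    }


if __name__ == "__main__":
    for label, n, pos, neg in TABLE_2:
        print(label, summarize(n, pos, neg))
```

Reporting the unaccounted share separately, rather than folding it into one of the two classes, keeps the estimate faithful to what the table actually states.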