Table 1.
Dataset | Classes | Train | Dev | Labels
---|---|---|---|---
COVID-19 Category (CC) | 2 | 3,094 | 1,031 | Personal, News
Vaccine Sentiment (VC) | 3 | 5,000 | 3,000 | N, Neutral, Positive
Maternal Vaccine Stance (MVS) | 4 | 1,361 | 817 | Disc, A, N, Promotional
Stanford Sentiment Treebank 2 (SST-2) | 2 | 67,349 | 872 | Negative, Positive
Twitter Sentiment SemEval (SE) | 3 | 6,000 | 817 | Neg, Neutral, Positive
All five evaluation datasets are multi-class, and some have strong label imbalance, visualized by the proportional bar width in the label column. N and Neg stand for negative; Disc and A stand for discouraging and ambiguous, respectively.