Table 2. Mean P, R, and F1 scores for the 3 labels in the FluTrack dataset.
| Model | Related vs unrelated P | Related vs unrelated R | Related vs unrelated F1 | Awareness vs infection P | Awareness vs infection R | Awareness vs infection F1 | Self vs other P | Self vs other R | Self vs other F1 |
|---|---|---|---|---|---|---|---|---|---|
| Linear SVM | 0.766 | 0.823 | 0.793 | 0.821 | 0.816 | 0.818 | 0.766 | 0.823 | 0.793 |
| CNN GloVe 300 | 0.809c | **0.850**b | 0.827c | 0.903c | **0.906**c | **0.905**c | 0.809c | 0.847b | 0.827c |
| CNN Twitter GloVe 50 | 0.813c | 0.832 | 0.822c | 0.850c | 0.848c | 0.849c | 0.813c | 0.832 | 0.823c |
| CNN Twitter GloVe 100 | **0.816**c | **0.850**b | **0.832**c | **0.919**c | 0.881c | 0.900c | **0.816**c | **0.850**b | **0.832**c |
| CNN Twitter GloVe 200 | 0.800c | 0.822 | 0.811c | 0.866c | 0.882c | 0.874c | 0.800c | 0.822 | 0.811c |
| CNN Word2Vec 300 | 0.796c | 0.839a | 0.817c | 0.902c | 0.903c | 0.903c | 0.796c | 0.839a | 0.817c |
| BiLSTM GloVe 300 | 0.771 | 0.836 | 0.802b | 0.857c | 0.771 | 0.812 | 0.771a | 0.836 | 0.802b |
| BiLSTM Twitter GloVe 50 | 0.759 | 0.845a | 0.799a | 0.748 | 0.760 | 0.754 | 0.759 | 0.845a | 0.799a |
| BiLSTM Twitter GloVe 100 | 0.795c | 0.794 | 0.794 | 0.821 | 0.752 | 0.785 | 0.795c | 0.794 | 0.794 |
| BiLSTM Twitter GloVe 200 | 0.767 | 0.837 | 0.800a | 0.876c | 0.737 | 0.800 | 0.767 | 0.837 | 0.800a |
| BiLSTM Word2Vec 300 | 0.788c | 0.829 | 0.808c | 0.833a | 0.819 | 0.826 | 0.788c | 0.829 | 0.808c |
P: precision; R: recall. Bold font indicates the best result in each column.
a: P value (from the Wilcoxon signed rank test) between .05 and .01.
b: P value (from the Wilcoxon signed rank test) between .01 and .001.
c: P value (from the Wilcoxon signed rank test) ≤.001.
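The sketch below illustrates the two computations behind the table: F1 as the harmonic mean of precision and recall, and a Wilcoxon signed rank P value mapped onto the a/b/c markers. It is a minimal illustration under stated assumptions, not the authors' code. The table does not say which scores were paired in the test; the helper `wilcoxon_exact_p` assumes paired per-run scores (for example, per-fold scores of a model and of a baseline) and computes an exact two-sided P value by enumerating sign assignments, which is only practical for small samples. In practice `scipy.stats.wilcoxon` is the usual choice. All function names here are hypothetical, and treating each upper bound as inclusive in `significance_marker` is an assumption, since the footnotes do not specify how boundary values were assigned.

```python
from itertools import product


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def wilcoxon_exact_p(x: list[float], y: list[float]) -> float:
    """Exact two-sided Wilcoxon signed rank P value for paired samples.

    Zero differences are dropped; tied |differences| receive average ranks.
    All 2**n sign assignments are enumerated, so keep n small.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    null_mean = sum(ranks) / 2
    observed = abs(w_plus - null_mean)
    # Count sign assignments at least as far from the null mean as observed.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - null_mean) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


def significance_marker(p: float) -> str:
    """Map a P value to the Table 2 footnote markers.

    Assumption: each upper bound is inclusive (.05 -> a, .01 -> b, .001 -> c).
    """
    if p <= 0.001:
        return "c"
    if p <= 0.01:
        return "b"
    if p <= 0.05:
        return "a"
    return ""
```

For the Linear SVM row, `round(f1_from_pr(0.766, 0.823), 3)` gives 0.793, matching the reported F1. For CNN Twitter GloVe 100 (related vs unrelated) the same call gives 0.833 rather than the reported 0.832, which is expected if F1 was averaged over runs: the mean of per-run F1 scores need not equal the F1 of the mean precision and recall.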