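Table 6: F1 scores for the human scorer and the model on A–I (mean μ, with standard deviation σ in parentheses), and the p-value for each column.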
| F1 score | | A | B | C | D | E | F | G | H | I | Mean | Mean (A–H) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human scorer | μ | 0.62 | 0.57 | 0.65 | 0.68 | 0.62 | 0.65 | 0.71 | 0.61 | 0.32 | 0.60 | 0.64 |
| | (σ) | (0.17) | (0.16) | (0.14) | (0.16) | (0.19) | (0.14) | (0.11) | (0.17) | (0.20) | (0.19) | (0.16) |
| Model | μ | 0.70 | 0.67 | 0.68 | 0.69 | 0.70 | 0.72 | 0.70 | 0.66 | 0.71 | 0.69 | 0.69 |
| | (σ) | (0.14) | (0.14) | (0.11) | (0.10) | (0.13) | (0.08) | (0.08) | (0.12) | (0.12) | (0.12) | (0.11) |
| p-value | | 0.033 | 0.006 | 0.318 | 0.755 | 0.038 | 0.016 | 0.67 | 0.19 | 1.6 × 10⁻¹⁴ | 6.9 × 10⁻¹² | 1.7 × 10⁻⁵ |
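The table does not state how the two summary columns are computed. One plausible reading is that "Mean" and "Mean (A–H)" are unweighted averages of the per-column μ values, rather than averages pooled over individual items. The sketch below recomputes both summaries under that assumption. The names `human`, `model`, and `unweighted_mean` are hypothetical and exist only for this illustration. Under this reading the reported values are reproduced to two decimals. It also shows how much column I, where the human μ is 0.32 against the model's 0.71, pulls the human mean down.

```python
# Per-column mean F1 (μ) values copied from Table 6, columns A–I.
human = {"A": 0.62, "B": 0.57, "C": 0.65, "D": 0.68, "E": 0.62,
         "F": 0.65, "G": 0.71, "H": 0.61, "I": 0.32}
model = {"A": 0.70, "B": 0.67, "C": 0.68, "D": 0.69, "E": 0.70,
         "F": 0.72, "G": 0.70, "H": 0.66, "I": 0.71}


def unweighted_mean(scores, columns):
    """Average the per-column μ values, giving each column equal weight.

    Assumption: the table's summary means weight every column equally;
    a mean pooled over individual items could differ if columns have
    different numbers of items.
    """
    return sum(scores[c] for c in columns) / len(columns)


all_cols = list("ABCDEFGHI")  # corresponds to the "Mean" column
a_to_h = list("ABCDEFGH")     # corresponds to "Mean (A–H)", excluding column I

summary = {
    name: (round(unweighted_mean(s, all_cols), 2),
           round(unweighted_mean(s, a_to_h), 2))
    for name, s in (("human", human), ("model", model))
}
print(summary)
# → {'human': (0.6, 0.64), 'model': (0.69, 0.69)}
```

Because only the rounded μ values are available here, the check can confirm that the reported means agree with this reading. It cannot show that the authors used it. The p-values cannot be recomputed from summary statistics alone without knowing which test was applied.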