Table 2.
| Main category | Subcategory | Count | Subtotalᵃ |
|---|---|---|---|
| Unanswerable | (a) Vague question | 6 | 14 (a-b) |
| | (b) Expert deemed unanswerable using only text | 8 | |
| System answered | (c) Expert judged the system acceptable as the gold | 6 | 18 (c-d) |
| | (d) Expert sided with the system against the gold | 12 | |
| | (e) Real FN | 18 | 68 (e-h) |
| | (f) Expert disagreed with both the system and gold | 7 | |
| System refrained | (g) Real FN | 24 | |
| | (h) Correct answer ranked second place | 19 | |
ᵃ This column gives a “redemption” perspective on the FNs: 14% were cases the system was not expected to answer, 18% were cases where the system answer was actually right, and 68% were FNs truly attributable to the system.
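
The subtotals can be checked against the per-subcategory counts with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the table, while the grouping of subcategories into the three subtotals (a-b, c-d, e-h) is an assumption inferred from the sums, and the names `counts`, `groups`, `subtotals`, and `percentages` are hypothetical.

```python
# Counts per subcategory, copied from Table 2.
counts = {"a": 6, "b": 8, "c": 6, "d": 12, "e": 18, "f": 7, "g": 24, "h": 19}

# Assumed grouping of subcategories into the three subtotals
# (inferred from the sums; the labels are illustrative, not the authors').
groups = {
    "not expected to answer": ["a", "b"],
    "system answer was right": ["c", "d"],
    "truly attributable FN": ["e", "f", "g", "h"],
}


def subtotals(counts, groups):
    """Sum the subcategory counts within each group."""
    return {name: sum(counts[key] for key in keys) for name, keys in groups.items()}


def percentages(totals):
    """Convert group totals into percentages of the overall total."""
    overall = sum(totals.values())
    return {name: 100 * value / overall for name, value in totals.items()}


totals = subtotals(counts, groups)
shares = percentages(totals)

for name in groups:
    print(f"{name}: {totals[name]} cases ({shares[name]:.0f}%)")
```

Because the eight counts sum to 100, each subtotal is numerically equal to its percentage, which is why the footnote can quote the subtotal column directly as percentages.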