Table 6. Inter-annotator agreement (IAA) and concordance between the model and the human annotators on the arguments identified in sentences.
Argument | Annotator 1 (N) | Annotator 1 (Argument) | Annotator 2 (N) | Annotator 2 (Argument) | Neural network (N) | Neural network (Argument) | Precision | Recall | F1 score | κ (index ± SE) |
---|---|---|---|---|---|---|---|---|---|---|
PAR_RDNS | 6,207 (74%) | 297 (4%) | 277 (3%) | 1,633 (19%) | | | 0.8550 | 0.8461 | 0.8505 | 0.8063 ± 0.0078 |
CHILD_OPIN | 6,207 (74%) | 277 (3%) | ||||||||
PSY_REP | 7,835 (93%) | 90 (1%) | 114 (1%) | 375 (4%) | 2,177 (65%) | 128 (4%) | 0.8698 | 0.8358 | 0.8524 | 0.7888 ± 0.0117 |
PAR_RELAT | 7,951 (94%) | 92 (1%) | 90 (1%) | 281 (3%) | 3,195 (96%) | 18 (1%) | 0.7574 | 0.7534 | 0.7554 | 0.7441 ± 0.0182 |
BEST_INT | 5,974 (71%) | 563 (7%) | 299 (4%) | 1,578 (19%) | 3,184 (96%) | 32 (1%) | 0.8407 | 0.7370 | 0.7855 | 0.7186 ± 0.0090 |
PAR_DED | 7,435 (88%) | 158 (2%) | 270 (3%) | 551 (7%) | 2,064 (78%) | 104 (3%) | 0.6711 | 0.7772 | 0.7203 | 0.6924 ± 0.0140 |
3,004 (90%) | 62 (2%) | 78 (2%) | 184 (4%) |
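
The agreement index and the F1 score in Table 6 can be checked against the counts and rates reported in the same row. The sketch below is illustrative and not the authors' code. It assumes that the first four count cells of a row form a 2×2 cross-tabulation of the two annotators' labels (both N, only one annotator marks the argument, only the other does, both mark it) and that the index is Cohen's kappa. The function and parameter names are hypothetical. The standard error is not reproduced, because the table does not say which estimator was used.

```python
def cohen_kappa_2x2(both_n, only_a, only_b, both_arg):
    """Cohen's kappa for two raters and two labels (N vs. Argument).

    Assumed cell layout (not stated explicitly in the table):
      both_n   - sentences both raters labelled N
      only_a   - sentences only rater A labelled Argument
      only_b   - sentences only rater B labelled Argument
      both_arg - sentences both raters labelled Argument
    """
    n = both_n + only_a + only_b + both_arg
    # Observed agreement: share of sentences with identical labels.
    p_o = (both_n + both_arg) / n
    # Marginal rate at which each rater assigns the Argument label.
    a_arg = (only_a + both_arg) / n
    b_arg = (only_b + both_arg) / n
    # Agreement expected by chance from the marginals alone.
    p_e = a_arg * b_arg + (1 - a_arg) * (1 - b_arg)
    return (p_o - p_e) / (1 - p_e)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# PAR_RDNS row: counts 6,207 / 297 / 277 / 1,633; precision 0.8550, recall 0.8461.
kappa = cohen_kappa_2x2(6207, 297, 277, 1633)
f1 = f1_score(0.8550, 0.8461)
print(f"kappa = {kappa:.4f}, F1 = {f1:.4f}")  # kappa = 0.8063, F1 = 0.8505
```

Under these assumptions the PAR_RDNS counts give the reported index of 0.8063, and the same calculation gives 0.7441 for the PAR_RELAT counts. Kappa is symmetric in the two raters, so the result does not depend on which off-diagonal cell belongs to which annotator.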