2024 Oct 22;10:e2293. doi: 10.7717/peerj-cs.2293

Table 6. Inter-annotator agreement (IAA) and concordance between the model and human annotators on the arguments identified in sentences.

Argument	Annotator 1 (N)	Annotator 1 (Argument)	Annotator 2 (N)	Annotator 2 (Argument)	Neural network (N)	Neural network (Argument)	Precision	Recall	F1 score	Cohen's κ (index ± SE)
PAR_RDNS	6,207 (74%)	297 (4%)	277 (3%)	1,633 (19%)			0.8550	0.8461	0.8505	0.8063 ± 0.0078
CHILD_OPIN					6,207 (74%)	277 (3%)
PSY_REP	7,835 (93%)	90 (1%)	114 (1%)	375 (4%)	2,177 (65%)	128 (4%)	0.8698	0.8358	0.8524	0.7888 ± 0.0117
PAR_RELAT	7,951 (94%)	92 (1%)	90 (1%)	281 (3%)	3,195 (96%)	18 (1%)	0.7574	0.7534	0.7554	0.7441 ± 0.0182
BEST_INT	5,974 (71%)	563 (7%)	299 (4%)	1,578 (19%)	3,184 (96%)	32 (1%)	0.8407	0.7370	0.7855	0.7186 ± 0.0090
PAR_DED	7,435 (88%)	158 (2%)	270 (3%)	551 (7%)	2,064 (78%)	104 (3%)	0.6711	0.7772	0.7203	0.6924 ± 0.0140
	3,004 (90%)	62 (2%)	78 (2%)	184 (4%)
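
As a reading aid, the agreement index in the last column can be checked against the annotator counts. The sketch below assumes that the four count cells of the PAR_RDNS row form a 2×2 cross-tabulation of the two annotators (both N, the two kinds of disagreement, both Argument) and that the index is Cohen's kappa; neither assumption is stated in the table itself. The function names `cohen_kappa` and `f1_score` are hypothetical helpers written for this illustration. Kappa does not depend on which of the two disagreement cells is which, so the cell order within the row does not affect the result.

```python
def cohen_kappa(both_n, disagree_1, disagree_2, both_arg):
    """Cohen's kappa for a 2x2 agreement table of two annotators."""
    total = both_n + disagree_1 + disagree_2 + both_arg
    # Observed agreement: share of sentences labelled identically.
    p_observed = (both_n + both_arg) / total
    # Marginal totals of each annotator for the two labels.
    row_n, row_arg = both_n + disagree_1, disagree_2 + both_arg
    col_n, col_arg = both_n + disagree_2, disagree_1 + both_arg
    # Agreement expected by chance from the marginals.
    p_expected = (row_n * col_n + row_arg * col_arg) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Counts and metrics taken from the PAR_RDNS row of the table.
kappa = cohen_kappa(6207, 297, 277, 1633)
f1 = f1_score(0.8550, 0.8461)

print(round(kappa, 4))  # 0.8063
print(round(f1, 4))     # 0.8505
```

Under these assumptions both values reproduce the PAR_RDNS entries (0.8063 and 0.8505). The standard error reported next to the index is not recomputed here.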