Table 3.
| Rater experience | Inter-rater agreement | Cronbach's alpha | Mean recall (SD) (%) | Mean precision (SD) (%) | F1 | Windowed gold standard (GS) used to calculate precision and recall |
|---|---|---|---|---|---|---|
| Expert | 0.86 | 0.80 | 79 (2) | 91 (1) | 0.846 | Average of each expert compared to the Expert GS (with the compared expert removed) |
| Non-expert | 0.73 | 0.65 | 68 (15) | 77 (7) | 0.722 | Average of each non-expert compared to the Expert GS |
| Combined raters | 0.74 | 0.68 | 76 (13) | 83 (9) | 0.793 | Average of each rater (non-expert and expert) compared to the Expert GS (with the compared expert removed from the GS) |
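As a sanity check (not part of the source), the F1 values in Table 3 are consistent with the harmonic mean of each row's mean precision and mean recall; the sketch below verifies this using the table's values:

```python
# Verify that each reported F1 equals the harmonic mean of the row's
# mean precision and mean recall (values taken directly from Table 3).
rows = {
    "Expert":          (0.91, 0.79, 0.846),
    "Non-expert":      (0.77, 0.68, 0.722),
    "Combined raters": (0.83, 0.76, 0.793),
}

for name, (precision, recall, f1_reported) in rows.items():
    f1 = 2 * precision * recall / (precision + recall)
    # Reported F1 should match to the precision shown in the table.
    assert abs(f1 - f1_reported) < 0.001, name
    print(f"{name}: F1 = {f1:.3f}")
```

Note that F1 here is computed from the mean precision and recall across raters, not averaged per rater, which matches the rounded values in the table.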