2023 Sep 20;63(19):6053–6067. doi: 10.1021/acs.jcim.3c00422

Table 3. Full Breakdown of the Combined Arrow Detection/Classification Model Metrics on Our Evaluation Set^a.

	TP	FN	FP	recall	precision	F-score
arrow detection	1131	43	51	96.3%	95.7%	96.0%
solid A. classification	1071	48	38	95.7%	96.6%	96.1%
curly A. classification	33	5	17	86.8%	66.0%	75.0%
equilibrium A. classification	12	4	5	75.0%	70.6%	72.7%
resonance A. classification	0	0	6	N/A	0%	N/A

^a The first row shows the overall metrics for arrow detection (cf. Table 1); rows 2–5 show the evaluation of the classification task. This task is not assessed independently of the detection task: undetected arrows are included in the false negatives in rows 2–5. No resonance arrows were present in the evaluation set.
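The percentage columns can be recomputed from the TP/FN/FP counts. The sketch below assumes the standard definitions (recall = TP/(TP+FN), precision = TP/(TP+FP), F-score as the harmonic mean of the two); the table itself does not spell these formulas out, and the helper name `prf` and the `ROWS` dictionary are illustrative only, not part of the authors' code.

```python
from typing import Optional, Tuple

def prf(tp: int, fn: int, fp: int) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (recall, precision, F-score) as fractions, or None where undefined.

    Assumes the standard definitions; the F-score is taken to be F1.
    """
    recall = tp / (tp + fn) if (tp + fn) > 0 else None      # undefined without ground-truth positives
    precision = tp / (tp + fp) if (tp + fp) > 0 else None   # undefined without predictions
    if recall is None or precision is None or (recall + precision) == 0:
        f_score = None
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score

def as_percent(value: Optional[float]) -> str:
    """Format a fraction as a one-decimal percentage, or 'N/A' if undefined."""
    return "N/A" if value is None else f"{value * 100:.1f}%"

# Counts (TP, FN, FP) copied from Table 3.
ROWS = {
    "arrow detection": (1131, 43, 51),
    "solid A. classification": (1071, 48, 38),
    "curly A. classification": (33, 5, 17),
    "equilibrium A. classification": (12, 4, 5),
    "resonance A. classification": (0, 0, 6),
}

for name, (tp, fn, fp) in ROWS.items():
    recall, precision, f_score = prf(tp, fn, fp)
    print(f"{name}\t{as_percent(recall)}\t{as_percent(precision)}\t{as_percent(f_score)}")
```

Under these assumptions the recomputed values agree with the table to one decimal place. For the resonance row, recall and F-score are undefined because the evaluation set contains no resonance arrows, while precision is 0% because all six predictions of that class are false positives.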
