Table 3. Full Breakdown of the Combined Arrow Detection/Classification Model Metrics on Our Evaluation Set.
task | TP | FN | FP | recall | precision | F-score |
---|---|---|---|---|---|---|
arrow detection | 1131 | 43 | 51 | 96.3% | 95.7% | 96.0% |
solid arrow classification | 1071 | 48 | 38 | 95.7% | 96.6% | 96.1% |
curly arrow classification | 33 | 5 | 17 | 86.8% | 66.0% | 75.0% |
equilibrium arrow classification | 12 | 4 | 5 | 75.0% | 70.6% | 72.7% |
resonance arrow classification | 0 | 0 | 6 | N/A | 0% | N/A |
The first row shows the overall metrics for arrow detection (cf. Table 1); rows 2–5 show the evaluation of the classification task. This task is assessed independently from the detection task: detected arrows are included in false negatives in rows 2–5. No resonance arrows were present in the evaluation set, so recall and F-score are undefined (N/A) for that class.
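
The percentages in the table follow from the TP, FN, and FP counts via the standard definitions of recall, precision, and F-score. The sketch below recomputes them as a sanity check. It is illustrative only, not the authors' evaluation code; the names `prf` and `as_percent` are hypothetical, and it assumes the F-score is the balanced harmonic mean (F1), which is consistent with the reported values.

```python
from typing import Optional, Tuple

Metric = Optional[float]  # None stands for an undefined (N/A) metric


def prf(tp: int, fn: int, fp: int) -> Tuple[Metric, Metric, Metric]:
    """Return (recall, precision, F-score) from raw counts.

    A metric is None when its denominator is zero, e.g. recall for a
    class with no ground-truth instances (TP + FN == 0).
    """
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    if recall is None or precision is None or (recall + precision) == 0:
        f_score = None
    else:
        # Assumption: F-score is the harmonic mean of precision and recall.
        f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score


def as_percent(value: Metric) -> str:
    """Format a metric as a percentage with one decimal, or 'N/A'."""
    return "N/A" if value is None else f"{100 * value:.1f}%"


# Counts (TP, FN, FP) copied from the table.
rows = {
    "arrow detection": (1131, 43, 51),
    "solid arrow classification": (1071, 48, 38),
    "curly arrow classification": (33, 5, 17),
    "equilibrium arrow classification": (12, 4, 5),
    "resonance arrow classification": (0, 0, 6),
}

for task, (tp, fn, fp) in rows.items():
    recall, precision, f_score = prf(tp, fn, fp)
    print(task, as_percent(recall), as_percent(precision), as_percent(f_score))
```

For the resonance class, TP + FN is zero, so recall (and with it the F-score) has no defined value, while precision is zero because all six predictions are false positives.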