2023 Jul 12;4(1):100218. doi: 10.1016/j.xjidi.2023.100218

Table 8.

Average F₁-Box Score of each Specialist Versus the Consensus (Ground Truth)

Specialist	F₁-Box Score (Mild)	F₁-Box Score (Moderate)	F₁-Box Score (Severe)
A	0.65	0.58	0.53
B	0.44	0.33	0.23
C	0.63	0.55	0.41
D	0.51	0.37	0.38
E	0.62	0.52	0.49
Mean	0.57	0.47	0.41

To make the specialists comparable to the models, we computed the F₁-box score of each specialist against the ground truth in every validation split and then obtained the mean and SD. The “none” category (i.e., healthy images) is omitted because these images were not annotated; agreement on this category does not need to be reported because it is always 1.
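As an illustrative check on the aggregation described in this note, the following minimal Python sketch recomputes the Mean row from the per-specialist values in Table 8 and shows one way per-split scores could be reduced to a mean and SD. The names `TABLE_8`, `column_means`, and `summarize_splits` are hypothetical, and the use of the sample SD is an assumption, since the note does not say which SD estimator was used.

```python
from statistics import mean, stdev

# F1-box scores per specialist, copied from Table 8 (mild, moderate, severe).
TABLE_8 = {
    "A": (0.65, 0.58, 0.53),
    "B": (0.44, 0.33, 0.23),
    "C": (0.63, 0.55, 0.41),
    "D": (0.51, 0.37, 0.38),
    "E": (0.62, 0.52, 0.49),
}
SEVERITIES = ("Mild", "Moderate", "Severe")


def column_means(table):
    """Average each severity column across specialists, rounded to 2 decimals."""
    columns = zip(*table.values())
    return tuple(round(mean(col), 2) for col in columns)


def summarize_splits(scores_per_split):
    """Mean and SD of one specialist's scores across validation splits.

    Assumption: sample SD (n - 1 denominator); the source does not specify.
    """
    return mean(scores_per_split), stdev(scores_per_split)


if __name__ == "__main__":
    # Recompute the Mean row of Table 8 from the five specialist rows.
    for name, value in zip(SEVERITIES, column_means(TABLE_8)):
        print(f"{name}: {value:.2f}")
```

The recomputed column means agree with the published Mean row (0.57, 0.47, 0.41). The per-split scores themselves are not reported in the table, so `summarize_splits` is shown only as a helper and is not applied to any data here.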