2023 May 22;26(6):1054–1067. doi: 10.1038/s41593-023-01332-5

Extended Data Fig. 1. Human-Human agreement is high and exceeds supervised behavioral classifier output.

a, Comparison of annotation agreement between two humans and between a human and a supervised behavioral classifier (DeepEthogram; Methods). n = 30 videos for all groups; the same videos were re-annotated by a second human and held out as the test dataset for the supervised classifier. Blue, human-human agreement; pink, human-DeepEthogram agreement. Box bounds, 25th and 75th percentiles; red line, median; whiskers, 5th and 95th percentiles; +, outliers. Background, unlabeled frames; all, all behaviors combined. ***p < 0.001, two-sided Wilcoxon rank sum test (Supplementary Table 7). b, Representative ethograms from six of the 30 videos, displaying the annotations of two humans (top two rows of each plot) and the output of DeepEthogram (bottom row). Values in parentheses at left, F1 score of the corresponding annotation compared with Human1 (top row annotation) for all behaviors combined. t = 0 marks the time of completed egg expulsion (egg out).
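For readers unfamiliar with the agreement metric, the following is a minimal sketch of a frame-wise F1 score between two annotations, with one annotation treated as the reference (as Human1 is in panel b). The behavior labels (`walk`, `groom`), the function names, and the pooling of counts across behaviors for the "all behaviors combined" value are illustrative assumptions, not the procedure described in the Methods.

```python
from typing import Sequence


def f1_score(reference: Sequence[str], other: Sequence[str], label: str) -> float:
    """Frame-wise F1 for one behavior label, treating `reference` as ground truth."""
    if len(reference) != len(other):
        raise ValueError("annotations must cover the same frames")
    pairs = list(zip(reference, other))
    tp = sum(r == label and o == label for r, o in pairs)  # both annotate the label
    fp = sum(r != label and o == label for r, o in pairs)  # only `other` does
    fn = sum(r == label and o != label for r, o in pairs)  # only `reference` does
    if tp + fp + fn == 0:
        return float("nan")  # label absent from both annotations
    return 2 * tp / (2 * tp + fp + fn)


def pooled_f1(reference: Sequence[str], other: Sequence[str], labels: Sequence[str]) -> float:
    """Assumed 'all behaviors combined' score: pool TP/FP/FN counts over `labels`."""
    if len(reference) != len(other):
        raise ValueError("annotations must cover the same frames")
    tp = fp = fn = 0
    for r, o in zip(reference, other):
        for label in labels:
            if r == label and o == label:
                tp += 1
            elif o == label:
                fp += 1
            elif r == label:
                fn += 1
    if tp + fp + fn == 0:
        return float("nan")
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical six-frame annotations; "background" marks unlabeled frames.
human1 = ["background", "walk", "walk", "groom", "groom", "background"]
human2 = ["background", "walk", "groom", "groom", "groom", "background"]

walk_f1 = f1_score(human1, human2, "walk")
groom_f1 = f1_score(human1, human2, "groom")
all_f1 = pooled_f1(human1, human2, ["walk", "groom"])
```

In this toy example the two annotators disagree on a single frame, which lowers the score for both behaviors involved. Other ways of combining behaviors (for example, averaging per-behavior F1 scores) would give different values.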