2022 Apr 12;19(4):496–504. doi: 10.1038/s41592-022-01443-0

Fig. 2. Multi-animal DeepLabCut keypoint detection and whole-body assembly performance.


a, Distribution of keypoint prediction error for DLCRNet_ms5 with stride 8 (70% train and 30% test split). Violin plots display train (top) and test (bottom) errors. Vertical dotted lines mark the first, second and third quartiles. Median test errors were 2.69, 5.62, 4.65 and 2.80 pixels for the illustrated datasets, in order. Gray numbers indicate PCK (percentage of correct keypoints). Only the first five keypoints of the parenting dataset belong to the pups; the other 12 are keypoints of the adult mouse. b, Illustration of our data-driven skeleton selection algorithm. Mouse cartoon adapted with permission from ref. 29 under a Creative Commons licence (https://creativecommons.org/licenses/by/4.0/). c, Animal assembly quality as a function of part affinity graph (skeleton) size for baseline (user-defined) versus data-driven skeleton definitions. The top row shows the fraction of keypoints left unconnected after assembly, and the bottom row shows the accuracy of their grouping into distinct animals. Colored dots mark statistically significant interactions (two-way repeated-measures ANOVA; see Supplementary Tables 1–4 for full statistics). Light red vertical bars highlight the automatically selected graph. d, mAP (mean average precision) as a function of graph size, shown on test data held out from 70% train and 30% test splits. The associative embedding method does not rely on a graph. The performance of MMPose's ResNet-AE and HRNet-AE bottom-up implementations is shown for comparison against our multi-stage architecture DLCRNet_ms5, here called Baseline. Data-driven is Baseline plus the calibration method (one-way ANOVA shows significant effects of the model; P values: tri-mouse 8.8 × 10⁻⁸, pups 6.5 × 10⁻¹³, marmosets 3.8 × 10⁻¹¹, fish 4.0 × 10⁻¹²). e, Marmoset ID: example test image overlaid with the animal identity prediction accuracy per keypoint, averaged over all test images and test splits. With ResNet50_stride8, accuracy peaks at 99.2% for keypoints near the head and drops to 95.1% for more distal parts. In the lower panel, plus signs denote individual splits and circles show the averages.
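The summary statistics reported in panel a can be computed roughly as sketched below. These are the median and quartiles of the per-keypoint pixel error, plus PCK. This is a minimal illustration and not DeepLabCut's evaluation code. It makes three assumptions. First, error is the Euclidean pixel distance between predicted and ground-truth keypoints. Second, unlabeled keypoints are skipped. Third, PCK uses a hypothetical `pck_threshold` in pixels, because the caption does not state the threshold used. The function names `keypoint_errors` and `summarize` are likewise hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def keypoint_errors(pred: Sequence[Point], truth: Sequence[Optional[Point]]) -> list:
    """Euclidean pixel error per keypoint; unlabeled ground truth (None) is skipped."""
    return [math.dist(p, t) for p, t in zip(pred, truth) if t is not None]


def summarize(errors: Sequence[float], pck_threshold: float) -> dict:
    """Quartiles of the error distribution and PCK at a (hypothetical) pixel threshold."""
    if len(errors) < 2:
        raise ValueError("need at least two errors to compute quartiles")
    # First, second (median) and third quartiles, like the dotted lines in the violins.
    # "inclusive" uses linear interpolation between order statistics; the paper's
    # plotting code may use a different quantile convention.
    q1, q2, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    # PCK: percentage of keypoints whose error is within the threshold.
    pck = 100.0 * sum(e <= pck_threshold for e in errors) / len(errors)
    return {"q1": q1, "median": q2, "q3": q3, "pck": pck}


errs = keypoint_errors(
    [(10, 10), (20, 20), (30, 30), (40, 40), (50, 50)],
    [(13, 14), (20, 21), None, (40, 40), (50, 56)],
)
print(errs, summarize(errs, pck_threshold=5.0))
```

In this toy example, the four labeled keypoints are off by 5, 1, 0 and 6 pixels. The result is a median error of 3.0 pixels and a PCK of 75% at the assumed 5-pixel threshold.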