Nat Commun. 2024 Jun 21;15:5165. doi: 10.1038/s41467-024-48792-2

Table 2.

Main results on quadruped benchmarks

| Method | Pre-trained weights | Data ratio | mAP | RMSE | Dataset | NE_IID | NE_OOD | Architecture |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Zero-shot | SuperAnimal | – | 68.038 | 12.971 | AP-10K | – | – | HRNetw32 |
| Zero-shot | SuperAnimal | – | 66.110 | 12.849 | AP-10K | – | – | AnimalTokenPose |
| Transfer learning | ImageNet | 1.00 | 70.548 | 11.228 | AP-10K | – | – | HRNetw32 |
| Memory replay | SuperAnimal | 1.00 | 80.113 | 11.296 | AP-10K | – | – | HRNetw32 |
| Zero-shot | AP-10K | – | 79.447 | 5.774 | AnimalPose | – | – | HRNetw32 |
| Zero-shot | SuperAnimal | – | 84.639 | 4.884 | AnimalPose | – | – | HRNetw32 |
| Zero-shot | SuperAnimal | – | 83.043 | 5.154 | AnimalPose | – | – | AnimalTokenPose |
| Transfer learning | ImageNet | 1.00 | 86.864 | 5.757 | AnimalPose | – | – | HRNetw32 |
| Fine-tuning | AP-10K | 1.00 | 86.794 | 4.860 | AnimalPose | – | – | HRNetw32 |
| Memory replay | SuperAnimal | 1.00 | 87.034 | 4.636 | AnimalPose | – | – | HRNetw32 |
| Zero-shot | AP-10K | – | 65.729 | 4.929 | Horse-10 | 0.296 | 0.287 | HRNetw32 |
| Zero-shot | SuperAnimal | – | 71.205 | 3.958 | Horse-10 | 0.227 | 0.228 | HRNetw32 |
| Zero-shot | SuperAnimal | – | 68.977 | 4.081 | Horse-10 | 0.239 | 0.233 | AnimalTokenPose |
| Transfer learning | ImageNet | 0.01 | 0.934 | 46.255 | Horse-10 | 2.369 | 2.36 | HRNetw32 |
| Transfer learning | ImageNet | 1.00 | 90.516 | 1.837 | Horse-10 | 0.036 | 0.135 | HRNetw32 |
| Fine-tuning | AP-10K | 0.01 | 66.284 | 5.029 | Horse-10 | 0.286 | 0.285 | HRNetw32 |
| Fine-tuning | AP-10K | 1.00 | 93.973 | 1.220 | Horse-10 | 0.036 | 0.083 | HRNetw32 |
| Memory replay | SuperAnimal | 0.01 | 73.366 | 3.719 | Horse-10 | 0.209 | 0.202 | HRNetw32 |
| Memory replay | SuperAnimal | 1.00 | 95.165 | 1.153 | Horse-10 | 0.040 | 0.073 | HRNetw32 |
| Zero-shot | AP-10K | – | 40.389 | 37.417 | iRodent | – | – | HRNetw32 |
| Zero-shot | SuperAnimal | – | 58.557 | 33.496 | iRodent | – | – | HRNetw32 |
| Zero-shot | SuperAnimal | – | 55.415 | 34.666 | iRodent | – | – | AnimalTokenPose |
| Transfer learning | AP-10K | 0.01 | 12.910 | 92.649 | iRodent | – | – | HRNetw32 |
| Transfer learning | ImageNet | 0.01 | 0.785 | 152.225 | iRodent | – | – | HRNetw32 |
| Transfer learning | ImageNet | 1.00 | 58.857 | 35.651 | iRodent | – | – | HRNetw32 |
| Fine-tuning | AP-10K | 0.01 | 43.144 | 37.704 | iRodent | – | – | HRNetw32 |
| Fine-tuning | AP-10K | 1.00 | 61.635 | 26.758 | iRodent | – | – | HRNetw32 |
| Memory replay | SuperAnimal | 0.01 | 60.853 | 31.801 | iRodent | – | – | HRNetw32 |
| Memory replay | SuperAnimal | 1.00 | 72.971 | 24.884 | iRodent | – | – | HRNetw32 |

Here, the base SuperAnimal-Quadruped model was trained on none of the held-out datasets. Full fine-tuning results with different amounts of data can be found in Fig. 2; here only the best fine-tuning performance is shown, which matches the top performance of the SuperAnimal (SA) variant in Fig. 2. Cao et al. (ref. 33) do not report a single unified mAP, only per-animal values; we therefore trained a model using their dataset to estimate the top-line performance if trained only on AP. The number is as reported in ref. 41, using the data from ref. 31.
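The table reports mAP, RMSE (keypoint error, conventionally in pixels), and, for Horse-10 only, normalized errors on within-domain (NE_IID) and out-of-domain (NE_OOD) splits. As a rough illustration of how such keypoint-error metrics are typically computed, the sketch below implements an RMSE and a normalized-error calculation in NumPy; the function names, array shapes, and the choice of a per-image reference length for normalization are assumptions for illustration, not the paper's exact evaluation code.

```python
import numpy as np

def keypoint_rmse(pred, gt, visible):
    """Root-mean-square error over visible keypoints (one common convention).

    pred, gt: arrays of shape (N, K, 2) -- N images, K keypoints, (x, y).
    visible:  boolean array of shape (N, K) marking annotated keypoints.
    """
    err = np.linalg.norm(pred - gt, axis=-1)           # per-keypoint Euclidean error
    return float(np.sqrt(np.mean(err[visible] ** 2)))

def normalized_error(pred, gt, visible, ref_length):
    """Mean per-keypoint error divided by a per-image reference length.

    ref_length: array of shape (N,), e.g. an anatomical scale per image
    (the exact normalizer used for the Horse-10 NE is an assumption here).
    """
    err = np.linalg.norm(pred - gt, axis=-1) / ref_length[:, None]
    return float(np.mean(err[visible]))

# Toy usage with synthetic data, just to show the expected shapes.
rng = np.random.default_rng(0)
gt = rng.uniform(0, 256, size=(8, 17, 2))
pred = gt + rng.normal(0, 3, size=gt.shape)
vis = rng.random((8, 17)) > 0.1
scale = rng.uniform(50, 80, size=8)
print(keypoint_rmse(pred, gt, vis), normalized_error(pred, gt, vis, scale))
```

A lower NE means predictions are small relative to the animal's scale, which is why the zero-shot SuperAnimal rows (NE ≈ 0.23) sit between the failed low-data ImageNet transfer (NE > 2) and the fully fine-tuned models (NE < 0.1) in the table above.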