Table 2. Zero-shot, transfer learning, fine-tuning, and memory-replay performance on four held-out datasets.
Method | Pre-trained weights | Data ratio | mAP (%) | RMSE (px) | Dataset | NE (IID) | NE (OOD) | Architecture
---|---|---|---|---|---|---|---|---
Zero-shot | SuperAnimal | – | 68.038 | 12.971 | AP-10K | – | – | HRNetw32 |
Zero-shot | SuperAnimal | – | 66.110 | 12.849 | AP-10K | – | – | AnimalTokenPose |
Transfer learning | ImageNet | 1.00 | 70.548 | 11.228 | AP-10K | – | – | HRNetw32 |
Memory replay | SuperAnimal | 1.00 | 80.113 | 11.296 | AP-10K | – | – | HRNetw32 |
Zero-shot | AP-10K | – | 79.447 | 5.774 | AnimalPose | – | – | HRNetw32 |
Zero-shot | SuperAnimal | – | 84.639 | 4.884 | AnimalPose | – | – | HRNetw32 |
Zero-shot | SuperAnimal | – | 83.043 | 5.154 | AnimalPose | – | – | AnimalTokenPose |
Transfer learning | ImageNet | 1.00 | 86.864 | 5.757 | AnimalPose | – | – | HRNetw32 |
Fine-tuning | AP-10K | 1.00 | 86.794 | 4.860 | AnimalPose | – | – | HRNetw32 |
Memory replay | SuperAnimal | 1.00 | 87.034 | 4.636 | AnimalPose | – | – | HRNetw32 |
Zero-shot | AP-10K | – | 65.729 | 4.929 | Horse-10 | 0.296 | 0.287 | HRNetw32 |
Zero-shot | SuperAnimal | – | 71.205 | 3.958 | Horse-10 | 0.227 | 0.228 | HRNetw32 |
Zero-shot | SuperAnimal | – | 68.977 | 4.081 | Horse-10 | 0.239 | 0.233 | AnimalTokenPose |
Transfer learning | ImageNet | 0.01 | 0.934 | 46.255 | Horse-10 | 2.369 | 2.360 | HRNetw32 |
Transfer learning | ImageNet | 1.00 | 90.516 | 1.837 | Horse-10 | 0.036 | 0.135 | HRNetw32 |
Fine-tuning | AP-10K | 0.01 | 66.284 | 5.029 | Horse-10 | 0.286 | 0.285 | HRNetw32 |
Fine-tuning | AP-10K | 1.00 | 93.973 | 1.220 | Horse-10 | 0.036 | 0.083 | HRNetw32 |
Memory replay | SuperAnimal | 0.01 | 73.366 | 3.719 | Horse-10 | 0.209 | 0.202 | HRNetw32 |
Memory replay | SuperAnimal | 1.00 | 95.165 | 1.153 | Horse-10 | 0.040 | 0.073 | HRNetw32 |
Zero-shot | AP-10K | – | 40.389 | 37.417 | iRodent | – | – | HRNetw32 |
Zero-shot | SuperAnimal | – | 58.557 | 33.496 | iRodent | – | – | HRNetw32 |
Zero-shot | SuperAnimal | – | 55.415 | 34.666 | iRodent | – | – | AnimalTokenPose |
Transfer learning | AP-10K | 0.01 | 12.910 | 92.649 | iRodent | – | – | HRNetw32 |
Transfer learning | ImageNet | 0.01 | 0.785 | 152.225 | iRodent | – | – | HRNetw32 |
Transfer learning | ImageNet | 1.00 | 58.857 | 35.651 | iRodent | – | – | HRNetw32 |
Fine-tuning | AP-10K | 0.01 | 43.144 | 37.704 | iRodent | – | – | HRNetw32 |
Fine-tuning | AP-10K | 1.00 | 61.635 | 26.758 | iRodent | – | – | HRNetw32 |
Memory replay | SuperAnimal | 0.01 | 60.853 | 31.801 | iRodent | – | – | HRNetw32 |
Memory replay | SuperAnimal | 1.00 | 72.971 | 24.884 | iRodent | – | – | HRNetw32 |
Here, the base SuperAnimal-Quadruped model was trained on none of the held-out datasets. Fig. 2 reports full fine-tuning results across different amounts of data; only the best fine-tuning performance is shown here, which matches the top performance of the SuperAnimal (SA) variant in Fig. 2. Cao et al.33 report mAP per animal rather than a unified single mAP, so we trained a model on their dataset to estimate the top-line performance attainable when training only on AP. Number as reported in ref. 41 using the data from ref. 31.
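For reference, the RMSE and normalized-error (NE) columns above can be sketched as below. This is a minimal illustration of the two metrics, not the exact evaluation code; in particular, the per-image reference length used for normalization (`ref_length`) is an assumption and stands in for whatever anatomical distance the benchmark uses.

```python
import numpy as np

def rmse(pred, gt):
    """Root-mean-square keypoint error in pixels.

    pred, gt: arrays of shape (n_images, n_keypoints, 2).
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # per-keypoint Euclidean error
    return float(np.sqrt(np.mean(err ** 2)))

def normalized_error(pred, gt, ref_length):
    """Mean per-keypoint error divided by a per-image reference length.

    ref_length: array of shape (n_images,), e.g. an anatomical distance
    (assumption: the benchmark's actual normalizer may differ).
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # (n_images, n_keypoints)
    return float(np.mean(err / ref_length[:, None]))
```

For example, if every predicted keypoint is off by a (3, 4)-pixel offset, each per-keypoint error is 5 px, so `rmse` returns 5.0, and dividing by per-image reference lengths of 10 and 20 px gives a mean normalized error of 0.375.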