Author manuscript; available in PMC: 2024 Jan 25.
Published in final edited form as: Int J Comput Vis. 2023 Feb 22;131(6):1389–1405. doi: 10.1007/s11263-023-01756-3

Table 1. Quantitative comparison with other state-of-the-art 2D and 3D animal and human pose estimation methods.

We report the absolute 3D MPJPE in millimeters for each approach at four training-set fractions.

Protocol 1 (absolute 3D MPJPE, mm)

Method                                              5%        10%       50%       100%

2D pose estimation methods (+ post hoc triangulation)
  DLC (Mathis et al., 2018)                         11.0973   11.0512   9.8934    8.9060
  SimpleBaseline (Xiao, Wu, & Wei, 2018)            18.0990   14.6191   7.3636    5.9555
  SimpleBaseline                                    18.5675   16.5800   8.3573    6.6957
  DLC + soft argmax                                 11.0323   9.2244    6.3545    6.4739
  DLC + 2D variant of our temporal constraint*      8.5432    9.1236    5.9526    6.0390

3D monocular pose estimation methods
  Temporal Convolution* (Pavllo et al., 2019)       -         -         -         17.6337

3D multi-view pose estimation methods
  Learnable Triangulation (Iskakov et al., 2019)    18.7795   15.6614   8.9729    6.3177
  DANNCE (Dunn et al., 2021)                        12.8754   10.9085   4.9912    4.3614

  Ours (temporal baseline)*                         12.4940   7.1162    4.8347    4.3749
  Ours (temporal + extra)*                          8.1706    6.6927    5.0461    4.1409

Methods that use ground-truth 2D bounding boxes during inference are marked with †.

Methods that use temporal information during training are marked with *.

For monocular approaches, the reported metric is computed separately for each camera view and then averaged across views.
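The MPJPE metric reported throughout the table is the mean Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. A minimal sketch (the array shapes and function name are illustrative, not from the paper):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error, in the input units (here: mm).

    pred, gt: arrays of shape (n_frames, n_joints, 3) holding
    predicted and ground-truth 3D joint positions.
    """
    # Euclidean distance per joint, then average over all joints/frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: every joint offset by the vector (3, 4, 0) mm.
gt = np.zeros((2, 3, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))  # → 5.0
```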
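The 2D methods in the first group recover 3D poses by post hoc triangulation of per-view 2D detections. One standard way to do this (not necessarily the exact procedure used here) is the Direct Linear Transform, sketched below with illustrative camera matrices:

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point from multiple views via the
    Direct Linear Transform (DLT).

    proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of (x, y) image coordinates, one per view.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: x*(P[2]·X) = P[0]·X, etc.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least squares: the solution is the right singular
    # vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    """Project a 3D point with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Toy check: two cameras (identity intrinsics, second shifted along x)
# observing the point (1, 2, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([1.0, 2.0, 5.0])
X_hat = triangulate_dlt([P1, P2],
                        [project(P1, X_true), project(P2, X_true)])
```

With noise-free detections the triangulated point matches the true point; in practice the 2D detections are noisy, which is why the quality of the 2D estimator dominates the 3D error in this group.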