Self-Supervised Keypoint Discovery in Behavioral Videos

Author manuscript; available in PMC: 2023 Jan 9.

Published in final edited form as: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2022 Sep 27;2022:2161–2170. doi: 10.1109/cvpr52688.2022.00221

Figure 2. B-KinD, an approach for keypoint discovery from spatiotemporal difference reconstruction. $I_t$ and $I_{t+T}$ are video frames at times $t$ and $t+T$. Both frames are fed to an appearance encoder $\Phi$ and a pose decoder $\Psi$. Given the appearance feature from $I_t$ and the geometry features from both $I_t$ and $I_{t+T}$ (Sec 3.1), our model uses the reconstruction decoder $\psi$ to reconstruct the spatiotemporal difference between the two frames (Sec 3.2.1).
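The data flow in Figure 2 can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch that rests on several assumptions the caption does not state. The spatiotemporal difference is taken to be a plain absolute frame difference, although Sec 3.2.1 may define it differently. Geometry features are assumed to be Gaussian maps rendered at soft-argmax keypoint locations. The appearance encoder $\Phi$, the pose network $\Psi$, and the decoder $\psi$ are replaced by hypothetical stand-ins: the raw frame, random heatmaps, and a per-pixel linear mix. The names `spatiotemporal_difference`, `soft_argmax`, `gaussian_maps`, `decode`, and `reconstruction_loss` are illustrative and do not come from the authors' code.

```python
import numpy as np


def spatiotemporal_difference(frame_t, frame_tT):
    # ASSUMPTION: absolute per-pixel difference of two grayscale (H, W) frames
    # stands in for the reconstruction target defined in Sec 3.2.1.
    return np.abs(frame_t.astype(np.float64) - frame_tT.astype(np.float64))


def soft_argmax(heatmaps):
    # heatmaps: (K, H, W) raw scores from a pose network (hypothetical here).
    # Spatial softmax per keypoint, then the expected (y, x) coordinate.
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)
    y = (probs.sum(axis=2) * np.arange(h)).sum(axis=1)  # distribution over rows
    x = (probs.sum(axis=1) * np.arange(w)).sum(axis=1)  # distribution over columns
    return np.stack([y, x], axis=1)  # (K, 2)


def gaussian_maps(keypoints, h, w, sigma=1.5):
    # ASSUMPTION: geometry features are isotropic Gaussians centred on each
    # (y, x) keypoint, giving a (K, H, W) array.
    yy, xx = np.mgrid[0:h, 0:w]
    dy = yy[None, :, :] - keypoints[:, 0, None, None]
    dx = xx[None, :, :] - keypoints[:, 1, None, None]
    return np.exp(-(dy ** 2 + dx ** 2) / (2.0 * sigma ** 2))


def decode(appearance_t, geom_t, geom_tT, weights):
    # Hypothetical stand-in for the reconstruction decoder psi: a per-pixel
    # linear mix of appearance channels of I_t and geometry maps of both frames.
    feats = np.concatenate([appearance_t, geom_t, geom_tT], axis=0)  # (C, H, W)
    return np.tensordot(weights, feats, axes=1)  # (H, W)


def reconstruction_loss(pred, target):
    # Mean squared error between the decoded map and the difference target.
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, K = 16, 16, 4
    frame_t, frame_tT = rng.random((H, W)), rng.random((H, W))
    target = spatiotemporal_difference(frame_t, frame_tT)
    # Random heatmaps play the role of the (untrained) pose network output.
    geom_t = gaussian_maps(soft_argmax(rng.normal(size=(K, H, W))), H, W)
    geom_tT = gaussian_maps(soft_argmax(rng.normal(size=(K, H, W))), H, W)
    appearance_t = frame_t[None]  # hypothetical 1-channel appearance feature
    weights = rng.normal(size=1 + 2 * K)
    pred = decode(appearance_t, geom_t, geom_tT, weights)
    print("reconstruction loss:", reconstruction_loss(pred, target))
```

In this sketch the heatmaps and decoder weights are random. A trained system would learn them by minimizing the reconstruction loss. Because the target is a difference of two frames, static background yields near-zero target values, so most of the reconstruction error comes from regions that change between $t$ and $t+T$.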