2022 Apr 12;19(4):496–504. doi: 10.1038/s41592-022-01443-0

Fig. 3. Linking whole-body assemblies across time.


a, Ground-truth and reconstructed animal tracks (with DLCRNet and ellipse tracking), together with video frames illustrating representative scene challenges. b, The identities of animals detected in a frame are propagated across frames by locally matching detections to trackers (using a ‘motion’ cost for all datasets and a ‘distance’ cost for the fish dataset). c, Tracklets are represented as nodes of a graph whose edges encode the likelihood that the connected pair of tracklets belongs to the same track. d, Four cost functions modeling the affinity between tracklets are implemented: shape similarity, using the undirected Hausdorff distance between finite sets of keypoints (i); spatial proximity in Euclidean space (ii); motion affinity, using bidirectional prediction of a tracklet’s location (iii); and dynamic similarity, via Hankelets and time-delay embedding of a tracklet’s centroid (iv). e, Tracklet stitching performance versus box and ellipse tracker baselines (arrows indicate whether a higher or lower value is better), measured by multi-object tracking accuracy (MOTA) and by the rates of false negatives (FN), false positives (FP) and identity switches, each expressed as events per animal per sequence of 100 frames. Inset: incorporating appearance/identity prediction into the stitching further reduces the number of identity switches and improves full-track reconstruction. Total number of frames: tri-mouse, 2,330; parenting, 2,670; marmosets, 15,000; fish, 601.
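The frame-to-frame matching in panel b can be framed as a linear assignment problem. The sketch below is a minimal illustration, not the DeepLabCut implementation: it assumes each tracker's position has already been predicted for the current frame (for example by a motion model), uses Euclidean distance between centroids as the cost, solves the assignment with SciPy's `linear_sum_assignment`, and rejects pairs beyond a hypothetical `max_dist` gate. The function name `match_detections` is illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_detections(tracker_pos, detection_pos, max_dist=50.0):
    """Assign detections to trackers by minimizing the total centroid distance.

    tracker_pos: (T, 2) predicted tracker centroids for the current frame.
    detection_pos: (D, 2) centroids of the detected animals.
    Returns (tracker_idx, detection_idx) pairs whose cost is within max_dist.
    """
    tracker_pos = np.asarray(tracker_pos, dtype=float)
    detection_pos = np.asarray(detection_pos, dtype=float)
    # Pairwise Euclidean cost matrix, shape (T, D)
    cost = np.linalg.norm(tracker_pos[:, None, :] - detection_pos[None, :, :], axis=-1)
    # Optimal one-to-one assignment (handles rectangular matrices)
    rows, cols = linear_sum_assignment(cost)
    # Gate: unmatched trackers/detections would start or end tracklets
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```

Unmatched detections would typically spawn new trackers, and the resulting fragments are the tracklets that panels c and d then stitch together.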
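The undirected Hausdorff distance in panel d(i) is the larger of the two directed distances, where each directed distance is how far the worst-placed point of one set lies from its nearest neighbor in the other set. A minimal NumPy sketch follows; representing undetected keypoints as NaN rows and, for stitching, comparing the last pose of one tracklet with the first pose of the next are assumptions made for illustration.

```python
import numpy as np


def undirected_hausdorff(a, b) -> float:
    """Undirected Hausdorff distance between two finite 2-D keypoint sets.

    a: (n, 2) array, b: (m, 2) array. Rows containing NaN (missing
    keypoints, an assumed convention) are dropped before comparison.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a[~np.isnan(a).any(axis=1)]
    b = b[~np.isnan(b).any(axis=1)]
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both keypoint sets need at least one valid point")
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (n, m) distances
    h_ab = d.min(axis=1).max()  # directed distance a -> b
    h_ba = d.min(axis=0).max()  # directed distance b -> a
    return float(max(h_ab, h_ba))
```

Because it uses only the keypoint sets, this cost is insensitive to keypoint labeling, which is one reason a set distance suits pose comparison when some body parts are missing.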
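Panel d(iii) describes motion affinity through bidirectional prediction. One simple way to realize this, assumed here rather than taken from the paper, is constant-velocity extrapolation: project the earlier tracklet forward across the frame gap, project the later tracklet backward in time, and average the two prediction errors. The parameters `gap` (frames between the end of A and the start of B) and `n` (frames used to estimate velocity) and the function names are hypothetical.

```python
import numpy as np


def _velocity(points: np.ndarray, n: int) -> np.ndarray:
    """Mean per-frame displacement over (up to) the last n steps of a (T, 2) array."""
    pts = points[-(n + 1):]
    if len(pts) < 2:
        return np.zeros(2)
    return (pts[-1] - pts[0]) / (len(pts) - 1)


def motion_affinity_cost(track_a, track_b, gap: int, n: int = 3) -> float:
    """Average bidirectional prediction error between tracklet A (earlier) and B (later)."""
    track_a = np.asarray(track_a, dtype=float)
    track_b = np.asarray(track_b, dtype=float)
    if gap <= 0:
        raise ValueError("tracklet B must start after tracklet A ends")
    # Forward: extrapolate A's last position to B's first frame
    pred_fwd = track_a[-1] + gap * _velocity(track_a, n)
    # Backward: reverse B in time and extrapolate its first position to A's last frame
    pred_bwd = track_b[0] + gap * _velocity(track_b[::-1], n)
    err_fwd = np.linalg.norm(pred_fwd - track_b[0])
    err_bwd = np.linalg.norm(pred_bwd - track_a[-1])
    return float((err_fwd + err_bwd) / 2.0)
```

Predicting in both directions makes the cost symmetric in the sense that neither tracklet's velocity estimate alone decides the link.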
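For panel d(iv), a time-delay embedding stacks overlapping windows of a tracklet's centroid trajectory into a block Hankel matrix H. The sketch below compares two tracklets with a dissimilarity used in the Hankelet literature, d = 2 − ‖G_a + G_b‖_F, where G = HHᵀ / ‖HHᵀ‖_F; it is 0 when the normalized Gram matrices coincide. Subtracting each tracklet's mean position, the window size `rows` and the function names are assumptions, and the exact formulation in the paper may differ.

```python
import numpy as np


def hankel_matrix(traj, rows: int) -> np.ndarray:
    """Block Hankel matrix (time-delay embedding) of a (T, 2) centroid trajectory.

    Column j stacks positions j, j+1, ..., j+rows-1, giving shape (2*rows, T-rows+1).
    """
    traj = np.asarray(traj, dtype=float)
    n_cols = len(traj) - rows + 1
    if n_cols < 1:
        raise ValueError("trajectory is shorter than the embedding window")
    return np.stack([traj[j:j + rows].reshape(-1) for j in range(n_cols)], axis=1)


def normalized_gram(traj, rows: int) -> np.ndarray:
    """H H^T scaled to unit Frobenius norm; its size depends only on `rows`."""
    traj = np.asarray(traj, dtype=float)
    centered = traj - traj.mean(axis=0)  # assumption: compare motion, not absolute location
    h = hankel_matrix(centered, rows)
    gram = h @ h.T
    norm = np.linalg.norm(gram)  # Frobenius norm for a 2-D array
    if norm == 0:
        raise ValueError("stationary tracklet: dynamics are undefined")
    return gram / norm


def hankelet_dissimilarity(traj_a, traj_b, rows: int = 3) -> float:
    """2 - ||G_a + G_b||_F: 0 for matching dynamics, up to 2 - sqrt(2) otherwise."""
    g_a = normalized_gram(traj_a, rows)
    g_b = normalized_gram(traj_b, rows)
    return 2.0 - float(np.linalg.norm(g_a + g_b))
```

Because HHᵀ is always (2·rows) × (2·rows), tracklets of different lengths can be compared directly, which matters when fragments on either side of an occlusion have unequal durations.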
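MOTA follows the CLEAR MOT definition, MOTA = 1 − (FN + FP + IDSW) / GT, where GT is the total number of ground-truth animal instances summed over all frames. The helper below also converts raw event counts into the per-animal, per-100-frame rates reported in panel e. The function names are illustrative, and the counts in the usage lines are hypothetical, not results from the paper.

```python
def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT; negative for very poor tracking."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + id_switches) / num_gt


def rate_per_animal_per_100_frames(events: int, num_animals: int, num_frames: int) -> float:
    """Normalize an event count (e.g. identity switches) per animal and per 100 frames."""
    return events / num_animals / (num_frames / 100.0)


# Hypothetical counts for a 3-animal, 2,330-frame video, assuming every
# animal is visible in every frame (so GT = animals x frames)
num_gt = 3 * 2330
score = mota(fn=30, fp=20, id_switches=5, num_gt=num_gt)
switch_rate = rate_per_animal_per_100_frames(5, num_animals=3, num_frames=2330)
```

Normalizing by animals and frames lets datasets of very different sizes, such as 601 fish frames and 15,000 marmoset frames, be compared on one axis.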