A. Example frame sequence from the mirror-mouse dataset. Predictions from a DeepLabCut model (trained on 631 frames) are overlaid (magenta ×), along with the ground truth (green +). Open white circles denote the location of the same body part (left hind paw) in the other (top) view; given the geometry of this setup, a large horizontal displacement between the top and bottom predictions indicates an error. Each frame is accompanied by “standard outlier detectors,” including confidence and temporal difference loss (shaded in blue), and “proposed outlier detectors,” including multi-view PCA loss (shaded in red; Pose PCA excluded for simplicity). A marker next to each detector indicates whether the frame is an inlier or an outlier as defined by that metric. Confidence is high for all frames shown. The temporal difference loss misses the error in frame 294, which does not involve an immediate jump, and flags frame 292, which shows a jump to the correct location. Multi-view PCA classifies both frames correctly. B. Example traces from the same video. Blue background denotes times where standard outlier detection methods flag frames: confidence falls below a threshold (0.9) and/or the temporal difference loss exceeds a threshold (20 pixels). Red background indicates times where the multi-view PCA error exceeds a threshold (20 pixels). Purple background indicates that both conditions are met. C. The total number of keypoints flagged as outliers by each metric, and their overlap. D. Area under the receiver operating characteristic curve (AUROC) for each paw, for DeepLabCut models trained with 75 and 631 labeled frames (left and right columns, respectively). AUROC=1 indicates that the metric perfectly identifies all nominal outliers in the video data; 0.5 indicates random guessing. AUROC values are computed across all frames from 20 test videos; boxplot variability is over n=5 random subsets of the training data. Boxes span the 25th to 75th percentiles, with the center line at the median (50th percentile); whiskers extend to 1.5 × IQR (interquartile range).
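
The thresholding in panel B and the AUROC in panel D can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy trajectory, and the definition of the temporal difference loss as the Euclidean jump between consecutive frames are assumptions made for illustration. Only the two thresholds (0.9 and 20 pixels) come from the caption. The AUROC is computed with the standard rank-sum (Mann-Whitney U) identity, and the outlier labels are assumed to be supplied by the user.

```python
import numpy as np

CONF_THRESH = 0.9     # confidence threshold quoted in the caption
PIXEL_THRESH = 20.0   # temporal difference threshold (pixels) quoted in the caption


def temporal_difference(xy):
    """Euclidean jump (pixels) from the previous frame; frame 0 is assigned 0.

    Assumption: this is one plausible definition of a temporal difference loss.
    """
    xy = np.asarray(xy, dtype=float)
    jumps = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    return np.concatenate([[0.0], jumps])


def flag_standard(xy, confidence):
    """Flag frames with low confidence and/or a large temporal jump."""
    conf = np.asarray(confidence, dtype=float)
    return (conf < CONF_THRESH) | (temporal_difference(xy) > PIXEL_THRESH)


def auroc(scores, is_outlier):
    """AUROC via the rank-sum identity; tied scores receive their average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_outlier, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one outlier and one inlier")
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):  # replace ranks of tied scores by their mean
        tie = scores == value
        ranks[tie] = ranks[tie].mean()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy trajectory: the keypoint jumps to a wrong location at frame 2,
# stays there at frame 3, and jumps back to the correct location at frame 4.
toy_xy = [[100, 100], [101, 100], [150, 100], [151, 100], [101, 100]]
toy_conf = [0.99, 0.99, 0.95, 0.99, 0.80]
toy_flags = flag_standard(toy_xy, toy_conf)
print(toy_flags)
```

In this toy example frame 3 sits at the wrong location but is not flagged, because it involves no jump, while frame 4 is flagged even though it is a jump back to the correct location. This mirrors the failure mode of the temporal difference loss described for frames 294 and 292 in panel A.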