Fig. 1. Heatmap of the ground-truth explanation annotations across hospitals.
We average the explanation annotations for the needle handling (a) and needle driving (b) video samples in the test set of the Monte Carlo folds (see Supplementary Table 2 for the total number of samples) and present them over a normalized time index, where 0 and 1 denote the beginning and end of a video sample, respectively. A darker shade (values range from 0 to 1, as per the colour bars) indicates that a segment of time is of greater importance.
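
The averaging over a normalized time index described in this caption can be illustrated with a minimal sketch. The caption does not state how video samples of different lengths were aligned, so the linear interpolation onto a shared grid, the function name `average_on_normalized_time`, and the `num_bins` parameter are all assumptions made for illustration rather than the procedure used to produce the figure.

```python
import numpy as np


def average_on_normalized_time(annotations, num_bins=100):
    """Average per-frame annotation vectors of differing lengths.

    Hypothetical helper: each vector holds per-frame importance values
    in [0, 1] for one video sample. Every vector is resampled onto a
    shared grid over [0, 1] (0 = start of sample, 1 = end of sample)
    using linear interpolation, which is an assumed alignment method.
    """
    if len(annotations) == 0:
        raise ValueError("at least one annotation vector is required")

    grid = np.linspace(0.0, 1.0, num_bins)
    resampled = []
    for ann in annotations:
        ann = np.asarray(ann, dtype=float)
        if ann.size == 0:
            raise ValueError("annotation vectors must be non-empty")
        if ann.size == 1:
            # A single-frame sample has the same value at every time point.
            resampled.append(np.full(num_bins, ann[0]))
            continue
        # Map the original frame indices onto the normalized range [0, 1].
        source_times = np.linspace(0.0, 1.0, ann.size)
        resampled.append(np.interp(grid, source_times, ann))

    # Mean across samples gives one heatmap row with values in [0, 1].
    return np.mean(resampled, axis=0)


# Two toy samples of different lengths (made-up values, not study data).
toy = [[0, 0, 1, 1], [1, 1]]
heat_row = average_on_normalized_time(toy, num_bins=4)
```

Under these assumptions, each entry of `heat_row` is the fraction of samples marked as important at that normalized time point, which corresponds to the shade plotted in one row of the heatmap.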