Table 1.
| Task | Activity | Details | Hospital | Videos | Video samples | Surgeons | Generalization to |
|---|---|---|---|---|---|---|---|
| Subphase recognition | Suturing | VUA | **USC** | 78 | 4,774 | 19 | Videos |
| | | | SAH | 60 | 2,115 | 8 | Hospitals |
| | | | HMH | 20 | 1,122 | 5 | Hospitals |
| | | | *USC* | 48 | Inference on entire videos | | |
| Gesture classification | Suturing | VUA | **USC** | 78 | 1,241 | 19 | Videos |
| | | Laboratory | **JIGSAWS** | 39 | 793 | 8 | Users |
| | | DVC | **UCL** | 36 | 1,378 | 8 | Videos |
| | Dissection | NS | **USC** | 86 | 1,542 | 15 | Videos |
| | | | SAH | 60 | 540 | 8 | Hospitals |
| | | | *USC* | 154 | Inference on entire unlabelled videos | | |
| | | RAPN | USC | 27 | 339 | 16 | Procedures |
| Skill assessment | Suturing | Needle handling | **USC** | 78 | 912 | 19 | Videos |
| | | | SAH | 60 | 240 | 18 | Hospitals |
| | | | HMH | 20 | 184 | 5 | Hospitals |
| | | Needle driving | **USC** | 78 | 530 | 19 | Videos |
| | | | SAH | 60 | 280 | 18 | Hospitals |
| | | | HMH | 20 | 220 | 5 | Hospitals |
Note that we train our model, SAIS, exclusively on data from hospitals whose names are shown in bold, following a ten-fold Monte Carlo cross-validation setup. For an exact breakdown of the number of video samples in each fold and in the training, validation and test splits, please refer to Supplementary Tables 1–5. Data from the remaining hospitals are used exclusively for inference. We perform inference on entire videos from hospitals whose names are shown in italics. Except for the task of subphase recognition, SAIS is always trained and evaluated on a class-balanced set of data, whereby each category (for example, low skill and high skill) contains the same number of samples. This prevents SAIS from being negatively affected by sampling bias during training and allows for a more intuitive interpretation of the evaluation results.
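To make the evaluation setup concrete, the sketch below illustrates one way to implement a ten-fold Monte Carlo cross-validation with class-balanced sampling, as described in the note above. It is not the authors' code: the 70/10/20 split ratios, the clip-level (rather than video-level) splitting, and all identifiers (`balance_classes`, `monte_carlo_folds`, `clip_*`) are hypothetical assumptions introduced purely for illustration.

```python
# Illustrative sketch of ten-fold Monte Carlo cross-validation with class balancing.
# Assumptions (not from the paper): samples are (sample_id, label) pairs, splits are
# drawn at the clip level, and split fractions are 70/10/20 for train/val/test.
import random
from collections import defaultdict

def balance_classes(samples, seed=0):
    """Subsample so that every category keeps the same number of samples."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append((sample_id, label))
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced

def monte_carlo_folds(samples, n_folds=10, train_frac=0.7, val_frac=0.1, seed=0):
    """Yield (train, val, test) splits; each fold is an independent random shuffle."""
    for fold in range(n_folds):
        rng = random.Random(seed + fold)
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_train = int(train_frac * len(shuffled))
        n_val = int(val_frac * len(shuffled))
        yield (shuffled[:n_train],
               shuffled[n_train:n_train + n_val],
               shuffled[n_train + n_val:])

# Example: balance hypothetical low-/high-skill clips, then build the ten folds.
samples = [(f"clip_{i}", "low skill" if i % 3 else "high skill") for i in range(912)]
for fold_idx, (train, val, test) in enumerate(monte_carlo_folds(balance_classes(samples))):
    print(fold_idx, len(train), len(val), len(test))
```

Unlike k-fold cross-validation, Monte Carlo cross-validation draws each train/validation/test split independently at random, so the ten test sets may overlap; the class balancing step simply truncates every category to the size of the smallest one.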