Table 8.
Class-wise performance (F1-score) on coarse-grained medical instructional videos of the MedVidCL dataset, using the best monomodal and multimodal models from Table 6.
| | Models | Pain | Oral Health | Brain & Nerves | Infection | Equipment | ENT | Musculoskeletal Health | First Aid | Eye | Cardiovascular Health | Surgery | Respiratory | Systemic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) | BigBird-Base | 35.90 | 85.11 | 45.16 | 00.00 | 44.00 | 72.97 | 73.09 | 64.48 | 70.18 | 26.23 | 00.00 | 70.59 | 33.33 |
| (2) | ViT + Transformer | 21.79 | 57.83 | 4.55 | 00.00 | 45.10 | 22.41 | 66.25 | 50.91 | 14.29 | 11.63 | 11.76 | 26.09 | 3.51 |
| (3) | L + V (ViT) + Transformer | 21.47 | 55.91 | 6.45 | 00.00 | 45.74 | 22.07 | 63.09 | 50.00 | 4.35 | 10.08 | 21.05 | 28.57 | 2.63 |