Table 4. Detailed MedVidQA dataset statistics for questions, videos, and visual answers.
| Dataset Detail | Train | Validation | Test | Total |
|---|---|---|---|---|
| Medical instructional videos | 800 | 49 | 50 | 899 |
| Video duration (hours) | 86.37 | 4.54 | 4.79 | 95.71 |
| Mean video duration (seconds) | 388.68 | 333.89 | 345.42 | 383.29 |
| Questions and visual answers | 2,710 | 145 | 155 | 3,010 |
| Minimum question length (tokens) | 5 | 6 | 5 | 5 |
| Maximum question length (tokens) | 25 | 21 | 24 | 25 |
| Mean question length (tokens) | 11.67 | 11.76 | 12 | 11.81 |
| Minimum visual answer length (seconds) | 3 | 10 | 4 | 3 |
| Maximum visual answer length (seconds) | 298 | 267 | 243 | 298 |
| Mean visual answer length (seconds) | 62.29 | 66.81 | 56.92 | 62.23 |
| Proportion of visual answer to the video (%) | 15.81 | 21.10 | 17.67 | 16.16 |
| Mode visual answer length (seconds) | 34 | 36 | 25 | 34 |
Question length denotes the number of tokens in a question after tokenization with the NLTK tokenizer [36].
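Several rows of the table are related by simple arithmetic: the per-split counts should sum to the totals, and the mean video duration should roughly equal the total duration in hours times 3600, divided by the number of videos. The following is a minimal sketch of such a consistency check, using only values copied from the table. The names `SPLITS`, `TOTAL`, `mean_seconds_from_hours`, and `check_table` are illustrative, and the tolerances (0.02 hours and 1 second) are loosely chosen assumptions, not part of the dataset description.

```python
# Values copied from Table 4. All names here are hypothetical helpers,
# introduced only for illustration.
SPLITS = {
    "train": {"videos": 800, "hours": 86.37, "mean_seconds": 388.68, "questions": 2710},
    "validation": {"videos": 49, "hours": 4.54, "mean_seconds": 333.89, "questions": 145},
    "test": {"videos": 50, "hours": 4.79, "mean_seconds": 345.42, "questions": 155},
}
TOTAL = {"videos": 899, "hours": 95.71, "mean_seconds": 383.29, "questions": 3010}


def mean_seconds_from_hours(hours, videos):
    """Mean video duration in seconds implied by total hours and video count."""
    return hours * 3600.0 / videos


def check_table(splits=SPLITS, total=TOTAL, hours_tol=0.02, seconds_tol=1.0):
    """Return a dict mapping each check name to True (consistent) or False."""
    report = {
        # Counts are exact integers, so the split sums must match the totals.
        "videos_sum": sum(s["videos"] for s in splits.values()) == total["videos"],
        "questions_sum": sum(s["questions"] for s in splits.values()) == total["questions"],
        # Hours are reported to two decimals, so compare within a tolerance.
        "hours_sum": abs(sum(s["hours"] for s in splits.values()) - total["hours"]) <= hours_tol,
    }
    # Compare the reported mean duration with the one implied by hours / videos.
    for name, row in {**splits, "total": total}.items():
        implied = mean_seconds_from_hours(row["hours"], row["videos"])
        report[f"mean_seconds_{name}"] = abs(implied - row["mean_seconds"]) <= seconds_tol
    return report


if __name__ == "__main__":
    for check, ok in check_table().items():
        print(f"{check}: {'ok' if ok else 'MISMATCH'}")
```

The check covers only rows that can be derived from other rows. Per-question statistics, such as the proportion of the visual answer to the video, cannot be recomputed from these aggregates alone and are left out.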