Table 2. The accuracy of GPT-3.5, GPT-4, and GPT-4V for the 33rd-35th Japanese Board of Orthopaedic Surgery Examination (JBOSE).
Accuracy was defined as the percentage of questions answered correctly. "All questions" comprises both the text-based and the image-based questions.
Values are the number of correct answers/total questions, with accuracy (%) in parentheses.

| Model | Questions | 33rd JBOSE | 34th JBOSE | 35th JBOSE |
|---|---|---|---|---|
| GPT-4 | All questions | 59/98 (60) | 54/99 (55) | 59/97 (61) |
| | Text-based questions | 49/77 (64) | 46/73 (63) | 47/64 (73) |
| | Image-based questions without image | 10/21 (48) | 8/26 (31) | 12/33 (36) |
| GPT-4V | Image-based questions with image | 8/21 (38) | 9/26 (35) | 13/33 (39) |
| GPT-3.5 | All questions | 27/98 (28) | 32/99 (32) | 29/97 (30) |
| | Text-based questions | 20/77 (26) | 25/73 (34) | 23/64 (36) |
| | Image-based questions without image | 7/21 (33) | 7/26 (27) | 6/33 (18) |
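The arithmetic behind Table 2 can be rechecked with a short Python sketch. It recomputes each accuracy as 100 × correct/total and confirms that, for GPT-4 and GPT-3.5, the "All questions" counts equal the sum of the text-based and image-based counts. The dictionary layout and the names `TABLE2`, `accuracy_pct`, and `check_totals` are hypothetical. The round-half-up rule is an assumption, because the table does not state how percentages were rounded.

```python
import math

# Hypothetical transcription of Table 2: (correct, total) for the 33rd, 34th, 35th JBOSE.
TABLE2 = {
    "GPT-4": {
        "text": [(49, 77), (46, 73), (47, 64)],
        "image_without_image": [(10, 21), (8, 26), (12, 33)],
        "all": [(59, 98), (54, 99), (59, 97)],
    },
    "GPT-4V": {
        "image_with_image": [(8, 21), (9, 26), (13, 33)],
    },
    "GPT-3.5": {
        "text": [(20, 77), (25, 73), (23, 64)],
        "image_without_image": [(7, 21), (7, 26), (6, 33)],
        "all": [(27, 98), (32, 99), (29, 97)],
    },
}

EXAMS = ["33rd", "34th", "35th"]


def accuracy_pct(correct: int, total: int) -> int:
    """Accuracy = 100 * correct / total, rounded half up to a whole percent (assumed rule)."""
    return math.floor(100 * correct / total + 0.5)


def check_totals(model: str) -> list:
    """For each exam, check that 'all' equals text-based plus image-based counts."""
    rows = TABLE2[model]
    return [
        (t[0] + i[0], t[1] + i[1]) == a
        for t, i, a in zip(rows["text"], rows["image_without_image"], rows["all"])
    ]


if __name__ == "__main__":
    for model, rows in TABLE2.items():
        for question_set, cells in rows.items():
            pcts = [accuracy_pct(c, t) for c, t in cells]
            print(model, question_set, dict(zip(EXAMS, pcts)))
    # GPT-4V has no text-based row, so only GPT-4 and GPT-3.5 are summed.
    print(check_totals("GPT-4"), check_totals("GPT-3.5"))
```

Every recomputed percentage matches the rounded values reported in the table. For each exam, the text-based and image-based question counts add up to the total number of questions.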