Table 3. Three-year total accuracy of GPT-4, GPT-3.5, and GPT-4V for all questions, text-based questions, and image-based questions.
Accuracy was defined as the percentage of correct answers. "All questions" comprises both text-based and image-based questions.
* p < 0.05. † p-value for the comparison between GPT-4 and GPT-4V. § p-value for the comparison between GPT-4 and GPT-3.5. N.A.: not available.
| | GPT-4 | GPT-3.5 | GPT-4V | p-value |
| --- | --- | --- | --- | --- |
| | Number of correct answers/total questions (accuracy, %) | | | |
| All questions | 172/294 (59) | 88/294 (30) | N.A. | <0.001*§ |
| Text-based questions | 142/214 (66) | 68/214 (32) | N.A. | 0.002*§ |
| Image-based questions without image | 30/80 (38) | 20/80 (25) | N.A. | 0.1§ |
| Image-based questions with image | N.A. | N.A. | 30/80 (38) | 0.9† |
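
The accuracy values in Table 3 can be recomputed from the reported counts. The following is a minimal sketch, not the authors' analysis code: the names `TABLE3_COUNTS` and `accuracy_percent` are hypothetical, and rounding half up to a whole percent is an assumption, since the source does not state a rounding rule. The p-values are not recomputed because the statistical test is not named in this section.

```python
# Counts (correct, total) copied from Table 3; N.A. cells are omitted.
# TABLE3_COUNTS and accuracy_percent are illustrative names, not from the source.
TABLE3_COUNTS = {
    "All questions": {"GPT-4": (172, 294), "GPT-3.5": (88, 294)},
    "Text-based questions": {"GPT-4": (142, 214), "GPT-3.5": (68, 214)},
    "Image-based questions without image": {"GPT-4": (30, 80), "GPT-3.5": (20, 80)},
    "Image-based questions with image": {"GPT-4V": (30, 80)},
}


def accuracy_percent(correct: int, total: int) -> int:
    """Percentage of correct answers, rounded half up (assumed rounding rule)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    # Integer arithmetic avoids floating-point ties: floor(100*c/t + 1/2).
    return (200 * correct + total) // (2 * total)


if __name__ == "__main__":
    for question_type, models in TABLE3_COUNTS.items():
        for model, (correct, total) in models.items():
            pct = accuracy_percent(correct, total)
            print(f"{question_type} | {model}: {correct}/{total} ({pct})")
```

Under this rounding assumption the recomputed percentages agree with the table, and the text-based and image-based totals (214 + 80) sum to the 294 questions in the "All questions" row.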