2024 Jul 12;35(1):506–516. doi: 10.1007/s00330-024-10902-5

Table 1.

Comparison of the diagnostic accuracy between ChatGPT and radiologists

Correct answer (accuracy rate [%])

| | Final diagnosis | p value* | Differential diagnosis | p value* |
|---|---|---|---|---|
| GPT-4-based ChatGPT | 46/106 (43%) | – | 62/106 (58%) | – |
| Reader 1 (Radiology resident) | 43/106 (41%) | 0.78 | 61/106 (58%) | 0.99 |
| Reader 2 (Board-certified radiologist) | 56/106 (53%) | 0.22 | 71/106 (67%) | 0.26 |
| GPT-4V-based ChatGPT | 9/106 (8%) | – | 15/106 (14%) | – |
| Reader 1 (Radiology resident) | 43/106 (41%) | < 0.001** | 61/106 (58%) | < 0.001** |
| Reader 2 (Board-certified radiologist) | 56/106 (53%) | < 0.001** | 71/106 (67%) | < 0.001** |

ChatGPT Chat Generative Pre-trained Transformer, GPT-4 Generative Pre-trained Transformer-4, GPT-4V Generative Pre-trained Transformer-4 with vision

*Chi-square tests were performed to compare the accuracy rates between GPT-4-based ChatGPT and each radiologist, as well as between GPT-4V-based ChatGPT and each radiologist

**p < 0.05
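
The chi-square comparisons in the table can be reproduced from the reported counts. A minimal sketch using SciPy, assuming each comparison is a 2×2 contingency table of correct/incorrect answers out of 106 cases (SciPy applies Yates' continuity correction by default for 2×2 tables; the helper function name is illustrative, not from the paper):

```python
from scipy.stats import chi2_contingency

def compare_accuracy(correct_a: int, correct_b: int, n: int = 106) -> float:
    """Chi-square test comparing two readers' correct/incorrect counts on n cases."""
    table = [[correct_a, n - correct_a],
             [correct_b, n - correct_b]]
    # Yates' continuity correction is applied by default for 2x2 tables
    _, p, _, _ = chi2_contingency(table)
    return p

# Final diagnosis: GPT-4-based ChatGPT (46/106) vs Reader 1 (43/106)
p1 = compare_accuracy(46, 43)  # reported p = 0.78

# Final diagnosis: GPT-4V-based ChatGPT (9/106) vs Reader 1 (43/106)
p2 = compare_accuracy(9, 43)   # reported p < 0.001

print(f"GPT-4 vs Reader 1: p = {p1:.2f}; GPT-4V vs Reader 1: p = {p2:.2g}")
```

The same call reproduces the other table entries by substituting the corresponding correct-answer counts (e.g. 56 for Reader 2, or the differential-diagnosis columns).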