Table 1.
Comparison of the diagnostic accuracy between ChatGPT and radiologists
| | Final diagnosis: correct answers (accuracy rate [%]) | p value* | Differential diagnosis: correct answers (accuracy rate [%]) | p value* |
|---|---|---|---|---|
| GPT-4-based ChatGPT | 46/106 (43%) | | 62/106 (58%) | |
| Reader 1 (Radiology resident) | 43/106 (41%) | 0.78 | 61/106 (58%) | 0.99 |
| Reader 2 (Board-certified radiologist) | 56/106 (53%) | 0.22 | 71/106 (67%) | 0.26 |
| GPT-4V-based ChatGPT | 9/106 (8%) | | 15/106 (14%) | |
| Reader 1 (Radiology resident) | 43/106 (41%) | < 0.001** | 61/106 (58%) | < 0.001** |
| Reader 2 (Board-certified radiologist) | 56/106 (53%) | < 0.001** | 71/106 (67%) | < 0.001** |
ChatGPT Chat Generative Pre-trained Transformer, GPT-4 Generative Pre-trained Transformer-4, GPT-4V Generative Pre-trained Transformer-4 with vision
*Chi-square tests were performed to compare the accuracy rates between GPT-4-based ChatGPT and each radiologist, as well as between GPT-4V-based ChatGPT and each radiologist
**p < 0.05
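As a minimal sketch of the comparison described in the footnote, the following Python snippet runs a chi-square test on a 2×2 contingency table of correct/incorrect counts (GPT-4-based ChatGPT vs. Reader 1 for the final diagnosis). The analysis software used in the original study is not stated here, and whether Yates' continuity correction was applied is an assumption; the snippet only illustrates the form of the test.

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table of correct vs. incorrect final diagnoses:
# GPT-4-based ChatGPT: 46/106 correct; Reader 1: 43/106 correct
table = [
    [46, 106 - 46],  # GPT-4-based ChatGPT: correct, incorrect
    [43, 106 - 43],  # Reader 1: correct, incorrect
]

# chi2_contingency applies Yates' continuity correction by default
# for 2x2 tables (correction=True); this is an assumption about the
# original analysis, not a confirmed detail.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

With these counts the test yields a p value of roughly 0.8, in line with the non-significant difference reported in the table; the other pairwise comparisons follow the same pattern with their respective counts.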