Table 1.
Comparison of the diagnostic accuracy between ChatGPT and radiologists
| | Final diagnosis: correct answers (accuracy rate [%]) | p value* | Differential diagnosis: correct answers (accuracy rate [%]) | p value* |
|---|---|---|---|---|
| GPT-4-based ChatGPT | 46/106 (43%) | | 62/106 (58%) | |
| Reader 1 (Radiology resident) | 43/106 (41%) | 0.78 | 61/106 (58%) | 0.99 |
| Reader 2 (Board-certified radiologist) | 56/106 (53%) | 0.22 | 71/106 (67%) | 0.26 |
| GPT-4V-based ChatGPT | 9/106 (8%) | | 15/106 (14%) | |
| Reader 1 (Radiology resident) | 43/106 (41%) | < 0.001** | 61/106 (58%) | < 0.001** |
| Reader 2 (Board-certified radiologist) | 56/106 (53%) | < 0.001** | 71/106 (67%) | < 0.001** |
ChatGPT Chat Generative Pre-trained Transformer, GPT-4 Generative Pre-trained Transformer-4, GPT-4V Generative Pre-trained Transformer-4 with vision
*Chi-square tests were performed to compare the accuracy rates between GPT-4-based ChatGPT and each radiologist, as well as between GPT-4V-based ChatGPT and each radiologist
**p < 0.05
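As a minimal sketch of the comparison described in the footnote, the following Python snippet runs a chi-square test on a 2×2 contingency table of correct/incorrect counts (GPT-4-based ChatGPT vs. Reader 1 for the final diagnosis). The analysis software used in the original study is not stated here, and whether Yates' continuity correction was applied is an assumption; the snippet only illustrates the form of the test.

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table of correct vs. incorrect final diagnoses:
# GPT-4-based ChatGPT: 46/106 correct; Reader 1: 43/106 correct
table = [
    [46, 106 - 46],  # GPT-4-based ChatGPT: correct, incorrect
    [43, 106 - 43],  # Reader 1: correct, incorrect
]

# chi2_contingency applies Yates' continuity correction by default
# for 2x2 tables (correction=True); this is an assumption about the
# original analysis, not a confirmed detail.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

With these counts the test yields a p value of roughly 0.8, in line with the non-significant difference reported in the table; the other pairwise comparisons follow the same pattern with their respective counts.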