Cureus. 2024 Jul 9;16(7):e64204. doi: 10.7759/cureus.64204

Table 1. Comparison of AI model performance by question difficulty.

*indicates significance

AI: artificial intelligence

| Difficulty | Questions | ChatGPT-3.5 | GPT-4 | Bard | P-value (ChatGPT-3.5 vs GPT-4) | P-value (ChatGPT-3.5 vs Bard) | P-value (GPT-4 vs Bard) |
|---|---|---|---|---|---|---|---|
| Easy | 360 | 251 (69.7%) | 333 (92.5%) | 275 (76.4%) | <0.001* | 0.053 | <0.001* |
| Moderate | 353 | 190 (53.8%) | 291 (82.4%) | 223 (63.2%) | <0.001* | <0.05* | <0.001* |
| Hard | 364 | 154 (42.3%) | 223 (61.3%) | 166 (45.6%) | <0.001* | 0.411 | <0.001* |
| Overall | 1077 | 595 (55.3%) | 847 (78.7%) | 664 (61.7%) | <0.001* | <0.01* | <0.001* |
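
The table does not state which statistical test produced the P-values. As a rough sketch only, the pairwise comparisons can be approximated with a Yates-corrected chi-square test on a 2x2 table (model by correct/incorrect). This is an assumption, not the authors' documented method, and it also assumes the counts in the table are numbers of correctly answered questions. The function name `yates_chi2_p` and the `counts` dictionary are hypothetical names introduced for illustration. Under these assumptions, the two exact values reported for ChatGPT-3.5 vs Bard (0.053 for Easy, 0.411 for Hard) should come out to roughly the same figures.

```python
import math


def yates_chi2_p(correct_a, correct_b, n_questions):
    """P-value of a Yates-corrected chi-square test on a 2x2 table.

    Rows are the two models; columns are correct / incorrect answers.
    Assumption: both models answered the same n_questions questions.
    """
    a, b = correct_a, n_questions - correct_a
    c, d = correct_b, n_questions - correct_b
    n = a + b + c + d
    # Yates continuity correction: shrink |ad - bc| by n/2, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Counts copied from Table 1: (questions, ChatGPT-3.5, GPT-4, Bard).
counts = {
    "Easy": (360, 251, 333, 275),
    "Moderate": (353, 190, 291, 223),
    "Hard": (364, 154, 223, 166),
    "Overall": (1077, 595, 847, 664),
}

for difficulty, (n_q, gpt35, gpt4, bard) in counts.items():
    print(
        f"{difficulty:<9}"
        f" ChatGPT-3.5 vs GPT-4: {yates_chi2_p(gpt35, gpt4, n_q):.3g}"
        f" | ChatGPT-3.5 vs Bard: {yates_chi2_p(gpt35, bard, n_q):.3g}"
        f" | GPT-4 vs Bard: {yates_chi2_p(gpt4, bard, n_q):.3g}"
    )
```

The sketch uses only the standard library. The closed-form survival function is valid only for one degree of freedom, which is what a 2x2 table gives.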