Cureus. 2024 Jul 9;16(7):e64204. doi: 10.7759/cureus.64204

Table 1. Comparison of AI model performance by question difficulty.

* indicates statistical significance

AI: artificial intelligence

Difficulty	Questions (n)	ChatGPT-3.5, n (%)	GPT-4, n (%)	Bard, n (%)	P-value: ChatGPT-3.5 vs GPT-4	P-value: ChatGPT-3.5 vs Bard	P-value: GPT-4 vs Bard
Easy	360	251 (69.7%)	333 (92.5%)	275 (76.4%)	<0.001*	0.053	<0.001*
Moderate	353	190 (53.8%)	291 (82.4%)	223 (63.2%)	<0.001*	<0.05*	<0.001*
Hard	364	154 (42.3%)	223 (61.3%)	166 (45.6%)	<0.001*	0.411	<0.001*
Overall	1077	595 (55.3%)	847 (78.7%)	664 (61.7%)	<0.001*	<0.01*	<0.001*
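
The percentages and pairwise P-values in Table 1 can be rechecked from the counts alone. The sketch below is illustrative only: the table does not name the statistical test, so the choice of a Pearson chi-square test with Yates' continuity correction on a 2x2 table is an assumption, as is treating the two models' answers as independent samples and reading each count as the number of correct answers. The names `TABLE`, `accuracy`, and `chi2_yates_p` are hypothetical helpers, not part of the article.

```python
import math

# Counts copied from Table 1 (assumed to be correct answers):
# difficulty -> (questions, ChatGPT-3.5, GPT-4, Bard)
TABLE = {
    "Easy": (360, 251, 333, 275),
    "Moderate": (353, 190, 291, 223),
    "Hard": (364, 154, 223, 166),
}


def accuracy(correct, total):
    """Percentage of correct answers, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def chi2_yates_p(correct_a, correct_b, total):
    """Two-sided P-value for a 2x2 chi-square test with Yates' correction.

    Assumption: both models answered the same `total` questions and are
    treated as independent samples (the article may have used another test).
    """
    wrong_a = total - correct_a
    wrong_b = total - correct_b
    n = 2 * total
    # |ad - bc| reduced by n/2 is the Yates continuity correction.
    diff = max(abs(correct_a * wrong_b - correct_b * wrong_a) - n / 2, 0.0)
    denom = total * total * (correct_a + correct_b) * (wrong_a + wrong_b)
    chi2 = n * diff ** 2 / denom
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    for level, (total, gpt35, gpt4, bard) in TABLE.items():
        print(
            level,
            accuracy(gpt35, total),
            accuracy(gpt4, total),
            accuracy(bard, total),
            f"GPT-3.5 vs GPT-4: {chi2_yates_p(gpt35, gpt4, total):.3g}",
            f"GPT-3.5 vs Bard: {chi2_yates_p(gpt35, bard, total):.3g}",
            f"GPT-4 vs Bard: {chi2_yates_p(gpt4, bard, total):.3g}",
        )
```

Under these assumptions, the ChatGPT-3.5 vs Bard comparisons come out at roughly 0.053 for easy questions and 0.411 for hard questions, in line with the reported values. This suggests, but does not confirm, that a continuity-corrected chi-square test was used.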