Table 2:
Test | Image Questions | No Image Questions |
---|---|---|
GPT-4 (4) | 40.7% (37.0% – 44.4%) | 59.2% (58.2% – 60.6%) |
Gemini (G) | 22.2% (20.4% – 25.9%) | 44.7% (44.0% – 46.1%) |
GPT-4 Turbo (4T) | 44.4% (40.7% – 48.1%) | 62.4% (62.1% – 63.8%) |
GPT-4o (4o) | 44.4% (42.2% – 46.7%) | 66.7% (65.7% – 67.7%) |

Comparison | p-value (Image Questions) | p-value (No Image Questions) |
---|---|---|
4 vs. G | p < 0.001 | p < 0.001 |
4 vs. 4T | p = 0.885 | p < 0.001 |
G vs. 4T | p < 0.001 | p < 0.001 |
4o vs. 4T | p = 0.821 | p = 0.001 |
4o vs. G | p < 0.001 | p < 0.001 |
4o vs. 4 | p = 0.956 | p < 0.001 |

Values are presented as median percentages with 95% confidence intervals across 30 attempts at the exam.