Table 2.
Accuracy by cognitive level for AI models.
| AI Model | Odds Ratio | 95% Confidence Interval | P-value |
|---|---|---|---|
| ChatGPT-4o | 1.166 | [0.522, 2.605] | 0.709 |
| Llama 70B | 0.358 | [0.170, 0.754] | 0.007<sup>a</sup> |
| Llama 405B | 1.379 | [0.630, 3.018] | 0.421 |
Odds ratios compare the odds of a correct answer on cognitive level 2 questions (n = 61) with the odds on cognitive level 1 questions (n = 193). Only Llama 3.1 70B had significantly lower odds of answering level 2 questions correctly.
<sup>a</sup> Statistically significant.
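For readers who want to see how an odds ratio, its 95% confidence interval, and a P-value of this kind relate to one another, the following is a minimal sketch using a Wald approximation on a 2x2 table. It is not necessarily the procedure used to produce Table 2, which may have come from a logistic regression. The function name `odds_ratio_wald` is hypothetical, and the correct/incorrect counts are invented for illustration: only the group sizes (193 and 61) come from the table note.

```python
import math

def odds_ratio_wald(correct_ref, wrong_ref, correct_cmp, wrong_cmp, z_crit=1.959964):
    """Odds ratio (comparison vs. reference group) with a Wald 95% CI and two-sided P-value.

    Assumes all four cell counts are positive.
    """
    # Odds of a correct answer in each group, then their ratio.
    odds_ref = correct_ref / wrong_ref
    odds_cmp = correct_cmp / wrong_cmp
    odds_ratio = odds_cmp / odds_ref

    # Standard error of log(OR) under the Wald approximation.
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / correct_ref + 1 / wrong_ref + 1 / correct_cmp + 1 / wrong_cmp)

    # Confidence interval is built on the log scale, then exponentiated.
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))

    # Two-sided P-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, ci, p_value

# HYPOTHETICAL counts, NOT the study's data: only the totals 193 and 61 match the note.
level1_correct, level1_wrong = 150, 43   # cognitive level 1 (reference), n = 193
level2_correct, level2_wrong = 40, 21    # cognitive level 2 (comparison), n = 61

odds_ratio, (ci_low, ci_high), p_value = odds_ratio_wald(
    level1_correct, level1_wrong, level2_correct, level2_wrong
)
print(f"OR = {odds_ratio:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}], P = {p_value:.3f}")
```

An odds ratio below 1 means lower odds of a correct answer at level 2 than at level 1. Under this approximation, the result is significant at the 0.05 level exactly when the 95% interval excludes 1.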