letter
. 2025 Jan 6;5(2):95–99. doi: 10.1016/j.aopr.2025.01.002

Table 2. Accuracy by cognitive level for AI models.

AI model      Odds ratio (level 2 vs. level 1)   95% confidence interval   P-value
ChatGPT-4o    1.166                              [0.522, 2.605]            0.709
Llama 70B     0.358                              [0.170, 0.754]            0.007ᵃ
Llama 405B    1.379                              [0.630, 3.018]            0.421

Only Llama 3.1 70B showed significantly lower odds of correctly answering cognitive level 2 questions (n = 61) than cognitive level 1 questions (n = 193).

ᵃ Statistically significant.
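As a consistency check, the reported p-values can be recovered from the odds ratios and 95% confidence intervals alone, provided the intervals are Wald-type, i.e. symmetric on the log-odds scale. The letter does not say how its intervals were computed, so that is an assumption here, and the helper name `wald_p_from_ci` is hypothetical. A minimal Python sketch using only the numbers in Table 2:

```python
import math

# Values copied from Table 2: odds ratio (level 2 vs. level 1) and its 95% CI.
TABLE_2 = {
    "ChatGPT-4o": (1.166, 0.522, 2.605),
    "Llama 70B": (0.358, 0.170, 0.754),
    "Llama 405B": (1.379, 0.630, 3.018),
}

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Back out a two-sided Wald p-value from an odds ratio and its 95% CI.

    Assumes the CI was built as exp(log(OR) +/- 1.96 * SE), i.e. symmetric
    on the log-odds scale (an assumption; the source does not state its method).
    """
    # The CI width on the log scale spans 2 * 1.96 standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


for model, (or_, lo, hi) in TABLE_2.items():
    p = wald_p_from_ci(or_, lo, hi)
    print(f"{model:12s} OR={or_:.3f}  recovered p={p:.3f}")
```

Under that assumption the recovered p-values agree with the reported ones to within about 0.001, and only the Llama 70B interval excludes an odds ratio of 1, which is why it is the only significant result.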