Table 2. Summary of overall performance.
Statistical analysis was performed using Fisher's exact test; P-values below 0.05 were considered statistically significant.
| Metric | Claude | ChatGPT-4 | Difference | P-Value |
| --- | --- | --- | --- | --- |
| Correct Answers | 32/100 | 23/100 | +9 | 0.042 |
| Accuracy (%) | 32% | 23% | +9 percentage points | 0.042 |
| 95% CI | 23.1-41.8% | 15.2-32.4% | - | - |
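
For readers who want to check the reported P-value, the following is a minimal sketch of Fisher's exact test applied to the counts in the table (32 correct and 68 incorrect versus 23 correct and 77 incorrect). The function name `fisher_exact` and its return convention are illustrative assumptions, not the authors' code, and the table does not state whether a one-sided or two-sided test was used, so the sketch returns both.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided_p, two_sided_p). The one-sided value is
    P(X >= a) under the hypergeometric null; the two-sided value sums
    every table that is no more probable than the observed one.
    Hypothetical helper written for illustration only.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalised hypergeometric probability of k successes in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    observed = weight(a)

    # Exact integer sums avoid floating-point ties in the two-sided rule.
    greater = sum(weight(k) for k in range(a, hi + 1))
    two_sided = sum(
        weight(k) for k in range(lo, hi + 1) if weight(k) <= observed
    )
    return greater / total, two_sided / total


# Counts taken from the table: correct vs. incorrect answers per model.
p_one, p_two = fisher_exact(32, 68, 23, 77)
print(f"one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

The sketch uses only the standard library; a statistics package would normally be preferred in practice, and its result should be compared against the tabulated value rather than assumed to match it.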