2023 Jun 22;15(6):e40822. doi: 10.7759/cureus.40822

Figure 3. Comparing the performance of GPT-3.5, GPT-4, and humans on StatPearls questions divided by difficulty levels.

Level 1 indicated the “basic” difficulty level and tested recall; Level 2 indicated “moderate” difficulty and tested the ability to comprehend basic facts; Level 3 was described as “difficult” and tested application, or knowledge use in care; Level 4 was considered an “expert” high-complexity question and tested analysis and evaluation skills.

*, **, and † indicate statistical significance.