Table 1. Percentage of questions answered correctly by GPT-3.5, GPT-4, and humans, by ophthalmology subcategory.
Bolding indicates statistical significance.
| Ophthalmology Subcategory (n = questions) | GPT-3.5 Correct (%) | GPT-4 Correct (%) | Humans Correct (%) | P-Value, GPT-3.5 vs GPT-4 | P-Value, GPT-3.5 vs Humans | P-Value, GPT-4 vs Humans |
|---|---|---|---|---|---|---|
| Lens & Cataract (n = 42) | 45 | 52 | 57 | 0.518 | 0.163 | 0.569 |
| External Disease & Cornea (n = 43) | 58 | 70 | 56 | 0.267 | 0.833 | 0.085 |
| Glaucoma (n = 43) | 65 | 84 | 59 | 0.048 | 0.614 | 0.003 |
| Neuro (n = 42) | 69 | 79 | 59 | 0.327 | 0.212 | 0.007 |
| Optics (n = 42) | 38 | 69 | 48 | 0.004 | 0.284 | 0.017 |
| Pathology & Tumors (n = 44) | 45 | 70 | 58 | 0.017 | 0.124 | 0.128 |
| Pediatrics (n = 43) | 63 | 79 | 63 | 0.099 | 0.951 | 0.023 |
| Oculoplastics (n = 42) | 57 | 83 | 59 | 0.008 | 0.851 | < 0.001 |
| Refractive Surgery (n = 42) | 48 | 69 | 58 | 0.002 | 0.397 | 0.003 |
| Retina & Vitreous (n = 42) | 67 | 74 | 63 | 0.480 | 0.652 | 0.157 |
| Intraocular Inflammation & Uveitis (n = 42) | 55 | 76 | 61 | 0.039 | 0.496 | 0.054 |
| Total (n = 467) | 55 | 73 | 58 | < 0.001 | 0.231 | < 0.001 |
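The table does not state which test produced the P-values. As a sketch only, assuming an unpaired two-proportion z-test with pooled variance (equivalent to a Pearson chi-square test on a 2×2 table without continuity correction), a GPT-3.5 vs GPT-4 comparison can be approximated from a row's percentages and n. The function name `two_proportion_z_test` is hypothetical, and the correct-answer counts are back-calculated from rounded percentages (e.g., Glaucoma: 65% and 84% of 43 questions, about 28 and 36 correct), so they are approximations rather than reported data.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p) for H0: the two correct-answer proportions are equal.

    Assumption: unpaired samples, pooled-variance normal approximation
    (equivalent to a Pearson chi-square test without continuity correction).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # proportion correct under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when all or no answers are correct")
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts back-calculated from Table 1 (Glaucoma, n = 43):
# GPT-3.5 ~ 65% -> 28 correct, GPT-4 ~ 84% -> 36 correct.
z, p = two_proportion_z_test(28, 43, 36, 43)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For the Glaucoma row this gives P ≈ 0.048, in line with the table. If both models answered the same question set, a paired test such as McNemar's test would be another plausible choice; this sketch does not claim to reproduce the authors' analysis, and it does not apply directly to the human columns, whose percentages may be averages across respondents.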