2023 Jun 22;15(6):e40822. doi: 10.7759/cureus.40822

Table 1. Percentage of questions answered correctly by GPT-3.5, GPT-4, and humans, by ophthalmology subcategory.

Bolding indicates statistical significance.

Ophthalmology Subcategory	GPT-3.5 Questions Answered Correctly (%)	GPT-4 Questions Answered Correctly (%)	Human Questions Answered Correctly (%)	GPT-3.5 vs GPT-4 P-Value	GPT-3.5 vs Human P-Value	GPT-4 vs Human P-Value
Lens & Cataract (n = 42)	45	52	57	0.518	0.163	0.569
External Disease & Cornea (n = 43)	58	70	56	0.267	0.833	0.085
Glaucoma (n = 43)	65	84	59	0.048	0.614	0.003
Neuro (n = 42)	69	79	59	0.327	0.212	0.007
Optics (n = 42)	38	69	48	0.004	0.284	0.017
Pathology & Tumors (n = 44)	45	70	58	0.017	0.124	0.128
Pediatrics (n = 43)	63	79	63	0.099	0.951	0.023
Oculoplastics (n = 42)	57	83	59	0.008	0.851	< 0.001
Refractive Surgery (n = 42)	48	69	58	0.002	0.397	0.003
Retina & Vitreous (n = 42)	67	74	63	0.480	0.652	0.157
Intraocular Inflammation & Uveitis (n = 42)	55	76	61	0.039	0.496	0.054
Total (n = 467)	55	73	58	< 0.001	0.231	< 0.001
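
The GPT-3.5 vs GPT-4 column compares two proportions measured on the same number of questions. As a rough illustration of how such a comparison can be made, the sketch below runs a pooled two-proportion z-test. This excerpt does not say which test the authors used, so the choice of test is an assumption, as are the function names `two_proportion_p_value` and `approx_correct`. The counts of correct answers are rebuilt from the rounded percentages in the table, so any p-value obtained this way is approximate. The human columns cannot be rechecked in the same way because the table does not report how many human responses underlie each percentage.

```python
import math


def two_proportion_p_value(correct_a, n_a, correct_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test.

    Assumption: this test is used for illustration only; the table does not
    state which test produced its p-values.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all-correct or all-wrong: no evidence of a difference.
        return 1.0
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def approx_correct(percent, n):
    """Approximate number of correct answers from a rounded percentage.

    Assumption: the table gives whole-number percentages only, so the
    reconstructed count may be off by a question or so.
    """
    return round(percent * n / 100)


if __name__ == "__main__":
    # (GPT-3.5 %, GPT-4 %, number of questions), copied from Table 1.
    rows = {
        "Lens & Cataract": (45, 52, 42),
        "Glaucoma": (65, 84, 43),
        "Total": (55, 73, 467),
    }
    for name, (pct_35, pct_4, n) in rows.items():
        p = two_proportion_p_value(
            approx_correct(pct_35, n), n, approx_correct(pct_4, n), n
        )
        print(f"{name}: p = {p:.3f}")
```

Because both models answered the same questions, a paired test such as McNemar's would also be a reasonable choice, but it needs per-question results that the table does not provide.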