Figure 2. Comparison of the performance of GPT-3.5, GPT-4, and humans on StatPearls questions grouped into generalized anatomical categories.
The “anterior segment” category included the cornea, cataract, and refractive surgery categories; the “posterior segment” category included the retina and vitreous category; the “other” category comprised neuro-ophthalmology, pediatrics, and oculoplastics. Questions from the glaucoma, pathology, and uveitis categories were individually assigned to the “anterior segment,” “posterior segment,” or “other” category according to question content.
* and ** indicate statistical significance.