2024 Mar 31;21(1):128–146. doi: 10.14245/ns.2347310.655

Table 1.

ChatGPT's performance (GPT-3.5 and GPT-4.0) compared with the North American Spine Society (NASS) clinical guidelines on clinical questions about antibiotic prophylaxis in spine surgery, broken down by question category

Antibiotic prophylaxis clinical question category	GPT-3.5: Accurate	GPT-3.5: Inaccurate	GPT-3.5: Overconfident	GPT-4.0: Accurate	GPT-4.0: Inaccurate	GPT-4.0: Cited NASS
All questions (n = 16)	10 (62.5)	6 (37.5)	4 (25)	13 (81)	3 (19)	10 (62.5)
Efficacy (n = 4)	3 (75)	1 (25)	1 (25)	4 (100)	0 (0)	3 (75)
Protocol (n = 4)	1 (25)	3 (75)	3 (75)	3 (75)	1 (25)	3 (75)
Redosing (n = 1)	1 (100)	0 (0)	1 (100)	1 (100)	0 (0)	1 (100)
Discontinuation (n = 1)	0 (0)	1 (100)	-	0 (0)	1 (100)	-
Wound drains (n = 1)	1 (100)	0 (0)	-	1 (100)	0 (0)	1 (100)
Body habitus (n = 1)	1 (100)	0 (0)	-	1 (100)	0 (0)	1 (100)
Comorbidities (n = 2)	2 (100)	0 (0)	-	1 (50)	1 (50)	-
Complications (n = 2)	1 (50)	1 (50)	-	2 (100)	0 (0)	1 (50)

Values are presented as number (%).

ChatGPT, chat generative pre-trained transformer.