2024 Mar 31;21(1):128–146. doi: 10.14245/ns.2347310.655

Table 1.

ChatGPT’s performance (GPT-3.5 & GPT-4.0) compared to the North American Spine Society (NASS) clinical guidelines for clinical questions relating to antibiotic prophylaxis in spine surgery, broken down by question category

| Antibiotic prophylaxis clinical question category | GPT-3.5: Accurate | GPT-3.5: Inaccurate | GPT-3.5: Overconfident | GPT-4.0: Accurate | GPT-4.0: Inaccurate | GPT-4.0: Cited NASS |
|---|---|---|---|---|---|---|
| All questions (n = 16) | 10 (62.5) | 6 (37.5) | 4 (25) | 13 (81) | 3 (19) | 10 (62.5) |
| Efficacy (n = 4) | 3 (75) | 1 (25) | 1 (25) | 4 (100) | 0 (0) | 3 (75) |
| Protocol (n = 4) | 1 (25) | 3 (75) | 3 (75) | 3 (75) | 1 (25) | 3 (75) |
| Redosing (n = 1) | 1 (100) | 0 (0) | 1 (100) | 1 (100) | 0 (0) | 1 (100) |
| Discontinuation (n = 1) | 0 (0) | 1 (100) | - | 0 (0) | 1 (100) | - |
| Wound drains (n = 1) | 1 (100) | 0 (0) | - | 1 (100) | 0 (0) | 1 (100) |
| Body habitus (n = 1) | 1 (100) | 0 (0) | - | 1 (100) | 0 (0) | 1 (100) |
| Comorbidities (n = 2) | 2 (100) | 0 (0) | - | 1 (50) | 1 (50) | - |
| Complications (n = 2) | 1 (50) | 1 (50) | - | 2 (100) | 0 (0) | 1 (50) |

Values are presented as number (%).

ChatGPT, chat generative pre-trained transformer.