Cureus. 2023 Dec 16;15(12):e50629. doi: 10.7759/cureus.50629

Table 3. Assessment of ChatGPT models’ performance per topic and the overall performance across topics.

O&P: Ova and parasite examination; AST: antimicrobial susceptibility testing; MRSA: methicillin-resistant Staphylococcus aureus; UTI: urinary tract infection; PCR: polymerase chain reaction; ID: microbial identification; Dx: diagnostic approach; CLEAR: Completeness, Lack of false information, Evidence support, Appropriateness, and Relevance.

Each average score is the sum of the two raters’ scores divided by 2.
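
The averaging rule and the descriptive labels used in the table can be sketched as follows. This is only an illustration: the cut-offs in `CLEAR_BANDS` are not stated in the table and are assumed here to be equal-width bands on the 1 to 5 scale (an assumption that agrees with every label shown in the table), the function names are hypothetical, and the two rater scores in the usage lines are made-up values, not data from the study.

```python
# Assumed lower bounds for each descriptive band (not given in the table itself).
CLEAR_BANDS = [
    (4.20, "Excellent"),
    (3.40, "Very good"),
    (2.60, "Good"),
    (1.80, "Satisfactory"),
    (1.00, "Poor"),
]


def average_clear_score(rater_a: float, rater_b: float) -> float:
    """Sum of the two raters' scores divided by 2, rounded to two decimals."""
    return round((rater_a + rater_b) / 2, 2)


def clear_label(score: float) -> str:
    """Map an average CLEAR score (1 to 5) to its assumed descriptive band."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("CLEAR scores are expected to lie between 1 and 5")
    for lower_bound, label in CLEAR_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: every score >= 1.0 matches a band")


# Hypothetical rater scores, for illustration only.
avg = average_clear_score(1.6, 1.8)
print(avg, clear_label(avg))
```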

| Case | Query classification | Average CLEAR score for ChatGPT-3.5 | Average CLEAR score for ChatGPT-4 | t-test |
| --- | --- | --- | --- | --- |
| Average performance in ID | | 3.4 (Very good) | 3.83 (Very good) | t(3)=-3.087, P=0.054 |
| Q1 (O&P examination) | ID | 1.7 (Poor) | 1.8 (Satisfactory) | |
| Q5 (Candida albicans identification) | ID | 4.1 (Very good) | 4.8 (Excellent) | |
| Q7 (Brucella spp. identification) | ID | 4.4 (Excellent) | 4.7 (Excellent) | |
| Q10 (Salmonella enterica identification) | ID | 3.4 (Very good) | 4.0 (Very good) | |
| Average performance in AST | | 1.87 (Satisfactory) | 2.37 (Satisfactory) | t(2)=-1.387, P=0.300 |
| Q2 (AST for colistin) | AST | 1.4 (Poor) | 1.6 (Poor) | |
| Q3 (MRSA resistance to all beta-lactams) | AST | 2.8 (Good) | 3.2 (Good) | |
| Q4 (Enterococci resistance to clindamycin) | AST | 1.4 (Poor) | 2.3 (Satisfactory) | |
| Average performance in Dx | | 2.4 (Satisfactory) | 3.2 (Good) | t(2)=-2.402, P=0.138 |
| Q6 (Laboratory diagnosis of UTI) | Dx | 2.9 (Good) | 2.9 (Good) | |
| Q8 (Interpretation of real-time PCR testing for respiratory viruses/atypical bacteria) | Dx | 1.5 (Poor) | 3.5 (Very good) | |
| Q9 (Sputum quality assessment for microbiologic culture) | Dx | 2.8 (Good) | 3.3 (Good) | |
| Overall performance across the three categories | | 2.64 (Good) | 3.21 (Good) | t(9)=-3.143, P=0.012 |
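
The t-test column can be illustrated with a short sketch. It assumes the reported statistic is a paired-samples t-test on the per-query scores (ChatGPT-3.5 minus ChatGPT-4), an assumption suggested by the degrees of freedom equalling the number of queries minus one. The function name `paired_t` and the variable names are hypothetical, and the inputs are the rounded per-query averages from the table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample SD / sqrt(n)).
    # Note: this raises ZeroDivisionError if all differences are identical.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Per-query average CLEAR scores from the table, in the order Q1 to Q10.
gpt35 = [1.7, 1.4, 2.8, 1.4, 4.1, 2.9, 4.4, 1.5, 2.8, 3.4]
gpt4 = [1.8, 1.6, 3.2, 2.3, 4.8, 2.9, 4.7, 3.5, 3.3, 4.0]

t_all, df_all = paired_t(gpt35, gpt4)  # t ≈ -3.14, df = 9

# ID queries only: Q1, Q5, Q7, Q10 (zero-based positions 0, 4, 6, 9).
id_positions = [0, 4, 6, 9]
t_id, df_id = paired_t(
    [gpt35[i] for i in id_positions],
    [gpt4[i] for i in id_positions],
)  # t ≈ -3.09, df = 3

print(f"overall: t({df_all})={t_all:.3f}")
print(f"ID: t({df_id})={t_id:.3f}")
```

Because the per-query scores in the table are rounded to one decimal, statistics recomputed this way may not match the reported values exactly. Obtaining P values would additionally require the t-distribution, which is not part of the Python standard library.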