. 2024 Sep 3;6(6):847–854. doi: 10.1016/j.jhsg.2024.07.011

Table 2.

Accuracy of ChatGPT and Isabel in Capturing the Correct Diagnosis Within the Top 10, Top 3, and Top 1 Differential Diagnoses, Excluding Versus Including Partially Correct Diagnoses

                                    Excluding partially correct   Including partially correct
Accuracy measurement                Isabel        ChatGPT         Isabel        ChatGPT
Top 1 accuracy, n/N (%)             3/16 (19%)    9/16 (56%)      5/16 (31%)    9/16 (56%)
Top 3 accuracy, n/N (%)             5/16 (31%)    12/16 (75%)     10/16 (63%)   12/16 (75%)
Top 10 accuracy, n/N (%)            7/16 (44%)    14/16 (88%)     12/16 (75%)   15/16 (94%)
Median rank of diagnosis (IQR)      2 (3)         1 (1)           1.5 (2)       1 (2)

Median rank is the median position of the correct diagnosis within the differential list, reported with its IQR.

P < .05.
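Each accuracy cell is a simple proportion of the 16 cases, and top-k accuracy is the share of cases whose correct diagnosis appears at rank k or better. The minimal Python sketch below, assuming that standard definition, recomputes the percentages from the counts in the table. `TABLE_2`, `percent`, and `top_k_hits` are hypothetical helper names for illustration, not anything from the study itself.

```python
from fractions import Fraction

# Correct-diagnosis counts out of 16 cases, copied from Table 2.
# Keys: (tool, whether partially correct diagnoses are counted); inner keys: k.
N_CASES = 16
TABLE_2 = {
    ("Isabel", "excluding"): {1: 3, 3: 5, 10: 7},
    ("ChatGPT", "excluding"): {1: 9, 3: 12, 10: 14},
    ("Isabel", "including"): {1: 5, 3: 10, 10: 12},
    ("ChatGPT", "including"): {1: 9, 3: 12, 10: 15},
}


def percent(hits: int, total: int) -> int:
    """Whole-number percentage, rounding halves up (10/16 = 62.5% -> 63%)."""
    # Exact rational arithmetic avoids float error; int() floors positive values.
    return int(Fraction(100 * hits, total) + Fraction(1, 2))


def top_k_hits(ranks, k):
    """Return (hits, cases) for top-k accuracy.

    `ranks` holds the 1-based position of the correct diagnosis in each
    differential list, or None when it does not appear at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits, len(ranks)


if __name__ == "__main__":
    for (tool, mode), hits_by_k in TABLE_2.items():
        counts = [hits_by_k[k] for k in sorted(hits_by_k)]
        # A diagnosis inside the top 1 is also inside the top 3 and top 10,
        # so the counts can only stay equal or grow as k increases.
        assert counts == sorted(counts)
        for k in sorted(hits_by_k):
            hits = hits_by_k[k]
            print(f"{tool:<8}{mode:<10}top {k:<3}{hits}/{N_CASES} ({percent(hits, N_CASES)}%)")
```

Python's built-in `round` rounds halves to even, so `round(62.5)` gives 62, whereas the table reports 63% for 10/16; the reported values are consistent with rounding halves up, hence the explicit helper. A toy call such as `top_k_hits([1, 2, None, 5], 3)` (hypothetical ranks, not study data) returns `(2, 4)`. The per-case ranks behind the medians and IQRs are not given in the table, so those rows cannot be recomputed this way.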