Table 2.
ChatGPT and Isabel’s Accuracy in Capturing Correct Diagnoses When Excluding Versus Including Partially Correct Diagnoses, Within the Top 10, Top 3, and Top 1 Differential∗
| Accuracy Measurement | Isabel (Excluding Partially Correct Diagnoses) | ChatGPT (Excluding Partially Correct Diagnoses) | Isabel (Including Partially Correct Diagnoses) | ChatGPT (Including Partially Correct Diagnoses) |
|---|---|---|---|---|
| Top 1 accuracy (%) | 3/16 (19%)∗ | 9/16 (56%)∗ | 5/16 (31%) | 9/16 (56%) |
| Top 3 accuracy (%) | 5/16 (31%)∗ | 12/16 (75%)∗ | 10/16 (63%) | 12/16 (75%) |
| Top 10 accuracy (%) | 7/16 (44%)∗ | 14/16 (88%)∗ | 12/16 (75%) | 15/16 (94%) |
| Median rank of diagnosis (IQR) | 2 (IQR = 3) | 1 (IQR = 1) | 1.5 (IQR = 2) | 1 (IQR = 2) |
Median rank of the diagnosis within the differential list is reported with an IQR.
P < .05.
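
The percentages in Table 2 follow directly from the reported counts out of 16 cases. As a minimal sketch for checking that arithmetic, the snippet below recomputes them; the helper name `percent` and the dictionary layout are illustrative assumptions, not part of the study, and the round-half-up convention is inferred from 10/16 being reported as 63%.

```python
# Counts of correct diagnoses copied from Table 2 (each out of 16 cases).
TOTAL_CASES = 16

COUNTS = {
    "excluding": {
        "Top 1": {"Isabel": 3, "ChatGPT": 9},
        "Top 3": {"Isabel": 5, "ChatGPT": 12},
        "Top 10": {"Isabel": 7, "ChatGPT": 14},
    },
    "including": {
        "Top 1": {"Isabel": 5, "ChatGPT": 9},
        "Top 3": {"Isabel": 10, "ChatGPT": 12},
        "Top 10": {"Isabel": 12, "ChatGPT": 15},
    },
}


def percent(correct: int, total: int = TOTAL_CASES) -> int:
    """Whole-number percentage, rounding halves up (hypothetical helper).

    Integer arithmetic avoids Python's round(), which rounds 62.5 to 62.
    """
    return (200 * correct + total) // (2 * total)


if __name__ == "__main__":
    for scoring, by_cutoff in COUNTS.items():
        for cutoff, by_tool in by_cutoff.items():
            cells = ", ".join(
                f"{tool} {n}/{TOTAL_CASES} ({percent(n)}%)"
                for tool, n in by_tool.items()
            )
            print(f"{cutoff}, {scoring} partially correct: {cells}")
```

Integer arithmetic is used here only so that the half-way case (10/16 = 62.5%) rounds up to match the table.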