Table 2.
Diagnostic evaluation indicators.
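| Dataset | Model | TP<sup>a</sup> | FN<sup>b</sup> | FP<sup>c</sup> | TN<sup>d</sup> | Sensitivity | Specificity | PPV<sup>e</sup> | NPV<sup>f</sup> | PLR<sup>g</sup> | NLR<sup>h</sup> | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |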
| Training set | GPT-4 Model 1 | 10 | 34 | 5 | 73 | 0.2273 | 0.9359 | 0.6667 | 0.6822 | 3.5455 | 0.8257 | 0.6803 |
| Training set | GPT-4 Model 2 | 20 | 24 | 6 | 72 | 0.4545 | 0.9231 | 0.7692 | 0.7500 | 5.9091 | 0.5909 | 0.7541 |
| Training set | GPT-4 Model 3 | 26 | 18 | 12 | 66 | 0.5909 | 0.8462 | 0.6842 | 0.7857 | 3.8409 | 0.4835 | 0.7541 |
| Training set | GPT-4 Model 4 | 16 | 28 | 13 | 65 | 0.3636 | 0.8333 | 0.5517 | 0.6989 | 2.1818 | 0.7636 | 0.6639 |
| Training set | GPT-4 Model 5 | 38 | 6 | 4 | 74 | 0.8636 | 0.9487 | 0.9048 | 0.9250 | 16.8409 | 0.1437 | 0.9180 |
| Training set | GPT-3.5 Model 1 | 13 | 31 | 14 | 64 | 0.2955 | 0.8205 | 0.4815 | 0.6737 | 1.6461 | 0.8587 | 0.6311 |
| Training set | GPT-3.5 Model 2 | 22 | 22 | 24 | 54 | 0.5000 | 0.6923 | 0.4783 | 0.7105 | 1.6250 | 0.7222 | 0.6230 |
| Training set | GPT-3.5 Model 3 | 26 | 18 | 28 | 50 | 0.5909 | 0.6410 | 0.4815 | 0.7353 | 1.6461 | 0.6382 | 0.6230 |
| Training set | GPT-3.5 Model 4 | 22 | 22 | 31 | 47 | 0.5000 | 0.6026 | 0.4151 | 0.6812 | 1.2581 | 0.8298 | 0.5656 |
| Training set | GPT-3.5 Model 5 | 25 | 19 | 21 | 57 | 0.5682 | 0.7308 | 0.5435 | 0.7500 | 2.1104 | 0.5909 | 0.6721 |
| Test set | GPT-4 Model 5 | 17 | 5 | 5 | 25 | 0.7727 | 0.8333 | 0.7727 | 0.8333 | 4.6364 | 0.2727 | 0.8077 |
| Test set | GPT-3.5 Model 5 | 12 | 10 | 9 | 21 | 0.5455 | 0.7000 | 0.5714 | 0.6774 | 1.8182 | 0.6494 | 0.6346 |
<sup>a</sup>TP: true positive.
<sup>b</sup>FN: false negative.
<sup>c</sup>FP: false positive.
<sup>d</sup>TN: true negative.
<sup>e</sup>PPV: positive predictive value.
<sup>f</sup>NPV: negative predictive value.
<sup>g</sup>PLR: positive likelihood ratio.
<sup>h</sup>NLR: negative likelihood ratio.
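
The table reports the derived indicators without stating their formulas. The sketch below shows how each one could be recomputed from the four counts, assuming the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PLR = sensitivity/(1 − specificity), NLR = (1 − sensitivity)/specificity). The names `ConfusionCounts` and `diagnostic_indicators` are hypothetical and introduced here for illustration only; the sketch also assumes every denominator is nonzero, which holds for all rows of Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for one row's TP, FN, FP and TN counts."""
    tp: int
    fn: int
    fp: int
    tn: int


def diagnostic_indicators(c: ConfusionCounts) -> dict[str, float]:
    """Recompute the table's indicators from raw counts.

    Assumes the standard textbook definitions and nonzero denominators.
    """
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # Computed from counts rather than as 1 - specificity / 1 - sensitivity
    # to avoid floating-point cancellation.
    false_positive_rate = c.fp / (c.fp + c.tn)
    false_negative_rate = c.fn / (c.fn + c.tp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "plr": sensitivity / false_positive_rate,
        "nlr": false_negative_rate / specificity,
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn),
    }


# Counts taken from the "Training set | GPT-4 Model 5" row of Table 2.
gpt4_model5_train = ConfusionCounts(tp=38, fn=6, fp=4, tn=74)
for name, value in diagnostic_indicators(gpt4_model5_train).items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimal places, the values computed this way match the corresponding row of the table, which suggests the authors used these definitions.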