2025 Jan 8;15(3):153–158. doi: 10.5415/apallergy.0000000000000179

Table 1.

Experiments’ confusion matrix and performance indicators for each GPT model combination with and without the WAO criteria

| GPT model | TP | FP | TN | FN | Precision | Sensitivity | Specificity | Accuracy | Kappa | Agreement |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 Turbo | 48 | 5 | 916 | 0 | 90.6% | 100% | 99.5% | 99.5% | 0.95 | almost perfect |
| 4 Turbo with criteria | 48 | 6 | 915 | 0 | 88.9% | 100% | 99.3% | 99.4% | 0.94 | almost perfect |
| 3.5 + 4 | 47 | 11 | 910 | 1 | 81.0% | 97.9% | 98.8% | 98.8% | 0.88 | almost perfect |
| 3.5 + 4 with criteria | 48 | 9 | 912 | 0 | 84.2% | 100% | 99.0% | 99.1% | 0.90 | almost perfect |
| 3.5 | 48 | 59 | 862 | 0 | 44.9% | 100% | 93.6% | 93.9% | 0.59 | moderate |
| 3.5 with criteria | 48 | 31 | 890 | 0 | 60.8% | 100% | 96.6% | 96.8% | 0.74 | substantial |

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives.
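The performance indicators in Table 1 can be recomputed from the four confusion-matrix counts in each row. The following is a minimal sketch, not the authors' code: the function names `classification_metrics` and `agreement_label` are hypothetical, kappa is assumed to be Cohen's kappa, and the agreement labels are assumed to follow the Landis and Koch bands, which match the labels shown in the table.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return precision, sensitivity, specificity, accuracy, and Cohen's kappa
    from raw confusion-matrix counts (hypothetical helper, not from the paper)."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n  # observed agreement, identical to accuracy
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


def agreement_label(kappa):
    """Map kappa to a Landis and Koch agreement band (assumed labeling scheme)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


if __name__ == "__main__":
    # Counts taken from the "4 Turbo" and "3.5" rows of Table 1.
    rows = {"4 Turbo": (48, 5, 916, 0), "3.5": (48, 59, 862, 0)}
    for name, counts in rows.items():
        m = classification_metrics(*counts)
        print(name, {k: round(v, 3) for k, v in m.items()}, agreement_label(m["kappa"]))
```

Every row sums to 969 records, of which 48 are true cases, so a few dozen false positives lower precision sharply while specificity and accuracy stay above 93%.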