Table 1.
Confusion matrices and performance indicators for each GPT model combination, with and without the WAO criteria
| GPT model | Confusion matrix | Precision | Sensitivity | Specificity | Accuracy | Kappa agreement |
|---|---|---|---|---|---|---|
| 4 Turbo | TP: 48, FP: 5, TN: 916, FN: 0 | 90.6% | 100% | 99.5% | 99.5% | 0.95 (almost perfect) |
| 4 Turbo with criteria | TP: 48, FP: 6, TN: 915, FN: 0 | 88.9% | 100% | 99.3% | 99.4% | 0.94 (almost perfect) |
| 3.5 + 4 | TP: 47, FP: 11, TN: 910, FN: 1 | 81.0% | 97.9% | 98.8% | 98.8% | 0.88 (almost perfect) |
| 3.5 + 4 with criteria | TP: 48, FP: 9, TN: 912, FN: 0 | 84.2% | 100% | 99.0% | 99.1% | 0.90 (almost perfect) |
| 3.5 | TP: 48, FP: 59, TN: 862, FN: 0 | 44.9% | 100% | 93.6% | 93.9% | 0.59 (moderate) |
| 3.5 with criteria | TP: 48, FP: 31, TN: 890, FN: 0 | 60.8% | 100% | 96.6% | 96.8% | 0.74 (substantial) |
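
As a cross-check, the indicators in Table 1 can be recomputed from the confusion-matrix counts using the standard definitions of precision, sensitivity, specificity, accuracy, and Cohen's kappa. The sketch below is illustrative only: the function name `indicators` and the dictionary keys are hypothetical, and it assumes the reported kappa is the usual two-rater Cohen's kappa between the model's classification and the reference classification.

```python
def indicators(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recompute the Table 1 indicators from confusion-matrix counts.

    Hypothetical helper for illustration; standard textbook formulas.
    """
    total = tp + fp + tn + fn

    precision = tp / (tp + fp)      # positive predictive value
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    accuracy = (tp + tn) / total    # observed agreement

    # Agreement expected by chance, from the marginal totals
    # (assumption: Cohen's kappa for two raters and two classes).
    chance_pos = ((tp + fp) / total) * ((tp + fn) / total)
    chance_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = chance_pos + chance_neg

    kappa = (accuracy - expected) / (1 - expected)

    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Counts taken from the "4 Turbo" and "3.5" rows of Table 1.
gpt4_turbo = indicators(tp=48, fp=5, tn=916, fn=0)
gpt35 = indicators(tp=48, fp=59, tn=862, fn=0)

for name, row in (("4 Turbo", gpt4_turbo), ("3.5", gpt35)):
    summary = ", ".join(f"{key}={value:.3f}" for key, value in row.items())
    print(f"{name}: {summary}")
```

Rounded to one decimal place for the percentages and two for kappa, these values should agree with the corresponding rows of the table. The large precision gap between the two models comes entirely from the false-positive counts (5 versus 59), since both have no false negatives.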