Table 3.
Results for 2 program runs for 4 model configurations on different days
| GPT model | First run (confusion matrix) | Second run (confusion matrix) | Changes |
|---|---|---|---|
| 4 Turbo | TP: 48, TN: 916, FP: 5, FN: 0 | TP: 48, TN: 917, FP: 4, FN: 0 | 1 |
| 4 Turbo W/criteria | TP: 48, TN: 915, FP: 6, FN: 0 | TP: 48, TN: 912, FP: 9, FN: 0 | 3 |
| 3.5 + 4 | TP: 47, TN: 910, FP: 11, FN: 1 | TP: 48, TN: 909, FP: 12, FN: 0 | 2 |
| 3.5 + 4 W/criteria | TP: 48, TN: 912, FP: 9, FN: 0 | TP: 48, TN: 913, FP: 8, FN: 0 | 1 |
| 3.5 | TP: 48, TN: 862, FP: 59, FN: 0 | TP: 48, TN: 862, FP: 59, FN: 0 | 0 |
| 3.5 W/criteria | TP: 48, TN: 890, FP: 31, FN: 0 | TP: 48, TN: 887, FP: 34, FN: 0 | 3 |
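Few classifications changed between the two runs: for every configuration the Changes column is at most 3 of the 969 classified items, and GPT-3.5 without criteria gave identical results on both days.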
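
The Changes column in Table 3 can be checked against the confusion-matrix counts. The sketch below assumes (the source does not state this) that a change means one item moving from one cell of the matrix to another between runs, so the net number of changes is half the sum of the absolute cell differences. This is a lower bound, since opposite moves cancel out. The `Confusion` class and the `net_changes` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one run (hypothetical helper type)."""
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def sensitivity(self) -> float:
        # Share of true positives that were found.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true negatives that were correctly rejected.
        return self.tn / (self.tn + self.fp)


def net_changes(first: Confusion, second: Confusion) -> int:
    """Net items that moved between cells (assumed meaning of 'Changes').

    Each moved item lowers one cell by 1 and raises another by 1,
    so the sum of absolute differences counts every move twice.
    """
    diffs = (
        abs(first.tp - second.tp),
        abs(first.tn - second.tn),
        abs(first.fp - second.fp),
        abs(first.fn - second.fn),
    )
    return sum(diffs) // 2


# Counts copied from Table 3: (first run, second run).
RUNS = {
    "4 Turbo": (Confusion(48, 916, 5, 0), Confusion(48, 917, 4, 0)),
    "4 Turbo W/criteria": (Confusion(48, 915, 6, 0), Confusion(48, 912, 9, 0)),
    "3.5 + 4": (Confusion(47, 910, 11, 1), Confusion(48, 909, 12, 0)),
    "3.5 + 4 W/criteria": (Confusion(48, 912, 9, 0), Confusion(48, 913, 8, 0)),
    "3.5": (Confusion(48, 862, 59, 0), Confusion(48, 862, 59, 0)),
    "3.5 W/criteria": (Confusion(48, 890, 31, 0), Confusion(48, 887, 34, 0)),
}

for name, (first, second) in RUNS.items():
    print(
        f"{name:<20} changes={net_changes(first, second)} "
        f"sensitivity={second.sensitivity():.3f} "
        f"specificity={second.specificity():.3f}"
    )
```

Under that assumption the computed values agree with the Changes column for all six rows, and every matrix sums to the same 969 items.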