Table 3.
Non-parametric statistical tests comparing terminal performance at DAP and post-DAP for curious model variants.
Mann-Whitney U test (n = 128, α = 0.025, Bonferroni corrected)

| Validation loss | | DAP (T = 30,000) | Post-DAP (T = 60,000) |
|---|---|---|---|
| C/B < C/PE | U statistic | 6558.0 | 5911.0 |
| | p-value | 0.0029 | 5.9E-5 |
| C/B < PG/IRS | U statistic | 6275.0 | 5062.0 |
| | p-value | 0.0006 | 6.4E-8 |
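For readers reproducing comparisons of this kind, a minimal pure-Python sketch of a one-sided Mann-Whitney U test follows. It assumes that a hypothesis such as "C/B < C/PE" means the C/B validation losses tend to be lower, and it uses the large-sample normal approximation with a 0.5 continuity correction but no tie correction; the exact settings behind Table 3 are not stated here, and the function name `mann_whitney_u_less` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u_less(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-sided Mann-Whitney U test of H1: values in x tend to be smaller than in y.

    Hypothetical helper. Returns (U1, p), where U1 is the U statistic of sample x
    and p is a normal-approximation lower-tail p-value with a 0.5 continuity
    correction (no tie correction is applied to the variance).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering each value's original position.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        # Tied values share the average of the 1-based ranks they span.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    r1 = sum(ranks[:n1])              # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2       # U statistic for sample x
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 + 0.5 - mu) / sigma       # continuity-corrected z for the lower tail
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF evaluated at z
    return u1, p
```

In use, `x` would hold the terminal validation losses of C/B and `y` those of C/PE or PG/IRS. Applying the same approximation directly to the tabulated U values, assuming 128 runs in each group (so U has null mean 128 × 128 / 2 = 8,192), gives p-values close to those reported (about 0.0029 for U = 6558), which suggests, but does not confirm, that the table used a comparable one-sided asymptotic test.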
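As Table 2 shows, the boredom variant (C/B) scored close to the other curious variants (C/PE and PG/IRS). Nevertheless, the one-sided tests in Table 3 show that C/B reached a significantly lower validation loss than both variants at DAP and post-DAP: every p-value (at most 0.0029) falls below the corrected threshold α = 0.025.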
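The significance claim can be checked mechanically against the tabulated p-values. The sketch below assumes a family-wise level of 0.05 split over two comparisons per checkpoint (C/B vs. C/PE and C/B vs. PG/IRS), which is one reading that yields the stated α = 0.025; the names `TABLE3_P` and `bonferroni_alpha` are illustrative, not from the original work.

```python
# Tabulated one-sided p-values from Table 3, keyed by (hypothesis, checkpoint).
TABLE3_P = {
    ("C/B < C/PE", "DAP (T = 30,000)"): 0.0029,
    ("C/B < C/PE", "Post-DAP (T = 60,000)"): 5.9e-5,
    ("C/B < PG/IRS", "DAP (T = 30,000)"): 0.0006,
    ("C/B < PG/IRS", "Post-DAP (T = 60,000)"): 6.4e-8,
}

FAMILY_ALPHA = 0.05  # assumed family-wise significance level
N_COMPARISONS = 2    # assumed: two comparisons per checkpoint


def bonferroni_alpha(family_alpha: float, m: int) -> float:
    """Per-test threshold that bounds the family-wise error rate by family_alpha."""
    return family_alpha / m


alpha = bonferroni_alpha(FAMILY_ALPHA, N_COMPARISONS)
# A hypothesis is accepted as significant when its p-value is below the corrected threshold.
decisions = {key: p < alpha for key, p in TABLE3_P.items()}

for (hypothesis, checkpoint), significant in decisions.items():
    print(f"{hypothesis} at {checkpoint}: p < {alpha}? {significant}")
```

All four decisions come out significant. Because the largest tabulated p-value is 0.0029, the conclusion would also survive a stricter correction over all four tests (α = 0.05 / 4 = 0.0125).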