Table 5.
J | Random-From-Train (Same CSMF) | Random Allocation | Random-From-Train (Resampled CSMF)
---|---|---|---
5 | 0.980 | 0.075 | 0.092 |
15 | 0.964 | 0.028 | 0.027 |
25 | 0.953 | 0.016 | 0.016 |
35 | 0.945 | 0.010 | 0.007 |
50 | 0.933 | 0.006 | −0.005 |
This table demonstrates the importance of resampling the CSMF distribution in the test set. If the test and train sets have the same CSMF distribution, then simple approaches like Random-From-Train, as well as state-of-the-art approaches like King-Lu [23], can appear to perform better than is justified, due to “overfitting”.
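The effect described in this caption can be illustrated with a small simulation. The sketch below is not the procedure used to produce Table 5; it makes several assumptions of its own: CSMFs are drawn from an uninformative Dirichlet distribution, performance is scored with plain (not chance-corrected) CSMF accuracy, and the function names `csmf_accuracy` and `simulate_random_from_train` are hypothetical. Its absolute values are therefore not expected to match the table. It only shows why Random-From-Train looks strong when the test set shares the training CSMF and much weaker once the test CSMF is resampled.

```python
import numpy as np


def csmf_accuracy(true_csmf, pred_csmf):
    """Plain CSMF accuracy: 1 - sum|true - pred| / (2 * (1 - min(true)))."""
    abs_error = np.abs(true_csmf - pred_csmf).sum()
    return 1.0 - abs_error / (2.0 * (1.0 - true_csmf.min()))


def simulate_random_from_train(J, n_train=2000, n_test=2000, n_reps=50, seed=0):
    """Compare Random-From-Train on same-CSMF vs. resampled-CSMF test sets.

    Assumption: CSMFs are drawn from Dirichlet(1, ..., 1) over J causes.
    Returns (mean accuracy with same CSMF, mean accuracy with resampled CSMF).
    """
    rng = np.random.default_rng(seed)
    same_scores, resampled_scores = [], []
    for _ in range(n_reps):
        # Training set with its own cause distribution.
        train_csmf = rng.dirichlet(np.ones(J))
        train_labels = rng.choice(J, size=n_train, p=train_csmf)

        # Random-From-Train: each prediction is the cause of a randomly
        # chosen training death, ignoring the test data entirely.
        pred_labels = rng.choice(train_labels, size=n_test)
        pred_csmf = np.bincount(pred_labels, minlength=J) / n_test

        # Test set drawn from the same CSMF as the training set.
        same_labels = rng.choice(J, size=n_test, p=train_csmf)
        same_csmf = np.bincount(same_labels, minlength=J) / n_test

        # Test set drawn from an independently resampled CSMF.
        test_csmf = rng.dirichlet(np.ones(J))
        resampled_labels = rng.choice(J, size=n_test, p=test_csmf)
        resampled_csmf = np.bincount(resampled_labels, minlength=J) / n_test

        same_scores.append(csmf_accuracy(same_csmf, pred_csmf))
        resampled_scores.append(csmf_accuracy(resampled_csmf, pred_csmf))
    return float(np.mean(same_scores)), float(np.mean(resampled_scores))


if __name__ == "__main__":
    for J in (5, 15, 25, 35, 50):
        same, resampled = simulate_random_from_train(J)
        print(f"J={J:2d}  same CSMF: {same:.3f}  resampled CSMF: {resampled:.3f}")
```

In this sketch the predictions never use any information about the test deaths, so any apparent accuracy in the same-CSMF setting comes entirely from the shared cause distribution.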