2015 Oct 12;13:28. doi: 10.1186/s12963-015-0061-1

Table 5.

Chance-corrected cause-specific mortality fraction (CCCSMF) accuracy of Random Allocation and Random-From-Train, with and without resampling the CSMF distribution of the test set.

J (causes)  Random-From-Train  Random       Random-From-Train
            (same CSMF)        Allocation   (resampled CSMF)
5           0.980              0.075        0.092
15          0.964              0.028        0.027
25          0.953              0.016        0.016
35          0.945              0.010        0.007
50          0.933              0.006        −0.005
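For reference, the metric in this table can be written out as below. This is a sketch that assumes the standard CSMF accuracy definition, with the chance baseline taken as 1 − 1/e ≈ 0.632, the expected CSMF accuracy of random allocation; the superscripts "true" and "pred" are notation introduced here.

```latex
\mathrm{CSMF\ accuracy} = 1 - \frac{\sum_{j=1}^{J} \left| \mathrm{CSMF}_j^{\mathrm{true}} - \mathrm{CSMF}_j^{\mathrm{pred}} \right|}{2\left(1 - \min_j \mathrm{CSMF}_j^{\mathrm{true}}\right)}

\mathrm{CCCSMF\ accuracy} = \frac{\mathrm{CSMF\ accuracy} - (1 - e^{-1})}{1 - (1 - e^{-1})} \approx \frac{\mathrm{CSMF\ accuracy} - 0.632}{1 - 0.632}
```

Under this scaling, a score of 0 corresponds to the random-allocation baseline and 1 to a perfect prediction, which is consistent with the Random Allocation column staying close to zero for every J.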

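This table demonstrates the importance of resampling the CSMF distribution of the test set. When the test and train sets share the same CSMF distribution, Random-From-Train reaches a CCCSMF accuracy between 0.933 and 0.980; once the test CSMF distribution is resampled, its accuracy falls to between −0.005 and 0.092, comparable to Random Allocation. Without resampling, simple approaches like Random-From-Train, as well as state-of-the-art approaches like King-Lu [23], can therefore appear to perform better than is justified, due to “overfitting” to the training CSMF distribution.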
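The effect described above can be reproduced in a small simulation. This is a hypothetical sketch, not the authors' validation code: it assumes CSMFs are drawn from a flat Dirichlet distribution, that Random-From-Train labels each test death with the cause of a randomly chosen training death, and that Random Allocation picks a cause uniformly at random. The function names `simulate` and `empirical_csmf` and the sample sizes are illustrative choices.

```python
import numpy as np

# Assumed chance baseline: expected CSMF accuracy of random allocation, 1 - 1/e (about 0.632).
RANDOM_BASELINE = 1 - np.exp(-1)


def csmf_accuracy(true_csmf, pred_csmf):
    """CSMF accuracy: 1 minus total absolute error, scaled by its maximum possible value."""
    true_csmf = np.asarray(true_csmf, dtype=float)
    pred_csmf = np.asarray(pred_csmf, dtype=float)
    abs_error = np.abs(true_csmf - pred_csmf).sum()
    return 1 - abs_error / (2 * (1 - true_csmf.min()))


def cccsmf_accuracy(true_csmf, pred_csmf):
    """Chance-corrected CSMF accuracy: 0 at the random baseline, 1 for a perfect prediction."""
    acc = csmf_accuracy(true_csmf, pred_csmf)
    return (acc - RANDOM_BASELINE) / (1 - RANDOM_BASELINE)


def empirical_csmf(causes, J):
    """Fraction of deaths assigned to each of the J causes (hypothetical helper)."""
    return np.bincount(causes, minlength=J) / len(causes)


def simulate(J, resample, n_train=1000, n_test=1000, reps=200, seed=0):
    """Mean CCCSMF accuracy of Random-From-Train and Random Allocation (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    rft_scores, ra_scores = [], []
    for _ in range(reps):
        # Assumption: CSMFs are drawn from a flat Dirichlet distribution.
        train_csmf = rng.dirichlet(np.ones(J))
        # Without resampling, the test set reuses the training CSMF distribution.
        test_csmf = rng.dirichlet(np.ones(J)) if resample else train_csmf
        train_causes = rng.choice(J, size=n_train, p=train_csmf)
        test_causes = rng.choice(J, size=n_test, p=test_csmf)
        true = empirical_csmf(test_causes, J)

        # Random-From-Train: copy the cause of a randomly chosen training death.
        rft_pred = rng.choice(train_causes, size=n_test)
        # Random Allocation: assign each death a cause uniformly at random.
        ra_pred = rng.integers(0, J, size=n_test)

        rft_scores.append(cccsmf_accuracy(true, empirical_csmf(rft_pred, J)))
        ra_scores.append(cccsmf_accuracy(true, empirical_csmf(ra_pred, J)))
    return float(np.mean(rft_scores)), float(np.mean(ra_scores))


for J in (5, 15, 25, 35, 50):
    same, _ = simulate(J, resample=False)
    resampled, random_alloc = simulate(J, resample=True)
    print(f"J={J:2d}  RFT same={same:.3f}  RA={random_alloc:.3f}  RFT resampled={resampled:.3f}")
```

The exact numbers depend on the assumed sample sizes and Dirichlet prior, so they should not be expected to match the table; the qualitative pattern (high scores for Random-From-Train only when the test CSMF distribution is not resampled) is what the sketch is meant to show.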