2015 Oct 12;13:28. doi: 10.1186/s12963-015-0061-1

Table 5.

CCCSMF accuracy of Random Allocation and Random-From-Train with and without resampling the test CSMF distribution.

J	Random-From-Train (Same CSMF)	Random Allocation	Random-From-Train (Resampled CSMF)
5	0.980	0.075	0.092
15	0.964	0.028	0.027
25	0.953	0.016	0.016
35	0.945	0.010	0.007
50	0.933	0.006	−0.005

This table demonstrates the importance of resampling the CSMF distribution in the test set. If the test and train sets have the same CSMF distribution, then simple approaches like Random-From-Train, as well as state-of-the-art approaches like King-Lu [23], can appear to perform better than is justified, because of “overfitting”.
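The pattern in Table 5 can be illustrated with a small simulation. The sketch below is not the authors' code and rests on several assumptions that do not appear in the table itself: the usual definition of CSMF accuracy (one minus the total absolute error, scaled by the worst possible error), a chance level of 1 − 1/e ≈ 0.632 for the correction, CSMFs drawn from an uninformative Dirichlet distribution, and hypothetical function names, sample sizes, and replicate counts.

```python
import math
import random

# Assumed chance level for CSMF accuracy (1 - 1/e, about 0.632).
CHANCE_LEVEL = 1.0 - math.exp(-1.0)


def csmf_from_labels(labels, n_causes):
    """Fraction of deaths assigned to each cause 0..n_causes-1."""
    counts = [0] * n_causes
    for cause in labels:
        counts[cause] += 1
    return [count / len(labels) for count in counts]


def csmf_accuracy(true_csmf, pred_csmf):
    """One minus total absolute error, scaled by the worst possible error.

    Undefined if a single cause accounts for every death (division by zero).
    """
    total_abs_error = sum(abs(t - p) for t, p in zip(true_csmf, pred_csmf))
    return 1.0 - total_abs_error / (2.0 * (1.0 - min(true_csmf)))


def cccsmf_accuracy(true_csmf, pred_csmf):
    """Chance-corrected CSMF accuracy under the assumed chance level."""
    accuracy = csmf_accuracy(true_csmf, pred_csmf)
    return (accuracy - CHANCE_LEVEL) / (1.0 - CHANCE_LEVEL)


def draw_csmf(n_causes, rng):
    """Draw a CSMF from Dirichlet(1, ..., 1) via normalized gamma variates."""
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_causes)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_cccsmf_random_from_train(n_causes, resample_test,
                                  n_reps=50, n_deaths=1000, seed=0):
    """Mean CCCSMF accuracy of Random-From-Train over simulated replicates.

    resample_test=False: the test set shares the training CSMF.
    resample_test=True: the test CSMF is an independent Dirichlet draw.
    """
    rng = random.Random(seed)
    causes = range(n_causes)
    scores = []
    for _ in range(n_reps):
        train_csmf = draw_csmf(n_causes, rng)
        test_csmf = draw_csmf(n_causes, rng) if resample_test else train_csmf
        train_labels = rng.choices(causes, weights=train_csmf, k=n_deaths)
        test_labels = rng.choices(causes, weights=test_csmf, k=n_deaths)
        # Random-From-Train: each prediction copies the label of a
        # randomly chosen training example.
        predictions = [rng.choice(train_labels) for _ in range(n_deaths)]
        scores.append(cccsmf_accuracy(
            csmf_from_labels(test_labels, n_causes),
            csmf_from_labels(predictions, n_causes),
        ))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for j in (5, 15, 25):
        same = mean_cccsmf_random_from_train(j, resample_test=False)
        resampled = mean_cccsmf_random_from_train(j, resample_test=True)
        print(f"J={j}: same CSMF {same:.3f}, resampled CSMF {resampled:.3f}")
```

With these small, arbitrary sample sizes the printed values should not be expected to match Table 5. The qualitative pattern is the point: Random-From-Train scores well when the test set shares the training CSMF and falls toward zero once the test CSMF is resampled.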