2013 Jul 8;8(7):e69328. doi: 10.1371/journal.pone.0069328

Figure 3. Traditional parametric tests are invalid for the assessment of statistical significance of classification performance.

A) Binomial test. B) Student’s t-test. Both panels show the percentage of 1000 experiments on null data that were declared significant at the 0.05 level. Because the data contained no signal, every one of these results is a false positive. The dashed lines mark the percentage of false positives expected from a valid statistical test (5%). Using multiple cross-validation (CV) sets led to large overestimates of significance. This analysis shows that neither the binomial test nor Student’s t-test can be used to determine the statistical significance of classifier performance.
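
The effect described in panel A can be illustrated with a small simulation. The sketch below is not the authors' code: the nearest-centroid classifier, the single Gaussian feature, the sample size, the fold count, and the number of experiments are all assumptions chosen to keep the example short. It generates null data, pools the correct predictions from one or several random CV splits, and applies a one-sided binomial test as if every prediction were an independent trial.

```python
import math
import random


def binomial_p_value(successes, trials):
    """One-sided P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(math.comb(trials, i) for i in range(successes, trials + 1))
    return tail / 2 ** trials


def cv_correct_count(x, y, n_folds, rng):
    """Correct predictions from one random k-fold split (nearest-centroid rule)."""
    order = list(range(len(x)))
    rng.shuffle(order)
    correct = 0
    for fold in range(n_folds):
        test = order[fold::n_folds]
        held_out = set(test)
        sums, counts = [0.0, 0.0], [0, 0]
        for i in order:
            if i not in held_out:
                sums[y[i]] += x[i]
                counts[y[i]] += 1
        # A class missing from the training fold gets an infinitely distant centroid.
        centroids = [sums[c] / counts[c] if counts[c] else math.inf for c in (0, 1)]
        for i in test:
            closer_to_0 = abs(x[i] - centroids[0]) <= abs(x[i] - centroids[1])
            predicted = 0 if closer_to_0 else 1
            correct += predicted == y[i]
    return correct


def false_positive_rate(n_repeats, n_experiments=300, n_samples=40,
                        n_folds=5, alpha=0.05, seed=0):
    """Fraction of null experiments the binomial test calls significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Null data: labels are drawn independently of the feature.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        y = [rng.randrange(2) for _ in range(n_samples)]
        # Pool predictions over repeated CV splits of the SAME data set.
        correct = sum(cv_correct_count(x, y, n_folds, rng) for _ in range(n_repeats))
        # The test wrongly treats all pooled predictions as independent trials.
        if binomial_p_value(correct, n_samples * n_repeats) < alpha:
            hits += 1
    return hits / n_experiments


single = false_positive_rate(n_repeats=1)
pooled = false_positive_rate(n_repeats=10)
print(f"1 CV set:   {single:.1%} of null experiments significant")
print(f"10 CV sets: {pooled:.1%} of null experiments significant")
```

Under these assumptions, pooling ten CV sets should push the false-positive rate well above the nominal 5%, because predictions for the same samples are strongly correlated across splits while the test counts them as independent evidence.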