Table 4. A comparison of IA and κ in a machine learning domain
| Data set | Measure | FK | FS | FB | KS | KB | SB | ρ | rs |
|---|---|---|---|---|---|---|---|---|---|
| DS0 | IA | 0.28 (5) | 0.42 (2) | 0.30 (3) | 0.28 (4) | 0.56 (1) | 0.19 (6) | 0.98 | 0.77 |
|  | κ | 0.46 (3) | 0.63 (2) | 0.43 (5) | 0.44 (4) | 0.72 (1) | 0.31 (6) |  |  |
| DS1 | IA | 0.54 (2) | 0.33 (6) | 0.71 (1) | 0.36 (5) | 0.53 (3) | 0.39 (4) | 0.92 | 0.83 |
|  | κ | 0.73 (2) | 0.56 (5) | 0.77 (1) | 0.58 (4) | 0.63 (3) | 0.54 (6) |  |  |
| DS2 | IA | 0.63 (1) | 0.52 (2) | 0.22 (4) | 0.47 (3) | 0.17 (5) | 0.14 (6) | 0.98 | 0.94 |
|  | κ | 0.79 (1) | 0.64 (2) | 0.37 (4) | 0.57 (3) | 0.30 (6) | 0.35 (5) |  |  |
| DS3 | IA | 0.14 (4) | 0.33 (1) | 0.28 (2) | 0.08 (6) | 0.10 (5) | 0.19 (3) | 0.93 | 0.94 |
|  | κ | 0.21 (5) | 0.51 (1) | 0.41 (2) | 0.20 (6) | 0.28 (4) | 0.41 (3) |  |  |
| DS4 | IA | 0.11 (3) | 0.25 (1) | 0.18 (2) | 0.04 (6) | 0.11 (4) | 0.04 (5) | 0.61 | 0.60 |
|  | κ | 0.15 (4) | 0.29 (2) | 0.17 (3) | 0.08 (5) | 0.36 (1) | 0.03 (6) |  |  |
| DS5 | IA | 0.05 (6) | 0.28 (3) | 0.43 (1) | 0.06 (5) | 0.06 (4) | 0.34 (2) | 0.99 | 0.77 |
|  | κ | 0.23 (4) | 0.55 (3) | 0.67 (1) | 0.21 (5) | 0.21 (6) | 0.62 (2) |  |  |
Six data sets from the UCI Machine Learning Repository [19] were considered: the Congressional Voting Records Data Set (DS0) [39], the Breast Cancer Wisconsin (Diagnostic) Data Set (DS1) [52], the Iris Data Set (DS2) [21], the Spambase Data Set (DS3) [27], the Tic-Tac-Toe Endgame Data Set (DS4) [3], and the Heart Disease Data Set (DS5) [28]. Each data set was used to train a random forest, a k-nearest neighbours (kNN), a stochastic gradient descent (SGD), and a naïve Bayes model. The models were then compared in pairs, namely random forest-kNN (FK), random forest-SGD (FS), random forest-naïve Bayes (FB), kNN-SGD (KS), kNN-naïve Bayes (KB), and SGD-naïve Bayes (SB), according to which data set entries each of them classified correctly, and IA and κ were evaluated for every pair. Finally, for each data set, Spearman's rank correlation coefficient (rs) [48] between the sequence of IA values and the sequence of κ values was computed. All reported values are rounded to two decimal places. The numbers in parentheses give the rank of the associated value among the six pair values in the same row, with rank 1 denoting the largest.
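The pairwise comparison and the final rank correlation can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: κ is assumed to be Cohen's kappa computed on 0/1 vectors that record whether each model classified each entry correctly; IA is not reproduced because its definition is not given in this passage; and the helper names (`correctness`, `cohen_kappa`, `pairwise_kappa`, `spearman_from_ranks`) are hypothetical. The Spearman coefficient uses the tie-free form $r_s = 1 - 6\sum d_i^2 / (n(n^2-1))$, which applies here because every row of Table 4 ranks its six values from 1 to 6 without ties.

```python
from itertools import combinations


def correctness(y_true, y_pred):
    """0/1 vector: 1 where the model classified the entry correctly."""
    return [int(t == p) for t, p in zip(y_true, y_pred)]


def cohen_kappa(a, b):
    """Cohen's kappa between two binary correctness vectors (assumed meaning of κ)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each model's rate of correct (1) and wrong (0) answers.
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def pairwise_kappa(y_true, predictions):
    """Kappa for every pair of models, keyed by concatenated model codes, e.g. "FK"."""
    hits = {m: correctness(y_true, p) for m, p in predictions.items()}
    # combinations() keeps insertion order, so codes F, K, S, B give
    # FK, FS, FB, KS, KB, SB: the column order of Table 4.
    return {a + b: cohen_kappa(hits[a], hits[b]) for a, b in combinations(hits, 2)}


def spearman_from_ranks(r1, r2):
    """Spearman's r_s for two tie-free rankings of the same n items."""
    n = len(r1)
    d2 = sum((x - y) ** 2 for x, y in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Ranks of IA and κ for DS0, copied from Table 4 (order: FK, FS, FB, KS, KB, SB).
ds0_ia_ranks = [5, 2, 3, 4, 1, 6]
ds0_kappa_ranks = [3, 2, 5, 4, 1, 6]
print(round(spearman_from_ranks(ds0_ia_ranks, ds0_kappa_ranks), 2))  # 0.77
```

For DS0 the rank differences are 2, 0, −2, 0, 0, 0, so $\sum d_i^2 = 8$ and $r_s = 1 - 48/210 \approx 0.77$, matching the reported value. Working from the ranks rather than from the printed values avoids the ties that rounding introduces (e.g. the two 0.28 entries in the DS0 IA row).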