Table 6.
U test statistics and p values calculated for the differences in model performance and explanation separability between the baseline and explanation ensemble models; a two-sided test was used.
| Dataset (Task) | Model performance: U statistic | Model performance: p value | Explanation consistency: U statistic | Explanation consistency: p value |
|---|---|---|---|---|
| BCW | 75 | 0.00249292 | 774 | 0.009378 |
| KAIMRC (Regression) | 81 | 0.00040946 | 6475 | |
| KAIMRC (Classification) | 51 | 0.04988344 | 3066 | |
| Codon usage (DNA) | 81 | 0.00039825 | 5606.5 | |
| Codon usage (Kingdom) | 0 | 0.00018267 | 11205 | |
| MIMIC-IV | 72 | 0.10397974 | 1350 | |
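
As an illustration of how a two-sided U statistic and p value of this kind can be obtained, the following is a minimal sketch. It assumes that the U test is the Mann-Whitney U test and that the p value comes from the normal approximation with tie and continuity corrections; the source states neither. The function name `mann_whitney_u` and the score lists are hypothetical and are not the study's data.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p), where U counts the pairs (x_i, y_j) with x_i > y_j
    and each tied pair contributes 0.5.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: t is the size of each group of equal values in the pooled sample.
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        # All observations are identical, so there is no evidence of a difference.
        return u, 1.0

    # Continuity correction of 0.5, clipped so that z is never negative.
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u, p


# Hypothetical per-run scores for two models (not taken from the paper).
baseline = [0.81, 0.79, 0.80, 0.78, 0.82, 0.80, 0.77, 0.79, 0.81, 0.78]
ensemble = [0.86, 0.88, 0.85, 0.87, 0.89, 0.86, 0.84, 0.88, 0.87, 0.85]

u_stat, p_value = mann_whitney_u(baseline, ensemble)
print(f"U = {u_stat:g}, p = {p_value:.6g}")
```

In this hypothetical example every ensemble score exceeds every baseline score, so U is 0, the most extreme value possible for these sample sizes. For small samples without ties an exact p value can differ noticeably from the normal approximation, so the approximation used here is a design choice of the sketch rather than a requirement of the test.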