Table 3.
Area under the curve (AUC) of each interpretability method's ROAR degradation curve, per model, measured for two prediction metrics (AUPRC and AUROC). Lower AUC indicates a more rapid drop in prediction performance and therefore a better feature-importance interpretation. × marks method–model combinations for which no result is reported.
| Interpreters | AutoInt AUPRC | AutoInt AUROC | LSTM AUPRC | LSTM AUROC | TCN AUPRC | TCN AUROC | Transformer AUPRC | Transformer AUROC | IMVLSTM AUPRC | IMVLSTM AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
Random | 0.401 | 0.842 | 0.615 | 0.909 | 0.605 | 0.901 | 0.662 | 0.918 | 0.669 | 0.915 |
Glassbox | × | × | × | × | × | × | × | × | 0.533 | 0.892 |
Saliency | × | × | 0.558 | 0.898 | 0.587 | 0.893 | 0.616 | 0.909 | 0.566 | 0.884 |
IntegratedGradients | × | × | 0.586 | 0.899 | 0.593 | 0.899 | 0.588 | 0.903 | 0.465 | 0.863 |
DeepLift | × | × | 0.575 | 0.900 | 0.598 | 0.898 | 0.594 | 0.905 | 0.542 | 0.883 |
GradientShap | × | × | 0.561 | 0.893 | 0.592 | 0.899 | 0.600 | 0.904 | 0.470 | 0.858 |
DeepLiftShap | × | × | 0.569 | 0.897 | 0.607 | 0.901 | 0.619 | 0.909 | 0.554 | 0.887 |
SaliencyNoiseTunnel | × | × | 0.551 | 0.892 | 0.581 | 0.896 | 0.578 | 0.899 | 0.475 | 0.851 |
ShapleySampling | 0.456 | 0.866 | 0.628 | 0.910 | 0.613 | 0.898 | 0.655 | 0.916 | 0.668 | 0.917 |
FeaturePermutation | 0.454 | 0.866 | 0.624 | 0.910 | 0.616 | 0.903 | 0.655 | 0.917 | 0.677 | 0.918 |
FeatureAblation | 0.279 | 0.733 | 0.438 | 0.811 | 0.479 | 0.824 | 0.425 | 0.792 | 0.408 | 0.830 |
Occlusion | 0.456 | 0.866 | 0.617 | 0.909 | 0.609 | 0.898 | 0.653 | 0.917 | 0.684 | 0.920 |
ArchDetect | 0.251 | 0.696 | 0.369 | 0.774 | 0.446 | 0.818 | 0.379 | 0.784 | 0.382 | 0.805 |
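Each AUC above summarizes a ROAR degradation curve: the prediction metric (AUPRC or AUROC) of a retrained model as increasing fractions of the top-ranked features are removed. A minimal sketch of computing such an AUC with the trapezoidal rule, assuming a hypothetical removal schedule and retrained-model scores (the exact ROAR fractions and any normalization used in the paper are not specified here):

```python
import numpy as np

def roar_auc(fractions, scores):
    """Area under a ROAR degradation curve (trapezoidal rule).

    fractions: fraction of top-ranked features removed at each step.
    scores: prediction metric (e.g. AUROC) of the model retrained at
            each removal fraction.
    A lower AUC means performance drops faster when the method's
    top-ranked features are removed, i.e. a better attribution.
    """
    f = np.asarray(fractions, dtype=float)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(f)          # integrate over increasing fractions
    return float(np.trapz(s[order], f[order]))

# hypothetical degradation curve for one interpreter/model pair
fracs = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
aurocs = [0.95, 0.90, 0.82, 0.75, 0.68, 0.60]
print(roar_auc(fracs, aurocs))
```

A steeper curve (scores falling quickly) yields a smaller area, which is why lower values in the table indicate better interpreters.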