Table 3.
*Modality: Tabular (all three datasets).*

| Method ↓ | Trees | Depth | IHDP (In-sample) | IHDP (Out-of-sample) | TCGA (In-sample) | TCGA (Out-of-sample) | IHDP-C (AMSE) |
|---|---|---|---|---|---|---|---|
| Decision Tree | - | 6 | 0.693±0.028 | 0.613±0.045 | 0.200±0.012 | 0.202±0.012 | 21.773±0.190 |
| Decision Tree | - | 100 | 0.638±0.031 | 0.549±0.052 | 0.441±0.004 | 0.445±0.004 | 23.382±0.342 |
| Random Forest | 1 | 6 | 0.801±0.039 | 0.666±0.055 | 19.214±0.163 | 19.195±0.163 | 21.576±0.185 |
| Random Forest | 1 | 100 | 0.734±0.041 | 0.653±0.056 | 0.536±0.011 | 0.538±0.012 | 33.285±0.940 |
| Random Forest | 10 | 100 | 0.684±0.033 | 0.676±0.034 | 0.536±0.011 | 0.538±0.012 | 38.299±0.841 |
| NAM | - | - | 0.260±0.031 | 0.250±0.032 | - | - | 24.706±0.756 |
| ENRL | 1 | 6 | 4.104±1.060 | 3.759±0.087 | 10.938±2.019 | 10.942±2.019 | 24.720±0.985 |
| ENRL | 1 | 100 | 4.094±0.032 | 4.099±0.107 | 10.938±2.019 | 10.942±2.019 | 24.900±0.470 |
| Causal Forest | 1 | 6 | 0.144±0.019 | 0.275±0.035 | - | - | - |
| Causal Forest | 1 | 100 | 0.151±0.019 | 0.278±0.033 | - | - | - |
| Causal Forest | 100 | max | 0.124±0.015 | 0.230±0.031 | - | - | - |
| BART | 1 | - | 1.335±0.159 | 1.132±0.125 | 230.74±0.312 | 236.81±0.531 | 12.063±0.410 |
| BART | N | - | 0.232±0.039 | 0.284±0.036 | - | - | 4.323±0.342 |
| DISCRET (ours) | - | 6 | 0.089±0.040 | 0.150±0.034 | 0.076±0.019 | 0.098±0.007 | 0.801±0.165 |
| TransTEE + DISCRET (ours)* | - | - | 0.082±0.009 | 0.120±0.014 | 0.058±0.010 | 0.055±0.009 | 0.102±0.007 |
We bold the smallest estimation error for each dataset and underline the second smallest. Results in the first row for each method are duplicated from Table 2. For BART, we set the number of trees N differently for IHDP than for TCGA and IHDP-C, due to the large number of features in the latter two. We show that DISCRET outperforms self-interpretable models and has simpler rules regardless of the model complexity used. An asterisk (*) indicates that the model is not self-interpretable.
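Each cell above reports an aggregate of per-run estimation errors in "mean±std" form. As a minimal sketch of that convention (not the authors' evaluation code), the following helper assumes the ± denotes the sample standard deviation across random seeds; if the paper instead reports a standard error, the `std` line would divide by the square root of the number of runs.

```python
import numpy as np

def summarize_errors(errors):
    """Format per-seed estimation errors as a 'mean±std' table entry.

    `errors` is a hypothetical sequence of per-seed error values
    (e.g. out-of-sample errors on IHDP across repeated runs).
    """
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    # Sample standard deviation across seeds (ddof=1), matching the
    # assumed 'mean±std' convention of the table entries.
    std = errors.std(ddof=1)
    return f"{mean:.3f}±{std:.3f}"
```

For example, `summarize_errors([0.1, 0.2, 0.3])` yields `"0.200±0.100"`.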