Author manuscript; available in PMC: 2024 Aug 28.
Published in final edited form as: Proc Mach Learn Res. 2024 Jul;235:53597–53618.

Table 3.

ITE estimation errors (lower is better) at varying complexities for self-interpretable models.

All three datasets (IHDP, TCGA, IHDP-C) are tabular.

| Method | Trees | Depth | IHDP ϵATE (In-sample) | IHDP ϵATE (Out-of-sample) | TCGA ϵATE (In-sample) | TCGA ϵATE (Out-of-sample) | IHDP-C AMSE |
|---|---|---|---|---|---|---|---|
| Decision Tree | – | 6 | 0.693±0.028 | 0.613±0.045 | 0.200±0.012 | 0.202±0.012 | 21.773±0.190 |
| Decision Tree | – | 100 | 0.638±0.031 | 0.549±0.052 | 0.441±0.004 | 0.445±0.004 | 23.382±0.342 |
| Random Forest | 1 | 6 | 0.801±0.039 | 0.666±0.055 | 19.214±0.163 | 19.195±0.163 | 21.576±0.185 |
| Random Forest | 1 | 100 | 0.734±0.041 | 0.653±0.056 | 0.536±0.011 | 0.538±0.012 | 33.285±0.940 |
| Random Forest | 10 | 100 | 0.684±0.033 | 0.676±0.034 | 0.536±0.011 | 0.538±0.012 | 38.299±0.841 |
| NAM | – | – | 0.260±0.031 | 0.250±0.032 | – | – | 24.706±0.756 |
| ENRL | 1 | 6 | 4.104±1.060 | 3.759±0.087 | 10.938±2.019 | 10.942±2.019 | 24.720±0.985 |
| ENRL | 1 | 100 | 4.094±0.032 | 4.099±0.107 | 10.938±2.019 | 10.942±2.019 | 24.900±0.470 |
| Causal Forest | 1 | 6 | 0.144±0.019 | 0.275±0.035 | – | – | – |
| Causal Forest | 1 | 100 | 0.151±0.019 | 0.278±0.033 | – | – | – |
| Causal Forest | 100 | max | 0.124±0.015 | 0.230±0.031 | – | – | – |
| BART | 1 | – | 1.335±0.159 | 1.132±0.125 | 230.74±0.312 | 236.81±0.531 | 12.063±0.410 |
| BART | N | – | 0.232±0.039 | 0.284±0.036 | – | – | 4.323±0.342 |
| DISCRET (ours) | – | 6 | 0.089±0.040 | 0.150±0.034 | 0.076±0.019 | 0.098±0.007 | 0.801±0.165 |
| TransTEE + DISCRET (ours)* | – | – | 0.082±0.009 | 0.120±0.014 | 0.058±0.010 | 0.055±0.009 | 0.102±0.007 |

We bold the smallest estimation error for each dataset and underline the second smallest. Results in the first row for each method are duplicated from Table 2. For BART, we set N=200 for IHDP, and N=10 for TCGA and IHDP-C due to the large number of features in the latter two datasets. We show that DISCRET outperforms self-interpretable models and yields simpler rules regardless of the model complexity used. The asterisk (*) indicates that the model is not self-interpretable.
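For reference, the two metrics reported above can be computed as follows. This is a generic sketch, not the paper's implementation: the function names `epsilon_ate` and `amse` are illustrative, and it assumes the standard definitions, i.e. ϵATE is the absolute difference between the true and estimated average treatment effects, and AMSE is the mean squared error of the estimated effect averaged over samples (and, for continuous treatments as in TCGA/IHDP-C, over evaluated dosages).

```python
import numpy as np

def epsilon_ate(ite_true, ite_pred):
    # eps_ATE = |mean(true ITE) - mean(estimated ITE)|,
    # i.e. the absolute error of the average treatment effect.
    return float(abs(np.mean(ite_true) - np.mean(ite_pred)))

def amse(effect_true, effect_pred):
    # Mean squared error of the estimated effect surface, averaged
    # over all entries (samples, and dosages if the arrays are 2-D).
    effect_true = np.asarray(effect_true, dtype=float)
    effect_pred = np.asarray(effect_pred, dtype=float)
    return float(np.mean((effect_true - effect_pred) ** 2))
```

Lower is better for both metrics, which is why the DISCRET rows dominate the table.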