Table 9.
Coverage of out-of-sample 75% prediction intervals and average interval width for BART-BMA, RF using conformal prediction (CP), bartMachine, and dbarts for the Friedman example. Perfect calibration corresponds to 75% coverage; hence the model with the lowest average interval width and a coverage as close to 75% as possible is most desirable. Items in bold refer to the best calibrated model with respect to interval coverage and interval width for each simulated dataset.
p | Coverage: BART-BMA | Coverage: RF CP intervals | Coverage: bartMachine | Coverage: dbarts default | Coverage: dbarts best | Avg. width: BART-BMA | Avg. width: RF CP intervals | Avg. width: bartMachine | Avg. width: dbarts default | Avg. width: dbarts best |
---|---|---|---|---|---|---|---|---|---|---|
100 | 75.8% | 75% | 77.2% | 70.4% | 61.2% | 6.86 | 6.84 | 4.45 | 4.08 | 2.69 |
1000 | 74.0% | 79% | 79.0% | 59.4% | 62.6% | 6.89 | 7.89 | 6.16 | 4.44 | 3.63 |
5000 | 72.6% | 79% | 87.0% | 61.0% | 64.8% | 6.84 | 8.48 | 9.74 | 5.97 | 4.34 |
10000 | 73.4% | 78% | 76.8% | 68.6% | 64.2% | 6.84 | 8.62 | 9.89 | 7.10 | 5.19 |
15000 | 73.4% | 78% | 75.2% | 69.0% | 67.0% | 6.84 | 8.47 | 10.73 | 7.91 | 5.73 |
100000 | 71.8% | 79% | - | 59.0% | 73.4% | 6.91 | 9.30 | - | 9.21 | 8.88 |
500000 | 70.2% | - | - | 56.6% | - | 6.88 | - | - | 9.14 | - |
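
The two quantities reported in Table 9 can be computed from any set of out-of-sample prediction intervals. The following is a minimal sketch rather than the code used to produce the table: the function name `interval_summary` and the toy numbers are hypothetical, and it assumes that coverage is the fraction of held-out responses falling inside their (closed) interval and that width is the mean of upper minus lower bound.

```python
def interval_summary(y_true, lower, upper):
    """Return (empirical coverage, average interval width).

    Assumption: an observation counts as covered when
    lower <= y <= upper, i.e. the interval is closed.
    """
    if not (len(y_true) == len(lower) == len(upper)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    # Count held-out responses that fall inside their interval.
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    # Average distance between the upper and lower bounds.
    avg_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, avg_width


# Toy, made-up example: 3 of 4 responses lie inside their intervals.
y = [1.0, 2.0, 3.0, 4.0]
lo = [0.0, 0.0, 4.0, 3.0]
hi = [2.0, 3.0, 5.0, 5.0]
coverage, width = interval_summary(y, lo, hi)
print(f"coverage = {coverage:.1%}, average width = {width:.2f}")
```

For a nominal 75% interval, a coverage well below 75% indicates intervals that are too narrow, while a coverage well above 75% indicates intervals that are wider than necessary.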