Table 2.
Performance comparison of the Hosmer-Lemeshow (HL) test and our new calibration test from first principles (FP) when the null hypothesis that the model is correct is false. The degree of incorrectness is given by the angle between the correct and incorrect β vectors (left portion) or by the increase in slope (right portion) of the logistic regression models. AUC denotes the area under the ROC curve of the incorrect model. The p-value is for the comparison of HL vs. FP type II errors (* marks the test with significantly lower type II error at α = 0.05).
measure | angle 10° | angle 20° | angle 30° | slope increase 10% | slope increase 20% | slope increase 30% |
---|---|---|---|---|---|---|
dim = 5: | ||||||
average AUC | 0.855 | 0.837 | 0.808 | 0.862 | 0.862 | 0.862 |
HL type II error | 83.2%* | 48.1% | 2.8% | 67.8% | 28.4% | 4.9% |
FP type II error | 87.5% | 41.7%* | 2.4% | 69.0% | 27.4% | 5.2% |
p-value | 0.0065 | 0.004 | 0.574 | 0.5638 | 0.6181 | 0.7593 |
| ||||||
dim = 10: | ||||||
average AUC | 0.856 | 0.836 | 0.807 | 0.862 | 0.862 | 0.862 |
HL type II error | 87.5% | 44.7% | 2.2% | 67.1% | 26.8% | 4.9% |
FP type II error | 87.7% | 39.2%* | 1.7% | 68.1% | 26.2% | 5.4% |
p-value | 0.892 | 0.0127 | 0.4188 | 0.6328 | 0.7611 | 0.613 |
| ||||||
dim = 20: | ||||||
average AUC | 0.855 | 0.835 | 0.805 | 0.861 | 0.861 | 0.861 |
HL type II error | 84.8% | 44.4% | 1.8% | 68.0% | 25.8% | 5.1% |
FP type II error | 86.6% | 38.4%* | 1.9% | 67.2% | 27.5% | 6.6% |
p-value | 0.2503 | 0.0065 | 0.868 | 0.7023 | 0.3899 | 0.1530 |
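
The two kinds of misspecification in the table can be illustrated with a minimal sketch. It is not the simulation code behind these results. It assumes that an "incorrect" model at a given angle is obtained by rotating the correct β vector by that angle while keeping its length, and that an "increase in slope" means multiplying β by a constant factor. The second assumption would be consistent with the AUC staying constant across the slope columns, since rescaling β does not change the ranking of predictions. The function names `rotate_by_angle`, `scale_slope`, and `angle_between` are hypothetical.

```python
import math
import random


def rotate_by_angle(beta, angle_deg, rng):
    """Return a vector with the same norm as beta, at angle_deg from it.

    Assumption: the rotation direction is random; the source does not
    say how the incorrect beta vectors were chosen.
    """
    norm = math.sqrt(sum(b * b for b in beta))
    unit = [b / norm for b in beta]
    # Draw a random direction and remove its component along beta
    # (one Gram-Schmidt step), then normalise it.
    v = [rng.gauss(0.0, 1.0) for _ in beta]
    proj = sum(vi * ui for vi, ui in zip(v, unit))
    w = [vi - proj * ui for vi, ui in zip(v, unit)]
    w_norm = math.sqrt(sum(x * x for x in w))
    w = [x / w_norm for x in w]
    theta = math.radians(angle_deg)
    return [norm * (math.cos(theta) * ui + math.sin(theta) * wi)
            for ui, wi in zip(unit, w)]


def scale_slope(beta, increase_pct):
    """Multiply beta by (1 + increase_pct / 100); assumed meaning of 'slope'."""
    factor = 1.0 + increase_pct / 100.0
    return [b * factor for b in beta]


def angle_between(a, b):
    """Angle in degrees between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (na * nb)))
    return math.degrees(math.acos(cos))


rng = random.Random(0)
beta_true = [1.0, -0.5, 0.25, 2.0, 0.75]   # illustrative values, dim = 5
beta_rotated = rotate_by_angle(beta_true, 20.0, rng)
beta_steeper = scale_slope(beta_true, 20.0)
print(round(angle_between(beta_true, beta_rotated), 6))
```

Under these assumptions, the rotated vector changes which covariates drive the prediction (so discrimination drops as the angle grows), while the rescaled vector keeps the ordering of predicted risks and only makes the predicted probabilities more extreme.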