2012 Nov 3;2012:164–169.

Table 1.

Performance comparison of the Hosmer-Lemeshow (HL) test and our new calibration test derived from first principles (FP). In every experiment the null hypothesis (that the model is correct) is true, so each rejection is a type I error. AUC denotes the area under the ROC curve of the correct model. The p-value is for the comparison of the HL and FP type I error rates (^* denotes the test with the significantly lower type I error at the α = 0.05 level).

	data dimension
	5	10	20
average AUC	0.862	0.862	0.861
HL type I error	11.8%	12.1%	13.1%
FP type I error	5%^*	5%^*	5.2%^*
p-value	< 0.001	< 0.001	< 0.001
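
To make the notion of a type I error rate concrete, the following is a minimal sketch of how such a rate could be estimated by simulation for an HL-style test. It is not the authors' experimental setup and it does not implement the FP test. The data-generating model, the sample size, the equal-size risk groups, and the use of as many degrees of freedom as groups (since no parameters are fitted here) are all assumptions made for illustration, as are the function names `chi2_sf_even`, `hosmer_lemeshow`, and `estimate_type1_error`. The numbers it produces should not be expected to match the table.

```python
import math
import random


def chi2_sf_even(x, df):
    """Chi-square survival function P(X > x), closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, p-value) for an HL-style test on given probabilities.

    Assumption: the probabilities are not fitted on these outcomes, so the
    reference distribution is taken to be chi-square with `groups` df.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for k in range(groups):
        # Equal-size groups of increasing predicted risk.
        idx = order[k * n // groups:(k + 1) * n // groups]
        if not idx:
            continue
        expected = sum(probs[i] for i in idx)
        observed = sum(outcomes[i] for i in idx)
        mean_p = expected / len(idx)
        variance = len(idx) * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, chi2_sf_even(stat, groups)


def estimate_type1_error(n_sims=200, n=500, dim=5, alpha=0.05, seed=0):
    """Fraction of simulated data sets on which a correct model is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        probs, outcomes = [], []
        for _ in range(n):
            # Hypothetical true model: logistic in the sum of `dim` Gaussian features.
            score = sum(rng.gauss(0.0, 1.0) for _ in range(dim)) / math.sqrt(dim)
            p = 1.0 / (1.0 + math.exp(-2.0 * score))
            probs.append(p)
            # Outcomes are drawn from the true probabilities, so the null holds.
            outcomes.append(1 if rng.random() < p else 0)
        _, p_value = hosmer_lemeshow(probs, outcomes)
        rejections += p_value < alpha
    return rejections / n_sims


if __name__ == "__main__":
    for dim in (5, 10, 20):
        print(dim, estimate_type1_error(dim=dim))
```

A well-behaved test should reject a correct model in roughly an α fraction of the simulated data sets; a rate well above α, as reported for HL in the table, indicates an inflated type I error.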