2012 Nov 3;2012:164–169.

Table 2.

Performance comparison of the Hosmer-Lemeshow (HL) test and our new calibration test from first principles (FP). In every setting the null hypothesis that the model is correct is false; the degree of incorrectness is given by the angle between the correct and incorrect β vectors (left portion) or by the increase in slope (right portion) of the logistic regression models. AUC denotes the area under the ROC curve of the incorrect model. The p-value is for the comparison of HL vs. FP type II errors (^* denotes the test with significantly lower type II error at α level 0.05).

	angle of incorrect model			increase in slope
	10°	20°	30°	10%	20%	30%
dim = 5:
average AUC	0.855	0.837	0.808	0.862	0.862	0.862
HL type II error	83.2%^*	48.1%	2.8%	67.8%	28.4%	4.9%
FP type II error	87.5%	41.7%^*	2.4%	69.0%	27.4%	5.2%
p-value	0.0065	0.004	0.574	0.5638	0.6181	0.7593

dim = 10:
average AUC	0.856	0.836	0.807	0.862	0.862	0.862
HL type II error	87.5%	44.7%	2.2%	67.1%	26.8%	4.9%
FP type II error	87.7%	39.2%^*	1.7%	68.1%	26.2%	5.4%
p-value	0.892	0.0127	0.4188	0.6328	0.7611	0.613

dim = 20:
average AUC	0.855	0.835	0.805	0.861	0.861	0.861
HL type II error	84.8%	44.4%	1.8%	68.0%	25.8%	5.1%
FP type II error	86.6%	38.4%^*	1.9%	67.2%	27.5%	6.6%
p-value	0.2503	0.0065	0.868	0.7023	0.3899	0.1530
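
The two kinds of misspecification named in the caption can be illustrated with a minimal sketch. It rests on assumptions not stated in the table: that the "angle" is the ordinary Euclidean angle between coefficient vectors of equal norm, and that an "increase in slope" means multiplying the coefficient vector by a constant factor. The function names `rotate_by_angle`, `increase_slope`, and `angle_between` are hypothetical and are not taken from the paper.

```python
import numpy as np

def rotate_by_angle(beta, degrees, rng):
    """Return a vector with the same norm as beta at the given angle from it.

    Assumption: the rotation direction is random; the source does not say
    how the incorrect vectors were chosen.
    """
    beta = np.asarray(beta, dtype=float)
    unit = beta / np.linalg.norm(beta)
    # Draw a random direction and remove its component along beta.
    r = rng.standard_normal(beta.shape)
    ortho = r - (r @ unit) * unit
    ortho /= np.linalg.norm(ortho)
    theta = np.deg2rad(degrees)
    return np.linalg.norm(beta) * (np.cos(theta) * unit + np.sin(theta) * ortho)

def increase_slope(beta, percent):
    """Scale beta by (1 + percent/100); assumed meaning of 'increase in slope'."""
    return (1.0 + percent / 100.0) * np.asarray(beta, dtype=float)

def angle_between(u, v):
    """Angle between two vectors, in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.rad2deg(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(0)
beta = rng.standard_normal(5)             # stand-in for a "correct" model, dim = 5
beta_rot = rotate_by_angle(beta, 20, rng)  # misspecified by a 20 degree angle
beta_steep = increase_slope(beta, 10)      # misspecified by a 10% steeper slope
```

Under this reading, scaling β by a positive constant does not change the ranking of the linear predictor, so the AUC stays the same. That matches the constant AUC values in the right portion of the table, whereas rotating β changes the ranking and lowers the AUC as the angle grows.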