2020 Feb 27;27(4):621–633. doi: 10.1093/jamia/ocz228

Table 2.

Discrimination and calibration results of the LR and SVM models applied to the test set

	LR	LR Platt scaling	LR isotonic regression	LR BBQ	SVM	SVM Platt scaling	SVM isotonic regression	SVM BBQ
AUROC	0.870	0.870	0.870	0.867	0.870	0.870	0.870	0.862
Brier score	0.087	0.088	0.088	0.089	0.111	0.086	0.088	0.090
Spiegelhalter z score	0.762	0.417	0.087	0.748	2.21	0.826	0.693	0.731
Spiegelhalter P value	.223	.338	.465	.227	.013^a	.204	.244	.232
Average absolute error	0.177	0.177	0.177	0.182	0.236	0.177	0.177	0.185
H-L C-statistics	5.88	24.6	11.7	16.0	176	4.75	12.7	28.0
H-L C-statistic P value	.661	.002^a	.167	.042^a	<1 × 10^–22 ^a	.784	.122	4.71 × 10^–4 ^a
H-L H-statistics	9.18	16.6	10.1	11.5	160	11.2	8.15	1.86
H-L H-statistic P value	.327	.030^a	.259	.174	<1 × 10^–22 ^a	.188	.419	.984
MCE	0.038	0.072	0.033	0.042	0.403	0.028	0.034	0.052
ECE	0.014	0.035	0.012	0.022	0.109	0.011	0.018	0.027
Cox’s slope	1.070	1.074	0.953	1.020	5.014^a	1.087	1.023	1.008
Cox’s intercept	0.080	0.072	–0.092	–0.007	6.193^a	0.081	–0.001	–0.02
ICI	0.010	0.034	0.012	0.012	0.104	0.008	0.013	0.020

Discrimination is measured by the AUROC. The Brier score is a combined measure of discrimination and calibration. Calibration is measured by the Spiegelhalter z test, average absolute error, H-L test, MCE, ECE, Cox slope and intercept, and ICI. The uncalibrated SVM estimates produced for the test set were improperly calibrated. Platt scaling, isotonic regression, or BBQ was then applied to recalibrate the model outputs.

AUROC: area under the receiver-operating characteristic curve; BBQ: Bayesian Binning into Quantiles; ECE: expected calibration error; H-L: Hosmer-Lemeshow; ICI: integrated calibration index; LR: logistic regression; MCE: maximum calibration error; NIS: Nationwide Inpatient Sample; SVM: support vector machine.

^a Indicates statistical significance.
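To make several of the measures in the table concrete, the following is a minimal sketch of how the Brier score, the Spiegelhalter z statistic, and the binned calibration errors (ECE and MCE) could be computed from 0/1 outcomes and predicted probabilities. It is not the authors' implementation. The function names are hypothetical, the ECE and MCE here assume 10 equal-width bins (the binning used for the table is not stated in this excerpt), and the P value is assumed to be a one-sided upper-tail normal probability, which appears consistent with the tabulated pairs (for example, z = 0.762 with P = .223).

```python
import math


def brier_score(y, p):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def spiegelhalter_z(y, p):
    """Spiegelhalter z statistic; roughly standard normal if p is well calibrated."""
    numerator = sum((yi - pi) * (1 - 2 * pi) for yi, pi in zip(y, p))
    variance = sum((1 - 2 * pi) ** 2 * pi * (1 - pi) for pi in p)
    return numerator / math.sqrt(variance)


def upper_tail_p(z):
    """One-sided upper-tail P value of a standard normal (assumed convention)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def binned_calibration_errors(y, p, n_bins=10):
    """Return (ECE, MCE) using equal-width probability bins (an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        k = min(int(pi * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[k].append((yi, pi))

    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(yi for yi, _ in members) / len(members)
        predicted = sum(pi for _, pi in members) / len(members)
        gap = abs(observed - predicted)
        ece += len(members) / len(y) * gap  # gap weighted by bin size
        mce = max(mce, gap)                 # worst single-bin gap
    return ece, mce


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the study.
    outcomes = [0, 0, 1, 1]
    probabilities = [0.2, 0.2, 0.8, 0.8]
    z = spiegelhalter_z(outcomes, probabilities)
    print("Brier:", brier_score(outcomes, probabilities))
    print("Spiegelhalter z:", z, "P:", upper_tail_p(z))
    print("ECE, MCE:", binned_calibration_errors(outcomes, probabilities, n_bins=2))
```

Under these assumptions, lower Brier, ECE, and MCE values indicate better calibration, and a large positive z (small P value) flags miscalibration, as in the uncalibrated SVM column.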