2020 Feb 27;27(4):621–633. doi: 10.1093/jamia/ocz228

Table 2.

Discrimination and calibration results of the LR and SVM models applied to the test set

| Metric | LR | LR Platt scaling | LR isotonic regression | LR BBQ | SVM | SVM Platt scaling | SVM isotonic regression | SVM BBQ |
|---|---|---|---|---|---|---|---|---|
| AUROC | 0.870 | 0.870 | 0.870 | 0.867 | 0.870 | 0.870 | 0.870 | 0.862 |
| Brier score | 0.087 | 0.088 | 0.088 | 0.089 | 0.111 | 0.086 | 0.088 | 0.090 |
| Spiegelhalter z score | 0.762 | 0.417 | 0.087 | 0.748 | 2.21 | 0.826 | 0.693 | 0.731 |
| Spiegelhalter P value | .223 | .338 | .465 | .227 | .013ᵃ | .204 | .244 | .232 |
| Average absolute error | 0.177 | 0.177 | 0.177 | 0.182 | 0.236 | 0.177 | 0.177 | 0.185 |
| H-L C-statistics | 5.88 | 24.6 | 11.7 | 16.0 | 176 | 4.75 | 12.7 | 28.0 |
| H-L C-statistic P value | .661 | .002ᵃ | .167 | .042ᵃ | <1 × 10⁻²²ᵃ | .784 | .122 | 4.71 × 10⁻⁴ᵃ |
| H-L H-statistics | 9.18 | 16.6 | 10.1 | 11.5 | 160 | 11.2 | 8.15 | 1.86 |
| H-L H-statistic P value | .327 | .030ᵃ | .259 | .174 | <1 × 10⁻²²ᵃ | .188 | .419 | .984 |
| MCE | 0.038 | 0.072 | 0.033 | 0.042 | 0.403 | 0.028 | 0.034 | 0.052 |
| ECE | 0.014 | 0.035 | 0.012 | 0.022 | 0.109 | 0.011 | 0.018 | 0.027 |
| Cox’s slope | 1.070 | 1.074 | 0.953 | 1.020 | 5.014ᵃ | 1.087 | 1.023 | 1.008 |
| Cox’s intercept | 0.080 | 0.072 | −0.092 | −0.007 | 6.193ᵃ | 0.081 | −0.001 | −0.02 |
| ICI | 0.010 | 0.034 | 0.012 | 0.012 | 0.104 | 0.008 | 0.013 | 0.020 |

Discrimination is measured by the AUROC. The Brier score is a combined measure of discrimination and calibration. Calibration is measured by the Spiegelhalter z test, average absolute error, H-L test, MCE, ECE, Cox slope and intercept, and ICI. The SVM estimates produced for the test set were poorly calibrated. Platt scaling, isotonic regression, and BBQ were each applied to the LR and SVM estimates.

AUROC: area under the receiver-operating characteristic curve; BBQ: Bayesian Binning into Quantiles; ECE: expected calibration error; H-L: Hosmer-Lemeshow; ICI: integrated calibration index; LR: logistic regression; MCE: maximum calibration error; NIS: Nationwide Inpatient Sample; SVM: support vector machine.

ᵃ Indicates statistical significance.
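
To make several of the calibration measures in the table concrete, the following is a minimal Python sketch of the Brier score, ECE, MCE, and the Spiegelhalter z statistic using their textbook definitions. It is not the authors' implementation: the function names are hypothetical, the toy labels and probabilities are invented for illustration, and the use of 10 equal-width probability bins for ECE and MCE is an assumption, since the table does not state the binning scheme.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def binned_calibration_errors(y_true, y_prob, n_bins=10):
    """Return (ECE, MCE) over equal-width probability bins (assumed binning)."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum of p, sum of y
    for y, p in zip(y_true, y_prob):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[k][0] += 1
        bins[k][1] += p
        bins[k][2] += y
    n = len(y_true)
    # Per non-empty bin: |observed event rate - mean predicted probability|
    gaps = [(c, abs(sy / c - sp / c)) for c, sp, sy in bins if c > 0]
    ece = sum(c / n * gap for c, gap in gaps)  # count-weighted average gap
    mce = max(gap for _, gap in gaps)          # worst single-bin gap
    return ece, mce


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter z statistic; approximately N(0, 1) under perfect calibration."""
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    return num / math.sqrt(var)


def upper_tail_p(z):
    """One-sided (upper-tail) standard normal probability for z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy data, invented for illustration only
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.6, 0.9]
print(brier_score(y, p))
print(binned_calibration_errors(y, p))
print(spiegelhalter_z(y, p))
```

The tabulated Spiegelhalter P values appear consistent with an upper-tail normal probability: for example, `upper_tail_p(0.762)` is approximately 0.223, matching the LR column. Whether the original analysis used exactly this convention is an inference from the numbers, not something the table states.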