Table 2.
Metric | LR | LR Platt scaling | LR isotonic regression | LR BBQ | SVM | SVM Platt scaling | SVM isotonic regression | SVM BBQ |
---|---|---|---|---|---|---|---|---|
AUROC | 0.870 | 0.870 | 0.870 | 0.867 | 0.870 | 0.870 | 0.870 | 0.862 |
Brier score | 0.087 | 0.088 | 0.088 | 0.089 | 0.111 | 0.086 | 0.088 | 0.090 |
Spiegelhalter z score | 0.762 | 0.417 | 0.087 | 0.748 | 2.21 | 0.826 | 0.693 | 0.731 |
Spiegelhalter P value | .223 | .338 | .465 | .227 | .013<sup>a</sup> | .204 | .244 | .232 |
Average absolute error | 0.177 | 0.177 | 0.177 | 0.182 | 0.236 | 0.177 | 0.177 | 0.185 |
H-L C-statistic | 5.88 | 24.6 | 11.7 | 16.0 | 176 | 4.75 | 12.7 | 28.0 |
H-L C-statistic P value | .661 | .002<sup>a</sup> | .167 | .042<sup>a</sup> | <1 × 10<sup>–22</sup> <sup>a</sup> | .784 | .122 | 4.71 × 10<sup>–4</sup> <sup>a</sup> |
H-L H-statistic | 9.18 | 16.6 | 10.1 | 11.5 | 160 | 11.2 | 8.15 | 1.86 |
H-L H-statistic P value | .327 | .030<sup>a</sup> | .259 | .174 | <1 × 10<sup>–22</sup> <sup>a</sup> | .188 | .419 | .984 |
MCE | 0.038 | 0.072 | 0.033 | 0.042 | 0.403 | 0.028 | 0.034 | 0.052 |
ECE | 0.014 | 0.035 | 0.012 | 0.022 | 0.109 | 0.011 | 0.018 | 0.027 |
Cox’s slope | 1.070 | 1.074 | 0.953 | 1.020 | 5.014<sup>a</sup> | 1.087 | 1.023 | 1.008 |
Cox’s intercept | 0.080 | 0.072 | –0.092 | –0.007 | 6.193<sup>a</sup> | 0.081 | –0.001 | –0.02 |
ICI | 0.010 | 0.034 | 0.012 | 0.012 | 0.104 | 0.008 | 0.013 | 0.020 |
Discrimination is measured by the AUROC. The Brier score is a combined measure of discrimination and calibration. Calibration is measured by the Spiegelhalter z test, average absolute error, H-L test, MCE, ECE, Cox slope and intercept, and ICI. The SVM estimates produced for the test set were poorly calibrated. Platt scaling, isotonic regression, or BBQ was then applied to recalibrate the estimates.
AUROC: area under the receiver-operating characteristic curve; BBQ: Bayesian Binning into Quantiles; ECE: expected calibration error; H-L: Hosmer-Lemeshow; ICI: integrated calibration index; LR: logistic regression; MCE: maximum calibration error; NIS: Nationwide Inpatient Sample; SVM: support vector machine.
<sup>a</sup>Indicates statistical significance.
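Several of the calibration measures in Table 2 can be computed directly from predicted probabilities and binary outcomes. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the use of 10 equal-width bins for ECE and MCE is an assumption, since the table does not state the binning scheme. The upper-tail P value is included because the z and P pairs reported in the table appear consistent with a one-sided test (for example, z = 0.762 with P = .223).

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter z statistic (approximately standard normal under perfect calibration)."""
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    return num / math.sqrt(var)


def upper_tail_p(z):
    """One-sided upper-tail P value of a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def binned_calibration_errors(y_true, y_prob, n_bins=10):
    """Return (ECE, MCE) over equal-width probability bins (binning is an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        gap = abs(observed - predicted)
        ece += len(members) / n * gap  # weight each gap by bin size
        mce = max(mce, gap)            # track the worst bin
    return ece, mce


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.6, 0.9]
    print("Brier:", brier_score(y, p))
    print("Spiegelhalter z:", spiegelhalter_z(y, p))
    print("ECE, MCE:", binned_calibration_errors(y, p))
```

ECE weights each bin's gap between observed event rate and mean predicted probability by the bin's share of the sample, whereas MCE reports only the worst bin. This explains why the uncalibrated SVM column shows a much larger MCE (0.403) than ECE (0.109).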