. 2019 Oct 23;7(4):e14993. doi: 10.2196/14993

Table 3.

Comparison of the models' performance metrics on the balanced training dataset (using 10-fold cross-validation) and on the imbalanced test dataset for predicting delirium after cardiac surgery. Performance metrics: accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and Cohen kappa. All measures are reported on a 0-100% scale, with the standard deviation in parentheses as a measure of variability.
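For readers who want to see how the six metrics in this table relate to one another, the sketch below derives them from a single 2x2 confusion matrix. It is a minimal illustration, not the authors' code: the function name `classification_metrics` and the example counts are hypothetical, and the values are scaled to 0-100 only to match the table's convention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the table's six metrics (as percentages) from confusion-matrix counts.

    tp/fn: delirium cases predicted positive/negative.
    tn/fp: non-delirium cases predicted negative/positive.
    Assumes every denominator is nonzero (no empty row or column).
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value

    # Cohen kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column marginals.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    raw = {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }
    return {name: 100 * value for name, value in raw.items()}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

Because kappa subtracts the agreement expected by chance, it can stay modest even when accuracy looks reasonable, which is the pattern visible in the kappa column.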

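Dataset: 10-fold cross-validation applied on the balanced training dataset (N=1014, delirium=50%)

| Model | Accuracy | Δ [a] | Sensitivity | Δ | Specificity | Δ | PPV [b] | Δ | NPV [c] | Δ | Kappa | Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANN [d] | 71.7 (4.3) | ns [e] | 71.8 (7) | + [f] | 71.6 (7) | [g] | 71.7 (5) | | 71.7 (7) | | 43.3 (9) | ns |
| BBN [h] | 71.3 (4.4) | ns | 72.2 (7) | + | 71.2 (7) | | 69.9 (5) | | 71.3 (7) | | 43.1 (9) | ns |
| DT [i] | 70.1 (4.3) | ns | 68.1 (7) | ns | 72.9 (9) | ns | 72.9 (5) | ns | 72.6 (9) | ns | 43.3 (8) | ns |
| LR [j] | 73.3 (4.4) | B [k] | 69.8 (7) | B | 76.7 (7) | B | 75 (5) | B | 75.6 (6) | B | 44.5 (9) | B |
| NB [l] | 73.0 (4.2) | ns | 64.8 (7) | ns | 79.5 (5) | + | 74.4 (5) | ns | 79.5 (5) | + | 42.9 (8) | ns |
| RF [m] | 72.5 (4.4) | ns | 74.3 (7) | + | 71.7 (7) | | 72.1 (4) | ns | 72.8 (7) | | 45.7 (9) | ns |
| SVM [n] | 71.3 (4.5) | ns | 60.2 (8) | | 83.8 (5) | + | 77.8 (5) | + | 83.1 (5) | + | 43.2 (9) | ns |

Dataset: Imbalanced test dataset (N=1117, delirium=11.4%)

| Model | Accuracy | Δ | Sensitivity | Δ | Specificity | Δ | PPV | Δ | NPV | Δ | Kappa | Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANN | 74.3 (3.2) | ns | 67.7 (5) | + | 72.9 (5) | ns | 24.3 (14) | ns | 94.6 (5) | ns | 22.85 (9) | ns |
| BBN | 74.1 (3.8) | ns | 68.7 (9) | + | 70.8 (9) | | 22.9 (15) | ns | 94.5 (6) | ns | 21.81 (11) | ns |
| DT | 74.4 (5.4) | ns | 66.9 (10) | + | 75.4 (10) | ns | 25.8 (17) | ns | 94.7 (10) | ns | 24.97 (13) | ns |
| LR | 75.6 (4.7) | B | 64.6 (9) | B | 77.1 (7) | B | 26.5 (16) | B | 94.4 (8) | B | 22.6 (13) | B |
| NB | 71.7 (3.1) | | 66.1 (12) | ns | 72.4 (8) | | 23.5 (18) | ns | 94.3 (9) | ns | 21.55 (10) | ns |
| RF | 75.4 (3.4) | ns | 72.4 (4) | + | 72.4 (4) | | 25.2 (8) | + | 95.3 (4) | + | 24.69 (7) | ns |
| SVM | 78.9 (2.1) | + | 62.2 (4) | ns | 81.1 (3.2) | + | 29.7 (12) | + | 94.4 (6) | ns | 29.33 (9) | + |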

[a] Change compared to base model (B).

[b] PPV: positive predictive value.

[c] NPV: negative predictive value.

[d] ANN: artificial neural networks.

[e] ns: not a statistically significant change in performance (P≥.05).

[f] Statistically significant improvement of performance metric (P<.05).

[g] Statistically significant deterioration of performance metric (P<.05).

[h] BBN: Bayesian belief networks.

[i] DT: J48 decision tree.

[j] LR: logistic regression.

[k] B: base comparator (reference) algorithm.

[l] NB: naïve Bayesian.

[m] RF: random forest.

[n] SVM: support vector machines.
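The drop in PPV between the balanced dataset (delirium=50%) and the imbalanced test dataset (delirium=11.4%) follows largely from the change in prevalence, and Bayes' rule makes this explicit. The sketch below is an illustration under stated assumptions rather than the authors' analysis: the function name `predictive_values` is hypothetical, and it treats the reported mean sensitivity and specificity as if they came from a single confusion matrix, so it can only approximate the tabulated means.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Compute PPV and NPV from sensitivity, specificity, and prevalence.

    All arguments and return values are proportions in [0, 1].
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# LR row of the imbalanced test dataset: sensitivity 64.6%, specificity 77.1%,
# delirium prevalence 11.4%.
ppv_test, npv_test = predictive_values(0.646, 0.771, 0.114)

# LR row of the balanced training dataset: sensitivity 69.8%, specificity 76.7%,
# delirium prevalence 50%.
ppv_balanced, npv_balanced = predictive_values(0.698, 0.767, 0.50)
```

With the test-set LR inputs this gives a PPV of about 26.6% and an NPV of about 94.4%, close to the reported 26.5 and 94.4. With the balanced inputs the PPV comes out near 75%, in line with the reported value.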