2019 Oct 23;7(4):e14993. doi: 10.2196/14993

Table 4.

Comparison of the models' performance metrics on the balanced training dataset (10-fold cross-validation) and on the imbalanced test dataset for predicting delirium after cardiac surgery. Performance metrics: area under the receiver operating characteristic curve (ROC-AUC), F1 score (harmonic mean of precision and recall), and area under the precision-recall curve (PRC-AUC). All measures are percentages, with the standard deviation in parentheses as a measure of variability.

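Dataset: 10-fold cross-validation applied on the balanced training dataset (N=1014, delirium=50%)

| Model | ROC-AUC^a | Δ^e | F1 score^b, Yes^d | Δ | F1 score, No^f | Δ | F1 score, Avg^g | Δ | PRC-AUC^c, Yes | Δ | PRC-AUC, No | Δ | PRC-AUC, Avg | Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANN^h | 80.4 (4) | ns^i | 71.7 (5) | ns | 71.7 (5) | ns | 71.7 (5) | ns | 78.5 (5) | ns | 80.1 (5) | ns | 79.3 (5) | ns |
| BBN^j | 77.4 (4) | ^k | 70.1 (5) | ns | 69.1 (5) | ns | 69.6 (5) | ns | 75.3 (5) | ns | 77.3 (5) | ns | 76.3 (5) | |
| DT^l | 77.2 (4) | ns | 70.9 (4) | ns | 72.4 (4) | ns | 71.7 (4) | ns | 74.4 (5) | ns | 73.8 (5) | ns | 73.8 (5) | |
| LR^m | 81.4 (4) | B^n | 72.3 (5) | B | 74.2 (5) | B | 73.2 (5) | B | 79.8 (5) | B | 81 (5) | B | 80.4 (5) | B |
| NB^o | 79.9 (4) | ns | 72.7 (5) | ns | 73.2 (5) | ns | 73 (5) | ns | 78.1 (5) | ns | 79.8 (5) | ns | 78.9 (5) | ns |
| RF^p | 81.3 (4) | ns | 74.1 (5) | ns | 72.6 (5) | ns | 73.3 (5) | ns | 78.8 (5) | ns | 81 (5) | ns | 79.9 (5) | ns |
| SVM^q | 81.1 (5) | ns | 67.2 (6) | | 74.4 (6) | ns | 71.1 (6) | | 80.4 (5) | ns | 80.5 (5) | ns | 80.4 (5) | ns |

Dataset: Imbalanced test dataset (N=1117, delirium=11.4%)

| Model | ROC-AUC | Δ | F1 score, Yes | Δ | F1 score, No | Δ | F1 score, Avg | Δ | PRC-AUC, Yes | Δ | PRC-AUC, No | Δ | PRC-AUC, Avg | Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANN | 78.2 (6) | ns | 35.8 (9) | ns | 82.4 (9) | ns | 77.1 (9) | ns | 30.4 (9) | +^r | 96.2 (9) | ns | 88.7 (9) | ns |
| BBN | 77.3 (6) | ns | 34.3 (8) | ns | 82.9 (8) | ns | 76.6 (8) | ns | 30.7 (8) | + | 95.8 (8) | ns | 88.4 (8) | ns |
| DT | 74.6 (7) | | 37.3 (8) | ns | 83.9 (8) | ns | 78.6 (8) | ns | 25.3 (8) | ns | 94.3 (8) | ns | 86.5 (8) | ns |
| LR | 77.5 (5) | B | 37.6 (11) | B | 84.9 (11) | B | 79.5 (11) | B | 27.1 (10) | B | 97.1 (10) | B | 88.4 (10) | B |
| NB | 75.6 (8) | ns | 34.7 (10) | ns | 81.9 (10) | ns | 76.6 (10) | ns | 28.7 (9) | ns | 95.6 (9) | ns | 88.0 (9) | ns |
| RF | 78.0 (4) | ns | 37.4 (8) | ns | 82.3 (8) | ns | 77.2 (8) | ns | 28.3 (8) | ns | 96.3 (8) | ns | 88.6 (8) | ns |
| SVM | 77.2 (6) | ns | 40.2 (7) | + | 87.2 (7) | + | 81.9 (7) | + | 29.6 (9) | + | 96.0 (9) | ns | 88.4 (9) | ns |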

^a ROC-AUC: area under the receiver operating characteristic curve.

^b F1 score: harmonic mean of precision and recall.

^c PRC-AUC: area under the precision-recall curve.

^d Yes: positive instances, that is, patients who developed delirium.

^e Δ: change compared with the base model (B).

^f No: negative instances, that is, patients who did not develop delirium.

^g Avg: weighted average, computed by multiplying each class's value by the number of instances in that class, summing the products, and dividing by the total number of instances in the dataset.

^h ANN: artificial neural network.

^i ns: not a statistically significant change in performance (P≥.05).

^j BBN: Bayesian belief network.

^k Statistically significant deterioration of the performance metric (P<.05).

^l DT: J48 decision tree.

^m LR: logistic regression.

^n B: base comparator (reference) algorithm.

^o NB: naïve Bayes.

^p RF: random forest.

^q SVM: support vector machine.

^r Statistically significant improvement of the performance metric (P<.05).
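
The F1 score and the class-weighted average defined in the footnotes can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `f1_score` and `weighted_average` are hypothetical, and the class shares (11.4% and 88.6%) are taken from the test dataset header as a stand-in for the per-class instance counts, which the table does not list.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, weights):
    """Class-weighted mean: sum(value * weight) / sum(weights)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Illustration with the ANN F1 scores on the imbalanced test dataset.
# Assumption: class shares from the table header (delirium=11.4%) are used
# as weights in place of the actual per-class instance counts.
ann_f1_yes, ann_f1_no = 35.8, 82.4
share_yes, share_no = 11.4, 88.6

ann_f1_avg = weighted_average([ann_f1_yes, ann_f1_no], [share_yes, share_no])
print(round(ann_f1_avg, 1))  # 77.1
```

Under these assumptions the result agrees with the Avg F1 score of 77.1 reported for ANN on the test dataset. Because the weights follow the class distribution, the average on the imbalanced dataset is dominated by the "No" class.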