2020 Apr 20;3:58. doi: 10.1038/s41746-020-0248-0

Table 4.

(a) McNemar test results comparing the classification errors of logistic regression (LR) models and deep neural network (DNN) models when the best thresholds are chosen by the highest F1 score. (b) McNemar test results comparing individual DNN models to combined DNN models.

(a)

| Logistic regression model | DNN model | AKI^a p | AKI^a p < 0.05 | Reintubation p | Reintubation p < 0.05 | Mortality p | Mortality p < 0.05 | Any outcome p | Any outcome p < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| LR OFS | DNN combined RFS | 4.62E−15 | TRUE | 4.39E−01 | FALSE | 1.77E−06 | TRUE | 5.92E−34 | TRUE |
| LR OFS | DNN combined OFS | 1.34E−11 | TRUE | 8.42E−06 | TRUE | 8.78E−01 | FALSE | 6.05E−03 | TRUE |
| LR OFS | DNN combined OFS + MAP features | 8.01E−10 | TRUE | 5.08E−01 | FALSE | 1.26E−01 | FALSE | 2.54E−21 | TRUE |
| LR OFS | DNN individual OFS | 5.92E−01 | FALSE | 5.72E−04 | TRUE | 2.01E−02 | TRUE | 1.90E−02 | TRUE |
| LR OFS | DNN individual RFS | 3.34E−02 | TRUE | 1.33E−06 | TRUE | 2.12E−12 | TRUE | 1.32E−07 | TRUE |
| LR OFS | DNN individual OFS + MAP features | 3.38E−22 | TRUE | 5.29E−06 | TRUE | 2.89E−03 | TRUE | 7.37E−16 | TRUE |
| LR RFS | DNN combined RFS | 2.39E−10 | TRUE | 3.15E−01 | FALSE | 1.75E−01 | FALSE | 7.52E−04 | TRUE |
| LR RFS | DNN combined OFS | 7.48E−08 | TRUE | 3.12E−05 | TRUE | 1.82E−03 | TRUE | 4.49E−24 | TRUE |
| LR RFS | DNN combined OFS + MAP features | 3.63E−06 | TRUE | 6.80E−01 | FALSE | 8.58E−06 | TRUE | 3.67E−37 | TRUE |
| LR RFS | DNN individual OFS | 1.28E−02 | TRUE | 1.76E−03 | TRUE | 8.14E−02 | FALSE | 2.86E−03 | TRUE |
| LR RFS | DNN individual RFS | 9.53E−01 | FALSE | 3.56E−06 | TRUE | 4.77E−05 | TRUE | 3.25E−09 | TRUE |
| LR RFS | DNN individual OFS + MAP features | 1.36E−17 | TRUE | 3.03E−05 | TRUE | 3.21E−01 | FALSE | 6.21E−18 | TRUE |
| LR OFS + MAP features | DNN combined RFS | 4.54E−14 | TRUE | 6.38E−01 | FALSE | 1.77E−06 | TRUE | 4.11E−02 | TRUE |
| LR OFS + MAP features | DNN combined OFS | 7.89E−11 | TRUE | 2.51E−06 | TRUE | 8.83E−01 | FALSE | 1.49E−18 | TRUE |
| LR OFS + MAP features | DNN combined OFS + MAP features | 7.09E−09 | TRUE | 3.43E−01 | FALSE | 1.35E−01 | FALSE | 2.81E−31 | TRUE |
| LR OFS + MAP features | DNN individual OFS | 2.90E−01 | FALSE | 1.41E−04 | TRUE | 3.57E−02 | TRUE | 1.15E−01 | FALSE |
| LR OFS + MAP features | DNN individual RFS | 1.09E−01 | FALSE | 3.59E−07 | TRUE | 4.03E−12 | TRUE | 5.36E−06 | TRUE |
| LR OFS + MAP features | DNN individual OFS + MAP features | 3.81E−21 | TRUE | 9.69E−07 | TRUE | 6.60E−03 | TRUE | 2.09E−13 | TRUE |

(b)

| DNN individual | DNN combined | AKI^a p | AKI^a p < 0.05 | Reintubation p | Reintubation p < 0.05 | Mortality p | Mortality p < 0.05 | Any outcome p | Any outcome p < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| DNN individual OFS | DNN combined OFS | 7.78E−03 | TRUE | 1.00E+00 | FALSE | 6.16E−01 | FALSE | 5.58E−01 | FALSE |
| DNN individual OFS | DNN combined OFS + MAP features | 2.50E−01 | FALSE | 6.54E−40 | TRUE | 1.67E−38 | TRUE | 7.99E−13 | TRUE |
| DNN individual OFS | DNN combined RFS | 1.34E−01 | FALSE | 2.74E−51 | TRUE | 2.46E−47 | TRUE | 9.38E−28 | TRUE |
| DNN individual RFS | DNN combined OFS | 1.42E−07 | TRUE | 2.76E−05 | TRUE | 1.05E−07 | TRUE | 1.50E−02 | TRUE |
| DNN individual RFS | DNN combined OFS + MAP features | 1.42E−01 | FALSE | 1.93E−18 | TRUE | 2.36E−15 | TRUE | 4.71E−05 | TRUE |
| DNN individual RFS | DNN combined RFS | 2.54E−01 | FALSE | 3.36E−29 | TRUE | 1.21E−23 | TRUE | 2.92E−16 | TRUE |
| DNN individual OFS + MAP features | DNN combined OFS | 1.80E−10 | TRUE | 1.97E−27 | TRUE | 4.81E−31 | TRUE | 4.93E−07 | TRUE |
| DNN individual OFS + MAP features | DNN combined OFS + MAP features | 2.51E−03 | TRUE | 1.28E−02 | TRUE | 4.41E−02 | TRUE | 1.06E−01 | FALSE |
| DNN individual OFS + MAP features | DNN combined RFS | 1.04E−02 | TRUE | 4.93E−07 | TRUE | 2.40E−05 | TRUE | 8.26E−09 | TRUE |

McNemar test p values < 0.05 were considered significant, indicating that the two classifiers being compared have significantly different proportions of errors when classifying acute kidney injury (AKI), reintubation, mortality, or any outcome on the test set (N = 11,996). The comparisons involve logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model was also evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). Note that for the LR and DNN individual models, there is one model per outcome, and the predicted outcome probabilities from each model are stacked to predict any outcome. For the DNN combined models, there is one model for all three outcomes, and those probabilities are stacked to predict any outcome.
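As a rough illustration of the comparison described above, the sketch below runs a McNemar test on the paired predictions of two classifiers over the same test cases. It assumes the chi-square form with a continuity correction; the table notes do not say which variant of the test (exact or asymptotic) was used. The function name `mcnemar_p` and the toy labels are hypothetical.

```python
import math


def mcnemar_p(y_true, pred_a, pred_b):
    """Two-sided McNemar p value for two classifiers scored on the same cases.

    Assumption: chi-square approximation with continuity correction; the
    source does not state which McNemar variant was used.
    """
    # Only discordant pairs matter: cases where exactly one classifier is correct.
    only_a_correct = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    only_b_correct = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    n_discordant = only_a_correct + only_b_correct
    if n_discordant == 0:
        return 1.0  # the classifiers are right and wrong on exactly the same cases

    # Continuity-corrected statistic, floored at zero so equal counts give p = 1.
    diff = max(abs(only_a_correct - only_b_correct) - 1, 0)
    stat = diff ** 2 / n_discordant

    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


# Toy data (hypothetical): classifier A is right on all 10 cases, B on none.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 10
p = mcnemar_p(y_true, pred_a, pred_b)
print(f"p = {p:.2e}, significant at 0.05: {p < 0.05}")
```

A small p value here says only that the two classifiers make errors at different rates on the same cases; it does not say which one is better, which is why the table is read alongside the F1 scores.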

Bolded results are the smallest p values for the given outcome.

An example of how to interpret this table: for classifying any outcome, every LR model differed significantly (p < 0.05) from every DNN model it was compared with, except for the pair LR OFS + MAP and DNN individual OFS. The LR model with the best F1 score was LR OFS (F1 score 0.504; sensitivity 0.542; specificity 0.941; precision 0.471), and the DNN model with the best F1 score was DNN individual OFS + MAP (F1 score 0.482; sensitivity 0.584; specificity 0.918; precision 0.41).
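The metrics quoted in this example, and the idea of picking a threshold by the highest F1 score, can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names, the toy scores, and the candidate threshold grid are all assumptions.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_from(precision, sensitivity):
    """F1 is the harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def metrics(tp, fp, tn, fn):
    """Precision, sensitivity, specificity, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1_from(precision, sensitivity),
    }


def best_f1_threshold(y_true, scores, thresholds):
    """Pick the candidate threshold with the highest F1 (first one wins ties)."""
    return max(
        thresholds,
        key=lambda th: metrics(*confusion_counts(y_true, scores, th))["f1"],
    )


# Consistency check against the precision and sensitivity values quoted above.
print(round(f1_from(0.471, 0.542), 3))  # 0.504 (LR OFS)
print(round(f1_from(0.41, 0.584), 3))   # 0.482 (DNN individual OFS + MAP)

# Toy threshold selection on hypothetical scores.
print(best_f1_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.2, 0.5]))  # 0.2
```

The reported F1 scores agree with the reported precision and sensitivity under the usual harmonic-mean definition.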

^a AKI labels were available for only 4307 of the test patients, so all AKI results are from those patients with AKI labels.