Table 4.
**a** McNemar test results comparing the classification errors of logistic regression (LR) models and deep neural network (DNN) models when the threshold for each model is chosen to give the highest F1 score. **b** McNemar test results comparing individual DNN models with combined DNN models.
| Logistic regression model | DNN model | AKIᵃ: p | AKIᵃ: p < 0.05 | Reintubation: p | Reintubation: p < 0.05 | Mortality: p | Mortality: p < 0.05 | Any outcome: p | Any outcome: p < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| LR OFS | DNN combined RFS | 4.62E−15 | TRUE | 4.39E−01 | FALSE | 1.77E−06 | TRUE | 5.92E−34 | TRUE |
| LR OFS | DNN combined OFS | 1.34E−11 | TRUE | 8.42E−06 | TRUE | 8.78E−01 | FALSE | 6.05E−03 | TRUE |
| LR OFS | DNN combined OFS + MAP features | 8.01E−10 | TRUE | 5.08E−01 | FALSE | 1.26E−01 | FALSE | 2.54E−21 | TRUE |
| LR OFS | DNN individual OFS | 5.92E−01 | FALSE | 5.72E−04 | TRUE | 2.01E−02 | TRUE | 1.90E−02 | TRUE |
| LR OFS | DNN individual RFS | 3.34E−02 | TRUE | 1.33E−06 | TRUE | 2.12E−12 | TRUE | 1.32E−07 | TRUE |
| LR OFS | DNN individual OFS + MAP features | 3.38E−22 | TRUE | 5.29E−06 | TRUE | 2.89E−03 | TRUE | 7.37E−16 | TRUE |
| LR RFS | DNN combined RFS | 2.39E−10 | TRUE | 3.15E−01 | FALSE | 1.75E−01 | FALSE | 7.52E−04 | TRUE |
| LR RFS | DNN combined OFS | 7.48E−08 | TRUE | 3.12E−05 | TRUE | 1.82E−03 | TRUE | 4.49E−24 | TRUE |
| LR RFS | DNN combined OFS + MAP features | 3.63E−06 | TRUE | 6.80E−01 | FALSE | 8.58E−06 | TRUE | 3.67E−37 | TRUE |
| LR RFS | DNN individual OFS | 1.28E−02 | TRUE | 1.76E−03 | TRUE | 8.14E−02 | FALSE | 2.86E−03 | TRUE |
| LR RFS | DNN individual RFS | 9.53E−01 | FALSE | 3.56E−06 | TRUE | 4.77E−05 | TRUE | 3.25E−09 | TRUE |
| LR RFS | DNN individual OFS + MAP features | 1.36E−17 | TRUE | 3.03E−05 | TRUE | 3.21E−01 | FALSE | 6.21E−18 | TRUE |
| LR OFS + MAP features | DNN combined RFS | 4.54E−14 | TRUE | 6.38E−01 | FALSE | 1.77E−06 | TRUE | 4.11E−02 | TRUE |
| LR OFS + MAP features | DNN combined OFS | 7.89E−11 | TRUE | 2.51E−06 | TRUE | 8.83E−01 | FALSE | 1.49E−18 | TRUE |
| LR OFS + MAP features | DNN combined OFS + MAP features | 7.09E−09 | TRUE | 3.43E−01 | FALSE | 1.35E−01 | FALSE | 2.81E−31 | TRUE |
| LR OFS + MAP features | DNN individual OFS | 2.90E−01 | FALSE | 1.41E−04 | TRUE | 3.57E−02 | TRUE | 1.15E−01 | FALSE |
| LR OFS + MAP features | DNN individual RFS | 1.09E−01 | FALSE | 3.59E−07 | TRUE | 4.03E−12 | TRUE | 5.36E−06 | TRUE |
| LR OFS + MAP features | DNN individual OFS + MAP features | 3.81E−21 | TRUE | 9.69E−07 | TRUE | 6.60E−03 | TRUE | 2.09E−13 | TRUE |

| DNN individual | DNN combined | AKIᵃ: p | AKIᵃ: p < 0.05 | Reintubation: p | Reintubation: p < 0.05 | Mortality: p | Mortality: p < 0.05 | Any outcome: p | Any outcome: p < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| DNN individual OFS | DNN combined OFS | 7.78E−03 | TRUE | 1.00E+00 | FALSE | 6.16E−01 | FALSE | 5.58E−01 | FALSE |
| DNN individual OFS | DNN combined OFS + MAP features | 2.50E−01 | FALSE | 6.54E−40 | TRUE | 1.67E−38 | TRUE | 7.99E−13 | TRUE |
| DNN individual OFS | DNN combined RFS | 1.34E−01 | FALSE | 2.74E−51 | TRUE | 2.46E−47 | TRUE | 9.38E−28 | TRUE |
| DNN individual RFS | DNN combined OFS | 1.42E−07 | TRUE | 2.76E−05 | TRUE | 1.05E−07 | TRUE | 1.50E−02 | TRUE |
| DNN individual RFS | DNN combined OFS + MAP features | 1.42E−01 | FALSE | 1.93E−18 | TRUE | 2.36E−15 | TRUE | 4.71E−05 | TRUE |
| DNN individual RFS | DNN combined RFS | 2.54E−01 | FALSE | 3.36E−29 | TRUE | 1.21E−23 | TRUE | 2.92E−16 | TRUE |
| DNN individual OFS + MAP features | DNN combined OFS | 1.80E−10 | TRUE | 1.97E−27 | TRUE | 4.81E−31 | TRUE | 4.93E−07 | TRUE |
| DNN individual OFS + MAP features | DNN combined OFS + MAP features | 2.51E−03 | TRUE | 1.28E−02 | TRUE | 4.41E−02 | TRUE | 1.06E−01 | FALSE |
| DNN individual OFS + MAP features | DNN combined RFS | 1.04E−02 | TRUE | 4.93E−07 | TRUE | 2.40E−05 | TRUE | 8.26E−09 | TRUE |
McNemar test p values < 0.05 were considered significant, indicating that the two classifiers being compared have significantly different proportions of errors when classifying acute kidney injury (AKI), reintubation, mortality, or any outcome on the test set (N = 11,996). The comparisons involve the logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model type was evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). Note that for the LR and DNN individual models, there is one model per outcome, and the predicted outcome probabilities from the separate models are stacked to predict any outcome. For the DNN combined models, there is one model for all three outcomes, and its predicted probabilities are stacked to predict any outcome.
Bolded results are the smallest p values for the given outcome.
An example of how to interpret this table: for classifying any outcome, every LR model differed significantly (p < 0.05) from every DNN model, except for the pair LR OFS + MAP and DNN individual OFS. The LR model with the highest F1 score was LR OFS (F1 score 0.504, sensitivity 0.542, specificity 0.941, precision 0.471), and the DNN model with the highest F1 score was DNN individual OFS + MAP (F1 score 0.482, sensitivity 0.584, specificity 0.918, precision 0.41).
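As a consistency check on the metrics quoted in the interpretation example, the F1 score can be recomputed from the reported precision and sensitivity (recall) using the standard definition. The arithmetic below uses only the rounded values given in the note, so agreement is expected only to about three decimal places.

$$
F_1 = \frac{2 \cdot \text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}}
$$

$$
\text{LR OFS:}\quad \frac{2 \times 0.471 \times 0.542}{0.471 + 0.542} = \frac{0.5106}{1.013} \approx 0.504
$$

$$
\text{DNN individual OFS + MAP:}\quad \frac{2 \times 0.41 \times 0.584}{0.41 + 0.584} = \frac{0.4789}{0.994} \approx 0.482
$$

Both values match the F1 scores stated in the note.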
ᵃ AKI labels were available for only 4307 of the test patients, so all AKI results are based on those patients.
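For readers who want to see how a p value of the kind tabulated above can arise, the following is a minimal sketch of a McNemar test on paired predictions from two classifiers. It is not the authors' code: the table does not state which variant of the test was used, so the sketch assumes the common continuity-corrected chi-square form, with an exact binomial fallback when few cases are discordant. The function name `mcnemar_test`, the `exact_below` cutoff, and the toy data are all hypothetical.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b, exact_below=25):
    """McNemar test on paired binary predictions from classifiers A and B.

    Returns (b, c, p_value), where b counts cases that only A classified
    correctly and c counts cases that only B classified correctly.
    Assumption: continuity-corrected chi-square (1 degree of freedom),
    or an exact two-sided binomial test when b + c < exact_below.
    """
    b = c = 0
    for truth, a, bb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (bb == truth)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    n = b + c
    if n == 0:
        # No discordant cases: the error patterns are identical.
        return b, c, 1.0

    if n < exact_below:
        # Exact two-sided binomial test with success probability 0.5.
        k = min(b, c)
        tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
        p_value = min(1.0, 2 * tail)
    else:
        # Chi-square statistic with continuity correction.
        stat = (abs(b - c) - 1) ** 2 / n
        # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
        p_value = math.erfc(math.sqrt(stat / 2))
    return b, c, p_value


# Toy data (hypothetical): 50 positive cases, thresholded predictions.
y_true = [1] * 50
pred_a = [1] * 30 + [0] * 10 + [1] * 10  # A is wrong on cases 30-39
pred_b = [0] * 30 + [1] * 10 + [1] * 10  # B is wrong on cases 0-29

b, c, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"only A correct: {b}, only B correct: {c}, p = {p:.2E}")
print("p < 0.05:", p < 0.05)
```

Only the discordant cases (those where exactly one classifier is correct) enter the statistic, which is why two models with similar overall accuracy can still differ significantly in this table.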