2020 Apr 20;3:58. doi: 10.1038/s41746-020-0248-0

Table 4.

(a) McNemar test results comparing the classification errors of logistic regression (LR) models and deep neural network (DNN) models, with thresholds chosen by the highest F1 score. (b) McNemar test results comparing individual DNN models with combined DNN models.
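The caption states that each model's decision threshold was chosen by the highest F1 score. A minimal sketch of that selection follows, assuming binary labels, predicted probabilities, and that every distinct predicted probability is a candidate threshold (the actual candidate grid is not stated here). The function names are hypothetical.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """F1 score when every probability >= threshold is classified as positive."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate threshold with the highest F1.

    Assumption: candidates are the distinct predicted probabilities.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in sorted(set(y_prob)):
        f1 = f1_at_threshold(y_true, y_prob, threshold)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Toy data (hypothetical), not taken from the study.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
threshold, f1 = best_f1_threshold(y_true, y_prob)
print(threshold, round(f1, 3))  # 0.35 0.857
```

Thresholding at the F1 optimum rather than at 0.5 matters here because the outcomes are imbalanced; the McNemar comparisons are then made on the predictions produced at these thresholds.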

(a) LR models vs. DNN models

| LR model | DNN model | AKI^a p | AKI^a p < 0.05 | Reintubation p | Reintubation p < 0.05 | Mortality p | Mortality p < 0.05 | Any outcome p | Any outcome p < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| LR OFS | DNN combined RFS | 4.62E−15 | TRUE | 4.39E−01 | FALSE | 1.77E−06 | TRUE | 5.92E−34 | TRUE |
| LR OFS | DNN combined OFS | 1.34E−11 | TRUE | 8.42E−06 | TRUE | 8.78E−01 | FALSE | 6.05E−03 | TRUE |
| LR OFS | DNN combined OFS + MAP features | 8.01E−10 | TRUE | 5.08E−01 | FALSE | 1.26E−01 | FALSE | 2.54E−21 | TRUE |
| LR OFS | DNN individual OFS | 5.92E−01 | FALSE | 5.72E−04 | TRUE | 2.01E−02 | TRUE | 1.90E−02 | TRUE |
| LR OFS | DNN individual RFS | 3.34E−02 | TRUE | 1.33E−06 | TRUE | 2.12E−12 | TRUE | 1.32E−07 | TRUE |
| LR OFS | DNN individual OFS + MAP features | 3.38E−22 | TRUE | 5.29E−06 | TRUE | 2.89E−03 | TRUE | 7.37E−16 | TRUE |
| LR RFS | DNN combined RFS | 2.39E−10 | TRUE | 3.15E−01 | FALSE | 1.75E−01 | FALSE | 7.52E−04 | TRUE |
| LR RFS | DNN combined OFS | 7.48E−08 | TRUE | 3.12E−05 | TRUE | 1.82E−03 | TRUE | 4.49E−24 | TRUE |
| LR RFS | DNN combined OFS + MAP features | 3.63E−06 | TRUE | 6.80E−01 | FALSE | 8.58E−06 | TRUE | 3.67E−37 | TRUE |
| LR RFS | DNN individual OFS | 1.28E−02 | TRUE | 1.76E−03 | TRUE | 8.14E−02 | FALSE | 2.86E−03 | TRUE |
| LR RFS | DNN individual RFS | 9.53E−01 | FALSE | 3.56E−06 | TRUE | 4.77E−05 | TRUE | 3.25E−09 | TRUE |
| LR RFS | DNN individual OFS + MAP features | 1.36E−17 | TRUE | 3.03E−05 | TRUE | 3.21E−01 | FALSE | 6.21E−18 | TRUE |
| LR OFS + MAP features | DNN combined RFS | 4.54E−14 | TRUE | 6.38E−01 | FALSE | 1.77E−06 | TRUE | 4.11E−02 | TRUE |
| LR OFS + MAP features | DNN combined OFS | 7.89E−11 | TRUE | 2.51E−06 | TRUE | 8.83E−01 | FALSE | 1.49E−18 | TRUE |
| LR OFS + MAP features | DNN combined OFS + MAP features | 7.09E−09 | TRUE | 3.43E−01 | FALSE | 1.35E−01 | FALSE | 2.81E−31 | TRUE |
| LR OFS + MAP features | DNN individual OFS | 2.90E−01 | FALSE | 1.41E−04 | TRUE | 3.57E−02 | TRUE | 1.15E−01 | FALSE |
| LR OFS + MAP features | DNN individual RFS | 1.09E−01 | FALSE | 3.59E−07 | TRUE | 4.03E−12 | TRUE | 5.36E−06 | TRUE |
| LR OFS + MAP features | DNN individual OFS + MAP features | 3.81E−21 | TRUE | 9.69E−07 | TRUE | 6.60E−03 | TRUE | 2.09E−13 | TRUE |

(b) Individual DNN models vs. combined DNN models

| DNN individual | DNN combined | AKI^a p | AKI^a p < 0.05 | Reintubation p | Reintubation p < 0.05 | Mortality p | Mortality p < 0.05 | Any outcome p | Any outcome p < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| DNN individual OFS | DNN combined OFS | 7.78E−03 | TRUE | 1.00E+00 | FALSE | 6.16E−01 | FALSE | 5.58E−01 | FALSE |
| DNN individual OFS | DNN combined OFS + MAP features | 2.50E−01 | FALSE | 6.54E−40 | TRUE | 1.67E−38 | TRUE | 7.99E−13 | TRUE |
| DNN individual OFS | DNN combined RFS | 1.34E−01 | FALSE | 2.74E−51 | TRUE | 2.46E−47 | TRUE | 9.38E−28 | TRUE |
| DNN individual RFS | DNN combined OFS | 1.42E−07 | TRUE | 2.76E−05 | TRUE | 1.05E−07 | TRUE | 1.50E−02 | TRUE |
| DNN individual RFS | DNN combined OFS + MAP features | 1.42E−01 | FALSE | 1.93E−18 | TRUE | 2.36E−15 | TRUE | 4.71E−05 | TRUE |
| DNN individual RFS | DNN combined RFS | 2.54E−01 | FALSE | 3.36E−29 | TRUE | 1.21E−23 | TRUE | 2.92E−16 | TRUE |
| DNN individual OFS + MAP features | DNN combined OFS | 1.80E−10 | TRUE | 1.97E−27 | TRUE | 4.81E−31 | TRUE | 4.93E−07 | TRUE |
| DNN individual OFS + MAP features | DNN combined OFS + MAP features | 2.51E−03 | TRUE | 1.28E−02 | TRUE | 4.41E−02 | TRUE | 1.06E−01 | FALSE |
| DNN individual OFS + MAP features | DNN combined RFS | 1.04E−02 | TRUE | 4.93E−07 | TRUE | 2.40E−05 | TRUE | 8.26E−09 | TRUE |

McNemar test p values < 0.05 were considered significant, indicating that the two classifiers have significantly different proportions of errors when classifying acute kidney injury (AKI), reintubation, mortality, or any outcome on the test set (N = 11,996). The comparisons cover logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model type was evaluated with each of three feature sets: the original feature set (OFS), the OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). For the LR and individual DNN models, there is one model per outcome, and the predicted outcome probabilities from these models are stacked to predict any outcome. For the combined models, a single model predicts all three outcomes, and its probabilities are stacked to predict any outcome.
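McNemar's test compares two classifiers on the same test cases using only the discordant pairs, that is, cases where exactly one of the two classifiers is correct. The sketch below is illustrative, not the authors' code: the function names are hypothetical, and because the legend does not say whether the exact binomial form or the continuity-corrected chi-square form was used, both are shown.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(y_true: Sequence[int], pred_a: Sequence[int],
                      pred_b: Sequence[int]) -> Tuple[int, int]:
    """Return (b, c): b = A correct and B wrong, c = A wrong and B correct."""
    b = c = 0
    for label, a, other in zip(y_true, pred_a, pred_b):
        a_correct = a == label
        b_correct = other == label
        if a_correct and not b_correct:
            b += 1
        elif b_correct and not a_correct:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact p value: binomial test of min(b, c) under Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1))
    # Integer true division stays accurate even when 2 ** n is huge.
    return min(1.0, 2 * tail / 2 ** n)


def mcnemar_chi2(b: int, c: int, correction: bool = True) -> float:
    """Asymptotic p value; optional Edwards continuity correction (|b - c| - 1)."""
    if b + c == 0:
        return 1.0
    diff = abs(b - c) - (1 if correction else 0)
    stat = diff ** 2 / (b + c)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


# Toy thresholded predictions (hypothetical), not taken from the study.
y_true = [1, 0, 1, 1, 0]
pred_a = [1, 0, 0, 1, 0]
pred_b = [1, 1, 1, 0, 0]
print(discordant_counts(y_true, pred_a, pred_b))  # (2, 1)
print(mcnemar_exact(10, 0))  # 0.001953125
```

Cases where both models are right or both are wrong do not affect the statistic, so two models with similar overall accuracy can still differ significantly if they make their errors on different patients.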

Bolded results are the smallest p values for the given outcome.
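The bold formatting did not survive in this copy of the table. Assuming "smallest p value for the given outcome" means the minimum within each panel's outcome column, the sketch below recovers those entries for panel b and recomputes the p < 0.05 flags. The helper names are hypothetical. The table prints exponents with the Unicode minus sign (U+2212), which Python's `float()` does not accept, so it is replaced first.

```python
# Panel b p values copied from the table (columns: AKI, reintubation, mortality, any outcome).
OUTCOMES = ["AKI", "Reintubation", "Mortality", "Any outcome"]
PANEL_B = {
    ("DNN individual OFS", "DNN combined OFS"): ["7.78E−03", "1.00E+00", "6.16E−01", "5.58E−01"],
    ("DNN individual OFS", "DNN combined OFS + MAP features"): ["2.50E−01", "6.54E−40", "1.67E−38", "7.99E−13"],
    ("DNN individual OFS", "DNN combined RFS"): ["1.34E−01", "2.74E−51", "2.46E−47", "9.38E−28"],
    ("DNN individual RFS", "DNN combined OFS"): ["1.42E−07", "2.76E−05", "1.05E−07", "1.50E−02"],
    ("DNN individual RFS", "DNN combined OFS + MAP features"): ["1.42E−01", "1.93E−18", "2.36E−15", "4.71E−05"],
    ("DNN individual RFS", "DNN combined RFS"): ["2.54E−01", "3.36E−29", "1.21E−23", "2.92E−16"],
    ("DNN individual OFS + MAP features", "DNN combined OFS"): ["1.80E−10", "1.97E−27", "4.81E−31", "4.93E−07"],
    ("DNN individual OFS + MAP features", "DNN combined OFS + MAP features"): ["2.51E−03", "1.28E−02", "4.41E−02", "1.06E−01"],
    ("DNN individual OFS + MAP features", "DNN combined RFS"): ["1.04E−02", "4.93E−07", "2.40E−05", "8.26E−09"],
}


def parse_p(text: str) -> float:
    """Parse a table p value, converting U+2212 to an ASCII hyphen-minus."""
    return float(text.replace("\u2212", "-"))


def smallest_p_by_outcome(panel):
    """Map each outcome to (model pair, p value) with the smallest p in that column."""
    result = {}
    for j, outcome in enumerate(OUTCOMES):
        pair, values = min(panel.items(), key=lambda item: parse_p(item[1][j]))
        result[outcome] = (pair, values[j])
    return result


def significant_counts(panel, alpha=0.05):
    """Recompute the 'p < 0.05' column: number of significant comparisons per outcome."""
    return {outcome: sum(parse_p(values[j]) < alpha for values in panel.values())
            for j, outcome in enumerate(OUTCOMES)}


for outcome, (pair, p) in smallest_p_by_outcome(PANEL_B).items():
    print(f"{outcome}: {pair[0]} vs. {pair[1]} (p = {p})")
print(significant_counts(PANEL_B))
```

The recomputed counts (5, 8, 8, and 7 significant comparisons for AKI, reintubation, mortality, and any outcome) match the TRUE entries in panel b.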

An example of how to interpret this table: for classifying any outcome, every LR model differed significantly (p < 0.05) from every DNN model in panel a except for the pair LR OFS + MAP and DNN individual OFS. The LR model with the best F1 score was LR OFS (F1 score 0.504, sensitivity 0.542, specificity 0.941, precision 0.471), and the DNN model with the best F1 score was DNN individual OFS + MAP (F1 score 0.482, sensitivity 0.584, specificity 0.918, precision 0.41).
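The reported F1 scores can be checked against the reported precision and sensitivity, since F1 is their harmonic mean. A minimal sketch (the function name is hypothetical):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall (recall = sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and sensitivity values reported for the best LR and DNN models.
lr_ofs = f1_from_precision_recall(precision=0.471, recall=0.542)
dnn_individual_ofs_map = f1_from_precision_recall(precision=0.41, recall=0.584)
print(round(lr_ofs, 3), round(dnn_individual_ofs_map, 3))  # 0.504 0.482
```

Because F1 has no true-negative term, specificity does not enter the calculation. DNN individual OFS + MAP has the higher sensitivity, but its lower precision gives it the lower F1 score.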

^a AKI labels were available for only 4307 of the test patients, so all AKI results are computed on those patients.