2020 Apr 20;3:58. doi: 10.1038/s41746-020-0248-0

Table 3.

Best threshold chosen by highest F1 score.

AKI^a
Score	Threshold	F1 score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	TN	FP	FN	TP	Accuracy (%)
ASA	3	0.412 (0.393–0.43)	0.914 (0.896–0.93)	0.27 (0.255–0.284)	0.266 (0.251–0.281)	901	2439	83	884	41.4
LR OFS	0.273071	0.538 (0.512–0.563)	0.631 (0.597–0.661)	0.793 (0.78–0.807)	0.469 (0.442–0.497)	2650	690	357	610	75.7
LR OFS + MAP features	0.27574	0.537 (0.512–0.563)	0.624 (0.59–0.654)	0.798 (0.785–0.812)	0.472 (0.444–0.5)	2666	674	364	603	75.9
LR RFS	0.287606	0.537 (0.51–0.563)	0.607 (0.575–0.637)	0.811 (0.798–0.823)	0.482 (0.454–0.511)	2708	632	380	587	76.5
DNN individual OFS	0.408436	0.545 (0.52–0.569)	0.654 (0.622–0.682)	0.784 (0.77–0.798)	0.467 (0.441–0.493)	2618	722	335	632	75.5
DNN individual OFS + MAP features	0.481765	0.559 (0.533–0.587)	0.548 (0.515–0.579)	0.881 (0.87–0.892)	0.571 (0.542–0.603)	2942	398	437	530	80.6
DNN individual RFS	0.406397	0.542 (0.516–0.568)	0.618 (0.586–0.648)	0.808 (0.794–0.821)	0.483 (0.455–0.51)	2699	641	369	598	76.5
DNN combined OFS	0.906036	0.548 (0.521–0.575)	0.568 (0.536–0.598)	0.854 (0.843–0.865)	0.53 (0.501–0.559)	2853	487	418	549	79.0
DNN combined OFS + MAP features	0.901522	0.549 (0.524–0.575)	0.58 (0.55–0.61)	0.846 (0.833–0.857)	0.521 (0.493–0.552)	2825	515	406	561	78.6
DNN combined RFS	0.869984	0.557 (0.53–0.583)	0.575 (0.543–0.606)	0.858 (0.846–0.87)	0.539 (0.51–0.569)	2865	475	411	556	79.4

Reintubation
Score	Threshold	F1 score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	TN	FP	FN	TP	Accuracy (%)
ASA	4	0.152 (0.121–0.182)	0.44 (0.361–0.517)	0.941 (0.937–0.945)	0.092 (0.072–0.112)	11,142	695	89	70	93.5
LR OFS	0.08	0.21 (0.157–0.261)	0.296 (0.223–0.366)	0.98 (0.977–0.982)	0.163 (0.121–0.207)	11,595	242	112	47	97.0
LR OFS + MAP features	0.081	0.223 (0.168–0.276)	0.314 (0.24–0.389)	0.98 (0.977–0.982)	0.172 (0.129–0.22)	11,597	240	109	50	97.1
LR RFS	0.079193	0.211 (0.161–0.262)	0.302 (0.231–0.375)	0.979 (0.977–0.982)	0.163 (0.121–0.207)	11,590	247	111	48	97.0
DNN individual OFS	0.715748	0.21 (0.16–0.257)	0.333 (0.257–0.406)	0.975 (0.972–0.978)	0.153 (0.115–0.192)	11,544	293	106	53	96.7
DNN individual OFS + MAP features	0.734977	0.197 (0.149–0.243)	0.321 (0.247–0.397)	0.974 (0.971–0.977)	0.142 (0.104–0.179)	11,530	307	108	51	96.5
DNN individual RFS	0.687943	0.22 (0.17–0.269)	0.371 (0.297–0.445)	0.973 (0.97–0.976)	0.156 (0.117–0.196)	11,518	319	100	59	96.5
DNN combined OFS	0.769994	0.206 (0.164–0.252)	0.352 (0.284–0.428)	0.972 (0.969–0.975)	0.145 (0.113–0.181)	11,508	329	103	56	96.4
DNN combined OFS + MAP features	0.784518	0.228 (0.179–0.278)	0.34 (0.271–0.414)	0.978 (0.975–0.981)	0.171 (0.131–0.215)	11,576	261	105	54	96.9
DNN combined RFS	0.746933	0.213 (0.166–0.263)	0.289 (0.221–0.36)	0.981 (0.978–0.983)	0.168 (0.128–0.214)	11,610	227	113	46	97.2

Mortality
Score	Threshold	F1 score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	TN	FP	FN	TP	Accuracy (%)
ASA	5	0.239 (0.138–0.356)	0.161 (0.088–0.253)	0.999 (0.998–0.999)	0.467 (0.3–0.667)	11,893	16	73	14	99.3
LR OFS	0.194	0.306 (0.208–0.402)	0.253 (0.167–0.346)	0.997 (0.996–0.998)	0.386 (0.265–0.516)	11,874	35	65	22	99.2
LR OFS + MAP features	0.203	0.306 (0.212–0.4)	0.253 (0.17–0.345)	0.997 (0.996–0.998)	0.386 (0.267–0.519)	11,874	35	65	22	99.2
LR RFS	0.135	0.287 (0.196–0.375)	0.299 (0.202–0.404)	0.994 (0.993–0.996)	0.277 (0.187–0.372)	11,841	68	61	26	98.9
DNN individual OFS	0.59	0.294 (0.202–0.389)	0.276 (0.188–0.383)	0.996 (0.994–0.997)	0.316 (0.215–0.429)	11,857	52	63	24	99.0
DNN individual OFS + MAP features	0.587	0.268 (0.181–0.36)	0.253 (0.167–0.356)	0.995 (0.994–0.996)	0.286 (0.192–0.391)	11,854	55	65	22	99.0
DNN individual RFS	0.55	0.278 (0.204–0.357)	0.368 (0.276–0.474)	0.991 (0.989–0.992)	0.224 (0.16–0.291)	11,798	111	55	32	98.6
DNN combined OFS	0.950117	0.271 (0.175–0.367)	0.218 (0.136–0.312)	0.997 (0.996–0.998)	0.358 (0.231–0.482)	11,875	34	68	19	99.1
DNN combined OFS + MAP features	0.975254	0.239 (0.138–0.344)	0.161 (0.089–0.244)	0.999 (0.998–0.999)	0.467 (0.294–0.64)	11,893	16	73	14	99.3
DNN combined RFS	0.868749	0.267 (0.183–0.346)	0.299 (0.205–0.393)	0.993 (0.992–0.995)	0.241 (0.164–0.325)	11,827	82	61	26	98.8

Any outcome
Score	Threshold	F1 score (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	TN	FP	FN	TP	Accuracy (%)
ASA	4	0.36 (0.335–0.387)	0.309 (0.283–0.337)	0.96 (0.957–0.964)	0.431 (0.399–0.468)	10,494	435	737	330	90.2
LR OFS	0.122592	0.504 (0.48–0.529)	0.542 (0.513–0.572)	0.941 (0.936–0.945)	0.471 (0.445–0.498)	10,280	649	489	578	90.5
LR OFS + MAP features	0.12059	0.503 (0.48–0.53)	0.549 (0.521–0.58)	0.938 (0.934–0.943)	0.465 (0.439–0.492)	10,254	675	481	586	90.4
LR RFS	0.124499	0.503 (0.479–0.529)	0.532 (0.505–0.563)	0.943 (0.939–0.947)	0.477 (0.449–0.504)	10,305	624	499	568	90.6
DNN individual OFS	0.411454	0.479 (0.455–0.504)	0.515 (0.487–0.545)	0.938 (0.934–0.942)	0.448 (0.422–0.475)	10,252	677	518	549	90.0
DNN individual OFS + MAP features	0.395795	0.482 (0.46–0.506)	0.584 (0.555–0.616)	0.918 (0.913–0.923)	0.41 (0.386–0.434)	10,033	896	444	623	88.8
DNN individual RFS	0.402621	0.473 (0.449–0.498)	0.535 (0.508–0.567)	0.929 (0.924–0.934)	0.424 (0.399–0.452)	10,153	776	496	571	89.4
DNN combined OFS	0.710049	0.47 (0.445–0.496)	0.503 (0.475–0.534)	0.938 (0.934–0.942)	0.441 (0.412–0.47)	10,249	680	530	537	89.9
DNN combined OFS + MAP features	0.678431	0.475 (0.452–0.5)	0.587 (0.558–0.616)	0.914 (0.909–0.919)	0.399 (0.376–0.424)	9988	941	441	626	88.5
DNN combined RFS	0.632316	0.446 (0.423–0.469)	0.565 (0.535–0.595)	0.905 (0.9–0.911)	0.368 (0.345–0.39)	9894	1035	464	603	87.5

Comparison of F1 score, sensitivity, and specificity at the best thresholds for acute kidney injury (AKI), reintubation, mortality, and any outcome, with 95% CIs, on the test set (N = 11,996) for the ASA score, logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model was also evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). For the LR and DNN individual models, there is one model per outcome, and the predicted outcome probabilities from these models are stacked to predict any outcome. For the DNN combined models, a single model predicts all three outcomes, and its probabilities are stacked to predict any outcome.
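As an illustration of what "best threshold chosen by highest F1 score" can mean in practice, the following is a minimal sketch rather than the authors' procedure. It assumes that a case is predicted positive when its score is greater than or equal to the threshold, and that candidate thresholds are the distinct predicted scores; neither detail is stated in this table, nor is the data split on which the threshold was selected. The function names and the toy data are hypothetical.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when a case is called positive if its score >= threshold (assumed rule)."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob):
    """Scan every distinct predicted score and keep the one with the highest F1.

    Ties are resolved in favor of the lowest threshold (an arbitrary choice).
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(y_prob)):
        f1 = f1_at_threshold(y_true, y_prob, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data, made up for illustration only (not from the study).
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
threshold, f1 = best_f1_threshold(y_true, y_prob)
print(threshold, round(f1, 3))
```

Because F1 ignores true negatives, a threshold chosen this way can trade specificity for sensitivity quite differently from one chosen by accuracy, which is consistent with the wide spread of sensitivity and specificity values across rows of the table.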

^aAKI labels were available for only 4307 of the test patients, so all AKI results are computed on those patients. The best F1 scores for the logistic regression and DNN models are shown in bold.
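The point estimates in each row follow from the four confusion-matrix counts by the standard definitions, which can be checked with a short sketch. The helper name `metrics_from_counts` is hypothetical; the counts are copied from the ASA rows of the AKI and reintubation sub-tables. The 95% CIs are not reproduced here, since the table does not say how they were obtained.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Standard classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "n": total,
        "sensitivity": tp / (tp + fn),      # recall on positive cases
        "specificity": tn / (tn + fp),      # recall on negative cases
        "precision": tp / (tp + fp),        # positive predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and sensitivity
        "accuracy_pct": 100 * (tn + tp) / total,
    }


# Counts copied from the ASA rows of the table.
aki_asa = metrics_from_counts(tn=901, fp=2439, fn=83, tp=884)
reintubation_asa = metrics_from_counts(tn=11142, fp=695, fn=89, tp=70)

for name, m in [("AKI / ASA", aki_asa), ("Reintubation / ASA", reintubation_asa)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The AKI counts sum to 4307, the labeled subset described in the footnote, while the reintubation counts sum to the full test set of 11,996.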