Table 1.
Precision, recall, and F1 score of ClinicNet, institutional order sets, and logistic regression when thresholded to similar levels of recall
| Models | Precision (95% CI) | Recall (95% CI) | F1 (95% CI) | AUROC (95% CI) |
|---|---|---|---|---|
| Logistic | 0.204 (0.200–0.208) | 0.469 (0.464–0.473) | 0.285 (0.280–0.289) | 0.815 (0.812–0.817) |
| Institutional | 0.149 (0.147–0.151) | 0.463 (0.458–0.469) | 0.226 (0.223–0.228) | |
| ClinicNet | 0.317 (0.314–0.320) | 0.468 (0.463–0.472) | 0.378 (0.375–0.381) | 0.908 (0.906–0.909) |
Note: As institutional order sets consist of a single threshold point, AUROC is left blank. Metrics were bootstrapped with a sample size of 10 000 for 1000 iterations to get reported CIs. Evaluation was performed at the patient-level rather than the clinical item-level. The following thresholds were used: Logistic regression = 0.11, ClinicNet = 0.50. Bold indicates highest metric.
Abbreviations: AUROC: area under the receiver operating characteristic curve; CI: confidence interval.
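The bootstrap procedure described in the note can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that metrics are computed per record and then averaged, that CIs come from the percentile method, and that inputs are binary record-by-item matrices. The function names (`per_record_metrics`, `bootstrap_ci`) and the synthetic data are hypothetical.

```python
import numpy as np

def per_record_metrics(y_true, y_pred):
    """Precision, recall, and F1 for each row (record) of two binary matrices."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = (y_true & y_pred).sum(axis=1).astype(float)
    n_pred = y_pred.sum(axis=1)
    n_true = y_true.sum(axis=1)
    # Undefined ratios (no predictions or no true items) are set to 0 (assumption).
    precision = np.divide(tp, n_pred, out=np.zeros_like(tp), where=n_pred > 0)
    recall = np.divide(tp, n_true, out=np.zeros_like(tp), where=n_true > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return np.column_stack([precision, recall, f1])

def bootstrap_ci(metrics, sample_size=10_000, n_iter=1000, alpha=0.05, seed=0):
    """Mean and percentile CI of each metric column over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = metrics.shape[0]
    means = np.empty((n_iter, metrics.shape[1]))
    for i in range(n_iter):
        idx = rng.integers(0, n, size=sample_size)  # sample rows with replacement
        means[i] = metrics[idx].mean(axis=0)
    lower, upper = np.percentile(
        means, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return means.mean(axis=0), lower, upper

# Synthetic stand-in data: 500 records, 20 candidate items each.
rng = np.random.default_rng(1)
y_true = rng.random((500, 20)) < 0.2
scores = y_true * 0.4 + rng.random((500, 20)) * 0.6
y_pred = scores >= 0.50  # threshold reported for ClinicNet in the note

mean, lower, upper = bootstrap_ci(per_record_metrics(y_true, y_pred))
for name, m, lo, hi in zip(["Precision", "Recall", "F1"], mean, lower, upper):
    print(f"{name}: {m:.3f} ({lo:.3f}-{hi:.3f})")
```

The tabulated F1 values agree with the harmonic mean of the tabulated precision and recall to within about 0.001. For ClinicNet, 2(0.317)(0.468)/(0.317 + 0.468) ≈ 0.378.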