2020 Oct 27;20:276. doi: 10.1186/s12911-020-01284-x

Table 2.

Comparison of performance metrics for the MLA and standard scoring systems at the time of severe sepsis onset

	MLA ≥ 0.029 DAD training	MLA ≥ 0.030 DAD testing	MLA ≥ 0.017 CHH external validation	MEWS ≥ 2 DAD testing	SOFA ≥ 2 DAD testing	SIRS ≥ 1 DAD testing
AUROC (SD)	0.931 (0.01)	0.930 (0.01)	0.948 (0.01)	0.725	0.716	0.655
P value (MLA vs comparator)	–	–	–	P < 0.001	P < 0.001	P < 0.001
Sensitivity	0.800	0.800	0.800	0.845	0.750	0.868
Specificity	0.926	0.933	0.921	0.444	0.554	0.334
Accuracy	0.923	0.929	0.920	0.608	0.645	0.646
DOR	53.105	56.508	47.532	4.358	3.720	3.290
LR+	11.411	12.110	10.306	1.521	1.680	1.303
LR−	0.216	0.215	0.217	0.349	0.452	0.396

Detailed performance metrics for the machine learning algorithm (MLA) and rules-based scoring systems at the time of severe sepsis onset. The Dascena Analysis Dataset (DAD) was used for training and testing, and the Cabell Huntington Hospital (CHH) dataset was used for external validation. The score threshold reported for the MLA is the average over rounds of ten-fold cross-validation. AUROC values for the MLA and each comparator were compared using two-sample t-tests at the 95% confidence level. AUROC: area under the receiver operating characteristic; SD: standard deviation; MEWS: Modified Early Warning Score; SOFA: Sequential Organ Failure Assessment; SIRS: Systemic Inflammatory Response Syndrome; DOR: diagnostic odds ratio; LR: likelihood ratio
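
The likelihood ratios and diagnostic odds ratio in the table follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and DOR = LR+ / LR−. The sketch below is an illustration only, not the authors' analysis code; the names `DiagnosticRatios` and `diagnostic_ratios` are hypothetical. Because the tabulated sensitivity and specificity are already rounded to three decimals, recomputed values may differ from the table in the last digit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticRatios:
    """Derived metrics for a binary test (hypothetical helper type)."""
    lr_positive: float  # LR+ = sensitivity / (1 - specificity)
    lr_negative: float  # LR- = (1 - sensitivity) / specificity
    dor: float          # DOR = LR+ / LR-


def diagnostic_ratios(sensitivity: float, specificity: float) -> DiagnosticRatios:
    """Compute LR+, LR- and DOR from sensitivity and specificity.

    Both inputs must lie strictly between 0 and 1; at the boundaries
    one of the ratios is undefined (division by zero).
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return DiagnosticRatios(lr_pos, lr_neg, lr_pos / lr_neg)


# Rounded MEWS >= 2 values from the table (DAD testing column).
mews = diagnostic_ratios(sensitivity=0.845, specificity=0.444)
print(f"LR+ = {mews.lr_positive:.3f}, LR- = {mews.lr_negative:.3f}, DOR = {mews.dor:.3f}")
```

Applied to the MEWS column, the recomputed values agree with the reported LR+ of 1.521, LR− of 0.349 and DOR of 4.358 to within rounding error. The MLA columns are more sensitive to rounding, since a specificity near 0.93 makes the denominator of LR+ small.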