2020 Nov 30;3:543405. doi: 10.3389/frai.2020.543405

TABLE 1.

Diagnostic performance for all seven doctors and the Babylon Triage and Diagnostic System (Babylon AI).

	Average recall (%) (95% CI)	Average precision (%) (95% CI)	F1-score (%) (95% CI)	Number of vignettes
Doctor A	80.9	42.9	56.1	47
Doctor B	64.1	36.8	46.7	78
Doctor C	93.8	53.5	68.1	48
Doctor D	84.3	38.1	52.5	51
Doctor E	90.0	33.9	49.2	70
Doctor F	90.2	43.3	58.5	51
Doctor G	84.3	56.5	67.7	51
Doctor average	83.9 (75.6–92.3)	43.6 (36.3–50.9)	57.0 (49.7–64.2)	56.6
Babylon AI	80.0	44.4	57.1	100

The diagnostic performance of the Babylon Triage and Diagnostic System is comparable to that of the doctors in terms of recall, precision (positive predictive value), and F1-score (the harmonic mean of precision and recall), each measured against the disease modeled by the clinical vignette.
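As a rough check on how the F1-score column relates to the other two, the sketch below recomputes the harmonic mean from the rounded recall and precision values in Table 1. The function name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper. Because the inputs are already rounded to one decimal place, a recomputed value can differ from the published one in the last digit (Doctor B, for example, comes out at 46.8 rather than 46.7). Averaging the recomputed per-doctor F1-scores gives about 57.0, which appears consistent with the "Doctor average" row being a mean of per-doctor scores; that reading is an inference from the arithmetic, not something the table states.

```python
def f1_score(recall_pct, precision_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    total = recall_pct + precision_pct
    if total == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision_pct * recall_pct / total


# (recall %, precision %) copied from Table 1; values are already rounded.
table_1 = {
    "Doctor A": (80.9, 42.9),
    "Doctor B": (64.1, 36.8),
    "Doctor C": (93.8, 53.5),
    "Doctor D": (84.3, 38.1),
    "Doctor E": (90.0, 33.9),
    "Doctor F": (90.2, 43.3),
    "Doctor G": (84.3, 56.5),
    "Babylon AI": (80.0, 44.4),
}

# Recompute F1 for every rater from the rounded inputs.
f1_by_rater = {name: f1_score(r, p) for name, (r, p) in table_1.items()}

# Mean of the per-doctor F1 scores (Babylon AI excluded).
doctor_f1 = [f1 for name, f1 in f1_by_rater.items() if name.startswith("Doctor")]
doctor_mean_f1 = sum(doctor_f1) / len(doctor_f1)

for name, f1 in f1_by_rater.items():
    print(f"{name}: {f1:.1f}")
print(f"Doctor mean F1: {doctor_mean_f1:.1f}")
```

Note that the F1 of the averaged recall and precision (83.9 and 43.6) would be about 57.4, so the order of averaging matters when comparing summary rows.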