Table 1.
Model | Size (n) | Precision | Recall | F1 | AUROC | NDCG |
---|---|---|---|---|---|---|
HL | 106 | 10.0 [1.3, 18.7] | 20.0 [5.4, 34.6] | 12.8 [2.5, 23.1] | 85.4 [80.8, 90.0] | 40.6 [36.4, 44.9] |
HL + Aug. | 106 | 30.7 [20.8, 40.6] | 53.3 [38.7, 68.0] | 37.8 [27.7, 47.9] | 83.4 [79.5, 87.3] | 55.7 [51.5, 59.9] |
WS | 4239 | **83.3** [64.5, 100.0] | 53.3 [38.7, 68.0] | 60.8 [50.6, 71.0] | 91.4 [87.8, 95.0] | 84.5 [81.1, 88.0] |
WS + Aug. | 4239 | 70.0 [55.4, 84.6] | **60.0** [48.1, 72.0] | **61.4** [55.3, 67.5] | **94.4** [91.3, 97.6] | **87.3** [83.6, 91.0] |
WS denotes weak supervision models, HL denotes hand-labeled models, and Aug. denotes augmentation. Scores are reported with 95% confidence intervals (n given in the Size column), and bold text indicates the best score for each metric.
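The caption does not state how the 95% confidence intervals are computed; a common choice for per-model metric intervals is the percentile bootstrap, sketched below. The resampling procedure, function names, and example labels here are illustrative assumptions, not the paper's stated method.

```python
import numpy as np

def f1_score(y_true, y_pred):
    # F1 = 2 * precision * recall / (precision + recall),
    # computed from true positives, false positives, false negatives.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(y_true, y_pred, metric, n_boot=10000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample (label, prediction) pairs with
    # replacement, recompute the metric on each resample, and take the
    # alpha/2 and 1 - alpha/2 quantiles as the interval endpoints.
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = [
        metric(y_true[idx], y_pred[idx])
        for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))
    ]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), lo, hi
```

Usage: `point, lo, hi = bootstrap_ci(y_true, y_pred, f1_score)` yields the point estimate and interval endpoints, which correspond to the `score [lo, hi]` entries in the table (after scaling to percentages).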