Nat Commun. 2019 Jul 15;10:3111. doi: 10.1038/s41467-019-11012-3

Table 1. Best weak supervision vs. hand-labeled models

| Model | Size | Precision | Recall | F1 | AUROC | NDCG |
|---|---|---|---|---|---|---|
| HL | 106 | 10.0 [1.3, 18.7] | 20.0 [5.4, 34.6] | 12.8 [2.5, 23.1] | 85.4 [80.8, 90.0] | 40.6 [36.4, 44.9] |
| HL + Aug. | 106 | 30.7 [20.8, 40.6] | 53.3 [38.7, 68.0] | 37.8 [27.7, 47.9] | 83.4 [79.5, 87.3] | 55.7 [51.5, 59.9] |
| WS | 4239 | **83.3 [64.5, 100.0]** | 53.3 [38.7, 68.0] | 60.8 [50.6, 71.0] | 91.4 [87.8, 95.0] | 84.5 [81.1, 88.0] |
| WS + Aug. | 4239 | 70.0 [55.4, 84.6] | **60.0 [48.1, 72.0]** | **61.4 [55.3, 67.5]** | **94.4 [91.3, 97.6]** | **87.3 [83.6, 91.0]** |

WS indicates weak supervision models, HL indicates hand-labeled models, and Aug. indicates data augmentation. Scores are reported with 95% confidence intervals, where n is the value in the Size column; bold text indicates the best performance for each metric.
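
The bracketed intervals in Table 1 are 95% confidence intervals around each percentage score. The excerpt does not state the exact CI method the authors used, so the sketch below is only illustrative: it assumes a normal (Wald) approximation with the Size column as n, and `metric_with_ci` plus the random labels are hypothetical names and data introduced here for demonstration.

```python
# Minimal sketch (not the authors' code) of reporting a classification
# metric with a 95% confidence interval, as in Table 1. The normal
# approximation is an assumption; it is only a rough approximation for
# non-binomial scores such as F1.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

def metric_with_ci(metric_fn, y_true, y_pred, n, z=1.96):
    """Return (score, lower, upper) as percentages, where n is the
    evaluation-set size (the 'Size' column in Table 1)."""
    p = metric_fn(y_true, y_pred)
    half_width = z * np.sqrt(p * (1 - p) / n)  # Wald interval half-width
    return 100 * p, 100 * max(p - half_width, 0.0), 100 * min(p + half_width, 1.0)

# Hypothetical labels for illustration only; the real evaluation data
# is not part of this excerpt.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=106)
y_pred = rng.integers(0, 2, size=106)

for name, fn in [("Precision", precision_score),
                 ("Recall", recall_score),
                 ("F1", f1_score)]:
    score, lo, hi = metric_with_ci(fn, y_true, y_pred, n=len(y_true))
    print(f"{name}: {score:.1f} [{lo:.1f}, {hi:.1f}]")
```

A bootstrap over the evaluation set would be a common alternative when the normal approximation is questionable, e.g. for small n such as the 106-example hand-labeled setting.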