Table 4.
Retrospective results with respect to per-article AUC, NDCG@20, precision@3, precision@10 and precision@20. For each metric we report the mean and standard deviation (SD) over the 133 articles for which candidate sets were annotated for the respective domains. All sentences not in candidate sets are assumed to be irrelevant; these results are therefore noisy and likely pessimistic. We bold cells corresponding to the best-performing method for each (metric, PICO element) pair.
| Method | Mean AUC (SD) | Mean NDCG@20 (SD) | Precision@3 (SD) | Precision@10 (SD) | Precision@20 (SD) |
|---|---|---|---|---|---|
| Population | | | | | |
| Direct only | 0.904 (0.106) | 0.530 (0.270) | **0.347 (0.298)** | 0.183 (0.126) | 0.116 (0.070) |
| DS | 0.941 (0.063) | 0.484 (0.243) | 0.256 (0.242) | 0.202 (0.126) | 0.129 (0.075) |
| Nguyen | 0.917 (0.091) | 0.537 (0.275) | 0.328 (0.281) | 0.189 (0.128) | 0.117 (0.072) |
| SDS | **0.947 (0.059)** | **0.548 (0.263)** | 0.336 (0.276) | **0.212 (0.133)** | **0.132 (0.076)** |
| Interventions | | | | | |
| Direct only | 0.893 (0.099) | 0.493 (0.265) | 0.397 (0.293) | 0.216 (0.148) | 0.139 (0.086) |
| DS | 0.933 (0.068) | 0.507 (0.239) | 0.344 (0.295) | 0.250 (0.164) | **0.172 (0.099)** |
| Nguyen | 0.921 (0.073) | **0.536 (0.254)** | **0.419 (0.300)** | 0.248 (0.162) | 0.158 (0.097) |
| SDS | **0.936 (0.063)** | 0.530 (0.249) | 0.389 (0.323) | **0.252 (0.164)** | **0.172 (0.099)** |
| Outcomes | | | | | |
| Direct only | 0.837 (0.096) | 0.261 (0.241) | 0.180 (0.244) | 0.114 (0.117) | 0.080 (0.072) |
| DS | 0.896 (0.078) | 0.308 (0.223) | 0.117 (0.203) | 0.148 (0.133) | 0.120 (0.091) |
| Nguyen | 0.870 (0.085) | **0.339 (0.256)** | **0.228 (0.268)** | 0.151 (0.137) | 0.106 (0.084) |
| SDS | **0.900 (0.079)** | 0.333 (0.233) | 0.138 (0.212) | **0.160 (0.134)** | **0.124 (0.092)** |
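
As a rough illustration of how the per-article metrics in Table 4 could be computed, the following is a minimal sketch, not the evaluation code used for these results. It assumes binary relevance labels (1 for relevant, 0 otherwise, with sentences outside the candidate set labelled 0), counts ties in AUC as half a win, and uses the sample standard deviation; all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked sentences that are relevant."""
    return sum(ranked_labels[:k]) / k


def ndcg_at_k(ranked_labels, k):
    """NDCG@k with binary gains and a log2 position discount."""
    def dcg(labels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


def auc(scores, labels):
    """Probability that a relevant sentence outscores an irrelevant one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_article(scores, labels, ks=(3, 10, 20), ndcg_k=20):
    """Compute all metrics for one article from model scores and labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    result = {"auc": auc(scores, labels),
              f"ndcg@{ndcg_k}": ndcg_at_k(ranked, ndcg_k)}
    for k in ks:
        result[f"precision@{k}"] = precision_at_k(ranked, k)
    return result


def summarize(per_article_values):
    """Mean and sample SD of one metric across articles."""
    return mean(per_article_values), stdev(per_article_values)
```

Under these assumptions, `evaluate_article` would be called once per article and method, and `summarize` applied to each metric across the annotated articles to obtain the "mean (SD)" entries.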