Table 2.
Evaluation of the workflow performance using the recommended practice as the reference standard
Case study | Type of analysis | Precision (%) | Sensitivity* (%) | F1-score (%) | Specificity (%) | Accuracy (%) | NWF, NS, ΔN (# of eligible abstracts) | # missed studies• | Workload reduction♦ % | Hours♣ saved |
---|---|---|---|---|---|---|---|---|---|---|
SR: diabetes (14,314 abstracts) | Main analysis^a | 71 | 88 | 79 | 99.3 | 98 | 655, 743, 88 | 0 | 63% | 91 h |
SR: diabetes (14,314 abstracts) | SA: k-NN2 = 25 | 64 | 94 | 76 | 99.7 | 97 | 700, 743, 43 | 0 | 49% | 70 h |
SR: diabetes (14,314 abstracts) | SA: r = 300 | 70 | 89 | 78 | 99.4 | 97 | 660, 743, 83 | 0 | 62% | 89 h |
SR: diabetes (14,314 abstracts) | SA: k-NN1 = 15 | 68 | 89 | 77 | 99.4 | 97 | 664, 743, 79 | 0 | 61% | 88 h |
SR: diabetes (14,314 abstracts) | SA: ϕ = 80% | 72 | 88 | 79 | 99.3 | 98 | 653, 743, 90 | 0 | 63% | 91 h |
SR: diabetes (14,314 abstracts) | SA: ϕ = 90% | 68 | 88 | 76 | 99.3 | 97 | 653, 743, 90 | 0 | 64% | 91 h |
SR: diabetes (14,314 abstracts) | SA: 2 distance measures^b | 77 | 84 | 80 | 99.1 | 98 | 623, 743, 120 | 0 | 74% | 105 h |
Scoping: KS methods (17,200 abstracts) | Main analysis^a | 72 | 89 | 79 | 99.3 | 97 | 852, 957, 105 | 6 | 55% | 95 h |
Scoping: KS methods (17,200 abstracts) | SA: k-NN2 = 25 | 65 | 95 | 77 | 99.7 | 97 | 907, 957, 50 | 3 | 39% | 68 h |
Scoping: KS methods (17,200 abstracts) | SA: r = 300 | 72 | 90 | 80 | 99.4 | 97 | 858, 957, 99 | 5 | 54% | 92 h |
Scoping: KS methods (17,200 abstracts) | SA: k-NN1 = 15 | 73 | 89 | 80 | 99.4 | 98 | 853, 957, 104 | 5 | 54% | 94 h |
Scoping: KS methods (17,200 abstracts) | SA: ϕ = 80% | 72 | 89 | 79 | 99.4 | 98 | 847, 957, 110 | 8 | 55% | 95 h |
Scoping: KS methods (17,200 abstracts) | SA: ϕ = 90% | 73 | 88 | 80 | 99.3 | 98 | 842, 957, 115 | 8 | 56% | 96 h |
Scoping: KS methods (17,200 abstracts) | SA: 2 distance measures^b | 79 | 82 | 80 | 98.9 | 98 | 785, 957, 172 | 17 | 70% | 119 h |
*Sensitivity or recall; results of the sensitivity analyses are displayed in order of decreasing sensitivity of the workflow's performance.

NWF: number of eligible abstracts identified by the workflow. NS: number of eligible abstracts identified via screening by 2 human reviewers (recommended practice). ΔN: the number of eligible studies missed by the workflow, NS − NWF.

• The number of studies missed because full-text screening was applied to the NWF eligible abstracts instead of the NS eligible abstracts.

♦ Workload reduction: the number of abstracts saved with the workflow, relative to the recommended practice of screening all abstracts by 2 reviewers.

♣ Person-hours, tallied across reviewers.

^a The main analysis was conducted with distance definitions from three feature representations (SVD-based, LDA-based and word-embedding features), a threshold ϕ = 70%, k = 8 nearest neighbors in phase 1 (k-NN1), k = 15 nearest neighbors in phase 2 (k-NN2), and an initial sample size r = 600 (Table A1).

^b This sensitivity analysis used 2 distance measures, from the SVD-based and word-embedding-based features.

SVD: singular value decomposition. LDA: latent Dirichlet allocation. SR: systematic review. SS: scoping review. KS: knowledge synthesis. SA: sensitivity analysis. NN: nearest neighbors.
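As a rough cross-check of how the columns relate, the sketch below recomputes ΔN, sensitivity and F1-score for the diabetes main-analysis row. It assumes that sensitivity equals NWF / NS (every abstract counted in NWF is treated as a true positive against the two-reviewer reference) and that F1 is the usual harmonic mean of precision and sensitivity; neither formula is stated in the table itself, and the function names are hypothetical.

```python
def delta_n(n_wf: int, n_s: int) -> int:
    """Eligible abstracts missed by the workflow: NS - NWF."""
    return n_s - n_wf


def sensitivity_pct(n_wf: int, n_s: int) -> float:
    """Sensitivity in percent, ASSUMING sensitivity = NWF / NS."""
    return 100.0 * n_wf / n_s


def f1_pct(precision_pct: float, sens_pct: float) -> float:
    """F1 in percent, ASSUMING the harmonic mean of precision and sensitivity."""
    return 2.0 * precision_pct * sens_pct / (precision_pct + sens_pct)


# Diabetes systematic review, main analysis: NWF = 655, NS = 743, precision = 71%.
n_wf, n_s, precision = 655, 743, 71.0

missed = delta_n(n_wf, n_s)
sens = sensitivity_pct(n_wf, n_s)
f1 = f1_pct(precision, sens)

print(f"dN = {missed}, sensitivity = {sens:.1f}%, F1 = {f1:.1f}%")
```

Because the table reports precision rounded to whole percentages, recomputing F1 for other rows from the printed values can differ from the tabulated figure by about one point.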