2021 May 26;10:156. doi: 10.1186/s13643-021-01700-x

Table 2.

Evaluation of the workflow performance using the recommended practice as the reference standard

| Case study | Type of analysis | Precision (%) | Sensitivity* (%) | F1-score (%) | Specificity (%) | Accuracy (%) | NWF, NS, ΔN (# of eligible abstracts) | # missed studies | Workload reduction | Hours saved |
|---|---|---|---|---|---|---|---|---|---|---|
| SR: diabetes (14,314 abstracts) | Main analysis^a | 71 | 88 | 79 | 99.3 | 98 | 655, 743, 88 | 0 | 63% | 91 h |
| | SA: k-NN2 = 25 | 64 | 94 | 76 | 99.7 | 97 | 700, 743, 43 | 0 | 49% | 70 h |
| | SA: r = 300 | 70 | 89 | 78 | 99.4 | 97 | 660, 743, 83 | 0 | 62% | 89 h |
| | SA: k-NN1 = 15 | 68 | 89 | 77 | 99.4 | 97 | 664, 743, 79 | 0 | 61% | 88 h |
| | SA: ϕ = 80% | 72 | 88 | 79 | 99.3 | 98 | 653, 743, 90 | 0 | 63% | 91 h |
| | SA: ϕ = 90% | 68 | 88 | 76 | 99.3 | 97 | 653, 743, 90 | 0 | 64% | 91 h |
| | SA: 2 distance measures^b | 77 | 84 | 80 | 99.1 | 98 | 623, 743, 120 | 0 | 74% | 105 h |
| Scoping: KS methods (17,200 abstracts) | Main analysis^a | 72 | 89 | 79 | 99.3 | 97 | 852, 957, 105 | 6 | 55% | 95 h |
| | SA: k-NN2 = 25 | 65 | 95 | 77 | 99.7 | 97 | 907, 957, 50 | 3 | 39% | 68 h |
| | SA: r = 300 | 72 | 90 | 80 | 99.4 | 97 | 858, 957, 99 | 5 | 54% | 92 h |
| | SA: k-NN1 = 15 | 73 | 89 | 80 | 99.4 | 98 | 853, 957, 104 | 5 | 54% | 94 h |
| | SA: ϕ = 80% | 72 | 89 | 79 | 99.4 | 98 | 847, 957, 110 | 8 | 55% | 95 h |
| | SA: ϕ = 90% | 73 | 88 | 80 | 99.3 | 98 | 842, 957, 115 | 8 | 56% | 96 h |
| | SA: 2 distance measures^b | 79 | 82 | 80 | 98.9 | 98 | 785, 957, 172 | 17 | 70% | 119 h |

*Sensitivity or recall; results of the sensitivity analyses are displayed in decreasing order of the workflow's sensitivity. Hours saved are person-hours tallied across reviewers.

^a The main analysis was conducted with distance definitions from three feature representations (SVD-based, LDA-based and word-embedding features), a threshold ϕ = 70%, k-nearest-neighbor for phase 1 (k-NN1) of 8, k-nearest-neighbor for phase 2 (k-NN2) of 15, and initial sample size r = 600 (Table A1).

^b This sensitivity analysis used 2 distance measures, from the SVD-based and word-embedding-based features.

Abbreviations: SVD, singular value decomposition; LDA, latent Dirichlet allocation; SR, systematic review; SS, scoping review; KS, knowledge synthesis; SA, sensitivity analysis; NN, nearest-neighbors.

Workload reduction: the number of abstracts saved with the workflow, relative to the recommended practice of screening all abstracts by 2 reviewers.

NWF: the number of eligible abstracts identified by the workflow.

NS: the number of eligible abstracts identified via screening by 2 human reviewers (recommended practice).

ΔN: the number of eligible abstracts missed by the workflow, NS − NWF.

# missed studies: the number of studies missed because full-text screening was applied to the NWF eligible abstracts instead of the NS eligible abstracts.
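The footnote definitions can be checked against the reported rows with a small sketch. It assumes the standard definition of the F1-score as the harmonic mean of precision and sensitivity, and it assumes that sensitivity can be approximated as NWF / NS. The table does not state that formula, but the reported values are consistent with it after rounding. The function names are hypothetical and chosen only for illustration.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (both given in %)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def missed_abstracts(n_s, n_wf):
    """Delta N: eligible abstracts found by 2 reviewers but not by the workflow."""
    return n_s - n_wf


def sensitivity_pct(n_wf, n_s):
    """Assumed relation: share of reference-standard eligible abstracts recovered."""
    return 100 * n_wf / n_s


# Main analysis row of the SR (diabetes) case study, values taken from Table 2.
precision, sensitivity = 71, 88
n_wf, n_s = 655, 743

print(round(f1_score(precision, sensitivity)))  # 79
print(missed_abstracts(n_s, n_wf))              # 88
print(round(sensitivity_pct(n_wf, n_s)))        # 88
```

Applying the same functions to the other rows gives results that agree with the reported F1-score and ΔN columns after rounding to whole percentages.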