2021 May 26;10:156. doi: 10.1186/s13643-021-01700-x

Table 2.

Evaluation of the workflow performance using the recommended practice as the reference standard

Case study	Type of analysis	Precision (%)	Sensitivity^* (%)	F1-score (%)	Specificity (%)	Accuracy (%)	N_WF, N_S, ΔN (# of eligible abstracts)	# missed studies^•	Workload reduction^♦ (%)	Hours^♣ saved
SR: diabetes (14,314 abstracts)	Main analysis^a	71	88	79	99.3	98	655, 743, 88	0	63	91 h
	SA: k-NN₂ = 25	64	94	76	99.7	97	700, 743, 43	0	49	70 h
	SA: r = 300	70	89	78	99.4	97	660, 743, 83	0	62	89 h
	SA: k-NN₁ = 15	68	89	77	99.4	97	664, 743, 79	0	61	88 h
	SA: ϕ = 80%	72	88	79	99.3	98	653, 743, 90	0	63	91 h
	SA: ϕ = 90%	68	88	76	99.3	97	653, 743, 90	0	64	91 h
	SA: 2 distance measures^b	77	84	80	99.1	98	623, 743, 120	0	74	105 h
Scoping: KS methods (17,200 abstracts)	Main analysis^a	72	89	79	99.3	97	852, 957, 105	6	55	95 h
	SA: k-NN₂ = 25	65	95	77	99.7	97	907, 957, 50	3	39	68 h
	SA: r = 300	72	90	80	99.4	97	858, 957, 99	5	54	92 h
	SA: k-NN₁ = 15	73	89	80	99.4	98	853, 957, 104	5	54	94 h
	SA: ϕ = 80%	72	89	79	99.4	98	847, 957, 110	8	55	95 h
	SA: ϕ = 90%	73	88	80	99.3	98	842, 957, 115	8	56	96 h
	SA: 2 distance measures^b	79	82	80	98.9	98	785, 957, 172	17	70	119 h

^*Sensitivity, also called recall. Results of the sensitivity analyses are listed in order of decreasing workflow sensitivity.
^•Number of studies missed because full-text screening is applied to the N_WF eligible abstracts instead of the N_S eligible abstracts.
^♦Workload reduction: the number of abstracts saved with the workflow, relative to the recommended practice of having 2 reviewers screen all abstracts.
^♣Person-hours, tallied across reviewers.
^aThe main analysis used distance definitions from three feature representations (SVD-based, LDA-based, and word-embedding features), a threshold ϕ = 70%, k-NN₁ = 8 nearest neighbors for phase 1, k-NN₂ = 15 nearest neighbors for phase 2, and an initial sample size r = 600 (Table A1).
^bThis sensitivity analysis used 2 distance measures, from the SVD-based and word-embedding-based features.
N_WF: number of eligible abstracts identified by the workflow. N_S: number of eligible abstracts identified via screening by 2 human reviewers (recommended practice). ΔN: number of eligible studies missed by the workflow, N_S − N_WF.
SR: systematic review. SS: scoping review. KS: knowledge synthesis. SA: sensitivity analysis. NN: nearest neighbors. SVD: singular value decomposition. LDA: latent Dirichlet allocation.
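
As an illustration of how the count columns relate to the reported sensitivity, the following minimal sketch recomputes ΔN and sensitivity from N_WF and N_S for the two main-analysis rows. It assumes that N_WF counts only abstracts that the human reviewers also judged eligible (true positives), so that sensitivity can be approximated as N_WF / N_S; this is an inference from the footnote definitions, not a formula stated in the table. The function and variable names are hypothetical.

```python
def sensitivity_pct(n_wf: int, n_s: int) -> float:
    """Sensitivity (recall) in percent, ASSUMING N_WF counts true positives only."""
    return 100.0 * n_wf / n_s


def missed_abstracts(n_wf: int, n_s: int) -> int:
    """Delta N as defined in the footnote: N_S - N_WF."""
    return n_s - n_wf


# (N_WF, N_S) copied from the main-analysis rows of Table 2.
MAIN_ROWS = {
    "SR diabetes": (655, 743),
    "Scoping KS methods": (852, 957),
}


def summarize(rows: dict) -> dict:
    """Map each case study to (rounded sensitivity in %, Delta N)."""
    return {
        name: (round(sensitivity_pct(n_wf, n_s)), missed_abstracts(n_wf, n_s))
        for name, (n_wf, n_s) in rows.items()
    }


if __name__ == "__main__":
    for name, (sens, delta) in summarize(MAIN_ROWS).items():
        print(f"{name}: sensitivity ~ {sens}%, missed abstracts = {delta}")
    # SR diabetes: sensitivity ~ 88%, missed abstracts = 88
    # Scoping KS methods: sensitivity ~ 89%, missed abstracts = 105
```

Under this assumption, the recomputed values agree with the sensitivity and ΔN entries reported for both main analyses.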