2023 Aug 28;68(17):175031. doi: 10.1088/1361-6560/acef8f

Table 3.

Inter-reader variability (IRV) of lesion matching versus the performance of the automated lesion matching method (Auto), by disease burden. Cases were divided into three disease-burden cohorts: low (<10 lesions), intermediate (10–29 lesions), and high (30 or more lesions). Data are reported as median (range). P-values are from Wilcoxon paired tests for differences between IRV and automated matching performance.

| Cohort | Method | Precision | Recall | F1 score | N_d |
|---|---|---|---|---|---|
| Low burden (N = 14) | IRV | 1.00 (0.25, 1.00) | 1.00 (0.50, 1.00) | 1.00 (0.40, 1.00) | 0 (0, 3) |
| | Auto | 1.00 (0.50, 1.00) | 1.00 (0.33, 1.00) | 1.00 (0.40, 1.00) | 0 (0, 3) |
| | p | 0.85 | 0.41 | 1.00 | 1.00 |
| Intermediate burden (N = 17) | IRV | 1.00 (0.80, 1.00) | 1.00 (0.77, 1.00) | 1.00 (0.79, 1.00) | 0 (0, 7) |
| | Auto | 0.92 (0.59, 1.00) | 0.91 (0.71, 1.00) | 0.91 (0.65, 1.00) | 2 (0, 14) |
| | p | 0.17 | 0.27 | 0.17 | 0.11 |
| High burden (N = 9) | IRV | 0.95 (0.69, 1.00) | 0.91 (0.74, 1.00) | 0.93 (0.72, 1.00) | 5 (0, 58) |
| | Auto | 0.91 (0.66, 1.00) | 0.86 (0.60, 1.00) | 0.89 (0.63, 1.00) | 17 (0, 59) |
| | p | 0.12 | 0.26 | 0.26 | 0.18 |
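
For readers unfamiliar with the three scores in the table, the following is a minimal sketch of how precision, recall, and F1 relate to one another for a single case. It assumes the standard definitions based on counts of true-positive, false-positive, and false-negative lesion matches; how the study assigns matches to those categories is not stated in this table, so the function name `matching_scores`, its parameters, the zero-denominator convention, and the example counts are all hypothetical.

```python
def matching_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from match counts for one case.

    tp: matches that agree with the reference matching (assumed definition)
    fp: matches absent from the reference matching
    fn: reference matches that were missed
    """
    # Assumption: a score with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only; not taken from the study.
p, r, f1 = matching_scores(tp=9, fp=1, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall for a given case and is pulled toward the lower of the two. The table reports medians across cases, so a median F1 need not lie between the median precision and median recall of the same row.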