2023 Aug 28;68(17):175031. doi: 10.1088/1361-6560/acef8f

Table 2.

Inter-reader variability (IRV) of lesion matching versus the performance of the automated lesion matching method (Auto), by disease cohort. Data are reported as median (range). P-values are from Wilcoxon paired tests for differences between IRV and automated matching performance.

        Precision           Recall              F1 score            N_d
NSCLC (N = 10)
  IRV   0.97 (0.67, 1.00)   1.00 (0.50, 1.00)   0.98 (0.57, 1.00)   0.5 (0, 4)
  Auto  0.92 (0.80, 1.00)   0.89 (0.71, 1.00)   0.91 (0.75, 1.00)   2.5 (0, 8)
  p     0.74                0.26                0.40                0.18
Head and neck (N = 10)
  IRV   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)   0 (0, 0)
  Auto  1.00 (0.50, 1.00)   1.00 (0.33, 1.00)   1.00 (0.40, 1.00)   0 (0, 3)
  p     0.11                0.11                0.11                0.08
Lymphoma (N = 10)
  IRV   1.00 (0.25, 1.00)   1.00 (1.00, 1.00)   1.00 (0.40, 1.00)   0 (0, 3)
  Auto  1.00 (0.92, 1.00)   1.00 (0.80, 1.00)   1.00 (0.86, 1.00)   0 (0, 4)
  p     1.00                0.18                1.00                0.41
Advanced Cancers (N = 10)
  IRV   0.92 (0.69, 1.00)   0.86 (0.74, 0.96)   0.89 (0.72, 0.96)   5.5 (2, 58)
  Auto  0.88 (0.59, 1.00)   0.86 (0.60, 1.00)   0.87 (0.63, 1.00)   15.5 (0, 59)
  p     0.24                0.95                0.86                0.53
ALL (N = 40)
  IRV   1.00 (0.25, 1.00)   1.00 (0.50, 1.00)   1.00 (0.40, 1.00)   0 (0, 58)
  Auto  0.97 (0.50, 1.00)   0.92 (0.33, 1.00)   0.94 (0.40, 1.00)   2 (0, 59)
  p     0.14                0.05                0.12                0.06
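
For readers who want to see how per-patient precision, recall and F1 values of this kind can be computed and then summarised as median (range), the following is a minimal Python sketch. It is not the authors' implementation. It assumes that a lesion match can be represented as a pair of lesion identifiers and that one set of matches is treated as the reference. The function names `match_scores` and `median_range`, the identifiers, and the handling of empty sets are all hypothetical choices made for illustration.

```python
from statistics import median


def match_scores(reference, predicted):
    """Precision, recall and F1 of predicted lesion matches against a reference.

    Both arguments are sets of (baseline_lesion_id, followup_lesion_id) pairs.
    Representing a match as such a pair is an assumption for illustration.
    """
    true_pos = len(reference & predicted)
    # Assumed convention: an empty set scores 1.0 rather than dividing by zero.
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(reference) if reference else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def median_range(values):
    """Summarise per-patient values as (median, minimum, maximum)."""
    return median(values), min(values), max(values)


# Hypothetical patient: the reference holds three matches; the method recovers
# two of them and adds one match that is not in the reference.
reference = {(1, 1), (2, 2), (3, 4)}
predicted = {(1, 1), (2, 2), (3, 5)}
precision, recall, f1 = match_scores(reference, predicted)
```

Because each column reports a median over patients, the F1 entry in a row need not equal the harmonic mean of that row's precision and recall entries.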