Table I.
The performance of the artificial intelligence (AI) was evaluated in terms of root‐mean‐square deviation (RMSD) against the clinical ground truth (CGT), both for the full test data set (n = 7207) and for a smaller subgroup (n = 188) of the test data for which manual digitization was repeated by another medical physicist (G1).
Comparison | RMSD [mm], median (IQR) | Significance |
---|---|---|
AI vs CGT (n = 7207) | 0.55 (0.35–0.86) | |
AI vs CGT (n = 188) | 0.52 (0.33–0.79) | P = 0.15 |
AI vs G1 (n = 188) | 0.75 (0.49–1.20) | P < 0.0001 |
CGT vs G1 (n = 188) | 0.80 (0.48–1.18) | P < 0.0001 |
The test data set was used as the reference for statistical comparisons, except for CGT vs G1.
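To make the summary statistic in Table I concrete, the following is a minimal sketch of how an RMSD per case and a median (IQR) across cases could be computed. It assumes each digitization is a list of corresponding 3D points in millimetres and uses inclusive quartiles; the function names `rmsd` and `median_iqr` are hypothetical, and the point correspondence, quartile convention, and significance test actually used in the study are not stated in this excerpt.

```python
import math
from statistics import median, quantiles


def rmsd(points_a, points_b):
    """RMSD between two equal-length lists of corresponding 3D points (mm).

    Assumption: point i in points_a corresponds to point i in points_b.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point lists must be non-empty and of equal length")
    # Squared Euclidean distance for each pair of corresponding points.
    squared = [math.dist(p, q) ** 2 for p, q in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))


def median_iqr(values):
    """Return (median, Q1, Q3) of per-case RMSD values.

    Assumption: inclusive quartiles; other conventions give slightly
    different Q1 and Q3 for small samples.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3
```

With per-case RMSD values collected for a comparison such as AI vs CGT, `median_iqr` would yield the three numbers reported in each table row. The P values are not reproduced here because the excerpt does not name the statistical test.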