2020 Oct 27;47(12):6414–6420. doi: 10.1002/mp.14508

Table I.

The performance of the artificial intelligence (AI) was evaluated as the root‐mean‐square deviation (RMSD) between the AI and the clinical ground truth (CGT), both for the full test data set (n = 7207) and for a smaller subgroup (n = 188) of the test data for which manual digitization was repeated by another medical physicist (G1).

Comparison	RMSD [mm], median (IQR)	Significance
AI vs CGT (n = 7207)	0.55 (0.35–0.86)
AI vs CGT (n = 188)	0.52 (0.33–0.79)	P = 0.15
AI vs G1 (n = 188)	0.75 (0.49–1.20)	P < 0.0001
CGT vs G1 (n = 188)	0.80 (0.48–1.18)	P < 0.0001

The test data set was used as the reference for statistical comparisons, except for CGT vs G1.
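
The table reports RMSD as a median with interquartile range (IQR). As a minimal sketch of how such a summary could be computed, the Python below assumes each digitization is a list of corresponding 3D points in millimetres and that RMSD is taken point-wise between two digitizations. The function names `rmsd` and `median_iqr`, the point-wise definition, and the choice of quartile method are illustrative assumptions and are not taken from the source, which does not specify its exact procedure here.

```python
import math
import statistics


def rmsd(points_a, points_b):
    """Root-mean-square deviation between two equally long lists of 3D points.

    Assumption: point i in points_a corresponds to point i in points_b,
    and all coordinates are in millimetres.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point lists must be non-empty and of equal length")
    # Squared Euclidean distance for each pair of corresponding points.
    squared = [math.dist(p, q) ** 2 for p, q in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))


def median_iqr(values):
    """Return (median, (Q1, Q3)) for a list of per-case RMSD values.

    Assumption: quartiles use the 'inclusive' method of the standard library;
    other quartile conventions give slightly different IQR bounds.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


# Hypothetical example: two digitizations offset by 1 mm along z.
example_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
example_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(example_a, example_b)
```

In this sketch, one RMSD value would be computed per case, and `median_iqr` would then summarize the per-case values in the same form as a table row. The significance tests in the table are not reproduced, because the source does not name the test used.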