2022 Jun 22;12:10566. doi: 10.1038/s41598-022-14672-2

Table 2.

Comparison of segmentation performance for the five DL training methods for all scans in the testing set.

	Evaluation metrics: median (range)
Experimental DL methods	DSC	Avg HD (mm)	HD95 (mm)	XOR
Train on ³He	0.961 (0.765, 0.981)	2.335 (35.91, 0.644)	10.00 (140.9, 1.934)	0.079 (0.613, 0.037)
Train on ¹²⁹Xe	0.964 (0.886, 0.983)	1.341 (3.911, 0.675)	4.809 (15.90, 1.875)	0.072 (0.253, 0.035)
Train on ³He, fine-tuned on ¹²⁹Xe	0.963 (0.892, 0.983)	1.384 (4.628, 0.636)	4.971 (29.80, 1.934)	0.075 (0.238, 0.034)
Train on ¹²⁹Xe, fine-tuned on ³He	0.968 (0.842, 0.983)	1.483 (10.84, 0.596)	4.935 (67.85, 1.563)	0.066 (0.372, 0.034)
Combined ³He and ¹²⁹Xe training	0.971 (0.886, 0.983)	1.234 (5.630, 0.594)	4.193 (52.70, 1.875)	0.059 (0.255, 0.035)

Medians (ranges) are given; the best median for each metric is achieved by combined ³He and ¹²⁹Xe training.
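
As an illustration of how a "median (range)" cell such as those in the DSC column can be produced, the following minimal sketch computes the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), for a few binary masks and then summarizes the per-scan scores. The function names `dice` and `summarize`, the flat 0/1 mask representation, and the toy masks are all hypothetical and are not taken from the study; the convention that two empty masks score 1.0 is likewise an assumption of this sketch.

```python
from statistics import median


def dice(pred, truth):
    """Dice similarity coefficient of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption of this sketch: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def summarize(values):
    """Return (median, minimum, maximum) of a list of per-scan scores."""
    return median(values), min(values), max(values)


# Hypothetical toy masks (not data from the study).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfect overlap
    [1, 1, 1, 0, 0, 0, 0, 0],  # one false negative
    [1, 1, 1, 1, 1, 1, 0, 0],  # two false positives
]

scores = [dice(p, truth) for p in preds]
med, low, high = summarize(scores)
print(f"{med:.3f} ({low:.3f}, {high:.3f})")  # prints "0.857 (0.800, 1.000)"
```

The same `summarize` helper would apply unchanged to per-scan values of any other metric in the table, provided those values are computed separately.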