Author manuscript; available in PMC: 2021 Jun 8.

Published in final edited form as: Domain Adapt Represent Transf Med Image Learn Less Labels Imperfect Data (2019). 2019 Oct 13;2019:54–62. doi: 10.1007/978-3-030-33391-1_7

Table 1.

Performance of different methods on the target domain (POPPY) and the source domain (MICCAI 2017 WMH Challenge). On the source domain, we report the Dice score and the 95th-percentile Hausdorff distance (HD95) between our models’ predictions and the ground-truth annotations. On the target domain, evaluation uses the Dice, the HD95, the volume difference (VD) and the recall. A significance-based rank measure is calculated across all metrics. Results are reported as median (IQR), in percentages for all metrics except the HD95, which is in mm. Best results are in bold, and underlined when significantly better than all others (p < 0.05, paired Wilcoxon tests).

	POPPY (target)				MICCAI (source)
Method	Dice (%)	HD95 (mm)	VD (%)	Recall (%)	Dice (%)	HD95 (mm)	Rank
PC+Adv+Aug	54.5 (10.6)	32.7 (9.8)	15.2 (22.8)	52.4 (14.4)	81.4 (9.6)	28.5 (8.6)	2.5
PC+Aug	53.2 (15.1)	39.2 (15.5)	25.4 (15.6)	43.5 (12.5)	81.6 (15.5)	18.6 (4.8)	3.3
PC	50.7 (17.0)	35.1 (11.9)	16.6 (21.4)	43.6 (11.0)	81.4 (22.6)	17.2 (3.6)	3.4
MT	48.6 (12.3)	33.6 (14.8)	33.7 (19.0)	40.9 (5.0)	80.0 (18.2)	20.0 (7.3)	4.3
Baseline+Aug	42.8 (14.6)	34.9 (11.1)	39.3 (22.3)	33.5 (12.6)	80.6 (14.8)	17.8 (4.9)	4.9
Baseline	43.0 (16.2)	33.3 (15.1)	40.3 (24.8)	33.3 (14.8)	81.1 (16.9)	17.5 (3.3)	5.6
Adv	41.8 (15.4)	32.6 (6.1)	25.2 (24.0)	33.5 (12.7)	82.5 (12.0)	17.6 (5.2)	5.7
Adv+Aug	41.4 (16.4)	36.6 (9.0)	38.0 (16.0)	33.6 (13.9)	81.9 (11.1)	19.7 (11.0)	6.3
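
As a reading aid for the table, the overlap-based metrics and the median (IQR) format can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The function names are hypothetical, masks are assumed to be flattened binary sequences, VD is assumed to be the absolute volume difference relative to the ground-truth volume, and the quartile convention is an arbitrary choice. HD95 and the Wilcoxon-based ranking are omitted.

```python
from statistics import median, quantiles


def _true_positives(pred, truth):
    # Voxels labelled as lesion in both the prediction and the ground truth.
    return sum(1 for p, t in zip(pred, truth) if p and t)


def dice(pred, truth):
    """Dice overlap in percent between two flat binary masks."""
    total = sum(map(bool, pred)) + sum(map(bool, truth))
    # Assumption: two empty masks count as perfect agreement.
    return 100.0 * 2 * _true_positives(pred, truth) / total if total else 100.0


def recall(pred, truth):
    """Percentage of ground-truth lesion voxels that were predicted."""
    positives = sum(map(bool, truth))
    # Assumption: recall is taken as 100 when there is nothing to detect.
    return 100.0 * _true_positives(pred, truth) / positives if positives else 100.0


def volume_difference(pred, truth):
    """Absolute volume difference in percent of the ground-truth volume.

    Assumption: this normalisation is one common convention; the ground
    truth must contain at least one lesion voxel.
    """
    v_pred = sum(map(bool, pred))
    v_true = sum(map(bool, truth))
    return 100.0 * abs(v_pred - v_true) / v_true


def median_iqr(values):
    """Format per-subject scores as 'median (IQR)', as in the table."""
    # Assumption: inclusive quartiles; other conventions give slightly
    # different IQR values on small samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):.1f} ({q3 - q1:.1f})"
```

Each metric would be computed once per subject, and the per-subject scores passed to `median_iqr` to obtain one table cell.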