2024 Feb 12;14:3522. doi: 10.1038/s41598-024-53528-9

Table 11.

Change in tumour segmentation performance on the test dataset between the baseline (BS) and optimised (OP) hyperparameter combinations.

Model	Dice BS	Dice OP	Dice p	5mm SD BS	5mm SD OP	5mm SD p	Precision BS	Precision OP	Precision p	Recall BS	Recall OP	Recall p	HD (mm) BS	HD (mm) OP	HD (mm) p
nnFormer	68.5	*69.7*	_	67.6	67.7	_	78.9	82.9	_	65.7	64.4	_	66.7	72.3	_
nnUNet	68.1	68.1	_	67.5	*67.8*	_	78.8	*84.2*	_	*67.9*	63.9	_	60.2	60.1	_
SegmentationNet	55.8	61.7	_	52.5	58.3	_	61.6	74.2	_	55.1	59.2	_	91.5	84.8	_
Swin-UNETR	47.7	55.1	_	42.0	50.9	***	48.7	78.2	*	57.2	48.0	***	162.9	*58.5*	**
TransBTS	51.4	62.1	**	46.2	59.2	_	58.4	73.1	***	53.6	59.9	*	176.7	73.7	***
UNETR	33.1	41.5	***	30.0	35.3	*	40.9	50.7	_	31.9	39.2	**	121.7	138.1	*
VT-UNet	54.3	56.6	_	49.6	53.9	_	61.1	71.6	_	55.8	56.0	**	98.0	72.7	_

Italic indicates the best value across models for each metric. Stars indicate the significance level of the difference between baseline and optimised results, based on a paired t-test or a Wilcoxon signed-rank test, chosen according to whether the results on the test set are normally distributed under the Shapiro-Wilk test. No star (shown as _ in the p columns) means not significant, * means p < 0.05, ** means p < 0.01, and *** means p < 0.001.
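
The test-selection procedure described in the footnote can be sketched as follows. This is a minimal illustration using SciPy, not the authors' code: the function names `significance_stars` and `compare_paired` are hypothetical, and the sketch assumes that normality is checked on the paired differences with a 0.05 threshold, neither of which is stated in the source.

```python
from scipy import stats


def significance_stars(p_value):
    """Map a p-value to the star notation used in the table footnote."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "_"  # not significant


def compare_paired(baseline, optimised, alpha=0.05):
    """Compare per-case scores of one model before and after optimisation.

    Assumptions (not stated in the source): the Shapiro-Wilk test is run
    on the paired differences, and `alpha` = 0.05 is the normality cutoff.
    """
    diffs = [o - b for b, o in zip(baseline, optimised)]
    p_normal = stats.shapiro(diffs).pvalue

    if p_normal >= alpha:
        # Normality not rejected: use the parametric paired test.
        test_name = "paired t-test"
        p_value = stats.ttest_rel(optimised, baseline).pvalue
    else:
        # Normality rejected: fall back to the non-parametric paired test.
        test_name = "Wilcoxon signed-rank"
        p_value = stats.wilcoxon(optimised, baseline).pvalue

    p_value = float(p_value)
    return test_name, p_value, significance_stars(p_value)
```

Given per-case scores for one model under both settings (for example, Dice on each test case), `compare_paired(bs_scores, op_scores)` returns the test that was applied, the p-value, and the marker that would appear in the corresponding p column.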