2024 Feb 12;14:3522. doi: 10.1038/s41598-024-53528-9

Table 11.

Change in performance on the test dataset for the tumour, between the baseline (BS) and optimised (OP) hyperparameter combinations.

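| Model | Dice BS | Dice OP | Dice p | 5mm SD BS | 5mm SD OP | 5mm SD p | Precision BS | Precision OP | Precision p | Recall BS | Recall OP | Recall p | HD (mm) BS | HD (mm) OP | HD (mm) p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nnFormer | 68.5 | 69.7 | _ | 67.6 | 67.7 | _ | 78.9 | 82.9 | _ | 65.7 | 64.4 | _ | 66.7 | 72.3 | _ |
| nnUNet | 68.1 | 68.1 | _ | 67.5 | 67.8 | _ | 78.8 | 84.2 | _ | 67.9 | 63.9 | _ | 60.2 | 60.1 | _ |
| SegmentationNet | 55.8 | 61.7 | _ | 52.5 | 58.3 | _ | 61.6 | 74.2 | _ | 55.1 | 59.2 | _ | 91.5 | 84.8 | _ |
| Swin-UNETR | 47.7 | 55.1 | _ | 42.0 | 50.9 | *** | 48.7 | 78.2 | * | 57.2 | 48.0 | *** | 162.9 | 58.5 | ** |
| TransBTS | 51.4 | 62.1 | ** | 46.2 | 59.2 | _ | 58.4 | 73.1 | *** | 53.6 | 59.9 | * | 176.7 | 73.7 | *** |
| UNETR | 33.1 | 41.5 | *** | 30.0 | 35.3 | * | 40.9 | 50.7 | _ | 31.9 | 39.2 | ** | 121.7 | 138.1 | * |
| VT-UNet | 54.3 | 56.6 | _ | 49.6 | 53.9 | _ | 61.1 | 71.6 | _ | 55.8 | 56.0 | ** | 98.0 | 72.7 | _ |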

Italics indicate the best value across models for each metric. Stars indicate the significance level of the difference between baseline and optimised results, assessed with a paired t-test or a Wilcoxon signed-rank test, depending on whether the results on the test set are normally distributed according to the Shapiro-Wilk test (no star means not significant, * means p < 0.05, ** means p < 0.01, and *** means p < 0.001).
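
The test-selection rule in the footnote can be sketched in Python with SciPy. This is an illustration, not the authors' code: the function names, the choice to run Shapiro-Wilk on the per-case differences, and the 0.05 normality threshold are all assumptions, since the footnote does not specify them. The per-case values in the usage block are made up for demonstration and are not taken from the study.

```python
from typing import Sequence, Tuple


def significance_stars(p_value: float) -> str:
    """Map a p-value to the star notation described in the footnote."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # no star: not significant


def compare_paired(
    baseline: Sequence[float],
    optimised: Sequence[float],
    alpha: float = 0.05,
) -> Tuple[str, float, str]:
    """Choose a paired test via Shapiro-Wilk; return (test name, p, stars).

    Assumption: normality is checked on the per-case differences at
    level `alpha`. The source states neither detail.
    """
    # Imported here so significance_stars stays usable without SciPy.
    from scipy import stats

    if len(baseline) != len(optimised):
        raise ValueError("paired tests need one value per case in both samples")

    differences = [o - b for b, o in zip(baseline, optimised)]
    p_normal = stats.shapiro(differences).pvalue

    if p_normal > alpha:
        # No evidence against normality: use the paired t-test.
        test_name = "paired t-test"
        p_value = stats.ttest_rel(optimised, baseline).pvalue
    else:
        # Normality rejected: fall back to the Wilcoxon signed-rank test.
        test_name = "Wilcoxon signed-rank"
        p_value = stats.wilcoxon(optimised, baseline).pvalue

    return test_name, float(p_value), significance_stars(float(p_value))


if __name__ == "__main__":
    # Hypothetical per-case Dice scores, for illustration only.
    bs = [51.2, 47.9, 60.3, 38.4, 55.0, 49.1, 62.7, 44.5]
    op = [58.9, 55.2, 63.1, 49.7, 61.4, 57.8, 66.0, 52.3]
    name, p, stars = compare_paired(bs, op)
    print(f"{name}: p = {p:.4g} {stars}")
```

Applied to real data, the inputs would be the per-case metric values of one model under the baseline and optimised configurations, in the same case order, so that each pair refers to the same test subject.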