Table 2.
(a) Dice Similarity Coefficient (DSC) and (b) Mean Surface Distance (MSD, in mm) for different normalization methods (No Normalization [NoNorm], Batch Normalization [BN] in Train/Test mode, Instance Normalization [IN], and Layer Normalization [LN]) in DenseUnet (k=48) on the test set. “*” indicates a significant difference (p < 0.05/8 = 6.25e-3 after Bonferroni correction) when comparing each of the other scenarios to the NoNorm network. Without normalization, the network not only runs fastest (8.307 s) but also obtains the best mean DSC (0.897) and the best mean MSD (1.353 mm). It is also worth noting that using the training mode of BN during inference performs considerably better than using the testing mode of BN (mean DSC 0.878 vs. 0.845; mean MSD 1.950 mm vs. 2.147 mm), at roughly twice the runtime (18.052 s vs. 9.703 s). We highlight the best performances in bold.
(a) DSC

DenseUnet | Liver | Pancreas | Right Kidney | Left Kidney | Stomach | Duodenum | Small Intestine | Spinal Cord | Vertebral Body | Spleen | Mean | P-value | Runtime (s) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NoNorm | 0.961±0.008 | 0.860±0.042 | 0.954±0.006 | 0.952±0.009 | 0.907±0.024 | 0.766±0.066 | 0.839±0.085 | 0.898±0.021 | 0.886±0.015 | 0.944±0.013 | 0.897±0.059 | - | 8.307 |
BN_TrainMode | 0.960±0.009 | 0.828±0.074 | 0.940±0.009 | 0.951±0.008 | 0.889±0.046 | 0.732±0.076 | 0.790±0.103 | 0.866±0.028 | 0.889±0.017 | 0.934±0.018 | 0.878±0.071* | 5.336e-07 | 18.052 |
BN_TestMode | 0.957±0.011 | 0.801±0.105 | 0.935±0.023 | 0.923±0.040 | 0.861±0.059 | 0.626±0.117 | 0.734±0.159 | 0.857±0.042 | 0.871±0.014 | 0.883±0.096 | 0.845±0.096* | 1.451e-06 | 9.703 |
IN | 0.960±0.009 | 0.826±0.068 | 0.944±0.007 | 0.948±0.010 | 0.888±0.042 | 0.726±0.078 | 0.782±0.116 | 0.851±0.039 | 0.874±0.028 | 0.935±0.023 | 0.874±0.074* | 8.155e-06 | 13.216 |
LN | 0.960±0.011 | 0.818±0.076 | 0.950±0.007 | 0.951±0.007 | 0.884±0.055 | 0.704±0.112 | 0.834±0.071 | 0.896±0.016 | 0.898±0.012 | 0.940±0.013 | 0.883±0.076* | 7.014e-04 | 13.503 |

(b) MSD (mm)

DenseUnet | Liver | Pancreas | Right Kidney | Left Kidney | Stomach | Duodenum | Small Intestine | Spinal Cord | Vertebral Body | Spleen | Mean | P-value | Runtime (s) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NoNorm | 1.135±0.204 | 1.307±0.472 | 0.693±0.083 | 0.812±0.377 | 1.905±0.698 | 2.189±0.865 | 2.771±2.608 | 0.678±0.098 | 0.994±0.200 | 1.047±0.436 | 1.353±0.669 | - | 8.307 |
BN_TrainMode | 1.188±0.269 | 2.270±1.167 | 1.212±0.893 | 0.934±0.493 | 2.732±1.241 | 2.931±1.295 | 5.081±3.932 | 0.875±0.219 | 1.097±0.200 | 1.180±0.632 | 1.950±1.269* | 4.297e-07 | 18.052 |
BN_TestMode | 1.370±0.290 | 2.265±1.153 | 1.006±0.449 | 1.058±0.572 | 3.420±2.691 | 4.192±2.462 | 3.890±2.526 | 0.933±0.254 | 1.211±0.340 | 2.129±2.216 | 2.147±1.196* | 1.951e-07 | 9.703 |
IN | 1.211±0.263 | 2.946±1.734 | 1.262±1.104 | 1.003±0.586 | 2.808±1.612 | 3.773±1.640 | 5.743±4.552 | 0.905±0.211 | 1.656±1.208 | 1.276±0.788 | 2.258±1.485* | 4.803e-06 | 13.216 |
LN | 1.179±0.282 | 1.726±0.841 | 0.714±0.138 | 0.808±0.218 | 2.078±0.882 | 2.653±1.144 | 3.476±3.081 | 0.691±0.140 | 0.922±0.142 | 1.259±0.740 | 1.550±0.886* | 2.853e-04 | 13.503 |
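
The significance markers in the table can be re-derived from the reported p-values. The following is a minimal sketch, not the authors' analysis code: it only re-applies the Bonferroni threshold stated in the caption (0.05/8) to the values in the "P-value" column. It assumes that the 8 comparisons are the 4 normalization variants times the 2 metrics, which the caption does not state explicitly. The names `P_VALUES`, `bonferroni_threshold`, and `flag_significant` are hypothetical and introduced here for illustration.

```python
# P-values copied from the "P-value" column of the table
# (each variant compared against the NoNorm network).
P_VALUES = {
    "DSC": {
        "BN_TrainMode": 5.336e-07,
        "BN_TestMode": 1.451e-06,
        "IN": 8.155e-06,
        "LN": 7.014e-04,
    },
    "MSD": {
        "BN_TrainMode": 4.297e-07,
        "BN_TestMode": 1.951e-07,
        "IN": 4.803e-06,
        "LN": 2.853e-04,
    },
}


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


def flag_significant(p_values, alpha=0.05):
    """Mark each comparison as significant or not at the corrected level.

    Assumption: the number of comparisons is the total number of
    p-values supplied (4 variants x 2 metrics = 8 here).
    """
    n_comparisons = sum(len(per_method) for per_method in p_values.values())
    threshold = bonferroni_threshold(alpha, n_comparisons)
    flags = {
        metric: {method: p < threshold for method, p in per_method.items()}
        for metric, per_method in p_values.items()
    }
    return flags, threshold


flags, threshold = flag_significant(P_VALUES)
print(f"corrected threshold: {threshold:.3e}")
for metric, per_method in flags.items():
    for method, is_significant in per_method.items():
        marker = "*" if is_significant else ""
        print(f"{metric:>3} {method:<13} p={P_VALUES[metric][method]:.3e} {marker}")
```

Under this assumption every reported p-value falls below the corrected threshold, which matches the "*" attached to each mean in both sub-tables.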