Table 7: For the Adult Cohort, we show the p-value for the two-sided Wilcoxon paired signed-rank test comparing the second (LiviaNET) and third (DeepNet) placed teams to the top (CERES2) ranked team across the four hierarchies (Coarse, Lobe, Vermis, Lobule) of labeling and also the combination of all 38 labels (Consolidated). The mean Dice overlap for each method, at the respective hierarchy, is shown underneath the method’s name.
Hierarchy | Top-ranked (mean Dice) | vs. (mean Dice) | p-value | vs. (mean Dice) | p-value
---|---|---|---|---|---
Coarse | 0.9118 | 0.8967 | 6.9 × 10⁻³ † | 0.8908 | 6.1 × 10⁻⁵ ‡
Lobe | 0.8395 | 0.8289 | 2.2 × 10⁻¹ | 0.8021 | 1.9 × 10⁻⁴ †
Vermis | 0.8302 | 0.8012 | 1.2 × 10⁻² | 0.8003 | 5.6 × 10⁻⁴ †
Lobule | 0.7657 | 0.7168 | 5.5 × 10⁻⁵ ‡ | 0.7382 | 1.2 × 10⁻⁵ ‡
Consolidated | 0.8013 | 0.7657 | 3.0 × 10⁻⁷ ‡ | 0.7719 | 3.1 × 10⁻¹² ‡
† Denotes weak statistical significance (p-value < 0.001).
‡ Denotes strong statistical significance (p-value < 0.0001).
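
To make the comparison in Table 7 concrete, the following is a minimal sketch of a two-sided Wilcoxon paired signed-rank test applied to per-subject Dice scores from two methods. It is illustrative only: the function name `wilcoxon_signed_rank_exact` and the Dice values are hypothetical, the sketch computes an exact p-value and assumes no zero or tied differences, and the source does not state which variant or software was used for the reported p-values.

```python
from statistics import mean


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumption of this sketch: no zero differences and no tied absolute
    differences, so every pair receives a distinct integer rank.
    Returns (W, p) where W is the smaller of the two signed-rank sums.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied differences are not handled in this sketch")

    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    rank = {v: i + 1 for i, v in enumerate(abs_sorted)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    total = n * (n + 1) // 2
    w = min(w_plus, total - w_plus)

    # counts[s] = number of sign assignments whose positive ranks sum to s
    # (subset-sum counting over the ranks 1..n).
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p-value: twice the lower tail, capped at 1.
    p = min(1.0, 2 * sum(counts[: w + 1]) / 2 ** n)
    return w, p


# Hypothetical per-subject Dice scores (NOT data from the paper).
top_method = [0.91, 0.92, 0.90, 0.93, 0.89]
other_method = [0.90, 0.90, 0.87, 0.89, 0.84]

w_stat, p_value = wilcoxon_signed_rank_exact(top_method, other_method)
print(f"mean Dice: {mean(top_method):.4f} vs. {mean(other_method):.4f}")
print(f"W = {w_stat}, two-sided p = {p_value:.4f}")
```

With only a handful of pairs the exact p-value cannot become very small, so p-values on the order of those in the table require many more paired observations than this toy example uses.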