Table 3.
BIANCA performance in the scanner upgrade scenario. Summary of all overlap measures between the BIANCA output and the corresponding manual mask, calculated for each analysis option tested in our study (using leave-one-out cross-validation where appropriate), together with the statistical tests assessing the impact of bias field correction, training modalities and FA inclusion/exclusion on segmentation performance.
| Analysis | Test | Comparison | DI (WH1) | DI (WH2) | FPR (WH1) | FPR (WH2) | FNR (WH1) | FNR (WH2) | Cluster-level FPR (WH1) | Cluster-level FPR (WH2) | Cluster-level FNR (WH1) | Cluster-level FNR (WH2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overlap measures | Mean ± std | Option A | 0.52 ± 0.10 | 0.59 ± 0.07 | 0.05 ± 0.04 | 0.33 ± 0.16 | 0.63 ± 0.09 | 0.43 ± 0.08 | 0.09 ± 0.07 | 0.69 ± 0.14 | 0.61 ± 0.13 | 0.48 ± 0.11 |
| | | Option B | 0.75 ± 0.06 | 0.64 ± 0.03 | 0.18 ± 0.08 | 0.22 ± 0.10 | 0.28 ± 0.09 | 0.42 ± 0.07 | 0.33 ± 0.17 | 0.57 ± 0.18 | 0.35 ± 0.17 | 0.47 ± 0.09 |
| | | Option C | 0.75 ± 0.06 | 0.73 ± 0.05 | 0.18 ± 0.08 | 0.26 ± 0.12 | 0.28 ± 0.09 | 0.24 ± 0.08 | 0.33 ± 0.17 | 0.53 ± 0.18 | 0.35 ± 0.17 | 0.35 ± 0.12 |
| | | Option D | 0.76 ± 0.05 | 0.71 ± 0.04 | 0.22 ± 0.09 | 0.23 ± 0.11 | 0.23 ± 0.08 | 0.30 ± 0.07 | 0.42 ± 0.16 | 0.52 ± 0.17 | 0.28 ± 0.15 | 0.40 ± 0.10 |
| | | Option E | 0.48 ± 0.11 | 0.45 ± 0.06 | 0.07 ± 0.05 | 0.09 ± 0.09 | 0.66 ± 0.09 | 0.68 ± 0.06 | 0.15 ± 0.09 | 0.17 ± 0.10 | 0.55 ± 0.14 | 0.63 ± 0.10 |
| Effect of bias field correction | Between-subject analysis: independent t-test, WH1 vs WH2 (one p-value per metric, shown under WH1) | Option A | 0.061 | | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | | 0.020 * | |
| | | Option B | < 0.001 *** | | 0.259 | | < 0.001 *** | | 0.004 ** | | 0.049 * | |
| | Within-subject analysis: paired t-test | Option A vs Option B | < 0.001 *** | 0.035 * | < 0.001 *** | 0.002 ** | < 0.001 *** | 0.531 | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.306 |
| Effect of training modalities | Training × scanner interaction: two-way mixed ANOVA (one p-value per metric, shown under WH1) | | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | |
| | Main effect of scanner (between-subject factor): independent t-test, WH1 vs WH2 (one p-value per metric, shown under WH1) | Option B | < 0.001 *** | | 0.259 | | < 0.001 *** | | 0.004 ** | | 0.049 * | |
| | | Option C | 0.433 | | 0.071 | | 0.272 | | 0.013 * | | 0.998 | |
| | | Option D | 0.046 * | | 0.861 | | 0.049 * | | 0.178 | | 0.049 * | |
| | Main effect of training (within-subject factor): repeated-measures one-way ANOVA | F-test | 0.466 | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.036 * | < 0.001 *** | < 0.001 *** |
| | Post-hoc comparisons | Option B vs Option C | – | < 0.001 *** | – | < 0.001 *** | – | < 0.001 *** | – | 0.45 | – | < 0.001 *** |
| | | Option B vs Option D | – | < 0.001 *** | < 0.001 *** | 0.309 | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.045 * | < 0.001 *** | < 0.001 *** |
| | | Option C vs Option D | – | 0.044 * | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | 1 | < 0.001 *** | 0.002 ** |
| Effect of FA inclusion/exclusion | Between-subject analysis: independent t-test, WH1 vs WH2 (one p-value per metric, shown under WH1) | Option D | 0.046 * | | 0.861 | | 0.049 * | | 0.178 | | 0.049 * | |
| | | Option E | 0.462 | | 0.461 | | 0.484 | | 0.565 | | 0.134 | |
| | Within-subject analysis: paired t-test (one p-value per metric, shown under WH1) | Option D vs Option E | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | | < 0.001 *** | |
Options tested in our study: (A) without BC, single-site training, FA included; (B) with BC, single-site training, FA included; (C) with BC, site-specific training, FA included; (D) with BC, mixed training, FA included; (E) with BC, mixed training, FA excluded. For each metric we report: (i) mean ± std values for each dataset involved in our study (WH1, WH2); (ii) the impact of bias field correction on BIANCA performance (between- and within-subject analyses performed using independent and paired t-tests, respectively); (iii) the impact of training modalities on BIANCA performance (two-way mixed ANOVA testing the interaction between training, the within-subject factor, and scanner, the between-subject factor; when the interaction term was significant, the analysis was decomposed into the main effect of training [repeated-measures one-way ANOVA testing differences between the investigated options within each dataset; F-test and post-hoc comparisons are displayed] and the main effect of scanner [independent t-test testing differences between the two datasets for each option]); (iv) the impact of FA inclusion/exclusion on BIANCA performance (between- and within-subject analyses performed using independent and paired t-tests, respectively). All statistical results are reported as p-values: * (< 0.05), ** (< 0.01), *** (< 0.001). Legend: BC = bias field correction, DI = Dice Similarity Index, FA = fractional anisotropy, FPR = False Positive Ratio, FNR = False Negative Ratio, std = standard deviation, WH1 = Whitehall dataset 1, WH2 = Whitehall dataset 2.
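To make the overlap measures in the legend concrete, the sketch below computes voxel-level DI, FPR and FNR, plus their cluster-level counterparts, from two binary lesion masks. It also maps p-values to the significance codes used in the table. It rests on several assumptions not stated in this table. First, it follows the definitions commonly used with BIANCA: FPR is the fraction of automatically segmented voxels (or clusters) absent from the manual mask, and FNR is the fraction of manual voxels (or clusters) that were missed. Second, clusters use 6-connectivity and count as detected if they share at least one voxel with the other mask. Third, masks are represented as sets of voxel coordinates. The function names are hypothetical and are not part of FSL/BIANCA.

```python
from collections import deque

# Hypothetical type alias: a voxel is an (x, y, z) integer coordinate.
Voxel = tuple[int, int, int]

# Assumption: clusters are defined with 6-connectivity (face-sharing neighbours).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxel_metrics(auto: set[Voxel], manual: set[Voxel]) -> dict[str, float]:
    """Voxel-level DI, FPR and FNR; both masks are assumed to be non-empty."""
    tp = len(auto & manual)  # voxels labelled as lesion in both masks
    fp = len(auto - manual)  # voxels labelled only by the automated segmentation
    fn = len(manual - auto)  # manual lesion voxels missed by the automated segmentation
    return {
        "DI": 2 * tp / (len(auto) + len(manual)),
        "FPR": fp / len(auto),
        "FNR": fn / len(manual),
    }


def find_clusters(mask: set[Voxel]) -> list[set[Voxel]]:
    """Split a mask into connected clusters using a breadth-first search."""
    remaining = set(mask)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters


def cluster_metrics(auto: set[Voxel], manual: set[Voxel]) -> dict[str, float]:
    """Cluster-level FPR and FNR; a cluster is detected if it overlaps the other mask."""
    auto_clusters = find_clusters(auto)
    manual_clusters = find_clusters(manual)
    fp = sum(1 for c in auto_clusters if not (c & manual))  # spurious automated clusters
    fn = sum(1 for c in manual_clusters if not (c & auto))  # missed manual clusters
    return {
        "cluster-level FPR": fp / len(auto_clusters),
        "cluster-level FNR": fn / len(manual_clusters),
    }


def significance_code(p: float) -> str:
    """Map a p-value to the codes used in Table 3: * (< 0.05), ** (< 0.01), *** (< 0.001)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Toy masks for illustration only (not data from the study).
manual = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}
auto = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (9, 9, 9)}
print(voxel_metrics(auto, manual))
print(cluster_metrics(auto, manual))
print(significance_code(0.013))
```

For these toy masks, DI = 4/7, voxel-level FPR = 0.5 and FNR = 1/3. Both cluster-level ratios are 0.5, and a p-value of 0.013 maps to a single star. Working with coordinate sets keeps the sketch dependency-free. With real NIfTI volumes, array-based masks and a library connected-component labelling routine would be the more practical choice.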