2021 Aug 15;237:118189. doi: 10.1016/j.neuroimage.2021.118189

Table 3.

BIANCA performance – scanner upgrade scenario – Summary of all the overlap measures between the BIANCA output and the corresponding manual mask, calculated for the different analysis options tested in our study (using leave-one-out cross-validation whenever appropriate). Statistical tests were performed to assess the impact of bias field correction, training modalities and FA inclusion/exclusion on segmentation performance.

Overlap measures (mean ± std)

| Option | DI WH1 | DI WH2 | FPR WH1 | FPR WH2 | FNR WH1 | FNR WH2 | cluster-level FPR WH1 | cluster-level FPR WH2 | cluster-level FNR WH1 | cluster-level FNR WH2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Option A | 0.52 ± 0.10 | 0.59 ± 0.07 | 0.05 ± 0.04 | 0.33 ± 0.16 | 0.63 ± 0.09 | 0.43 ± 0.08 | 0.09 ± 0.07 | 0.69 ± 0.14 | 0.61 ± 0.13 | 0.48 ± 0.11 |
| Option B | 0.75 ± 0.06 | 0.64 ± 0.03 | 0.18 ± 0.08 | 0.22 ± 0.10 | 0.28 ± 0.09 | 0.42 ± 0.07 | 0.33 ± 0.17 | 0.57 ± 0.18 | 0.35 ± 0.17 | 0.47 ± 0.09 |
| Option C | 0.75 ± 0.06 | 0.73 ± 0.05 | 0.18 ± 0.08 | 0.26 ± 0.12 | 0.28 ± 0.09 | 0.24 ± 0.08 | 0.33 ± 0.17 | 0.53 ± 0.18 | 0.35 ± 0.17 | 0.35 ± 0.12 |
| Option D | 0.76 ± 0.05 | 0.71 ± 0.04 | 0.22 ± 0.09 | 0.23 ± 0.11 | 0.23 ± 0.08 | 0.30 ± 0.07 | 0.42 ± 0.16 | 0.52 ± 0.17 | 0.28 ± 0.15 | 0.40 ± 0.10 |
| Option E | 0.48 ± 0.11 | 0.45 ± 0.06 | 0.07 ± 0.05 | 0.09 ± 0.09 | 0.66 ± 0.09 | 0.68 ± 0.06 | 0.15 ± 0.09 | 0.17 ± 0.10 | 0.55 ± 0.14 | 0.63 ± 0.10 |
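Effect of bias field correction

Between-subject analysis: independent t-test (WH1 vs WH2), p-values

| Option | DI | FPR | FNR | cluster-level FPR | cluster-level FNR |
|---|---|---|---|---|---|
| Option A | 0.061 | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.020 * |
| Option B | < 0.001 *** | 0.259 | < 0.001 *** | 0.004 ** | 0.049 * |

Within-subject analysis: paired t-test, p-values

| Comparison | DI WH1 | DI WH2 | FPR WH1 | FPR WH2 | FNR WH1 | FNR WH2 | cluster-level FPR WH1 | cluster-level FPR WH2 | cluster-level FNR WH1 | cluster-level FNR WH2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Option A vs Option B | < 0.001 *** | 0.035 * | < 0.001 *** | 0.002 ** | < 0.001 *** | 0.531 | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.306 |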
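Effect of training modalities

Training-scanner interaction (two-way mixed ANOVA) and main effect of the scanner (between-subject factor; independent t-test, WH1 vs WH2), p-values

| Test | DI | FPR | FNR | cluster-level FPR | cluster-level FNR |
|---|---|---|---|---|---|
| Training-scanner interaction | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** |
| Scanner effect, Option B | < 0.001 *** | 0.259 | < 0.001 *** | 0.004 ** | 0.049 * |
| Scanner effect, Option C | 0.433 | 0.071 | 0.272 | 0.013 ** | 0.998 |
| Scanner effect, Option D | 0.046 * | 0.861 | 0.049 * | 0.178 | 0.049 * |

Main effect of the training (within-subject factor): repeated measures one-way ANOVA (F-test and post-hocs), p-values

| Test | DI WH1 | DI WH2 | FPR WH1 | FPR WH2 | FNR WH1 | FNR WH2 | cluster-level FPR WH1 | cluster-level FPR WH2 | cluster-level FNR WH1 | cluster-level FNR WH2 |
|---|---|---|---|---|---|---|---|---|---|---|
| F-test | 0.466 | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.036 * | < 0.001 *** | < 0.001 *** |
| Option B vs Option C | ——— | < 0.001 *** | ——— | < 0.001 *** | ——— | < 0.001 *** | ——— | 0.45 | ——— | < 0.001 *** |
| Option B vs Option D | ——— | < 0.001 *** | < 0.001 *** | 0.309 | < 0.001 *** | < 0.001 *** | < 0.001 *** | 0.045 * | < 0.001 *** | < 0.001 *** |
| Option C vs Option D | ——— | 0.044 * | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | 1 | < 0.001 *** | 0.002 ** |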
Effect of FA inclusion/exclusion, p-values

| Analysis | Comparison | DI | FPR | FNR | cluster-level FPR | cluster-level FNR |
|---|---|---|---|---|---|---|
| Between-subject: independent t-test (WH1 vs WH2) | Option D | 0.046 * | 0.861 | 0.049 * | 0.178 | 0.049 * |
| Between-subject: independent t-test (WH1 vs WH2) | Option E | 0.462 | 0.461 | 0.484 | 0.565 | 0.134 |
| Within-subject: paired t-test | Option D vs Option E | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** | < 0.001 *** |

The options tested in our study are: (A) without BC, single-site training, FA included; (B) with BC, single-site training, FA included; (C) with BC, site-specific training, FA included; (D) with BC, mixed training, FA included; (E) with BC, mixed training, FA excluded. For each metric we report: (i) mean ± std values for all datasets involved in our study (WH1, WH2); (ii) the impact of bias field correction on BIANCA performance (between- and within-subject analyses performed using independent and paired t-tests, respectively); (iii) the impact of training modalities on BIANCA performance, using a two-way mixed ANOVA to assess the interaction between training (within-subject factor) and scanner (between-subject factor); when the interaction term was significant, we decomposed the analysis into two separate components, namely the main effect of training (repeated measures one-way ANOVA evaluating differences between the investigated options for each dataset; F-test and post-hoc comparisons are displayed) and the main effect of scanner (independent t-test evaluating differences between the datasets for each option); (iv) the impact of FA inclusion/exclusion on BIANCA performance (between- and within-subject analyses performed using independent and paired t-tests, respectively). Results of the statistical tests are all reported as p-values: * (< 0.05), ** (< 0.01), *** (< 0.001). Legend: BC = bias field correction, DI = Dice Similarity Index, FPR = False Positive Ratio, FNR = False Negative Ratio, WH1 = Whitehall dataset 1, WH2 = Whitehall dataset 2.
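
The legend names the voxel-level overlap measures without giving their formulas. The sketch below shows one common way to compute them for a pair of binary masks, and it rests on assumptions not stated in this table: DI is taken as 2·TP / (output voxels + manual voxels), FPR as false positives divided by all voxels labelled by the automated output, and FNR as false negatives divided by all voxels in the manual mask. The function name `overlap_measures` and the flat 0/1 mask representation are hypothetical, chosen for illustration; cluster-level measures are not covered.

```python
def overlap_measures(predicted, manual):
    """Voxel-level DI, FPR and FNR for two binary masks.

    Both masks are flat sequences of 0/1 (or False/True) values of equal
    length. The FPR and FNR denominators used here are assumptions (see the
    accompanying text), not definitions taken from the table.
    """
    predicted = [bool(v) for v in predicted]
    manual = [bool(v) for v in manual]
    if len(predicted) != len(manual):
        raise ValueError("masks must have the same number of voxels")

    # Count true positives, false positives and false negatives.
    tp = sum(p and m for p, m in zip(predicted, manual))
    fp = sum(p and not m for p, m in zip(predicted, manual))
    fn = sum(m and not p for p, m in zip(predicted, manual))

    n_predicted = tp + fp  # voxels labelled by the automated output
    n_manual = tp + fn     # voxels labelled in the manual mask
    nan = float("nan")     # returned when a ratio is undefined

    dice = 2 * tp / (n_predicted + n_manual) if (n_predicted + n_manual) else nan
    fpr = fp / n_predicted if n_predicted else nan
    fnr = fn / n_manual if n_manual else nan
    return {"DI": dice, "FPR": fpr, "FNR": fnr}


# Toy example with 8 voxels: TP = 2, FP = 2, FN = 1.
example = overlap_measures(
    predicted=[1, 1, 1, 0, 0, 0, 0, 1],
    manual=[1, 1, 0, 1, 0, 0, 0, 0],
)
print(example)  # DI = 4/7, FPR = 0.5, FNR = 1/3
```

Under these assumed definitions a low FPR combined with a high FNR, as in the Option A and Option E rows, corresponds to an output that labels few voxels, most of them correctly, while missing a large part of the manual mask.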