. 2021 Aug 15;237:118189. doi: 10.1016/j.neuroimage.2021.118189

Table 3.

BIANCA performance – scanner upgrade scenario – summary of all overlap measures between the BIANCA output and the corresponding manual mask, calculated for the different analysis options tested in our study (using leave-one-out cross-validation where appropriate), together with the statistical tests used to assess the impact of bias field correction, training modalities and FA inclusion/exclusion on segmentation performance.

			DI		FPR		FNR		cluster-level FPR		cluster-level FNR
			WH1	WH2	WH1	WH2	WH1	WH2	WH1	WH2	WH1	WH2
Overlap measures	Mean ± std	Option A	0.52 ± 0.10	0.59 ± 0.07	0.05 ± 0.04	0.33 ± 0.16	0.63 ± 0.09	0.43 ± 0.08	0.09 ± 0.07	0.69 ± 0.14	0.61 ± 0.13	0.48 ± 0.11
		Option B	0.75 ± 0.06	0.64 ± 0.03	0.18 ± 0.08	0.22 ± 0.10	0.28 ± 0.09	0.42 ± 0.07	0.33 ± 0.17	0.57 ± 0.18	0.35 ± 0.17	0.47 ± 0.09
		Option C	0.75 ± 0.06	0.73 ± 0.05	0.18 ± 0.08	0.26 ± 0.12	0.28 ± 0.09	0.24 ± 0.08	0.33 ± 0.17	0.53 ± 0.18	0.35 ± 0.17	0.35 ± 0.12
		Option D	0.76 ± 0.05	0.71 ± 0.04	0.22 ± 0.09	0.23 ± 0.11	0.23 ± 0.08	0.30 ± 0.07	0.42 ± 0.16	0.52 ± 0.17	0.28 ± 0.15	0.40 ± 0.10
		Option E	0.48 ± 0.11	0.45 ± 0.06	0.07 ± 0.05	0.09 ± 0.09	0.66 ± 0.09	0.68 ± 0.06	0.15 ± 0.09	0.17 ± 0.10	0.55 ± 0.14	0.63 ± 0.10
Effect of Bias field correction	Between-subject analysis: independent t-test		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2
		Option A	0.061		< 0.001 ***		< 0.001 ***		< 0.001 ***		0.020 *
		Option B	< 0.001 ***		0.259		< 0.001 ***		0.004 **		0.049 *
	Within-subject analysis: paired t-test	Option A vs Option B	< 0.001 ***	0.035 *	< 0.001 ***	0.002 **	< 0.001 ***	0.531	< 0.001 ***	< 0.001 ***	< 0.001 ***	0.306
Effect of Training modalities	Training - Scanner interaction: two-way mixed ANOVA test		< 0.001 ***		< 0.001 ***		< 0.001 ***		< 0.001 ***		< 0.001 ***
	Main effect of the Scanner (between-subject factor): independent t-test		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2
		Option B	< 0.001 ***		0.259		< 0.001 ***		0.004 **		0.049 *
		Option C	0.433		0.071		0.272		0.013 *		0.998
		Option D	0.046 *		0.861		0.049 *		0.178		0.049 *
	Main effect of the Training (within-subject factor): repeated measures one-way ANOVA test (F-test and post-hocs)		0.466	< 0.001 ***	< 0.001 ***	< 0.001 ***	< 0.001 ***	< 0.001 ***	< 0.001 ***	0.036 *	< 0.001 ***	< 0.001 ***
		Option B vs Option C	———	< 0.001 ***	———	< 0.001 ***	———	< 0.001 ***	———	0.45	———	< 0.001 ***
		Option B vs Option D	———	< 0.001 ***	< 0.001 ***	0.309	< 0.001 ***	< 0.001 ***	< 0.001 ***	0.045 *	< 0.001 ***	< 0.001 ***
		Option C vs Option D	———	0.044 *	< 0.001 ***	< 0.001 ***	< 0.001 ***	< 0.001 ***	< 0.001 ***	1	< 0.001 ***	0.002 **
Effect of FA inclusion/exclusion	Between-subject analysis: independent t-test		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2		WH1 vs WH2
		Option D	0.046 *		0.861		0.049 *		0.178		0.049 *
		Option E	0.462		0.461		0.484		0.565		0.134
	Within-subject analysis: paired t-test	Option D vs Option E	< 0.001 ***		< 0.001 ***		< 0.001 ***		< 0.001 ***		< 0.001 ***

Options tested in our study are: (A) without BC, single-site training, FA included; (B) with BC, single-site training, FA included; (C) with BC, site-specific training, FA included; (D) with BC, mixed training, FA included; (E) with BC, mixed training, FA excluded. For each metric we report: (i) mean ± std values for each dataset involved in our study (WH1, WH2); (ii) the impact of bias field correction on BIANCA performance (between- and within-subject analyses performed using independent and paired t-tests, respectively); (iii) the impact of training modalities on BIANCA performance (a two-way mixed ANOVA testing the interaction between training and scanner, the within- and between-subject factors, respectively; when the interaction term was significant, the analysis was decomposed into two separate components: the main effect of training, assessed with a repeated-measures one-way ANOVA comparing the investigated options within each dataset, for which the F-test and post-hoc comparisons are displayed, and the main effect of scanner, assessed with an independent t-test comparing the two datasets for each option); (iv) the impact of FA inclusion/exclusion on BIANCA performance (between- and within-subject analyses performed using independent and paired t-tests, respectively). All statistical test results are reported as p-values: * (< 0.05), ** (< 0.01), *** (< 0.001). Legend: BC = bias field correction, DI = Dice Similarity Index, FPR = False Positive Ratio, FNR = False Negative Ratio, WH1 = Whitehall dataset 1, WH2 = Whitehall dataset 2.
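The overlap measures compare a binary BIANCA output with the manual mask. The sketch below is a minimal, dependency-free illustration, assuming the usual BIANCA-style definitions, which this table does not spell out: voxel-level FPR = false-positive voxels / voxels in the BIANCA output, FNR = false-negative voxels / voxels in the manual mask, and cluster-level ratios computed on connected clusters, where a cluster counts as detected if it shares at least one voxel with the other mask. The 26-connectivity, the set-of-coordinates mask representation and all function names are hypothetical choices for illustration, not the authors' code.

```python
from itertools import product

Voxel = tuple[int, int, int]  # (x, y, z) coordinate; Python 3.9+ syntax


def voxel_overlap(auto: set[Voxel], manual: set[Voxel]) -> dict[str, float]:
    """Voxel-level Dice index, FPR and FNR (assumed BIANCA-style definitions).

    Raises ZeroDivisionError if either mask is empty.
    """
    tp = len(auto & manual)   # labelled WMH in both masks
    fp = len(auto - manual)   # labelled WMH by BIANCA only
    fn = len(manual - auto)   # labelled WMH in the manual mask only
    return {
        "DI": 2 * tp / (2 * tp + fp + fn),
        "FPR": fp / len(auto),    # false positives / all BIANCA voxels
        "FNR": fn / len(manual),  # false negatives / all manual voxels
    }


# Assumption: 26-connected neighbourhood (connectivity is not stated in the table).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def clusters(mask: set[Voxel]) -> list[set[Voxel]]:
    """Split a mask into connected clusters with an iterative flood fill."""
    remaining, found = set(mask), []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.add(nb)
                    stack.append(nb)
        found.append(cluster)
    return found


def cluster_overlap(auto: set[Voxel], manual: set[Voxel]) -> dict[str, float]:
    """Cluster-level FPR/FNR: share of clusters with no voxel in the other mask."""
    auto_cl, manual_cl = clusters(auto), clusters(manual)
    fp = sum(1 for c in auto_cl if not c & manual)   # spurious BIANCA clusters
    fn = sum(1 for c in manual_cl if not c & auto)   # missed manual lesions
    return {"FPR": fp / len(auto_cl), "FNR": fn / len(manual_cl)}


manual = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}           # two manual lesions
auto = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (9, 9, 9)}  # one enlarged hit, one spurious cluster
print(voxel_overlap(auto, manual))    # DI = 4/7, FPR = 2/4, FNR = 1/3
print(cluster_overlap(auto, manual))  # FPR = 1/2, FNR = 1/2
```

Under these definitions, DI is the harmonic mean of 1 − FPR and 1 − FNR. This is roughly consistent with the Option B, WH1 row: 1 − 0.18 and 1 − 0.28 give about 0.77, against a reported mean DI of 0.75 (a mean of per-subject ratios need not match exactly).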
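The between- and within-subject comparisons rely on independent and paired t-tests, with p-values mapped to asterisks as stated in the legend. The following is a dependency-free sketch of the two t statistics and the star mapping. It assumes a Student (pooled-variance) independent t-test, which the table does not confirm (Welch's test is the common alternative), and the function names are hypothetical. The p-value itself is left to a library.

```python
import math
from statistics import mean, stdev


def stars(p: float) -> str:
    """Map a p-value to the significance markers used in the legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


def independent_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t and degrees of freedom, e.g. WH1 vs WH2 for one option.

    Assumption: equal variances (pooled estimate).
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t on per-subject differences, e.g. Option A vs Option B on the same scans."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

With SciPy installed, `scipy.stats.ttest_ind(a, b)` (equal variances by default) and `scipy.stats.ttest_rel(a, b)` return the same statistics together with two-sided p-values, which can then be passed to `stars`.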
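The main effect of training is tested, within each dataset, with a repeated-measures one-way ANOVA across Options B, C and D. This design partitions the total variability into a between-option term, a between-subject term and a residual. The sketch below computes the resulting F statistic, assuming complete data (every subject segmented under every option) and no sphericity correction. The table states neither assumption, and the function name is hypothetical.

```python
from statistics import mean


def rm_anova_f(scores: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated-measures ANOVA F statistic.

    scores[i][j] is the metric (e.g. Dice) of subject i under condition j
    (e.g. Options B, C, D). Returns (F, df_conditions, df_error).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)     # between options
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)     # between subjects
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj                     # option-by-subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error
```

The p-value follows from the F distribution with `(df_conditions, df_error)` degrees of freedom. The preceding training-by-scanner interaction test needs a full mixed-design ANOVA, which this sketch does not cover.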