Table 4.

Pairwise TOST (two one-sided tests), where we test the null-hypothesis that the target overlap score of the row-method, $t_{row}$, and the target overlap score of the column-method, $t_{column}$, satisfy $\frac{t_{row}}{t_{column}} < 0.98$ or $\frac{t_{row}}{t_{column}} > 1.02$.

Dataset: `LPBA40`
	`ART`	`SyN`	`LO`	`LP`	`LPC`
`LO`			N/A	✓	✓
`LP`			✓	N/A	✓
`LPC`			✓	✓	N/A

Dataset: `CUMC12`
	`ART`	`SyN`	`LO`	`LP`	`LPC`
`LO`			N/A		✓
`LP`		✓		N/A
`LPC`			✓		N/A

Dataset: `IBSR18`
	`ART`	`SyN`	`LO`	`LP`	`LPC`
`LO`		✓	N/A		✓
`LP`	✓			N/A
`LPC`		✓	✓		N/A

Dataset: `MGH10`
	`ART`	`SyN`	`LO`	`LP`	`LPC`
`LO`		✓	N/A		✓
`LP`				N/A	✓
`LPC`	✓		✓	✓	N/A

Rejecting the null-hypothesis indicates that the row-method and column-method are statistically equivalent, i.e., that the ratio of their target overlap scores lies within [0.98, 1.02]. Equivalent pairs are marked with ✓ in the table. We use Bonferroni correction to safeguard against spurious results due to multiple comparisons: the significance level α = 0.05 is divided by 204 (the total number of statistical tests), so the null-hypothesis is rejected only when the p-value is below 0.05/204.
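
To make the procedure concrete, the following is a minimal sketch of one such pairwise test, not the exact implementation used for the table. It assumes that the scores of the two methods are paired (one score per registration pair for each method), that the ratio is taken between mean scores, and that the sample is large enough for a normal approximation to the one-sided test statistics. The function names `one_sided_p` and `tost_ratio` and the synthetic scores are hypothetical and serve only as illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

ALPHA = 0.05               # uncorrected significance level
N_TESTS = 204              # total number of tests (Bonferroni divisor)
LOWER, UPPER = 0.98, 1.02  # equivalence margins on the ratio


def one_sided_p(diffs, alternative):
    """One-sided p-value for H0: mean(diffs) = 0 (normal approximation)."""
    z = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    # "greater" uses the upper tail, "less" the lower tail.
    return NormalDist().cdf(-z) if alternative == "greater" else NormalDist().cdf(z)


def tost_ratio(t_row, t_column, alpha=ALPHA / N_TESTS):
    """Return (p_value, equivalent) for the ratio of mean scores.

    For positive scores, mean_row / mean_column > 0.98 is the same as
    mean(t_row - 0.98 * t_column) > 0, and likewise for the upper margin.
    """
    p_lower = one_sided_p([r - LOWER * c for r, c in zip(t_row, t_column)], "greater")
    p_upper = one_sided_p([r - UPPER * c for r, c in zip(t_row, t_column)], "less")
    p_value = max(p_lower, p_upper)  # both one-sided nulls must be rejected
    return p_value, p_value < alpha


# Synthetic, made-up scores purely for illustration (not data from the table).
rng = random.Random(0)
t_column = [rng.uniform(0.5, 0.8) for _ in range(200)]
t_row_close = [c + rng.gauss(0.0, 0.002) for c in t_column]  # nearly identical method
t_row_far = [1.1 * c for c in t_column]                      # about 10% higher scores

p_close, equivalent_close = tost_ratio(t_row_close, t_column)
p_far, equivalent_far = tost_ratio(t_row_far, t_column)
```

In this sketch a ✓ corresponds to `equivalent` being true, which requires both one-sided null hypotheses to be rejected at the corrected level 0.05/204. With small samples, a one-sided paired t-test would be the more appropriate choice than the normal approximation assumed here.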