2020 Dec 25;29:102548. doi: 10.1016/j.nicl.2020.102548

Table C.6.

Evaluation metrics after training models on different training sets (all, reperfused, and non-reperfused) with different fusion strategies (early and late) and evaluating them on reperfused testing patients (a) and non-reperfused testing patients (b) (average values ± standard deviation). Bold values correspond to the best value of the respective evaluation metric (column-wise). A two-sided Wilcoxon signed-rank test was performed between the global model (trained on all patients) and each of the two other models (reperfused and non-reperfused) for a given fusion strategy, with (.) indicating P < 0.10, (*) indicating P < 0.05, (**) indicating P < 0.01, and (***) indicating P < 0.001.

(a) Evaluation on reperfused testing patients
Fusion	Training	DSC	VS	Precision	Recall	HD
early	all	0.39 ± 0.25	0.59 ± 0.30	0.56 ± 0.31	0.40 ± 0.26	29.51 ± 16.26
early	reperfused	0.41 ± 0.25	0.64 ± 0.30	0.46 ± 0.29 (***)	0.49 ± 0.30 (***)	31.24 ± 15.61
early	non-reperfused	0.36 ± 0.22 (*)	0.63 ± 0.27	0.54 ± 0.26	0.33 ± 0.24 (***)	26.64 ± 11.16

late	all	0.43 ± 0.24	0.69 ± 0.27	0.55 ± 0.28	0.43 ± 0.25	33.23 ± 15.64
late	reperfused	0.44 ± 0.25	0.70 ± 0.27	0.50 ± 0.27	0.50 ± 0.26 (***)	38.58 ± 18.15
late	non-reperfused	0.35 ± 0.21 (***)	0.57 ± 0.28 (***)	0.60 ± 0.25 (***)	0.31 ± 0.24 (***)	40.05 ± 15.66 (**)

(b) Evaluation on non-reperfused testing patients
Fusion	Training	DSC	VS	Precision	Recall	HD
early	all	0.42 ± 0.24	0.62 ± 0.27	0.42 ± 0.28	0.55 ± 0.29	30.98 ± 18.23
early	reperfused	0.41 ± 0.26	0.51 ± 0.31	0.36 ± 0.29	0.69 ± 0.24	30.94 ± 16.30
early	non-reperfused	0.42 ± 0.18	0.66 ± 0.17	0.42 ± 0.24	0.55 ± 0.22	28.48 ± 13.63

late	all	0.44 ± 0.21	0.66 ± 0.26	0.39 ± 0.25	0.63 ± 0.21	30.61 ± 16.15
late	reperfused	0.44 ± 0.22	0.63 ± 0.25	0.36 ± 0.23	0.69 ± 0.22 (*)	44.53 ± 16.79 (**)
late	non-reperfused	0.47 ± 0.17	0.74 ± 0.13	0.49 ± 0.22 (**)	0.52 ± 0.21 (***)	37.70 ± 17.74
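
The significance markers defined in the caption can be read as a simple threshold mapping. The following is a minimal sketch of that convention, not the authors' code: the function names `significance_marker` and `format_cell` are hypothetical, and the P value used in the usage example is made up for illustration, since the table reports only markers, not the underlying P values.

```python
from typing import Optional


def significance_marker(p_value: float) -> str:
    """Map a P value to the marker convention stated in the table caption."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Check thresholds from strictest to loosest, using strict "<" as in the caption.
    thresholds = [(0.001, "(***)"), (0.01, "(**)"), (0.05, "(*)"), (0.10, "(.)")]
    for cutoff, marker in thresholds:
        if p_value < cutoff:
            return marker
    return ""  # P >= 0.10: no marker is shown


def format_cell(mean: float, std: float, p_value: Optional[float] = None) -> str:
    """Format a cell as 'mean ± std', appending a marker when a P value is given."""
    cell = f"{mean:.2f} ± {std:.2f}"
    marker = significance_marker(p_value) if p_value is not None else ""
    return f"{cell} {marker}" if marker else cell


if __name__ == "__main__":
    # Illustrative P value only (assumption); the source does not report it.
    print(format_cell(0.46, 0.29, p_value=0.0004))
    # Rows for the global model carry no marker, since it is the reference.
    print(format_cell(0.39, 0.25))
```

Under this reading, a cell without a marker means either that the row is the global (reference) model or that the difference from the global model did not reach P < 0.10.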