2020 Dec 25;29:102548. doi: 10.1016/j.nicl.2020.102548

Table C.6.

Evaluation metrics for models trained on different training sets (all, reperfused, and non-reperfused patients) with different fusion strategies (early and late), evaluated on (a) reperfused testing patients and (b) non-reperfused testing patients. Values are averages ± standard deviation. Bold values correspond to the best value of the respective evaluation metric (column-wise). For each fusion strategy, a two-sided Wilcoxon signed-rank test compared the global model (trained on all patients) with each of the two other models (reperfused and non-reperfused): (.) indicates P < 0.10, (*) P < 0.05, (**) P < 0.01, and (***) P < 0.001.
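The marker convention in the caption can be read as a simple threshold lookup. The following is a minimal sketch of that mapping, not the authors' code: the function names `significance_marker` and `format_cell` are hypothetical, and the P value passed in the usage example is invented purely for illustration (the table reports only the markers, not the underlying P values).

```python
from typing import Optional


def significance_marker(p_value: float) -> str:
    """Return the caption's marker for a P value, or '' if P >= 0.10."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Check the strictest threshold first so the strongest marker wins.
    thresholds = [(0.001, "(***)"), (0.01, "(**)"), (0.05, "(*)"), (0.10, "(.)")]
    for threshold, marker in thresholds:
        if p_value < threshold:
            return marker
    return ""


def format_cell(mean: float, std: float, p_value: Optional[float] = None) -> str:
    """Format one table cell as 'mean ± std', plus a marker when one applies."""
    cell = f"{mean:.2f} ± {std:.2f}"
    if p_value is None:
        # Rows for the global model are the reference and carry no marker.
        return cell
    marker = significance_marker(p_value)
    return f"{cell} {marker}" if marker else cell


# Hypothetical P value, chosen only to show the formatting.
print(format_cell(0.46, 0.29, 0.0004))
```

Because the comparisons are strict, a P value exactly equal to a threshold falls into the next weaker category, which matches the "P <" wording of the caption.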

(a) Evaluation on reperfused testing patients
| Fusion | Training | DSC | VS | Precision | Recall | HD |
|---|---|---|---|---|---|---|
| early | all | 0.39 ± 0.25 | 0.59 ± 0.30 | 0.56 ± 0.31 | 0.40 ± 0.26 | 29.51 ± 16.26 |
| early | reperfused | 0.41 ± 0.25 | 0.64 ± 0.30 | 0.46 ± 0.29 (***) | 0.49 ± 0.30 (***) | 31.24 ± 15.61 |
| early | non-reperfused | 0.36 ± 0.22 (*) | 0.63 ± 0.27 | 0.54 ± 0.26 | 0.33 ± 0.24 (***) | 26.64 ± 11.16 |
| late | all | 0.43 ± 0.24 | 0.69 ± 0.27 | 0.55 ± 0.28 | 0.43 ± 0.25 | 33.23 ± 15.64 |
| late | reperfused | 0.44 ± 0.25 | 0.70 ± 0.27 | 0.50 ± 0.27 | 0.50 ± 0.26 (***) | 38.58 ± 18.15 |
| late | non-reperfused | 0.35 ± 0.21 (***) | 0.57 ± 0.28 (***) | 0.60 ± 0.25 (***) | 0.31 ± 0.24 (***) | 40.05 ± 15.66 (**) |

(b) Evaluation on non-reperfused testing patients
| Fusion | Training | DSC | VS | Precision | Recall | HD |
|---|---|---|---|---|---|---|
| early | all | 0.42 ± 0.24 | 0.62 ± 0.27 | 0.42 ± 0.28 | 0.55 ± 0.29 | 30.98 ± 18.23 |
| early | reperfused | 0.41 ± 0.26 | 0.51 ± 0.31 | 0.36 ± 0.29 | 0.69 ± 0.24 | 30.94 ± 16.30 |
| early | non-reperfused | 0.42 ± 0.18 | 0.66 ± 0.17 | 0.42 ± 0.24 | 0.55 ± 0.22 | 28.48 ± 13.63 |
| late | all | 0.44 ± 0.21 | 0.66 ± 0.26 | 0.39 ± 0.25 | 0.63 ± 0.21 | 30.61 ± 16.15 |
| late | reperfused | 0.44 ± 0.22 | 0.63 ± 0.25 | 0.36 ± 0.23 | 0.69 ± 0.22 (*) | 44.53 ± 16.79 (**) |
| late | non-reperfused | 0.47 ± 0.17 | 0.74 ± 0.13 | 0.49 ± 0.22 (**) | 0.52 ± 0.21 (***) | 37.70 ± 17.74 |