Neuroimage Clin. 2023 Mar 6;38:103368. doi: 10.1016/j.nicl.2023.103368

Table 6.

Results on dataset2 (the clinical test dataset). Values are mean ± standard error of the mean (SEM) across the dataset. VER: volume error rate; AVER: absolute VER. The best performance for each annotator is shown in boldface.

Method	Data Augmentation	Annotator	Dice	Recall	Precision	VER	AVER	Pearson r
1-step	no	1	0.61 ± 0.02	0.63 ± 0.02	0.63 ± 0.03	0.01 ± 0.06	0.24 ± 0.03	0.68
		2	0.56 ± 0.02	0.49 ± 0.02	0.69 ± 0.03	0.49 ± 0.10	0.60 ± 0.08	0.47
	yes	1	0.64 ± 0.01	0.58 ± 0.02	0.74 ± 0.02	0.32 ± 0.06	0.34 ± 0.05	0.64
		2	0.56 ± 0.02	0.44 ± 0.02	0.80 ± 0.01	0.94 ± 0.11	0.94 ± 0.11	0.48

2-step	no	1	0.62 ± 0.02	0.64 ± 0.02	0.61 ± 0.03	−0.04 ± 0.05	0.19 ± 0.03	0.75
		2	0.56 ± 0.02	0.51 ± 0.02	0.68 ± 0.03	0.42 ± 0.09	0.51 ± 0.07	0.52
	yes	1	0.67 ± 0.01	0.62 ± 0.01	0.75 ± 0.02	0.22 ± 0.03	0.24 ± 0.03	0.84
		2	0.59 ± 0.02	0.47 ± 0.02	0.81 ± 0.01	0.80 ± 0.08	0.80 ± 0.08	0.62

Inter-rater agreement			0.64 ± 0.02	0.78 ± 0.01	0.55 ± 0.02	−0.29 ± 0.03	0.29 ± 0.03	0.64
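The caption names the metrics but does not give their formulas. The sketch below shows, under stated assumptions, how per-case Dice, recall, precision, VER and AVER are commonly computed from binary masks, and how they could be summarized as mean ± SEM with a Pearson r across cases. Three assumptions are not taken from the source: VER is taken here as the signed relative volume difference (V_pred − V_ref) / V_ref (the paper's sign convention and denominator may differ), Pearson r is assumed to correlate predicted and reference volumes per case, and the function names `overlap_metrics`, `mean_sem` and `summarize` are hypothetical. It is an illustration, not the authors' code.

```python
import numpy as np


def overlap_metrics(pred, ref):
    """Per-case overlap and volume metrics for two non-empty binary masks."""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    tp = int(np.logical_and(pred, ref).sum())  # voxels labelled by both
    v_pred = int(pred.sum())
    v_ref = int(ref.sum())
    dice = 2.0 * tp / (v_pred + v_ref)
    recall = tp / v_ref        # fraction of reference voxels recovered
    precision = tp / v_pred    # fraction of predicted voxels that are correct
    # ASSUMPTION: VER = (V_pred - V_ref) / V_ref; flip the sign or change the
    # denominator if the paper defines it differently.
    ver = (v_pred - v_ref) / v_ref
    return {"dice": dice, "recall": recall, "precision": precision,
            "ver": ver, "aver": abs(ver), "v_pred": v_pred, "v_ref": v_ref}


def mean_sem(values):
    """Mean and standard error of the mean (sample std with ddof=1 / sqrt(n))."""
    x = np.asarray(values, dtype=float)
    return float(x.mean()), float(x.std(ddof=1) / np.sqrt(x.size))


def summarize(pairs):
    """Aggregate metrics over (pred, ref) pairs; needs at least two cases."""
    per_case = [overlap_metrics(p, r) for p, r in pairs]
    summary = {key: mean_sem([m[key] for m in per_case])
               for key in ("dice", "recall", "precision", "ver", "aver")}
    # ASSUMPTION: Pearson r between predicted and reference volumes per case.
    v_pred = [m["v_pred"] for m in per_case]
    v_ref = [m["v_ref"] for m in per_case]
    summary["pearson_r"] = float(np.corrcoef(v_pred, v_ref)[0, 1])
    return summary


if __name__ == "__main__":
    # Toy 1-D "masks" standing in for 3-D lesion segmentations.
    cases = [
        ([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0]),
        ([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0]),
        ([1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 1]),
    ]
    for name, (mean, sem) in summarize(cases).items() if False else []:
        pass
    result = summarize(cases)
    for name in ("dice", "recall", "precision", "ver", "aver"):
        mean, sem = result[name]
        print(f"{name}: {mean:.2f} ± {sem:.2f}")
    print(f"pearson_r: {result['pearson_r']:.2f}")
```

Because AVER averages absolute per-case errors while VER averages signed ones, AVER is always at least |VER|, which matches every row of the table; rows where the two coincide (e.g. 0.94 ± 0.11) indicate that the per-case errors all share one sign.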