2021 Jul 22;121:108197. doi: 10.1016/j.patcog.2021.108197

Table 10.

Comparison of HPC-XGB with the best-performing state-of-the-art model (DT [10], [11]) trained for each subtask. Performance under testing procedure c) is reported as average accuracy and average recall. $↑$ indicates that the recall distribution of the proposed HPC-XGB over the 11 GPs (Core Data Team) is significantly higher than that of DT according to the one-sided Wilcoxon signed-rank test (significance level = 0.05).

Model	# of patients	Labels	Accuracy, mean (std)	Recall, mean (std)
HPC-XGB B1	3392	{20,26}	0.661 (0.327)	0.493 (0.092)
DT [10], [11] B1	3392	{20,26}	0.912 (0.113)	0.499 (0.024)
HPC-XGB B2	4972	{19,25}	0.764 (0.230)	0.701 $↑$ (0.155)
DT [10], [11] B2	4972	{19,25}	0.801 (0.082)	0.674 (0.116)
HPC-XGB B3	2872	{16,18,22,24}	0.652 (0.148)	0.553 $↑$ (0.099)
DT [10], [11] B3	2872	{16,18,22,24}	0.535 (0.163)	0.458 (0.105)
HPC-XGB B4	1796	{15,17,21}	0.573 (0.213)	0.468 $↑$ (0.077)
DT [10], [11] B4	1796	{15,17,21}	0.477 (0.152)	0.401 (0.064)
HPC-XGB B5	1104	{9,10,11,12,13,14}	0.301 (0.202)	0.266 (0.155)
DT [10], [11] B5	1104	{9,10,11,12,13,14}	0.277 (0.148)	0.268 (0.143)
HPC-XGB B6	1110	{2,4,5,6,7}	0.453 (0.263)	0.369 $↑$ (0.139)
DT [10], [11] B6	1110	{2,4,5,6,7}	0.377 (0.159)	0.289 (0.097)
HPC-XGB B7	279	{1,3}	0.734 (0.141)	0.750 $↑$ (0.117)
DT [10], [11] B7	279	{1,3}	0.661 (0.121)	0.643 (0.110)
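
The $↑$ marker in Table 10 rests on a paired, one-sided Wilcoxon signed-rank test over the per-GP recall values. The sketch below illustrates how such a test could be computed exactly for a small sample such as 11 GPs. It is not the authors' implementation: the function name `wilcoxon_greater` and the per-GP recall values are hypothetical placeholders, not data from the study, and zero differences are assumed to be dropped while tied absolute differences receive average ranks.

```python
from itertools import product


def wilcoxon_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test of H1: x tends to exceed y.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks (an assumed convention).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Test statistic: sum of the ranks attached to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under H0 every sign pattern is equally likely, so enumerate all 2^n.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** n


# Hypothetical per-GP recall values for one subtask (placeholders only).
recall_hpc_xgb = [0.72, 0.65, 0.81, 0.58, 0.77, 0.69, 0.74, 0.62, 0.80, 0.66, 0.71]
recall_dt = [0.70, 0.60, 0.73, 0.61, 0.70, 0.68, 0.69, 0.55, 0.71, 0.67, 0.60]

w_plus, p_value = wilcoxon_greater(recall_hpc_xgb, recall_dt)
significant = p_value < 0.05  # would correspond to the marker in Table 10
print(w_plus, p_value, significant)
```

Exact enumeration is feasible here because 11 pairs give only 2048 sign patterns. For larger samples a normal approximation or a library routine would be the more practical choice.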