Table 10.
Comparison of HPC-XGB with the best-performing state-of-the-art model (DT [10], [11]) trained for each subtask. We measured performance under testing procedure c) in terms of average accuracy and average recall. The significance marker indicates whether the recall distribution of the proposed HPC-XGB over the 11 GPs (Core Data Team) is significantly higher than that of DT according to the one-sided Wilcoxon signed-rank test (significance level = 0.05).
Model | # of patients | Labels | Accuracy (mean [std]) | Recall (mean [std]) |
---|---|---|---|---|
HPC-XGB B1 | 3392 | {20,26} | 0.661 (0.327) | 0.493 (0.092) |
DT [10], [11] B1 | 3392 | {20,26} | 0.912 (0.113) | 0.499 (0.024) |
HPC-XGB B2 | 4972 | {19,25} | 0.764 (0.230) | 0.701 (0.155) |
DT [10], [11] B2 | 4972 | {19,25} | 0.801 (0.082) | 0.674 (0.116) |
HPC-XGB B3 | 2872 | {16,18,22,24} | 0.652 (0.148) | 0.553 (0.099) |
DT [10], [11] B3 | 2872 | {16,18,22,24} | 0.535 (0.163) | 0.458 (0.105) |
HPC-XGB B4 | 1796 | {15,17,21} | 0.573 (0.213) | 0.468 (0.077) |
DT [10], [11] B4 | 1796 | {15,17,21} | 0.477 (0.152) | 0.401 (0.064) |
HPC-XGB B5 | 1104 | {9,10,11,12,13,14} | 0.301 (0.202) | 0.266 (0.155) |
DT [10], [11] B5 | 1104 | {9,10,11,12,13,14} | 0.277 (0.148) | 0.268 (0.143) |
HPC-XGB B6 | 1110 | {2,4,5,6,7} | 0.453 (0.263) | 0.369 (0.139) |
DT [10], [11] B6 | 1110 | {2,4,5,6,7} | 0.377 (0.159) | 0.289 (0.097) |
HPC-XGB B7 | 279 | {1,3} | 0.734 (0.141) | 0.750 (0.117) |
DT [10], [11] B7 | 279 | {1,3} | 0.661 (0.121) | 0.643 (0.110) |
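
The significance check described in the caption can be sketched as follows. This is a minimal, standard-library illustration of an exact one-sided Wilcoxon signed-rank test on paired per-GP recall values, not the authors' implementation. The function name `wilcoxon_greater` and the two recall lists are hypothetical: the per-GP recalls are placeholder values chosen for illustration and are not taken from the paper.

```python
from itertools import product


def wilcoxon_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test, H1: x tends to exceed y.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Exact null distribution: every rank is positive with probability 1/2,
    # so enumerate all 2**n sign assignments (2048 for 11 GPs).
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** n


# Placeholder per-GP recalls for one subtask (NOT values from the paper).
recall_hpc_xgb = [0.58, 0.61, 0.49, 0.66, 0.52, 0.47, 0.63, 0.55, 0.60, 0.44, 0.57]
recall_dt = [0.46, 0.52, 0.51, 0.50, 0.45, 0.43, 0.48, 0.49, 0.47, 0.45, 0.46]

w_plus, p_value = wilcoxon_greater(recall_hpc_xgb, recall_dt)
significant = p_value < 0.05  # significance level stated in the caption
```

Full enumeration is feasible here because only 11 paired observations are compared. Where SciPy is available, `scipy.stats.wilcoxon(x, y, alternative="greater")` provides a library version of the same one-sided test.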