2022 Jan 11;1:796078. doi: 10.3389/fradi.2021.796078

Table 2.

Evaluation on the test set.

Subset	Baseline	Wu et al. (18)	Ensembles	MC-drop	Ours (MTL)
Overall	70.69 ± 0.36 (<0.01)	71.26 (<0.01)	73.39 ± 0.98 (<0.01)	72.82 ± 0.29 (<0.01)	80.46 ± 0.29
Density
A	78.42 ± 0.12 (0.03)	81.34 (0.13)	79.14 ± 0.91 (0.05)	80.83 ± 0.10 (0.12)	87.26 ± 0.14
B	68.14 ± 0.19 (<0.01)	70.32 (<0.01)	72.14 ± 1.05 (<0.01)	70.97 ± 0.17 (<0.01)	79.58 ± 0.13
C	67.15 ± 0.16 (0.19)	68.46 (0.21)	68.23 ± 0.90 (0.23)	66.42 ± 0.16 (0.10)	74.23 ± 0.10
D	64.81 ± 0.09 (0.07)	74.81 (0.36)	67.12 ± 0.87 (0.11)	73.33 ± 0.19 (0.23)	83.44 ± 0.07
View angle
CC	63.72 ± 0.14 (<0.01)	69.76 (<0.01)	67.54 ± 1.01 (<0.01)	66.27 ± 0.29 (<0.01)	78.47 ± 0.13
MLO	78.16 ± 0.21 (0.14)	69.82 (<0.01)	79.29 ± 1.12 (<0.30)	79.72 ± 0.29 (0.31)	82.44 ± 0.03

Binary cancer classification performance of the compared methods on the entire D_{HMI_test} dataset and on the subsets filtered by density class and by view angle. Each cell reports the AUC score in percent, with the p-value relative to our method in parentheses.
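As an illustration of how per-subset scores of this kind can be computed, the following minimal sketch evaluates the AUC on a full set and on subsets filtered by a metadata field. It is not the authors' evaluation code: the record fields (`label`, `score`, `density`, `view`), the helper names `auc_score` and `auc_by_group`, and the toy data are assumptions made for the example. The significance test behind the reported p-values is not reproduced here.

```python
from collections import defaultdict


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records, key):
    """Split records by records[key] and return {group: AUC in percent}."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {
        g: 100.0 * auc_score([r["label"] for r in rs], [r["score"] for r in rs])
        for g, rs in groups.items()
    }


# Hypothetical toy predictions; field names and values are illustrative only.
records = [
    {"label": 1, "score": 0.9, "density": "A", "view": "CC"},
    {"label": 0, "score": 0.2, "density": "A", "view": "MLO"},
    {"label": 1, "score": 0.4, "density": "B", "view": "CC"},
    {"label": 0, "score": 0.6, "density": "B", "view": "MLO"},
    {"label": 1, "score": 0.7, "density": "A", "view": "MLO"},
    {"label": 0, "score": 0.3, "density": "B", "view": "CC"},
]

# AUC on the whole set, then on the per-density and per-view subsets.
overall = 100.0 * auc_score(
    [r["label"] for r in records], [r["score"] for r in records]
)
per_density = auc_by_group(records, "density")
per_view = auc_by_group(records, "view")
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a sketch. A rank-based formulation gives the same value and scales better on a full test set.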