2024 Dec 3;14:30136. doi: 10.1038/s41598-024-81718-y

Table 2.

Performance metrics of all DL network configurations and human observer.

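| Observer | Variant | Input | AUROC | Acc | Sen | Spe | SB | Log loss |
|---|---|---|---|---|---|---|---|---|
| AI | N1 | Inline graphic | 0.97 ± 0.02 | 0.97 | 0.93 | 1 | 0.07 | 0.42 |
| AI | N1 | Inline graphic | 0.94 ± 0.05 | 0.91 | 1 | 0.84 | –0.16 | 0.39 |
| AI | N1 | Inline graphic | 0.89 ± 0.06 | 0.91 | 1 | 0.84 | –0.16 | 0.44 |
| AI | N2 | Inline graphic | 0.99 ± 0.01 | 0.97 | 1 | 0.95 | –0.05 | 0.13 |
| AI | N2 | Inline graphic | 0.98 ± 0.02 | 0.94 | 1 | 0.89 | –0.11 | 0.15 |
| AI | N2 | Inline graphic | 0.93 ± 0.05 | 0.97 | 1 | 0.95 | –0.05 | 0.30 |
| AI | N3 | Inline graphic | 0.98 ± 0.02 | 0.94 | 0.93 | 0.95 | 0.01 | 0.15 |
| AI | N3 | Inline graphic | 0.99 ± 0.01 | 0.97 | 1 | 0.95 | –0.05 | 0.14 |
| AI | N3 | Inline graphic | 0.93 ± 0.05 | 0.85 | 1 | 0.74 | –0.26 | 0.36 |
| Human | | | | 0.82 | 0.93 | 0.74 | –0.19 | |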

AUROC = area under the ROC curve; Acc = accuracy (overall classification correctness); Sen = sensitivity (female classification accuracy); Spe = specificity (male classification accuracy); SB = sex bias (specificity – sensitivity). For DL networks, the AUROC is the average across all five models from the 5-fold cross-validation. The log loss is calculated between the true labels and the average probability of being female.
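
The definitions above can be illustrated with a minimal sketch, which is not the authors' code. It assumes female is the positive class (label 1), a decision threshold of 0.5, and the natural logarithm for the log loss; none of these are stated in the table. The function names and the sample data are hypothetical.

```python
import math


def average_probabilities(per_model_probs):
    """Average P(female) over models.

    per_model_probs[m][i] is model m's probability that sample i is female.
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def classification_metrics(y_true, p_female, threshold=0.5):
    """Compute Acc, Sen, Spe, SB and log loss.

    y_true: 1 = female (positive class), 0 = male.
    threshold: assumed cut-off on P(female); not given in the source.
    """
    y_pred = [1 if p >= threshold else 0 for p in p_female]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, y in pairs if t == 1 and y == 1)
    fn = sum(1 for t, y in pairs if t == 1 and y == 0)
    tn = sum(1 for t, y in pairs if t == 0 and y == 0)
    fp = sum(1 for t, y in pairs if t == 0 and y == 1)

    sen = tp / (tp + fn)          # female classification accuracy
    spe = tn / (tn + fp)          # male classification accuracy
    acc = (tp + tn) / len(y_true)
    sb = spe - sen                # sex bias as defined in the footnote

    eps = 1e-15                   # clip to avoid log(0)
    total = 0.0
    for t, p in zip(y_true, p_female):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    log_loss = -total / len(y_true)

    return {"acc": acc, "sen": sen, "spe": spe, "sb": sb, "log_loss": log_loss}


# Hypothetical data: 3 female and 4 male samples, two models.
y_true = [1, 1, 1, 0, 0, 0, 0]
model_a = [0.95, 0.85, 0.55, 0.75, 0.25, 0.05, 0.35]
model_b = [0.85, 0.75, 0.65, 0.65, 0.15, 0.15, 0.25]

p_avg = average_probabilities([model_a, model_b])
metrics = classification_metrics(y_true, p_avg)
print(metrics)
```

In this toy example every female sample is classified correctly (Sen = 1) while one of the four male samples is not (Spe = 0.75), so SB = –0.25. Under the footnote's definition, a negative SB means male samples are classified less accurately than female samples, which is the direction seen in most rows of the table.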