2023 Feb 13;89:104467. doi: 10.1016/j.ebiom.2023.104467

Table 2.

Disease detection with DenseNet-121 trained and tested on CheXpert.

No finding

| Metric (95% CI) | Test-set | White | Asian | Black | Female | Male |
|---|---|---|---|---|---|---|
| AUC | Original | 0.87 (0.86–0.88) | 0.88 (0.86–0.89) | 0.88 (0.87–0.90) | 0.87 (0.86–0.88) | 0.87 (0.86–0.88) |
| AUC | Resampled | 0.87 (0.86–0.88) | 0.87 (0.87–0.88) | 0.89 (0.88–0.89) | 0.87 (0.86–0.87) | 0.89 (0.88–0.89) |
| AUC | Multitask | 0.86 (0.86–0.87) | 0.86 (0.85–0.88) | 0.88 (0.86–0.90) | 0.86 (0.85–0.87) | 0.87 (0.86–0.88) |
| TPR | Original | 0.79 (0.77–0.80) | 0.80 (0.76–0.83) | 0.84 (0.80–0.88) | 0.79 (0.77–0.82) | 0.79 (0.77–0.81) |
| TPR | Resampled | 0.80 (0.78–0.81) | 0.79 (0.78–0.81) | 0.81 (0.80–0.82) | 0.78 (0.76–0.79) | 0.82 (0.81–0.83) |
| TPR | Multitask | 0.78 (0.76–0.80) | 0.78 (0.74–0.82) | 0.82 (0.78–0.87) | 0.82 (0.80–0.84) | 0.76 (0.74–0.78) |
| FPR | Original | 0.20 (0.20–0.20) | 0.20 (0.19–0.21) | 0.23 (0.21–0.24) | 0.20 (0.20–0.21) | 0.20 (0.20–0.20) |
| FPR | Resampled | 0.20 (0.20–0.21) | 0.20 (0.20–0.20) | 0.20 (0.19–0.20) | 0.20 (0.20–0.20) | 0.20 (0.20–0.20) |
| FPR | Multitask | 0.20 (0.20–0.20) | 0.19 (0.18–0.20) | 0.22 (0.21–0.24) | 0.23 (0.23–0.24) | 0.18 (0.17–0.18) |
| Youden's J statistic | Original | 0.59 (0.57–0.60) | 0.60 (0.56–0.64) | 0.61 (0.57–0.65) | 0.59 (0.57–0.61) | 0.59 (0.57–0.61) |
| Youden's J statistic | Resampled | 0.59 (0.58–0.61) | 0.59 (0.58–0.61) | 0.61 (0.60–0.63) | 0.58 (0.56–0.59) | 0.62 (0.61–0.63) |
| Youden's J statistic | Multitask | 0.58 (0.56–0.59) | 0.59 (0.55–0.63) | 0.60 (0.56–0.65) | 0.58 (0.56–0.60) | 0.58 (0.56–0.60) |

Pleural effusion

| Metric (95% CI) | Test-set | White | Asian | Black | Female | Male |
|---|---|---|---|---|---|---|
| AUC | Original | 0.86 (0.86–0.87) | 0.88 (0.87–0.89) | 0.86 (0.85–0.88) | 0.87 (0.86–0.87) | 0.86 (0.86–0.87) |
| AUC | Resampled | 0.87 (0.86–0.87) | 0.88 (0.88–0.89) | 0.85 (0.84–0.85) | 0.87 (0.87–0.87) | 0.86 (0.86–0.86) |
| AUC | Multitask | 0.86 (0.86–0.87) | 0.88 (0.87–0.88) | 0.86 (0.85–0.88) | 0.87 (0.86–0.87) | 0.86 (0.86–0.87) |
| TPR | Original | 0.77 (0.76–0.78) | 0.78 (0.76–0.80) | 0.71 (0.68–0.74) | 0.76 (0.75–0.78) | 0.77 (0.76–0.78) |
| TPR | Resampled | 0.78 (0.78–0.79) | 0.80 (0.80–0.81) | 0.72 (0.71–0.73) | 0.78 (0.77–0.79) | 0.76 (0.75–0.76) |
| TPR | Multitask | 0.77 (0.75–0.78) | 0.78 (0.77–0.80) | 0.69 (0.66–0.73) | 0.75 (0.73–0.76) | 0.78 (0.77–0.79) |
| FPR | Original | 0.21 (0.20–0.21) | 0.19 (0.18–0.20) | 0.16 (0.14–0.17) | 0.20 (0.19–0.20) | 0.20 (0.20–0.21) |
| FPR | Resampled | 0.21 (0.21–0.21) | 0.21 (0.20–0.21) | 0.18 (0.18–0.19) | 0.20 (0.20–0.21) | 0.20 (0.19–0.20) |
| FPR | Multitask | 0.21 (0.20–0.21) | 0.20 (0.19–0.21) | 0.15 (0.14–0.17) | 0.18 (0.18–0.19) | 0.21 (0.21–0.22) |
| Youden's J statistic | Original | 0.56 (0.55–0.57) | 0.59 (0.57–0.61) | 0.55 (0.52–0.59) | 0.57 (0.55–0.58) | 0.57 (0.55–0.58) |
| Youden's J statistic | Resampled | 0.57 (0.56–0.58) | 0.59 (0.59–0.60) | 0.54 (0.52–0.55) | 0.57 (0.57–0.58) | 0.56 (0.55–0.57) |
| Youden's J statistic | Multitask | 0.56 (0.55–0.57) | 0.59 (0.57–0.61) | 0.54 (0.51–0.58) | 0.56 (0.55–0.58) | 0.57 (0.55–0.58) |

Disease detection results reported separately for each race group and biological sex for ‘no finding’ (top) and ‘pleural effusion’ (bottom). TPR and FPR in subgroups are determined using a fixed decision threshold optimized over the whole patient population for a target FPR of 0.20.
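
The thresholding procedure described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`threshold_for_target_fpr`, `rates`, `subgroup_report`), the quantile-based threshold search, and the synthetic data are all assumptions made for this example. It picks one threshold on the whole population for a target FPR of 0.20, reuses that threshold unchanged in each subgroup, and reports TPR, FPR, and Youden's J statistic (J = TPR − FPR). Confidence intervals are not computed here.

```python
import numpy as np


def threshold_for_target_fpr(scores, labels, target_fpr=0.20):
    """Threshold at which roughly `target_fpr` of all negatives score above it.

    Assumption: the threshold is taken as a quantile of the negative-class
    scores; the source does not say how the search was actually performed.
    """
    negatives = scores[labels == 0]
    return np.quantile(negatives, 1.0 - target_fpr)


def rates(scores, labels, threshold):
    """Return (TPR, FPR, Youden's J) for predictions `scores > threshold`."""
    pred = scores > threshold
    tpr = pred[labels == 1].mean()
    fpr = pred[labels == 0].mean()
    return tpr, fpr, tpr - fpr


def subgroup_report(scores, labels, groups, target_fpr=0.20):
    """Fix one global threshold, then evaluate it within each subgroup."""
    thr = threshold_for_target_fpr(scores, labels, target_fpr)
    report = {"all": rates(scores, labels, thr)}
    for g in np.unique(groups):
        mask = groups == g
        report[str(g)] = rates(scores[mask], labels[mask], thr)
    return thr, report


# Synthetic data for illustration only (not CheXpert, not the paper's results).
rng = np.random.default_rng(0)
n = 10_000
labels = rng.integers(0, 2, n)               # 1 = finding present
groups = rng.choice(["A", "B"], n)           # hypothetical subgroup labels
scores = rng.normal(loc=1.5 * labels, scale=1.0)

thr, report = subgroup_report(scores, labels, groups)
for name, (tpr, fpr, j) in report.items():
    print(f"{name}: TPR={tpr:.2f} FPR={fpr:.2f} J={j:.2f}")
```

Because the threshold is fixed globally, the FPR in an individual subgroup is free to drift away from 0.20, which is the kind of subgroup disparity the table makes visible.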