Table 2.
Area under the receiver operating characteristic curve (AUC, in %) on the hold-out test set of 2099 images labeled by report content and on the hold-out test set of 187 images labeled by re-evaluating the imaging. The image-based models were trained on report-based labels with four different approaches: solely on gold labels (MG), solely on silver labels (MS), first on silver and then on gold labels (MS/G), and on silver and gold labels together (MS+G). The transformer and the image-based models were trained with various numbers (N) of gold-labeled reports and images to investigate the influence of annotation effort on DDSS model performance. For MS, only silver-labeled images were used, with the silver labels generated by the transformer trained with N gold labels. For each test set, the highest performance among the models trained with the same number of gold labels is indicated in bold. Significant differences between the AUC of MG and that of MS, MS+G, or MS/G are indicated by *; significant differences between the AUCs of the same model (MG, MS, MS+G, or MS/G) tested on report-based and image-based labels are indicated by †.
Number of gold labels used | | Test set labeled by report content (N = 2099) | | | | | | | | Test set labeled by image content (N = 187) | | | | | | | |
Training approach | | MG | MS | MS+G | MS/G | MG | MS | MS+G | MS/G | MG | MS | MS+G | MS/G | MG | MS | MS+G | MS/G |
Reports | Images | AUC macro-averaged | | | | Misplaced CVC | | | | AUC macro-averaged | | | | Misplaced CVC | | | |
14,580 | 12,935 | 74.5 | 79.7* | 78.8* | 80.9* | 63.1 | 73.5* | 77.3* | 77.7* | 75.8 | 84.6* | 82.4 | 84.8* | 61.3 | 81.8* | 79.3* | 83.4* |
7000 | 6206 | 73.4 | 78.1* | 78.2* | 79.2* | 64.3 | 73.4* | 70.5 | 74.1* | 76.5 | 82.1* | 82.0 | 82.8 | 68.8 | 76.4 | 73.6 | 76.7 |
3500 | 3096 | 71.8 | 78.3* | 79.2* | 78.5* | 63.1 | 71.9* | 74.5* | 72.6* | 75.7 | 82.9* | 81.8 | 83.0* | 65.4 | 77.7 | 73.1 | 77.9 |
2000 | 1773 | 71.5 | 77.4* | 78.5* | 78.5* | 63.4 | 71.3* | 73.2* | 74.3* | 73.5 | 79.9 | 81.5* | 81.1* | 67.4 | 71.7 | 75.9 | 75.6 |
1000 | 877 | 67.8 | 77.5* | 77.3* | 77.9* | 59.7 | 68.6* | 69.8* | 69.6* | 69.5 | 80.3* | 82.8*† | 80.2* | 57.5 | 69.9 | 76.0 | 69.5 |
500 | 450 | 68.5 | 75.1* | 76.4* | 75.3* | 57.7 | 65.7 | 69.2* | 67.4* | 68.9 | 78.9* | 80.1* | 76.9* | 58.9 | 72.5 | 76.7 | 69.7 |
Reports | Images | Pleural effusion | | | | Pulmonary congestion | | | | Pleural effusion | | | | Pulmonary congestion | | | |
14,580 | 12,935 | 83.8 | 86.1 | 85.7 | 86.4 | 72.5 | 73.5 | 75.2 | 74.5 | 84.5 | 87.9 | 88.6 | 87.5 | 81.1 | 81.7 | 84.8† | 83.9† |
7000 | 6206 | 83.6 | 84.5 | 85.9 | 85.8 | 72.9 | 74.2 | 74.3 | 74.4 | 84.1 | 85.5 | 87.7 | 86.6 | 81.9† | 84.8† | 84.3† | 84.8† |
3500 | 3096 | 82.2 | 85.7* | 86.1* | 85.7* | 69.3 | 74.4* | 74.8* | 74.4* | 82.2 | 88.2 | 87.1 | 88.5 | 81.9† | 83.4† | 83.0† | 83.9† |
2000 | 1773 | 81.1 | 85.8* | 86.2* | 85.6* | 70.7 | 73.9 | 73.3 | 74.4 | 81.3 | 86.7 | 87.8 | 87.6 | 80.6† | 82.3† | 82.2† | 83.5† |
1000 | 877 | 79.8 | 86.3* | 85.9* | 86.2* | 69.2 | 74.3* | 73.5 | 74.6* | 79.1 | 87.2 | 86.8 | 86.8 | 81.6† | 83.8† | 84.3† | 83.9† |
500 | 450 | 80.4 | 84.4* | 84.4* | 84.8* | 68.1 | 72.7 | 71.4 | 72.9* | 79.4 | 85.5 | 82.1 | 86.3 | 76.5 | 85.4† | 81.8† | 85.0† |
Reports | Images | Pulmonary infiltrates | | | | Pneumothorax | | | | Pulmonary infiltrates | | | | Pneumothorax | | | |
14,580 | 12,935 | 80.6 | 82.3 | 82.2 | 81.9 | 72.5 | 83.4 | 73.9 | 84.0 | 73.3 | 81.3 | 79.1 | 77.3 | 79.1 | 90.3 | 80.2 | 91.9 |
7000 | 6206 | 78.5 | 81.4 | 82.6 | 81.7 | 67.6 | 77.2 | 77.5 | 79.8 | 76.3 | 79.4 | 79.2 | 77.8 | 71.2 | 84.7 | 85.2 | 88.0* |
3500 | 3096 | 78.7 | 81.2 | 82.0 | 81.1 | 65.8 | 78.6 | 78.8* | 78.5 | 78.7 | 76.1 | 77.5 | 76.0 | 70.2 | 89.2* | 88.0* | 88.8* |
2000 | 1773 | 74.1 | 81.8* | 82.4* | 81.6* | 68.1 | 74.0 | 77.4 | 76.7 | 68.2 | 79.4 | 77.5 | 77.8 | 69.9 | 79.1 | 83.8 | 81.1 |
1000 | 877 | 70.3 | 80.9* | 83.1* | 81.6* | 59.9 | 77.5* | 74.2* | 77.6* | 63.6 | 74.5 | 81.3* | 75.6 | 65.9 | 86.3* | 85.5* | 85.3* |
500 | 450 | 72.4 | 79.1* | 80.7* | 78.6* | 63.9 | 73.6 | 76.3* | 72.8 | 69.0 | 75.8 | 74.8 | 72.2 | 60.8 | 75.3 | 84.9* | 71.4 |
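
The "AUC macro-averaged" columns can be related to the per-finding columns with a short calculation. The sketch below assumes that the macro-average is the unweighted mean of the AUCs of the five findings (misplaced CVC, pleural effusion, pulmonary congestion, pulmonary infiltrates, pneumothorax); the table itself does not state this definition. The values are copied from the 14,580-report rows of the report-labeled test set, and the names `per_finding_auc`, `reported_macro`, and `macro_average` are illustrative only.

```python
from statistics import mean

# Per-finding AUCs (%) from the 14,580-report rows, report-labeled test set.
# Order: misplaced CVC, pleural effusion, pulmonary congestion,
#        pulmonary infiltrates, pneumothorax.
per_finding_auc = {
    "MG":   [63.1, 83.8, 72.5, 80.6, 72.5],
    "MS":   [73.5, 86.1, 73.5, 82.3, 83.4],
    "MS+G": [77.3, 85.7, 75.2, 82.2, 73.9],
    "MS/G": [77.7, 86.4, 74.5, 81.9, 84.0],
}

# Macro-averaged AUCs (%) as reported in the same row of the table.
reported_macro = {"MG": 74.5, "MS": 79.7, "MS+G": 78.8, "MS/G": 80.9}


def macro_average(aucs):
    """Unweighted mean over findings (assumed definition of the macro-average)."""
    return mean(aucs)


for model, aucs in per_finding_auc.items():
    computed = macro_average(aucs)
    # Small gaps are expected because the per-finding values are already rounded.
    print(f"{model}: computed {computed:.2f}, reported {reported_macro[model]}")
```

Under this assumption the computed means agree with the reported macro-averaged values to within 0.1 percentage points, which is consistent with rounding of the per-finding entries.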