Table 4. CT volume test set AUROC for models trained on 9 vs. 83 labels.
The area under the receiver operating characteristic curve (AUROC) is shown for CT-Net-9 (trained only on the 9 labels shown) and CT-Net-83 (trained on the 9 labels shown plus 74 additional labels) on the test set of 7,209 examples. CT-Net-83 outperforms CT-Net-9 on all 9 abnormalities, emphasizing the value of the additional 74 labels. The p-values are from the DeLong test comparing the AUROCs of the two models. Note that we also experimented with training a separate binary classifier for each of the 9 labels, but these models did not converge (AUROC ~0.5). Positive Count and Positive Percent give the number and percentage of test set examples that are positive for each abnormality.
Abnormality | Positive Count | Positive Percent | CT-Net-9 AUROC | CT-Net-9 95% CI | CT-Net-83 AUROC | CT-Net-83 95% CI | DeLong p-value |
---|---|---|---|---|---|---|---|
nodule | 5,617 | 77.9 | 0.682 | 0.667–0.698 | 0.718 | 0.703–0.732 | 3.346×10⁻⁷ |
opacity | 3,877 | 53.8 | 0.617 | 0.605–0.630 | 0.740 | 0.728–0.751 | <4.950×10⁻¹⁶ |
atelectasis | 2,037 | 28.3 | 0.683 | 0.668–0.697 | 0.765 | 0.753–0.777 | <4.950×10⁻¹⁶ |
pleural effusion | 1,404 | 19.5 | 0.945 | 0.937–0.952 | 0.951 | 0.945–0.958 | 1.882×10⁻² |
consolidation | 1,086 | 15.1 | 0.719 | 0.703–0.736 | 0.816 | 0.804–0.829 | <4.950×10⁻¹⁶ |
mass | 863 | 12.0 | 0.624 | 0.604–0.644 | 0.773 | 0.755–0.791 | <4.950×10⁻¹⁶ |
pericardial effusion | 1,078 | 15.0 | 0.659 | 0.640–0.677 | 0.697 | 0.679–0.714 | 8.315×10⁻⁸ |
cardiomegaly | 649 | 9.0 | 0.791 | 0.774–0.807 | 0.851 | 0.836–0.867 | 7.000×10⁻¹³ |
pneumothorax | 205 | 2.8 | 0.816 | 0.785–0.847 | 0.904 | 0.882–0.926 | 8.810×10⁻¹¹ |
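
As a rough consistency check on Table 4, the following minimal Python sketch recomputes Positive Percent from Positive Count and the test set size of 7,209, and reports the AUROC gain of CT-Net-83 over CT-Net-9 for each abnormality. It is not part of the original analysis: the names `TABLE_4`, `positive_percent`, and `auroc_gain` are illustrative assumptions, and the values are simply copied from the table.

```python
# Test set size and per-abnormality values copied from Table 4.
TEST_SET_SIZE = 7209

# abnormality: (positive count, CT-Net-9 AUROC, CT-Net-83 AUROC)
TABLE_4 = {
    "nodule": (5617, 0.682, 0.718),
    "opacity": (3877, 0.617, 0.740),
    "atelectasis": (2037, 0.683, 0.765),
    "pleural effusion": (1404, 0.945, 0.951),
    "consolidation": (1086, 0.719, 0.816),
    "mass": (863, 0.624, 0.773),
    "pericardial effusion": (1078, 0.659, 0.697),
    "cardiomegaly": (649, 0.791, 0.851),
    "pneumothorax": (205, 0.816, 0.904),
}


def positive_percent(count, total=TEST_SET_SIZE):
    """Percentage of test set examples that are positive, to one decimal."""
    return round(100.0 * count / total, 1)


def auroc_gain(auroc_9, auroc_83):
    """Absolute AUROC improvement of CT-Net-83 over CT-Net-9, to three decimals."""
    return round(auroc_83 - auroc_9, 3)


# Print one summary line per abnormality.
for name, (count, a9, a83) in TABLE_4.items():
    print(f"{name:22s} {positive_percent(count):5.1f}%  gain {auroc_gain(a9, a83):+.3f}")
```

Under these assumptions, the recomputed percentages can be compared directly against the Positive Percent column, and the gains give a quick view of which abnormalities benefit most from the additional labels. The sketch does not reproduce the confidence intervals or the DeLong p-values, which require the per-example model predictions.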