TABLE VI:
Results of the cross-dataset evaluation: AUROC and AUPRC of the models trained on the CheXpert dataset and evaluated on the MIMIC-CXR dataset. The upper part reports AUROC and the lower part reports AUPRC.
| Metric | Model | EC | Card | AO | LL | Edem | Cons | Pneu1 | Atel | Pneu2 | PE | PO | Frac | SD | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUROC | A-GCN-PPS | 0.567 | 0.770 | 0.708 | 0.661 | 0.884 | 0.754 | 0.603 | 0.764 | 0.752 | 0.895 | 0.701 | 0.610 | 0.839 | 0.731 |
| AUROC | A-GCN-APS | 0.552 | 0.754 | 0.686 | 0.661 | 0.852 | 0.720 | 0.604 | 0.743 | 0.717 | 0.864 | 0.685 | 0.587 | 0.797 | 0.709 |
| AUROC | AlexNet | 0.563 | 0.758 | 0.704 | 0.655 | 0.880 | 0.753 | 0.619 | 0.760 | 0.736 | 0.888 | 0.701 | 0.604 | 0.835 | 0.727 |
| AUROC | R-GCN-PPS | 0.530 | 0.768 | 0.721 | 0.679 | 0.885 | 0.772 | 0.665 | 0.761 | 0.771 | 0.902 | 0.757 | 0.636 | 0.842 | 0.745 |
| AUROC | R-GCN-APS | 0.558 | 0.762 | 0.702 | 0.663 | 0.855 | 0.730 | 0.629 | 0.730 | 0.742 | 0.880 | 0.721 | 0.615 | 0.796 | 0.722 |
| AUROC | ResNet50 | 0.527 | 0.713 | 0.671 | 0.609 | 0.842 | 0.690 | 0.606 | 0.702 | 0.700 | 0.857 | 0.655 | 0.573 | 0.781 | 0.687 |
| AUROC | V-GCN-PPS | 0.576 | 0.768 | 0.721 | 0.673 | 0.887 | 0.761 | 0.638 | 0.758 | 0.785 | 0.904 | 0.743 | 0.641 | 0.842 | 0.746 |
| AUROC | V-GCN-APS | 0.547 | 0.763 | 0.689 | 0.662 | 0.854 | 0.720 | 0.610 | 0.726 | 0.746 | 0.875 | 0.713 | 0.614 | 0.806 | 0.717 |
| AUROC | VGGNet16BN | 0.520 | 0.709 | 0.690 | 0.634 | 0.854 | 0.687 | 0.596 | 0.701 | 0.718 | 0.878 | 0.672 | 0.582 | 0.809 | 0.696 |
| p-val (RM-ANOVA) | * | * | * | * | * | * | * | ||||||||
| PPS >APS | * | ** | * | *** | *** | *** | *** | *** | ** | *** | *** | *** | |||
| PPS >base | * | * | * | * | * | * | * | * | * | * | * | * | |||
| AUPRC | A-GCN-PPS | 0.042 | 0.414 | 0.353 | 0.064 | 0.459 | 0.110 | 0.123 | 0.373 | 0.154 | 0.688 | 0.023 | 0.031 | 0.572 | 0.262 |
| AUPRC | A-GCN-APS | 0.039 | 0.395 | 0.333 | 0.064 | 0.389 | 0.094 | 0.119 | 0.352 | 0.124 | 0.625 | 0.026 | 0.028 | 0.491 | 0.237 |
| AUPRC | AlexNet | 0.042 | 0.402 | 0.351 | 0.060 | 0.453 | 0.108 | 0.134 | 0.370 | 0.137 | 0.682 | 0.026 | 0.030 | 0.565 | 0.259 |
| AUPRC | R-GCN-PPS | 0.038 | 0.418 | 0.373 | 0.072 | 0.467 | 0.112 | 0.147 | 0.366 | 0.208 | 0.711 | 0.034 | 0.036 | 0.554 | 0.272 |
| AUPRC | R-GCN-APS | 0.045 | 0.405 | 0.347 | 0.068 | 0.407 | 0.099 | 0.130 | 0.339 | 0.168 | 0.660 | 0.029 | 0.033 | 0.492 | 0.248 |
| AUPRC | ResNet50 | 0.031 | 0.360 | 0.315 | 0.050 | 0.388 | 0.084 | 0.117 | 0.313 | 0.131 | 0.628 | 0.021 | 0.028 | 0.475 | 0.226 |
| AUPRC | V-GCN-PPS | 0.043 | 0.417 | 0.372 | 0.068 | 0.472 | 0.110 | 0.142 | 0.370 | 0.201 | 0.711 | 0.032 | 0.036 | 0.558 | 0.272 |
| AUPRC | V-GCN-APS | 0.037 | 0.404 | 0.336 | 0.063 | 0.406 | 0.092 | 0.123 | 0.338 | 0.170 | 0.647 | 0.030 | 0.033 | 0.507 | 0.245 |
| AUPRC | VGGNet16BN | 0.032 | 0.359 | 0.334 | 0.056 | 0.409 | 0.086 | 0.118 | 0.312 | 0.138 | 0.671 | 0.023 | 0.029 | 0.517 | 0.237 |
| p-val (RM-ANOVA) | * | * | * | ** | * | * | * | ||||||||
| PPS >APS | *** | ** | * | *** | *** | * | *** | *** | *** | *** | *** | *** | |||
| PPS >base | * | * | * | * | * | * | * | * | * | * | * | ||||
*, ** and *** denote significance levels of 0.1, 0.05 and 0.01, respectively. For each disease, the best results are bolded. Red text indicates that our ImageGCN performs better than the two corresponding baseline models.
EC: Enlarged Cardiomediastinum; Card: Cardiomegaly; AO: Airspace Opacity; LL: Lung Lesion; Edem: Edema; Cons: Consolidation; Pneu1: Pneumonia; Atel: Atelectasis; Pneu2: Pneumothorax; PE: Pleural Effusion; PO: Pleural Other; Frac: Fracture; SD: Support Devices.
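
As a reading aid, the following minimal sketch shows how the "mean" column and the star notation can be interpreted. It assumes that "mean" is the unweighted average over the 13 disease labels, which agrees with the two rows checked here. The helper names `macro_mean` and `stars` are hypothetical, and treating the thresholds as strict inequalities is an assumption, since the table does not say how boundary values are handled.

```python
# Per-disease AUROC values copied from two rows of the table
# (column order: EC, Card, AO, LL, Edem, Cons, Pneu1, Atel, Pneu2, PE, PO, Frac, SD).
AUROC = {
    "A-GCN-PPS": [0.567, 0.770, 0.708, 0.661, 0.884, 0.754, 0.603,
                  0.764, 0.752, 0.895, 0.701, 0.610, 0.839],
    "ResNet50": [0.527, 0.713, 0.671, 0.609, 0.842, 0.690, 0.606,
                 0.702, 0.700, 0.857, 0.655, 0.573, 0.781],
}

# Values of the "mean" column for the same two rows.
REPORTED_MEAN = {"A-GCN-PPS": 0.731, "ResNet50": 0.687}


def macro_mean(scores):
    """Unweighted average over the per-disease scores (assumed definition of 'mean')."""
    return sum(scores) / len(scores)


def stars(p_value):
    """Map a p-value to the star notation of the footnote.

    Assumption: thresholds are strict (p < 0.01 -> ***, p < 0.05 -> **, p < 0.1 -> *).
    """
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


if __name__ == "__main__":
    for model, scores in AUROC.items():
        print(model, round(macro_mean(scores), 3), "reported:", REPORTED_MEAN[model])
```

Under this reading, a row's mean weights every disease equally regardless of its prevalence in MIMIC-CXR, which is worth keeping in mind when comparing the AUPRC means, where rare labels such as PO and Frac have very low scores.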