Abstract
Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer even under a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain. In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of the NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a mixture of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of the data, the mixture of interpretable models achieves performance comparable to the BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain. We evaluate our model using a real-life large-scale chest X-ray (CXR) classification dataset. The code is available at: https://github.com/batmanlab/MICCAI-2023-Route-interpret-repeat-CXRs.
Keywords: Explainable-AI, Interpretable models, Transfer learning
1. Introduction
Model generalizability is one of the main challenges of AI, especially in high-stakes applications such as healthcare. While NN models achieve state-of-the-art (SOTA) performance in disease classification [9,17,24], they are brittle to small shifts in the data distribution [7] caused by a change in acquisition protocol or scanner type [22]. Fine-tuning all or some layers of a NN model on the target domain can alleviate this problem [2], but it requires a substantial amount of labeled data and can be computationally expensive [12,21]. In contrast, radiologists follow fairly generalizable and comprehensible rules. Specifically, they search for patterns of changes in anatomy to read abnormality from an image and apply logical rules for specific diagnoses. This approach is transparent and closer to an interpretable-by-design approach in AI. We develop a method to extract a mixture of interpretable models based on clinical concepts, similar to radiologists' rules, from a pre-trained NN. Such a model is more data- and computation-efficient than the original NN for fine-tuning to a new distribution.
Standard interpretable-by-design methods [18] find an interpretable function (e.g., linear regression or rule-based) between human-interpretable concepts and the final output [14]. A concept classifier [19,26] detects the presence or absence of concepts in an image. In medical imaging, previous research uses TCAV scores [13] to quantify the role of a concept in the final prediction [3,6,23], but concept-based interpretable models remain mostly unexplored. Recently, Post-hoc Concept Bottleneck Models (PCBMs) [25] identify concepts from the embeddings of a BB. However, these methods share a common design choice: a single interpretable classifier explains the entire dataset, which cannot capture diverse sample-specific explanations and performs worse than the BB variants.
Our Contributions.
This paper proposes a novel data-efficient interpretable method that can be transferred to an unseen domain. Our interpretable model is built upon human-interpretable concepts and can provide sample-specific explanations for diverse disease subtypes and pathological patterns. Beginning with a BB in the source domain, we progressively extract a mixture of interpretable models from it. Our method includes a set of selectors routing the explainable samples through the interpretable models. The interpretable models provide First-order-logic (FOL) explanations for the samples they cover. The remaining unexplained samples are routed through the residuals until they are covered by a successive interpretable model. We repeat the process until we cover a desired fraction of the data. Due to class imbalance in large CXR datasets, early interpretable models tend to cover all samples with the disease present while ignoring disease subgroups and pathological heterogeneity. We address this problem by estimating class-stratified coverages from the total data coverage. We then finetune the interpretable models in the target domain. The target domain lacks concept-level annotation, since such annotations are expensive. Hence, we learn a concept detector in the target domain with a pseudo-labeling approach [15] and finetune the interpretable models. Our work is the first to apply concept-based methods to CXRs and transfer them between domains.
2. Methodology
Notation.
Assume $f^0: \mathcal{X} \rightarrow \mathcal{Y}$ is a BB, trained on a dataset $\{\mathcal{X}, \mathcal{Y}, \mathcal{C}\}$, with $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{C}$ being the images, classes, and concepts, respectively; $f^0 = h^0 \circ \Phi$, where $\Phi$ and $h^0$ are the feature extractor and the classifier, respectively. Also, $J$ is the number of class labels. This paper focuses on binary classification (having or not having a disease), so $\mathcal{Y} \in \{0, 1\}$ and $J = 2$. Yet, it can be extended to multiclass problems easily. Given a learnable projection [4,5] $t: \Phi \rightarrow \mathcal{C}$, mapping the BB's features to the concepts, our method learns three functions: (1) a set of selectors ($\pi^k: \mathcal{C} \rightarrow [0, 1]$) routing samples to an interpretable model or the residual, (2) a set of interpretable models ($g^k: \mathcal{C} \rightarrow \mathcal{Y}$), and (3) the residuals ($r^k$). The interpretable models are called "experts" since each specializes in a distinct subset of data defined by that iteration's coverage, as shown in SelectiveNet [16]. Figure 1 illustrates our method.
Fig. 1.

Schematic view of our method. Note that $f^k = h^k \circ \Phi$. At iteration $k$, the selector $\pi^k$ routes each sample either towards the expert $g^k$ with probability $\pi^k(c_j)$ or towards the residual $r^k = f^{k-1} - g^k$ with probability $1 - \pi^k(c_j)$. The expert $g^k$ generates FOL-based explanations for the samples it covers. Note that $\Phi$ is fixed across iterations.
2.1. Distilling BB to the Mixture of Interpretable Models
Handling Class Imbalance.
For an iteration $k$, we first split the given coverage $\tau^k$ into stratified coverages per class as $\tau^k_i = \eta_i\, \tau^k$, where $\eta_i = m_i / m$ denotes the fraction of samples belonging to the $i^{\text{th}}$ class; $m_i$ and $m$ are the number of samples of class $i$ and the total number of samples, respectively.
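A minimal sketch of this stratification step, assuming class labels are stored in a NumPy array; the function name and the example counts are illustrative:

```python
import numpy as np

def stratified_coverages(labels: np.ndarray, total_coverage: float) -> dict:
    """Split a total coverage tau^k into per-class coverages
    tau_i^k = eta_i * tau^k, with eta_i = m_i / m."""
    classes, counts = np.unique(labels, return_counts=True)
    fractions = counts / counts.sum()            # eta_i = m_i / m
    return {int(c): float(f * total_coverage)    # tau_i^k = eta_i * tau^k
            for c, f in zip(classes, fractions)}

# Example: a heavily imbalanced binary task with total coverage tau^k = 0.5
labels = np.array([0] * 950 + [1] * 50)
print(stratified_coverages(labels, 0.5))         # {0: 0.475, 1: 0.025}
```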
Learning the Selectors.
At iteration $k$, the selector $\pi^k$ routes sample $x_j$ to the expert $g^k$ or the residual $r^k$ with probability $\pi^k(c_j)$ and $1 - \pi^k(c_j)$, respectively. For coverages $\{\tau^k_i\}$, we learn $\pi^k$ and $g^k$ jointly by solving the loss:

$$\theta^{*}_{s^k},\, \theta^{*}_{g^k} = \mathop{\arg\min}_{\theta_{s^k},\, \theta_{g^k}} \mathcal{R}^{k}\!\left(\pi^{k}, g^{k}\right) \quad \text{s.t.} \quad \zeta_i\!\left(\pi^{k}\right) \geq \tau^{k}_{i} \;\; \forall\, i \tag{1}$$

where $\theta^{*}_{s^k}$ and $\theta^{*}_{g^k}$ are the optimal parameters for $\pi^k$ and $g^k$, respectively. $\mathcal{R}^k$ is the overall selective risk, defined as $\mathcal{R}^k(\pi^k, g^k) = \sum_{i} \big( \frac{1}{m} \sum_{j:\, y_j = i} \mathcal{L}^k_{(g^k, \pi^k)}(x_j, c_j) \big) \big/ \zeta_i(\pi^k)$, where $\zeta_i(\pi^k) = \frac{1}{m} \sum_{j:\, y_j = i} \pi^k(c_j)$ is the empirical mean of the samples of class $i$ selected by the selector for the associated expert $g^k$. We define $\mathcal{L}^k_{(g^k, \pi^k)}$ in the next section. The selectors are neural networks with sigmoid activation. At inference time, $\pi^k$ routes a sample to $g^k$ if and only if $\pi^k(c_j) \geq 0.5$.
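Equation 1 is a constrained objective; one common way to train it is a SelectiveNet-style relaxation that converts each coverage constraint into a quadratic penalty. The sketch below assumes that relaxation; the penalty weight `lam` and all names are illustrative, not the paper's exact implementation:

```python
import torch

def selective_objective(per_sample_loss, pi, labels, tau_per_class, lam=32.0):
    """Class-stratified selective risk of Eq. 1 with each constraint
    zeta_i(pi) >= tau_i relaxed into a quadratic penalty.

    per_sample_loss : L^k_{(g^k, pi^k)}(x_j, c_j) per sample (see Eq. 2).
    pi              : selector outputs pi^k(c_j) in (0, 1), shape (m,).
    tau_per_class   : {class i: tau_i^k} from the stratification step above.
    """
    m = float(len(labels))
    risk = torch.zeros((), device=pi.device)
    penalty = torch.zeros((), device=pi.device)
    for i, tau_i in tau_per_class.items():
        mask = (labels == i).float()
        zeta_i = (pi * mask).sum() / m      # empirical coverage of class i
        risk = risk + (per_sample_loss * mask).sum() / m / zeta_i.clamp_min(1e-6)
        penalty = penalty + torch.relu(tau_i - zeta_i) ** 2  # violated constraint
    return risk + lam * penalty
```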
Learning the Experts.
For iteration $k$, the loss $\mathcal{L}^k_{(g^k, \pi^k)}$ distills the expert $g^k$ from the BB of the previous iteration, $f^{k-1}$, by solving the following loss:

$$\mathcal{L}^{k}_{(g^k, \pi^k)}(x_j, c_j) = \ell\!\left(f^{k-1}(x_j),\, g^{k}(c_j)\right) \pi^{k}(c_j) \prod_{i=1}^{k-1}\left(1 - \pi^{i}(c_j)\right) \tag{2}$$

where $\ell$ is a distillation loss between the outputs of $f^{k-1}$ and $g^k$, and $\pi^k(c_j) \prod_{i=1}^{k-1}\big(1 - \pi^i(c_j)\big)$ is the cumulative probability of the sample being covered by the residuals of all the previous iterations from $1$ through $k-1$ (i.e., $\prod_{i=1}^{k-1}(1 - \pi^i(c_j))$) and selected by the expert at iteration $k$ (i.e., $\pi^k(c_j)$).
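A sketch of the per-sample loss of Eq. 2, assuming KL divergence between softened outputs as the distillation loss $\ell$ (the paper's exact choice of $\ell$ may differ); `prev_pis` holds the outputs of the fixed selectors of earlier iterations:

```python
import torch
import torch.nn.functional as F

def expert_distillation_loss(expert_logits, bb_logits, pi_k, prev_pis):
    """Per-sample expert loss of Eq. 2: distill g^k from f^{k-1}, weighted
    by pi^k(c_j) and the cumulative residual probability of iterations < k."""
    # l(f^{k-1}(x_j), g^k(c_j)): per-sample distillation loss
    ell = F.kl_div(F.log_softmax(expert_logits, dim=-1),
                   F.softmax(bb_logits, dim=-1),
                   reduction="none").sum(-1)
    # prod_{i=1}^{k-1} (1 - pi^i(c_j)): probability the sample reached iteration k
    residual_weight = torch.ones_like(pi_k)
    for pi_i in prev_pis:
        residual_weight = residual_weight * (1.0 - pi_i)
    return ell * pi_k * residual_weight          # one value per sample
```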
Learning the Residuals.
After learning $g^k$, we calculate the residual as $r^k(x_j, c_j) = f^{k-1}(x_j) - g^k(c_j)$ (difference of logits). We fix $\Phi$ and optimize the following loss to update $h^k$ (and thus $f^k = h^k \circ \Phi$) to specialize on those samples not covered by $g^1, \ldots, g^k$, effectively creating a new BB $f^k$ for the next iteration $k+1$:

$$\mathcal{L}^{k}_{f}(x_j, c_j) = \ell\!\left(r^{k}(x_j, c_j),\, f^{k}(x_j)\right) \prod_{i=1}^{k}\left(1 - \pi^{i}(c_j)\right) \tag{3}$$
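Analogously, a sketch of Eq. 3 under the same KL-divergence assumption for $\ell$; `new_head_logits` stands for the trainable head $h^k$ applied to the frozen features $\Phi(x_j)$:

```python
import torch
import torch.nn.functional as F

def residual_update_loss(bb_prev_logits, expert_logits, new_head_logits, pis):
    """Eq. 3: fit f^k = h^k o Phi (Phi frozen) to the residual
    r^k = f^{k-1} - g^k on samples no expert has covered so far.

    pis : selector outputs pi^1(c_j), ..., pi^k(c_j) for the batch.
    """
    residual = bb_prev_logits - expert_logits    # r^k, difference of logits
    ell = F.kl_div(F.log_softmax(new_head_logits, dim=-1),
                   F.softmax(residual, dim=-1),
                   reduction="none").sum(-1)
    weight = torch.ones_like(ell)
    for pi in pis:                               # prod_{i<=k} (1 - pi^i(c_j))
        weight = weight * (1.0 - pi)
    return (ell * weight).mean()
```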
We refer to all the experts together as the Mixture of Interpretable Experts (MoIE-CXR). We denote the model including the final residual as MoIE-CXR+R. Each expert in MoIE-CXR constructs sample-specific FOLs using the optimization strategy and algorithm discussed in [4].
2.2. Finetuning to an Unseen Domain
We assume the MoIE-CXR-identified concepts to be generalizable to an unseen domain. So, we learn the projection $t$ for the target domain and compute the pseudo concepts using SSL [15]. Next, we transfer the selectors, experts, and final residual ($\{\pi^k, g^k\}_{k=1}^{K'}$ and $f^{K'}$) from the source to a target domain with limited labeled data and computational cost. Algorithm 1 details the procedure.
Algorithm 1.
Finetuning to an unseen domain.
1: Input: Learned selectors, experts, and final residual from the source domain: $\{\pi^k_s, g^k_s\}_{k=1}^{K'}$ and $f^{K'}_s$, respectively, with $K'$ as the number of experts to transfer. BB of the source domain: $f^0_s = h^0_s \circ \Phi_s$. Source data: $\{\mathcal{X}_s, \mathcal{Y}_s, \mathcal{C}_s\}$. Target data: $\{\mathcal{X}_t, \mathcal{Y}_t\}$. Target coverages $\{\tau^k_t\}_{k=1}^{K'}$.
2: Output: Experts and final residual of the target domain: $\{g^k_t\}_{k=1}^{K'}$ and $f^{K'}_t$.
3: Randomly select $n$ samples $\{x^j_t\}_{j=1}^{n}$ out of $\mathcal{X}_t$.
4: Compute the pseudo concepts for the correctly classified samples in the target domain using the source BB, as $\hat{c}^{\,j}_t = t_s\big(\Phi_s(x^j_t)\big)$ s.t. $f^0_s(x^j_t) = y^j_t$.
5: Learn the projection function $t_t$ for the target domain semi-supervisedly [15] using the pseudo-labeled samples $\{x^j_t, \hat{c}^{\,j}_t\}$ and the remaining unlabeled samples of $\mathcal{X}_t$.
6: Complete the triplet for the target domain $\{\mathcal{X}_t, \mathcal{Y}_t, \hat{\mathcal{C}}_t\}$, where $\hat{\mathcal{C}}_t = t_t(\Phi_s(\mathcal{X}_t))$.
7: Finetune $\{\pi^k_s, g^k_s\}_{k=1}^{K'}$ and $f^{K'}_s$ to obtain $\{\pi^k_t, g^k_t\}_{k=1}^{K'}$ and $f^{K'}_t$ using equations 1, 2, and 3, respectively, for 5 epochs. $\{g^k_t\}_{k=1}^{K'}$ and $f^{K'}_t$ represent MoIE-CXR and MoIE-CXR+R for the target domain.
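A minimal sketch of the pseudo-labeling steps (3-4) of Algorithm 1, assuming a single-logit binary head and sigmoid concept outputs; the argument names mirror the algorithm's symbols but are otherwise illustrative:

```python
import torch

@torch.no_grad()
def pseudo_label_concepts(h0_s, t_s, phi_s, images_t, labels_t, thresh=0.5):
    """Pseudo-label concepts for the target samples that the source BB
    f^0_s = h^0_s o Phi_s classifies correctly."""
    feats = phi_s(images_t)                                  # Phi_s(x_t)
    preds = (torch.sigmoid(h0_s(feats)).squeeze(-1) > thresh).long()
    keep = preds == labels_t                                 # f^0_s(x_t) = y_t
    pseudo_c = (torch.sigmoid(t_s(feats[keep])) > thresh).float()
    return images_t[keep], pseudo_c                          # {x_t^j, c_hat_t^j}
```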
3. Experiments
We perform experiments to show that MoIE-CXR 1) captures a diverse set of concepts, 2) does not compromise the BB's performance, 3) covers "harder" instances with the residuals in later iterations, resulting in a drop in their performance, and 4) is finetuned well to an unseen domain with minimal computation.
Experimental Details.
We evaluate our method using 220,763 frontal images from the MIMIC-CXR dataset [11]. We use DenseNet121 [8] as the BB to classify cardiomegaly, effusion, edema, pneumonia, and pneumothorax, considering each to be a separate binary classification problem. We obtain 107 anatomical and observation concepts from RadGraph's inference dataset [10], automatically generated by DYGIE++ [20]. We train the BB following [24]. To retrieve the concepts, we use the BB up to a DenseNet block as the feature extractor $\Phi$ and flatten its output features to learn the projection $t$. We use an 80%-10%-10% train-validation-test split with no patient shared across splits. We use 4, 4, 5, 5, and 5 experts for cardiomegaly, pneumonia, effusion, pneumothorax, and edema, respectively. We employ an ELL [1] as each expert $g^k$. Further, we only include a concept as input to $g^k$ if its validation AUROC exceeds 0.7. Refer to Table 1 in the supplementary material for the hyperparameters. We stop when the experts cumulatively cover at least 90% of the data.
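The concept-filtering step can be sketched as follows; the 0.7 threshold comes from the text, while the array layout and names are assumptions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def filter_concepts(concept_probs, concept_labels, names, min_auroc=0.7):
    """Keep only concepts whose validation AUROC exceeds min_auroc before
    feeding them to the experts; inputs are (n_samples, n_concepts) arrays
    of predicted probabilities and binary ground-truth labels."""
    kept = []
    for j, name in enumerate(names):
        if len(np.unique(concept_labels[:, j])) < 2:
            continue  # AUROC is undefined when a concept label is constant
        if roc_auc_score(concept_labels[:, j], concept_probs[:, j]) > min_auroc:
            kept.append(name)
    return kept
```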
Table 1.
MoIE-CXR does not compromise the performance of BB. We report the mean and standard error of AUROC over five random seeds. For MoIE-CXR, we also report the percentage of test-set samples covered by all experts as "Coverage". Our results and the BB are boldfaced.
| Model | Effusion | Cardiomegaly | Edema | Pneumonia | Pneumothorax |
|---|---|---|---|---|---|
| Blackbox (BB) | 0.92 | 0.84 | 0.89 | 0.79 | 0.91 |
| INTERPRETABLE BY DESIGN | |||||
| CEM [26] | 0.83±1e−4 | 0.75±1e−4 | 0.77±2e−4 | 0.62±4e−4 | 0.76±3e−4 |
| CBM (Sequential) [14] | 0.78±1e−4 | 0.72±1e−4 | 0.77±5e−4 | 0.60±1e−3 | 0.75±6e−4 |
| CBM + ELL [1,14] | 0.81±1e−4 | 0.72±1e−4 | 0.79±5e−4 | 0.62±8e−4 | 0.75±6e−4 |
| POSTHOC | |||||
| PCBM [25] | 0.88±1e−4 | 0.81±1e−4 | 0.82±1e−4 | 0.72±1e−4 | 0.85±7e−4 |
| PCBM-h [25] | 0.90±1e−4 | 0.83±1e−4 | 0.85±1e−4 | 0.77±1e−4 | 0.89±7e−4 |
| PCBM + ELL [1,25] | 0.90±1e−4 | 0.82±1e−4 | 0.85±1e−4 | 0.75±1e−4 | 0.85±6e−4 |
| PCBM-h + ELL [1,25] | 0.91±1e−4 | 0.83±1e−4 | 0.87±1e−4 | 0.77±1e−4 | 0.90±1e−4 |
| OURS | |||||
| MoIE-CXR (Coverage) | | | | | |
| MoIE-CXR+R | 0.91 ±1e−4 | 0.82 ±1e−4 | 0.88 ±1e−4 | 0.78 ±1e−4 | 0.90 ±2e−4 |
Baseline.
We compare our method with 1) end-to-end CEM [26], 2) sequential CBM [14], and 3) PCBM [25] baselines, each comprising two parts: a) a concept predictor $\Phi: \mathcal{X} \rightarrow \mathcal{C}$, predicting concepts from images using all the convolution blocks; and b) a label predictor $g: \mathcal{C} \rightarrow \mathcal{Y}$, predicting labels from the concepts. We create CBM + ELL and PCBM + ELL by replacing the standard label predictor with the identical ELL-based classifier of MoIE-CXR to generate FOLs [1] for the baselines.
MoIE-CXR Captures Diverse Explanations.
Figure 2 illustrates the FOL explanations. Recall that the experts in MoIE-CXR and the baselines are ELLs [1], attributing an attention weight to each concept. A concept with a high attention weight has high predictive significance. With a single label predictor $g$, the baselines rank the concepts in the identical order of attention weights for all samples in a class, yielding one generic FOL per class. In Fig. 2, the baseline PCBM + ELL uses left_pleural and pleural_unspec to identify effusion for all four samples. MoIE-CXR deploys multiple experts, each learning to specialize in a distinct subset of a class, so different experts assign different attention weights and capture instance-specific concepts unique to each subset. In Fig. 2, expert 2 relies on right_pleural and pleural_unspec, but expert 4 relies only on pleural_unspec to classify effusion. These results show that the learned experts provide more precise, subject-level explanations using the concepts, increasing confidence and trust for clinical use.
Fig. 2.

Qualitative comparison of MoIE-CXR discovered concepts with the baselines.
MoIE-CXR does not Compromise BB’s Performance. Analysing MoIE-CXR:
Table 1 shows that MoIE-CXR outperforms the other models, including the BB. Recall that MoIE-CXR refers to the mixture of all interpretable experts, excluding any residuals. As MoIE-CXR specializes in various subsets of the data, it effectively discovers sample-specific classifying concepts and achieves superior performance. In general, MoIE-CXR exceeds the interpretable-by-design baselines (CEM, CBM, and CBM + ELL) by a fair margin (on average, at least ~10% ↑), especially for pneumonia and pneumothorax, where the number of samples with the disease is significantly lower (~750 of 24,000 in the test set).
Analysing MoIE-CXR+R:
To compare performance on the entire dataset, we additionally report MoIE-CXR+R, the mixture of interpretable experts plus the final residual, in Table 1. MoIE-CXR+R outperforms the interpretable-by-design models and yields performance comparable to the BB. The residualized PCBM baseline, PCBM-h, performs similarly to MoIE-CXR+R. PCBM-h rectifies the interpretable PCBM's mistakes by learning a residual on the complete dataset to recover the BB's performance. In contrast, our experts and final residual approximate the interpretable and uninterpretable fractions of the BB, respectively. In each iteration, the residual focuses on the samples not covered by the respective expert to create the BB for the next iteration, and so on. As a result, the final residual in MoIE-CXR+R covers the "hardest" examples, reducing its overall performance relative to MoIE-CXR.
Identification of Harder Samples by Successive Residuals.
Figure 3(a-c) reports the proportional AUROC of the experts and the residuals per iteration. The proportional AUROC of a model is its AUROC times its empirical coverage $\zeta$, i.e., the mean fraction of samples routed to the model by the respective selector $\pi$. According to Fig. 3a, in iteration 1 the residual (black bar) contributes more to the proportional AUROC than expert 1 (blue bar) for effusion, with the two together achieving a cumulative proportional AUROC of ~0.92. By the final iteration, the experts collectively extract the entire interpretable component from the BB, resulting in their more significant contribution to the cumulative performance. Across iterations, the proportional AUROC of each expert decreases, since each expert is distilled from the BB of the previous iteration, which is in turn derived from a residual that performs progressively worse. The residual of the final iteration thus covers the "hardest" samples. Tracing these samples back to the original BB shows that it underperforms on them just as the residual does (Fig. 3(d-f)).
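To make the metric concrete, the sketch below computes the proportional AUROC of a single expert or residual from its routing mask; the array names are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def proportional_auroc(y_true, y_score, routed_mask):
    """AUROC of one model on the samples routed to it, scaled by its
    empirical coverage zeta (the fraction of all samples it receives)."""
    coverage = routed_mask.mean()                # zeta
    auroc = roc_auc_score(y_true[routed_mask], y_score[routed_mask])
    return auroc * coverage
```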
Fig. 3.

Performance of experts and residuals across iterations. (a-c): Coverage and proportional AUROC of the experts and residuals. (d-f): Routing the samples covered by MoIE-CXR to the initial BB $f^0$, we compare the performance of the residuals with $f^0$.
Applying MoIE-CXR to the Unseen Domain.
In this experiment, we utilize Algorithm 1 to transfer MoIE-CXR trained on the MIMIC-CXR dataset to the Stanford CheXpert [9] dataset for the diseases effusion, cardiomegaly, and edema. Using 2.5%, 5%, 7.5%, 10%, and 15% of the training data from the Stanford CheXpert dataset, we employ two variants of MoIE-CXR where we (1) train only the selectors without finetuning the experts (the "No finetuned" variant of MoIE-CXR in Fig. 4), and (2) finetune $\pi^k$ and $g^k$ jointly for only 5 epochs (the "Finetuned" variant of MoIE-CXR and MoIE-CXR+R in Fig. 4). Finetuning is essential to route the samples of the target domain to the appropriate expert. As later experts cover the "harder" samples of MIMIC-CXR, we only transfer the experts of the first three iterations (refer to Fig. 3). To ensure a fair comparison, we finetune the BB of MIMIC-CXR (both the feature extractor $\Phi$ and the classifier $h$) with the same training data of Stanford CheXpert for 5 epochs. Throughout this experiment, we fix $\Phi$ while finetuning the final residual in MoIE-CXR+R, as stated in Eq. 3. Figure 4 displays the performance of the different models and their computational costs in terms of FLOPs. The FLOPs are calculated as: FLOPs of (forward propagation + backward propagation) × (total number of batches) × (number of training epochs). The finetuned MoIE-CXR outperforms the finetuned BB (on average ~5% ↑ for effusion and cardiomegaly). As the experts are simple models [1] accepting only low-dimensional concept vectors rather than images, the computational cost to train MoIE-CXR is significantly lower than that of the BB (Fig. 4(d-f)). Specifically, the BB requires ~776T FLOPs to be finetuned on 2.5% of the training data of Stanford CheXpert, whereas MoIE-CXR requires only ~0.0065T FLOPs. As MoIE-CXR discovers sample-specific, domain-invariant concepts, it achieves this high performance at a far lower computational cost than the BB.
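The cost comparison applies the stated formula directly; a sketch with purely hypothetical per-batch FLOP counts:

```python
def finetuning_flops(flops_per_batch_fwd_bwd: float, n_batches: int,
                     n_epochs: int) -> float:
    """Total finetuning cost per the text: FLOPs of (forward + backward)
    propagation x (total number of batches) x (number of training epochs)."""
    return flops_per_batch_fwd_bwd * n_batches * n_epochs

# Hypothetical illustration: an expert takes ~100-dim concept vectors, so its
# per-batch cost sits orders of magnitude below a DenseNet121 forward/backward.
print(finetuning_flops(3.0e12, 50, 5))   # hypothetical BB-scale batch cost
print(finetuning_flops(4.0e7, 50, 5))    # hypothetical expert-scale batch cost
```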
Fig. 4.

Transferring the first 3 experts of MoIE-CXR trained on MIMIC-CXR to Stanford CheXpert. With varying % of training samples of Stanford CheXpert, (a-c) report AUROC on the test sets; (d-g) report computational cost in terms of log(FLOPs) (T). We report the coverages on Stanford CheXpert on top of the "Finetuned" and "No finetuned" variants of MoIE-CXR (red and blue bars) in (d-g).
4. Conclusion
This paper proposes a novel iterative interpretable method that identifies instance-specific concepts without losing the performance of the BB and is effectively fine-tuned in an unseen target domain with no concept annotation, limited labeled data, and minimal computational cost. As in prior work, the MoIE-captured concepts may not reflect a causal effect; establishing causality is left for future work.
Acknowledgement.
This work was partially supported by NIH Award Number 1R01HL141813-01 and the Pennsylvania Department of Health. We are grateful for the computational resources from Pittsburgh Supercomputing, grant number TG-ASC170024.
Footnotes
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-43895-0_59.
References
- 1. Barbiero P, Ciravegna G, Giannini F, Lió P, Gori M, Melacci S: Entropy-based logic explanations of neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 6046–6054 (2022)
- 2. Chu B, Madhavan V, Beijbom O, Hoffman J, Darrell T: Best practices for fine-tuning visual classifiers to new domains. In: Hua G, Jégou H (eds.) ECCV 2016. LNCS, vol. 9915, pp. 435–442. Springer, Cham (2016). 10.1007/978-3-319-49409-8_34
- 3. Clough JR, Oksuz I, Puyol-Antón E, Ruijsink B, King AP, Schnabel JA: Global and local interpretability for cardiac MRI classification. In: Shen D, et al. (eds.) MICCAI 2019. LNCS, vol. 11767, pp. 656–664. Springer, Cham (2019). 10.1007/978-3-030-32251-9_72
- 4. Ghosh S, Yu K, Arabshahi F, Batmanghelich K: Dividing and conquering a BlackBox to a mixture of interpretable models: route, interpret, repeat. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 11360–11397. PMLR (2023). https://proceedings.mlr.press/v202/ghosh23c.html
- 5. Ghosh S, Yu K, Arabshahi F, Batmanghelich K: Tackling shortcut learning in deep neural networks: an iterative approach with interpretable models. arXiv preprint (2023)
- 6. Graziani M, Andrearczyk V, Marchand-Maillet S, Müller H: Concept attribution: explaining CNN decisions to physicians. Comput. Biol. Med. 123, 103865 (2020)
- 7. Guan H, Liu M: Domain adaptation for medical image analysis: a survey. IEEE Trans. Biomed. Eng. 69(3), 1173–1185 (2021)
- 8. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
- 9. Irvin J, et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 590–597 (2019)
- 10. Jain S, et al.: RadGraph: extracting clinical entities and relations from radiology reports. arXiv preprint arXiv:2106.14463 (2021)
- 11. Johnson A, et al.: MIMIC-CXR-JPG - chest radiographs with structured labels. PhysioNet (2019)
- 12. Kandel I, Castelli M: How deeply to fine-tune a convolutional neural network: a case study using a histopathology dataset. Appl. Sci. 10(10), 3359 (2020)
- 13. Kim B, et al.: Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). arXiv preprint arXiv:1711.11279 (2017)
- 14. Koh PW, et al.: Concept bottleneck models. In: International Conference on Machine Learning, pp. 5338–5348. PMLR (2020)
- 15. Lee DH, et al.: Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML, vol. 3, p. 896 (2013)
- 16. Rabanser S, Thudi A, Hamidieh K, Dziedzic A, Papernot N: Selective classification via neural network training dynamics. arXiv preprint arXiv:2205.13532 (2022)
- 17. Rajpurkar P, et al.: CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017)
- 18. Rudin C, Chen C, Chen Z, Huang H, Semenova L, Zhong C: Interpretable machine learning: fundamental principles and 10 grand challenges. Stat. Surv. 16, 1–85 (2022)
- 19. Sarkar A, Vijaykeerthy D, Sarkar A, Balasubramanian VN: Inducing semantic grouping of latent concepts for explanations: an ante-hoc approach. arXiv preprint arXiv:2108.11761 (2021)
- 20. Wadden D, Wennberg U, Luan Y, Hajishirzi H: Entity, relation, and event extraction with contextualized span representations. In: Proceedings of EMNLP-IJCNLP 2019, pp. 5784–5789. Association for Computational Linguistics, Hong Kong, China (2019). 10.18653/v1/D19-1585, https://aclanthology.org/D19-1585
- 21. Wang YX, Ramanan D, Hebert M: Growing a brain: fine-tuning by increasing model capacity. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2471–2480 (2017)
- 22. Yan W, et al.: MRI manufacturer shift and adaptation: increasing the generalizability of deep learning segmentation for MR images acquired with different scanners. Radiol. Artif. Intell. 2(4), e190195 (2020)
- 23. Yeche H, Harrison J, Berthier T: UBS: a dimension-agnostic metric for concept vector interpretability applied to radiomics. In: Suzuki K (ed.) ML-CDS/IMIMIC 2019. LNCS, vol. 11797, pp. 12–20. Springer, Cham (2019). 10.1007/978-3-030-33850-3_2
- 24. Yu K, Ghosh S, Liu Z, Deible C, Batmanghelich K: Anatomy-guided weakly-supervised abnormality localization in chest X-rays. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S (eds.) MICCAI 2022. LNCS, vol. 13435. Springer, Cham (2022). 10.1007/978-3-031-16443-9_63
- 25. Yuksekgonul M, Wang M, Zou J: Post-hoc concept bottleneck models. arXiv preprint arXiv:2205.15480 (2022)
- 26. Zarlenga ME, et al.: Concept embedding models. arXiv preprint arXiv:2209.09056 (2022)