Abstract
The heterogeneity of medical images poses significant challenges to accurate disease diagnosis. To tackle this issue, the impact of such heterogeneity on the causal relationship between image features and diagnostic labels should be incorporated into model design, which, however, remains underexplored. In this paper, we propose a mixed prototype correction for causal inference (MPCCI) method, aimed at mitigating the impact of unseen confounding factors on the causal relationships between medical images and disease labels, so as to enhance the diagnostic accuracy of deep learning models. The MPCCI comprises a causal inference component based on front-door adjustment and an adaptive training strategy. The causal inference component employs a multi-view feature extraction (MVFE) module to establish mediators, and a mixed prototype correction (MPC) module to execute causal interventions. Moreover, the adaptive training strategy incorporates both information purity and maturity metrics to maintain stable model training. Experimental evaluations on four medical image datasets, encompassing CT and ultrasound modalities, demonstrate the superior diagnostic accuracy and reliability of the proposed MPCCI. The code will be available at https://github.com/Yajie-Zhang/MPCCI.
Keywords: Disease diagnosis, Causal inference, Front-door adjustment, Multiview prototype learning, Medical image
Subject terms: Cancer, Diseases, Health care, Mathematics and computing
Introduction
Medical image classification provides essential support to clinicians and other medical professionals in diagnosing and treating patients by analyzing lesion features of the human body within medical images1. Given its wide real-world applications, this problem has been extensively researched. Over the past few decades, especially driven by deep learning technologies, a significant body of methods has been developed, which can generally be categorized into detection-based approaches2,3, segmentation-based approaches4–6 and feature extraction approaches7–10. Despite the remarkable advancements in previous studies, classifying images in the medical domain remains much more challenging than for natural images, which is primarily attributed to the inherently complex nature of medical images. Compared with natural images, medical images often contain more noise and artifacts due to the limitations of current imaging technologies, such as weak X-ray penetration of the equipment, presence of gas in the body, and motion artifacts11. In addition to these two factors, which have received considerable attention in previous research, lesion heterogeneity is also a crucial challenge for classifiers, yet it remains much less explored.
Lesion heterogeneity in medical imaging refers to the variability in features and appearances of the same disease, encompassing variations in shape, size, density, intensity, texture, and other lesion characteristics. An illustrative example is given in Fig. 1. As shown in the figure, breast cancer may exhibit pronounced heterogeneity in ultrasound images, with lesions varying significantly in appearance. For example, the lesion shapes may be oval, round, or irregular, and the complexity is further exacerbated when combined with other varied attributes, as shown in the table below the images. The impact of heterogeneity on model performance has been acknowledged in previous studies12,13, and there have been attempts to alleviate its negative effects through variance pooling structures14 and data augmentation15. However, these methods often yield suboptimal outcomes due to the lack of a thorough examination of the root causes of heterogeneity.
Fig. 1.
Illustration of different manifestations for the same type of lesion in breast cancer ultrasound images. Lesion attributes are also provided in the table below images.
Lesion heterogeneity is caused by various factors, such as the diverse origins of cancer cells, variable gene expressions, and patient-specific susceptibilities16. These factors significantly influence the prediction of a diagnostic label from a medical image; for example, patients with genetic predispositions are more susceptible to the illness16. Hence, in this work, we propose to model these factors by leveraging causal inference17 for enhanced medical image classification. To formalize our idea, we establish a structural causal model for the medical image classification task, as shown in Fig. 2. In this figure, the underlying causes of heterogeneity, denoted as C, involve the various factors mentioned above; the medical images X and the diagnostic outcomes Y are both impacted by the heterogeneity cause factors C. Formally, there exist the causal path X → Y, denoting that image X contains lesion representations related to the given label Y, and the backdoor path X ← C → Y, through which X and Y exhibit spurious correlation via C. That is, the factors in C affect the characterization of medical imaging (C → X); in addition, these factors influence the probability of a patient contracting a particular disease (C → Y). Following Pearl's causal inference theory17, when we try to find the causal effect of X on Y, we want the nodes we condition on to block any "backdoor" path in which one end has an arrow into X, because such paths may make X and Y dependent but do not transmit causal influence from X; if we do not block them, they confound the effect that X has on Y. Therefore, we should adjust for the confounders C to block the backdoor path and obtain better inference from X to Y. However, given the inherent difficulty or even impossibility of quantifying these confounders, their adverse impact on image-based diagnostic procedures remains elusive and unaddressed.
Fig. 2.
The structural causal model for a disease with heterogeneity. C represents the cause of heterogeneity, X denotes medical images, and Y represents diagnostic results.
In this work, we propose a novel approach for enhanced medical image classification, named mixed prototype correction for causal inference (MPCCI), which mitigates the influences of the confounding factors on medical diagnosis by exploiting front-door adjustment (FDA)17. The FDA introduces a mediator variable, denoted as A in Fig. 3, on the causal path between variables X and Y to adjust the causal pathway between them, which addresses the immeasurability of the elusive confounding factors. To implement MPCCI, we first design a multi-view feature extraction (MVFE) module with spatial-channel attention, which allows the multi-view features to serve as mediators in FDA and link the causal effect from images to labels. We also develop a mixed prototype correction (MPC) module that exchanges some of the multi-view features with the multi-view prototypes to effectively apply causal intervention on the mediators. The multi-view prototypes contain meta-knowledge of various disease categories, and the causal intervention mechanism that exchanges features with them can correct the spurious association between X and Y formed by the confounders. To smooth the feature exchange process, an adaptive training strategy is presented, comprising two key components: information purity (IP) and maturity (MT). IP measures the proportion of noise in the feature exchange process, and MT measures the stability of the model to noise at different training stages. Experiments on four medical datasets verify the effectiveness of the proposed MPCCI in diagnosing COVID-19, breast cancer, lymph node metastasis, and thyroid diseases. In summary, the contributions of this work are as follows:

1. This work conducts cause-effect analysis to alleviate the immeasurable confounders for enhanced medical image classification. The proposed method solves the problem by applying an FDA strategy, treating the multi-view features as mediators to infer the causalities from images to labels.

2. The proposed MPCCI includes two key modules (MVFE and MPC) to achieve FDA step by step, which effectively mitigates the adverse effects of the confounders on medical diagnosis. An adaptive training strategy, consisting of IP and MT, is introduced to mitigate the noise effect in the MPC module.

3. The proposed MPCCI exhibits promising performance across four distinct disease diagnosis tasks, yielding dependable interpretability results.
Fig. 3.
(a) A structural causal model for medical image classification. (b) (c) Two steps of FDA, representing calculations of P (A |do (X)) and P (Y |do (A)) (red lines). Red fork denotes the causal intervention from X to A.
Related work
Medical image classification
Currently, medical image classification, a task aiming to identify disease categories from unseen medical images, is generally tackled by training deep learning models on annotated training datasets. The model performance mainly depends on the architecture design as well as the scale of the training data. Some methods adopt advanced architectures, such as AlexNet18, ResNet19, VGG20, and ViT3, for good classification performance. In recent years, some works have proposed to extract and integrate multi-scale information to improve classification accuracy, utilizing feature pyramid networks6, dilated convolutions21, and attention mechanisms10,22, etc. There are also models fusing local and global features23,24 to further improve medical image classification. These methods all assume sufficient high-quality training data, which is not always available. In addition to these model-centric approaches, other methods broaden the diversity and volume of training data to enhance model generalization by utilizing GANs25, variational autoencoders26, MixUp27, and diffusion models28. These data-centric approaches also demonstrate strong effectiveness on medical image classification tasks. However, both the model-centric and the data-centric methods fall short of addressing disease heterogeneity and its related confounding factors, which hampers their performance.
Causal inference in medical image classification
The goal of causal inference is to unravel the complex causal relationships between variables, beyond mere correlations29. It serves as a powerful tool for understanding the roots and implications of phenomena, thereby supporting informed decision making and interventions. Owing to its analytical power, causal inference has been applied in various domains, such as medical image classification30–34, domain generalization35, and medical image segmentation36–39. In medical image analysis, some methods35–37 treat complex organ co-occurrences and background phenomena such as pseudo artifacts as observable confounding factors, and leverage the backdoor adjustment strategy17 for causal intervention. Other works30 harness counterfactual reasoning for medical image analysis by crafting counterfactual samples to neutralize the effects of observable confounding factors. These causal inference based methods achieve promising results. Yet, they tend to focus on observable confounding factors, which constrains their effectiveness in handling cases with unobservable confounders. In this work, we propose to utilize the FDA strategy to mitigate the impact of unmeasured confounders for better medical image classification performance.
Cause-effect analysis
In this section, we provide a brief analysis of the causal relationships among the elements in our tasks, namely the input image X, multiview features A, image label Y, and confounders C, using a structural causal model (SCM) illustrated in Fig. 3(a). We also describe how FDA is used in this context.
The main causal relationships in Fig. 3 (a) include X → A → Y, C → X, and C → Y. (1) For X → A → Y, the input image X is fed into a deep neural network to extract multi-view features A, which are then used to predict the label Y. (2) For C → X, the confounders C, such as genetics, origins of cancer cells, and patient habits, influence the lesion manifestation X. (3) For C → Y, the confounders C can also affect the disease category Y of a patient. For example, patients with a genetic predisposition for breast cancer have a higher risk of malignancy.
Note that there are two paths connecting X and Y: the front-door path X → A → Y and the backdoor path X ← C → Y. The existence of the backdoor path makes it difficult to evaluate the true causality from X to Y through deep networks. If C were measurable, backdoor adjustment could be used to block the backdoor path by adjusting for C. However, since most of the confounders C in this work are not measurable, we instead use front-door adjustment17 to estimate the causality from X to Y. To achieve this, FDA employs a mediator A to transmit the knowledge of X to Y through the front-door path, and then evaluates the causality from X to Y by combining the causal effects of X on A and of A on Y, i.e., by estimating the probabilities P(A|do(X)) and P(Y|do(A)), respectively. The do-operation represents an active intervention on a cause rather than a passive observation.
The P(A|do(X)) term represents the causal relationship between X and A, as illustrated in Fig. 3 (b). Since the path X ← C → Y ← A is blocked by the collider Y17, we can write

P(A | do(X)) = P(A | X).  (1)
The P(Y|do(A)) term (Fig. 3 (c)) pursues the true causality between A and Y without the confounders C. There are two paths from A to Y: A → Y and the backdoor path A ← X ← C → Y. Due to the existence of the backdoor path, we need to cut off the link between A and X by controlling for X, and we can write P(Y|do(A)) as

P(Y | do(A)) = Σ_{x′} P(Y | A, x′) P(x′).  (2)
Through layer-by-layer causal effect calculation, the causality from X to Y can be represented as

P(Y | do(X)) = Σ_A P(A | X) Σ_{x′} P(Y | A, x′) P(x′),  (3)

where x′ is the index of summation in P(Y | do(A)).
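To make the adjustment concrete, the short script below (our own illustrative example with hypothetical probability values, not material from the original study) builds a fully discrete SCM with the structure of Fig. 3 (a) and verifies numerically that the front-door estimate of Eq. (3), computed only from observational conditionals, recovers the true interventional distribution obtained with access to the hidden C.

```python
# Toy numerical check of the front-door adjustment in Eq. (3) on a discrete SCM
# with the structure of Fig. 3 (a): C -> X, C -> Y, X -> A, A -> Y.
# All probability values are hypothetical and chosen only for illustration.

p_c = {0: 0.5, 1: 0.5}                                  # P(C)
p_x_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}      # P(X | C)
p_a_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}      # P(A | X), the mediator mechanism
p_y_ac = {(a, c): 0.1 + 0.6 * a + 0.2 * c               # P(Y = 1 | A, C)
          for a in (0, 1) for c in (0, 1)}

def p_x(x):                                             # observational marginal P(X)
    return sum(p_c[c] * p_x_c[c][x] for c in (0, 1))

def p_y_given_a_x(a, x):                                # observational conditional P(Y = 1 | A, X)
    num = sum(p_c[c] * p_x_c[c][x] * p_a_x[x][a] * p_y_ac[(a, c)] for c in (0, 1))
    den = sum(p_c[c] * p_x_c[c][x] * p_a_x[x][a] for c in (0, 1))
    return num / den

def p_y_do_x_truth(x):                                  # ground truth, uses the hidden C
    return sum(p_c[c] * p_a_x[x][a] * p_y_ac[(a, c)] for c in (0, 1) for a in (0, 1))

def p_y_do_x_frontdoor(x):                              # Eq. (3): only observational terms
    return sum(p_a_x[x][a] *
               sum(p_y_given_a_x(a, xp) * p_x(xp) for xp in (0, 1))
               for a in (0, 1))

for x in (0, 1):
    print(x, round(p_y_do_x_truth(x), 4), round(p_y_do_x_frontdoor(x), 4))  # the two columns match
```

Both printed columns coincide, which is exactly the property MPCCI relies on: the mediator A allows the causal effect of X on Y to be estimated without ever measuring the confounders C.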
Methodology
This study and all experimental protocols were approved by the Fujian Provincial Hospital review committee. All participants provided informed consent to participate in the study. The study adhered to the Declaration of Helsinki and relevant national guidelines, and all experiments and methods were performed in accordance with relevant guidelines and regulations. In this section, we introduce the proposed mixed prototype correction for causal inference (MPCCI) approach for medical image classification. As shown in Fig. 4, it involves the multi-view feature extraction (MVFE) and mixed prototype correction (MPC) modules to implement the FDA strategy. Additionally, we present an adaptive training strategy, incorporating information purity (IP) and maturity (MT), to alleviate the noise in MPC. The IP module quantifies the noise proportion during the feature exchange process, while MT assesses the model's robustness to noise across different training phases.
Fig. 4.
Illustration of the MPCCI framework. MPCCI consists of three main components: MVFE, MPC, and the adaptive training strategy. MVFE involves expert networks that use spatial-channel attention to generate multi-view features. MPC is implemented by fusing mixed prototypes with the original multi-view features to simulate P(Y | A, x′). In addition, the adaptive training strategy, consisting of IP and MT, is adopted to smooth the feature exchange process.
Multi-View feature extraction
The MVFE module is responsible for generating the multi-view features A, which serve as the mediator in Fig. 3 (a) and are used to implement P(A|do(X)) in Eq. (1). First, we input the image into a convolutional neural network (CNN) such as ResNet1847 to obtain feature maps E = f_b(x) ∈ ℝ^{D×H×W}, where f_b denotes the CNN, and D, H, and W represent the number of channels, the height, and the width of E, respectively. To extract multi-view features from E, we employ two parallel paths. We apply global average pooling to E to obtain a global feature vector g ∈ ℝ^D, and construct expert networks41 with spatial-channel attention following CBAM42. Spatial-channel attention allows for the adaptive weighting of feature maps across both spatial and channel dimensions, enabling the network to selectively focus on informative features and enhancing its ability to learn and identify complex visual patterns. To ensure that each expert network learns different features of an image, the experts are initialized with different parameters. The expert networks can be formulated as

a_k = Attn_k(E), k = 1, …, K,  (4)

where Attn_k denotes the k-th spatial-channel attention function and a_k is the k-view feature vector it generates.
After extracting the mediator A, it is crucial to ensure that the learned multi-view features are distinct across classes. To achieve this goal, the multi-view features and the global feature are concatenated and then fed into a classifier (a fully-connected layer in this work), denoted as f_c, to produce the predicted label ŷ of the image x:

ŷ = f_c(g ‖ a_1 ‖ ⋯ ‖ a_K) ∈ ℝ^C,  (5)

where ‖ represents the concatenation operation, C is the number of categories, and K is the number of expert networks. The cross-entropy loss L_0 is used to optimize f_c:

L_0 = − Σ_{c=1}^{C} l_c log ŷ_c,  (6)

where l ∈ ℝ^C denotes the ground-truth label of x. If x belongs to the c-th category, l_c = 1; otherwise l_c = 0.
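As a concrete illustration of the MVFE design described above, the following PyTorch snippet builds a ResNet-18 backbone, K attention-based expert branches, and the concatenation classifier of Eq. (5). This is a minimal sketch under our own assumptions; the module names, the squeeze-excitation-style channel branch, and the choice of five experts are ours, not the authors' released implementation.

```python
# Minimal PyTorch sketch of the MVFE idea: a ResNet-18 backbone produces feature maps E,
# K differently initialised expert branches with channel + spatial attention (in the spirit
# of CBAM) each pool E into one view feature a_k, and a fully connected classifier acts on
# the concatenation [g || a_1 || ... || a_K] as in Eq. (5).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SpatialChannelAttention(nn.Module):
    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        self.channel = nn.Sequential(                    # channel attention branch
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(                    # spatial attention branch
            nn.Conv2d(dim, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, e):                                # e: (B, D, H, W)
        e = e * self.channel(e)                          # re-weight channels
        e = e * self.spatial(e)                          # re-weight spatial positions
        return e.mean(dim=(2, 3))                        # a_k: (B, D)

class MVFE(nn.Module):
    def __init__(self, num_views: int = 5, num_classes: int = 2, dim: int = 512):
        super().__init__()
        backbone = resnet18(weights=None)
        self.f_b = nn.Sequential(*list(backbone.children())[:-2])      # feature maps E
        self.experts = nn.ModuleList(SpatialChannelAttention(dim) for _ in range(num_views))
        self.f_c = nn.Linear(dim * (num_views + 1), num_classes)       # classifier f_c

    def forward(self, x):
        e = self.f_b(x)                                  # E: (B, 512, H, W)
        g = e.mean(dim=(2, 3))                           # global feature g
        views = [expert(e) for expert in self.experts]   # multi-view features A = {a_k}
        logits = self.f_c(torch.cat([g] + views, dim=1)) # Eq. (5)-style prediction
        return g, views, logits

# g, A, y_hat = MVFE()(torch.randn(2, 3, 128, 128)); apply cross-entropy on y_hat as in Eq. (6)
```

Because each expert is randomly initialized, the K branches start from different parameters, consistent with the requirement that they learn different views of the same feature map.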
Mixed prototype correction
The MPC module aims to correct the side effects of the confounders and further explore the causality of A on Y by estimating

P(Y | do(A)) = Σ_{x′} P(Y | A, x′) P(x′).  (7)

However, it is infeasible to collect all possible x′ (lesions that might appear in reality) together with A for predicting Y. Thus, we use mixed multi-view prototypes to approximate x′. Multi-view prototype learning43 is an emerging machine learning technique that aims to learn a set of prototypes across different views to capture the underlying structure of representative examples for each category. Specifically, we use the c-th class-specific average multi-view features to approximate the c-th multi-view prototypes, denoted as S_c = {s_c^1, …, s_c^K} ∈ ℝ^{K×D}. We then generate the mixed multi-view prototypes S̃, which partially come from the source multi-view prototypes (the c-th class) and partially from a randomly selected counterpart (the c′-th class), to express x′ as

S̃ = (1 − v) ⊙ S_c + v ⊙ S_{c′},  (8)

where v ∈ {0, 1}^K is the random exchanging index vector of S_c and ⊙ denotes view-wise selection. Namely, each v_k indicates whether the k-th prototype in S_c should be exchanged for the k-th prototype in S_{c′}.
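A small sketch of the class-wise prototype construction and the random view-wise exchange of Eq. (8) is given below; the function names and the exact mixing form follow our reconstruction above and are illustrative assumptions, not the paper's code.

```python
# Build class-wise average multi-view prototypes and randomly exchange some views
# with those of another class, mirroring the mixed prototypes of Eq. (8).
import torch

def build_prototypes(view_feats: torch.Tensor, labels: torch.Tensor, num_classes: int):
    """view_feats: (N, K, D) multi-view features of N samples; returns (C, K, D) prototypes."""
    _, K, D = view_feats.shape
    protos = torch.zeros(num_classes, K, D)
    for c in range(num_classes):
        protos[c] = view_feats[labels == c].mean(dim=0)   # class-specific average per view
    return protos

def mix_prototypes(protos: torch.Tensor, c: int, num_classes: int):
    """Exchange a random subset of the K view prototypes of class c with another class c'."""
    K = protos.shape[1]
    c_other = (c + int(torch.randint(1, num_classes, (1,)))) % num_classes
    v = torch.randint(0, 2, (K, 1)).float()               # exchange indicator per view
    s_tilde = (1 - v) * protos[c] + v * protos[c_other]   # mixed multi-view prototypes
    return s_tilde, v, c_other

# usage sketch: S_tilde, v, c_other = mix_prototypes(build_prototypes(A, y, 2), c=0, num_classes=2)
```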
Since S̃ captures lesions that have distinct characteristics in specific view features, it can serve as a substitute for x′. To predict the label Y, we fuse S̃ with A via a fusion module. Cross-attention10 is a mechanism that enables neural networks to capture the interdependent relationship between two heterogeneous features using a learnable similarity matrix. In this work, we incorporate cross-attention with a feature mapping function h(·) into the fusion module to explore this interdependence. The output is the fused multi-view features, denoted as Ã:

Ã = CrossAttention(A, h(S̃)).  (9)
We then concatenate Ã with the global feature vector g to predict the label as ỹ = f_c(g ‖ Ã), using Eq. (5).
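The fusion step of Eq. (9) can be sketched with a standard multi-head cross-attention layer standing in for the paper's fusion module; the exact attention design, the mapping h(·), and the head count are our assumptions.

```python
# Fuse the multi-view features A with the mapped mixed prototypes via cross-attention,
# then predict with the Eq. (5) classifier, as in Eq. (9) and the prediction y_tilde.
import torch
import torch.nn as nn

class PrototypeFusion(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 4):
        super().__init__()
        self.h = nn.Linear(dim, dim)                                   # feature mapping h(.)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, A: torch.Tensor, s_tilde: torch.Tensor):
        """A: (B, K, D) multi-view features; s_tilde: (B, K, D) mixed prototypes."""
        keys = self.h(s_tilde)
        fused, _ = self.attn(query=A, key=keys, value=keys)            # A_tilde: (B, K, D)
        return fused

# usage sketch (reusing objects from the previous snippets):
# A = torch.stack(views, dim=1)                            # (B, K, D)
# A_tilde = PrototypeFusion()(A, S_tilde.expand_as(A))     # broadcast prototypes over the batch
# y_tilde = model.f_c(torch.cat([g] + list(A_tilde.unbind(1)), dim=1))
```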
Adaptive training strategy
We utilize an adaptive training strategy to maintain stable model training. Since S̃ randomly mixes the c-th and c′-th multi-view prototypes, there is a possibility that Ã is predicted as the c′-th category. In this study, we hypothesize that two factors are related to this situation: (1) the information purity (IP) of S̃, and (2) the maturity (MT) of the fusion module.
The IP refers to the amount of prototype information from the source category contained in S̃. If S̃ contains a large amount of prototype information from other categories, the probability of Ã being predicted as other categories increases. Therefore, we use p_c and p_{c′} to represent the probabilities of Ã being predicted as the c-th class and the c′-th class, respectively.
The MT represents the ability of the fusion module to accurately fuse label-related prototype information into Ã. When the fusion module cannot fuse the multi-view prototypes well, the probability of accurately predicting ŷ_c decreases. We assume that MT increases as the network is iteratively optimized, and we denote MT as α = α_0 + Δ(t), where α_0 represents the initialized maturity and Δ(t) grows with the training iteration t. Based on these two factors, we can write the probabilities of Ã being predicted as ŷ_c and ŷ_{c′} as

P(ŷ_c | Ã) = α p_c + (1 − α)/C,  P(ŷ_{c′} | Ã) = α p_{c′} + (1 − α)/C.  (10)
Based on this adaptive training strategy, the optimization goal of MPC can be formulated as

L_f = − Σ_{S̃} P(S̃) [ P(ŷ_c | Ã) log ỹ_c + P(ŷ_{c′} | Ã) log ỹ_{c′} ],  (11)

where P(S̃) is set to a uniform distribution because S̃ is generated from a random mixture with equal probability.
Overall loss function
By combining the MVFE, MPC, and the adaptive training strategy, the overall loss function L, which is the optimization objective minimized during training, combines the original loss L_0 and the fusion loss L_f:

L = L_0 + λ L_f,  (12)

where λ is a hyperparameter that controls the relative weight of the fusion loss. To ease the understanding of MPCCI, the pseudo-code is presented in Algorithm 1.
Algorithm 1:
The pseudo-code of MPCCI
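Complementing Algorithm 1, the sketch below shows how the two loss terms of Eq. (12) can be combined in code. This is a hedged illustration: the soft target for the fusion branch is passed in directly, and its construction from IP and MT is simplified relative to the adaptive strategy above.

```python
# Combine the original classification loss (Eq. 6) with the lambda-weighted fusion loss,
# as in the overall objective of Eq. (12).
import torch
import torch.nn.functional as F

def mpcci_loss(y_hat: torch.Tensor, y_tilde: torch.Tensor, target: torch.Tensor,
               soft_target: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """y_hat / y_tilde: (B, C) logits of the original and fused branches; soft_target: (B, C)."""
    loss_0 = F.cross_entropy(y_hat, target)                                    # original loss
    loss_f = -(soft_target * F.log_softmax(y_tilde, dim=1)).sum(dim=1).mean()  # fusion loss
    return loss_0 + lam * loss_f                                               # Eq. (12)
```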
Experiment
We conduct comprehensive experiments to evaluate the performance of the proposed MPCCI approach. Below, we first introduce the datasets used for the experiments, the evaluation protocols, the compared methods, and the implementation details. Then, we report and analyze the quantitative results obtained across four medical datasets. Moreover, a validation of the heterogeneity cause C and a dataset analysis are conducted to systematically evaluate MPCCI. We also carry out further experimental analysis to assess the capabilities of MPCCI, covering an examination of the number of features, the function of the mixing mechanism, and the visualization results.
Datasets
Four medical image datasets are utilized for the evaluation of MPCCI. The details of the datasets are provided as follows:
The CT COVID-1944 dataset comprises 7,593 COVID-19 CT images sourced from 466 patients, 6,893 normal CT images from 604 patients, and 2,618 CAP CT images from 60 patients. For experimentation purposes, a total of 14,486 images from the normal and COVID-19 categories are selected. These images are randomly partitioned into training, validation, and test sets at a ratio of 7:1:2.
The BUSI45 is a publicly available dataset consisting of 780 ultrasound images of three classes, i.e., normal, benign, and malignant. In this work, we follow the setting of MIB Net46 which achieves the best result reported to date and use only the benign images (437) and malignant images (210). Conforming to the protocol of MIB Net, the dataset is partitioned into training, validation, and test sets at a ratio of 8:1:1.
The FJPH is a dataset established by ourselves for predicting the likelihood of lymph node metastasis. The data inside are obtained anonymously from a local hospital (Fujian Provincial Hospital) to ensure the privacy of all involved patients. It consists of 889 ultrasound images categorized into two classes: metastasis (500 ultrasound images) and non-metastasis (389 ultrasound images). We employ this dataset to demonstrate the versatility of the proposed method in different ultrasound imaging scenarios. Adhering to the settings of BUSI, we partition this dataset into training, validation, and test sets at a ratio of 8:1:1.
The FJTU is a thyroid ultrasound dataset established from the Fujian Provincial Hospital, covering four sub-types of thyroid tumors, i.e., thyroid adenoma (TA), follicular carcinoma (FC), follicular variant of PTC (FV-PTC), and medullary carcinoma (MC). It consists of 1,969 ultrasound images from 290 patients. Five-fold cross-validation is utilized for this dataset. In addition, a subset of the data, FJTU-H (349 images from FC and 174 images from FV-PTC), includes gender information and 9 heterogeneous attributes annotated by professional doctors. The attribute distribution exhibits severe heterogeneity within the same category, as shown in Fig. 5. The FJTU-H is utilized for the dataset analysis of heterogeneity and the validation of the heterogeneity cause C.
Fig. 5.

Distribution of 9 heterogeneous attributes for follicular carcinoma (FC) in the FJTU-H dataset. Each bar represents the frequency of a specific attribute (e.g., shape, echogenicity, margin) annotated by professional radiologists. Severe intra-class heterogeneity is evidenced by the varied distribution of attributes (e.g., irregular shape, calcification status) within the same pathology category. This visualization validates the role of unmeasurable confounders C (e.g., biological variability) in lesion heterogeneity, supporting the evaluation of MPCCI's robustness in the "Experimental results" section.
Compared methods and evaluation metrics
Compared methods
In order to comprehensively validate the effectiveness of MPCCI, we compare it with various methods. Initially, we select four representative deep learning models with backbone architectures of ResNet1847, VGG1648, ViT49, and Mamba50 for image processing. These four methods utilize distinct feature extraction mechanisms to identify image features, facilitating the assessment of MPCCI performance across different architectures. Additionally, we employ several representative supervised learning methods for image classification, such as CABNet40 and CAD_PE51, along with data augmentation techniques like MixupNet52 and MixStyleNet53, as well as the invariant feature learning method Fishr54, to further evaluate the performance of MPCCI. It is noteworthy that for the BUSI dataset, we also incorporate state-of-the-art modality-specific methods for breast cancer diagnosis, including TNTs55, BVA Net56, HoVer-Trans57, and MIB Net46, for comparative analysis with MPCCI.
Evaluation metrics
We evaluate MPCCI using four commonly used metrics in classification tasks: accuracy (Acc), precision (P), recall (R), and F1-score (F1).
Experimental details
In our experiments, we utilize ResNet18 as the backbone of MPCCI. All medical images are resized to 128 × 128 pixels. The model is trained using the SGD optimizer, with learning rates set to 0.0001/0.0001/0.001/0.0001 for the CT COVID-19, BUSI, FJPH, and FJTU datasets, respectively. All experiments are conducted on a single Nvidia RTX3090 GPU, with batch sizes set to 10/10/128/10. The maturity hyperparameter α_0 is set to 0.5, while λ is set to 0.1/0.1/1/0.1 for the four datasets, respectively.
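For reference, the reported training protocol can be expressed as a small configuration sketch; the learning rates, batch sizes, input size, and λ mirror the text, while momentum and other unspecified settings are placeholders of our own.

```python
# Per-dataset training configuration as stated above; unspecified values are assumptions.
import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((128, 128)),      # all medical images are resized to 128 x 128
    transforms.ToTensor(),
])

configs = {                             # lr / batch size / lambda as reported per dataset
    "CT_COVID19": dict(lr=1e-4, batch_size=10, lam=0.1),
    "BUSI":       dict(lr=1e-4, batch_size=10, lam=0.1),
    "FJPH":       dict(lr=1e-3, batch_size=128, lam=1.0),
    "FJTU":       dict(lr=1e-4, batch_size=10, lam=0.1),
}

cfg = configs["FJPH"]
model = MVFE(num_views=5, num_classes=2)                      # backbone sketch from above
optimizer = torch.optim.SGD(model.parameters(), lr=cfg["lr"], momentum=0.9)  # momentum assumed
```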
For the baseline methods, we reproduce the source codes of ResNet1847, VGG1648, ViT49, Mamba50, CABNet40, MixupNet52, MixStyleNet53, Fishr54, and CAD_PE51. All experimental results are based on the average of five experiments conducted with different random seeds. The results of TNTs55, BVA Net56, HoVer-Trans57, and MIB Net46 are directly cited from the original papers, as their source codes are not publicly accessible.
Experimental results
Tables 1, 2, and 3 present the overall performance of MPCCI and the compared methods. On the CT COVID-19 dataset, our MPCCI method exhibits strong performance, underscoring its efficacy and robustness in COVID-19 image classification tasks. Specifically, it achieves the highest accuracy and F1-score of 96.10% and 96.28%, respectively. On the BUSI dataset, MPCCI achieves the best average results on the four evaluation metrics, demonstrating its superiority. Compared to MIB Net, which takes advantage of multi-task learning for both classification and segmentation, our MPCCI outperforms it by 0.26%, 0.99%, 2.94%, and 2.19% in accuracy, precision, recall, and F1-score, respectively. This demonstrates that our approach can perform well in diagnosing breast cancer in ultrasound images with only instance-level labels. On the FJPH dataset, the performance of MPCCI consistently surpasses the runner-up by 1.36%, 1.20%, and 3.83% in accuracy, recall, and F1-score, respectively. The results highlight its consistent superiority across various datasets and underscore its potential for practical applications in medical image analysis. On the FJTU dataset, we test the accuracy on the four sub-types. Compared to SOTA and baseline methods, the proposed MPCCI achieves the best performance across all categories. While our method generally outperforms the compared methods, it occasionally lags behind by 1–2% in precision or recall. This discrepancy may stem from suboptimal performance on specific categories. A potential direction for improvement is to leverage prototype learning to enhance the classification boundaries58.
Table 1.
Performance comparison between MPCCI and compared methods on the CT COVID-19 dataset. The best and second best results are marked in bold and with underline respectively.
| Method | ACC (%) | P (%) | R (%) | F1 (%) |
|---|---|---|---|---|
| ResNet18 | 95.44 ± 0.71 | 98.87 ± 0.94 | 92.36 ± 1.03 | 95.50 ± 0.36 |
| Fishr | 96.06 ± 1.30 | 94.57 ± 0.28 | 97.31 ± 0.94 | 95.92 ± 0.73 |
| CABNet | 95.89 ± 0.64 | 95.06 ± 1.04 | 96.37 ± 1.50 | 95.71 ± 0.83 |
| MixupNet | 95.10 ± 0.45 | 96.82 ± 1.51 | 92.74 ± 1.89 | 94.74 ± 0.57 |
| MixStyleNet | 95.10 ± 1.49 | 96.82 ± 1.42 | 92.47 ± 1.76 | 94.72 ± 0.82 |
| VGG16 | 93.96 ± 1.84 | 94.68 ± 2.59 | 93.74 ± 1.86 | 94.21 ± 1.58 |
| ViT | 95.22 ± 0.21 | 95.59 ± 0.58 | 94.52 ± 0.71 | 95.06 ± 0.63 |
| Mamba | 79.67 ± 0.00 | 77.38 ± 0.00 | 86.50 ± 0.00 | 81.69 ± 0.00 |
| MPCCI (Ours) | 96.10 ± 0.51 | 96.18 ± 0.24 | 96.37 ± 0.84 | 96.28 ± 0.18 |
Table 2.
Performance comparison between MPCCI and compared methods on the BUSI and FJPH datasets for ultrasound images. The best and second best results are marked in bold and with underline respectively.
| Method | BUSI | FJPH | ||||||
|---|---|---|---|---|---|---|---|---|
| ACC (%) | P (%) | R (%) | F1(%) | ACC (%) | P (%) | R (%) | F1(%) | |
| TNTs* | 81.20 ± 3.20 | 76.30 ± 5.70 | 61.10 ± 10.40 | 67.9 ± 5.70 | - | - | - | - |
| BVA Net* | 84.3 | 88.3 | 75.1 | - | - | - | - | - |
| HoVer-Trans* | 85.50 ± 5.00 | 87.60 ± 6.20 | 86.70 ± 11.50 | 87.20 ± 8.00 | - | - | - | - |
| MIB Net* | 92.97 ± 1.11 | 93.21 ± 1.50 | 92.97 ± 1.10 | 92.85 ± 1.01 | - | - | - | - |
| ResNet18 | 91.39 ± 2.48 | 91.76 ± 1.72 | 95.91 ± 1.82 | 93.75 ± 1.78 | 80.90 ± 1.12 | 80.31 ± 1.17 | 87.60 ± 4.40 | 83.74 ± 0.87 |
| Fishr | 93.04 ± 2.74 | 96.61 ± 1.34 | 93.18 ± 2.36 | 94.34 ± 1.23 | 84.26 ± 0.84 | 87.87 ± 2.91 | 74.35 ± 3.77 | 80.55 ± 2.45 |
| CABNet | 89.23 ± 5.37 | 95.12 ± 1.78 | 88.63 ± 3.63 | 91.76 ± 3.15 | 83.14 ± 1.49 | 83.33 ± 4.63 | 76.92 ± 9.92 | 80.00 ± 7.17 |
| MixupNet | 92.30 ± 2.69 | 95.34 ± 1.68 | 93.18 ± 2.51 | 94.25 ± 1.83 | 84.26 ± 2.34 | 79.06 ± 3.21 | 87.17 ± 2.95 | 82.92 ± 2.14 |
| MixStyleNet | 86.15 ± 5.04 | 90.69 ± 4.93 | 88.63 ± 6.42 | 89.65 ± 4.64 | 76.40 ± 1.18 | 76.47 ± 2.43 | 66.66 ± 3.88 | 71.23 ± 2.94 |
| VGG16 | 93.12 ± 1.08 | 94.45 ± 1.26 | 94.45 ± 1.45 | 94.45 ± 1.31 | 82.47 ± 1.80 | 83.33 ± 0.98 | 85.60 ± 4.40 | 84.41 ± 0.74 |
| ViT | 90.76 ± 3.72 | 93.18 ± 3.55 | 93.18 ± 4.16 | 93.18 ± 3.77 | 74.83 ± 0.45 | 74.33 ± 0.67 | 84.40 ± 1.60 | 79.03 ± 0.21 |
| Mamba | 67.69 ± 0.00 | 67.69 ± 0.00 | 100.00 ± 0.00 | 80.73 ± 0.00 | 68.53 ± 0.00 | 68.33 ± 0.00 | 82.00 ± 0.00 | 74.54 ± 0.00 |
| MPCCI (Ours) | 93.23 ± 2.15 | 94.20 ± 1.36 | 95.91 ± 1.82 | 95.04 ± 1.59 | 85.62 ± 3.14 | 86.62 ± 5.05 | 88.80 ± 3.20 | 87.57 ± 2.23 |
*These results are directly cited from the original papers, as their source codes are not publicly accessible.
Table 3.
Performance comparison on FJTU. The best and second best results are marked in bold and with underline respectively.
| Method | FV-PTC | FC | TA | MC |
|---|---|---|---|---|
| Fishr | 85.42 | 70.68 | 57.77 | 71.02 |
| CABNet | 86.99 | 71.3 | 58.03 | 71.73 |
| MixupNet | 86.79 | 65.15 | 56.63 | 70.03 |
| MixStyleNet | 86.86 | 62.28 | 57.45 | 66.33 |
| CAD_PE | 86.84 | 71.48 | 57.91 | 72.46 |
| ResNet18 | 86.05 | 68.98 | 57.52 | 74.67 |
| MPCCI | 87.15 | 72.77 | 59.26 | 77.99 |
Ablation study
We conduct ablation studies on the CT COVID-19 and FJPH datasets to explore the individual contributions of each component of the proposed MPCCI. The four components evaluated are MVFE, MPC, and the two criteria of the adaptive training strategy (i.e., IP and MT). The results of the ablation studies are presented in Table 4, with AB1: ResNet18 (baseline), AB2: AB1 + MVFE, AB3: AB2 + MPC, AB4: AB3 + IP, and AB5: MPCCI (AB4 + MT). The results show that both MVFE and MPC contribute to the improvements in performance over the baseline, demonstrating that MPCCI based on FDA can effectively evaluate the causal effect from image to label. Moreover, the results also reveal that it is crucial to consider both the IP and MT factors simultaneously, as the performance drops significantly when only IP is considered.
Table 4.
Ablation study of MPCCI on CT COVID-19 and FJPH. AB1: ResNet18 (baseline), AB2: AB1 + MVFE, AB3: AB2 + MPC, AB4: AB3 + IP, and AB5: MPCCI (AB4 + MT). The best and second best results are marked in bold and with underline respectively.
| Method | CT COVID-19 | FJPH | ||||||
|---|---|---|---|---|---|---|---|---|
| ACC (%) | P (%) | R (%) | F1(%) | ACC (%) | P (%) | R (%) | F1(%) | |
| AB1 | 95.44 ± 0.71 | 98.87 ± 0.94 | 92.36 ± 1.03 | 95.50 ± 0.36 | 80.90 ± 1.12 | 80.31 ± 1.17 | 87.60 ± 4.40 | 83.74 ± 0.87 |
| AB2 | 95.47 ± 0.33 | 95.47 ± 0.57 | 95.91 ± 0.75 | 95.69 ± 0.25 | 81.57 ± 0.45 | 80.54 ± 3.46 | 88.80 ± 3.20 | 84.41 ± 0.50 |
| AB3 | 95.72 ± 1.13 | 98.20 ± 1.61 | 93.54 ± 1.34 | 95.81 ± 1.42 | 84.95 ± 2.69 | 83.93 ± 5.87 | 90.40 ± 1.60 | 86.99 ± 1.90 |
| AB4 | 95.92 ± 0.47 | 97.68 ± 1.21 | 94.47 ± 1.13 | 96.05 ± 0.24 | 83.37 ± 0.90 | 82.36 ± 0.97 | 89.60 ± 0.40 | 85.82 ± 0.71 |
| AB5 | 96.10 ± 0.51 | 96.18 ± 0.24 | 96.37 ± 0.84 | 96.28 ± 0.18 | 85.62 ± 3.14 | 86.62 ± 5.05 | 88.80 ± 3.20 | 87.57 ± 2.23 |
Validation of heterogeneity cause C
Gender and age are typical unobserved confounding factors C59. Inspired by this work59, the FJTU-H dataset is divided into male and female groups, and generalization tests are conducted on the two groups separately (training on one group and testing on the other). Higher generalizability indicates that the algorithm is less influenced by the confounding factor of gender. The results are shown in Table 5. It can be seen that MPCCI is less affected by the confounding factor C (gender).
Table 5.
Results of generalization ability for the unobserved confounder C (gender) on the FJTU-H dataset.
| Method | Male2Female | Female2Male | ||
|---|---|---|---|---|
| ACC(%) | F1 (%) | ACC(%) | F1 (%) | |
| ResNet18 | 70.43 | 78.96 | 66.55 | 77.02 |
| MPCCI | 74.20 | 85.16 | 67.21 | 77.88 |
Data set analysis
We utilize the FJTU-H dataset to validate the effectiveness of our method in addressing heterogeneity. Two sets of control experiments are conducted: random splitting and splitting by low/high heterogeneity. The Pearson correlations within the low- and high-heterogeneity groups are 0.8 and 0.5, respectively. The experimental results are shown in Table 6. It can be found that in the random splitting group, the baseline method (ResNet18) and our MPCCI achieve similar results. However, in the splitting-by-heterogeneity group, our method significantly outperforms the baseline, demonstrating its effectiveness in handling the heterogeneity issue.
Table 6.
Results of heterogeneous generalization on FJTU-H dataset.
| Method | Random Group | Heterogeneous Group | ||
|---|---|---|---|---|
| ACC(%) | F1 (%) | ACC(%) | F1 (%) | |
| ResNet18 | 93.67 | 95.43 | 73.41 | 81.29 |
| MPCCI | 93.74 | 95.68 | 77.63 | 83.29 |
Experimental extensions
In this section, we address three questions to provide a more detailed analysis of the proposed method: (1) What is the optimal number of expert networks, i.e., feature views? (2) How do the mixing mechanism and the fusion module contribute to the performance of MPC? (3) Can the interpretability of MPCCI be quantitatively assessed? To answer the first question, we vary the number of expert networks from one to nine on the FJPH dataset and observe the performance. The results in Fig. 6 (a) show that the optimal performance is achieved with five expert networks, indicating that an increased number of views does not directly translate into enhanced performance. To address the second question, we assess the performance of the "MPC1" mode (where the mixing mechanism is removed and A is fused with only the source multi-view prototypes S_c) and the "MPC2" mode (where the fusion module is removed and Ã is obtained by randomly mixing A and S̃). From Fig. 6 (b), we observe that both the mixing mechanism and the fusion module are necessary for MPC, demonstrating its ability to simulate the feature representation of lesions in various states. To answer the third question, we visualize the class activation maps (CAM)60 for four samples from the FJPH and CT COVID-19 datasets using the baseline (ResNet18), Fishr, and MPCCI in Fig. 7. The lesions in the original images have been marked by professional doctors and are shown with red circles. By comparing the locations of the lesions with the visualization results, we can see that MPCCI attends to the entire lesion, while ResNet18 and Fishr focus on only a small part of the lesion or miss it entirely. Therefore, the CAM results of MPCCI can provide interpretable evidence to doctors, thereby facilitating medical diagnosis.
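For completeness, class activation maps like those in Fig. 7 can be produced for a plain ResNet-18 baseline with the original CAM recipe60 along the following lines (a generic sketch; MPCCI's own maps would additionally involve its multi-view branches, which are omitted here).

```python
# CAM for a ResNet-18-style classifier: weight the last convolutional feature maps by the
# fully connected weights of the predicted class, then upsample and normalise (Zhou et al.).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
features = {}
model.layer4.register_forward_hook(lambda m, inp, out: features.update(last=out))

x = torch.randn(1, 3, 128, 128)                       # stand-in for a preprocessed image
with torch.no_grad():
    logits = model(x)
cls = int(logits.argmax(dim=1))
w = model.fc.weight[cls]                              # (512,) class-specific weights
cam = torch.einsum("d,bdhw->bhw", w, features["last"])
cam = F.relu(cam).unsqueeze(1)
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # heat map in [0, 1]
```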
Fig. 6.
(a) Performance change by increasing the number of expert networks in MVFE. (b) Effect exploration to the contributions of different mixing mechanisms and fusion modules in MPC.
Fig. 7.
Comparative visualization of lesion detection by ResNet18, Fishr, and MPCCI on FJPH and CT COVID-19 datasets. Professional doctors have annotated the lesions in the original images, which are delineated in red for clarity.
Conclusion
In this paper, we propose a novel approach, MPCCI, for enhanced medical image classification by addressing the unmeasurable confounding factors present in medical imaging analysis. Leveraging FDA, MPCCI estimates the total causal effect of an image on its corresponding label, thus mitigating the negative effects of the confounders. The proposed approach comprises an MVFE module with spatial-channel attention, allowing multi-view features to serve as mediators in FDA, and an MPC module to effectively apply causal intervention on the mediators. An adaptive training strategy, including IP and MT, is introduced to maintain stable training during the feature exchange process. Experimental results on four medical datasets demonstrate the effectiveness of MPCCI, achieving high accuracy, precision, recall, and F1-score in diagnosing COVID-19, breast cancer, lymph node metastasis, and thyroid diseases. In the future, we plan to conduct extensive validation studies across a wider spectrum of medical conditions and imaging modalities. By rigorously evaluating the performance of MPCCI on diverse datasets encompassing a variety of medical scenarios, we aim to demonstrate its efficacy and versatility in facilitating accurate and reliable diagnostic decision-making.
Acknowledgements
The authors are thankful to Fujian Provincial Hospital and Fujian Medical University for their management of our patient database. The authors are thankful to Song-Song Wu for helping critically revise the manuscript for important intellectual content and helping collect data and design the study.
Author contributions
H.P. and W.Y. wrote the main manuscript text and H.P. prepared Figs. 1, 2, 3, 4 and 5. All authors reviewed the manuscript.
Funding
Project of the Department of Finance of Fujian Province (0060092410).
Data availability
Excel files containing raw data included in the main figures and tables can be found in the Source Data File in the article. All other data including the imaging data can be provided upon reasonable request to the corresponding author.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zhi-Liang Hong, Jian-Chuan Yang and Xiao-Rui Peng contributed equally to this work.
References
- 1.Bradley, J., Erickson, P., Korffatis, Z., Akkus & Timothy, L. K. Machine learning for medical imaging. Radiographics 37, 2 (2017), 505–515. (2017). [DOI] [PMC free article] [PubMed]
- 2.Duan, J. et al. Normality learning-based graph anomaly detection via multi-scale contrastive learning. In Proceedings of the ACM International Conference on Multimedia. 7502–7511. (2023).
- 3.Madan, N. et al. Self-supervised masked convolutional transformer block for anomaly detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023). 10.1109/TPAMI.2023.3322604 [DOI] [PubMed]
- 4.Li, Z., Zheng, Y., Luo, X. & Shan, D. and Qingqi Hong. Scribblevc: scribble-supervised medical image segmentation with vision-class embedding. In Proceedings of the ACM International Conference on Multimedia. 3384–3393. (2023).
- 5.Yixuan Wu, J., Chen, J., Zhu, Y. Y., Danny, Z. & Chen and Jian Wu. GCL: gradient-guided contrastive learning for medical image segmentation with multi-perspective meta labels. In Proceedings of the ACM International Conference on Multimedia. 463–471. (2023).
- 6.Xie, X., Jin, T., Yun, B., Li, Q. & Wang, Y. Exploring hyperspectral histopathology image segmentation from a deformable perspective. In Proceedings of the ACM International Conference on Multimedia. 242–251. (2023).
- 7.Huang, Z. A., Liu, R., Zhu, Z. & Kay Chen, T. Multitask learning for joint diagnosis of multiple mental disorders in resting-state fmri. IEEE Transactions on Neural Networks and Learning Systems (2022). (2022). 10.1109/TNNLS.2022.3225179 [DOI] [PubMed]
- 8.Huang, Z. A. et al. Identification of autistic risk candidate genes and toxic chemicals via multilabel learning. IEEE Transactions on Neural Networks and Learning Systems 32, 9 (2020), 3971–3984. (2020). 10.1109/TNNLS.2020.3016357 [DOI] [PubMed]
- 9.Wu Lin, Q., Lin, L., Feng & Kay Chen, T. Ensemble of domain adaptation-based knowledge transfer for evolutionary multitasking. IEEE Transactions on Evolutionary Computation (2023). (2023). 10.1109/TEVC.2023. 3259067.
- 10.Rui Liu, Z. A., Huang, Y., Zhu, H. Z., Wong, K. C. & Kay Chen, T. Spatial–temporal co-attention learning for diagnosis of mental disorders from resting-state fmri data. IEEE Transactions on Neural Networks and Learning Systems (2023). (2023). 10.1109/TNNLS.2023. 3243000. [DOI] [PubMed]
- 11.Fouras, A. et al. The past, present, and future of x-ray technology for in vivo imaging of function and form. Journal of Applied Physics 105, 10 (2009). (2009).
- 12.Li, X., Liu, L. & Wang, C. Juan Zhou, and Heterogeneity analysis and diagnosis of complex diseases based on deep learning methods. Scientific Reports 8, 1 (2018), 6155. (2018). [DOI] [PMC free article] [PubMed]
- 13.Lin Yue, D., Tian, W., Chen, X., Han & Yin, M. Deep learning for heterogeneous medical data analysis. World Wide Web 23 (2020), 2715–2737. (2020).
- 14.Iain Carmichael, A. H., Song, R. J., Chen, Drew, F. K., Williamson, T. Y. & Chen and Faisal Mahmood. Incorporating intratumoral heterogeneity into weakly-supervised deep learning models via variance pooling. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 387–397. (2022).
- 15.Zhang, L. et al. Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Transactions on Medical Imaging 39, 7 (2020), 2531–2540. (2020). 10.1109/TMI. 2020.2973595. [DOI] [PMC free article] [PubMed]
- 16.Bertucci, F. & Birnbaum, D. Reasons for breast cancer heterogeneity. Journal of Biology 7, 1–4 (2008). [DOI] [PMC free article] [PubMed]
- 17.Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge, UK (2000).
- 18.Kumar, A., Kim, J., Lyndon, D., Fulham, M. & Feng, D. An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE Journal of Biomedical and Health Informatics 21, 1 (2016), 31–40. (2016). 10.1109/JBHI.2016.2635663 [DOI] [PubMed]
- 19.Yao, H. et al. Source free semi-supervised transfer learning for diagnosis of mental disorders on fmri scans. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023). (2023). 10.1109/TPAMI.2023.3298332 [DOI] [PubMed]
- 20.Yun Yang, Y., Hu, X., Zhang & Wang, S. Two-stage selective ensemble of Cnn via deep tree training for medical image classification. IEEE Trans. Cybernetics. 52 (2021), 9194–9207. 10.1109/TCYB.2021.3061147 (2021). [DOI] [PubMed] [Google Scholar]
- 21.Graham, S. et al. Yee Wah tsang, and Nasir rajpoot. 2019. MILD-Net: minimal information loss dilated network for gland instance segmentation in colon histology images. Med. Image. Anal.52, 199–211 (2019). [DOI] [PubMed] [Google Scholar]
- 22.Hong, H., Jiang, M., Feng, L., Lin, Q. & Tan, K. C. Balancing exploration and exploitation for solving large-scale multiobjective optimization via attention mechanism. In 2022 IEEE Congress on Evolutionary Computation (CEC), 1–8 (IEEE, 2022).
- 23.Junlong Cheng, C., Gao, F., Wang & Zhu, M. Segnetr: rethinking the local-global interactions and skip connections in u-shaped networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 64–74. (2023).
- 24.Ma, C., Wu, J., Si, C. & Tan, K. C. Scaling supervised local learning with augmented auxiliary networks. In The Twelfth International Conference on Learning Representations. (2023).
- 25.Bissoto, A. & Valle, E. and Sandra Avila. Gan-based data augmentation and anonymization for skin-lesion analysis: a critical review. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1847–1856. (2021).
- 26.Irem Cetin, M., Stephens, O., Camara, Miguel, A. G. & Ballester Attri-VAE: attribute-based interpretable representations of medical images with variational autoencoders. Computerized Medical Imaging and Graphics 104 (2023), 102158. (2023). [DOI] [PubMed]
- 27.Haifan Gong, G., Chen, M., Mao, Z., Li & Li, G. Vqamix: conditional triplet mixup for medical visual question answering. IEEE Transactions on Medical Imaging 41, 11 (2022), 3332–3343. (2022). 10.1109/TMI. 2022.3185008. [DOI] [PubMed]
- 28.Amirhossein Kazerouni, E. K. et al. and. Diffusion models in medical imaging: a comprehensive survey. Medical Image Analysis (2023), 102846. (2023). [DOI] [PubMed]
- 29.Zhang, D., Zhang, H., Tang, J. & Hua, X. S. and Qianru Sun. Causal intervention for weakly-supervised semantic segmentation. Advances in Neural Information Processing Systems 33 (2020), 655–666. (2020).
- 30.Mattia Prosperi, Y. et al. and. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nature Machine Intelligence 2, 7 (2020), 369–375. (2020).
- 31.Li, X. et al. A causality-informed graph intervention model for pancreatic cancer early diagnosis. IEEE Trans. Artif. Intell. (2024).
- 32.Tang, X. et al. A causal counterfactual graph neural network for arising-from-chair abnormality detection in parkinsonians. Med. Image. Anal.97, 103266 (2024). [DOI] [PubMed] [Google Scholar]
- 33.Qu, J. et al. A causality-inspired generalized model for automated pancreatic cancer diagnosis. Med. Image. Anal.94, 103154 (2024). [DOI] [PubMed] [Google Scholar]
- 34.Li, X. et al. Causality-driven graph neural network for early diagnosis of pancreatic cancer in non-contrast computerized tomography. IEEE Trans. Med. Imaging. 42 (6), 1656–1667 (2023). [DOI] [PubMed] [Google Scholar]
- 35.Cheng Ouyang, C. et al. and Daniel Rueckert. Causality-inspired single-source domain generalization for medical image segmentation. IEEE Transactions on Medical Imaging 42, 4 (2022), 1095–1106. (2022). 10.1109/TMI. 2022.3224067. [DOI] [PubMed]
- 36.Zhang Chen, Z. et al. C-cam: causal cam for weakly supervised semantic segmentation on medical image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11676–11685. (2022).
- 37.Miao, J., Chen, C., Liu, F., Wei, H. & Pheng-Ann Heng Caussl: causality-inspired semi-supervised learning for medical image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 21426–21437. (2023).
- 38.Sun, G., Nguyen, L.-M., Xin, J. et al. DA-TransUNet: integrating spatial and channel dual attention with transformer U-Net for medical image segmentation. Front. Bioeng. Biotechnol. 12, 1398237 (2024). 10.3389/fbioe.2024.1398237 [DOI] [PMC free article] [PubMed]
- 39.Pan, Y. et al. A mutual inclusion mechanism for precise boundary segmentation in medical images. Front. Bioeng. Biotechnol. 12, 1504249 (2024). 10.3389/fbioe.2024.1504249 [DOI] [PMC free article] [PubMed]
- 40.He, A., Li, T., Li, N., Wang, K. & Fu, H. CABNet: category attention block for imbalanced diabetic retinopathy grading. IEEE Transactions on Medical Imaging 40, 1 (2020), 143–153. (2020). 10.1109/TMI.2020.3023463 [DOI] [PubMed]
- 41.Zhi-An Huang, Y. et al. Federated multi-task learning for joint diagnosis of multiple mental disorders on mri scans. IEEE Transactions on Biomedical Engineering 70, 4 (2022), 1137–1149. (2022). 10.1109/TBME.2022.3210940 [DOI] [PubMed]
- 42.Woo, S., Park, J. & Lee, J. Y. and In So Kweon. Cbam: convolutional block attention module. In Proceedings of the European Conference on Computer Vision. 3–19. (2018).
- 43.Chunyan Yu, B. et al. Multiview calibrated prototype learning for few-shot hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing 60 (2022), 1–13. (2022). 10.1109/TGRS. 2022.3225947.
- 44.Maede Maftouni, A. C. C. et al. A robust ensemble-deep learning model for COVID-19 diagnosis based on an integrated CT scan images database. In IIE annual conference. Proceedings. Institute of Industrial and Systems Engineers (IISE), 632–637. (2021).
- 45.Al-Dhabyani, W., Gomaa, M., Khaled, H. & Fahmy, A. Dataset of breast ultrasound images. Data in Brief 28 (2020), 104863. (2020). [DOI] [PMC free article] [PubMed]
- 46.Wang, J. et al. and. Information bottleneck-based interpretable multitask network for breast cancer classification and segmentation. Medical Image Analysis 83 (2023), 102687. (2023). [DOI] [PubMed]
- 47.He, K., Zhang, X. & Ren, S. and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778. (2016).
- 48.Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
- 49.Alexey Dosovitskiy, L. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations. (2020).
- 50.Gu, A. & Dao, T. Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023).
- 51.Islam, N. U., Zhou, Z., Gehlot, S., Gotway, M. B. & Liang, J. Seeking an optimal approach for Computer-aided Diagnosis of Pulmonary Embolism. Medical image analysis 91 (2024), 102988. (2024). [DOI] [PMC free article] [PubMed]
- 52.Zhang, H., Cisse, M., Yann, N., Dauphin & Lopez-Paz, D. Mixup: beyond empirical risk minimization. In International Conference on Learning Representations. (2018).
- 53.Kaiyang Zhou, Y., Yang, Y., Qiao & Xiang, T. Domain generalization with mixstyle. In International Conference on Learning Representations. (2020).
- 54.Alexandre Rame, C., Dancette & Cord, M. Fishr: invariant gradient variances for out-of-distribution generalization. In International Conference on Machine Learning. PMLR, 18347–18377. (2022).
- 55.Han, K. et al. Transformer in transformer. Advances in Neural Information Processing Systems 34 (2021), 15908–15919. (2021).
- 56.Xing, J. et al. Jing Xiao, and Using BI-RADS stratifications as auxiliary information for breast masses classification in ultrasound images. IEEE Journal of Biomedical and Health Informatics 25, 6 (2020), 2058–2070. (2020). 10.1109/JBHI. 2020.3034804. [DOI] [PubMed]
- 57.Yuhao Mo, C. et al. Hover-trans: anatomy aware hover-transformer for roi-free breast cancer diagnosis in ultrasound images. IEEE Transactions on Medical Imaging (2023). (2023). 10.1109/TMI. 2023.3236011. [DOI] [PubMed]
- 58.Chong Wang, Y. et al. and. Learning support and trivial prototypes for interpretable image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2062–2072. (2023).
- 59.Raumanns, S. A. S. R., Britt, E. J., Michels, G. & Schouten and Veronika Cheplygina. Risk of training diagnostic algorithms on data with demographic bias. In Interpretable and Annotation-Efffcient Learning for Medical Image Computing: Third International Workshop, iMIMIC 2020, Second International Workshop, MIL3ID 2020, and 5th International Workshop, LABELS 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4–8, 2020, Proceedings 3. Springer, 183–192. (2020).
- 60.Bolei Zhou, A., Khosla, A., Lapedriza, A., Oliva & Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2921–2929. (2016).



















