Scientific Reports. 2025 Jul 4;15:23948. doi: 10.1038/s41598-025-03459-w

Reweighting balanced representation learning for long tailed image recognition in multiple domains

Panpan Fu 1,2, Nur Intan Raihana Ruhaiyem 2, Jiangtao Wang 2
PMCID: PMC12227607  PMID: 40615406

Abstract

In multi-domain long-tailed learning, data imbalance appears in two ways: within-domain class imbalance and across-domain sample proportion variation. These imbalances introduce biases in covariates and representations when learning domain-invariant features in both input and latent spaces. This paper applies an advanced reweighting balanced representation learning (BRL) algorithm to multi-domain long-tailed image recognition. By integrating covariate and representation balancing techniques into a reweighting-based class balancing approach, BRL effectively addresses these biases. Extensive evaluation on six benchmark datasets confirms its ability to extract domain- and class-unbiased feature representations, leading to excellent classifier performance, especially for the hardest classes. This approach also shows potential for applications in areas such as environmental monitoring and medical imaging, providing a robust solution with broad scientific implications.

Keywords: Multi-domain, Long-tailed, Representation learning, Re-weighting

Subject terms: Biotechnology, Computational biology and bioinformatics, Plant sciences, Zoology, Ecology, Environmental sciences, Engineering, Mathematics and computing

Introduction

Recent advancements in transfer learning and deep learning have greatly enhanced the state-of-the-art in various computer vision tasks1,2. Among these, image recognition technology plays a foundational role, serving as the basis for numerous high-level visual applications, including image segmentation, object detection, video tracking, and dynamic behavior analysis. However, a persistent and well-recognized challenge in image recognition is the long-tailed (LT) distribution problem, in which many classes occur far less frequently compared to a few dominant classes. This class imbalance often leads recognition models to become biased toward head classes, resulting in poor generalization performance on underrepresented tail classes3.

Although significant research has been devoted to developing LT learning algorithms, most prior work has focused on single-domain settings4. In contrast, real-world applications often involve data collected from multiple heterogeneous sources, introducing variations not only between source and target domains but also among the sources themselves5. This phenomenon—where data distributions differ while maintaining related semantic structures across domains—is known as domain shift (DS). The simultaneous occurrence of LT and DS substantially exacerbates the difficulty of the recognition task, a situation commonly encountered in practical scenarios6. For example, in medical image analysis, images collected from different hospitals may exhibit significant inter-domain variability. Simultaneously, rare diseases remain infrequent and may only appear in specific hospital domains. Yang et al. formally characterized this complex setting as the multi-domain long-tailed (MDLT) problem4.

To address domain shift, domain-invariant representation learning approaches have been widely studied. These methods aim to mitigate domain discrepancies by learning feature representations that are shared and consistent across domains7,8. They typically operate by minimizing a distributional discrepancy measure—such as maximum mean discrepancy (MMD)—between source and target domains within a shared latent space. However, most of these approaches are designed under the assumption of class-balanced data distributions9. In MDLT scenarios, severe class imbalances—both within individual domains and across different domains—introduce complex biases in the input feature space and the learned latent representations, thereby complicating the alignment process. These biases can severely distort the mapping between features and labels, leading to suboptimal feature alignment across domains and insufficient discrimination for tail classes10.

Several existing methods have been widely adopted to address LT, DS, or, to a limited extent, their combination. In single-domain LT settings, approaches such as resampling (e.g., over-sampling tail classes or under-sampling head classes) and reweighting strategies (e.g., class-balanced loss, focal loss) have shown varying degrees of effectiveness11,12. For domain adaptation under class-balanced assumptions, models such as domain-adversarial neural networks (DANN)13 have achieved strong performance. However, when evaluated under the MDLT setting, these methods exhibit critical limitations. Resampling methods can exacerbate domain shifts by overfitting to the limited samples of rare classes within specific domains. Reweighting strategies often struggle with conflicting gradients that arise from simultaneously addressing domain alignment and class imbalance. Moreover, domain-adversarial methods tend to focus on aligning dominant classes while neglecting tail classes, leading to poor generalization on rare categories across domains. Recent efforts to bridge this gap have explored combining domain adaptation techniques with imbalance-aware learning, such as domain-conditional feature normalization4. Nonetheless, these approaches often rely on strong assumptions about the availability of domain labels, class distributions, or large amounts of labeled data—conditions that may not hold in practice.

To tackle the MDLT problem, we propose a reweighting balanced representation learning (BRL) method for multi-domain long-tailed image recognition. BRL incorporates covariate balancing (CB) and representation balancing (RB) techniques to simultaneously solve class imbalance and domain imbalance from both the input feature space and latent representation space. It learns a shared latent space that is invariant to both input features and labels, thereby mitigating complex biases and improving classifier performance, particularly for tail classes. This method robustly handles extreme class imbalance and complex domain shifts without relying heavily on strong prior knowledge or domain-specific annotations. Extensive evaluations on multiple real-world datasets confirm the effectiveness of the proposed method.

Related work

Imbalanced long-tailed distribution

Unlike general data heterogeneity, imbalanced LT data exhibit extremely skewed distribution curves. Due to the significant variation in the number of samples per class, even state-of-the-art deep classification models often face performance challenges3. To address LT training data, one line of research focuses on adjusting class-wise contributions through resampling11, reweighting12,14, logit adjustment15,16, and two-stage training strategies17,18. Another line of work explores ensemble-based approaches, including contrastive learning19, knowledge distillation20, variance bias calibration21, and meta-learning strategies22,23. However, this problem becomes more serious in MDLT scenarios, where, in addition to inter-class differences, intra-class variations due to domain shifts further complicate learning. Tail class representations are particularly sensitive to domain-specific features6. Several recent methods have been proposed to address the MDLT problem. For example, Yang et al. introduced the BoDA loss function4, Ding et al. proposed a deep imbalanced domain adaptation (DIDA) framework24, and Xia et al. developed the generative inference network (GINet) to improve generalization across imbalanced domains25. However, the MDLT problem remains far from being fully resolved and warrants further investigation.

Cross-domain representation learning

In machine learning, selecting an appropriate set of features is crucial for the performance of downstream models such as classifiers or predictors. The effectiveness of these models is directly influenced by the quality and structure of the data representation used during training. Representation learning refers to the process of mapping raw input data to feature vectors or tensors that capture abstract, meaningful concepts to improve downstream task performance26. Recently, numerous representation learning methods have been developed to tackle the problem of domain shift27–29. Most of these methods focus on feature-based approaches for learning domain-invariant representations8,30,31. For example, deep domain confusion (DDC) integrates MMD minimization into the final fully connected layer to encourage shared representations between domains32. Correlation alignment (CORAL) employs linear transformations to match second-order statistics between source and target domains33,34, which has been extended into Deep CORAL using nonlinear mappings to adjust correlations in deep network activations35. DANN employs a minimax optimization framework to learn domain-invariant features by jointly training a feature extractor and domain classifier13. Similarly, Wasserstein distance guided representation learning (WDGRL) maps data from multiple domains into a shared feature space to facilitate domain-invariant learning8. Despite these advancements, research on feature representation for datasets that are both cross-domain and LT remains limited and insufficiently explored.

Problem definition

Let Inline graphic and Inline graphic represent the input and label spaces, respectively, and let Inline graphic represent the domain space. The domain space Inline graphic comprises K domains, denoted as Inline graphic, and the label space Inline graphic contains M categories. Each data instance is represented as Inline graphic, where i is the instance index, Inline graphic is the raw input, Inline graphic (Inline graphic) is the class label, and Inline graphic (Inline graphic) is the domain label. For any training domain d (Inline graphic), the category distribution Inline graphic is LT; that is, some rare classes may have very few or no training samples due to their low prevalence in specific domains. Hence, Inline graphic, where Inline graphic represents the class label set of the d-th domain Inline graphic, and Inline graphic. To evaluate overall performance across all classes, the test data, denoted by Inline graphic, is sampled with a balanced distribution over all classes.

Figure 1 presents an example of an MDLT dataset, Digits-MLT. Fig. 1a illustrates the class distribution of Digits-MLT, which includes two domains composed of two digit datasets: MNIST-M and SVHN. Representative samples from this dataset are visualized in Fig. 1b, with further details provided in the Datasets descriptions section. The data in each domain of the MDLT dataset clearly follow an LT distribution, and the class distributions vary across domains. This figure serves only to show the general characteristics of the MDLT learning problem, namely the presence of both LT class distributions and domain diversity. To ensure clarity in subsequent descriptions of the relevant formulas, Table 1 gives a detailed description of the key symbols used in this paper.
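The setting defined above can be made concrete with a small sketch. It assumes each instance is stored as a tuple (x, y, d) of raw input, class label, and domain label; the toy data, the domain names "A" and "B", and the helper names are hypothetical and are not taken from the paper or its codebase. The sketch counts samples per class within each domain and lists the classes a domain never observes, which is what makes the per-domain class label set a subset of the full class set.

```python
from collections import Counter, defaultdict

# Hypothetical toy MDLT training set: (raw input placeholder, class label, domain label).
samples = [
    ("img0", 0, "A"), ("img1", 0, "A"), ("img2", 0, "A"), ("img3", 1, "A"),
    ("img4", 1, "B"), ("img5", 1, "B"), ("img6", 2, "B"),
]
ALL_CLASSES = {0, 1, 2}  # the full class set (M = 3 in this toy example)


def per_domain_class_counts(data):
    """Return {domain: Counter({class: number of training samples})}."""
    counts = defaultdict(Counter)
    for _x, y, d in data:
        counts[d][y] += 1
    return counts


def missing_classes(counts, all_classes):
    """Return, for each domain, the sorted classes with no training samples."""
    return {d: sorted(all_classes - set(c)) for d, c in counts.items()}


counts = per_domain_class_counts(samples)
missing = missing_classes(counts, ALL_CLASSES)
print(dict(counts))  # per-domain class histograms
print(missing)       # classes absent from each domain
```

In this toy set, domain "A" is dominated by class 0 and never sees class 2, while domain "B" never sees class 0, so both the within-domain imbalance and the cross-domain label mismatch are visible in the counts.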

Fig. 1.

Example of multi-domain long-tailed dataset (Digits-MLT).

Table 1.

The list of symbols.

Symbol Description Symbol Description
Inline graphic The input spaces of the samples Inline graphic The raw input of the i-th sample
Inline graphic The class label spaces of the samples Inline graphic The class label of the i-th sample
Inline graphic The domain space of the dataset Inline graphic The domain label of the i-th sample
Inline graphic The class space of the dataset M The number of classes in the dataset
Inline graphic The mapped feature representation space of the samples Inline graphic The mapped feature representation of the i-th sample
K The number of domains in the dataset Inline graphic The kernel function
Inline graphic The d-th domain Inline graphic The class distribution of the d-th domain
Inline graphic The class label set of the d-th domain Inline graphic The domain distribution of the samples
W The sample weight space Inline graphic The sample weight of target domain sample Inline graphic
Inline graphic The loss function Inline graphic The weight mapping function
Inline graphic The feature mapping function (feature extractor) Inline graphic The classifier
Inline graphic The feature representation distributions Inline graphic,Inline graphic,Inline graphic The parameter vectors of Inline graphic, Inline graphic and Inline graphic
n The number of samples Inline graphic The domain weight of the i-th source domain
m The number of raw feature dimensions of the sample s The number of mapped feature dimensions of the sample

Definition 1

(Cross-Domain Long-Tailed Classification): Given labeled samples Inline graphic (for Inline graphic) from multiple source domains and unlabeled samples Inline graphic from the test domain Inline graphic, where Inline graphic and Inline graphic represent the number of samples in the d-th source domain Inline graphic and the test domain Inline graphic respectively, the purpose of cross-domain LT (CDLT) classification is to learn a representation mapping Inline graphic and a hypothesis function Inline graphic that together minimize the expected classification error over the instance Inline graphic in the test domain Inline graphic, where Inline graphic denotes the representation space:

[Equation (1)]

where Inline graphic is the expected value, Inline graphic is the loss function, and Inline graphic and Inline graphic are the parameter vectors of the representation functions Inline graphic and the classifier h, respectively.
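As an illustration of Definition 1, the expected classification error can be approximated by an average loss over samples. The sketch below is a minimal example under stated assumptions: the linear feature map `phi`, the linear classifier `h`, the cross-entropy loss, and all parameter names are hypothetical stand-ins, since the source does not fix these choices at this point.

```python
import math


def phi(x, theta_phi):
    """Hypothetical linear representation map: one output feature per row of theta_phi."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in theta_phi]


def h(z, theta_h):
    """Hypothetical linear classifier: one logit per row of theta_h."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in theta_h]


def cross_entropy(logits, y):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[y]


def empirical_risk(data, theta_phi, theta_h):
    """Average loss of h(phi(x)) over (x, y) pairs, a sample estimate of the expected error."""
    losses = [cross_entropy(h(phi(x, theta_phi), theta_h), y) for x, y in data]
    return sum(losses) / len(losses)


# Toy check: with all-zero parameters every class is equally likely,
# so the risk equals log(number of classes).
toy_data = [([1.0, 2.0], 0), ([0.5, -1.0], 2)]
zero_phi = [[0.0, 0.0], [0.0, 0.0]]
zero_h = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
risk = empirical_risk(toy_data, zero_phi, zero_h)
```

Training then amounts to searching for the two parameter sets that make this quantity small on the test-domain distribution, which is the role of the reweighting terms introduced in the methodology.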

Methodology–representation learning with doubly balancing

In MDLT learning, addressing data imbalance is of primary importance. This imbalance arises not only from class label imbalance within individual domains but also from distributional differences among domains. To solve this problem, this paper introduces a CDLT classification algorithm that combines CB and RB techniques to produce data representations that are both class-unbiased and domain-balanced. CB is extensively employed in observational studies to construct balanced datasets by reweighting samples36. In our approach, CB is applied to mitigate covariate bias in the input feature space. However, since the aim is to learn domain-invariant representations, it is also necessary to address bias in the latent representation space. This is achieved via RB, which explicitly targets representation-level imbalance. Because of the noise introduced during the transformation of input features into latent representations, traditional CB methods alone may be insufficient to correct representation bias10. To overcome this limitation, our method jointly optimizes CB and RB through a reweighting strategy that simultaneously addresses covariate and representation biases.

Covariate balancing

According to the analysis by Yang et al.4 on the effect of divergent label distributions on transferable features, if the label distributions across domains are consistent, the model can effectively align similar classes from different domains. Based on this insight, the primary objective of our study is to ensure consistency in label distribution across domains. We interpret this form of domain shift as a covariate bias and apply a CB algorithm to align the label distributions between domains.

CB is often used in observational studies to reduce covariate bias36. The core idea is to reweight samples in one group using a set of weights Inline graphic to align its distribution with that of another group37. This technique learns a mapping function that projects target domain samples into a weight space, allowing for reweighting based on the learned weights. As a result, the data distributions of the target and source domains become more closely aligned. Let the mapping function be denoted as Inline graphic, parameterized by Inline graphic where Inline graphic. For a given target domain sample Inline graphic, its corresponding weight is computed as Inline graphic. These weights are then used to reweight the target domain samples, improving distributional consistency with the source domain. In our proposed algorithm, the objective function for the CB module is formulated as follows:

[Equation (2)]

with Inline graphic, where Inline graphic is the number of samples in the i-th domain, Inline graphic is a vector of dimension-wise means computed over the i-th source domain, and m is the dimensionality of the data. The condition Inline graphic is imposed to normalize the weights of the target domain samples such that they sum to one. Additionally, enforcing Inline graphic ensures that all sample weights are non-negative.
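The idea of covariate balancing by reweighting can be sketched as follows. This is a generic mean-matching example, not the paper's Eq. (2), whose exact form is not reproduced here: it looks for weights on one group of samples so that their weighted dimension-wise mean matches a reference mean from another group. The softmax parameterization is an assumption made here so that the weights are non-negative and sum to one by construction; the function names, step size, and toy data are likewise hypothetical.

```python
import math


def softmax(a):
    """Map unconstrained scores to non-negative weights that sum to one."""
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]


def weighted_mean(points, w):
    """Dimension-wise weighted mean of a list of feature vectors."""
    dim = len(points[0])
    return [sum(wj * p[k] for wj, p in zip(w, points)) for k in range(dim)]


def imbalance(points, w, ref_mean):
    """Squared distance between the weighted mean and the reference mean."""
    mu = weighted_mean(points, w)
    return sum((u - v) ** 2 for u, v in zip(mu, ref_mean))


def balance_weights(points, ref_mean, steps=2000, lr=0.5):
    """Gradient descent on the imbalance with respect to the softmax scores."""
    a = [0.0] * len(points)
    for _ in range(steps):
        w = softmax(a)
        mu = weighted_mean(points, w)
        r = [u - v for u, v in zip(mu, ref_mean)]
        # Gradient of the imbalance with respect to each weight.
        g = [2.0 * sum(rk * pk for rk, pk in zip(r, p)) for p in points]
        g_bar = sum(wj * gj for wj, gj in zip(w, g))
        # Chain rule through the softmax.
        a = [aj - lr * wj * (gj - g_bar) for aj, wj, gj in zip(a, w, g)]
    return softmax(a)


# Toy example: reweight four points so their mean moves from (1, 1) to (0.5, 0.5).
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
ref_mean = [0.5, 0.5]
uniform = [0.25] * 4
weights = balance_weights(points, ref_mean)
```

After optimization the point nearest the reference mean receives the largest weight and the farthest point the smallest, which is the reweighting behavior the constraints above are meant to support.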

Solving Eq. (2) under these constraints generates a more balanced set of samples across domains, under the following assumption:

Assumption 1

Given a sample Inline graphic drawn from either a source or target domain, the following conditions are assumed to hold:

[Equation (3)]

Inline graphic, Inline graphic, where Inline graphic is the i-th source domain, Inline graphic is the target domain, Inline graphic, with Inline graphic where Inline graphic represents the class label set for the i-th domain, Inline graphic is the domain label, and Inline graphic is the class label of the sample. This assumption means that there exists a non-zero number of samples from different classes and domains in both the source and target domains. As such, it may limit the applicability of the method to open-set classification tasks, where unseen classes may appear in the target domain.

Representation balancing

Similar to CB, the basic idea of RB is to adjust the distribution of target domain representations by applying sample weights Inline graphic, such that the reweighted representation distribution of the target domain more closely aligns with that of the source domain.

Let Inline graphic and Inline graphic denote the distributions of the learned representations in the source and target domains, respectively, where the representations are induced by the mapping function Inline graphic on Inline graphic. To quantify the distance between Inline graphic and Inline graphic, we employ an integral probability metric (IPM)38. Based on the approach of Cheng et al.10, we adopt MMD with a multi-scale Gaussian kernel, owing to its informative nature and computational simplicity in measuring the distance between Inline graphic and Inline graphic. It is computed as follows:

[Equation (4)]

where Inline graphic is a multi-scale Gaussian kernel defined as follows:

[Equation (5)]

We then formulate the RB objective as follows:

[Equation (6)]
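The representation-balancing idea in this subsection can be illustrated with a weighted MMD estimate. The sketch below is an assumption-laden example rather than the paper's Eqs. (4) to (6): it uses the standard biased estimator of squared MMD, a sum of Gaussian kernels over a hypothetical set of bandwidths, uniform weights on the source representations, and arbitrary weights on the target representations.

```python
import math


def multi_scale_gaussian(u, v, bandwidths=(0.5, 1.0, 2.0)):
    """Sum of Gaussian kernels over several bandwidths (the bandwidth set is an assumption)."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return sum(math.exp(-sq / (2.0 * s * s)) for s in bandwidths)


def weighted_mmd2(src, tgt, w_tgt, kernel=multi_scale_gaussian):
    """Biased estimate of squared MMD between uniform-weighted src and w_tgt-weighted tgt."""
    w_src = [1.0 / len(src)] * len(src)

    def term(xs, ws, ys, vs):
        return sum(wi * vj * kernel(x, y)
                   for x, wi in zip(xs, ws)
                   for y, vj in zip(ys, vs))

    return (term(src, w_src, src, w_src)
            - 2.0 * term(src, w_src, tgt, w_tgt)
            + term(tgt, w_tgt, tgt, w_tgt))


# Toy 1-D representations: the source has two points at 0 and one at 1.
src = [[0.0], [0.0], [1.0]]
tgt = [[0.0], [1.0]]
mmd_uniform = weighted_mmd2(src, tgt, [0.5, 0.5])
mmd_reweighted = weighted_mmd2(src, tgt, [2.0 / 3.0, 1.0 / 3.0])
```

With uniform target weights the two empirical distributions differ and the estimate is positive; with weights 2/3 and 1/3 the reweighted target matches the source exactly and the estimate drops to zero up to rounding, which is the effect the sample weights are meant to achieve in the latent space.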

Balanced domain-invariant representation

According to Definition 1, we define the model as a mapping function Inline graphic, which maps raw inputs to predicted labels. This model is decomposed into a feature extractor Inline graphic and a classifier Inline graphic. The final prediction is given by Inline graphic. Based on this architecture, the MDLT classifier is trained using the following objective function:

[Equation (7)]

where Inline graphic denotes the sample weight for the i-th domain, calculated similarly to Eq. (2), and Inline graphic is the loss function. The weights Inline graphic compensate for domain imbalance in the source data. A balanced cross-entropy loss is used, in which the prediction scores are adjusted by the logarithm of the number of samples in each class before calculating the cross-entropy.
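The balanced cross-entropy described above, in which prediction scores are shifted by the logarithm of the per-class sample counts before the cross-entropy is computed, can be sketched for a single sample as follows. The function name and the toy counts are hypothetical, and the sketch assumes every class has at least one training sample, since the logarithm of zero is undefined; how zero-count classes are handled is not specified here.

```python
import math


def balanced_softmax_loss(logits, label, class_counts):
    """Cross-entropy on logits shifted by log(class count); counts must be positive."""
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]


# Toy example: a head class with 90 samples and a tail class with 10.
counts = [90, 10]
loss_head = balanced_softmax_loss([0.0, 0.0], 0, counts)
loss_tail = balanced_softmax_loss([0.0, 0.0], 1, counts)
```

With uninformative logits the adjusted prediction follows the class prior (0.9 and 0.1), so the tail-class sample incurs the larger loss and the network is pushed to raise tail-class scores above what the prior alone would give.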

To further address domain shift among multiple source domains and to improve the generalizability of the classifier, a penalty term is incorporated. This term captures the feature distribution differences between pairs of source domains and is computed as follows:

[Equation (8)]

where s denotes the dimensionality of the feature space after mapping, and Inline graphic (Inline graphic) represents the mapped feature representation of the original sample Inline graphic, as defined in Definition 1. Inline graphic and Inline graphic are the mean values of the p-th feature dimension in the i-th and j-th source domains, respectively. Inline graphic and Inline graphic are the covariances between the p-th and q-th dimensions in the i-th and j-th domains, respectively.
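A penalty built from per-dimension means and pairwise covariances, as described above, can be sketched as below. The exact scaling and normalization of Eq. (8) are not reproduced here, so this example makes its own assumptions: it uses the unbiased covariance estimate (which needs at least two samples per domain), sums squared differences of means and of covariance entries without any scaling, and uses hypothetical function names.

```python
from itertools import combinations


def column_means(feats):
    """Mean of each mapped feature dimension over one domain."""
    n, s = len(feats), len(feats[0])
    return [sum(row[p] for row in feats) / n for p in range(s)]


def covariance(feats):
    """Unbiased s-by-s covariance matrix of the mapped features of one domain."""
    n, s = len(feats), len(feats[0])
    mu = column_means(feats)
    return [[sum((r[p] - mu[p]) * (r[q] - mu[q]) for r in feats) / (n - 1)
             for q in range(s)] for p in range(s)]


def pairwise_penalty(domains):
    """Sum over source-domain pairs of squared mean and covariance differences."""
    total = 0.0
    for fi, fj in combinations(domains, 2):
        mi, mj = column_means(fi), column_means(fj)
        ci, cj = covariance(fi), covariance(fj)
        s = len(mi)
        total += sum((a - b) ** 2 for a, b in zip(mi, mj))
        total += sum((ci[p][q] - cj[p][q]) ** 2
                     for p in range(s) for q in range(s))
    return total


# Toy example: domain_b is domain_a shifted by one unit along the first dimension.
domain_a = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]
domain_b = [[1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
penalty = pairwise_penalty([domain_a, domain_b])
```

Because a pure shift changes the means but not the covariances, only the mean term contributes in this toy case; a difference in feature spread or correlation would show up in the covariance term instead.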

Incorporating Eqs. (2), (6), and (8) into the classifier objective in Eq. (7) yields the overall objective function for CDLT classification:

[Equation (9)]

where Inline graphic, Inline graphic, and Inline graphic are parameters that jointly control the trade-off between prediction accuracy and the imbalance errors associated with CB and RB. By optimizing the sample weights Inline graphic through the joint application of CB and RB, the proposed method effectively mitigates both covariate shift and representation bias. The output prediction for a test domain sample is computed as Inline graphic, Inline graphic.

The model is trained by minimizing Eq. (9) using the Adam optimizer. The training procedure is outlined in Algorithm 1, which presents the pseudocode for the proposed model. Additionally, Fig. 2 visualizes the overall workflow of the method.

Algorithm 1.

Representation learning with doubly balancing (BRL)

Fig. 2.

Workflow diagram of the proposed method.

Experiments

Datasets descriptions

Yang et al.4 proposed six benchmark datasets for MDLT learning: VLCS-MLT, PACS-MLT, OfficeHome-MLT, TerraInc-MLT, DomainNet-MLT, and Digits-MLT. Among these, Digits-MLT is a synthetic dataset created by combining two digit datasets, while the remaining five are real-world multi-domain datasets widely used in domain generalization research. Following their setup, we use these datasets to assess the performance of our proposed method. Table 2 provides detailed information on the datasets used in the experiments, and Fig. 3 displays the class distributions. In the figure, class labels are sorted in descending order based on the number of samples per class to clearly demonstrate the LT characteristics of each dataset. It should be noted that the class and domain labels in the experimental data have been reindexed based on ascending alphabetical order, and the indices correspond to the original labels in the respective datasets.

Table 2.

Dataset details.

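| Dataset (size) | #Class | Domain | Train min class | Train max class | Train total | Validation min class | Validation max class | Validation total | Test min class | Test max class | Test total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Digits-MLT (3, 28, 28) | 10 | MNIST-M | 10 | 1000 | 2478 | 800 | 800 | 8000 | 800 | 800 | 8000 |
| Digits-MLT (3, 28, 28) | 10 | SVHN | 10 | 1000 | 2478 | 800 | 800 | 8000 | 800 | 800 | 8000 |
| VLCS-MLT (3, 224, 224) | 5 | Caltech101 | 22 | 825 | 1190 | 15 | 15 | 75 | 30 | 30 | 150 |
| VLCS-MLT (3, 224, 224) | 5 | LabelMe | 0 | 1192 | 2434 | 14 | 15 | 74 | 28 | 30 | 148 |
| VLCS-MLT (3, 224, 224) | 5 | SUN09 | 0 | 1219 | 3097 | 6 | 15 | 61 | 14 | 30 | 124 |
| VLCS-MLT (3, 224, 224) | 5 | VOC2007 | 285 | 1454 | 3151 | 15 | 15 | 75 | 30 | 30 | 150 |
| PACS-MLT (3, 224, 224) | 7 | Art painting | 109 | 374 | 1523 | 25 | 25 | 275 | 50 | 50 | 350 |
| PACS-MLT (3, 224, 224) | 7 | Cartoon | 60 | 382 | 1819 | 25 | 25 | 175 | 50 | 50 | 350 |
| PACS-MLT (3, 224, 224) | 7 | Photo | 107 | 357 | 1145 | 25 | 25 | 175 | 50 | 50 | 350 |
| PACS-MLT (3, 224, 224) | 7 | Sketch | 5 | 741 | 3404 | 25 | 25 | 175 | 50 | 50 | 350 |
| TerraInc-MLT (3, 224, 224) | 10 | Location-100 | 0 | 2444 | 4459 | 4 | 10 | 94 | 8 | 20 | 188 |
| TerraInc-MLT (3, 224, 224) | 10 | Location-38 | 0 | 4455 | 9481 | 1 | 10 | 85 | 2 | 20 | 170 |
| TerraInc-MLT (3, 224, 224) | 10 | Location-43 | 0 | 1063 | 3717 | 1 | 10 | 84 | 2 | 20 | 169 |
| TerraInc-MLT (3, 224, 224) | 10 | Location-46 | 0 | 1370 | 5612 | 4 | 10 | 90 | 8 | 20 | 181 |
| OfficeHome-MLT (3, 224, 224) | 65 | Art | 0 | 84 | 1452 | 5 | 5 | 325 | 10 | 10 | 650 |
| OfficeHome-MLT (3, 224, 224) | 65 | Clipart | 24 | 84 | 3390 | 5 | 5 | 325 | 10 | 10 | 650 |
| OfficeHome-MLT (3, 224, 224) | 65 | Product | 23 | 84 | 3464 | 5 | 5 | 325 | 10 | 10 | 650 |
| OfficeHome-MLT (3, 224, 224) | 65 | Real World | 8 | 84 | 3382 | 5 | 5 | 325 | 10 | 10 | 650 |
| DomainNet-MLT (3, 224, 224) | 345 | Clipart | 0 | 407 | 28603 | 4 | 20 | 6490 | 8 | 40 | 13036 |
| DomainNet-MLT (3, 224, 224) | 345 | Infographic | 0 | 710 | 33067 | 3 | 20 | 6140 | 8 | 40 | 12398 |
| DomainNet-MLT (3, 224, 224) | 345 | Painting | 0 | 778 | 53596 | 0 | 20 | 6198 | 2 | 40 | 12472 |
| DomainNet-MLT (3, 224, 224) | 345 | Quickdraw | 440 | 440 | 151800 | 20 | 20 | 6900 | 40 | 40 | 13800 |
| DomainNet-MLT (3, 224, 224) | 345 | Real | 0 | 755 | 152311 | 10 | 20 | 6878 | 21 | 40 | 13758 |
| DomainNet-MLT (3, 224, 224) | 345 | Sketch | 0 | 654 | 49197 | 4 | 20 | 6634 | 9 | 40 | 13297 |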

Fig. 3.

Distribution of training, validation, and test data for each MDLT dataset.

Experimental settings

Setup for model training

All models were implemented using PyTorch and trained on an NVIDIA RTX A4000 GPU. Following the setup of Yang et al.4, we used the same CNN architecture for the Digits-MLT dataset and used ResNet-50 as the backbone network for the remaining five datasets. To ensure a fair comparison, we evaluated the following groups of learning solutions: (1) domain-invariant feature learning solutions, including IRM7, DANN13, CDANN39, CORAL35, MMD40; (2) imbalanced learning solutions, such as Focal14, CBLoss41, LDAM42, BSoftmax15, CRT17, BoDA4; and (3) other baselines, including ERM43, GroupDRO44, Mixup45, SagNet46, MLDG47, MTL48, Fish49. In addition, we evaluated the two-stage training procedure of our model by following the protocol outlined in the work of Yang et al.4. The implementation of our algorithm is based on the codebase provided by Yang et al.4, and their optimal parameter settings were used for all comparison algorithms.

Evaluation setup

Two widely used evaluation metrics for imbalanced learning, top-1 overall accuracy and the F1 score across all classes, were adopted. In addition, we computed accuracy for four disjoint class subsets: many-shot classes (more than 100 samples), medium-shot classes (20 to 100 samples), few-shot classes (fewer than 20 samples), and zero-shot classes (no training samples). Unlike Yang et al., who selected the best-performing model based on accuracy, we selected the best model during training according to the F1 score for all algorithms. This is because the F1 score provides a more comprehensive evaluation, particularly in the context of imbalanced datasets.
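The evaluation protocol above can be sketched in a few lines. The example assumes that the F1 score across all classes is the unweighted (macro) average of per-class F1 scores, that the shot thresholds refer to per-class training-sample counts, and that a class with no true or predicted samples contributes an F1 of zero; these are assumptions of the sketch, and the function names and toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def shot_bucket(train_count):
    """Assign a class to a subset by its number of training samples."""
    if train_count == 0:
        return "zero"
    if train_count < 20:
        return "few"
    if train_count <= 100:
        return "medium"
    return "many"


def accuracy_by_shot(y_true, y_pred, train_counts):
    """Top-1 accuracy restricted to each shot subset that occurs in y_true."""
    hits, totals = {}, {}
    for t, p in zip(y_true, y_pred):
        b = shot_bucket(train_counts.get(t, 0))
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + (1 if t == p else 0)
    return {b: hits[b] / totals[b] for b in totals}


# Toy example with one class in each subset.
train_counts = {0: 500, 1: 50, 2: 5, 3: 0}
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 1, 0, 2, 0]
f1 = macro_f1(y_true, y_pred, classes=[0, 1, 2, 3])
by_shot = accuracy_by_shot(y_true, y_pred, train_counts)
```

In the toy run the overall accuracy is 4/6, yet the macro F1 is lower because the unseen class is never predicted correctly, which illustrates why model selection by F1 is more sensitive to tail and zero-shot classes than selection by accuracy.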

Results and analysis

Quantitative results

For the MDLT image classification task, we conducted experiments on all selected benchmark datasets, with results presented in Tables 3–9. We did not perform dataset-specific tuning of the parameters Inline graphic, Inline graphic, and Inline graphic. Instead, we adopted a single fixed set of values (Inline graphic, Inline graphic, and 10, respectively), chosen to give good average performance across datasets, as supported by the experiments discussed in the Parameter Analysis section. As a result, our method may not yield optimal performance on certain datasets (e.g., VLCS-MLT), as shown in Tables 3–8. Subsequent analysis nevertheless shows that these parameters significantly affect performance depending on the dataset. For example, on Digits-MLT, when Inline graphic is held constant at 10 and Inline graphic, Inline graphic, the average F1 score and accuracy improve to 0.686 and 0.688, respectively, while the lowest values also remain relatively high at 0.644 and 0.646.

Table 3.

Results on Digits-MLT.

| Algorithm | F1-Score by domain: Average | F1-Score by domain: Worst | Accuracy by domain: Average | Accuracy by domain: Worst | Accuracy by shot: Many | Accuracy by shot: Medium | Accuracy by shot: Few | Accuracy by shot: Zero |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IRM | 0.079 | 0.031 | 0.147 | 0.101 | 0.278 | 0.003 | 0.036 | −1 |
| DANN | 0.646 | 0.612 | 0.649 | 0.614 | 0.761 | 0.557 | 0.508 | −1 |
| CDANN | 0.584 | 0.511 | 0.596 | 0.526 | 0.784 | 0.476 | 0.307 | −1 |
| CORAL | 0.605 | 0.538 | 0.622 | 0.562 | 0.815 | 0.517 | 0.298 | −1 |
| MMD | 0.029 | 0.028 | 0.094 | 0.088 | 0.080 | 0.000 | 0.269 | −1 |
| Focal | 0.506 | 0.423 | 0.537 | 0.466 | 0.774 | 0.362 | 0.207 | −1 |
| CBLoss | 0.488 | 0.359 | 0.515 | 0.389 | 0.715 | 0.422 | 0.151 | −1 |
| LDAM | 0.546 | 0.400 | 0.577 | 0.446 | 0.792 | 0.472 | 0.197 | −1 |
| Bsoftmax | 0.595 | 0.485 | 0.604 | 0.501 | 0.718 | 0.577 | 0.359 | −1 |
| CRT (2-stage training) | 0.531 | 0.451 | 0.562 | 0.490 | 0.787 | 0.448 | 0.171 | −1 |
| BoDA | 0.556 | 0.458 | 0.583 | 0.495 | 0.814 | 0.439 | 0.222 | −1 |
| BoDA (2-stage training) | 0.601 | 0.539 | 0.617 | 0.554 | 0.810 | 0.504 | 0.303 | −1 |
| ERM | 0.479 | 0.372 | 0.524 | 0.433 | 0.787 | 0.360 | 0.111 | −1 |
| GroupDRO | 0.474 | 0.444 | 0.522 | 0.495 | 0.806 | 0.342 | 0.084 | −1 |
| Mixup | 0.571 | 0.454 | 0.593 | 0.487 | 0.804 | 0.476 | 0.240 | −1 |
| SagNet | 0.565 | 0.521 | 0.594 | 0.552 | 0.837 | 0.472 | 0.172 | −1 |
| MLDG | 0.485 | 0.385 | 0.530 | 0.438 | 0.789 | 0.392 | 0.088 | −1 |
| MTL | 0.479 | 0.362 | 0.522 | 0.415 | 0.782 | 0.368 | 0.102 | −1 |
| Fish | 0.449 | 0.368 | 0.498 | 0.421 | 0.771 | 0.324 | 0.075 | −1 |
| BRL (ours) | 0.672 | 0.630 | 0.675 | 0.635 | 0.781 | 0.608 | 0.511 | −1 |
| BRL (2-stage training) | 0.674 | 0.633 | 0.679 | 0.639 | 0.791 | 0.620 | 0.488 | −1 |

Table 4.

Results on VLCS-MLT.

| Algorithm | F1-Score by domain: Average | F1-Score by domain: Worst | Accuracy by domain: Average | Accuracy by domain: Worst | Accuracy by shot: Many | Accuracy by shot: Medium | Accuracy by shot: Few | Accuracy by shot: Zero |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IRM | 0.758 | 0.512 | 0.765 | 0.527 | 0.842 | 0.760 | −1 | 0.395 |
| DANN | 0.710 | 0.413 | 0.726 | 0.453 | 0.814 | 0.713 | −1 | 0.307 |
| CDANN | 0.726 | 0.477 | 0.743 | 0.493 | 0.828 | 0.740 | −1 | 0.302 |
| CORAL | 0.779 | 0.559 | 0.788 | 0.568 | 0.867 | 0.787 | −1 | 0.390 |
| MMD | 0.778 | 0.554 | 0.788 | 0.554 | 0.861 | 0.813 | −1 | 0.369 |
| Focal | 0.785 | 0.589 | 0.794 | 0.601 | 0.867 | 0.813 | −1 | 0.390 |
| CBLoss | 0.785 | 0.557 | 0.794 | 0.568 | 0.864 | 0.807 | −1 | 0.414 |
| LDAM | 0.774 | 0.558 | 0.781 | 0.554 | 0.861 | 0.787 | −1 | 0.362 |
| Bsoftmax | 0.818 | 0.647 | 0.824 | 0.655 | 0.867 | 0.873 | −1 | 0.524 |
| CRT (2-stage training) | 0.793 | 0.567 | 0.803 | 0.574 | 0.881 | 0.800 | −1 | 0.407 |
| BoDA | 0.779 | 0.578 | 0.789 | 0.595 | 0.869 | 0.800 | −1 | 0.362 |
| BoDA (2-stage training) | 0.804 | 0.629 | 0.812 | 0.635 | 0.881 | 0.833 | −1 | 0.426 |
| ERM | 0.775 | 0.530 | 0.785 | 0.547 | 0.869 | 0.760 | −1 | 0.407 |
| GroupDRO | 0.781 | 0.549 | 0.790 | 0.568 | 0.867 | 0.787 | −1 | 0.419 |
| Mixup | 0.769 | 0.522 | 0.778 | 0.520 | 0.861 | 0.780 | −1 | 0.350 |
| SagNet | 0.766 | 0.542 | 0.771 | 0.547 | 0.861 | 0.747 | −1 | 0.374 |
| MLDG | 0.777 | 0.524 | 0.782 | 0.520 | 0.864 | 0.753 | −1 | 0.419 |
| MTL | 0.776 | 0.543 | 0.782 | 0.554 | 0.861 | 0.787 | −1 | 0.383 |
| Fish | 0.774 | 0.549 | 0.782 | 0.554 | 0.872 | 0.767 | −1 | 0.362 |
| BRL (ours) | 0.813 | 0.641 | 0.809 | 0.635 | 0.847 | 0.840 | −1 | 0.581 |
| BRL (2-stage training) | 0.807 | 0.619 | 0.803 | 0.615 | 0.850 | 0.820 | −1 | 0.564 |

Table 5.

Results on PACS-MLT.

| Algorithm | F1-Score by domain: Average | F1-Score by domain: Worst | Accuracy by domain: Average | Accuracy by domain: Worst | Accuracy by shot: Many | Accuracy by shot: Medium | Accuracy by shot: Few | Accuracy by shot: Zero |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IRM | 0.964 | 0.949 | 0.964 | 0.949 | 0.961 | 0.980 | 1.000 | −1 |
| DANN | 0.930 | 0.902 | 0.928 | 0.897 | 0.922 | 0.980 | 0.960 | −1 |
| CDANN | 0.927 | 0.908 | 0.926 | 0.909 | 0.924 | 0.950 | 0.920 | −1 |
| CORAL | 0.984 | 0.977 | 0.984 | 0.977 | 0.982 | 1.000 | 1.000 | −1 |
| MMD | 0.975 | 0.960 | 0.975 | 0.960 | 0.974 | 0.990 | 0.960 | −1 |
| Focal | 0.981 | 0.963 | 0.981 | 0.963 | 0.981 | 0.980 | 0.980 | −1 |
| CBLoss | 0.979 | 0.971 | 0.979 | 0.971 | 0.978 | 1.000 | 0.980 | −1 |
| LDAM | 0.979 | 0.971 | 0.979 | 0.971 | 0.978 | 1.000 | 0.940 | −1 |
| Bsoftmax | 0.980 | 0.966 | 0.979 | 0.966 | 0.978 | 1.000 | 0.980 | −1 |
| CRT (2-stage training) | 0.984 | 0.972 | 0.984 | 0.971 | 0.984 | 0.980 | 0.980 | −1 |
| BoDA | 0.979 | 0.969 | 0.979 | 0.969 | 0.978 | 0.990 | 1.000 | −1 |
| BoDA (2-stage training) | 0.982 | 0.974 | 0.982 | 0.974 | 0.980 | 1.000 | 1.000 | −1 |
| ERM | 0.981 | 0.966 | 0.981 | 0.966 | 0.982 | 0.980 | 0.960 | −1 |
| GroupDRO | 0.982 | 0.972 | 0.981 | 0.971 | 0.981 | 0.980 | 1.000 | −1 |
| Mixup | 0.981 | 0.963 | 0.981 | 0.963 | 0.980 | 0.990 | 0.980 | −1 |
| SagNet | 0.976 | 0.963 | 0.976 | 0.963 | 0.974 | 1.000 | 1.000 | −1 |
| MLDG | 0.982 | 0.977 | 0.982 | 0.977 | 0.980 | 1.000 | 1.000 | −1 |
| MTL | 0.980 | 0.963 | 0.980 | 0.963 | 0.979 | 0.990 | 0.980 | −1 |
| Fish | 0.979 | 0.969 | 0.979 | 0.969 | 0.980 | 0.980 | 0.960 | −1 |
| BRL (ours) | 0.979 | 0.968 | 0.979 | 0.969 | 0.977 | 1.000 | 1.000 | −1 |
| BRL (2-stage training) | 0.982 | 0.977 | 0.982 | 0.977 | 0.980 | 1.000 | 1.000 | −1 |

Table 6.

Results on TerraInc-MLT.

| Algorithm | F1-Score by domain: Average | F1-Score by domain: Worst | Accuracy by domain: Average | Accuracy by domain: Worst | Accuracy by shot: Many | Accuracy by shot: Medium | Accuracy by shot: Few | Accuracy by shot: Zero |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IRM | 0.486 | 0.421 | 0.586 | 0.530 | 0.752 | 0.237 | 0.183 | 0.073 |
| DANN | 0.434 | 0.379 | 0.527 | 0.464 | 0.646 | 0.275 | 0.133 | 0.227 |
| CDANN | 0.367 | 0.326 | 0.472 | 0.414 | 0.616 | 0.037 | 0.217 | 0.086 |
| CORAL | 0.682 | 0.587 | 0.776 | 0.685 | 0.872 | 0.738 | 0.717 | 0.136 |
| MMD | 0.698 | 0.562 | 0.792 | 0.696 | 0.878 | 0.875 | 0.750 | 0.156 |
| Focal | 0.672 | 0.587 | 0.778 | 0.707 | 0.870 | 0.863 | 0.667 | 0.073 |
| CBLoss | 0.696 | 0.559 | 0.776 | 0.657 | 0.852 | 0.812 | 0.750 | 0.254 |
| LDAM | 0.690 | 0.583 | 0.786 | 0.680 | 0.880 | 0.800 | 0.700 | 0.131 |
| Bsoftmax | 0.761 | 0.687 | 0.823 | 0.762 | 0.862 | 0.913 | 0.933 | 0.343 |
| CRT (2-stage training) | 0.737 | 0.622 | 0.841 | 0.724 | 0.910 | 0.950 | 0.900 | 0.122 |
| BoDA | 0.695 | 0.597 | 0.792 | 0.696 | 0.888 | 0.762 | 0.767 | 0.104 |
| BoDA (2-stage training) | 0.729 | 0.626 | 0.824 | 0.735 | 0.892 | 0.887 | 0.900 | 0.151 |
| ERM | 0.675 | 0.593 | 0.778 | 0.696 | 0.884 | 0.800 | 0.600 | 0.098 |
| GroupDRO | 0.602 | 0.551 | 0.705 | 0.596 | 0.820 | 0.700 | 0.367 | 0.104 |
| Mixup | 0.656 | 0.561 | 0.737 | 0.669 | 0.860 | 0.650 | 0.500 | 0.167 |
| SagNet | 0.682 | 0.586 | 0.780 | 0.707 | 0.888 | 0.775 | 0.533 | 0.135 |
| MLDG | 0.687 | 0.567 | 0.790 | 0.680 | 0.886 | 0.800 | 0.733 | 0.089 |
| MTL | 0.674 | 0.583 | 0.780 | 0.696 | 0.880 | 0.775 | 0.700 | 0.080 |
| Fish | 0.691 | 0.583 | 0.802 | 0.696 | 0.888 | 0.863 | 0.800 | 0.064 |
| BRL (ours) | 0.780 | 0.712 | 0.824 | 0.768 | 0.872 | 0.862 | 0.817 | 0.430 |
| BRL (2-stage training) | 0.802 | 0.713 | 0.849 | 0.773 | 0.892 | 0.913 | 0.900 | 0.423 |

Table 7.

Results on OfficeHome-MLT.

Algorithm | F1-Score by domain: Average, Worst | Accuracy by domain: Average, Worst | Accuracy by shot: Many, Medium, Few, Zero
IRM 0.755 0.686 0.757 0.685 0.826 0.764 0.568 0.550
DANN 0.795 0.710 0.798 0.717 0.856 0.805 0.642 0.550
CDANN 0.785 0.707 0.787 0.711 0.836 0.792 0.655 0.650
CORAL 0.838 0.780 0.840 0.783 0.885 0.853 0.684 0.600
MMD 0.008 0.003 0.010 0.005 0.008 0.012 0.003 0.100
Focal 0.816 0.742 0.818 0.743 0.868 0.828 0.668 0.500
CBLoss 0.825 0.746 0.826 0.749 0.874 0.845 0.632 0.600
LDAM 0.817 0.718 0.817 0.720 0.873 0.827 0.658 0.550
Bsoftmax 0.828 0.759 0.828 0.762 0.875 0.837 0.687 0.650
CRT (2-stage training) 0.836 0.770 0.838 0.775 0.889 0.849 0.677 0.600
BoDA 0.839 0.766 0.842 0.772 0.899 0.851 0.681 0.600
BoDA (2-stage training) 0.845 0.775 0.847 0.778 0.899 0.858 0.687 0.600
ERM 0.818 0.757 0.821 0.765 0.878 0.832 0.642 0.700
GroupDRO 0.818 0.730 0.821 0.734 0.882 0.830 0.645 0.600
Mixup 0.846 0.779 0.848 0.785 0.893 0.860 0.694 0.650
SagNet 0.826 0.744 0.831 0.757 0.896 0.841 0.642 0.600
MLDG 0.824 0.748 0.826 0.752 0.892 0.830 0.665 0.600
MTL 0.820 0.746 0.820 0.743 0.873 0.829 0.658 0.650
Fish 0.822 0.736 0.825 0.745 0.893 0.836 0.616 0.700
BRL (ours) 0.832 0.776 0.833 0.775 0.879 0.837 0.716 0.600
BRL (2-stage training) 0.837 0.784 0.837 0.783 0.881 0.842 0.723 0.600
Table 8.

Results on DomainNet-MLT.

Algorithm | F1-Score by domain: Average, Worst | Accuracy by domain: Average, Worst | Accuracy by shot: Many, Medium, Few, Zero
IRM 0.136 0.032 0.169 0.057 0.205 0.128 0.061 0.063
DANN 0.538 0.311 0.551 0.305 0.601 0.540 0.389 0.318
CDANN 0.554 0.327 0.565 0.318 0.623 0.543 0.394 0.325
CORAL 0.597 0.340 0.606 0.326 0.672 0.587 0.407 0.320
MMD 0.002 0.001 0.003 0.002 0.003 0.003 0.002 0.003
Focal 0.578 0.302 0.587 0.293 0.657 0.572 0.369 0.279
CBLoss 0.588 0.318 0.599 0.321 0.651 0.618 0.439 0.298
LDAM 0.586 0.307 0.597 0.301 0.666 0.582 0.389 0.288
Bsoftmax 0.605 0.344 0.613 0.340 0.658 0.615 0.482 0.386
CRT (2-stage training) 0.617 0.340 0.629 0.346 0.685 0.647 0.457 0.313
BoDA 0.597 0.348 0.607 0.337 0.669 0.592 0.413 0.336
BoDA (2-stage training) 0.620 0.366 0.632 0.365 0.683 0.640 0.492 0.350
ERM 0.586 0.307 0.596 0.300 0.666 0.579 0.383 0.283
GroupDRO 0.536 0.297 0.550 0.302 0.622 0.521 0.357 0.239
Mixup 0.583 0.329 0.595 0.323 0.660 0.578 0.401 0.306
SagNet 0.588 0.312 0.598 0.305 0.668 0.581 0.373 0.291
MLDG 0.589 0.314 0.600 0.310 0.667 0.583 0.392 0.298
MTL 0.584 0.311 0.593 0.306 0.665 0.571 0.373 0.288
Fish 0.588 0.317 0.600 0.313 0.668 0.583 0.397 0.291
BRL (ours) 0.569 0.361 0.583 0.362 0.611 0.614 0.483 0.377
BRL (2-stage training) 0.588 0.370 0.602 0.372 0.636 0.631 0.504 0.365
Table 9.

Results over all MDLT benchmarks.

Algorithm | F1-Score by domain: Average, Worst | Accuracy by domain: Average, Worst | Accuracy by shot: Many, Medium, Few, Zero
IRM 0.530 0.439 0.565 0.475 0.644 0.479 0.370 0.270
DANN 0.676 0.555 0.697 0.575 0.767 0.645 0.526 0.351
CDANN 0.657 0.543 0.682 0.562 0.769 0.590 0.499 0.341
CORAL 0.748 0.630 0.769 0.650 0.849 0.747 0.621 0.362
MMD 0.415 0.351 0.444 0.384 0.467 0.449 0.397 0.157
Focal 0.723 0.601 0.749 0.629 0.836 0.736 0.578 0.311
CBLoss 0.727 0.585 0.748 0.609 0.822 0.751 0.590 0.392
LDAM 0.732 0.590 0.756 0.612 0.842 0.745 0.577 0.333
Bsoftmax 0.765 0.648 0.779 0.664 0.826 0.803 0.688 0.476
CRT (2-stage training) 0.750 0.620 0.776 0.647 0.856 0.779 0.637 0.361
BoDA 0.741 0.619 0.765 0.644 0.853 0.739 0.617 0.351
BoDA (2-stage training) 0.764 0.652 0.786 0.674 0.858 0.787 0.676 0.382
ERM 0.719 0.588 0.748 0.618 0.844 0.719 0.539 0.372
GroupDRO 0.699 0.591 0.728 0.611 0.830 0.693 0.491 0.341
Mixup 0.734 0.601 0.755 0.625 0.843 0.722 0.563 0.368
SagNet 0.734 0.611 0.758 0.639 0.854 0.736 0.544 0.350
MLDG 0.724 0.586 0.752 0.613 0.846 0.726 0.576 0.352
MTL 0.719 0.585 0.746 0.613 0.840 0.720 0.563 0.350
Fish 0.717 0.587 0.748 0.616 0.845 0.726 0.570 0.354
BRL (ours) 0.774 0.681 0.784 0.691 0.828 0.794 0.705 0.497
BRL (2-stage training) 0.782 0.683 0.792 0.693 0.838 0.804 0.723 0.488

Overall, the proposed model performs well on most evaluation measures, particularly the “worst” measures (i.e., performance on the most difficult classes), as shown in Table 9. This robustness is most evident on the Digits-MLT and TerraInc-MLT datasets, where the model shows strong recognition of hard-to-classify categories. The two-stage training approach also outperforms single-stage training in most cases. For few-shot or zero-shot classes, however, two-stage training may overfit to the limited number of samples and score slightly lower (for example, 0.488 versus 0.497 on zero-shot classes in Table 9). Even so, both BRL variants exceed every baseline in the few-shot and zero-shot columns of Table 9. Figure 4 displays a heatmap of the macro F1 scores per class and domain for each dataset (DomainNet is excluded because its 345 classes would limit visual clarity). Darker cells indicate lower F1 scores, highlighting domain-class combinations that are more difficult to recognize.

Fig. 4.

F1 score heatmap of the proposed model on different datasets.

To further assess the stability of the algorithm’s performance, we conducted five independent runs for each method using distinct random seeds, resulting in five sets of experimental outcomes per method. We employed analysis of variance (ANOVA) to calculate p-values across different performance metrics. As shown in Table 10, the ANOVA results for the Digits-MLT dataset indicate that the p-values for all evaluated metrics are substantially below standard significance thresholds (e.g., 0.05), thereby confirming that the observed performance differences among the compared algorithms are statistically significant. To visualize the distribution of F1 scores, violin plots were generated (Fig. 5), depicting the spread of both average and worst-case F1 scores across algorithms. These plots show that the proposed method achieves F1 scores that are highly concentrated in the upper range, with a narrow gap between the average and worst-case values and relatively low variance, suggesting both superior and stable performance.
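
As a minimal sketch of the test described above, the one-way ANOVA F statistic can be computed from per-seed scores as follows. The algorithm names come from the tables, but the five per-seed F1 values for each are made-up placeholders, not results from this study, and the function name `one_way_anova_f` is purely illustrative. Converting F into a p-value additionally requires the survival function of the F distribution with the returned degrees of freedom, which a statistics package would normally supply.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a group name to a list of observations
    (here: one F1 score per random seed for each algorithm).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of algorithms
    n = len(all_values)    # total number of runs

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group sum of squares: spread of runs around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Placeholder per-seed F1 scores (illustrative only, not from the paper).
scores = {
    "ERM":  [0.60, 0.61, 0.59, 0.60, 0.62],
    "BoDA": [0.64, 0.65, 0.63, 0.64, 0.66],
    "BRL":  [0.67, 0.68, 0.66, 0.67, 0.69],
}
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F value means the variation between algorithms dominates the seed-to-seed variation within each algorithm, which is what a very small p-value in Table 10 reflects.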

Table 10.

ANOVA results on Digits-MLT.

Measure | F1-Score by domain: Average, Worst | Accuracy by domain: Average, Worst | Accuracy by shot: Many, Medium, Few, Zero
PR(>F) 2.464e-19 2.35e-16 5.076e-20 2.401e-16 6.374e-24 3.038e-18 6.709e-18 NaN
Fig. 5.

Distribution of F1 score for different algorithms on Digits-MLT.

In summary, while the proposed model is not the top performer on every dataset, it remains close to the best result in most cases. Moreover, it exhibits a smaller gap between “average” and “worst-case” results than nearly all other algorithms; in Table 9, only IRM and MMD show a smaller F1 gap, and both have far lower average scores. This indicates greater stability. These findings support the effectiveness of the proposed model in the MDLT image classification task, especially in improving recognition performance for difficult classes.
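
The stability claim can be checked directly against the aggregate F1 scores in Table 9. The short sketch below copies a few rows from that table and computes the gap between average and worst-case F1; the variable and function names are illustrative only.

```python
# (average F1, worst-case F1) by domain, copied from Table 9.
table9_f1 = {
    "ERM": (0.719, 0.588),
    "CORAL": (0.748, 0.630),
    "Bsoftmax": (0.765, 0.648),
    "BoDA (2-stage training)": (0.764, 0.652),
    "BRL (ours)": (0.774, 0.681),
    "BRL (2-stage training)": (0.782, 0.683),
}

def average_worst_gap(scores):
    """Return the average-minus-worst gap per algorithm, rounded to 3 decimals."""
    return {name: round(avg - worst, 3) for name, (avg, worst) in scores.items()}

gaps = average_worst_gap(table9_f1)
# Print algorithms from the smallest gap (most stable) to the largest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:26s} gap = {gap:.3f}")
```

Among the rows listed here, the two BRL variants have the smallest gaps (0.093 and 0.099), compared with 0.112 to 0.131 for the selected baselines.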

Qualitative analysis via visualization

To further analyze the model’s capabilities, we performed a t-SNE visualization of the test set features after training, using both class labels and domain labels for interpretation. Figure 6 illustrates the t-SNE results for the Digits-MLT test set. For each model, the upper row presents the t-SNE plot colored by class labels, while the lower row shows the same features colored by domain labels. In the class-colored plots, the feature clustering produced by our model closely resembles that of DANN and BoDA, which is consistent with the quantitative results in Table 3. The domain-colored plots, however, reveal significant differences. In domain-invariant representation learning, ideal feature representations minimize differences between domains, so better alignment of feature distributions across domains reflects stronger generalization and reduced domain-specific bias. As shown in Fig. 6, our model produces greater overlap between domains, with data points from domain A aligning closely with those from domain B. This effectively mitigates shortcut learning, in which classification is based on domain labels.

Fig. 6.

t-SNE visualization of the Digits-MLT test set for different models.

Parameter analysis

The proposed method incorporates three key parameters (Inline graphic, Inline graphic, and Inline graphic) that control the balance between classification accuracy and the correction of imbalance-related errors. In our main experiments, these parameters were consistently set to Inline graphic, Inline graphic, Inline graphic. To analyze the effect of each parameter on model performance, we ran a series of experiments over a range of values: Inline graphic, Inline graphic, and Inline graphic were each set to Inline graphic, with one parameter varied at a time and the other two held fixed. Figure 7 shows the results on the Digits-MLT dataset, reporting average and worst-case F1 scores and accuracy. In the figure, solid lines represent average values, dashed lines represent worst-case values, colors differentiate the parameters, and the shaded area between lines of the same color indicates the gap between average and worst-case performance at each parameter value. The results show that the model maintains robust performance as Inline graphic increases, indicating its resilience when emphasizing the CB component. The best test performance is achieved when Inline graphic is approximately 0.01; further increases in Inline graphic lead to a rapid decline in performance, likely due to overcorrection in the RB component. Within a moderate range, model performance fluctuated only slightly as Inline graphic increased; however, when Inline graphic became excessively large (greater than 100, for example), classification performance declined significantly.
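
The one-at-a-time protocol described above can be sketched as follows. This is only an illustration under stated assumptions: the parameter names `lam_cb`, `lam_rb`, and `lam_pen`, the default values, the candidate grid, and the `dummy_evaluate` stand-in are all hypothetical and are not the paper's actual symbols, settings, or training code.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]

def one_at_a_time_sweep(
    defaults: Config,
    grid: List[float],
    evaluate: Callable[[Config], Tuple[float, float]],
) -> List[dict]:
    """Vary each parameter over `grid` while holding the others at their defaults.

    `evaluate` is expected to train and evaluate a model for one configuration
    and return (average_score, worst_case_score).
    """
    results = []
    for name in defaults:
        for value in grid:
            config = dict(defaults)   # copy so the defaults stay untouched
            config[name] = value      # change exactly one parameter
            average, worst = evaluate(config)
            results.append({
                "varied": name,
                "value": value,
                "average": average,
                "worst": worst,
                "gap": average - worst,  # the band between solid and dashed lines
            })
    return results

# Hypothetical names, defaults, and grid (assumptions for illustration only).
defaults = {"lam_cb": 1.0, "lam_rb": 0.01, "lam_pen": 1.0}
grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

def dummy_evaluate(config: Config) -> Tuple[float, float]:
    """Stand-in for real training; returns fake scores that shrink as lam_rb grows."""
    average = 0.7 / (1.0 + config["lam_rb"])
    return average, 0.9 * average

results = one_at_a_time_sweep(defaults, grid, dummy_evaluate)
print(len(results), "configurations evaluated")
```

Varying one parameter at a time keeps the number of training runs linear in the grid size, at the cost of not revealing interactions between parameters.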

Fig. 7.

Single-parameter analysis results under different parameter values on Digits-MLT.

During the single-parameter analysis on the Digits-MLT dataset, it was observed that the proposed algorithm achieved relatively high F1 scores and accuracy when the hyperparameters were set to Inline graphic, Inline graphic, and Inline graphic. To further investigate the convergence behavior of the algorithm under this setting, both the loss values and F1 scores were recorded at intervals of 100 iterations over a total of 5000 training iterations. The results are illustrated in Fig. 8. As shown in Fig. 8, the loss value consistently decreased with the number of iterations, indicating progressive convergence. Concurrently, the F1 score showed a steady upward trend, improving as the loss decreased, and eventually stabilized, demonstrating a favorable convergence trend.

Fig. 8.

Iterative training results on Digits-MLT.

Based on the parameter analysis, it is evident that relatively small values for the hyperparameters tend to result in higher F1 scores and accuracy in practical applications. This implies that during model optimization, priority should be given to exploring smaller parameter ranges to enhance performance more effectively.

Ablation studies

To assess the contribution of individual components within the proposed model, ablation studies were conducted by selectively enabling or disabling key modules: the balanced cross-entropy loss (BCE), CB component, RB component, and cross-domain penalty term (Penalty). The impact of each component on model performance was assessed using the benchmark Digits-MLT dataset, with results summarized in Table 11. The results show that the inclusion of all components (as in experiment number 8) yields the best overall performance, with an average F1 score of 0.672 and a worst-case F1 score of 0.630. These findings indicate that the full combination of these components provides the most robust and consistent improvements in the performance of the model.

Table 11.

Ablation studies on Digits-MLT.

No. Inline graphic CB RB Penalty F1-Score (by domain) Accuracy (by domain)
Average Worst Average Worst
1 Inline graphic 0.595 0.485 0.604 0.501
2 Inline graphic Inline graphic 0.661 0.618 0.664 0.624
3 Inline graphic Inline graphic Inline graphic 0.661 0.618 0.664 0.624
4 Inline graphic Inline graphic Inline graphic 0.645 0.596 0.649 0.602
5 Inline graphic Inline graphic 0.595 0.485 0.604 0.501
6 Inline graphic Inline graphic 0.633 0.554 0.640 0.562
7 Inline graphic Inline graphic Inline graphic 0.648 0.580 0.659 0.594
8 Inline graphic Inline graphic Inline graphic Inline graphic 0.672 0.630 0.675 0.635

Discussion

Our paper presents a comprehensive evaluation of the BRL algorithm across six datasets, including both synthetic and real-world MDLT scenarios. Averaged over all benchmarks (Table 9), BRL with two-stage training achieved an average accuracy of 79.2% and a worst-case accuracy of 69.3%, highlighting its robustness across varying conditions. In worst-case accuracy, BRL outperformed the state-of-the-art BoDA method (67.4% with two-stage training) by 1.9 percentage points and the widely used ERM baseline (61.8%) by 7.5 percentage points. ERM here refers to empirical risk minimization, a classical training paradigm in machine learning. Its core principle is to treat the training data as samples drawn from the true data distribution and to optimize model parameters by minimizing the loss function over this dataset, thereby reducing prediction error on the training samples. However, ERM has inherent limitations: it is highly sensitive to distributional shifts, prone to overfitting during training, and often generalizes poorly to data outside the training distribution. Despite these drawbacks, ERM remains a frequently used benchmark in academic research because of its clear theoretical foundation and ease of implementation. For these reasons, ERM was included in our study as a baseline to highlight the effectiveness of the proposed approach. The performance gains achieved by BRL demonstrate the effectiveness of its reweighting-based balanced representation learning method, which incorporates both CB and RB to address class and domain imbalance in MDLT image recognition.

The ability of BRL to learn domain- and class-unbiased representations through alignment in both the input and latent feature space is central to its success. Despite these strengths, there remain opportunities for further research. For example, future research could focus on optimizing the alignment strategies to achieve higher accuracy, especially on complex or open-set datasets. Additionally, exploring the adaptability of the BRL algorithm to different data characteristics across scientific applications will be critical to ensuring its broader applicability and practical adoption.

Conclusions and future work

In conclusion, this study addressed the challenge of data and domain imbalance in MDLT image recognition by introducing the BRL algorithm. Extensive experimental validation on six benchmark datasets demonstrated the effectiveness of BRL. By learning more balanced feature representations, BRL improved classification performance, particularly for the most difficult-to-identify classes. Beyond computer vision, the BRL algorithm showed considerable potential for broader scientific applications, including species classification in biodiversity research and disease diagnosis in biomedical imaging. These results highlighted the promise of the proposed algorithm for image recognition tasks and its potential utility in data classification across various scientific disciplines.

Despite these encouraging results, several limitations remain. First, the method relies on three key hyperparameters (Inline graphic, Inline graphic, and Inline graphic), and the experimental results show that model performance is sensitive to their selection. Developing an adaptive or automated parameter optimization strategy is an important direction for future research. Second, the current framework incorporates feature information from the test data during training and assumes that all classes present in the test set have been seen during training. This closed-set assumption restricts the model’s ability to generalize to unseen classes and domains. Consequently, enhancing the model’s adaptability and robustness under open-domain settings, where novel classes and distributions may be encountered, remains a critical area for future investigation.

Acknowledgements

We would like to express our sincere appreciation to our research team for their valuable contributions to this work. Their insights, feedback, and support played a crucial role in shaping the content and improving the overall quality of the manuscript. We also extend our gratitude to the researchers, authors, and publishers of the studies cited in this paper for their significant contributions to the field.

Author contributions

All authors contributed to the conception and design of the study. P.F. contributed to the study by designing and conducting the experiments, as well as performing the data analysis. N.I.R.R. organized and revised the manuscript. J.W. was in charge of data collection and preprocessing. All authors reviewed and approved the final version of the manuscript.

Funding

This work was supported in part by the Ministry of Education Industry-University Cooperation and Collaborative Education Project under Grant 202002165017, and in part by the Scientific Research Project of Higher Education Institutions in Anhui under Grant 2024AH051814.

Data availability

The experimental data used in this study can be accessed through the codebase provided by Yang et al. We thank the authors for making their work publicly available and for their contributions to this area of research. The data is also available upon request from the corresponding author.

Code availability

The code used in this study is available upon request from the corresponding author.

Declarations

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
  • 2. Shao, L., Zhu, F. & Li, X. Transfer learning for visual categorization: A survey. IEEE Trans. Neural Netw. Learn. Syst. 26, 1019–1034 (2014).
  • 3. Zhang, Y., Kang, B., Hooi, B., Yan, S. & Feng, J. Deep long-tailed learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 10795–10816 (2023).
  • 4. Yang, Y., Wang, H. & Katabi, D. On multi-domain long-tailed recognition, imbalanced domain generalization and beyond. In European Conference on Computer Vision, 57–75 (Springer, 2022).
  • 5. Koh, P. W. et al. Wilds: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 5637–5664 (PMLR, 2021).
  • 6. Gu, X. et al. Tackling long-tailed category distribution under domain shifts. In European Conference on Computer Vision, 727–743 (Springer, 2022).
  • 7. Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019).
  • 8. Shen, J., Qu, Y., Zhang, W. & Yu, Y. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).
  • 9. Yang, X., Yao, H., Zhou, A. & Finn, C. Multi-domain long-tailed learning by augmenting disentangled representations. arXiv preprint arXiv:2210.14358 (2022).
  • 10. Cheng, L., Guo, R., Candan, K. S. & Liu, H. Representation learning for imbalanced cross-domain classification. In Proceedings of the 2020 SIAM International Conference on Data Mining, 478–486 (SIAM, 2020).
  • 11. Buda, M., Maki, A. & Mazurowski, M. A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, 249–259 (2018).
  • 12. Wang, Y.-X., Ramanan, D. & Hebert, M. Learning to model the tail. In Advances in Neural Information Processing Systems, vol. 30 (2017).
  • 13. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016).
  • 14. Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2980–2988 (2017).
  • 15. Ren, J. et al. Balanced meta-softmax for long-tailed visual recognition. Adv. Neural Inf. Process. Syst. 33, 4175–4186 (2020).
  • 16. Hong, Y. et al. Disentangling label distribution for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6626–6636 (2021).
  • 17. Kang, B. et al. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217 (2019).
  • 18. Tang, K., Huang, J. & Zhang, H. Long-tailed classification by keeping the good and removing the bad momentum causal effect. Adv. Neural Inf. Process. Syst. 33, 1513–1524 (2020).
  • 19. Wang, P., Han, K., Wei, X.-S., Zhang, L. & Wang, L. Contrastive learning based hybrid networks for long-tailed image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 943–952 (2021).
  • 20. Iscen, A., Araujo, A., Gong, B. & Schmid, C. Class-balanced distillation for long-tailed visual recognition. arXiv preprint arXiv:2104.05279 (2021).
  • 21. Wang, X., Lian, L., Miao, Z., Liu, Z. & Yu, S. X. Long-tailed recognition by routing diverse distribution-aware experts. arXiv preprint arXiv:2010.01809 (2020).
  • 22. Jamal, M. A., Brown, M., Yang, M.-H., Wang, L. & Gong, B. Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7610–7619 (2020).
  • 23. Li, S. et al. Metasaug: Meta semantic augmentation for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5212–5221 (2021).
  • 24. Ding, Y. et al. Deep imbalanced domain adaptation for transfer learning fault diagnosis of bearings under multiple working conditions. Reliabil. Eng. Syst. Saf. 230, 108890 (2023).
  • 25. Xia, H., Jing, T. & Ding, Z. Generative inference network for imbalanced domain generalization. IEEE Trans. Image Process. 32, 1694–1704 (2023).
  • 26. Le-Khac, P. H., Healy, G. & Smeaton, A. F. Contrastive representation learning: A framework and review. IEEE Access 8, 193907–193934 (2020).
  • 27. Duan, L., Xu, D. & Tsang, I. W.-H. Domain adaptation from multiple sources: A domain-dependent regularization approach. IEEE Trans. Neural Netw. Learn. Syst. 23, 504–518 (2012).
  • 28. Jhuo, I.-H., Liu, D., Lee, D. & Chang, S.-F. Robust visual domain adaptation with low-rank reconstruction. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2168–2175 (IEEE, 2012).
  • 29. Liu, H., Shao, M. & Fu, Y. Structure-preserved multi-source domain adaptation. In 2016 IEEE 16th International Conference on Data Mining (ICDM), 1059–1064 (IEEE, 2016).
  • 30. Kandemir, M. Asymmetric transfer learning with deep gaussian processes. In International Conference on Machine Learning, 730–738 (PMLR, 2015).
  • 31. Courty, N., Flamary, R., Tuia, D. & Rakotomamonjy, A. Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1853–1865 (2016).
  • 32. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K. & Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014).
  • 33. Sun, B., Feng, J. & Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).
  • 34. Sun, B., Feng, J. & Saenko, K. Correlation alignment for unsupervised domain adaptation. Domain Adaptation in Computer Vision Applications 153–171 (2017).
  • 35. Sun, B. & Saenko, K. Deep coral: Correlation alignment for deep domain adaptation. In Computer Vision—ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part III 14, 443–450 (Springer, 2016).
  • 36. Hainmueller, J. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20, 25–46 (2012).
  • 37. Kuang, K., Cui, P., Li, B., Jiang, M. & Yang, S. Estimating treatment effect in the wild via differentiated confounder balancing. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 265–274 (2017).
  • 38. Müller, A. Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29, 429–443 (1997).
  • 39. Li, Y., Gong, M., Tian, X., Liu, T. & Tao, D. Domain generalization via conditional invariant representations. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).
  • 40. Li, H., Pan, S. J., Wang, S. & Kot, A. C. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5400–5409 (2018).
  • 41. Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9268–9277 (2019).
  • 42. Cao, K., Wei, C., Gaidon, A., Arechiga, N. & Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. Adv. Neural Inf. Process. Syst. 32 (2019).
  • 43. Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 10, 988–999 (1999).
  • 44. Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731 (2019).
  • 45. Xu, M. et al. Adversarial domain adaptation with domain mixup. Proc. AAAI Conf. Artif. Intell. 34, 6502–6509 (2020).
  • 46. Nam, H., Lee, H., Park, J., Yoon, W. & Yoo, D. Reducing domain gap by reducing style bias. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8690–8699 (2021).
  • 47. Li, D., Yang, Y., Song, Y.-Z. & Hospedales, T. Learning to generalize: Meta-learning for domain generalization. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).
  • 48. Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G. & Scott, C. Domain generalization by marginal transfer learning. J. Mach. Learn. Res. 22, 1–55 (2021).
  • 49. Shi, Y. et al. Gradient matching for domain generalization. arXiv preprint arXiv:2104.09937 (2021).
