Reweighting balanced representation learning for long tailed image recognition in multiple domains

Panpan Fu; Nur Intan Raihana Ruhaiyem; Jiangtao Wang

doi:10.1038/s41598-025-03459-w

Sci Rep. 2025 Jul 4;15:23948.

PMCID: PMC12227607 PMID: 40615406

Abstract

In multi-domain long-tailed learning, data imbalance appears in two ways: within-domain class imbalance and across-domain sample proportion variation. These imbalances introduce biases in covariates and representations when learning domain-invariant features in both input and latent spaces. This paper applies an advanced reweighting balanced representation learning (BRL) algorithm to multi-domain long-tailed image recognition. By integrating covariate and representation balancing techniques into a reweighting-based class balancing approach, BRL effectively addresses these biases. Extensive evaluation on six benchmark datasets confirms its ability to extract domain- and class-unbiased feature representations, leading to excellent classifier performance, especially for the hardest classes. This approach also shows potential for applications in areas such as environmental monitoring and medical imaging, providing a robust solution with broad scientific implications.

Keywords: Multi-domain, Long-tailed, Representation learning, Re-weighting

Subject terms: Biotechnology, Computational biology and bioinformatics, Plant sciences, Zoology, Ecology, Environmental sciences, Engineering, Mathematics and computing

Introduction

Recent advancements in transfer learning and deep learning have greatly enhanced the state-of-the-art in various computer vision tasks^1,2. Among these, image recognition technology plays a foundational role, serving as the basis for numerous high-level visual applications, including image segmentation, object detection, video tracking, and dynamic behavior analysis. However, a persistent and well-recognized challenge in image recognition is the long-tailed (LT) distribution problem, in which many classes occur far less frequently compared to a few dominant classes. This class imbalance often leads recognition models to become biased toward head classes, resulting in poor generalization performance on underrepresented tail classes³.

Although significant research has been devoted to developing LT learning algorithms, most prior work has focused on single-domain settings⁴. In contrast, real-world applications often involve data collected from multiple heterogeneous sources, introducing variations not only between source and target domains but also among the sources themselves⁵. This phenomenon—where data distributions differ while maintaining related semantic structures across domains—is known as domain shift (DS). The simultaneous occurrence of LT and DS substantially exacerbates the difficulty of the recognition task, a situation commonly encountered in practical scenarios⁶. For example, in medical image analysis, images collected from different hospitals may exhibit significant inter-domain variability. Simultaneously, rare diseases remain infrequent and may only appear in specific hospital domains. Yang et al. formally characterized this complex setting as the multi-domain long-tailed (MDLT) problem⁴.

To address domain shift, domain-invariant representation learning approaches have been widely studied. These methods aim to mitigate domain discrepancies by learning feature representations that are shared and consistent across domains^7,8. They typically operate by minimizing a distributional discrepancy measure—such as maximum mean discrepancy (MMD)—between source and target domains within a shared latent space. However, most of these approaches are designed under the assumption of class-balanced data distributions⁹. In MDLT scenarios, severe class imbalances—both within individual domains and across different domains—introduce complex biases in the input feature space and the learned latent representations, thereby complicating the alignment process. These biases can severely distort the mapping between features and labels, leading to suboptimal feature alignment across domains and insufficient discrimination for tail classes¹⁰.

Several existing methods have been widely adopted to address LT, DS, or, to a limited extent, their combination. In single-domain LT settings, approaches such as resampling (e.g., over-sampling tail classes or under-sampling head classes) and reweighting strategies (e.g., class-balanced loss, focal loss) have shown varying degrees of effectiveness^11,12. For domain adaptation under class-balanced assumptions, models such as domain-adversarial neural networks (DANN)¹³ have achieved strong performance. However, when evaluated under the MDLT setting, these methods exhibit critical limitations. Resampling methods can exacerbate domain shifts by overfitting to the limited samples of rare classes within specific domains. Reweighting strategies often struggle with conflicting gradients that arise from simultaneously addressing domain alignment and class imbalance. Moreover, domain-adversarial methods tend to focus on aligning dominant classes while neglecting tail classes, leading to poor generalization on rare categories across domains. Recent efforts to bridge this gap have explored combining domain adaptation techniques with imbalance-aware learning, such as domain-conditional feature normalization⁴. Nonetheless, these approaches often rely on strong assumptions about the availability of domain labels, class distributions, or large amounts of labeled data—conditions that may not hold in practice.

To tackle the MDLT problem, we propose a reweighting balanced representation learning (BRL) method for multi-domain long-tailed image recognition. BRL incorporates covariate balancing (CB) and representation balancing (RB) techniques to address class imbalance and domain imbalance simultaneously, in both the input feature space and the latent representation space. It learns a shared latent space that is invariant to both input features and labels, thereby mitigating complex biases and improving classifier performance, particularly for tail classes. The method handles extreme class imbalance and complex domain shifts without relying heavily on strong prior knowledge or domain-specific annotations. Extensive evaluations on multiple real-world datasets confirm the effectiveness of the proposed method.

Related work

Imbalanced long-tailed distribution

Unlike general data heterogeneity, imbalanced LT data exhibit extremely skewed distribution curves. Due to the significant variation in the number of samples per class, even state-of-the-art deep classification models often face performance challenges³. To address LT training data, one line of research focuses on adjusting class-wise contributions through resampling¹¹, reweighting^12,14, logit adjustment^15,16, and two-stage training strategies^17,18. Another line of work explores ensemble-based approaches, including contrastive learning¹⁹, knowledge distillation²⁰, variance bias calibration²¹, and meta-learning strategies^22,23. However, this problem becomes more serious in MDLT scenarios, where, in addition to inter-class differences, intra-class variations due to domain shifts further complicate learning. Tail class representations are particularly sensitive to domain-specific features⁶. Several recent methods have been proposed to address the MDLT problem. For example, Yang et al. introduced the BoDA loss function⁴, Ding et al. proposed a deep imbalanced domain adaptation (DIDA) framework²⁴, and Xia et al. developed the generative inference network (GINet) to improve generalization across imbalanced domains²⁵. However, the MDLT problem remains far from being fully resolved and warrants further investigation.

Cross-domain representation learning

In machine learning, selecting an appropriate set of features is crucial for the performance of downstream models such as classifiers or predictors. The effectiveness of these models is directly influenced by the quality and structure of the data representation used during training. Representation learning refers to the process of mapping raw input data to feature vectors or tensors that capture abstract, meaningful concepts to improve downstream task performance²⁶. Recently, numerous representation learning methods have been developed to tackle the problem of domain shift^27–29. Most of these methods focus on feature-based approaches for learning domain-invariant representations^8,30,31. For example, deep domain confusion (DDC) integrates MMD minimization into the final fully connected layer to encourage shared representations between domains³². Correlation alignment (CORAL) employs linear transformations to match second-order statistics between source and target domains^33,34, which has been extended into Deep CORAL using nonlinear mappings to adjust correlations in deep network activations³⁵. DANN employs a minimax optimization framework to learn domain-invariant features by jointly training a feature extractor and domain classifier¹³. Similarly, Wasserstein distance guided representation learning (WDGRL) maps data from multiple domains into a shared feature space to facilitate domain-invariant learning⁸. Despite these advancements, research on feature representation for datasets that are both cross-domain and LT remains limited and insufficiently explored.

Problem definition

Let and represent the input and label spaces, respectively, and let represent the domain space. The domain space comprises K domains, denoted as , and the label space contains M categories. Each data instance is represented as , where i is the instance index, is the raw input, () is the class label, and () is the domain label. For any training domain d (), the category distribution is LT; that is, some rare classes may have very few or no training samples due to their low prevalence in specific domains. Hence, , where ( represents the class label set of the d-th domain , and ). To evaluate overall performance across all classes, the test data, denoted by , is sampled with a balanced distribution over all classes.

Figure 1 presents an example of an MDLT dataset, Digits-MLT. Fig. 1a illustrates the class distribution of Digits-MLT, which includes two domains built from two digit datasets: MNIST-M and SVHN. Representative samples from this dataset are visualized in Fig. 1b, with further details provided in the Datasets descriptions section. The data in each domain follow an LT distribution, and the class distributions vary across domains. Note that this figure serves to show the general characteristics of the MDLT learning problem, namely the presence of both LT class distributions and domain diversity. Table 1 describes the key symbols used in the formulas of this paper.

Fig. 1 — Example of multi-domain long-tailed dataset (Digits-MLT).

Table 1.

The list of symbols.

Symbol	Description	Symbol	Description
	The input spaces of the samples		The raw input of the i-th sample
	The class label spaces of the samples		The class label of the i-th sample
	The domain space of the dataset		The domain label of the i-th sample
	The class space of the dataset	M	The number of classes in the dataset
	The mapped feature representation space of the samples		The mapped feature representation of the i-th sample
K	The number of domains in the dataset		The kernel function
	The d-th domain		The class distribution of the d-th domain
	The class label set of the d-th domain		The domain distribution of the samples
W	The sample weight space		The sample weight of target domain sample
	The loss function		The weight mapping function
	The feature mapping function (feature extractor)		The classifier
	The feature representation distributions	,,	The parameter vectors of , and
n	The number of samples		The domain weight of the i-th source domain
m	The number of raw feature dimensions of the sample	s	The number of mapped feature dimensions of the sample


Definition 1

(Cross-Domain Long-Tailed Classification): Given labeled samples Inline graphic (for ) from multiple source domains and unlabeled samples from the test domain , where and represent the number of samples in the d-th source domain and the test domain respectively, the purpose of cross-domain LT (CDLT) classification is to learn a representation mapping and a hypothesis function Inline graphic that together minimize the expected classification error over the instance in the test domain , where denotes the representation space:

where Inline graphic is the expected value, is the loss function, and and are the parameter vectors of the representation functions and the classifier h, respectively.

Methodology–representation learning with doubly balancing

In MDLT learning, addressing data imbalance is of primary importance. This imbalance arises not only from class label imbalance within individual domains but also from distributional differences among domains. To solve this problem, this paper introduces a CDLT classification algorithm that combines CB and RB techniques to produce data representations that are both class-unbiased and domain-balanced. CB is extensively employed in observational studies to construct balanced datasets by reweighting samples³⁶. In our approach, CB is applied to mitigate covariate bias in the input feature space. However, since the aim is to learn domain-invariant representations, it is also necessary to address bias in the latent representation space. This is achieved via RB, which explicitly targets representation-level imbalance. Because of the noise introduced during the transformation of input features into latent representations, traditional CB methods alone may be insufficient to correct representation bias¹⁰. To overcome this limitation, our method jointly optimizes CB and RB through a reweighting strategy that simultaneously addresses covariate and representation biases.

Covariate balancing

According to the analysis by Yang et al.⁴ on the effect of divergent label distributions on transferable features, if the label distributions across domains are consistent, the model can effectively align similar classes from different domains. Based on this insight, the primary objective of our study is to ensure consistency in label distribution across domains. We interpret this form of domain shift as a covariate bias and apply a CB algorithm to align the label distributions between domains.

CB is often used in observational studies to reduce covariate bias³⁶. The core idea is to reweight samples in one group using a set of weights Inline graphic to align its distribution with that of another group³⁷. This technique learns a mapping function that projects target domain samples into a weight space, allowing for reweighting based on the learned weights. As a result, the data distributions of the target and source domains become more closely aligned. Let the mapping function be denoted as Inline graphic , parameterized by where . For a given target domain sample , its corresponding weight is computed as . These weights are then used to reweight the target domain samples, improving distributional consistency with the source domain. In our proposed algorithm, the objective function for the CB module is formulated as follows:

with Inline graphic , where is the number of samples in the i-th domain, and is a vector of dimension-wise means computed over the i-th source domain, the parameter m is the dimensionality of the data. The condition is imposed to normalize the weights of the target domain samples such that they sum to one. Additionally, enforcing Inline graphic ensures that all sample weights are non-negative.

Solving Eq. (2) under these constraints generates a more balanced set of samples across domains, under the following assumption:

Assumption 1

Given a sample Inline graphic drawn from either a source or target domain, the following conditions are assumed to hold:

Inline graphic , , where is the i-th source domain, is the target domain, , with where represents the class label set for the i-th domain, is the domain label, and is the class label of the sample. This assumption means that there exists a non-zero number of samples from different classes and domains in both the source and target domains. As such, it may limit the applicability of the method to open-set classification tasks, where unseen classes may appear in the target domain.
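The reweighting idea behind CB can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a linear scoring function followed by a softmax as the weight mapping (so the weights are non-negative and sum to one by construction), a squared distance between the weighted target mean and the source dimension-wise mean as the balancing loss, and plain gradient descent. The names `fit_balancing_weights` and `balance_loss`, the step size, and the toy data are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(samples, weights):
    """Dimension-wise mean of the samples under the given weights."""
    dims = len(samples[0])
    return [sum(w * x[j] for w, x in zip(weights, samples)) for j in range(dims)]


def balance_loss(samples, weights, source_mean):
    """Squared gap between the weighted target mean and the source mean."""
    mean = weighted_mean(samples, weights)
    return sum((a - b) ** 2 for a, b in zip(mean, source_mean))


def fit_balancing_weights(target, source_mean, steps=2000, lr=1.0):
    """Assumed weight mapping: softmax over linear scores theta . x."""
    dims = len(target[0])
    theta = [0.0] * dims
    for _ in range(steps):
        scores = [sum(t * v for t, v in zip(theta, x)) for x in target]
        weights = softmax(scores)
        mean = weighted_mean(target, weights)
        residual = [a - b for a, b in zip(mean, source_mean)]
        grad = [0.0] * dims
        for w, x in zip(weights, target):
            # Derivative of the loss with respect to this sample's score,
            # obtained by differentiating through the softmax.
            g_score = 2.0 * w * sum(
                r * (xj - mj) for r, xj, mj in zip(residual, x, mean)
            )
            for j in range(dims):
                grad[j] += g_score * x[j]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    scores = [sum(t * v for t, v in zip(theta, x)) for x in target]
    return softmax(scores)
```

With four toy target samples at the corners of the unit square and a source mean of `[0.8, 0.3]`, the learned weights shift mass toward samples with a large first coordinate and a small second coordinate, so the weighted target mean moves from `[0.5, 0.5]` toward the source mean.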

Representation balancing

Similar to CB, the basic idea of RB is to adjust the distribution of target domain representations by applying sample weights Inline graphic , such that the reweighted representation distribution of the target domain more closely aligns with that of the source domain.

Let Inline graphic , denote the distributions of the learned representations in the source and target domains, respectively, where the representations are induced by the mapping function on . To quantify the distance between and , we employ an integral probability metric (IPM)³⁸. Based on the approach of Cheng et al.¹⁰, we adopt MMD with a multi-scale Gaussian kernel, owing to its informative nature and computational simplicity in measuring the distance between Inline graphic and . It can then be calculated in the equation below:

where Inline graphic is a multi-scale Gaussian kernel defined as follows:

We then formulate the RB objective as follows:
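As an illustration of the distance used in this subsection, the sketch below computes a weighted squared MMD between source and target representations with a multi-scale Gaussian kernel. It is a minimal example under stated assumptions, not the paper's exact formulation: the bandwidths `(0.5, 1.0, 2.0)`, the unweighted sum over scales, the uniform weighting of source samples, and the biased (V-statistic) estimator are all choices made here for illustration, and the function names are hypothetical.

```python
import math


def multi_scale_gaussian(u, v, bandwidths=(0.5, 1.0, 2.0)):
    """Sum of Gaussian kernels over several (assumed) bandwidths."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return sum(math.exp(-sq_dist / (2.0 * s * s)) for s in bandwidths)


def weighted_mmd2(source, target, target_weights, kernel=multi_scale_gaussian):
    """Biased estimate of squared MMD; target samples carry weights.

    Source samples are weighted uniformly. The target weights are
    expected to be non-negative and to sum to one.
    """
    n = len(source)
    # Source-source term with uniform weights 1/n.
    ss = sum(kernel(a, b) for a in source for b in source) / (n * n)
    # Target-target term with the supplied sample weights.
    tt = sum(
        wi * wj * kernel(a, b)
        for wi, a in zip(target_weights, target)
        for wj, b in zip(target_weights, target)
    )
    # Cross term between uniformly weighted source and weighted target.
    st = sum(
        wj * kernel(a, b)
        for a in source
        for wj, b in zip(target_weights, target)
    ) / n
    return ss + tt - 2.0 * st
```

For example, with source representations `[[0.0], [0.0], [1.0]]` and target representations `[[0.0], [1.0]]`, uniform target weights leave a positive discrepancy, whereas the weights `[2/3, 1/3]` reproduce the source proportions and drive the estimate to zero up to rounding.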

Balanced domain-invariant representation

According to Definition 1, we define the model as a mapping function Inline graphic , which maps raw inputs to predicted labels. This model is decomposed into a feature extractor and a classifier . The final prediction is given by . Based on this architecture, the MDLT classifier is trained using the following objective function:

where Inline graphic denotes the sample weight for the i-th domain, calculated similarly to Eq. (2), and is the loss function. The weights compensate for domain imbalance in the source data. A balanced cross-entropy loss is used, in which the prediction scores are adjusted by the logarithm of the number of samples in each class before calculating the cross-entropy.
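The balanced cross-entropy described above can be sketched for a single sample as follows. This is a hedged illustration rather than the authors' code: it assumes that "adjusting the prediction scores" means adding the log class count to each logit, that every class count is positive, and that the domain-balancing weight enters as a simple multiplier. The names `balanced_cross_entropy` and `sample_weight` are hypothetical.

```python
import math


def balanced_cross_entropy(logits, label, class_counts, sample_weight=1.0):
    """Cross-entropy on logits shifted by the log of each class count.

    Assumes every entry of class_counts is positive. The optional
    sample_weight scales the loss for this sample.
    """
    adjusted = [z + math.log(c) for z, c in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return sample_weight * (log_norm - adjusted[label])
```

With equal class counts the adjustment cancels and the loss reduces to ordinary cross-entropy. With counts such as `[100, 1]` and identical raw logits, a tail-class sample incurs a much larger loss than a head-class sample, so during training the model must raise the tail logit further to compensate for the prior.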

To further address domain shift among multiple source domains and to improve the generalizability of the classifier, a penalty term is incorporated. This term captures the feature distribution differences between pairs of source domains and is computed as follows:

where s denotes the dimensionality of the feature space after mapping, and Inline graphic () represents the mapped feature representation of the original sample , as defined in Definition 1. and are the mean values of the p-th feature dimension in the i-th and j-th source domains, respectively. and are the covariances between the p-th and q-th dimensions in the i-th and j-th domains, respectively.
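A penalty of this kind, built from per-dimension means and pairwise covariances of the mapped features, can be sketched as below. The exact combination used in the paper is not reproduced here. The sketch assumes an unnormalized sum of squared mean differences and squared covariance differences over all pairs of source domains, uses the sample covariance (so each domain needs at least two samples), and the function names are hypothetical.

```python
from itertools import combinations


def feature_means(feats):
    """Mean of each of the s mapped feature dimensions."""
    n, s = len(feats), len(feats[0])
    return [sum(z[p] for z in feats) / n for p in range(s)]


def feature_covariance(feats):
    """Sample covariance between every pair of feature dimensions."""
    n, s = len(feats), len(feats[0])
    mu = feature_means(feats)
    return [
        [
            sum((z[p] - mu[p]) * (z[q] - mu[q]) for z in feats) / (n - 1)
            for q in range(s)
        ]
        for p in range(s)
    ]


def pairwise_domain_penalty(domains):
    """Sum of first- and second-order statistic gaps over domain pairs."""
    stats = [(feature_means(f), feature_covariance(f)) for f in domains]
    total = 0.0
    for (mu_i, cov_i), (mu_j, cov_j) in combinations(stats, 2):
        s = len(mu_i)
        # Gap between the means of each feature dimension.
        total += sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
        # Gap between the covariances of each pair of dimensions.
        total += sum(
            (cov_i[p][q] - cov_j[p][q]) ** 2 for p in range(s) for q in range(s)
        )
    return total
```

Two domains with identical features give a penalty of zero. Shifting one domain by a constant changes only the mean term, while rescaling a domain also changes the covariance term.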

The overall objective function for CDLT classification is then obtained by incorporating Eqs. (2), (6), and (8) into the classifier objective in Eq. (7):

where Inline graphic , , and are parameters that jointly control the trade-off between prediction accuracy and the imbalance errors associated with CB and RB. By optimizing the sample weights through the joint application of CB and RB, the proposed method effectively mitigates both covariate shift and representation bias. The output prediction for a test domain sample is computed as Inline graphic , .

The model is trained by minimizing Eq. (9) using the Adam optimizer. The training procedure is outlined in Algorithm 1, which presents the pseudocode for the proposed model. Additionally, Fig. 2 visualizes the overall workflow of the method.

Fig. 2 — Workflow diagram of the proposed method.

Experiments

Datasets descriptions

Yang et al.⁴ proposed six benchmark datasets for MDLT learning: VLCS-MLT, PACS-MLT, OfficeHome-MLT, TerraInc-MLT, DomainNet-MLT, and Digits-MLT. Among these, Digits-MLT is a synthetic dataset created by combining two digit datasets, while the remaining five are real-world multi-domain datasets widely used in domain generalization research. Following their setup, we use these datasets to assess the performance of our proposed method. Table 2 provides detailed information on the datasets used in the experiments, and Fig. 3 displays the class distributions. In the figure, class labels are sorted in descending order based on the number of samples per class to clearly demonstrate the LT characteristics of each dataset. It should be noted that the class and domain labels in the experimental data have been reindexed based on ascending alphabetical order, and the indices correspond to the original labels in the respective datasets.

Table 2.

Dataset details.

Dataset (input size)	#Classes	Domain	Train #samples (min class)	Train #samples (max class)	Train #samples (total)	Validation #samples (min class)	Validation #samples (max class)	Validation #samples (total)	Test #samples (min class)	Test #samples (max class)	Test #samples (total)
Digits-MLT (3, 28, 28)	10	MNIST-M	10	1000	2478	800	800	8000	800	800	8000
Digits-MLT (3, 28, 28)	10	SVHN	10	1000	2478	800	800	8000	800	800	8000
VLCS-MLT (3, 224, 224)	5	Caltech101	22	825	1190	15	15	75	30	30	150
		LabelMe	0	1192	2434	14	15	74	28	30	148
		SUN09	0	1219	3097	6	15	61	14	30	124
		VOC2007	285	1454	3151	15	15	75	30	30	150
PACS-MLT (3, 224, 224)	7	Art painting	109	374	1523	25	25	275	50	50	350
		Cartoon	60	382	1819	25	25	175	50	50	350
		Photo	107	357	1145	25	25	175	50	50	350
		Sketch	5	741	3404	25	25	175	50	50	350
TerraInc-MLT (3, 224, 224)	10	Location-100	0	2444	4459	4	10	94	8	20	188
		Location-38	0	4455	9481	1	10	85	2	20	170
		Location-43	0	1063	3717	1	10	84	2	20	169
		Location-46	0	1370	5612	4	10	90	8	20	181
OfficeHome-MLT (3, 224, 224)	65	Art	0	84	1452	5	5	325	10	10	650
		Clipart	24	84	3390	5	5	325	10	10	650
		Product	23	84	3464	5	5	325	10	10	650
		Real World	8	84	3382	5	5	325	10	10	650
DomainNet-MLT (3, 224, 224)	345	Clipart	0	407	28603	4	20	6490	8	40	13036
		Infographic	0	710	33067	3	20	6140	8	40	12398
		Painting	0	778	53596	0	20	6198	2	40	12472
		Quickdraw	440	440	151800	20	20	6900	40	40	13800
		Real	0	755	152311	10	20	6878	21	40	13758
		Sketch	0	654	49197	4	20	6634	9	40	13297


Fig. 3 — Distribution of training, validation, and test data for each MDLT dataset.

Experimental settings

Setup for model training

All models were implemented in PyTorch and trained on an NVIDIA RTX A4000 GPU. Following the setup of Yang et al.⁴, we used the same CNN architecture for the Digits-MLT dataset and ResNet-50 as the backbone network for the remaining five datasets. To ensure a fair comparison, we evaluated the following groups of learning solutions: (1) domain-invariant feature learning solutions, including IRM⁷, DANN¹³, CDANN³⁹, CORAL³⁵, and MMD⁴⁰; (2) imbalanced learning solutions, including Focal¹⁴, CBLoss⁴¹, LDAM⁴², BSoftmax¹⁵, CRT¹⁷, and BoDA⁴; and (3) other baselines, including ERM⁴³, GroupDRO⁴⁴, Mixup⁴⁵, SagNet⁴⁶, MLDG⁴⁷, MTL⁴⁸, and Fish⁴⁹. In addition, we evaluated the two-stage training procedure of our model following the protocol of Yang et al.⁴. Our implementation is based on the codebase provided by Yang et al.⁴, and their optimal parameter settings were used for all comparison algorithms.

Evaluation setup

Two widely used evaluation metrics for imbalanced learning were adopted: top-1 overall accuracy and the F1 score across all classes. In addition, we computed accuracy for four disjoint class subsets defined by the number of training samples per class: many-shot classes (more than 100 samples), medium-shot classes (20 to 100 samples), few-shot classes (fewer than 20 samples), and zero-shot classes (no training samples). Unlike Yang et al.⁴, who selected the best-performing model based on accuracy, we selected the best model during training according to the F1 score for all algorithms, because the F1 score combines precision and recall and therefore provides a more comprehensive evaluation than accuracy alone, particularly on imbalanced datasets.
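The shot-based split and the F1 metric can be made concrete with a short sketch. It is illustrative only and rests on assumptions not stated in the text: classes are indexed from 0, subset accuracy is pooled over test samples rather than averaged per class, and the F1 score is macro-averaged over classes. The function names are hypothetical.

```python
def shot_category(train_count):
    """Map a class's training-sample count to its shot subset."""
    if train_count == 0:
        return "zero"
    if train_count < 20:
        return "few"
    if train_count <= 100:
        return "medium"
    return "many"


def accuracy_by_shot(y_true, y_pred, train_counts):
    """Accuracy over test samples, grouped by the shot subset of the true class."""
    hits, totals = {}, {}
    for truth, pred in zip(y_true, y_pred):
        subset = shot_category(train_counts[truth])
        totals[subset] = totals.get(subset, 0) + 1
        hits[subset] = hits.get(subset, 0) + int(truth == pred)
    return {subset: hits[subset] / totals[subset] for subset in totals}


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never present nor predicted contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes
```

A subset with no classes simply does not appear in the returned dictionary, which is one way to represent a shot category that is empty for a given dataset.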

Results and analysis

Quantitative results

For the MDLT image classification task, we conducted experiments on all selected benchmark datasets, with results presented in Tables 3, 4, 5, 6, 7, 8 and 9. It is important to note that, in this study, we did not perform dataset-specific tuning of the parameters Inline graphic , , and . Instead, we adopted a fixed set of values—, , and 10, respectively—to identify a parameter configuration that provides better average performance across different datasets, as supported by the experiments discussed in the Parameter Analysis section. As a result, our method may not yield optimal performance on certain datasets (e.g., VLCS-MLT), as shown in Tables 3, 4, 5, 6, 7 and 8. However, subsequent analysis demonstrates that these parameters significantly affect performance depending on the dataset. For example, in the case of Digits-MLT, when Inline graphic is held constant at 10 and , , both the average F1 score and accuracy improve to 0.686 and 0.688, respectively, while the lowest values also remain relatively high at 0.644 and 0.646.

Table 3.

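Results on Digits-MLT. An entry of −1 indicates that no class falls in that shot category.

Algorithm	F1 average (by domain)	F1 worst (by domain)	Accuracy average (by domain)	Accuracy worst (by domain)	Accuracy many-shot	Accuracy medium-shot	Accuracy few-shot	Accuracy zero-shot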
IRM	0.079	0.031	0.147	0.101	0.278	0.003	0.036	− 1
DANN	0.646	0.612	0.649	0.614	0.761	0.557	0.508	− 1
CDANN	0.584	0.511	0.596	0.526	0.784	0.476	0.307	− 1
CORAL	0.605	0.538	0.622	0.562	0.815	0.517	0.298	− 1
MMD	0.029	0.028	0.094	0.088	0.080	0.000	0.269	− 1
Focal	0.506	0.423	0.537	0.466	0.774	0.362	0.207	− 1
CBLoss	0.488	0.359	0.515	0.389	0.715	0.422	0.151	− 1
LDAM	0.546	0.400	0.577	0.446	0.792	0.472	0.197	− 1
Bsoftmax	0.595	0.485	0.604	0.501	0.718	0.577	0.359	− 1
CRT(2-stage training)	0.531	0.451	0.562	0.490	0.787	0.448	0.171	− 1
BoDA	0.556	0.458	0.583	0.495	0.814	0.439	0.222	− 1
BoDA(2-stage training)	0.601	0.539	0.617	0.554	0.810	0.504	0.303	− 1
ERM	0.479	0.372	0.524	0.433	0.787	0.360	0.111	− 1
GroupDRO	0.474	0.444	0.522	0.495	0.806	0.342	0.084	− 1
Mixup	0.571	0.454	0.593	0.487	0.804	0.476	0.240	− 1
SagNet	0.565	0.521	0.594	0.552	0.837	0.472	0.172	− 1
MLDG	0.485	0.385	0.530	0.438	0.789	0.392	0.088	− 1
MTL	0.479	0.362	0.522	0.415	0.782	0.368	0.102	− 1
Fish	0.449	0.368	0.498	0.421	0.771	0.324	0.075	− 1
BRL(ours)	0.672	0.630	0.675	0.635	0.781	0.608	0.511	− 1
BRL(2-stage training)	0.674	0.633	0.679	0.639	0.791	0.620	0.488	− 1


Table 4.

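Results on VLCS-MLT. An entry of −1 indicates that no class falls in that shot category.

Algorithm	F1 average (by domain)	F1 worst (by domain)	Accuracy average (by domain)	Accuracy worst (by domain)	Accuracy many-shot	Accuracy medium-shot	Accuracy few-shot	Accuracy zero-shot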
IRM	0.758	0.512	0.765	0.527	0.842	0.760	− 1	0.395
DANN	0.710	0.413	0.726	0.453	0.814	0.713	− 1	0.307
CDANN	0.726	0.477	0.743	0.493	0.828	0.740	− 1	0.302
CORAL	0.779	0.559	0.788	0.568	0.867	0.787	− 1	0.390
MMD	0.778	0.554	0.788	0.554	0.861	0.813	− 1	0.369
Focal	0.785	0.589	0.794	0.601	0.867	0.813	− 1	0.390
CBLoss	0.785	0.557	0.794	0.568	0.864	0.807	− 1	0.414
LDAM	0.774	0.558	0.781	0.554	0.861	0.787	− 1	0.362
Bsoftmax	0.818	0.647	0.824	0.655	0.867	0.873	− 1	0.524
CRT (2-stage training)	0.793	0.567	0.803	0.574	0.881	0.800	− 1	0.407
BoDA	0.779	0.578	0.789	0.595	0.869	0.800	− 1	0.362
BoDA (2-stage training)	0.804	0.629	0.812	0.635	0.881	0.833	− 1	0.426
ERM	0.775	0.530	0.785	0.547	0.869	0.760	− 1	0.407
GroupDRO	0.781	0.549	0.790	0.568	0.867	0.787	− 1	0.419
Mixup	0.769	0.522	0.778	0.520	0.861	0.780	− 1	0.350
SagNet	0.766	0.542	0.771	0.547	0.861	0.747	− 1	0.374
MLDG	0.777	0.524	0.782	0.520	0.864	0.753	− 1	0.419
MTL	0.776	0.543	0.782	0.554	0.861	0.787	− 1	0.383
Fish	0.774	0.549	0.782	0.554	0.872	0.767	− 1	0.362
BRL (ours)	0.813	0.641	0.809	0.635	0.847	0.840	− 1	0.581
BRL (2-stage training)	0.807	0.619	0.803	0.615	0.850	0.820	− 1	0.564


Table 5.

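Results on PACS-MLT. An entry of −1 indicates that no class falls in that shot category.

Algorithm	F1 average (by domain)	F1 worst (by domain)	Accuracy average (by domain)	Accuracy worst (by domain)	Accuracy many-shot	Accuracy medium-shot	Accuracy few-shot	Accuracy zero-shot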
IRM	0.964	0.949	0.964	0.949	0.961	0.980	1.000	− 1
DANN	0.930	0.902	0.928	0.897	0.922	0.980	0.960	− 1
CDANN	0.927	0.908	0.926	0.909	0.924	0.950	0.920	− 1
CORAL	0.984	0.977	0.984	0.977	0.982	1.000	1.000	− 1
MMD	0.975	0.960	0.975	0.960	0.974	0.990	0.960	− 1
Focal	0.981	0.963	0.981	0.963	0.981	0.980	0.980	− 1
CBLoss	0.979	0.971	0.979	0.971	0.978	1.000	0.980	− 1
LDAM	0.979	0.971	0.979	0.971	0.978	1.000	0.940	− 1
Bsoftmax	0.980	0.966	0.979	0.966	0.978	1.000	0.980	− 1
CRT(2-stage training)	0.984	0.972	0.984	0.971	0.984	0.980	0.980	− 1
BoDA	0.979	0.969	0.979	0.969	0.978	0.990	1.000	− 1
BoDA(2-stage training)	0.982	0.974	0.982	0.974	0.980	1.000	1.000	− 1
ERM	0.981	0.966	0.981	0.966	0.982	0.980	0.960	− 1
GroupDRO	0.982	0.972	0.981	0.971	0.981	0.980	1.000	− 1
Mixup	0.981	0.963	0.981	0.963	0.980	0.990	0.980	− 1
SagNet	0.976	0.963	0.976	0.963	0.974	1.000	1.000	− 1
MLDG	0.982	0.977	0.982	0.977	0.980	1.000	1.000	− 1
MTL	0.980	0.963	0.980	0.963	0.979	0.990	0.980	− 1
Fish	0.979	0.969	0.979	0.969	0.980	0.980	0.960	− 1
BRL(ours)	0.979	0.968	0.979	0.969	0.977	1.000	1.000	− 1
BRL(2-stage training)	0.982	0.977	0.982	0.977	0.980	1.000	1.000	− 1


Table 6.

Results on TerraInc-MLT.

Algorithm	F1 average (by domain)	F1 worst (by domain)	Accuracy average (by domain)	Accuracy worst (by domain)	Accuracy many-shot	Accuracy medium-shot	Accuracy few-shot	Accuracy zero-shot
IRM	0.486	0.421	0.586	0.530	0.752	0.237	0.183	0.073
DANN	0.434	0.379	0.527	0.464	0.646	0.275	0.133	0.227
CDANN	0.367	0.326	0.472	0.414	0.616	0.037	0.217	0.086
CORAL	0.682	0.587	0.776	0.685	0.872	0.738	0.717	0.136
MMD	0.698	0.562	0.792	0.696	0.878	0.875	0.750	0.156
Focal	0.672	0.587	0.778	0.707	0.870	0.863	0.667	0.073
CBLoss	0.696	0.559	0.776	0.657	0.852	0.812	0.750	0.254
LDAM	0.690	0.583	0.786	0.680	0.880	0.800	0.700	0.131
Bsoftmax	0.761	0.687	0.823	0.762	0.862	0.913	0.933	0.343
CRT (2-stage training)	0.737	0.622	0.841	0.724	0.910	0.950	0.900	0.122
BoDA	0.695	0.597	0.792	0.696	0.888	0.762	0.767	0.104
BoDA (2-stage training)	0.729	0.626	0.824	0.735	0.892	0.887	0.900	0.151
ERM	0.675	0.593	0.778	0.696	0.884	0.800	0.600	0.098
GroupDRO	0.602	0.551	0.705	0.596	0.820	0.700	0.367	0.104
Mixup	0.656	0.561	0.737	0.669	0.860	0.650	0.500	0.167
SagNet	0.682	0.586	0.780	0.707	0.888	0.775	0.533	0.135
MLDG	0.687	0.567	0.790	0.680	0.886	0.800	0.733	0.089
MTL	0.674	0.583	0.780	0.696	0.880	0.775	0.700	0.080
Fish	0.691	0.583	0.802	0.696	0.888	0.863	0.800	0.064
BRL (ours)	0.780	0.712	0.824	0.768	0.872	0.862	0.817	0.430
BRL (2-stage training)	0.802	0.713	0.849	0.773	0.892	0.913	0.900	0.423


Table 7.

Results on OfficeHome-MLT.

Algorithm	F1 average (by domain)	F1 worst (by domain)	Accuracy average (by domain)	Accuracy worst (by domain)	Accuracy many-shot	Accuracy medium-shot	Accuracy few-shot	Accuracy zero-shot
IRM	0.755	0.686	0.757	0.685	0.826	0.764	0.568	0.550
DANN	0.795	0.710	0.798	0.717	0.856	0.805	0.642	0.550
CDANN	0.785	0.707	0.787	0.711	0.836	0.792	0.655	0.650
CORAL	0.838	0.780	0.840	0.783	0.885	0.853	0.684	0.600
MMD	0.008	0.003	0.010	0.005	0.008	0.012	0.003	0.100
Focal	0.816	0.742	0.818	0.743	0.868	0.828	0.668	0.500
CBLoss	0.825	0.746	0.826	0.749	0.874	0.845	0.632	0.600
LDAM	0.817	0.718	0.817	0.720	0.873	0.827	0.658	0.550
Bsoftmax	0.828	0.759	0.828	0.762	0.875	0.837	0.687	0.650
CRT (2-stage training)	0.836	0.770	0.838	0.775	0.889	0.849	0.677	0.600
BoDA	0.839	0.766	0.842	0.772	0.899	0.851	0.681	0.600
BoDA(2-stage training)	0.845	0.775	0.847	0.778	0.899	0.858	0.687	0.600
ERM	0.818	0.757	0.821	0.765	0.878	0.832	0.642	0.700
GroupDRO	0.818	0.730	0.821	0.734	0.882	0.830	0.645	0.600
Mixup	0.846	0.779	0.848	0.785	0.893	0.860	0.694	0.650
SagNet	0.826	0.744	0.831	0.757	0.896	0.841	0.642	0.600
MLDG	0.824	0.748	0.826	0.752	0.892	0.830	0.665	0.600
MTL	0.820	0.746	0.820	0.743	0.873	0.829	0.658	0.650
Fish	0.822	0.736	0.825	0.745	0.893	0.836	0.616	0.700
BRL (ours)	0.832	0.776	0.833	0.775	0.879	0.837	0.716	0.600
BRL (2-stage training)	0.837	0.784	0.837	0.783	0.881	0.842	0.723	0.600

Table 8.

Results on DomainNet-MLT.

Algorithm	F1-Score (by domain)		Accuracy (by domain)		Accuracy (by shot)
Algorithm	Average	Worst	Average	Worst	Many	Medium	Few	Zero
IRM	0.136	0.032	0.169	0.057	0.205	0.128	0.061	0.063
DANN	0.538	0.311	0.551	0.305	0.601	0.540	0.389	0.318
CDANN	0.554	0.327	0.565	0.318	0.623	0.543	0.394	0.325
CORAL	0.597	0.340	0.606	0.326	0.672	0.587	0.407	0.320
MMD	0.002	0.001	0.003	0.002	0.003	0.003	0.002	0.003
Focal	0.578	0.302	0.587	0.293	0.657	0.572	0.369	0.279
CBLoss	0.588	0.318	0.599	0.321	0.651	0.618	0.439	0.298
LDAM	0.586	0.307	0.597	0.301	0.666	0.582	0.389	0.288
Bsoftmax	0.605	0.344	0.613	0.340	0.658	0.615	0.482	0.386
CRT (2-stage training)	0.617	0.340	0.629	0.346	0.685	0.647	0.457	0.313
BoDA	0.597	0.348	0.607	0.337	0.669	0.592	0.413	0.336
BoDA (2-stage training)	0.620	0.366	0.632	0.365	0.683	0.640	0.492	0.350
ERM	0.586	0.307	0.596	0.300	0.666	0.579	0.383	0.283
GroupDRO	0.536	0.297	0.550	0.302	0.622	0.521	0.357	0.239
Mixup	0.583	0.329	0.595	0.323	0.660	0.578	0.401	0.306
SagNet	0.588	0.312	0.598	0.305	0.668	0.581	0.373	0.291
MLDG	0.589	0.314	0.600	0.310	0.667	0.583	0.392	0.298
MTL	0.584	0.311	0.593	0.306	0.665	0.571	0.373	0.288
Fish	0.588	0.317	0.600	0.313	0.668	0.583	0.397	0.291
BRL (ours)	0.569	0.361	0.583	0.362	0.611	0.614	0.483	0.377
BRL (2-stage training)	0.588	0.370	0.602	0.372	0.636	0.631	0.504	0.365

Table 9.

Results over all MDLT benchmarks.

Algorithm	F1-Score (by domain)		Accuracy (by domain)		Accuracy (by shot)
Algorithm	Average	Worst	Average	Worst	Many	Medium	Few	Zero
IRM	0.530	0.439	0.565	0.475	0.644	0.479	0.370	0.270
DANN	0.676	0.555	0.697	0.575	0.767	0.645	0.526	0.351
CDANN	0.657	0.543	0.682	0.562	0.769	0.590	0.499	0.341
CORAL	0.748	0.630	0.769	0.650	0.849	0.747	0.621	0.362
MMD	0.415	0.351	0.444	0.384	0.467	0.449	0.397	0.157
Focal	0.723	0.601	0.749	0.629	0.836	0.736	0.578	0.311
CBLoss	0.727	0.585	0.748	0.609	0.822	0.751	0.590	0.392
LDAM	0.732	0.590	0.756	0.612	0.842	0.745	0.577	0.333
Bsoftmax	0.765	0.648	0.779	0.664	0.826	0.803	0.688	0.476
CRT (2-stage training)	0.750	0.620	0.776	0.647	0.856	0.779	0.637	0.361
BoDA	0.741	0.619	0.765	0.644	0.853	0.739	0.617	0.351
BoDA (2-stage training)	0.764	0.652	0.786	0.674	0.858	0.787	0.676	0.382
ERM	0.719	0.588	0.748	0.618	0.844	0.719	0.539	0.372
GroupDRO	0.699	0.591	0.728	0.611	0.830	0.693	0.491	0.341
Mixup	0.734	0.601	0.755	0.625	0.843	0.722	0.563	0.368
SagNet	0.734	0.611	0.758	0.639	0.854	0.736	0.544	0.350
MLDG	0.724	0.586	0.752	0.613	0.846	0.726	0.576	0.352
MTL	0.719	0.585	0.746	0.613	0.840	0.720	0.563	0.350
Fish	0.717	0.587	0.748	0.616	0.845	0.726	0.570	0.354
BRL (ours)	0.774	0.681	0.784	0.691	0.828	0.794	0.705	0.497
BRL (2-stage training)	0.782	0.683	0.792	0.693	0.838	0.804	0.723	0.488

Overall, the proposed model performs well across most evaluation measures, especially on the “worst” measure (i.e., performance on the most difficult classes), as shown in Table 9. This robustness is most evident on the Digits-MLT and TerraInc-MLT datasets, where the model shows strong recognition of hard-to-classify categories. The two-stage training approach also outperforms single-stage training in most cases. For few-shot and zero-shot classes, however, two-stage training may overfit because so few samples are available, which can lead to slightly lower results (for example, zero-shot accuracy of 0.488 versus 0.497 in Table 9). Even so, in the aggregated results of Table 9 our method surpasses the baseline algorithms on both the few-shot and zero-shot measures. Figure 4 displays a heatmap of the macro F1 scores per class and domain for each dataset (DomainNet is excluded because its 345 classes limit visual clarity). Darker cells indicate lower F1 scores, highlighting domain-class combinations that are more difficult to recognize.

Fig. 4 — F1 score heatmap of the proposed model on different datasets.

To further assess the stability of the algorithm’s performance, we conducted five independent runs for each method using distinct random seeds, resulting in five sets of experimental outcomes per method. We employed analysis of variance (ANOVA) to calculate p-values across different performance metrics. As shown in Table 10, the ANOVA results for the Digits-MLT dataset indicate that the p-values for every metric with a defined result (the zero-shot column is NaN) are far below standard significance thresholds (e.g., 0.05), confirming that the observed performance differences among the compared algorithms are statistically significant. To visualize the distribution of F1 scores, violin plots were generated (Fig. 5), depicting the spread of both average and worst-case F1 scores across algorithms. These plots show that the proposed method achieves F1 scores that are concentrated in the upper range, with a narrow gap between the average and worst-case values and relatively low variance, suggesting both superior and stable performance.

Table 10.

ANOVA results on Digits-MLT.

Measure	F1-Score (by domain)		Accuracy (by domain)		Accuracy (by shot)
Measure	Average	Worst	Average	Worst	Many	Medium	Few	Zero
PR(>F)	2.464e-19	2.35e-16	5.076e-20	2.401e-16	6.374e-24	3.038e-18	6.709e-18	NaN

Fig. 5 — Distribution of F1 score for different algorithms on Digits-MLT.
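
As a rough illustration of the one-way ANOVA behind Table 10, the sketch below computes the F statistic for several methods, each with five runs. The method names and scores are hypothetical placeholders, not values from the paper, and the function name `one_way_anova_f` is introduced here for illustration only.

```python
import math

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists; each inner list holds one method's
    scores across its independent runs.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of the group means.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of runs around their own mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        # Degenerate case: no run-to-run variation, so F is undefined.
        return math.nan, df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical average-F1 scores, five seeds per method (not from the paper).
scores = {
    "method_a": [0.60, 0.61, 0.59, 0.60, 0.62],
    "method_b": [0.66, 0.67, 0.65, 0.66, 0.68],
    "method_c": [0.58, 0.57, 0.59, 0.58, 0.56],
}
f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

The p-value reported as PR(>F) is the upper-tail probability of the F distribution at this statistic with the given degrees of freedom; in practice it can be obtained with `scipy.stats.f_oneway`. An undefined statistic, as in the degenerate branch above, is one way a NaN entry can arise, although the source does not state the cause of the NaN in the zero-shot column.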

In summary, while the proposed model may not be the top performer on every dataset, it remains close to optimal in most cases. Moreover, it exhibits a smaller discrepancy between “average” and “worst-case” results compared to other algorithms, indicating greater stability. These findings support the effectiveness of the proposed model in the MDLT image classification task, especially in improving recognition performance for difficult classes.
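
The "average" and "worst" columns can be illustrated with a small sketch. It assumes, as the table headers suggest, that the metric is computed separately for each domain and then averaged or minimised over domains; the helper names and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

def average_and_worst(per_domain):
    """per_domain maps a domain name to a (y_true, y_pred) pair."""
    scores = {d: macro_f1(t, p) for d, (t, p) in per_domain.items()}
    average = sum(scores.values()) / len(scores)
    worst = min(scores.values())
    return average, worst, scores

# Toy predictions for two hypothetical domains.
per_domain = {
    "domain_a": ([0, 0, 1, 1], [0, 0, 1, 1]),
    "domain_b": ([0, 0, 1, 1], [0, 1, 1, 1]),
}
average, worst, scores = average_and_worst(per_domain)
print(f"average = {average:.3f}, worst = {worst:.3f}, gap = {average - worst:.3f}")
```

A small gap between the two values means that no single domain lags far behind the others, which is the sense of stability discussed above.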

Qualitative analysis via visualization

To further analyze the model’s capabilities, we performed a t-SNE visualization of the test set features after training, using both class labels and domain labels for interpretation. Figure 6 illustrates the t-SNE results for the Digits-MLT test set. For each model, the upper row presents the t-SNE plot colored by class labels, while the lower row shows the same features colored by domain labels. In the class-based visualizations, the feature clustering produced by our model closely resembles that of DANN and BoDA, which aligns with the quantitative results in Table 3. The domain-based visualization, however, reveals significant differences. In domain-invariant representation learning, ideal feature representations should minimize differences between domains, so better alignment of feature distributions across domains reflects stronger generalization and reduced domain-specific bias. As shown in Fig. 6, our model demonstrates greater overlap between domains, with data points from domain A aligning closely with those from domain B. In this way, shortcut learning, in which classification is based on domain labels, is effectively mitigated.

Fig. 6 — t-SNE of the Digits-MLT test set for different models.

Parameter analysis

The proposed method incorporates three key parameters, which control the balance between classification accuracy and the correction of imbalance-related errors. In our main experiments, these parameters were held at fixed values. To analyze the effect of each parameter on model performance, a series of experiments was performed over a range of values, with one parameter varied at a time and the other two fixed. Figure 7 shows the results on the Digits-MLT dataset, reporting average and worst-case F1 scores and accuracy. In the figure, solid lines represent average values, dashed lines represent worst-case values, colors differentiate the parameters, and the shaded area between lines of the same color indicates the difference between average and worst-case performance under different parameter values. The results show that the model maintains robust performance as the parameter weighting the CB component increases, indicating its resilience when emphasizing that component. For the parameter weighting the RB component, the best test performance is achieved at approximately 0.01; further increases lead to a rapid decline in performance, likely due to overcorrection in the RB component. For the remaining parameter, model performance fluctuates only slightly within a certain range, but classification performance declines significantly once the value becomes excessively large (for example, greater than 100).

Fig. 7 — Single-parameter analysis results under different parameter values on Digits-MLT.
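
The one-parameter-at-a-time protocol described above can be sketched as follows. The parameter names (`w_cb`, `w_rb`, `w_penalty`), the default values, and the candidate grid are placeholders chosen for illustration; they are not the symbols or settings used in the paper.

```python
def one_at_a_time_grid(defaults, candidates):
    """Yield (varied_name, config) pairs, changing one parameter per config."""
    for name in defaults:
        for value in candidates:
            config = dict(defaults)  # copy so the other parameters stay fixed
            config[name] = value
            yield name, config

# Placeholder names and values (assumptions, not the paper's settings).
defaults = {"w_cb": 1.0, "w_rb": 1.0, "w_penalty": 1.0}
candidates = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

for varied, config in one_at_a_time_grid(defaults, candidates):
    # A real study would train and evaluate the model here, recording
    # average and worst-case F1 and accuracy for each configuration.
    print(varied, config)
```

This design needs only one training run per parameter per candidate value, at the cost of not exploring interactions between parameters.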

During the single-parameter analysis on the Digits-MLT dataset, the proposed algorithm achieved relatively high F1 scores and accuracy under one particular setting of the three hyperparameters. To further investigate the convergence behavior of the algorithm under this setting, both the loss values and F1 scores were recorded at intervals of 100 iterations over a total of 5000 training iterations. The results are illustrated in Fig. 8. The loss value decreased consistently with the number of iterations, indicating progressive convergence. At the same time, the F1 score rose steadily as the loss decreased and eventually stabilized, demonstrating a favorable convergence trend.

Fig. 8 — Iterative training results on Digits-MLT.

Based on the parameter analysis, it is evident that relatively small values for the hyperparameters tend to result in higher F1 scores and accuracy in practical applications. This implies that during model optimization, priority should be given to exploring smaller parameter ranges to enhance performance more effectively.

Ablation studies

To assess the contribution of individual components within the proposed model, ablation studies were conducted by selectively enabling or disabling key modules: the balanced cross-entropy loss (BCE), CB component, RB component, and cross-domain penalty term (Penalty). The impact of each component on model performance was assessed using the benchmark Digits-MLT dataset, with results summarized in Table 11. The results show that the inclusion of all components (as in experiment number 8) yields the best overall performance, with an average F1 score of 0.672 and a worst-case F1 score of 0.630. These findings indicate that the full combination of these components provides the most robust and consistent improvements in the performance of the model.

Table 11.

Ablation studies on Digits-MLT.

No.	CB	RB	Penalty	F1-Score (by domain)		Accuracy (by domain)
No.	CB	RB	Penalty	Average	Worst	Average	Worst
1	–	–	–	0.595	0.485	0.604	0.501
2	–	–	✓	0.661	0.618	0.664	0.624
3	✓	–	✓	0.661	0.618	0.664	0.624
4	–	✓	✓	0.645	0.596	0.649	0.602
5	✓	–	–	0.595	0.485	0.604	0.501
6	–	✓	–	0.633	0.554	0.640	0.562
7	✓	✓	–	0.648	0.580	0.659	0.594
8	✓	✓	✓	0.672	0.630	0.675	0.635
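
The eight ablation settings correspond to every on/off combination of the three switchable components. The sketch below enumerates them and shows one possible way a switch could enter the objective; the additive form of `total_loss` is an assumption made for illustration, not the paper's formulation, and the enumeration order differs from the numbering in Table 11.

```python
from itertools import product

COMPONENTS = ("CB", "RB", "Penalty")

def ablation_configs():
    """All on/off combinations of the three switchable components."""
    return [
        dict(zip(COMPONENTS, flags))
        for flags in product((False, True), repeat=len(COMPONENTS))
    ]

def total_loss(base_loss, terms, enabled):
    """Base loss plus each enabled term (an assumed additive form)."""
    return base_loss + sum(v for name, v in terms.items() if enabled[name])

# Hypothetical per-term values for a single training step.
terms = {"CB": 0.5, "RB": 0.25, "Penalty": 0.125}
for enabled in ablation_configs():
    active = [name for name in COMPONENTS if enabled[name]] or ["none"]
    print("+".join(active), total_loss(1.0, terms, enabled))
```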

Discussion

Our paper presents a comprehensive evaluation of the BRL algorithm across six datasets, including both synthetic and real-world MDLT scenarios. With two-stage training, the algorithm achieved an average accuracy of 79.2% and a worst-case accuracy of 69.3% over all benchmarks (Table 9), highlighting its robustness across varying conditions. Notably, BRL outperformed the state-of-the-art BoDA method by 1.9 percentage points in worst-case accuracy and the widely used ERM baseline by 7.5 percentage points. In this context, ERM refers to a classical training paradigm in machine learning. Its core principle is to treat the training data as samples drawn from the true data distribution and to optimize model parameters by minimizing the loss function over this dataset, thereby reducing prediction error on the training samples. However, ERM has inherent limitations: it is highly sensitive to distributional shifts, prone to overfitting during training, and often generalizes poorly to data outside the training distribution. Despite these drawbacks, ERM remains a frequently employed benchmark in academic research due to its clear theoretical foundation and ease of implementation. For these reasons, ERM was included in our study as a baseline to highlight the effectiveness of the proposed approach. The performance gains achieved by BRL demonstrate the effectiveness of its reweighting-based balanced representation learning method, which incorporates both CB and RB to address class and domain imbalance in MDLT image recognition.
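
The margins quoted above can be checked directly against the accuracy (by domain) columns of Table 9. The short script below is only a worked arithmetic check; the helper name `points` is introduced here for illustration.

```python
# Accuracy (by domain) from Table 9: (average, worst).
table9 = {
    "ERM": (0.748, 0.618),
    "BoDA (2-stage training)": (0.786, 0.674),
    "BRL (2-stage training)": (0.792, 0.693),
}

def points(x):
    """Convert a fraction to percentage points, rounded to one decimal."""
    return round(100 * x, 1)

_, brl_worst = table9["BRL (2-stage training)"]
for name in ("BoDA (2-stage training)", "ERM"):
    _, worst = table9[name]
    print(f"BRL vs {name}: +{points(brl_worst - worst)} points (worst-case accuracy)")

for name, (avg, worst) in table9.items():
    print(f"{name}: average-worst gap = {points(avg - worst)} points")
```

The first loop reproduces the 1.9- and 7.5-point margins, and the second shows the gap between average and worst-case accuracy for each of the three methods.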

The ability of BRL to learn domain- and class-unbiased representations through alignment in both the input and latent feature space is central to its success. Despite these strengths, there remain opportunities for further research. For example, future research could focus on optimizing the alignment strategies to achieve higher accuracy, especially on complex or open-set datasets. Additionally, exploring the adaptability of the BRL algorithm to different data characteristics across scientific applications will be critical to ensuring its broader applicability and practical adoption.

Conclusions and future work

In conclusion, this study addressed the challenge of data and domain imbalance in MDLT image recognition by introducing the BRL algorithm. Extensive experimental validation on six benchmark datasets demonstrated the effectiveness of BRL. By learning more balanced feature representations, BRL improved classification performance, particularly for the most difficult-to-identify classes. Beyond computer vision, the BRL algorithm showed considerable potential for broader scientific applications, including species classification in biodiversity research and disease diagnosis in biomedical imaging. These results highlighted the promise of the proposed algorithm for image recognition tasks and its potential utility in data classification across various scientific disciplines.

Despite these encouraging results, several limitations remain to be addressed. First, the method relies on three key hyperparameters, and the experimental results show that model performance is sensitive to their selection. Developing an adaptive or automated parameter optimization strategy is an important direction for future research. Second, the current framework incorporates feature information from the test data during training and assumes that all classes present in the test set have been seen during training. This closed-set assumption restricts the model’s ability to generalize to unseen classes and domains. Consequently, enhancing the model’s adaptability and robustness in open-domain settings, where novel classes and distributions may be encountered, remains a critical area for future investigation.

Acknowledgements

We would like to express our sincere appreciation to our research team for their valuable contributions to this work. Their insights, feedback, and support played a crucial role in shaping the content and improving the overall quality of the manuscript. We also extend our gratitude to the researchers, authors, and publishers of the studies cited in this paper for their significant contributions to the field.

Author contributions

All authors contributed to the conception and design of the study. P.F. contributed to the study by designing and conducting the experiments, as well as performing the data analysis. N.I.R.R. organized and revised the manuscript. J.W. was in charge of data collection and preprocessing. All authors reviewed and approved the final version of the manuscript.

Funding

This work was supported in part by the Ministry of Education Industry-University Cooperation and Collaborative Education Project under Grant 202002165017, and in part by the Scientific Research Project of Higher Education Institutions in Anhui under Grant 2024AH051814.

Data availability

The experimental data used in this study can be accessed through the codebase provided by Yang et al. We thank the authors for making their work publicly available and for their contributions to this area of research. The data is also available upon request from the corresponding author.

Code availability

The code used in this study is available upon request from the corresponding author.

Declarations

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
2. Shao, L., Zhu, F. & Li, X. Transfer learning for visual categorization: A survey. IEEE Trans. Neural Netw. Learn. Syst. 26, 1019–1034 (2014).
3. Zhang, Y., Kang, B., Hooi, B., Yan, S. & Feng, J. Deep long-tailed learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 10795–10816 (2023).
4. Yang, Y., Wang, H. & Katabi, D. On multi-domain long-tailed recognition, imbalanced domain generalization and beyond. In European Conference on Computer Vision, 57–75 (Springer, 2022).
5. Koh, P. W. et al. Wilds: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 5637–5664 (PMLR, 2021).
6. Gu, X. et al. Tackling long-tailed category distribution under domain shifts. In European Conference on Computer Vision, 727–743 (Springer, 2022).
7. Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019).
8. Shen, J., Qu, Y., Zhang, W. & Yu, Y. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).
9. Yang, X., Yao, H., Zhou, A. & Finn, C. Multi-domain long-tailed learning by augmenting disentangled representations. arXiv preprint arXiv:2210.14358 (2022).
10. Cheng, L., Guo, R., Candan, K. S. & Liu, H. Representation learning for imbalanced cross-domain classification. In Proceedings of the 2020 SIAM International Conference on Data Mining, 478–486 (SIAM, 2020).
11. Buda, M., Maki, A. & Mazurowski, M. A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, 249–259 (2018).
12. Wang, Y.-X., Ramanan, D. & Hebert, M. Learning to model the tail. In Advances in Neural Information Processing Systems, vol. 30 (2017).
13. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016).
14. Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2980–2988 (2017).
15. Ren, J. et al. Balanced meta-softmax for long-tailed visual recognition. Adv. Neural. Inf. Process. Syst. 33, 4175–4186 (2020).
16. Hong, Y. et al. Disentangling label distribution for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6626–6636 (2021).
17. Kang, B. et al. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217 (2019).
18. Tang, K., Huang, J. & Zhang, H. Long-tailed classification by keeping the good and removing the bad momentum causal effect. Adv. Neural. Inf. Process. Syst. 33, 1513–1524 (2020).
19. Wang, P., Han, K., Wei, X.-S., Zhang, L. & Wang, L. Contrastive learning based hybrid networks for long-tailed image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 943–952 (2021).
20. Iscen, A., Araujo, A., Gong, B. & Schmid, C. Class-balanced distillation for long-tailed visual recognition. arXiv preprint arXiv:2104.05279 (2021).
21. Wang, X., Lian, L., Miao, Z., Liu, Z. & Yu, S. X. Long-tailed recognition by routing diverse distribution-aware experts. arXiv preprint arXiv:2010.01809 (2020).
22. Jamal, M. A., Brown, M., Yang, M.-H., Wang, L. & Gong, B. Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7610–7619 (2020).
23. Li, S. et al. Metasaug: Meta semantic augmentation for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5212–5221 (2021).
24. Ding, Y. et al. Deep imbalanced domain adaptation for transfer learning fault diagnosis of bearings under multiple working conditions. Reliabil. Eng. Syst. Saf. 230, 108890 (2023).
25. Xia, H., Jing, T. & Ding, Z. Generative inference network for imbalanced domain generalization. IEEE Trans. Image Process. 32, 1694–1704 (2023).
26. Le-Khac, P. H., Healy, G. & Smeaton, A. F. Contrastive representation learning: A framework and review. IEEE Access 8, 193907–193934 (2020).
27. Duan, L., Xu, D. & Tsang, I.W.-H. Domain adaptation from multiple sources: A domain-dependent regularization approach. IEEE Trans. Neural Netw. Learn. Syst. 23, 504–518 (2012).
28. Jhuo, I.-H., Liu, D., Lee, D. & Chang, S.-F. Robust visual domain adaptation with low-rank reconstruction. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2168–2175 (IEEE, 2012).
29. Liu, H., Shao, M. & Fu, Y. Structure-preserved multi-source domain adaptation. In 2016 IEEE 16th International Conference on Data Mining (ICDM), 1059–1064 (IEEE, 2016).
30. Kandemir, M. Asymmetric transfer learning with deep gaussian processes. In International Conference on Machine Learning, 730–738 (PMLR, 2015).
31. Courty, N., Flamary, R., Tuia, D. & Rakotomamonjy, A. Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1853–1865 (2016).
32. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K. & Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014).
33. Sun, B., Feng, J. & Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).
34. Sun, B., Feng, J. & Saenko, K. Correlation alignment for unsupervised domain adaptation. Domain Adaptation in Computer Vision Applications 153–171 (2017).
35. Sun, B. & Saenko, K. Deep coral: Correlation alignment for deep domain adaptation. In Computer Vision – ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part III 14, 443–450 (Springer, 2016).
36. Hainmueller, J. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20, 25–46 (2012).
37. Kuang, K., Cui, P., Li, B., Jiang, M. & Yang, S. Estimating treatment effect in the wild via differentiated confounder balancing. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 265–274 (2017).
38. Müller, A. Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29, 429–443 (1997).
39. Li, Y., Gong, M., Tian, X., Liu, T. & Tao, D. Domain generalization via conditional invariant representations. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).
40. Li, H., Pan, S. J., Wang, S. & Kot, A. C. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5400–5409 (2018).
41. Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9268–9277 (2019).
42. Cao, K., Wei, C., Gaidon, A., Arechiga, N. & Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. Adv. Neural Inf. Process. Syst. 32 (2019).
43. Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 10, 988–999 (1999).
44. Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731 (2019).
45. Xu, M. et al. Adversarial domain adaptation with domain mixup. Proc. AAAI Conf. Artif. Intell. 34, 6502–6509 (2020).
46. Nam, H., Lee, H., Park, J., Yoon, W. & Yoo, D. Reducing domain gap by reducing style bias. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8690–8699 (2021).
47. Li, D., Yang, Y., Song, Y.-Z. & Hospedales, T. Learning to generalize: Meta-learning for domain generalization. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).
48. Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G. & Scott, C. Domain generalization by marginal transfer learning. J. Mach. Learn. Res. 22, 1–55 (2021).
49. Shi, Y. et al. Gradient matching for domain generalization. arXiv preprint arXiv:2104.09937 (2021).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The code used in this study is available upon request from the corresponding author.

[CR1] 1.He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).

[CR2] 2.Shao, L., Zhu, F. & Li, X. Transfer learning for visual categorization: A survey. IEEE Trans. Neural Netw. Learn. Syst.26, 1019–1034 (2014). [DOI] [PubMed] [Google Scholar]

[CR3] 3.Zhang, Y., Kang, B., Hooi, B., Yan, S. & Feng, J. Deep long-tailed learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell.45, 10795–10816 (2023). [DOI] [PubMed] [Google Scholar]

[CR4] 4.Yang, Y., Wang, H. & Katabi, D. On multi-domain long-tailed recognition, imbalanced domain generalization and beyond. In European Conference on Computer Vision, 57–75 (Springer, 2022).

[CR5] 5.Koh, P. W. et al. Wilds: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 5637–5664 (PMLR, 2021).

[CR6] 6.Gu, X. et al. Tackling long-tailed category distribution under domain shifts. In European Conference on Computer Vision, 727–743 (Springer, 2022).

[CR7] 7.Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019).

[CR8] 8.Shen, J., Qu, Y., Zhang, W. & Yu, Y. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).

[CR9] 9.Yang, X., Yao, H., Zhou, A. & Finn, C. Multi-domain long-tailed learning by augmenting disentangled representations. arXiv preprint arXiv:2210.14358 (2022).

[CR10] 10.Cheng, L., Guo, R., Candan, K. S. & Liu, H. Representation learning for imbalanced cross-domain classification. In Proceedings of the 2020 SIAM International Conference on Data Mining, 478–486 (SIAM, 2020).

[CR11] 11.Buda, M., Maki, A. & Mazurowski, M. A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw.106, 249–259 (2018). [DOI] [PubMed] [Google Scholar]

[CR12] 12.Wang, Y.-X., Ramanan, D. & Hebert, M. Learning to model the tail. In Advances in Neural Information Processing Systems, vol. 30 (2017).

[CR13] 13.Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res.17, 1–35 (2016). [Google Scholar]

[CR14] 14.Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2980–2988 (2017).

[CR15] 15.Ren, J. et al. Balanced meta-softmax for long-tailed visual recognition. Adv. Neural. Inf. Process. Syst.33, 4175–4186 (2020). [Google Scholar]

[CR16] 16.Hong, Y. et al. Disentangling label distribution for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6626–6636 (2021).

[CR17] 17.Kang, B. et al. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217 (2019).

[CR18] 18.Tang, K., Huang, J. & Zhang, H. Long-tailed classification by keeping the good and removing the bad momentum causal effect. Adv. Neural. Inf. Process. Syst.33, 1513–1524 (2020). [Google Scholar]

[CR19] 19.Wang, P., Han, K., Wei, X.-S., Zhang, L. & Wang, L. Contrastive learning based hybrid networks for long-tailed image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 943–952 (2021).

[CR20] 20.Iscen, A., Araujo, A., Gong, B. & Schmid, C. Class-balanced distillation for long-tailed visual recognition. arXiv preprint arXiv:2104.05279 (2021).

[CR21] 21.Wang, X., Lian, L., Miao, Z., Liu, Z. & Yu, S. X. Long-tailed recognition by routing diverse distribution-aware experts. arXiv preprint arXiv:2010.01809 (2020).

[CR22] 22.Jamal, M. A., Brown, M., Yang, M.-H., Wang, L. & Gong, B. Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7610–7619 (2020).

[CR23] 23.Li, S. et al. Metasaug: Meta semantic augmentation for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5212–5221 (2021).

[CR24] 24.Ding, Y. et al. Deep imbalanced domain adaptation for transfer learning fault diagnosis of bearings under multiple working conditions. Reliabil. Eng. Syst. Saf.230, 108890 (2023). [Google Scholar]

[CR25] 25.Xia, H., Jing, T. & Ding, Z. Generative inference network for imbalanced domain generalization. IEEE Trans. Image Process.32, 1694–1704 (2023). [DOI] [PubMed] [Google Scholar]

[CR26] 26.Le-Khac, P. H., Healy, G. & Smeaton, A. F. Contrastive representation learning: A framework and review. IEEE Access8, 193907–193934 (2020). [Google Scholar]

[CR27] 27.Duan, L., Xu, D. & Tsang, I.W.-H. Domain adaptation from multiple sources: A domain-dependent regularization approach. IEEE Trans. Neural Netw. Learn. Syst.23, 504–518 (2012). [DOI] [PubMed] [Google Scholar]

[CR28] 28.Jhuo, I.-H., Liu, D., Lee, D. & Chang, S.-F. Robust visual domain adaptation with low-rank reconstruction. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2168–2175 (IEEE, 2012).

[CR29] 29.Liu, H., Shao, M. & Fu, Y. Structure-preserved multi-source domain adaptation. In 2016 IEEE 16th International Conference on Data Mining (ICDM), 1059–1064 (IEEE, 2016).

[CR30] 30.Kandemir, M. Asymmetric transfer learning with deep gaussian processes. In International Conference on Machine Learning, 730–738 (PMLR, 2015).

[CR31] 31.Courty, N., Flamary, R., Tuia, D. & Rakotomamonjy, A. Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell.39, 1853–1865 (2016). [DOI] [PubMed] [Google Scholar]

[CR32] 32.Tzeng, E., Hoffman, J., Zhang, N., Saenko, K. & Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014).

[CR33] 33.Sun, B., Feng, J. & Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).

[CR34] 34.Sun, B., Feng, J. & Saenko, K. Correlation alignment for unsupervised domain adaptation. Domain Adaptation in Computer Vision Applications 153–171 (2017).

[CR35] 35.Sun, B. & Saenko, K. Deep coral: Correlation alignment for deep domain adaptation. In Computer Vision—ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part III 14, 443–450 (Springer, 2016).

[CR36] 36.Hainmueller, J. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal.20, 25–46 (2012). [Google Scholar]

[CR37] 37. Kuang, K., Cui, P., Li, B., Jiang, M. & Yang, S. Estimating treatment effect in the wild via differentiated confounder balancing. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 265–274 (2017).

[CR38] 38. Müller, A. Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29, 429–443 (1997).

[CR39] 39. Li, Y., Gong, M., Tian, X., Liu, T. & Tao, D. Domain generalization via conditional invariant representations. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).

[CR40] 40. Li, H., Pan, S. J., Wang, S. & Kot, A. C. Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5400–5409 (2018).

[CR41] 41. Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9268–9277 (2019).

[CR42] 42. Cao, K., Wei, C., Gaidon, A., Arechiga, N. & Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. Adv. Neural Inf. Process. Syst. 32 (2019).

[CR43] 43. Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 10, 988–999 (1999).

[CR44] 44. Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731 (2019).

[CR45] 45. Xu, M. et al. Adversarial domain adaptation with domain mixup. Proc. AAAI Conf. Artif. Intell. 34, 6502–6509 (2020).

[CR46] 46. Nam, H., Lee, H., Park, J., Yoon, W. & Yoo, D. Reducing domain gap by reducing style bias. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8690–8699 (2021).

[CR47] 47. Li, D., Yang, Y., Song, Y.-Z. & Hospedales, T. Learning to generalize: Meta-learning for domain generalization. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018).

[CR48] 48. Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G. & Scott, C. Domain generalization by marginal transfer learning. J. Mach. Learn. Res. 22, 1–55 (2021).

[CR49] 49. Shi, Y. et al. Gradient matching for domain generalization. arXiv preprint arXiv:2104.09937 (2021).
