PLoS One. 2021 Jul 9;16(7):e0253415. doi: 10.1371/journal.pone.0253415

Unsupervised multi-source domain adaptation with no observable source data

Hyunsik Jeon 1, Seongmin Lee 1, U Kang 1,*
Editor: Thippa Reddy Gadekallu
PMCID: PMC8270218  PMID: 34242258

Abstract

Given trained models from multiple source domains, how can we predict the labels of unlabeled data in a target domain? Unsupervised multi-source domain adaptation (UMDA) aims to predict the labels of unlabeled target data by transferring the knowledge of multiple source domains. UMDA is a crucial problem in many real-world scenarios where no labeled target data are available. Previous approaches to UMDA assume that data are observable in all domains. However, in many practical scenarios source data are not easily accessible due to privacy or confidentiality issues, although classifiers trained in the source domains are readily available. In this work, we target data-free UMDA, where source data are not observable at all: a realistic and crucial problem that has not been studied before. To solve data-free UMDA, we propose DEMS (Data-free Exploitation of Multiple Sources), a novel architecture that adapts target data to the source domains without exploiting any source data, and estimates the target labels by exploiting pre-trained source classifiers. Extensive experiments for data-free UMDA on real-world datasets show that DEMS provides the state-of-the-art accuracy, up to 27.5 percentage points higher than that of the best baseline.

Introduction

Given trained models from multiple source domains, how can we predict the labels of unlabeled data in a target domain? Unsupervised multi-source domain adaptation (UMDA) aims to predict the labels of unlabeled target data by utilizing the knowledge of multiple source domains. Many previous works [1–9] on UMDA have focused on finding domain-invariant features $z$ of data $x$ in order to transfer the knowledge of the conditional probability $p(y|z)$, where $y$ denotes the label of data $x$, from the source domains to the target domain. It is thus essential for UMDA that the data $x$ be observable in all domains, so that the conditional probabilities $p(z|x)$ of all domains can be estimated while finding the domain-invariant features $z$.

However, in many practical scenarios source data are not accessible due to privacy or confidentiality issues, although models of the conditional probability $p(y|x)$ learned in the source domains are readily available. For instance, a hospital may access disease classifiers trained at other hospitals, but not the data those classifiers were trained on, because of privacy concerns. Fig 1 illustrates the UMDA problem under two different constraints. Finding a shared manifold $z$ and translating data between domains is problematic when source data are not observable at all (Fig 1b), compared to the setting where data are observable in all domains (Fig 1a).

Fig 1. An illustration of unsupervised multi-source domain adaptation (UMDA) problems.


(a) illustrates the UMDA problem with observable source data, and (b) illustrates the data-free UMDA problem with no observable source data. It is challenging to reduce the distribution discrepancy between the source and target domains in (b) since there are no accessible source data.

In this paper, we focus on data-free UMDA (Fig 1b), a more difficult but practical problem of knowledge transfer from multiple source domains to an unlabeled target domain. The main challenges are: 1) we cannot directly estimate the target conditional probability $p(y|x)$ since target labels are not given, and 2) we cannot directly learn the shared manifold $z$ between domains since there is no information about the source domain data distributions $p(x)$. We propose DEMS (Data-free Exploitation of Multiple Sources), a novel architecture that adapts target data to the source domains without using any source data and estimates the target labels by exploiting pre-trained source classifiers. To the best of our knowledge, there has been no previous approach for data-free UMDA.

Table 1 compares DEMS with baseline algorithms for data-free UMDA from various perspectives. Since data-free UMDA is a new problem with no previous studies, we introduce several baselines. The first one is Best Single Source, which applies each source classifier individually and selects the best-performing one. The second one is Average, which averages the results of all source classifiers. The third one is Weighted Sum, which combines the results of all source classifiers using heuristically computed domain proximities. DEMS is the only method that utilizes multiple sources, considers domain proximity, and adapts target data to the source domains. Table 2 lists the symbols used in this paper. The contributions of this work are as follows:

Table 1. Comparison of DEMS and other methods.

Method | Utilizes multiple sources | Considers domain proximity | Domain adaptation
Best Single Source | X | X | X
Average | O | X | X
Weighted Sum | O | O | X
DEMS (proposed) | O | O | O

DEMS is the only method supporting all the desired properties.

Table 2. Table of frequently-used symbols.

Symbol | Description
$T$ | Target domain
$S_k$ | $k$-th source domain
$M_{S_k}$ | $k$-th source classifier
$x_T$, $y_T$ | Data and label of the target domain
$x_{S_k}$, $y_{S_k}$ | Data and label of the $k$-th source domain
$A_k$ | Adaptation model from the target domain to the $k$-th source domain
$E$ | Encoder
$D_T$ | Decoder for the target domain
$D_{S_k}$ | Decoder for the $k$-th source domain
  • Problem Formulation. We formulate the new problem of data-free UMDA, which is a challenging but important task for transfer learning (see Fig 1b). Unlike traditional UMDA, data-free UMDA needs to handle the issue of inaccessible source data.

  • Approach. We propose DEMS, a novel approach to solve data-free UMDA. DEMS adapts target data to source domains and exploits given source classifiers based on our proposed domain proximity. DEMS learns the adaptation functions while regulating the classification results of the source classifiers after adaptation.

  • Performance. Our extensive experiments demonstrate that DEMS provides the state-of-the-art accuracy, up to 27.5 percentage points higher than that of the best baseline (see Fig 2).

Fig 2. Classification accuracy.


DEMS shows the best classification accuracy for five target domains; each percentage indicates the accuracy increase compared to the second-best one for each target domain.

Related work

Domain adaptation (DA) aims at transferring the knowledge of a source domain to a different but related target domain. Unsupervised domain adaptation (UDA) leverages a labeled source dataset to predict the labels of an unlabeled target dataset. Various approaches for UDA have been proposed, including adversarial methods [10–13], distance-based methods [14–18], and optimal transport [19, 20].

Recent works [1–9] address unsupervised multi-source domain adaptation (UMDA), which transfers knowledge from multiple source domains rather than a single one to an unlabeled target domain. UMDA offers the potential for superior performance by exploiting the knowledge of multiple source domains, but poses the challenges of reducing the discrepancy between multiple domains and obtaining appropriate domain-invariant features. Many previous works have tackled UMDA with various approaches; Table 3 summarizes their key differences. Zhao et al. [5] propose an adversarial-network-based approach with generalization bounds for UMDA. Xu et al. [6] propose Deep Cocktail Network, which addresses the domain and category shifts among multiple source domains in a multi-way adversarial manner. Peng et al. [9] introduce moment matching to UMDA to dynamically align moments of low-dimensional features in the source and target domains while training source classifiers. However, these approaches assume that source data are observable and train adaptation networks to align the manifolds of the source and target domains. Thus they are not applicable to our setting, where no source data are accessible due to strict privacy or confidentiality issues. In contrast, DEMS trains adaptation networks using target data while regulating the outputs of the given source classifiers.

Table 3. Comparison of different latent space transformation methods for unsupervised multi-source domain adaptation (UMDA).

Method | Source data accessibility | Feature alignment method
[5, 6] | Accessible | Adversarial approach
[9] | Accessible | Discrepancy-based approach
DEMS (proposed) | Inaccessible | Source classifier-based approach

Previous studies have proposed adversarial and discrepancy-based approaches that require source data. In contrast, DEMS works without source data by carefully utilizing the source classifiers.

Proposed method

Problem definition

Suppose there are $N$ source domains $S_1, S_2, \ldots, S_N$ and one target domain $T$, where all domains have different data distributions. We are given pre-trained source classifiers $\{M_{S_k}: x_{S_k} \mapsto y_{S_k}\}_{k=1}^{N}$ that predict the labels of data from the corresponding source domains $\{S_k\}_{k=1}^{N}$, and an unlabeled target dataset $X_T = \{x_T^i\}_{i=1}^{n_T}$ from the target domain $T$; for simplicity, we assume the target dataset is sampled from a uniform label distribution. Each source classifier $M_{S_k}$ is trained on a labeled dataset $\{(x_{S_k}^i, y_{S_k}^i)\}_{i=1}^{n_{S_k}}$ drawn from the corresponding domain data distribution $p_{S_k}(x, y)$. Note that the source datasets are unavailable to us; only the source classifiers are available. In this work, we assume 1) homogeneity, i.e. the source and target domains have similar feature spaces and label distributions, and 2) a closed label set, i.e. $y_{S_k}, y_T \in Y$ for $k = 1, 2, \ldots, N$, where $Y$ is the label space shared by all domains. The goal of data-free UMDA is to accurately predict the target labels $Y_T = \{y_T^i\}_{i=1}^{n_T}$ of the corresponding target data $X_T = \{x_T^i\}_{i=1}^{n_T}$.

Method overview

In UMDA, directly training a target classifier $M_T: x_T \mapsto y_T$ on the target dataset is not possible since the target labels are not observable. Thus, most UMDA methods train $N$ adaptation functions $\{A_k: X_T \to X_{S_k}\}_{k=1}^{N}$ and exploit the pre-trained source classifiers $\{M_{S_k}\}_{k=1}^{N}$ to predict the target labels $Y_T$ of the target data $X_T$. However, in data-free UMDA, we face the challenge of defining the objective function to train the adaptation functions $\{A_k\}_{k=1}^{N}$, since the source data are unobservable and we have no information about the source data distributions $p_{S_k}(x)$ that were used to train $M_{S_k}$.

To address this challenge, we propose DEMS (Data-free Exploitation of Multiple Sources), a novel method for the unsupervised multi-source domain adaptation problem when the source data are entirely unavailable. We cannot directly learn the adaptation of the target data to the source domains since we have no information about the source domains at all. Hence, instead of directly learning the translation between the target and source domains, we regulate the classification results of the source classifiers.

We introduce four ideas in DEMS to regulate the classification results.

  • The first idea is label consistency regularization which regulates the label predictions of all source classifiers to be similar. The adapted examples from the target domain to the source domains should all have the same label if the adaptation functions work properly; we relax the constraint so that the conditional probability p(y|x) of adapted examples should be similar across all source domains.

  • The second idea is batch entropy regularization, which maximizes the label entropy of a shuffled mini-batch. The labels of randomly selected target examples are uniformly distributed; note that we assume the target dataset is sampled from a uniform label distribution. Thus, we maximize the batch entropy to prevent mode collapse, where most of the target examples are mapped to a specific label.

  • The third idea combines instance entropy regularization and pseudo-labeling, which minimize the label entropy of each instance. A target example naturally has a single clear label. Thus, the adapted examples should all have clear labels if the adaptation functions work properly; we minimize the label entropy after adaptation. We further bolster the entropy minimization by assigning pseudo labels to highly confident target data and minimizing the cross-entropy loss between the predictions and the pseudo labels.

  • The last idea is reconstruction regularization that forces an autoencoder to reconstruct target data from the shared manifold. The autoencoder helps find the manifold without losing meaningful information. Thus, we introduce the autoencoder in DEMS with shared parameters and reconstruct target examples to learn their manifold effectively.

The overall architecture of DEMS is depicted in Fig 3. DEMS adapts the target features $X_T$ to the source domains $\{S_k\}_{k=1}^{N}$ via an encoder and decoders to exploit the source classifiers $\{M_{S_k}\}_{k=1}^{N}$. Each adaptation function $A_k: X_T \to X_{S_k}$ is divided into two components: an encoder $E$ and a decoder $D_{S_k}$. The encoder $E$ takes a target example $x_T$ as input and returns its low-dimensional representation vector $z$; $E$ is shared across all domain adaptation functions. The decoder $D_{S_k}$ takes the vector $z$ as input and returns $\hat{x}_{S_k}$, the data translated into the domain $S_k$. Additionally, we introduce a decoder $D_T$ that decodes the low-dimensional representation $z$ back into the target domain $T$. We describe the label prediction and the objective function of DEMS in the following.

Fig 3. Overall architecture of DEMS.


Method details

Label prediction

For each unlabeled target instance $x_T$, DEMS exploits the pre-trained source models $\{M_{S_k}\}_{k=1}^{N}$ to predict its label $y_T$. Specifically, the label predicted by DEMS is formulated as:

$\hat{y}_T = \sum_{k=1}^{N} w_{S_k} M_{S_k}(\hat{x}_{S_k})$.  (1)

In the equation, $\hat{x}_{S_k} = D_{S_k}(E(x_T))$ denotes the data instance translated into source domain $S_k$ using the encoder $E$ and the decoder $D_{S_k}$; $0 \le w_{S_k} \le 1$ (Eq 2) denotes the weight for the source domain $S_k$. All weights add up to 1, i.e. $\sum_{k=1}^{N} w_{S_k} = 1$, so DEMS predicts the label $\hat{y}_T$ of data $x_T$ as a weighted sum of the source classifiers' predictions after domain adaptation. DEMS depends more on the prediction of a source classifier with a higher proximity:

$w_{S_k} = \frac{\exp(\Phi(T, S_k)/\lambda_1)}{\sum_{k'=1}^{N} \exp(\Phi(T, S_{k'})/\lambda_1)}$,  (2)

where $\Phi(A, B)$ (Eq 3) denotes the degree of proximity between domains $A$ and $B$, and $\lambda_1 > 0$ is a hyperparameter that controls the balance of dependency on the source domains. For instance, all the source classifiers contribute almost equally to the label prediction if $\lambda_1$ is large, while a source classifier with higher proximity $\Phi$ dominates the label prediction if $\lambda_1$ is close to 0.

It is challenging to estimate the degree of proximity between domains since the data distributions $p(x)$ of all domains except the target are not observable. Our approach is to learn it via the objective function; the degree of proximity $\Phi(A, B)$ between domains $A$ and $B$ is defined by

$\Phi(A, B) = v_A^{\top} v_B$,  (3)

where $v_A, v_B \in \mathbb{R}^d$ are learnable parameters of dimensionality $d$; that is, the degree of proximity between domains $A$ and $B$ is estimated by the inner product of their embedding vectors, which are trained during the optimization process. A minimal sketch of this prediction pipeline is given below.
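The following is a minimal PyTorch sketch of the prediction pipeline in Eqs 1–3, assuming softmax-normalized classifier outputs; the module and argument names (DomainProximity, predict_target_labels, encoder, decoders, classifiers) are our own illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn

class DomainProximity(nn.Module):
    """Learnable domain embeddings for the proximity Phi of Eq 3."""
    def __init__(self, num_domains, dim=10):
        super().__init__()
        # one learnable embedding per domain: index 0 for the target T,
        # indices 1..N for the sources S_1..S_N
        self.v = nn.Embedding(num_domains, dim)

    def forward(self, a, b):
        # Phi(A, B) = <v_A, v_B>  (Eq 3)
        return (self.v(a) * self.v(b)).sum(-1)

def predict_target_labels(x_t, encoder, decoders, classifiers, prox, lam1=1.0):
    """Weighted prediction of Eq 1 with proximity-based weights from Eq 2."""
    N = len(decoders)
    z = encoder(x_t)                                    # shared manifold z
    t = torch.zeros(1, dtype=torch.long)                # target domain index
    phi = torch.cat([prox(t, torch.tensor([k + 1])) for k in range(N)])
    w = torch.softmax(phi / lam1, dim=0)                # Eq 2
    # adapt to each source, classify, and take the weighted sum (Eq 1)
    return sum(w[k] * torch.softmax(classifiers[k](decoders[k](z)), dim=1)
               for k in range(N))
```

Note that the target domain's embedding (index 0) enters only through Eq 2, while the source-source proximities are used by the label-consistency loss described next.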

Objective function

DEMS is trained to minimize the following loss:

$\mathcal{L}_{total} = \alpha \mathcal{L}_{label} + \beta \mathcal{L}_{entropy} + \gamma \mathcal{L}_{pseudo} + \mathcal{L}_{recon}$,  (4)

which consists of four loss terms $\mathcal{L}_{label}$, $\mathcal{L}_{entropy}$, $\mathcal{L}_{pseudo}$, and $\mathcal{L}_{recon}$. $\alpha$, $\beta$, and $\gamma$ are nonnegative hyperparameters that balance the loss terms. We define these loss terms in Eqs 5 and 9–11, respectively.

Label consistency regularization

The aim of domain adaptation is to translate the domain-specific features of an example from the target domain to a source domain while preserving its semantics. If a target example $x_T$ is adapted to multiple source domains while preserving its semantics, the conditional probabilities $p(y|x)$ of the adapted examples should be similar across all source domains. For instance, if an example has a high probability of label 4 in the target domain, the adapted examples should likewise have a high probability of label 4 in every source domain. To guarantee this property, we propose a label-consistency regularization for multi-source domain adaptation:

$\mathcal{L}_{label} = \binom{N}{2}^{-1} \sum_{1 \le i < j \le N} r_{S_i, S_j} \, \mathrm{JSD}(\hat{y}_{S_i} \| \hat{y}_{S_j})$,  (5)

where $\hat{y}_{S_k} = M_{S_k}(\hat{x}_{S_k})$ is the label probability distribution of $x_T$ estimated by the source classifier $M_{S_k}$ after adaptation to the source domain $S_k$. $\mathrm{JSD}(\cdot)$ denotes the Jensen-Shannon divergence [21], a symmetrized and smoothed version of the Kullback-Leibler divergence [22]. The Jensen-Shannon divergence measures the distance between two probability distributions; a small JSD indicates that the two distributions are similar, and a large JSD indicates otherwise. $r_{S_i, S_j}$ (Eq 6) is the degree of proximity between $S_i$ and $S_j$ normalized by the sum of all pairwise proximities between source domains:

$r_{S_i, S_j} = \frac{\exp(\Phi(S_i, S_j)/\lambda_2)}{\sum_{1 \le i' < j' \le N} \exp(\Phi(S_{i'}, S_{j'})/\lambda_2)}$.  (6)

$r_{S_i, S_j}$ strengthens label consistency between close source domains while mitigating it between distant ones. $\lambda_2 > 0$ is a hyperparameter that controls the degree of the regularization.
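Below is a minimal sketch of the label-consistency loss (Eqs 5 and 6), assuming each y_hats[k] is the (B, C) probability output of source classifier k on the adapted mini-batch, and phi_ss is an N × N tensor of learned source-source proximities; the helper names are hypothetical.

```python
import itertools
import torch

def jsd(p, q, eps=1e-8):
    # Jensen-Shannon divergence between two batches of distributions,
    # averaged over the batch
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (torch.log(a + eps) - torch.log(b + eps))).sum(1)
    return (0.5 * (kl(p, m) + kl(q, m))).mean()

def label_consistency_loss(y_hats, phi_ss, lam2=1.0):
    N = len(y_hats)
    pairs = list(itertools.combinations(range(N), 2))
    # r_{S_i,S_j}: softmax of pairwise proximities over all source pairs (Eq 6)
    r = torch.softmax(torch.stack([phi_ss[i, j] for i, j in pairs]) / lam2, 0)
    # binomial(N, 2)^{-1} times the proximity-weighted sum of JSDs (Eq 5)
    losses = torch.stack([jsd(y_hats[i], y_hats[j]) for i, j in pairs])
    return (r * losses).sum() / len(pairs)
```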

Entropy regularizations

Entropy regularization comprises two distinct losses based on information entropy [23]: 1) the batch-entropy loss $\mathcal{L}_{be}$, which maximizes the label entropy of a batch, and 2) the instance-entropy loss $\mathcal{L}_{ie}$, which minimizes the label entropy of each instance.

We assume that the target dataset is balanced across classes, i.e. examples are sampled with similar probability from each label, which is a common prior for real-world data. Under this assumption, the average of all target label probabilities follows the uniform distribution $(\frac{1}{|C|}, \frac{1}{|C|}, \ldots, \frac{1}{|C|})$, where $C$ denotes the set of classes. Using the fact that the uniform distribution has the maximum information entropy, we define the batch-entropy loss as follows:

$\mathcal{L}_{be} = -\frac{1}{N} \sum_{k=1}^{N} H\!\left(\frac{1}{|B|} \sum_{i \in B} \hat{y}_{S_k}^{i}\right)$,  (7)

where $B$ is the set of instances in a mini-batch $\{x_T^i \sim p_T(x)\}$, and $H(\cdot)$ denotes the information entropy [23]; the mini-batch is also balanced across classes since it is randomly sampled from the whole dataset. By minimizing the batch-entropy loss, we force the average of the batch-wise label probabilities estimated by each source classifier after adaptation toward a uniform distribution.

From another perspective, each target instance inherently has a single clear label, i.e. a one-hot label probability, even if the exact label is unknown. Based on the fact that a one-hot probability distribution has the minimum information entropy [23], we define the instance-entropy loss as follows:

$\mathcal{L}_{ie} = \frac{1}{N|B|} \sum_{k=1}^{N} \sum_{i \in B} H(\hat{y}_{S_k}^{i})$.  (8)

We finally define the total entropy loss as the sum of the batch-entropy loss (Eq 7) and the instance-entropy loss (Eq 8):

$\mathcal{L}_{entropy} = \mathcal{L}_{be} + \mathcal{L}_{ie}$.  (9)
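The entropy regularizations translate directly into code. Below is a minimal sketch of Eqs 7–9 under the same assumption that y_hats[k] holds the (B, C) probability outputs of source classifier k after adaptation.

```python
import torch

def entropy(p, eps=1e-8):
    return -(p * torch.log(p + eps)).sum(-1)   # Shannon entropy H(p)

def entropy_loss(y_hats):
    # batch entropy (Eq 7): the entropy of the batch-averaged prediction is
    # maximized, so its negative is added to the loss
    l_be = -torch.stack([entropy(p.mean(0)) for p in y_hats]).mean()
    # instance entropy (Eq 8): the per-example entropy is minimized directly
    l_ie = torch.stack([entropy(p).mean() for p in y_hats]).mean()
    return l_be + l_ie                          # Eq 9
```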

Pseudo label

A high confidence of the predicted label of a target example, estimated by Eq 1, indicates that the example is successfully adapted to the source domains and clearly classified by the source classifiers. Accordingly, we employ pseudo-labels that bolster the current predictions by treating the predicted label as the ground-truth label. The pseudo-label loss is formulated as a cross-entropy between the predictions and the pseudo-labels:

$\mathcal{L}_{pseudo} = -\frac{1}{|b|} \sum_{i \in b} \sum_{j \in C} (\bar{y}_T^i)_j \log (\hat{y}_T^i)_j$,  (10)

where $C$ is the set of classes, $\bar{y}_T = \mathrm{Dirac}(\hat{y}_T)$, and $(y)_j$ denotes the probability of the $j$-th class in $y$; $\hat{y}_T$ is the target label predicted by DEMS (Eq 1). $\mathrm{Dirac}(\cdot)$ is a function that produces a Dirac distribution; for simplicity, we use one-hot vectorization, which sets the maximum probability to 1 and the rest to 0. Only examples that satisfy $\max_j (\hat{y}_T)_j > \epsilon$, where $0 \le \epsilon \le 1$ is a hyperparameter that sets the confidence threshold, are sampled from the mini-batch $B$; $b \subseteq B$ in Eq 10 denotes the selected subset of the mini-batch.
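Below is a minimal sketch of the pseudo-label loss (Eq 10), assuming y_hat_t is the (B, C) weighted prediction of Eq 1; the function name and the default threshold are illustrative.

```python
import torch

def pseudo_label_loss(y_hat_t, epsilon=0.9, eps=1e-8):
    conf, labels = y_hat_t.max(dim=1)       # confidence and argmax class
    mask = conf > epsilon                   # keep only confident examples b
    if not mask.any():                      # no confident example in batch
        return y_hat_t.new_zeros(())
    picked = y_hat_t[mask]
    # cross-entropy against the one-hot (Dirac) pseudo-labels: only the
    # log-probability of the argmax class survives the inner sum of Eq 10
    return -torch.log(picked.gather(1, labels[mask].unsqueeze(1)) + eps).mean()
```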

Reconstruction

Autoencoders [24], which encode input data into low-dimensional vectors and decode them back to the original space under a reconstruction objective, learn a meaningful low-dimensional manifold while preventing a simple copy of the input data. We employ an autoencoder that shares the encoder $E$ to find the low-dimensional manifold $z$. The reconstruction loss is formulated as follows:

$\mathcal{L}_{recon} = \|x_T - \hat{x}_T\|_1$,  (11)

where $\hat{x}_T = D_T(E(x_T))$ is the reconstruction of $x_T$ by the encoder $E$ and the decoder $D_T$, and $\|\cdot\|_1$ denotes the $l_1$ norm.

Algorithm 1 Training DEMS (Data-free Exploitation of Multiple Sources)

Require: unlabeled target dataset $X_T = \{x_T^i\}_{i=1}^{n_T}$

Require: trained source classifiers $\{M_{S_k}: x_{S_k} \mapsto y_{S_k}\}_{k=1}^{N}$

Require: adaptation networks $\{A_k: X_T \to X_{S_k}\}_{k=1}^{N}$

Require: hyperparameters $\alpha$, $\beta$, $\gamma$, $\lambda_1$, $\lambda_2$, and $\epsilon$

Ensure: trained adaptation networks $\{A_k: X_T \to X_{S_k}\}_{k=1}^{N}$

1: for epoch = 1 to num_epochs do

2:  Calculate the label consistency loss $\mathcal{L}_{label}$ (Eq 5)

3:  Calculate the batch-entropy loss $\mathcal{L}_{be}$ (Eq 7)

4:  Calculate the instance-entropy loss $\mathcal{L}_{ie}$ (Eq 8)

5:  Calculate the entropy loss $\mathcal{L}_{entropy} \leftarrow \mathcal{L}_{be} + \mathcal{L}_{ie}$ (Eq 9)

6:  Predict the target labels $\hat{y}_T$ (Eq 1) and keep only those that satisfy $\max_j (\hat{y}_T)_j > \epsilon$

7:  Calculate the pseudo-label loss $\mathcal{L}_{pseudo}$ (Eq 10)

8:  Calculate the reconstruction loss $\mathcal{L}_{recon}$ (Eq 11)

9:  Calculate the total loss $\mathcal{L}_{total} \leftarrow \alpha \mathcal{L}_{label} + \beta \mathcal{L}_{entropy} + \gamma \mathcal{L}_{pseudo} + \mathcal{L}_{recon}$ (Eq 4)

10:  Update the parameters of $\{A_k\}_{k=1}^{N}$ to minimize $\mathcal{L}_{total}$

11: end for

Algorithm

We summarize the training algorithm of DEMS in Algorithm 1. DEMS takes initialized adaptation networks $\{A_k: X_T \to X_{S_k}\}_{k=1}^{N}$ and trains them by exploiting the pre-trained source classifiers without any source data. DEMS calculates the total loss $\mathcal{L}_{total}$ in lines 2 to 9. Then, in line 10, DEMS updates the parameters of the adaptation networks $\{A_k\}_{k=1}^{N}$ to minimize the total loss $\mathcal{L}_{total}$. This is repeated until the adaptation networks $\{A_k\}_{k=1}^{N}$ are trained properly; we use a validation set, and training proceeds until the total loss $\mathcal{L}_{total}$ on the validation set reaches its lowest value. After training, DEMS predicts the target labels of the test data by Eq 1 using the trained adaptation networks. The predicted target labels are evaluated against the ground-truth labels, and we report the accuracies in the next section. The computational complexity depends on the architecture of the encoder and decoders. For a CNN-based architecture, the computational complexity of label prediction in DEMS is $O(HWk^2MN)$, where $H$ and $W$ are the height and width of the input image, $k$ is the kernel size, and $M$ and $N$ are the numbers of input and output channels.
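Putting the pieces together, the following is a compact sketch of Algorithm 1 that reuses the loss sketches above; the data loader of unlabeled target batches, the optimizer handling, and the per-batch recomputation are our simplifying assumptions, not the authors' exact implementation.

```python
import torch

def train_dems(loader, encoder, decoders, classifiers, dec_t, prox, opt,
               alpha=0.1, beta=1.0, gamma=1.0, num_epochs=100):
    N = len(decoders)
    src = torch.arange(1, N + 1)                   # source domain indices
    for epoch in range(num_epochs):
        for x_t in loader:                         # unlabeled target batch
            z = encoder(x_t)
            y_hats = [torch.softmax(clf(dec(z)), dim=1)
                      for dec, clf in zip(decoders, classifiers)]
            phi_ss = prox.v(src) @ prox.v(src).T               # pairwise Phi (Eq 3)
            l_label = label_consistency_loss(y_hats, phi_ss)   # line 2, Eq 5
            l_entropy = entropy_loss(y_hats)                   # lines 3-5, Eq 9
            y_hat_t = predict_target_labels(x_t, encoder, decoders,
                                            classifiers, prox) # line 6, Eq 1
            l_pseudo = pseudo_label_loss(y_hat_t)              # line 7, Eq 10
            l_recon = (x_t - dec_t(z)).abs().mean()            # line 8, Eq 11 (mean-reduced L1)
            loss = (alpha * l_label + beta * l_entropy
                    + gamma * l_pseudo + l_recon)              # line 9, Eq 4
            opt.zero_grad()                                    # line 10
            loss.backward()
            opt.step()
```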

Experiments

We conduct experiments to answer the following questions:

  • Q1. Accuracy. How accurate is DEMS on real-world datasets?

  • Q2. Qualitative analysis. How well does DEMS adapt a given target example to source domains?

  • Q3. Parameter sensitivity. How much do $\epsilon$ (Eq 10) and $\lambda$ (Eqs 2 and 6) affect the accuracy?

Experimental settings

Datasets

We use five digit image datasets: MNIST [25], MNIST-M [10], SVHN [26], SynDigits [27], and USPS [28], summarized in Table 4; Fig 4 shows sample images from each dataset. For SynDigits, we use a randomly selected subset of 60,000 images out of 479,400 for training and validation; the subset is considered to carry sufficient domain knowledge since a classifier trained on it achieves 95.9% accuracy. We use the original datasets for the others. All five datasets are scaled to the size of (3 × 32 × 32) to have the same input dimensionality. In the experiments, we set one of them as the target and the rest as the sources.

Table 4. Summary of datasets.
Dataset | Features | Training | Validation | Test
MNIST | 1 × 28 × 28 | 55,000 | 5,000 | 10,000
MNIST-M | 3 × 32 × 32 | 55,000 | 5,000 | 10,000
SVHN | 3 × 32 × 32 | 68,257 | 5,000 | 26,032
SynDigits | 3 × 32 × 32 | 55,000 | 5,000 | 9,553
USPS | 1 × 16 × 16 | 6,291 | 1,000 | 2,007
Fig 4. Sample images (10 classes).


Baselines

We set three baselines: Best single source, Average, and Weighted sum. Best single source feeds the target data directly into each source classifier, and the source classifier that yields the best performance is chosen. Average feeds the target data into all source classifiers and averages the resulting label probabilities to predict the target labels. Weighted sum takes a weighted sum of the results after feeding the target data into the source classifiers; we use Eq 2 for the weights and set $\Phi(T, S_k) = \xi - \mathcal{L}_{entropy}^{T \to S_k}$, where $\mathcal{L}_{entropy}^{T \to S_k}$ is the sum of the batch-entropy and instance-entropy losses estimated when the target data are fed directly into the source classifier $M_{S_k}$. $\xi$ is a hyperparameter, and we set it to 1 in all experiments. The intuition behind this definition of $\Phi(T, S_k)$ is that $\mathcal{L}_{entropy}^{T \to S_k}$ is presumably low if the degree of proximity between $T$ and $S_k$ is high; a sketch of this heuristic is given below.
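A small sketch of the Weighted sum baseline's heuristic proximity, assuming probs_k is the (B, C) softmax output of source classifier M_{S_k} on the raw, non-adapted target data:

```python
import torch

def heuristic_proximity(probs_k, xi=1.0, eps=1e-8):
    H = lambda p: -(p * torch.log(p + eps)).sum(-1)   # Shannon entropy
    l_be = -H(probs_k.mean(0))        # batch-entropy loss on raw target data
    l_ie = H(probs_k).mean()          # instance-entropy loss
    return xi - (l_be + l_ie)         # Phi(T, S_k), plugged into Eq 2
```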

Network architecture

We pre-train ResNet14 [29] on each dataset to generate the source classifiers. We adopt the generator architecture of CycleGAN [30]: the encoder is composed of two convolutional layers with stride 2 followed by three residual blocks [29], and each decoder is composed of three residual blocks followed by two transposed convolutional layers with stride 2. We use batch normalization [31] in the encoder and the decoders. Note that an appropriate network architecture should be selected for each application domain; recurrent neural networks [32] and graph autoencoders [33] could be selected in the natural language processing domain [34, 35] and the graph domain [36–39], respectively.
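As a rough sketch of this architecture, the following PyTorch modules implement a shared encoder and a per-domain decoder in the CycleGAN-generator style described above; the channel widths and kernel sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(x + self.body(x))            # residual connection

class Encoder(nn.Module):
    """Two stride-2 convolutions followed by three residual blocks."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 4, 2, 1), nn.BatchNorm2d(2 * ch), nn.ReLU(),
            *[ResBlock(2 * ch) for _ in range(3)])

    def forward(self, x):
        return self.net(x)                              # shared manifold z

class Decoder(nn.Module):
    """Three residual blocks followed by two stride-2 transposed convolutions."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            *[ResBlock(2 * ch) for _ in range(3)],
            nn.ConvTranspose2d(2 * ch, ch, 4, 2, 1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh())

    def forward(self, z):
        return self.net(z)                              # image of shape (3, 32, 32)
```

In DEMS, one Encoder instance is shared across all adaptation functions, while a separate Decoder is instantiated for the target domain and for each source domain.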

Training details

We first minimize $\mathcal{L}_{recon}$ during the first 5 epochs, initialize $\{D_{S_k}\}_{k=1}^{N}$ with the trained $D_T$, and then minimize $\mathcal{L}_{total}$. Finally, the classification accuracy on the target test dataset is reported at the lowest validation loss $\mathcal{L}_{total}$ over 100 epochs. Each experiment is performed 5 times with different random seeds, and the standard deviation is reported along with the average. We use the hyperparameters that give the best performance. We set $\alpha = 0.1$, $\beta = 1$, and $\gamma = 1$, each chosen from {0.1, 0.5, 1, 5, 10}, in Eq 4. Unless otherwise noted, $\epsilon$ (Eq 10) is set to 0.9, chosen from {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. We set $\lambda_1$ (Eq 2) and $\lambda_2$ (Eq 6) to the same value $\lambda$, which is set to 1, chosen from {0.125, 0.25, 0.5, 1, 2, 4, 8}. We set the dimensionality of $v_A$ and $v_B$ to 10 in Eq 3. All networks are trained with the Adam optimizer [40] with learning rate 0.001, $l_2$ regularization coefficient 0.0001, $\beta_1 = 0.9$, and $\beta_2 = 0.999$. We implement all code in PyTorch and perform a grid search to find the best hyperparameters, using a workstation with an RTX 2080 Ti.

Accuracy

Overall performance

We compare DEMS with the baselines for data-free UMDA. Table 5 shows the classification accuracy. DEMS shows the best performance, outperforming the baselines in all experiments. In particular, the performance gaps between DEMS and the baselines are large for the MNIST-M target, which has very complex patterns as shown in Fig 4; DEMS achieves 27.5 percentage points higher accuracy than the best baseline. In all experiments except the USPS target, Average and Weighted sum, which exploit the knowledge of multiple source domains, perform worse than Best single source, which exploits the knowledge of a single source domain. This demonstrates how challenging the data-free UMDA problem is and supports the contribution of this work.

Table 5. Classification accuracy of DEMS and baselines.
Target dataset | Best single source (single source) | Average (multi-source) | Weighted sum (multi-source) | DEMS (proposed, multi-source)
MNIST | 97.65 ± 0.75% | 94.87 ± 1.22% | 96.37 ± 0.40% | 99.01 ± 0.12%
MNIST-M | 45.03 ± 3.74% | 33.50 ± 1.72% | 40.91 ± 1.24% | 72.57 ± 3.20%
SVHN | 71.87 ± 3.53% | 23.11 ± 1.61% | 56.09 ± 5.17% | 76.60 ± 1.39%
SynDigits | 91.89 ± 1.79% | 60.47 ± 5.69% | 78.66 ± 4.37% | 93.74 ± 0.79%
USPS | 82.03 ± 3.77% | 84.54 ± 5.31% | 88.09 ± 2.04% | 96.14 ± 0.41%

The remaining datasets, excluding the target, are used as sources. The best method is in bold, and the second best is underlined. Note that DEMS gives the best performance.

Ablation study

We conduct an ablation study to evaluate how each loss of DEMS contributes to the performance; Table 6 shows the results. Note that each of the proposed losses in the objective function (Eq 4) contributes significantly to the performance of DEMS, demonstrating the effectiveness of our ideas.

Table 6. Ablation study on MNIST-M target dataset.
DEMS | DEMS−$\mathcal{L}_{label}$ | DEMS−$\mathcal{L}_{be}$ | DEMS−$\mathcal{L}_{ie}$ | DEMS−$\mathcal{L}_{pseudo}$ | DEMS−$\mathcal{L}_{recon}$
72.69 ± 2.60% | 65.65 ± 5.55% | 10.33 ± 0.62% | 59.58 ± 3.57% | 43.88 ± 1.73% | 11.11 ± 1.59%

DEMS−$\mathcal{L}_{*}$ denotes a variant of DEMS with the loss $\mathcal{L}_{*}$ excluded from $\mathcal{L}_{total}$. Note that each loss contributes significantly to the accuracy of DEMS.

Qualitative analysis

We qualitatively analyze DEMS and its variants DEMS−$\mathcal{L}_{*}$ to evaluate how well DEMS adapts data to different domains; DEMS−$\mathcal{L}_{*}$ denotes a variant of DEMS with the loss $\mathcal{L}_{*}$ excluded from $\mathcal{L}_{total}$. Note that the baseline algorithms are not analyzed qualitatively since they do not adapt data to different domains (see Table 1). For DEMS−$\mathcal{L}_{*}$, we select the three variants DEMS−$\mathcal{L}_{pseudo}$, DEMS−$\mathcal{L}_{be}$, and DEMS−$\mathcal{L}_{recon}$, which show the lowest accuracies in the ablation study (see Table 6).

Fig 5 visualizes adapted samples from MNIST-M to MNIST, SVHN, SynDigits, and USPS, respectively. DEMS (Fig 5b) translates the images into noise at the beginning of training (epoch 1). As training progresses, however, meaningful patterns of the target images (e.g. the shapes of digits rather than the backgrounds) are detected and adapted to each source domain (epoch 7). As training progresses further (epoch 30), DEMS focuses its adaptation on the closer source domains (MNIST, SVHN, and SynDigits) rather than the distant source domain (USPS), and its classification performance improves. DEMS−$\mathcal{L}_{pseudo}$ (Fig 5c) successfully adapts most classes to MNIST and SynDigits, but fails to adapt some classes (digits 3, 7, and 9) to the source domains, yielding degraded classification performance. DEMS−$\mathcal{L}_{be}$ (Fig 5d) and DEMS−$\mathcal{L}_{recon}$ (Fig 5e) do not learn to adapt the target data to the source domains at all.

Fig 5. Visualization of image adaptation from MNIST-M to other source domains.


Fig (a) enumerates the target samples for Figs (b), (c), (d), and (e). The target samples are adapted by adaptation networks trained with different losses. For DEMS (Fig (b)), the adaptation gradually focuses on the close source domains (MNIST, SVHN, and SynDigits), resulting in improved performance. For DEMS−$\mathcal{L}_{pseudo}$ (Fig (c)), some classes (digits 3, 7, and 9) fail to be adapted to the source domains. For DEMS−$\mathcal{L}_{be}$ and DEMS−$\mathcal{L}_{recon}$ (Figs (d) and (e)), the adaptation is not learned at all.

Parameter sensitivity

Sensitivity of ϵ

The hyperparameter $\epsilon$, which appears in $\mathcal{L}_{pseudo}$ (Eq 10), governs the confidence threshold for pseudo-labels. As $\epsilon$ increases, the selected examples have higher confidence but fewer examples are selected; as $\epsilon$ decreases, more examples are selected but their confidence decreases. As shown in Fig 6a, the accuracy is highest when $\epsilon$ is 0.9 for all datasets, and it drops significantly in the extreme case $\epsilon = 1$. The results demonstrate that DEMS is best optimized with high-quality pseudo-labels.

Fig 6. Sensitivity of accuracy to the hyperparameters ϵ (Eq 10) and λ (Eqs 2 and 6).


Sensitivity of λ

The hyperparameter $\lambda$, which appears in Eqs 2 and 6, controls the balance of dependency between domains; note that $\lambda_1 = \lambda_2 = \lambda$ in our experiments. If $\lambda$ is a large positive value, all the source classifiers contribute almost equally to the target label prediction in Eq 1, and even source classifiers that are not close to each other are regulated to output similar predictions in Eq 5. Conversely, if $\lambda$ is close to zero, a source classifier closer to the target domain contributes more to the target label prediction in Eq 1, and source classifiers that are not close to each other are less regulated to output similar predictions in Eq 5. Fig 6b shows that the best results are obtained when $\lambda = 1$ for all target domains, and the performance degrades if $\lambda$ is too large or too small. In particular, SVHN, which has relatively complex patterns, shows severely degraded performance when $\lambda$ is larger than 2, which means that considering nearby sources rather than all sources is more helpful for a complex target.

Conclusion

We propose DEMS (Data-free Exploitation of Multiple Sources), a novel architecture for multi-source domain adaptation without any observable source data. DEMS learns to adapt target data to each source domain in order to exploit the pre-trained source classifiers. Experiments on real-world datasets show that DEMS outperforms the baselines by up to 27.5 percentage points in accuracy, successfully learning the adaptation functions and exploiting the source classifiers for target label prediction. However, DEMS assumes that the source and target domains have similar feature spaces and share the same label space; thus, DEMS is not applicable to domain adaptation between heterogeneous domains. Future work includes extending DEMS to transfer knowledge between heterogeneous domains, e.g. from images to text or vice versa, which may require a careful design of the adaptation networks.

Data Availability

The data and code are available at: https://github.com/snudatalab/DEMS.

Funding Statement

This work was supported by Institute of Information & communications Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT) (No.2020-0-00894, Flexible and Efficient Model Compression Method for Various Applications and Environments). The Institute of Engineering Research and ICT at Seoul National University provided research facilities for this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Gan, C., Yang, T., & Gong, B. Learning attributes equals multi-source domain generalization. In CVPR (2016).
  • 2.Hoffman, J., Kulis, B., Darrell, T., & Saenko, K. Discovering latent domains for multisource domain adaptation. In ECCV (2012).
  • 3.Sun, Q., Chattopadhyay, R., Panchanathan, S., & Ye, J. A two-stage weighting framework for multi-source domain adaptation. In NeurIPS (2011).
  • 4.Zhang, K., Gong, M., & Schölkopf, B. Multi-source domain adaptation: A causal view. In AAAI (2015).
  • 5.Zhao, H., Zhang, S., Wu, G., Moura, J. M. F., Costeira, J. P., & Gordon, G. J. Adversarial multiple source domain adaptation. In NeurIPS (2018).
  • 6.Xu, R., Chen, Z., Zuo, W., Yan, J., & Lin, L. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In CVPR (2018).
  • 7.Roy, S., Siarohin, A., Sangineto, E., Sebe, N., & Ricci, E. Trigan: Image-to-image translation for multi-source domain adaptation. CoRR (2020).
  • 8. Ben-David S., Blitzer J., Crammer K., Kulesza A., Pereira F., & Vaughan J. W. A theory of learning from different domains. Mach. Learn. 79(1-2), 151–175 (2010). doi: 10.1007/s10994-009-5152-4
  • 9.Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., & Wang, B. Moment matching for multi-source domain adaptation. In ICCV (2019).
  • 10.Ganin, Y. & Lempitsky, V. S. Unsupervised domain adaptation by backpropagation. In ICML (2015).
  • 11.Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., & Krishnan, D. Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR (2017).
  • 12.Tzeng, E., Hoffman, J., Saenko, K., & Darrell, T. Adversarial discriminative domain adaptation. In CVPR (2017).
  • 13.Long, M., Cao, Z., Wang, J., & Jordan, M. I. Conditional adversarial domain adaptation. In NeurIPS (2018).
  • 14.Long, M., Zhu, H., Wang, J., & Jordan, M. I. Deep transfer learning with joint adaptation networks. In ICML (2017).
  • 15.Long, M., Zhu, H., Wang, J., & Jordan, M. I. Unsupervised domain adaptation with residual transfer networks. In NeurIPS (2016).
  • 16.Long, M., Cao, Y., Wang, J., & Jordan, M. I. Learning transferable features with deep adaptation networks. In ICML (2015).
  • 17.Zellinger, W., Grubinger, T., Lughofer, E., Natschläger, T., & Saminger-Platz, S. Central moment discrepancy (CMD) for domain-invariant representation learning. In ICLR (2017).
  • 18.Chen, C., Chen, Z., Jiang, B., & Jin, X. Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation. In AAAI (2019).
  • 19.Courty, N., Flamary, R., Habrard, A., & Rakotomamonjy, A. Joint distribution optimal transportation for domain adaptation. In NeurIPS (2017).
  • 20.Damodaran, B. B., Kellenberger, B., Flamary, R., Tuia, D., & Courty, N. Deepjdot: Deep joint distribution optimal transport for unsupervised domain adaptation. In ECCV (2018).
  • 21. Lin J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37(1), 145–151 (1991). doi: 10.1109/18.61115
  • 22. Kullback S. & Leibler R. A. On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951). doi: 10.1214/aoms/1177729694
  • 23. Shannon C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948). doi: 10.1002/j.1538-7305.1948.tb01338.x
  • 24.Masci, J., Meier, U., Ciresan, D. C., & Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In ICANN (2011).
  • 25. LeCun Y., Bottou L., Bengio Y., & Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998). doi: 10.1109/5.726791
  • 26.Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. Reading digits in natural images with unsupervised feature learning. (2011).
  • 27.Roy, P., Ghosh, S., Bhattacharya, S., & Pal, U. Effects of degradations on deep neural network architectures. CoRR (2018).
  • 28.Hastie, T., Friedman, J. H., & Tibshirani, R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer (2001).
  • 29.He, K., Zhang, X., Ren, S., & Sun, J. Deep residual learning for image recognition. In CVPR (2016).
  • 30.Zhu, J., Park, T., Isola, P., & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV (2017).
  • 31.Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Bach, F. R. & Blei, D. M., editors, ICML (2015).
  • 32.Sutskever, I., Vinyals, O., & Le, Q. V. Sequence to sequence learning with neural networks. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., & Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada (2014).
  • 33.Kipf, T. N. & Welling, M. Variational graph auto-encoders. CoRR (2016).
  • 34.Clark, K., Luong, M., Le, Q. V., & Manning, C. D. ELECTRA: pre-training text encoders as discriminators rather than generators. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net (2020).
  • 35.He, J., Wang, X., Neubig, G., & Berg-Kirkpatrick, T. A probabilistic formulation of unsupervised text style transfer. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020 (2020).
  • 36.Ahamad, R. Z., Javed, A. R., Mehmood, S., Khan, M. Z., Noorwali, A., & Rizwan, M. Interference mitigation in d2d communication underlying cellular networks: Towards green energy. CMC-COMPUTERS MATERIALS & CONTINUA (2021).
  • 37.Alazab, A., Venkatraman, S., Abawajy, J., & Alazab, M. An optimal transportation routing approach using gis-based dynamic traffic flows. In ICMTA 2010: Proceedings of the International Conference on Management Technology and Applications (2010).
  • 38.Naeem, A., Javed, A. R., Rizwan, M., Abbas, S., Lin, J. C., & Gadekallu, T. R. DARE-SEP: A hybrid approach of distance aware residual energy-efficient SEP for WSN. IEEE Trans. Green Commun. Netw. (2021).
  • 39.Priya, R. M. S., Maddikunta, P. K. R., M., P., Koppu, S., Gadekallu, T. R., Chowdhary, C. L., et al. An effective feature engineering for DNN using hybrid PCA-GWO for intrusion detection in iomt architecture. Comput. Commun. (2020).
  • 40.Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Bengio, Y. & LeCun, Y., editors, ICLR (2015).

Decision Letter 0

Thippa Reddy Gadekallu

26 Apr 2021

PONE-D-21-11372

Unsupervised Multi-Source Domain Adaptation with No Observable Source Data

PLOS ONE

Dear Dr. Kang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Based on the comments received from the reviewers and my own observation, I recommend minor revisions for the paper.

Please submit your revised manuscript by Jun 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Thippa Reddy Gadekallu

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: - Please highlight the contribution clearly in the introduction

- this paper lacks in Novelty of the proposed approach. The author should highlight the contribution clearly in the introduction and provide a comparison note with existing studies.

- Some Paragraphs in the paper can be merged and some long paragraphs can be split into two.

- The quality of the figures can be improved more. Figures should be eye-catching. It will enhance the interest of the reader.

- What are the computational resources reported in the state of the art for the same purpose?

- Please cite each equation and clearly explain its terms.

- Math work should be written math mode.

- What are the evaluations used for the verification of results?

- Clearly highlight the terms used in the algorithm and explain them in the text.

- Authors should add the most recent reference:

1) DARE-SEP: A Hybrid Approach of Distance Aware Residual Energy-Efficient SEP for WSN, IEEE Transactions on Green Communications and Networking

2) Interference Mitigation in D2D Communication Underlying Cellular Networks: Towards Green Energy, Computers, Materials & Continua

Reviewer #2: 1. What are the limitations of the existing works that motivated the current research?

2. Summarize the key findings from the related works in the form of a table.

3. Some of the recent works such as the following on DNN/ML can be discussed in the paper: "An Effective Feature Engineering for DNN using Hybrid PCA-GWO for Intrusion Detection in IoMT Architecture, An optimal transportation routing approach using GIS-based dynamic traffic flows"

4. Present the computational complexity of the proposed approach.

5. Compare the current work with the recent state-of-the-art.

6. Discuss the limitations of the current work in the conclusion.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 9;16(7):e0253415. doi: 10.1371/journal.pone.0253415.r002

Author response to Decision Letter 0


3 Jun 2021

1. Reviewer 1.

• (R1-1) Please highlight the contribution clearly in the introduction.

– (A1-1) The contents of the contribution list have been revised to clearly highlight the contributions (lines 38-40 and lines 43-44 in introduction section).

• (R1-2) This paper lacks in novelty of the proposed approach. The author should highlight the contribution clearly in the introduction and provide a comparison note with existing studies.

– (A1-2) We revised the contribution list (lines 38-40 and lines 43-44 in introduction section) and provided a comparison note with the competitors in the introduction (lines 29-34 in introduction section). The existing studies for unsupervised multi-source domain adaptation (UMDA) assume that source data are observable and they train the adaptation networks to align manifolds of source and target domains. Thus, the existing methods are not applicable to our setting where source data are not observable. On the other hand, DEMS trains the adaptation networks while regulating the results of the source classifiers (lines 67-72 in related work section).

• (R1-3) Some Paragraphs in the paper can be merged and some long paragraphs can be split into two.

– (A1-3) We reviewed the overall manuscript to reorganize the paragraphs in it. Especially, we itemized the main ideas which were in one long paragraph into individual items (lines 104-126 in proposed method section).

• (R1-4) The quality of the figures can be improved more. Figures should be eye-catching. It will enhance the interest of the reader.

– (A1-4) We improved Figure 3, which previously looked complicated, to be eye-catching.

• (R1-5) What are the computational resources reported in the state-of-the-art for the same purpose?

– (A1-5) We implemented all the codes using PyTorch and trained all networks including DEMS and competitors using RTX 2080 Ti (lines 288-289 in experiments section).

• (R1-6) Please cite each equation and clearly explain its terms.

– (A1-6) We cited all equations in the manuscript and clearly explained the terms (lines 143, 147, 164, 179, 203, 204, and 213 in experiments section).

• (R1-7) Math work should be written math mode.

– (A1-7) We already wrote the math works by math mode. If there is anything we missed, please let us know and we will fix it.

• (R1-8) What are the evaluations used for the verification of results?

– (A1-8) We used a validation set and trained DEMS until the loss L_{total} on the validation set was the lowest. Each experiment was performed 5 times with different random seeds, and we reported the standard deviation along with the averaged accuracy. We added this description in lines 232-233 in the proposed method section and lines 278-281 in the experiments section.

• (R1-9) Clearly highlight the terms used in the algorithm and explain them in the text.

– (A1-9) We summarized the training algorithm of DEMS and explained the process in algorithm section (lines 226-240 in proposed method section).

• (R1-10) Authors should add the most recent reference: 1) DARE-SEP: A Hybrid Approach of Distance Aware Residual Energy-Efficient SEP for WSN, IEEE Transactions on Green Communications and Networking, and 2) Interference Mitigation in D2D Communication Underlying Cellular Networks: Towards Green Energy, Computers, Materials & Continua.

– (A1-10) We added the two references (lines 273-276 in experiments section).

2. Reviewer 2.

• (R2-1) What are the limitations of the existing works that motivated the current research?

– (A2-1) Previous works have focused on unsupervised multi-source domain adaptation (UMDA) where source data are accessible. Thus, they trained the adaptation networks to align manifolds between source and target domains using the source and the target data. However, source data are not easily accessible in practical scenarios although source classifiers are readily accessible. Hence, we were motivated to develop a method to train adaptation networks using the source classifiers and the target data, without using the source data. We added such discussion in lines 67-72 in the related works section.

• (R2-2) Summarize the key findings from the related works in the form of a table.

– (A2-2) We summarized the key findings from the related works and compared them with our proposed method in Table 3 (lines 61-62 in related works section).

• (R2-3) Some of the recent works such as the following on DNN/ML can be discussed in the paper: ”An Effective Feature Engineering for DNN using Hybrid PCA-GWO for Intrusion Detection in IoMT Architecture”, and ”An optimal transportation routing approach using GIS-based dynamic traffic flows”

– (A2-3) We added the two references (lines 273-276 in experiments section).

• (R2-4) Present computational complexity of the proposed approach.

– (A2-4) We added the computational complexity of DEMS (lines 236-240 in proposed method section).

• (R2-5) Compare the current work with recent state-of-the-art.

– (A2-5) The data-free UMDA is a novel problem since there were no previous approaches which can work without the source data. Nevertheless, we introduced several baselines in Table 1 and explained them in the introduction (lines 29-34 in introduction section).

• (R2-6) Discuss about the limitations of the current work in conclusion.

– (A2-6) We included the limitations of the work in conclusion section (lines 351-354 in conclusion section).

Attachment

Submitted filename: rebuttal_letter.pdf

Decision Letter 1

Thippa Reddy Gadekallu

7 Jun 2021

Unsupervised Multi-Source Domain Adaptation with No Observable Source Data

PONE-D-21-11372R1

Dear Dr. Kang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Thippa Reddy Gadekallu

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed almost all my suggestions. I would like to accept this paper.

Reviewer #2: The authors have done a good job in addressing all the comments and suggestions. The paper is improved significantly and is in a good shape now. I recommend the paper to be accepted in the current form.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Thippa Reddy Gadekallu

29 Jun 2021

PONE-D-21-11372R1

Unsupervised Multi-Source Domain Adaptation with No Observable Source Data 

Dear Dr. Kang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Thippa Reddy Gadekallu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: rebuttal_letter.pdf

    Data Availability Statement

    The data and code are available at: https://github.com/snudatalab/DEMS.

