Algorithm 2 The proposed adaptation algorithm.
 1: Input:
 2:     A set of expert demonstrations on the target task
 3:     A set of expert demonstrations on the source task
 4: Randomly initialize the task embedding network E, the generator G, and the discriminator D
 5: for k = 0, 1, 2, … do
 6:     Sample an expert demonstration on the target task
 7:     Sample an expert demonstration on the source task
 8:     Sample a state-action pair from each sampled demonstration
 9:     n ← uniform random number between 0 and 1
10:     if n < … then    ▹ Review the source task's learned knowledge
11:         Compute …
12:         Compute …
13:         Generate an action with G
14:         Compute the loss …
15:     else    ▹ Learn the target task
16:         Compute …
17:         Compute …
18:         Generate an action with G
19:         Compute the loss …
20:     end if
21:     Update the parameters of E, G, and D
22:     Update the policy with the reward signal
23: end for
24: Output:
25:     Learned policy for both the source and the target task
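The control flow of the algorithm can be sketched in Python. This is a minimal, illustrative skeleton only: the branch threshold `p_review`, the demonstration format, and the loss/update steps (left as comments) are assumptions, since the paper's exact formulas are not reproduced here.

```python
import random

def train(target_demos, source_demos, steps=100, p_review=0.5, seed=0):
    """Sketch of Algorithm 2's main loop.

    target_demos / source_demos: lists of expert demonstrations,
    each demonstration being a list of (state, action) pairs.
    p_review: assumed probability of taking the source-review branch.
    """
    rng = random.Random(seed)
    # (Randomly initialize E, G, and D here -- omitted in this sketch.)
    history = []  # record which branch ran at each iteration
    for k in range(steps):
        # Sample one expert demonstration from each task.
        target_demo = rng.choice(target_demos)
        source_demo = rng.choice(source_demos)
        # Sample a state-action pair from each sampled demonstration.
        s_t, a_t = rng.choice(target_demo)
        s_s, a_s = rng.choice(source_demo)
        n = rng.random()  # uniform random number in [0, 1)
        if n < p_review:
            # Review the source task's learned knowledge:
            # compute the task embedding with E, generate an action
            # with G, and compute the adversarial loss against D.
            branch = "source"
        else:
            # Learn the target task with the analogous computations.
            branch = "target"
        # Update the parameters of E, G, and D, then update the
        # policy with the reward signal (omitted in this sketch).
        history.append(branch)
    return history
```

Because the branch is chosen by a fresh uniform draw each iteration, both the source-review and target-learning updates are interleaved throughout training rather than run in separate phases.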