Sensors. 2022 Sep 14;22(18):6959. doi: 10.3390/s22186959
Algorithm 2 The proposed adaptation algorithm.
 1: Input:
 2:   {τ_T^1, τ_T^2, …}: a set of expert demonstrations on the target task
 3:   {τ_S^1, τ_S^2, …}: a set of expert demonstrations on the source task
 4: Randomly initialize the task embedding network E, the generator G, and the discriminator D
 5: for k = 0, 1, 2, … do
 6:   Sample an expert demonstration τ_T^i on the target task
 7:   Sample an expert demonstration τ_S^i on the source task
 8:   Sample state–action pairs (ŝ_S^t, â_S^t) ∈ τ_S^i and (ŝ_T^t, â_T^t) ∈ τ_T^i
 9:   n ← uniform random number in [0, 1]
10:   if n < λ then            ▹ Review the source task's learned knowledge
11:     Compute z_S^t = E(ŝ_S^t)
12:     Compute z_T^t = stopgrad(E(ŝ_T^t))
13:     Generate the action a_S^t = G(z_S^t)
14:     Compute the loss L = L_E(z_S^t, z_T^t) + L_GD(â_S^t, a_S^t)
15:   else                     ▹ Learn the target task
16:     Compute z_T^t = E(ŝ_T^t)
17:     Compute z_S^t = stopgrad(E(ŝ_S^t))
18:     Generate the action a_T^t = G(z_T^t)
19:     Compute the loss L = L_E(z_T^t, z_S^t) + L_GD(â_T^t, a_T^t)
20:   end if
21:   Update the parameters of E, G, and D
22:   Update the policy π_S with the reward signal r = log D(â_S^t)
23: end for
24: Output:
25:   π_ST: the learned policy for both the source and target tasks
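The branching structure of Algorithm 2 can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the tiny linear maps standing in for E, G, and D, the squared-error placeholders for L_E and L_GD (the paper uses embedding-alignment and GAN-style losses), and the fixed λ = 0.5 are all assumptions made for brevity; `stopgrad` merely marks where an autodiff framework would detach the tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 0.5  # review probability λ (hyperparameter; value assumed here)

# Tiny linear stand-ins for the embedding network E, generator G, discriminator D.
W_E = rng.normal(size=(4, 3))   # state (3-dim) -> task embedding (4-dim)
W_G = rng.normal(size=(2, 4))   # embedding -> action (2-dim)
W_D = rng.normal(size=(2,))     # action -> discriminator score

def E(s):  return W_E @ s
def G(z):  return W_G @ z
def D(a):  return 1.0 / (1.0 + np.exp(-(W_D @ a)))  # sigmoid score in (0, 1)

def stopgrad(z):
    # In an autodiff framework this would detach z from the graph;
    # here it only marks that no gradient flows through this branch.
    return z.copy()

def adaptation_step(s_S, a_S, s_T, a_T):
    """One iteration of the sampling branch in Algorithm 2 (illustrative)."""
    n = rng.uniform(0.0, 1.0)
    if n < LAMBDA:                      # review the source task's knowledge
        z_S = E(s_S)
        z_T = stopgrad(E(s_T))
        a_gen = G(z_S)
        # L = L_E(z_S, z_T) + L_GD(â_S, a_S); squared errors used as placeholders
        loss = np.sum((z_S - z_T) ** 2) + np.sum((a_S - a_gen) ** 2)
        branch = "source"
    else:                               # learn the target task
        z_T = E(s_T)
        z_S = stopgrad(E(s_S))
        a_gen = G(z_T)
        loss = np.sum((z_T - z_S) ** 2) + np.sum((a_T - a_gen) ** 2)
        branch = "target"
    reward = np.log(D(a_gen) + 1e-8)    # discriminator-based reward r = log D(·)
    return branch, loss, reward

# One sampled state-action pair per task, as in steps 6-8.
s_S, a_S = rng.normal(size=3), rng.normal(size=2)
s_T, a_T = rng.normal(size=3), rng.normal(size=2)
branch, loss, reward = adaptation_step(s_S, a_S, s_T, a_T)
print(branch, float(loss))
```

The key design point the sketch preserves is the asymmetric stop-gradient: in each branch, only the embedding of the task currently being trained receives gradients, while the other task's embedding acts as a fixed alignment target.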