Variational Information Bottleneck Regularized Deep Reinforcement Learning for Efficient Robotic Skill Adaptation

Sensors. 2023 Jan 9;23(2):762. doi: 10.3390/s23020762

Algorithm 2. VIB-based meta-reinforcement learning testing algorithm.

1: Input: $\{T_{n}\}_{n = 1, \dots, N} \sim p(T)$: meta-testing task set; $θ$: meta-trained policy network; $ω$: meta-trained latent space encoder network.

2: Initialize the trajectory: $e^{T} = \{\}$

3: for $k = 1, \dots, N$ do

4: Latent space inference: $z \sim E_{ω}(z | e^{T})$

5: Use the current policy $π_{θ}(a | s, z)$ to interact with each task and obtain $D^{k}$

6: Sample accumulation: $e^{T} = e^{T} \cup D^{k}$

7: Evaluate the empirical discounted return for each task.

8: end for
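
The loop in Algorithm 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation: `ToyTask`, `encode_context`, `policy`, `rollout`, `discounted_return`, and `meta_test` are hypothetical names, the encoder and policy are simple hand-written stand-ins for the trained networks $E_{ω}$ and $π_{θ}$, and the sketch assumes that the loop runs over adaptation episodes on a single test task (the loop over the task set would call `meta_test` once per task).

```python
import random


class ToyTask:
    """Hypothetical 1-D task: move the state toward a hidden goal."""

    def __init__(self, goal, horizon=5):
        self.goal = goal
        self.horizon = horizon
        self.s = 0.0
        self.t = 0

    def reset(self):
        self.s = 0.0
        self.t = 0
        return self.s

    def step(self, a):
        self.s += a
        self.t += 1
        reward = -abs(self.s - self.goal)
        done = self.t >= self.horizon
        return self.s, reward, done


def encode_context(context, latent_dim=2, rng=random):
    """Stand-in for E_ω(z | e^T): sample z from a diagonal Gaussian.

    Assumption: an empty context falls back to the prior N(0, I); otherwise
    the mean is a placeholder summary (average reward) and the standard
    deviation shrinks as more transitions are accumulated.
    """
    if not context:
        mean, std = 0.0, 1.0
    else:
        mean = sum(r for (_, _, r, _) in context) / len(context)
        std = 1.0 / (1.0 + len(context)) ** 0.5
    return [rng.gauss(mean, std) for _ in range(latent_dim)]


def policy(state, z):
    """Stand-in for π_θ(a | s, z): a clipped step toward z[0]."""
    return max(-1.0, min(1.0, z[0] - state))


def rollout(task, z):
    """Run one episode with a fixed z and return D^k as (s, a, r, s') tuples."""
    transitions = []
    s = task.reset()
    done = False
    while not done:
        a = policy(s, z)
        s_next, r, done = task.step(a)
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions


def discounted_return(transitions, gamma=0.99):
    """Empirical discounted return of one episode."""
    return sum((gamma ** t) * r for t, (_, _, r, _) in enumerate(transitions))


def meta_test(task, num_episodes, gamma=0.99, seed=0):
    rng = random.Random(seed)
    context = []                                   # step 2: e^T = {}
    returns = []
    for _ in range(num_episodes):                  # step 3
        z = encode_context(context, rng=rng)       # step 4: z ~ E_ω(z | e^T)
        episode = rollout(task, z)                 # step 5: collect D^k
        context.extend(episode)                    # step 6: e^T = e^T ∪ D^k
        returns.append(discounted_return(episode, gamma))  # step 7
    return context, returns
```

Calling `meta_test(ToyTask(goal=1.0), num_episodes=3)` returns the accumulated context and one discounted return per episode. The point of the structure is that the policy parameters stay fixed during testing; only the context $e^{T}$, and therefore the inferred $z$, changes between episodes.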