Algorithm 2: VIB-based meta-reinforcement learning testing algorithm.
1: Input: meta-testing task set; meta-trained policy network; meta-trained latent-space encoder network.
2: Initialize the trajectory.
3: for … do
4:     Latent-space inference with the encoder network.
5:     Use the current policy to interact with each task and obtain a new trajectory.
6:     Sample accumulation: add the new trajectory to the accumulated samples.
7:     Evaluate the empirical discounted return for each task.
8: end for
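The meta-testing loop above can be sketched in Python as a minimal, self-contained illustration. Everything concrete in this sketch is an assumption rather than the method described here: `ToyTask` is a hypothetical 1-D task whose best action equals a hidden goal, `encoder` is a hypothetical stand-in for the meta-trained VIB encoder that combines a Gaussian prior with one Gaussian factor per transition (a product-of-Gaussians aggregation in the style of PEARL, not necessarily this paper's encoder), and `policy` is a hypothetical stand-in for the meta-trained policy that simply acts on the sampled latent. The comments map each part of the code to the corresponding step of Algorithm 2.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# (state, action, reward, next_state)
Transition = Tuple[float, float, float, float]


class ToyTask:
    """Hypothetical 1-D task: the best action equals a hidden goal value.

    Every step also emits a noisy observation of the goal as next_state,
    so the accumulated context gradually reveals which task is active.
    """

    def __init__(self, goal: float, horizon: int = 5, obs_noise: float = 0.5):
        self.goal = goal
        self.horizon = horizon
        self.obs_noise = obs_noise

    def rollout(self, policy: Callable[[float, float], float], z: float,
                rng: random.Random) -> List[Transition]:
        transitions: List[Transition] = []
        state = 0.0
        for _ in range(self.horizon):
            action = policy(state, z)
            reward = -abs(action - self.goal)  # always <= 0
            next_state = self.goal + rng.gauss(0.0, self.obs_noise)
            transitions.append((state, action, reward, next_state))
            state = next_state
        return transitions


def encoder(context: List[Transition], factor_std: float = 0.5) -> Tuple[float, float]:
    """Hypothetical stand-in for the meta-trained latent encoder.

    Multiplies a N(0, 1) prior with one Gaussian factor per transition
    (mean = observed next_state, std = factor_std) and returns the mean
    and standard deviation of the resulting Gaussian posterior over z.
    """
    precision = 1.0      # prior precision
    weighted_sum = 0.0   # prior mean (0) times prior precision
    for _, _, _, next_state in context:
        precision += 1.0 / factor_std ** 2
        weighted_sum += next_state / factor_std ** 2
    return weighted_sum / precision, 1.0 / math.sqrt(precision)


def policy(state: float, z: float) -> float:
    """Hypothetical stand-in for the meta-trained policy: act on the latent goal estimate."""
    return z


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Compute sum_t gamma^t * r_t by backward accumulation."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def meta_test(tasks: List[ToyTask], num_episodes: int = 3, gamma: float = 0.99,
              seed: int = 0) -> Dict[int, List[float]]:
    rng = random.Random(seed)
    # Step 2: initialize an empty trajectory (context) per task.
    contexts: Dict[int, List[Transition]] = {i: [] for i in range(len(tasks))}
    returns: Dict[int, List[float]] = {i: [] for i in range(len(tasks))}
    for _ in range(num_episodes):                                  # step 3
        for i, task in enumerate(tasks):
            mu, sigma = encoder(contexts[i])                       # step 4: infer z
            z = rng.gauss(mu, sigma)
            transitions = task.rollout(policy, z, rng)             # step 5: interact
            contexts[i].extend(transitions)                        # step 6: accumulate
            rewards = [r for _, _, r, _ in transitions]
            returns[i].append(discounted_return(rewards, gamma))   # step 7: evaluate
    return returns
```

For example, `meta_test([ToyTask(goal=2.0), ToyTask(goal=-1.0)])` returns one list of discounted returns per task, with one entry per episode. Because the posterior over z tightens as samples accumulate, later episodes should typically earn less negative returns than the first, which is the adaptation effect the testing loop is meant to measure.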