|
Algorithm 1 Decision Transformer–based Hierarchical Reinforcement Learning (DT-HRL) |
|
Input: Offline dataset 𝒟, action primitive set 𝒫, task embedding, goal embedding, learning rate, path-efficiency loss term
|
|
Output: Hierarchical policy
|
| // Train High-Level Decision Transformer // |
|
1. Initialization
|
|
2. for episode do
|
|
3. Sample a task and trajectory from 𝒟
|
|
4.
|
|
5.
|
|
6. ▷ Delivered to low-level policy
|
|
7.
|
|
8.
|
|
9. end for
|
|
|
| // Low-Level Execution // |
|
10. function
|
|
11.
|
|
12. for
do
|
|
13. ▷ Execute low-level controller
|
|
14.
|
|
15. if done then
|
|
16. break
|
|
17. end if
|
|
18. end for
|
|
19. end function
|
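
The low-level execution loop in steps 10 to 19 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `execute_primitive`, `controller`, `env_step`, and `max_steps`, the integer state, and the toy 1-D environment are all assumptions introduced here, since the algorithm box does not preserve the original symbols.

```python
from typing import Callable, Tuple

State = int    # assumption: a toy 1-D position stands in for the real state
Action = int   # assumption: a signed unit step stands in for the real action


def execute_primitive(
    state: State,
    primitive: int,
    controller: Callable[[State, int], Action],
    env_step: Callable[[State, Action], Tuple[State, bool]],
    max_steps: int,
) -> Tuple[State, bool, int]:
    """Run one action primitive for at most `max_steps` low-level steps.

    Hypothetical helper mirroring steps 10-19: call the low-level
    controller, advance the environment, and stop early once `done` is set.
    """
    done = False
    steps = 0
    for _ in range(max_steps):
        action = controller(state, primitive)   # execute low-level controller
        state, done = env_step(state, action)   # advance the environment
        steps += 1
        if done:                                # early exit, as in steps 15-17
            break
    return state, done, steps


# Toy setup (assumption): the primitive is a target cell on a line.
GOAL = 3


def toy_controller(state: State, primitive: int) -> Action:
    """Move one cell toward the target encoded by the primitive."""
    if state < primitive:
        return 1
    if state > primitive:
        return -1
    return 0


def toy_env_step(state: State, action: Action) -> Tuple[State, bool]:
    """Apply the action and report whether the goal cell was reached."""
    next_state = state + action
    return next_state, next_state == GOAL


final_state, done, steps = execute_primitive(
    0, GOAL, toy_controller, toy_env_step, max_steps=10
)
print(final_state, done, steps)
```

In this sketch the high-level policy would choose `primitive` and call `execute_primitive` once per decision. The step budget `max_steps` bounds how long a single primitive may run before control returns to the high level.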