2025 Sep 1;10(9):577. doi: 10.3390/biomimetics10090577
Algorithm 1 Decision Transformer–based Hierarchical Reinforcement Learning (DT-HRL)
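Input: Offline dataset $\mathcal{D}$, action primitive set $\mathcal{P}$, task embedding $\phi_{\text{task}}(\cdot)$, goal embedding $\phi_{\text{goal}}(\cdot)$, learning rate $\eta$, path-efficiency loss term $\mathcal{L}_{\text{PE}}$
Output: Hierarchical policy $\pi = \{\pi_{\theta}^{\text{high}}, \pi_{\theta}^{\text{low}}\}$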
// Train High-Level Decision Transformer //
1.  Initialization
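2.  for each episode do
3.       Sample a task $\tau$ and a trajectory $\{(s_t, a_t, r_t, g_t)\}_{t=0}^{T}$ from $\mathcal{D}$
4.       $S = (s_{t-H}, \ldots, s_t)$,  $A = (a_{t-H}, \ldots, a_t)$
5.       $\hat{a}_{t+1} \leftarrow \pi_{\theta}^{\text{high}}(S, A, \phi_{\text{task}}(\tau), \phi_{\text{goal}}(g_t))$
6.       $a^{H} = \hat{a}_{t+1}$             ▷ Delivered to the low-level policy
7.       $\mathcal{L} \leftarrow \mathcal{L}_{\text{CrossEntropy}}(\hat{a}_{t+1}, a_{t+1}) + \lambda \mathcal{L}_{\text{PE}}$       ▷ $\lambda$ weights the path-efficiency term
8.       $\theta \leftarrow \theta - \eta \nabla_{\theta} \mathcal{L}$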
9.  end for
// Low-Level Execution //
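10.  function $\pi_{\theta}^{\text{low}}(a^{H})$
11.       $s_0 \leftarrow$ initial state
12.       for k = 0 to K do
13.              $a_{k+1} \leftarrow \pi_{\theta}^{\text{low}}(s_k, a^{H})$       ▷ Execute low-level controller
14.              $s_{k+1} \leftarrow \text{Environment}(s_k, a_{k+1})$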
15.              if done then
16.                   break
17.              end if
18.        end for
19.  end function
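The high-level update in steps 3 to 8 can be sketched in plain Python. This is a minimal illustration under assumptions that do not come from the source: a linear softmax head over the primitive set stands in for the Decision Transformer; the context window $(S, A)$ is summarized by the mean state plus primitive frequencies and is clipped at the start of the trajectory; and $\mathcal{L}_{\text{PE}}$, which the algorithm leaves undefined, is modeled as the expected per-primitive path cost $\sum_i p_i c_i$ with hypothetical costs $c_i$. The helper names (`features`, `policy_probs`, `train_step`) and the toy data are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # shift by the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def policy_probs(W: List[Vector], x: Vector) -> Vector:
    # Linear softmax head standing in for the Decision Transformer (assumption).
    return softmax([sum(w * xj for w, xj in zip(row, x)) for row in W])


def features(states: Sequence[Vector], actions: Sequence[int], t: int, H: int,
             n_prims: int, task_emb: Vector, goal_emb: Vector) -> Vector:
    # S = s_{t-H..t}, A = a_{t-H..t}, clipped at the trajectory start (assumption).
    lo = max(0, t - H)
    S, A = list(states[lo:t + 1]), list(actions[lo:t + 1])
    mean_s = [sum(s[j] for s in S) / len(S) for j in range(len(S[0]))]
    hist = [A.count(i) / len(A) for i in range(n_prims)]  # primitive frequencies in A
    # Concatenate window summary with phi_task(tau) and phi_goal(g_t).
    return mean_s + hist + list(task_emb) + list(goal_emb)


def train_step(W: List[Vector], x: Vector, target: int,
               path_cost: Vector, lam: float, eta: float) -> float:
    """Steps 5-8: predict, compute L = CE + lam * L_PE, take one gradient step."""
    p = policy_probs(W, x)
    exp_cost = sum(pi * ci for pi, ci in zip(p, path_cost))  # hypothetical L_PE
    loss = -math.log(p[target]) + lam * exp_cost
    # dL/dz_i = (p_i - 1[i == target]) + lam * p_i * (c_i - E_p[c])
    g = [pi - (1.0 if i == target else 0.0) + lam * pi * (ci - exp_cost)
         for i, (pi, ci) in enumerate(zip(p, path_cost))]
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] -= eta * g[i] * x[j]  # theta <- theta - eta * grad_theta L
    return loss


# Toy usage: one trajectory, three primitives (hypothetical names and costs).
primitives = ["reach", "grasp", "place"]
states = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
actions = [0, 1, 2, 1]
t, H = 2, 1
x = features(states, actions, t, H, len(primitives), task_emb=[1.0], goal_emb=[0.0, 1.0])
W = [[0.0] * len(x) for _ in primitives]
path_cost = [0.2, 0.1, 0.5]
losses = [train_step(W, x, actions[t + 1], path_cost, lam=0.1, eta=0.5) for _ in range(50)]
probs = policy_probs(W, x)
```

Because both loss terms are expressed through the same logits, their gradients simply add: the cross-entropy contributes $p_i - \mathbb{1}[i = a_{t+1}]$ and the modeled path-cost term contributes $\lambda p_i (c_i - \mathbb{E}_p[c])$, which is the quantity the update on `W` applies.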
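Steps 10 to 19 describe a closed-loop rollout of the low-level controller for a fixed high-level command $a^{H}$. A minimal Python sketch follows, assuming (this is not stated in the source) that the environment step returns the next state together with the `done` flag, and that $s_0$ is passed in explicitly rather than read from an implicit initial state. The 1-D point environment, the saturated proportional controller, and names such as `execute_primitive` are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")    # environment state
AH = TypeVar("AH")  # high-level action a^H (e.g., a primitive and its target)
A = TypeVar("A")    # low-level action


def execute_primitive(a_high: AH, s0: S,
                      low_policy: Callable[[S, AH], A],
                      env_step: Callable[[S, A], Tuple[S, bool]],
                      K: int) -> List[S]:
    """Roll out pi_low for a^H; returns the visited states s_0, s_1, ..."""
    trajectory = [s0]
    s = s0
    for _k in range(K + 1):        # "for k = 0 to K" is inclusive
        a = low_policy(s, a_high)  # a_{k+1} <- pi_low(s_k, a^H)
        s, done = env_step(s, a)   # s_{k+1} <- Environment(s_k, a_{k+1})
        trajectory.append(s)
        if done:                   # early termination, steps 15-17
            break
    return trajectory


GOAL_X = 3.0  # hypothetical high-level command: move to x = 3.0


def p_controller(s: float, goal: float) -> float:
    # Proportional step toward the goal, saturated at unit speed.
    return max(-1.0, min(1.0, goal - s))


def point_env(s: float, a: float) -> Tuple[float, bool]:
    # Toy 1-D point mass; the episode ends once the goal is reached.
    s_next = s + a
    return s_next, abs(s_next - GOAL_X) < 1e-9


traj = execute_primitive(GOAL_X, 0.0, p_controller, point_env, K=10)
```

Since `for k = 0 to K` is inclusive, the loop runs at most $K + 1$ controller steps; with `K=1` the same toy rollout stops at `[0.0, 1.0, 2.0]` before reaching the goal, whereas with `K=10` it terminates early through the `done` flag.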