Algorithm 2 Hybrid Reinforcement Learning Algorithm
1: Initialize: state space S and action space A
2: Apply discrete actions randomly to the robot and collect the data set D_1
3: Use D_1 to generate D_2, then obtain the transition model
4: Obtain the reduced action space a_i^reduced
5: Initialize A using a_i^rough ∈ [a_i^rough(min), a_i^rough(max)]
6: Initialize replay memory D to capacity N and the parameter vector θ
7: For episode=1,K do
8:   Reset the robot and the platform to their initial positions, and empty the temporary transition table: T = ∅
9:   For t=1, T do
10:     Obtain the current state S_t from the sensors' readings
11:     Select a random action from A with probability ϵ; otherwise select a_t^k = argmax_{a∈A} Q(S_t, a|θ). Observe the next state S_{t+1} and receive an immediate reward r_{t+1}
12:     Append the transition (S_t, a_t, r_t, R_t^λ, S_{t+1}) to T
13:     If S_{t+1} is within the stable region S_s
14:       Update R_T^λ using Algorithm 1
15:       Store T in D and refresh it: T = ∅
16:     End If
17:     Sample a random minibatch of transitions (S_j, a_j, r_j, R_j^λ, S_{j+1}), j = 1, 2, …, P, from D
18:     Apply a gradient descent step on θ to improve the Q-function: θ ← θ − α ∇_θ (R_j^λ − Q(S_j, a_j|θ))^2
19:     Every C steps, update R_D^λ using Algorithm 1
20:    End For
21: End For
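To make the control flow of Algorithm 2 concrete, the following is a minimal Python sketch of the training loop. It assumes a discretized (reduced) action set, an ϵ-greedy policy, a linear Q-function parameterized by θ, and a simple backward λ-return recursion standing in for Algorithm 1; the toy environment, the dimensions, and helper names such as env_step and lambda_returns are illustrative assumptions, not part of the original paper.

```python
# Sketch of Algorithm 2 with a linear Q-function and a toy environment.
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS = 4, 9                 # reduced/discretized action set (assumed sizes)
GAMMA, LAM, ALPHA, EPS = 0.99, 0.9, 1e-3, 0.1
MEMORY_CAP, BATCH, EPISODES, T_MAX = 10_000, 32, 200, 100

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))  # linear Q(s, .) = s^T theta
memory = deque(maxlen=MEMORY_CAP)                            # replay memory D

def q_values(state, params):
    return state @ params                                    # vector of Q(s, a) for all actions a

def lambda_returns(rewards, boot_values):
    """Stand-in for Algorithm 1: backward recursion of the lambda-return,
    G_t = r_t + gamma * ((1 - LAM) * V(s_{t+1}) + LAM * G_{t+1})."""
    g = boot_values[-1]
    out = []
    for r, v in zip(reversed(rewards), reversed(boot_values)):
        g = r + GAMMA * ((1 - LAM) * v + LAM * g)
        out.append(g)
    return out[::-1]

def env_reset():
    return rng.normal(size=STATE_DIM)

def env_step(state, action):
    """Toy stand-in for the robot/platform dynamics: contracting drift plus noise."""
    push = 0.05 * (action - N_ACTIONS // 2) * np.ones(STATE_DIM) / STATE_DIM
    next_state = 0.9 * state + push + 0.01 * rng.normal(size=STATE_DIM)
    reward = -float(np.linalg.norm(next_state))   # closer to the origin is better
    stable = np.linalg.norm(next_state) < 0.5     # proxy for the stable region S_s
    return next_state, reward, stable

for episode in range(EPISODES):
    state = env_reset()
    segment = []                                  # temporary transition table T
    for t in range(T_MAX):
        # epsilon-greedy choice over the reduced action set
        if rng.random() < EPS:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q_values(state, theta)))
        next_state, reward, stable = env_step(state, action)
        segment.append((state, action, reward, next_state))

        if stable:                                # S_{t+1} entered the stable region
            rewards = [tr[2] for tr in segment]
            boots = [float(np.max(q_values(tr[3], theta))) for tr in segment]
            for (s, a, r, s2), g in zip(segment, lambda_returns(rewards, boots)):
                memory.append((s, a, r, g, s2))   # store (S, a, r, R^lambda, S') in D
            segment = []                          # refresh T

        # minibatch gradient descent step on (R^lambda - Q)^2
        if len(memory) >= BATCH:
            for s, a, r, g, s2 in random.sample(list(memory), BATCH):
                td = g - q_values(s, theta)[a]
                theta[:, a] += ALPHA * td * s     # descent direction for the linear Q
        state = next_state
```

In the paper the Q-function is a neural network and the λ-return bookkeeping follows Algorithm 1; the linear approximator and toy dynamics above only illustrate the flow of lines 7–20 (ϵ-greedy selection, the temporary table T emptied when the stable region is reached, and minibatch updates of θ from the replay memory D).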