Algorithm 2 Hybrid Reinforcement Learning Algorithm
1: Initialize: state space S and action space A
2: Apply discrete actions randomly to the robot and collect the data set D_0
3: Use D_0 to generate training samples, and then obtain the transition model f
4: Obtain the reduced action space A_r
5: Initialize the Q-function using the reduced action space A_r
6: Initialize the replay memory D to capacity N and the parameter vector θ
7: For episode = 1 to M do
8: Reset the robot and the platform to their initial positions, and empty the transition temporary table T
9: For t = 1 to the maximum number of steps do
10: Obtain the current state s_t from the sensors' readings
11: With probability ε select a random action a_t from A_r; otherwise select a_t = argmax_a Q(s_t, a; θ). Observe the next state s_{t+1} and receive an immediate reward r_t
12: Append the transition (s_t, a_t, r_t, s_{t+1}) to T
13: If s_{t+1} is within the stable region
14: Update the transition model using Algorithm 1
15: Store T in D, and refresh T: T ← ∅
16: End If
17: Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
18: Apply a gradient descent step with respect to θ to improve the Q-function
19: Every C steps, update the reduced action space A_r using Algorithm 1
20: End For
21: End For
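The offline phase (Steps 2-5) can be summarized in code. The sketch below is illustrative only: the environment interface (env.reset() and env.step(a) returning a state, reward, done triple), the per-action linear least-squares transition model, and the stability test stable_fn are assumptions standing in for the paper's actual robot interface and for the details of Algorithm 1, which are not reproduced here.

```python
import numpy as np

def collect_random_data(env, actions, n_samples, rng):
    """Step 2: apply discrete actions at random and record (s, a, s') tuples."""
    data = []
    s = env.reset()
    for _ in range(n_samples):
        a = rng.integers(len(actions))
        s_next, _, done = env.step(actions[a])
        data.append((s, a, s_next))
        s = env.reset() if done else s_next
    return data

def fit_transition_model(data, n_actions):
    """Step 3: per-action least-squares fit s' ~ [s, 1] @ W_a (an assumed model class)."""
    models = {}
    for a in range(n_actions):
        S = np.array([s for s, act, _ in data if act == a])
        S_next = np.array([sn for _, act, sn in data if act == a])
        if len(S) == 0:
            continue
        X = np.hstack([S, np.ones((len(S), 1))])        # add a bias column
        W, *_ = np.linalg.lstsq(X, S_next, rcond=None)  # shape (state_dim + 1, state_dim)
        models[a] = W
    return models

def predict(models, s, a):
    """One-step prediction of the next state under the fitted linear model."""
    return np.append(s, 1.0) @ models[a]

def reduce_action_space(models, probe_states, stable_fn):
    """Step 4: keep only actions whose predicted successors stay in the stable region."""
    return [a for a in models
            if all(stable_fn(predict(models, s, a)) for s in probe_states)]
```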
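A minimal sketch of the main loop (Steps 6-21) follows, again under assumptions: a linear Q-function with a hand-coded semi-gradient update stands in for whatever function approximator the paper uses, stable_fn stands in for the stable-region test, and the points where Algorithm 1 would refit the transition model and the reduced action space are left as placeholder comments.

```python
import numpy as np
from collections import deque

def features(s, a, n_actions):
    """Simple linear Q parameterization: one state-plus-bias block per discrete action."""
    block = np.append(s, 1.0)
    phi = np.zeros(n_actions * len(block))
    phi[a * len(block):(a + 1) * len(block)] = block
    return phi

def train(env, actions_reduced, stable_fn, episodes=200, horizon=300,
          capacity=10_000, batch=32, gamma=0.99, eps=0.1, alpha=1e-3,
          refresh_every=50, rng=np.random.default_rng(0)):
    n_a = len(actions_reduced)
    state_dim = env.reset().shape[0]            # assumes the state is a NumPy array
    theta = np.zeros(n_a * (state_dim + 1))     # Step 6: parameter vector theta
    D = deque(maxlen=capacity)                  # Step 6: replay memory D of capacity N

    def q(s, a):
        return features(s, a, n_a) @ theta

    step_count = 0
    for episode in range(episodes):             # Step 7
        s = env.reset()                         # Step 8: reset robot and platform
        T = []                                  # Step 8: temporary transition table
        for t in range(horizon):                # Step 9
            if rng.random() < eps:              # Step 11: epsilon-greedy over A_r
                a = int(rng.integers(n_a))
            else:
                a = int(np.argmax([q(s, i) for i in range(n_a)]))
            s_next, r, done = env.step(actions_reduced[a])
            T.append((s, a, r, s_next))         # Step 12

            if stable_fn(s_next):               # Step 13: stable-region check
                # Step 14: Algorithm 1 would refit the transition model here.
                D.extend(T)                     # Step 15: commit T to replay memory
                T = []                          #          and refresh T

            if len(D) >= batch:                 # Steps 17-18: minibatch Q update
                for j in rng.choice(len(D), batch, replace=False):
                    sj, aj, rj, sj1 = D[int(j)]
                    target = rj + gamma * max(q(sj1, i) for i in range(n_a))
                    td = target - q(sj, aj)
                    theta += alpha * td * features(sj, aj, n_a)  # semi-gradient step

            step_count += 1
            if step_count % refresh_every == 0:
                pass  # Step 19: periodic Algorithm 1 update of the reduced action space

            s = s_next
            if done:
                break
    return theta
```

As reconstructed here, the stable-region check gates what enters the replay memory: only transitions confirmed to lie in the stable region are committed to D, so the Q-update in Steps 17-18 learns from data the model-based component has already vetted.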