Algorithm 1. SAC-Based Training Algorithm
5. For each episode:
(1) Initialize the UAV starting position;
(2) Reset the parameters of the interactive environment;
(3) Receive the initial observation of the image state s_0;
(4) For each time step t:
Feed the current state s_t into the actor network to generate the action a_t;
Normalize the action and convert it into speed control commands;
Apply the control commands to the UAV, then observe the reward r_t and the new image state s_{t+1};
Store the experience tuple (s_t, a_t, r_t, s_{t+1}) in the replay buffer;
Sample a minibatch of experiences from the replay buffer;
Update the soft value network: ψ ← ψ − λ_V ∇_ψ J_V(ψ);
Update the Q network: θ ← θ − λ_Q ∇_θ J_Q(θ);
Update the policy network: ϕ ← ϕ − λ_π ∇_ϕ J_π(ϕ);
Update the target value network: ψ̄ ← τψ + (1 − τ)ψ̄;
If the terminal condition is satisfied, start a new episode; otherwise, continue to the next time step.
End of the time-step loop;
End of the episode loop;
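The control flow of Algorithm 1 can be sketched as follows. This is a minimal, illustrative skeleton only: the actor, the environment, and the helper names (`DummyEnv`, `actor`, `normalize`, `soft_update`) are stand-in assumptions, not the paper's implementation, and the gradient updates of the value, Q, and policy networks are elided.

```python
import random
from collections import deque

class DummyEnv:
    """Stand-in for the UAV simulation environment (assumption)."""
    def reset(self):
        self.t = 0
        return 0.0  # initial image-state observation s_0 (scalar stub)

    def step(self, action):
        self.t += 1
        next_state = self.t * 0.1      # stub observation s_{t+1}
        reward = -abs(action)          # placeholder reward r_t
        done = self.t >= 10            # terminal condition
        return next_state, reward, done

def actor(state):
    """Stub policy network: maps state s_t to a raw action a_t."""
    return 0.5 * state

def normalize(action, max_speed=2.0):
    """Clip the raw action into a bounded speed command."""
    return max(-max_speed, min(max_speed, action))

def soft_update(target, source, tau=0.005):
    """Target-value update: psi_bar <- tau*psi + (1 - tau)*psi_bar."""
    return [tau * s + (1 - tau) * t for s, t in zip(source, target)]

replay_buffer = deque(maxlen=10_000)
value_params, target_value_params = [1.0], [0.0]  # toy 1-parameter networks

env = DummyEnv()
for episode in range(3):                        # 5. for each episode
    state = env.reset()                         # (1)-(3): reset env, observe s_0
    done = False
    while not done:                             # (4): for each time step t
        action = normalize(actor(state))        # actor output -> speed command
        next_state, reward, done = env.step(action)
        replay_buffer.append((state, action, reward, next_state, done))
        batch = random.sample(replay_buffer, min(32, len(replay_buffer)))
        # Gradient steps on J_V, J_Q, and J_pi would use `batch` here.
        target_value_params = soft_update(target_value_params, value_params)
        state = next_state

print(len(replay_buffer))
```

The target network is moved only a small step (τ = 0.005) toward the current value network each update, which is the soft-update rule that stabilizes SAC training.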