Algorithm 2: Pre-training method for the control actor network.
Initialize [...]
for [...] do
    for [...] do
        Choose [...]
        Choose [...]
        Apply [...] after the sampling time unit
        Add ([...], [...], [...]) to the data buffer
        if [...] then
            for data in [...] do
                [...] generated by the control strategy network
                Compute [...]
                Update the network parameters using the Adam optimizer
            end for
        end if
    end for
end for
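The loop structure of Algorithm 2 can be illustrated with a minimal Python sketch. The symbols of the original listing are not recoverable here, so almost everything concrete below is an assumption made for illustration: the scalar plant `plant_step`, the rule `reference_action` that chooses the control action, the toy `LinearActor` standing in for the control actor network, the buffer tuple (state, action, next state), the buffer-size condition `min_buffer`, the mini-batch sampling, and the squared-error loss between the actor output and the stored action. Only the overall flow (collect data, fill a buffer, update the actor with Adam once a condition holds) follows the listing.

```python
import math
import random


def reference_action(x):
    """Hypothetical stand-in for the rule that chooses the control action."""
    return -0.8 * x


def plant_step(x, u):
    """Hypothetical first-order plant advanced by one sampling time unit."""
    return 0.9 * x + 0.5 * u


class LinearActor:
    """Toy actor u = w * x + b, standing in for the control actor network."""

    def __init__(self):
        self.params = [0.0, 0.0]  # [w, b]

    def act(self, x):
        return self.params[0] * x + self.params[1]


class Adam:
    """Plain Adam update over a flat list of parameters."""

    def __init__(self, n, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # first-moment estimates
        self.v = [0.0] * n  # second-moment estimates
        self.t = 0          # update counter for bias correction

    def step(self, params, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def pretrain(episodes=100, steps=5, min_buffer=32, batch_size=8, seed=0):
    rng = random.Random(seed)
    actor, optimizer, buffer = LinearActor(), Adam(2), []
    for _ in range(episodes):                 # outer loop
        x = rng.uniform(-1.0, 1.0)            # assumed: random initial state
        for _ in range(steps):                # inner loop
            u = reference_action(x)           # choose the control action
            x_next = plant_step(x, u)         # apply it for one sampling time unit
            buffer.append((x, u, x_next))     # add the tuple to the data buffer
            if len(buffer) >= min_buffer:     # assumed update condition
                for xs, us, _ in rng.sample(buffer, batch_size):
                    err = actor.act(xs) - us  # actor output vs. stored action
                    # Gradient of the assumed loss err**2 with respect to [w, b]
                    grads = [2.0 * err * xs, 2.0 * err]
                    optimizer.step(actor.params, grads)
            x = x_next
    return actor, buffer


if __name__ == "__main__":
    trained_actor, data = pretrain()
    print("actor parameters [w, b]:", trained_actor.params)
    print("buffer size:", len(data))
```

In this toy setting the actor parameters should approach the gain of the hypothetical reference rule (w near -0.8, b near 0). A real implementation would replace `LinearActor` and the hand-written `Adam` class with a neural network and a library optimizer.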