Comparative Study of Cooperative Platoon Merging Control Based on Reinforcement Learning

2023 Jan 15;23(2):990. doi: 10.3390/s23020990

Algorithm 1: PPO, Actor–Critic Style [15]

for iteration = 1, 2, ... do
    for actor = 1, 2, ..., N do
        Interact with the environment using the policy π_{θ_{old}}
        Feed the visited states to the critic network to compute the state-value baseline estimates
        Compute advantage estimates
    end for
    Optimize surrogate L with respect to θ, with K epochs
    Update the policy: θ_{old} ← θ
end for
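The two numerical steps of Algorithm 1, computing advantage estimates from the critic's value baseline and evaluating the surrogate L, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the algorithm box: advantages are computed with generalized advantage estimation (GAE), the surrogate is the clipped PPO objective, the rollout contains no episode termination, and the function names and the defaults for `gamma`, `lam`, and `clip_eps` are hypothetical choices for illustration only.

```python
import math


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Advantage estimates for one rollout (assumes GAE; no episode end inside it).

    rewards[t] and values[t] are the reward and the critic's value estimate at
    step t; last_value is the critic's estimate for the state after the rollout.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error using the critic as baseline.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (assumed form of L; to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two candidate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a full training loop, the inner "Optimize surrogate" step would run K epochs of gradient ascent on `clipped_surrogate` with respect to the policy parameters, after which the old-policy log-probabilities are refreshed from the updated policy. The clipping keeps each update close to the policy that collected the data, which is why the same rollout can be reused for several epochs.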