Algorithm 1. Traffic Signal Control Using Parameterized Deep RL.
1: Initialize: learning rates $\alpha$ and $\beta$, exploration parameter $\varepsilon$, minibatch size $B$, a probability distribution $\xi$ over actions, flow configurations, and network weights $\omega$ and $\theta$.
2: for each episode do
3:     Start the simulation, observe the initial state $s_0$, and take the initial joint action $a_0$.
4:     for each time step $t$ of the episode do
5:         Compute the action parameters $x_k \leftarrow x_k(s_t; \theta)$ for every discrete action $k$.
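6:         Select the action $a_t = (k_t, x_{k_t})$ according to the $\varepsilon$-greedy policy: with probability $\varepsilon$, sample $a_t$ from $\xi$; otherwise, set $k_t = \arg\max_k Q(s_t, k, x_k; \omega)$.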
7:         Perform $a_t$, observe the next state $s_{t+1}$, and receive the reward $r_t$.
8:         Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay memory $M$.
9:         Sample a random minibatch of $B$ transitions $\{(s_b, a_b, r_b, s'_b)\}_{b=1}^{B}$ from $M$.
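10:        Compute the targets: $y_b = r_b$ if $s'_b$ is terminal; otherwise $y_b = r_b + \gamma \max_k Q(s'_b, k, x_k(s'_b; \theta); \omega)$, where $\gamma$ is the discount factor.
11:        Use $\{(y_b, s_b, a_b)\}_{b=1}^{B}$, with $a_b = (k_b, x_b)$, to compute the stochastic gradients $\nabla_\omega \ell^Q(\omega)$ and $\nabla_\theta \ell^\Theta(\theta)$ of the losses $\ell^Q(\omega) = \frac{1}{2B}\sum_b \big(Q(s_b, k_b, x_b; \omega) - y_b\big)^2$ and $\ell^\Theta(\theta) = -\frac{1}{B}\sum_b \sum_k Q(s_b, k, x_k(s_b; \theta); \omega)$.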
12:        Update the network weights $\omega$ and $\theta$ by gradient descent, using the learning rates initialized in step 1.
13:     end for
14: end for
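As a minimal sketch of how steps 5 to 12 fit together, the pure-Python code below runs the loop on a toy problem. Everything concrete in it is an assumption rather than the paper's setup: the single-variable state, the dynamics and reward in the hypothetical `toy_step`, the linear Q-function over the hypothetical feature map `features`, the sigmoid actor `action_param`, uniform sampling as the exploration distribution (denoted $\xi$ here), and all hyperparameter values. It also omits the initial action of step 3 and the flow configurations, and it assumes that $k$ indexes a signal phase and $x_k$ its normalized green time.

```python
import math
import random
from collections import deque

# Hypothetical toy setting (an assumption, not from the paper): the state s in
# [-1, 1] is a normalized queue imbalance, the discrete action k in {0, 1}
# picks a phase, and the parameter x in (0, 1) is that phase's normalized green time.
K = 2          # number of discrete actions
N_FEAT = 5     # size of the feature vector phi(s, x)


def features(s, x):
    """phi(s, x) for the linear Q-function Q(s, k, x; omega) = omega[k] . phi(s, x)."""
    return [1.0, s, x, s * x, x * x]


def q_value(omega, s, k, x):
    return sum(w * f for w, f in zip(omega[k], features(s, x)))


def dq_dx(omega, s, k, x):
    """Partial derivative of Q(s, k, x; omega) with respect to x."""
    w = omega[k]
    return w[2] + w[3] * s + 2.0 * w[4] * x


def action_param(theta, s, k):
    """x_k(s; theta): a linear map squashed by a sigmoid into (0, 1)."""
    z = theta[k][0] + theta[k][1] * s
    return 1.0 / (1.0 + math.exp(-z))


def select_action(omega, theta, s, eps, rng):
    """Steps 5-6: epsilon-greedy over (k, x_k); uniform sampling stands in for xi."""
    if rng.random() < eps:
        return rng.randrange(K), rng.random()
    xs = [action_param(theta, s, k) for k in range(K)]
    k = max(range(K), key=lambda j: q_value(omega, s, j, xs[j]))
    return k, xs[k]


def toy_step(s, k, x, rng):
    """Hypothetical dynamics: phase 0 lowers s, phase 1 raises it, scaled by x."""
    direction = -1.0 if k == 0 else 1.0
    s_next = max(-1.0, min(1.0, s + direction * x + rng.gauss(0.0, 0.05)))
    return s_next, -abs(s_next)  # reward penalizes the remaining imbalance


def target(omega, theta, r, s_next, done, gamma):
    """Step 10: y = r if terminal, else r + gamma * max_k Q(s', k, x_k(s'; theta); omega)."""
    if done:
        return r
    return r + gamma * max(
        q_value(omega, s_next, k, action_param(theta, s_next, k)) for k in range(K))


def train(episodes=50, T=20, alpha=0.01, beta=0.01, eps=0.1, B=16, gamma=0.9, seed=0):
    rng = random.Random(seed)
    omega = [[0.0] * N_FEAT for _ in range(K)]
    theta = [[0.0, 0.0] for _ in range(K)]
    memory = deque(maxlen=1000)  # replay memory M (capacity is an assumption)
    for _ in range(episodes):                              # step 2
        s = rng.uniform(-1.0, 1.0)                         # step 3 (initial action omitted)
        for t in range(T):                                 # step 4
            k, x = select_action(omega, theta, s, eps, rng)    # steps 5-6
            s_next, r = toy_step(s, k, x, rng)                  # step 7
            done = t == T - 1
            memory.append((s, k, x, r, s_next, done))           # step 8
            if len(memory) >= B:
                batch = rng.sample(list(memory), B)             # step 9
                grad_w = [[0.0] * N_FEAT for _ in range(K)]
                grad_th = [[0.0, 0.0] for _ in range(K)]
                for sb, kb, xb, rb, sn, db in batch:
                    y = target(omega, theta, rb, sn, db, gamma)  # step 10
                    # Step 11, critic: gradient of (1/2B) * (Q - y)^2 with y held fixed.
                    err = q_value(omega, sb, kb, xb) - y
                    for i, f in enumerate(features(sb, xb)):
                        grad_w[kb][i] += err * f / B
                    # Step 11, actor: gradient of -(1/B) * sum_k Q(s, k, x_k(s; theta); omega).
                    for j in range(K):
                        xj = action_param(theta, sb, j)
                        g = -dq_dx(omega, sb, j, xj) * xj * (1.0 - xj) / B
                        grad_th[j][0] += g
                        grad_th[j][1] += g * sb
                for j in range(K):                                # step 12
                    omega[j] = [w - alpha * g for w, g in zip(omega[j], grad_w[j])]
                    theta[j] = [p - beta * g for p, g in zip(theta[j], grad_th[j])]
            s = s_next
    return omega, theta
```

Calling `train()` returns the learned weights `omega` and `theta`. The critic treats the target $y$ as a constant (a semi-gradient update), and the actor's loss sums $Q$ over every discrete action so that each $x_k$ is pushed toward a higher value. A realistic controller would replace `toy_step` with a traffic simulator and the linear models with neural networks.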