Sensors. 2020 Jul 31;20(15):4291. doi: 10.3390/s20154291
Algorithm 1 MAAC learning algorithm process
1: Initialize the communication matrix of all agents C_0
2: Initialize the parameters of each agent, θ_i^{Sender} and θ_i^{Receiver}
3: repeat
4:  Receiver of Agent_i: uses the attention mechanism to generate the communication matrix Ĉ_t
5:  Sender of Agent_i: chooses an action a^i_{t+1} from the policy selection network, or randomly chooses an action a (e.g., ε-greedy exploration)
6:  Sender of Agent_i: generates its own message c^i_{t+1} through the receiver's communication matrix Ĉ_t
7:  Collect the joint actions of all agents and execute a^1_{t+1}, …, a^N_{t+1}; obtain the reward R_{t+1} and the next state X_{t+1} from the environment
8:  Update the policy value function of each agent:
    ∇_{θ_i} V(θ_i) = E[ ∇_{θ_i} log π_{θ_i}(a^i_t | c^i_t) · Q̂(X_t, a^1_t, …, a^N_t) ]
9: until end of the episode
10: return θ_i^{Sender} and θ_i^{Receiver} for each agent
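One pass of the loop above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes a softmax policy per agent conditioned on its received message, scaled dot-product attention to form the row-stochastic communication matrix Ĉ_t, and a REINFORCE-style update weighted by a joint-action value Q̂ (here a hand-supplied constant standing in for the centralized critic). All shapes, weight names, and the toy observations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, A = 3, 4, 2  # number of agents, message dimension, action count


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def comm_matrix(states, Wq, Wk):
    """Step 4: attention over agent states -> row-stochastic Ĉ_t (N x N)."""
    q, k = states @ Wq, states @ Wk
    return softmax(q @ k.T / np.sqrt(D), axis=1)


def choose_action(theta_i, c_i, eps, rng):
    """Step 5: sample from the softmax policy, with ε-greedy exploration."""
    probs = softmax(theta_i @ c_i)
    if rng.random() < eps:
        return int(rng.integers(A)), probs  # random exploratory action
    return int(rng.choice(A, p=probs)), probs


# Step 2: sender policy parameters θ_i and (assumed shared) receiver
# attention weights.
theta = rng.normal(size=(N, A, D))
Wq, Wk = rng.normal(size=(D, D)), rng.normal(size=(D, D))

states = rng.normal(size=(N, D))   # toy per-agent observations
C = comm_matrix(states, Wq, Wk)    # step 4: Ĉ_t
messages = C @ states              # step 6: messages c^i_{t+1} via Ĉ_t

lr, eps = 0.1, 0.1
Q_hat = 1.0  # stand-in for the centralized value Q̂(X_t, a^1_t, …, a^N_t)
for i in range(N):
    a, probs = choose_action(theta[i], messages[i], eps, rng)
    # Step 8: for a softmax policy, ∇_θ log π(a|c) = (onehot(a) - probs) ⊗ c
    grad_logpi = (np.eye(A)[a] - probs)[:, None] * messages[i][None, :]
    theta[i] += lr * grad_logpi * Q_hat
```

Because each row of Ĉ_t is a softmax, every agent's message is a convex combination of the other agents' states, which is what lets the gradient in step 8 flow through the communication channel.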