Sensors. 2020 Jul 31;20(15):4291. doi: 10.3390/s20154291
Algorithm 1 MAAC learning algorithm process
1: Initialize the communication matrix of all agents C_0
2: Initialize the parameters of each agent, θ_i^{Sender} and θ_i^{Receiver}
3: repeat
4:  Receiver of Agent_i: uses the attention mechanism to generate the communication matrix Ĉ_t
5:  Sender of Agent_i: chooses an action a^i_{t+1} from the policy selection network, or randomly chooses an action a (e.g., ε-greedy exploration)
6:  Sender of Agent_i: generates its own message c^i_{t+1} through the receiver's communication matrix Ĉ_t
7:  Collect the joint actions of all agents and execute a^1_{t+1}, …, a^N_{t+1}; obtain the reward R_{t+1} and the next state X_{t+1} from the environment
8:  Update the policy value function of each agent:
    ∇_{θ_i} V(θ_i) = E[ ∇_{θ_i} log π_{θ_i}(a^i_t | c^i_t) · Q̂(X_t, a^1_t, …, a^N_t) ]
9: until end of the episode
10: return θ_i^{Sender} and θ_i^{Receiver} for each agent
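One pass of the loop above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes a softmax policy per agent conditioned on its received message, scaled dot-product attention to form the row-stochastic communication matrix Ĉ_t, and a REINFORCE-style update weighted by a joint-action value Q̂ (here a hand-supplied constant standing in for the centralized critic). All shapes, weight names, and the toy observations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, A = 3, 4, 2  # number of agents, message dimension, action count


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def comm_matrix(states, Wq, Wk):
    """Step 4: attention over agent states -> row-stochastic Ĉ_t (N x N)."""
    q, k = states @ Wq, states @ Wk
    return softmax(q @ k.T / np.sqrt(D), axis=1)


def choose_action(theta_i, c_i, eps, rng):
    """Step 5: sample from the softmax policy, with ε-greedy exploration."""
    probs = softmax(theta_i @ c_i)
    if rng.random() < eps:
        return int(rng.integers(A)), probs  # random exploratory action
    return int(rng.choice(A, p=probs)), probs


# Step 2: sender policy parameters θ_i and (assumed shared) receiver
# attention weights.
theta = rng.normal(size=(N, A, D))
Wq, Wk = rng.normal(size=(D, D)), rng.normal(size=(D, D))

states = rng.normal(size=(N, D))   # toy per-agent observations
C = comm_matrix(states, Wq, Wk)    # step 4: Ĉ_t
messages = C @ states              # step 6: messages c^i_{t+1} via Ĉ_t

lr, eps = 0.1, 0.1
Q_hat = 1.0  # stand-in for the centralized value Q̂(X_t, a^1_t, …, a^N_t)
for i in range(N):
    a, probs = choose_action(theta[i], messages[i], eps, rng)
    # Step 8: for a softmax policy, ∇_θ log π(a|c) = (onehot(a) - probs) ⊗ c
    grad_logpi = (np.eye(A)[a] - probs)[:, None] * messages[i][None, :]
    theta[i] += lr * grad_logpi * Q_hat
```

Because each row of Ĉ_t is a softmax, every agent's message is a convex combination of the other agents' states, which is what lets the gradient in step 8 flow through the communication channel.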