| Algorithm 1: MAAC learning procedure |
| 1: Initialize the communication matrix of all agents |
| 2: Initialize the parameters of the agent and |
| 3: repeat |
| 4: Receiver of : uses attention mechanism to generate communication matrix |
| 5: Sender of : chooses an action from the policy selection network, or chooses a random action a (e.g., ε-greedy exploration) |
| 6: Sender of : generates its own information through the receiver’s communication matrix |
| 7: Collect the joint actions of all agents and execute them; obtain the reward and the next state from the environment |
| 8: Update the strategic value function of each Agent: |
| 9: until the end of the episode |
| 10: returns and for each Agent |
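
The loop in Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation; several pieces are assumptions made only to keep the example runnable: the communication matrix is computed with dot-product attention over raw observations, the value function is a linear Q-function updated with a one-step TD rule, and `toy_step` is a hypothetical environment that rewards agents for picking the same action. All function names are illustrative. The sketch also computes the messages (step 6) before action selection (step 5) so that received information can feed the Q-values.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_matrix(obs):
    """Row i: receiver i's attention weights over the other agents (needs >= 2 agents)."""
    n = len(obs)
    rows = []
    for i in range(n):
        scores = [dot(obs[i], obs[j]) for j in range(n)]
        scores[i] = float("-inf")  # assumption: no self-communication
        rows.append(softmax(scores))
    return rows

def incoming_messages(obs, comm):
    """Each sender j contributes obs[j] weighted by the receiver's row comm[i][j]."""
    n, dim = len(obs), len(obs[0])
    return [[sum(comm[i][j] * obs[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]

def epsilon_greedy(q_values, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

def toy_step(actions, dim, rng):
    """Hypothetical environment: reward 1 when all agents agree, random next state."""
    reward = 1.0 if len(set(actions)) == 1 else 0.0
    next_obs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in actions]
    return reward, next_obs

def init_weights(n_agents, n_actions, dim):
    # One weight vector per (agent, action); features are observation + message.
    return [[[0.0] * (2 * dim) for _ in range(n_actions)] for _ in range(n_agents)]

def features(obs):
    msgs = incoming_messages(obs, attention_matrix(obs))  # steps 4 and 6
    return [o + m for o, m in zip(obs, msgs)]

def run_episode(weights, dim=2, steps=20, epsilon=0.1, alpha=0.05, gamma=0.9, seed=0):
    rng = random.Random(seed)
    n_agents, n_actions = len(weights), len(weights[0])
    obs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_agents)]
    total = 0.0
    for _ in range(steps):
        feats = features(obs)
        q = [[dot(weights[i][a], feats[i]) for a in range(n_actions)]
             for i in range(n_agents)]
        actions = [epsilon_greedy(q[i], epsilon, rng) for i in range(n_agents)]  # step 5
        reward, next_obs = toy_step(actions, dim, rng)  # step 7
        next_feats = features(next_obs)
        for i in range(n_agents):  # step 8, with a TD(0) update as a stand-in
            next_q = max(dot(weights[i][a], next_feats[i]) for a in range(n_actions))
            td_error = reward + gamma * next_q - q[i][actions[i]]
            w = weights[i][actions[i]]
            for k in range(len(w)):
                w[k] += alpha * td_error * feats[i][k]
        obs = next_obs
        total += reward
    return total
```

Calling `run_episode(init_weights(3, 2, 2))` runs one episode with three agents and two actions and returns the accumulated shared reward; the weights are updated in place, so repeated calls with different seeds continue training.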