Algorithm 1. QL_MAS
Initialize: the capacity Cap of the memory M; the Q-values, Q(r, a) = 0 for all r, a; the LSTM-DQN estimation weights θ = θ_0; the LSTM-DQN target weights θ′
For episode = 1 → ep do  # ep is the number of episodes
    Fix the initial positions of the agents according to the map from the RAG
    N = Reg  # Reg is the number of regions; N is the number of agents
    For i = 1 → Reg do
        Place an agent A_i in region i
    End for
    Do as long as t < Cap
        For j = 1 → N do
            Compute the initial actions (2)
            Compute the initial Q-values (1)
            Identify the best adjacent neighbors that satisfy the similarity criteria
            Negotiate to decide the optimal proposal
        End for
        Fusion
        Update the map of the regions
        Update Reg
        N = Reg
        For j = 1 → N do
            Compute the actions (2)
            Compute the Q-values (1)
            Compute the reward (4)
            Compute the next state (3)
            Save d = {state(t), action(t), R(t), state(t + 1)} in memory M
        End for
    End do
    Reset
End for
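The control flow of Algorithm 1 (negotiation, fusion, map update, then storing transitions until the memory reaches Cap) can be sketched in Python as below. This is a minimal, non-authoritative sketch under several assumptions that are not from the source. Each region is reduced to a mean intensity. The RAG is a dict of adjacency sets. The similarity criterion is a hypothetical threshold `SIM_THRESHOLD` on the difference of means. Negotiation keeps only mutual best-neighbor proposals. Equations (1) to (4) are not reproduced here, so the action, reward and next-state lines are hypothetical stand-ins, and the LSTM-DQN and its Q-value computation are omitted entirely. All names (`best_neighbor`, `negotiate`, `merge`, `run_episode`) are illustrative.

```python
from collections import deque

SIM_THRESHOLD = 10.0  # hypothetical similarity criterion (not from the source)


def best_neighbor(r, means, rag):
    """Most similar adjacent region satisfying the criterion, or None."""
    cands = [n for n in rag[r] if abs(means[r] - means[n]) < SIM_THRESHOLD]
    # Tie-break on region id so the result is deterministic.
    return min(cands, key=lambda n: (abs(means[r] - means[n]), n), default=None)


def negotiate(means, rag):
    """Each agent proposes its best neighbor; keep only mutual proposals.

    Hypothetical negotiation rule. Mutual pairs are disjoint because every
    region proposes at most one neighbor.
    """
    proposals = {r: best_neighbor(r, means, rag) for r in rag}
    return [(r, p) for r, p in proposals.items()
            if p is not None and proposals[p] == r and r < p]


def merge(a, b, means, sizes, rag):
    """Fusion: absorb region b into region a and update the RAG."""
    total = sizes[a] + sizes[b]
    means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
    sizes[a] = total
    for n in rag.pop(b):
        rag[n].discard(b)          # b no longer exists
        if n != a:                 # b's other neighbors become a's neighbors
            rag[n].add(a)
            rag[a].add(n)
    del means[b], sizes[b]


def run_episode(init_means, init_rag, cap):
    """One episode; fresh copies of the inputs play the role of Reset."""
    means = dict(init_means)
    sizes = {r: 1 for r in means}
    rag = {r: set(ns) for r, ns in init_rag.items()}
    memory = deque(maxlen=cap)     # memory M, bounded by Cap
    t = 0
    while t < cap:
        for a, b in negotiate(means, rag):   # negotiation, then fusion
            merge(a, b, means, sizes, rag)
        for r in list(rag):                  # N = Reg agents after the update
            state = means[r]
            action = best_neighbor(r, means, rag)   # stand-in for Eq. (2)
            if action is None:
                reward, next_state = 0.0, state
            else:
                # Hypothetical stand-ins for Eq. (4) and Eq. (3).
                reward = -abs(means[r] - means[action])
                next_state = ((means[r] * sizes[r] + means[action] * sizes[action])
                              / (sizes[r] + sizes[action]))
            memory.append((state, action, reward, next_state))
            t += 1
    return means, rag, memory
```

Calling `run_episode` once per episode from the same initial means and RAG mirrors the Fix/Reset steps, since each call works on fresh copies. The bounded `deque` is one simple way to keep M at no more than Cap transitions: if the last pass of the loop overshoots Cap, the oldest entries are evicted instead of M growing.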