Algorithm 5 SLA-Aware Delivery Scheduling via Cooperative Multi-Agent RL with Context-Aware Weights and Coordination

1: Input: Delivery queue Q, route availability R, SLA terms S, demand forecast F
2: Agents: i = 1, …, N (e.g., vehicle or hub controllers)
3: Initialize: Policy π_i for each agent i, shared critic Q_φ
4: Initialize: Replay buffer D, intention buffer I
5: Initialize: Reward weighting coefficients w = (w_1, w_2, w_3, w_4)
6: for each training episode do
7:     Generate demand and disruptions from F
8:     Initialize global state s^0 from environment
9:     for each timestep t do
10:        for each agent i do
11:            Select action a_i^t ~ π_i(· | o_i^t)    ▹ e.g., assign vehicle or reschedule
12:            Append (i, a_i^t) to I
13:        end for
14:        if conflicting vehicle assignments or resource overuse in I then
15:            Apply penalty or resolve using SLA priority or distance heuristics
16:        end if
17:        Execute joint action a^t = (a_1^t, …, a_N^t), observe next state s^{t+1}
18:        for each agent i do
19:            Observe: delay d_i^t, SLA violation flag v_i^t, fuel used f_i^t, emissions e_i^t
20:            Extract context vector: c_i^t = (d_i^t, v_i^t, f_i^t, e_i^t)
21:            Compute context-aware weights: w_i^t = g(c_i^t)
22:            Compute reward: r_i^t = −(w_{i,1}^t d_i^t + w_{i,2}^t v_i^t + w_{i,3}^t f_i^t + w_{i,4}^t e_i^t)
23:            Store transition (s^t, a_i^t, r_i^t, s^{t+1}) in D
24:        end for
25:        Sample mini-batch from D
26:        Update shared critic Q_φ by minimizing the temporal-difference loss:
               L(φ) = E[(r_i^t + γ Q_φ(s^{t+1}, a^{t+1}) − Q_φ(s^t, a^t))^2]
27:        for each agent i do
28:            Update actor policy π_i to maximize expected reward:
               θ_i ← θ_i + α ∇_{θ_i} E[Q_φ(s^t, a_1^t, …, a_N^t)]
29:        end for
30:        Clear I
31:    end for
32: end for
33: Output: Trained delivery policies {π_i}
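Steps 14–15 and 20–22 can be illustrated with a minimal sketch. This is not the paper's implementation: the `Context` fields mirror the observed quantities in step 19, the softmax weighting in `context_weights` is one plausible choice for the weight function g, and the SLA-priority tie-break in `resolve_conflicts` is an assumed heuristic.

```python
# Illustrative sketch (assumed names and scheme, not the authors' code):
# context-aware reward weighting and SLA-priority conflict resolution.
import math
from dataclasses import dataclass

@dataclass
class Context:
    delay: float        # hours late (d)
    sla_violated: bool  # violation flag (v)
    fuel: float         # fuel used (f)
    emissions: float    # emissions (e)

def _penalties(c: Context) -> list[float]:
    # Normalize the four context terms onto comparable scales (assumed scaling).
    return [c.delay, 1.0 if c.sla_violated else 0.0, c.fuel / 10.0, c.emissions / 10.0]

def context_weights(c: Context, temperature: float = 1.0) -> list[float]:
    """Softmax over context terms: the worse a term, the larger its weight."""
    exps = [math.exp(p / temperature) for p in _penalties(c)]
    z = sum(exps)
    return [e / z for e in exps]

def reward(c: Context) -> float:
    """Weighted negative penalty, as in step 22 of the algorithm."""
    return -sum(w * p for w, p in zip(context_weights(c), _penalties(c)))

def resolve_conflicts(intentions: dict, sla_priority: dict) -> dict:
    """If two agents claim the same vehicle, the higher SLA priority wins;
    the loser is assigned a no-op (None)."""
    claimed, resolved = {}, {}
    for agent, vehicle in intentions.items():
        holder = claimed.get(vehicle)
        if holder is None or sla_priority[agent] > sla_priority[holder]:
            if holder is not None:
                resolved[holder] = None  # previous claimant loses the vehicle
            claimed[vehicle] = agent
            resolved[agent] = vehicle
        else:
            resolved[agent] = None
    return resolved
```

For example, if agents `a1` and `a2` both claim vehicle `v1`, the one with the higher SLA priority keeps it and the other falls back to a no-op, matching the "resolve using SLA priority" branch of step 15.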
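The shared-critic update in steps 25–26 reduces, in its simplest form, to a semi-gradient TD(0) step. The sketch below assumes a linear critic Q_φ(s, a) = φ · x(s, a) over hand-built features; the feature map, step size, and discount factor are illustrative assumptions, not values from the source.

```python
# Minimal semi-gradient TD(0) update for a linear shared critic
# (illustrative assumption; the paper's critic may be a neural network).

def q_value(phi, x):
    """Linear critic: dot product of parameters and joint state-action features."""
    return sum(p * xi for p, xi in zip(phi, x))

def td_update(phi, batch, gamma=0.95, alpha=0.1):
    """One TD(0) step over a mini-batch of transitions (x, r, x_next),
    where x encodes the joint state-action pair (s^t, a^t)."""
    new_phi = list(phi)
    for x, r, x_next in batch:
        target = r + gamma * q_value(phi, x_next)  # bootstrapped TD target
        delta = target - q_value(phi, x)           # TD error
        for j in range(len(new_phi)):
            new_phi[j] += alpha * delta * x[j]     # semi-gradient step
    return new_phi
```

With φ = [0.0], one transition with reward 1 and feature [1.0] moves the single parameter by α · δ · x = 0.1, since the bootstrap term vanishes at zero parameters.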