Algorithm 1 Perishable-Aware Route Optimization via Q-Learning with Context-Aware Weights and Conflict Avoidance

1:  Input: Cold-chain graph G = (V, E), perishability profile P, disruption model D, emission matrix C
2:  Initialize: Q-tables Q_i(s, a) ← 0 for each agent i; learning rate α; discount factor γ; exploration rate ε
3:  Initialize: Static priority coefficients w⁰ for travel time, spoilage, and emissions
4:  Initialize: Shared intention buffer I ← ∅    ▹ For coordination
5:  for each episode do
6:      Initialize joint global state s using D
7:      while shipment not delivered do
8:          for each routing agent i do
9:              With probability ε, choose random action a_i
10:             Otherwise, choose a_i = argmax_a Q_i(s, a)
11:             Append a_i to I    ▹ Declare action intention
12:         end for
13:         Detect conflicts in I (e.g., duplicate vehicle or route allocation)
14:         if conflict detected then
15:             Apply coordination penalty or reassign conflicting agent(s) via tie-breaking
16:         end if
17:         for each agent i do
18:             Execute a_i, observe next state s′, travel time t_i, temperature deviation ΔT_i, emissions e_i
19:             Compute spoilage risk ρ_i from (t_i, ΔT_i) using perishability profile P
20:             Extract context vector c_i = (t_i, ΔT_i, ρ_i, e_i, disruption state from D)
21:             Compute context-aware weights w_i by adapting the static coefficients w⁰ to c_i
22:             Compute context-aware reward r_i as the w_i-weighted combination of travel time, spoilage risk, and emissions, minus any coordination penalty
23:             Update Q-value: Q_i(s, a_i) ← Q_i(s, a_i) + α[ r_i + γ max_{a′} Q_i(s′, a′) − Q_i(s, a_i) ]
24:             Update state: s ← s′
25:         end for
26:         Clear intention buffer: I ← ∅
27:     end while
28: end for
29: Output: Learned policies π_i for all agents i
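The Python sketch below illustrates one way the training loop of Algorithm 1 could be realized. It is a minimal illustration under stated assumptions, not the paper's implementation: the environment interface (env.reset, env.step, env.actions), the spoilage_risk model, the context_weights adaptation rule, the reward weighting, and the numeric constants (ALPHA, GAMMA, EPSILON, W_STATIC, the conflict penalty of 1.0) are all hypothetical choices introduced for the example.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2  # learning rate, discount factor, exploration rate (assumed values)
W_STATIC = {"time": 0.4, "spoilage": 0.4, "emission": 0.2}  # static priority coefficients (assumed values)

def spoilage_risk(travel_time, temp_dev, shelf_life):
    """Hypothetical spoilage model: risk grows with travel time and temperature deviation."""
    return min(1.0, (travel_time / shelf_life) * (1.0 + 0.1 * temp_dev))

def context_weights(context):
    """Assumed adaptation rule: shift weight toward spoilage when the observed risk is high."""
    w = dict(W_STATIC)
    w["spoilage"] += 0.3 * context["risk"]
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

def reward(context, w, conflict_penalty=0.0):
    """Context-aware reward: negative weighted cost, minus any coordination penalty."""
    cost = (w["time"] * context["time"]
            + w["spoilage"] * context["risk"]
            + w["emission"] * context["emission"])
    return -(cost + conflict_penalty)

def train(env, n_agents, episodes=500):
    # One Q-table per routing agent: Q[i][(state, action)] -> value; states are assumed hashable
    Q = [defaultdict(float) for _ in range(n_agents)]

    for _ in range(episodes):
        state = env.reset()                      # joint global state, initialized using disruption model D
        done = False
        while not done:
            intentions = []                      # shared intention buffer I
            for i in range(n_agents):
                actions = env.actions(i, state)
                if random.random() < EPSILON:    # epsilon-greedy exploration
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[i][(state, act)])
                intentions.append(a)             # declare action intention

            # Conflict detection: duplicate vehicle/route allocations mark the conflicting agents
            penalized = {i for i, a in enumerate(intentions) if intentions.count(a) > 1}

            for i, a in enumerate(intentions):
                next_state, obs, done = env.step(i, a)   # observe travel time, temp deviation, emissions
                risk = spoilage_risk(obs["time"], obs["temp_dev"], obs["shelf_life"])
                ctx = {"time": obs["time"], "temp_dev": obs["temp_dev"],
                       "risk": risk, "emission": obs["emission"]}
                w = context_weights(ctx)
                r = reward(ctx, w, conflict_penalty=1.0 if i in penalized else 0.0)

                # Standard Q-learning update
                best_next = max((Q[i][(next_state, a2)]
                                 for a2 in env.actions(i, next_state)), default=0.0)
                Q[i][(state, a)] += ALPHA * (r + GAMMA * best_next - Q[i][(state, a)])
                state = next_state
            # intention buffer is cleared by building a new list at the next step
    return Q

In this sketch the intention buffer is simply a per-step list of declared actions: duplicate declarations mark the conflicting agents, which then receive a coordination penalty in their reward, mirroring steps 11-16 of the algorithm.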