Algorithm 4 Sustainability-Aware Inventory Management via Actor–Critic RL with Context-Aware Weights and Coordination
1: Input: Local inventory state
2: Initialize: actor network, critic network, and replay buffer
3: Initialize: shared intention buffer and coefficients
4: for each episode do
5:   Observe the global inventory state and the local state
6:   for each timestep t do
7:     Select an order quantity
8:     Append the selected order to the shared intention buffer
9:     if a conflict is detected in the shared intention buffer (e.g., stock over-allocation or supply contention) then
10:      Apply a coordination penalty or reassign the order
11:    end if
12:    Optionally exchange supply information with peers (e.g., via blockchain or decentralized identifiers (DIDs))
13:    Execute the order and observe the new state
14:    Compute the spoilage loss from overstocked perishables
15:    Compute the holding cost and the delivery emissions
16:    Define the context vector:
17:
18:
19:    Store the transition in the replay buffer
20:    Sample a mini-batch from the replay buffer
21:
22:    Update the actor via policy gradient:
23:    Clear the shared intention buffer
24:  end for
25: end for
26: Output: Trained inventory ordering policy
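Steps 8 to 11 do not specify how a conflict is detected or resolved. The sketch below assumes one concrete rule. A conflict is a supplier whose total requested quantity in the shared intention buffer exceeds its available stock (over-allocation). Each conflicting request is scaled down in proportion to its size, and a per-unit coordination penalty is charged on the shortfall. `Intention`, `resolve_conflicts`, and `penalty_per_unit` are hypothetical names, not part of the algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Intention:
    """One agent's announced order (hypothetical record for the shared intention buffer)."""
    agent_id: str
    supplier_id: str
    quantity: float


def resolve_conflicts(buffer, supplier_stock, penalty_per_unit=1.0):
    """Detect supplier over-allocation and scale conflicting orders down proportionally.

    Assumed rule, for illustration only. Returns (granted quantities in buffer
    order, coordination penalty per agent for agents that were cut back).
    """
    # Total quantity requested from each supplier across all agents.
    requested = defaultdict(float)
    for it in buffer:
        requested[it.supplier_id] += it.quantity

    granted_list = []
    penalties = defaultdict(float)
    for it in buffer:
        total = requested[it.supplier_id]
        stock = supplier_stock.get(it.supplier_id, 0.0)
        if total > stock:
            # Conflict: the supplier cannot serve every request, so each agent
            # receives a share of the stock proportional to its request.
            granted = it.quantity * stock / total
            penalties[it.agent_id] += penalty_per_unit * (it.quantity - granted)
        else:
            granted = it.quantity
        granted_list.append(granted)
    return granted_list, dict(penalties)
```

Proportional scaling is only one option. The listing also allows reassigning the order instead, for example by redirecting the shortfall to another supplier.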
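The formulas for the context vector and the context-aware weighting (steps 16 to 18) did not survive extraction. As a hedged illustration only, the sketch below makes three assumptions:

- Spoilage, holding, and delivery-emission costs are linear per unit.
- Softmax weights are computed from a context vector through a linear map.
- The reward is the negative weighted cost minus any coordination penalty.

All parameter values and the names `step_costs`, `context_weights`, and `reward` are hypothetical.

```python
import numpy as np


def step_costs(inventory, demand, order_qty, *, spoil_fraction=0.1, unit_value=2.0,
               holding_rate=0.05, emission_factor=0.3):
    """Return [spoilage loss, holding cost, delivery emissions] for one timestep.

    Hypothetical linear model: a fixed fraction of leftover perishable stock
    spoils, leftover stock incurs a per-unit holding cost, and emissions scale
    with the delivered quantity.
    """
    leftover = max(0.0, inventory + order_qty - demand)
    spoilage = spoil_fraction * leftover * unit_value
    holding = holding_rate * leftover
    emissions = emission_factor * order_qty
    return np.array([spoilage, holding, emissions])


def context_weights(context, W, b):
    """Map a context vector to nonnegative weights summing to 1: softmax(W @ context + b)."""
    logits = W @ context + b
    z = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return z / z.sum()


def reward(costs, weights, coordination_penalty=0.0):
    """Negative context-weighted cost, minus any penalty applied during coordination."""
    return -float(weights @ costs) - coordination_penalty
```

With `W` and `b` set to zero the weights are uniform. A tuned or learned `W` lets a context feature, such as a hypothetical carbon-intensity signal, shift weight toward the emissions term.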
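Steps 19 to 22 store transitions, sample mini-batches, and update the critic and actor. Below is a minimal replay-based actor-critic sketch under the following assumptions:

- The actor is a linear Gaussian policy over the order quantity.
- The critic is a linear state-value function trained with TD(0).
- The TD error serves as the advantage in the policy-gradient step.

The model class, learning rates, and the names `ReplayBuffer` and `LinearActorCritic` are assumptions. The critic update at step 21 is not recoverable from the source. Reusing replayed actions without importance weighting is a simplification.

```python
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity: int):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.data.append((np.asarray(s, dtype=float), float(a), float(r),
                          np.asarray(s_next, dtype=float)))

    def sample(self, batch_size: int, rng: np.random.Generator):
        # Uniform sampling without replacement; returns stacked arrays.
        n = min(batch_size, len(self.data))
        idx = rng.choice(len(self.data), size=n, replace=False)
        batch = [self.data[int(i)] for i in idx]
        s, a, r, s_next = zip(*batch)
        return np.stack(s), np.array(a), np.array(r), np.stack(s_next)

    def __len__(self):
        return len(self.data)


class LinearActorCritic:
    """Hypothetical model: policy a ~ N(theta . s, sigma^2), critic V(s) = w . s."""

    def __init__(self, state_dim, sigma=1.0, gamma=0.99, lr_actor=1e-3, lr_critic=1e-2):
        self.theta = np.zeros(state_dim)
        self.w = np.zeros(state_dim)
        self.sigma, self.gamma = sigma, gamma
        self.lr_actor, self.lr_critic = lr_actor, lr_critic

    def act(self, s, rng):
        # Unclipped Gaussian sample. The environment is assumed to clip negative
        # orders to zero, so the stored action keeps a valid log-likelihood.
        return float(rng.normal(self.theta @ s, self.sigma))

    def update(self, s, a, r, s_next):
        # TD(0) error with the pre-update critic, reused as the advantage
        # (continuing task, so no terminal-state masking).
        td = r + self.gamma * (s_next @ self.w) - s @ self.w
        # Critic: semi-gradient TD(0) step.
        self.w += self.lr_critic * (td[:, None] * s).mean(axis=0)
        # Actor: grad of log N(a; theta . s, sigma^2) w.r.t. theta is (a - theta . s) s / sigma^2.
        grad_logp = ((a - s @ self.theta) / self.sigma**2)[:, None] * s
        self.theta += self.lr_actor * (td[:, None] * grad_logp).mean(axis=0)
        return td
```

In the listing's terms, `push` corresponds to step 19 and `sample` to step 20. `update` stands in for the assumed critic update at step 21 and the actor update at step 22.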