Algorithm 2 Edge-Aware Temperature Control via DDPG with Context-Aware Weights and Coordination

 1: Input: Environment state $s = [T_{\mathrm{crate}}, T_{\mathrm{ambient}}, H, \mathrm{fruit\ type}]$, spoilage model $S$, energy profile $E$
 2: Initialize: Actor network $\mu(s \mid \theta^{\mu})$, critic network $Q(s, a \mid \theta^{Q})$
 3: Initialize: Target networks $\mu'$ and $Q'$ with $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$
 4: Initialize: Replay buffer $R$, noise process $\mathcal{N}$
 5: Initialize: Shared intention buffer $B$, static coefficients $\alpha_1, \alpha_2, \alpha_3$
 6: for each episode do
 7:   Receive initial global state $S_0 = \{s_0^{(1)}, s_0^{(2)}, \dots\}$
 8:   for each timestep $t$ do
 9:     for each agent $i$ do
10:       Select action $a_t^{(i)} = \mu(s_t^{(i)} \mid \theta^{\mu}) + \mathcal{N}_t$
11:       Append $(s_t^{(i)}, a_t^{(i)})$ to $B$
12:     end for
13:     Detect overlapping cooling requests in $B$, e.g., shared compressor or zone contention (a conflict-check sketch follows the algorithm)
14:     if conflict detected then
15:       Apply coordination penalty $\rho$ or reschedule conflicting setpoints
16:     end if
17:     for each agent $i$ do
18:       Apply action $a_t^{(i)}$; observe next state $s_{t+1}^{(i)}$, energy cost $E_t^{(i)}$, temperature deviation $\Delta T^{(i)}$, duration $\Delta t$
19:       Estimate spoilage risk $\sigma_t^{(i)} = S(s_t^{(i)}, a_t^{(i)}, \Delta t)$
20:       Define context vector $\mathrm{ctx}^{(i)} = [\Delta T^{(i)}, E_t^{(i)}, \sigma_t^{(i)}]$
21:       Compute dynamic weights $\omega_j^{(i)} = \dfrac{\alpha_j \cdot \mathrm{ctx}_j^{(i)}}{\sum_{k=1}^{3} \alpha_k \cdot \mathrm{ctx}_k^{(i)}},\ j = 1, 2, 3$ (see the reward sketch after the algorithm)
22:       Compute reward $r_t^{(i)} = -\bigl(\omega_1^{(i)} E_t^{(i)} + \omega_2^{(i)} \sigma_t^{(i)}\bigr) - \rho$
23:       Store transition $(s_t^{(i)}, a_t^{(i)}, r_t^{(i)}, s_{t+1}^{(i)})$ in $R$
24:     end for
25:     Sample mini-batch from $R$
26:     Update critic by minimizing the Bellman loss $L = \bigl(r + \gamma\, Q'(s', \mu'(s')) - Q(s, a)\bigr)^2$ (see the update sketch after the algorithm)
27:     Update actor via the policy gradient $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum \nabla_{a} Q(s, a)\big|_{a = \mu(s)}\, \nabla_{\theta^{\mu}} \mu(s)$
28:     Soft update target networks: $\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$
29:     Clear $B$
30:   end for
31: end for
32: Output: Trained temperature control policy $\mu^{*}(s)$
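The conflict check over the shared intention buffer $B$ (steps 11–16) can be illustrated with a minimal Python sketch. It assumes each agent reports the cooling zone it draws from and the cooling power implied by its proposed setpoint, and that a conflict means the summed requests in a zone exceed a shared capacity; the `Intention` record, the `capacity_kw` threshold, and the penalty value are hypothetical and are not quantities specified in Algorithm 2.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Intention:
    agent_id: int        # crate-level agent that proposed the action
    zone: str            # shared cooling zone / compressor it draws from
    cooling_kw: float    # cooling power implied by the proposed setpoint

def detect_conflicts(buffer: List[Intention], capacity_kw: float) -> Set[int]:
    """Flag agents whose requests overlap on a zone beyond its shared capacity (step 13)."""
    by_zone = defaultdict(list)
    for intent in buffer:
        by_zone[intent.zone].append(intent)
    conflicting: Set[int] = set()
    for intents in by_zone.values():
        if sum(i.cooling_kw for i in intents) > capacity_kw:
            conflicting.update(i.agent_id for i in intents)
    return conflicting

# Steps 14-15: conflicting agents receive the coordination penalty rho in their reward.
buffer = [Intention(0, "zone_A", 3.0), Intention(1, "zone_A", 2.5), Intention(2, "zone_B", 1.0)]
penalised = detect_conflicts(buffer, capacity_kw=5.0)          # -> {0, 1}
rho = {i.agent_id: (0.1 if i.agent_id in penalised else 0.0) for i in buffer}
```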
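Steps 19–22 (context vector, dynamic weights, and reward) can be traced with a small NumPy sketch. The context values, the static coefficients $\alpha_1, \alpha_2, \alpha_3$, and the penalty $\rho$ below are illustrative placeholders, and pairing each cost term with the normalized weight of its own context component is one plausible reading of the $\omega$ indices in steps 21–22.

```python
import numpy as np

def context_weights(ctx: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Step 21: omega_j = alpha_j * ctx_j / sum_k(alpha_k * ctx_k)."""
    weighted = alpha * ctx
    total = weighted.sum()
    return weighted / total if total > 0 else np.full(len(ctx), 1.0 / len(ctx))

def reward(ctx: np.ndarray, alpha: np.ndarray, rho: float = 0.0) -> float:
    """Step 22: negative weighted energy and spoilage cost, minus the coordination penalty."""
    w = context_weights(ctx, alpha)           # ctx = [delta_T, E_t, sigma_t]
    return float(-(w[1] * ctx[1] + w[2] * ctx[2]) - rho)

# Example: 1.5 degC deviation, 0.8 kWh energy use, 0.2 spoilage risk (illustrative values).
ctx = np.array([1.5, 0.8, 0.2])
alpha = np.array([1.0, 1.0, 1.0])             # static coefficients alpha_1..alpha_3 (illustrative)
print(context_weights(ctx, alpha))            # -> [0.6  0.32 0.08]
print(reward(ctx, alpha, rho=0.1))            # -> -(0.32*0.8 + 0.08*0.2) - 0.1 = -0.372
```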
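Steps 25–28 follow the standard DDPG update. Below is a compact PyTorch sketch under assumed settings; the state/action dimensions, hidden width, learning rates, $\gamma$, and $\tau$ are illustrative choices rather than the configuration used in the paper.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, TAU = 4, 1, 0.99, 0.005   # illustrative sizes and hyperparameters

def mlp(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor, critic = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t, critic_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t.load_state_dict(actor.state_dict())      # step 3: theta^mu' <- theta^mu
critic_t.load_state_dict(critic.state_dict())    # step 3: theta^Q'  <- theta^Q
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s_next):
    """One update from a sampled mini-batch (steps 25-28)."""
    # Step 26: critic update toward the Bellman target y = r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        y = r + GAMMA * critic_t(torch.cat([s_next, actor_t(s_next)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Step 27: actor update via the deterministic policy gradient (maximize Q(s, mu(s))).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Step 28: soft-update target networks with rate tau.
    for net, target in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)

# Example call on a random mini-batch of 32 transitions.
s, a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
r, s_next = torch.randn(32, 1), torch.randn(32, STATE_DIM)
ddpg_update(s, a, r, s_next)
```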