Algorithm 2 Edge-Aware Temperature Control via DDPG with Context-Aware Weights and Coordination

 1: Input: Environment state $s$, spoilage model $S$, energy profile $E$
 2: Initialize: Actor network $\mu(s \mid \theta^\mu)$, critic network $Q(s, a \mid \theta^Q)$
 3: Initialize: Target networks $\mu'$ and $Q'$ with $\theta^{\mu'} \leftarrow \theta^\mu$, $\theta^{Q'} \leftarrow \theta^Q$
 4: Initialize: Replay buffer $\mathcal{D}$, noise process $\mathcal{N}$
 5: Initialize: Shared intention buffer $\mathcal{B}$, static coefficients $\lambda$
 6: for each episode do
 7:   Receive initial global state $s_0$
 8:   for each timestep $t$ do
 9:     for each agent $i$ do
10:       Select action $a_t^i = \mu(s_t^i \mid \theta^\mu) + \mathcal{N}_t$
11:       Append $(i, a_t^i)$ to $\mathcal{B}$
12:     end for
13:     Detect overlapping cooling requests in $\mathcal{B}$ (e.g., shared compressor or zone contention)
14:     if conflict detected then
15:       Apply coordination penalty or reschedule conflicting setpoints
16:     end if
17:     for each agent $i$ do
18:       Apply action $a_t^i$; observe next state $s_{t+1}^i$, energy cost $e_t^i$, deviation $\delta_t^i$, and duration $d_t^i$
19:       Estimate spoilage risk $S(s_{t+1}^i)$
20:       Define context vector $c_t^i = [\, e_t^i,\ \delta_t^i,\ d_t^i,\ S(s_{t+1}^i) \,]$
21:       Compute context-aware weights $w_t^i$ from $c_t^i$ and the static coefficients $\lambda$
22:       Compute reward $r_t^i$ as the $w_t^i$-weighted combination of energy cost, temperature deviation, and spoilage risk
23:       Store transition $(s_t^i, a_t^i, r_t^i, s_{t+1}^i)$ in $\mathcal{D}$
24:     end for
25:     Sample a mini-batch of $N$ transitions from $\mathcal{D}$
26:     Update critic by minimizing the Bellman loss $L = \frac{1}{N}\sum_j \big(y_j - Q(s_j, a_j \mid \theta^Q)\big)^2$, where $y_j = r_j + \gamma\, Q'\big(s_{j+1}, \mu'(s_{j+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$
27:     Update actor via the deterministic policy gradient $\nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_j \nabla_a Q(s, a \mid \theta^Q)\big|_{s = s_j,\, a = \mu(s_j)}\, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s_j}$
28:     Soft update target networks: $\theta^{Q'} \leftarrow \tau \theta^Q + (1 - \tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau \theta^\mu + (1 - \tau)\theta^{\mu'}$
29:     Clear $\mathcal{B}$
30:   end for
31: end for
32: Output: Trained temperature control policy $\mu(\cdot \mid \theta^\mu)$
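The sketches below illustrate one plausible PyTorch implementation of the main steps of Algorithm 2; they are illustrative sketches, not the paper's implementation. First, the network initialization of steps 2-3 and the soft target update of step 28. The layer widths and the state/action dimensions (8 and 1) are assumptions, not values taken from the paper.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s | theta_mu): maps a zone state to a setpoint action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # action scaled to [-1, 1]
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

class Critic(nn.Module):
    """Action-value function Q(s, a | theta_Q)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))

# Step 3: target networks start as exact copies (theta' <- theta).
actor, critic = Actor(8, 1), Critic(8, 1)
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)

@torch.no_grad()
def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005) -> None:
    """Step 28: theta' <- tau * theta + (1 - tau) * theta'."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.mul_(1.0 - tau).add_(tau * sp)
```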
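Step 10 adds exploration noise $\mathcal{N}_t$ to the deterministic policy output. Algorithm 2 only names a generic noise process, so the Ornstein-Uhlenbeck process below, the default choice from the original DDPG paper, is an assumption here, as are its $\theta$ and $\sigma$ parameters.

```python
import numpy as np
import torch

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise, useful for
    exploration in physical control tasks (an assumed choice for N)."""
    def __init__(self, action_dim: int, theta: float = 0.15, sigma: float = 0.2):
        self.theta, self.sigma = theta, sigma
        self.state = np.zeros(action_dim)

    def reset(self) -> None:
        self.state[:] = 0.0

    def sample(self) -> np.ndarray:
        # dx = theta * (mu - x) + sigma * dW, with mu = 0 and unit timestep.
        self.state += self.theta * (-self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        return self.state

def select_action(actor, state: np.ndarray, noise: OUNoise) -> np.ndarray:
    """Step 10: a_t = mu(s_t | theta_mu) + N_t, clipped to the valid action range."""
    with torch.no_grad():
        a = actor(torch.as_tensor(state, dtype=torch.float32)).numpy()
    return np.clip(a + noise.sample(), -1.0, 1.0)
```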
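Steps 11-16 coordinate agents through the shared intention buffer $\mathcal{B}$. The sketch below assumes each intention names the shared compressor it would occupy, flags contention when two agents request the same compressor in one timestep, and resolves it by letting the coldest request proceed while deferring and penalizing the rest. The `Intention` fields, the tie-break rule, and the penalty magnitude are all hypothetical; Algorithm 2 leaves the resolution policy open.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Intention:
    agent_id: int
    compressor_id: int   # shared resource the request would occupy
    setpoint: float      # requested temperature setpoint (degrees C)

def detect_conflicts(buffer: list[Intention]) -> dict[int, list[Intention]]:
    """Step 13: group intentions by shared compressor; any group with more
    than one request is a contention conflict."""
    by_compressor: dict[int, list[Intention]] = defaultdict(list)
    for intent in buffer:
        by_compressor[intent.compressor_id].append(intent)
    return {c: reqs for c, reqs in by_compressor.items() if len(reqs) > 1}

def resolve_conflicts(conflicts: dict[int, list[Intention]],
                      penalty: float = 0.1) -> dict[int, float]:
    """Steps 14-16: keep the most demanding (coldest) request per compressor,
    defer the others, and return per-agent coordination penalties to be
    folded into the reward."""
    penalties: dict[int, float] = {}
    for reqs in conflicts.values():
        reqs.sort(key=lambda r: r.setpoint)  # coldest request wins the compressor
        for deferred in reqs[1:]:            # reschedule the rest this timestep
            penalties[deferred.agent_id] = penalty
    return penalties
```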
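Step 19 queries the spoilage model $S$, whose functional form is not given in the text above. The placeholder below assumes risk saturates exponentially with how far and how long a zone exceeds its safe temperature threshold; both the form and the rate constant are illustrative only.

```python
import math

def spoilage_risk(temp_c: float, threshold_c: float, hours_above: float,
                  k: float = 0.8) -> float:
    """Placeholder spoilage model S: risk in [0, 1) grows with the product of
    temperature excess and exposure time, saturating toward 1. The
    exponential-saturation form and k are assumptions, not the paper's model."""
    excess = max(0.0, temp_c - threshold_c)
    return 1.0 - math.exp(-k * excess * hours_above)
```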
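Steps 20-22 turn the observed context into a weighted reward. The softmax weighting over $\lambda \odot c_t^i$ below is one plausible reading of "context-aware weights"; the actual mapping, and the choice to exclude duration from the penalty terms, are assumptions. The intended behavior is that a context with high spoilage risk shifts weight toward the spoilage penalty and away from the energy penalty.

```python
import numpy as np

def context_weights(c: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Steps 20-21: map context c_t = [energy, deviation, duration, spoilage]
    to normalized weights. Softmax over lam * c is an assumed form."""
    z = lam * c
    z = z - z.max()          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def context_aware_reward(c: np.ndarray, w: np.ndarray) -> float:
    """Step 22: penalize energy cost, deviation, and spoilage risk in
    proportion to their weights (duration shapes the weights but is not
    itself penalized here -- an illustrative choice)."""
    energy, deviation, _duration, spoilage = c
    return -(w[0] * energy + w[1] * deviation + w[3] * spoilage)

# Example: lam and c are assumed values for one agent at one timestep.
lam = np.array([1.0, 1.0, 0.5, 2.0])   # static coefficients lambda
c = np.array([0.3, 0.1, 0.5, 0.8])     # [e, delta, d, S]
r = context_aware_reward(c, context_weights(c, lam))
```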
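Finally, steps 23-28 form the standard DDPG update loop: store transitions, sample a mini-batch, minimize the Bellman loss on the critic, ascend the deterministic policy gradient on the actor, and soft-update the targets. The batch size, $\gamma$, and $\tau$ below are conventional defaults rather than the paper's values; the networks are the `Actor`/`Critic` modules from the first sketch, passed in together with their optimizers.

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Steps 4 and 23: fixed-capacity store of (s, a, r, s') transitions."""
    def __init__(self, capacity: int = 100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next) -> None:
        self.buf.append((
            torch.as_tensor(s, dtype=torch.float32),
            torch.as_tensor(a, dtype=torch.float32),
            torch.tensor([r], dtype=torch.float32),
            torch.as_tensor(s_next, dtype=torch.float32),
        ))

    def sample(self, n: int):
        batch = random.sample(self.buf, n)
        return tuple(map(torch.stack, zip(*batch)))  # (s, a, r, s') tensors

def ddpg_update(actor, critic, actor_t, critic_t, opt_a, opt_c,
                buffer: ReplayBuffer, batch_size: int = 64,
                gamma: float = 0.99, tau: float = 0.005) -> None:
    """Steps 25-28: one mini-batch update of critic, actor, and targets."""
    s, a, r, s_next = buffer.sample(batch_size)

    # Step 26: y = r + gamma * Q'(s', mu'(s')) and L = MSE(Q(s, a), y).
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    opt_c.zero_grad()
    critic_loss.backward()
    opt_c.step()

    # Step 27: ascend Q(s, mu(s)) by minimizing its negation.
    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad()
    actor_loss.backward()
    opt_a.step()

    # Step 28: theta' <- tau * theta + (1 - tau) * theta'.
    with torch.no_grad():
        for net, net_t in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)
```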