Algorithm 2 Edge-Aware Temperature Control via DDPG with Context-Aware Weights and Coordination

 1: Input: Environment state $s = [T_{\mathrm{crate}}, T_{\mathrm{ambient}}, H, \mathrm{fruit\ type}]$, spoilage model $S$, energy profile $E$
 2: Initialize: Actor network $\mu(s \mid \theta^{\mu})$, critic network $Q(s, a \mid \theta^{Q})$
 3: Initialize: Target networks $\mu'$ and $Q'$ with $\theta^{\mu'} \leftarrow \theta^{\mu}$, $\theta^{Q'} \leftarrow \theta^{Q}$
 4: Initialize: Replay buffer $R$, noise process $\mathcal{N}$
 5: Initialize: Shared intention buffer $B$, static coefficients $\alpha_1, \alpha_2, \alpha_3$
 6: for each episode do
 7:   Receive initial global state $S_0 = \{s_0^{(1)}, s_0^{(2)}, \dots\}$
 8:   for each timestep $t$ do
 9:     for each agent $i$ do
10:       Select action $a_t^{(i)} = \mu(s_t^{(i)} \mid \theta^{\mu}) + \mathcal{N}_t$
11:       Append $(s_t^{(i)}, a_t^{(i)})$ to $B$
12:     end for
13:     Detect overlapping cooling requests in $B$, e.g., shared compressor or zone contention (a conflict-check sketch follows the algorithm)
14:     if conflict detected then
15:       Apply coordination penalty $\rho$ or reschedule conflicting setpoints
16:     end if
17:     for each agent $i$ do
18:       Apply action $a_t^{(i)}$; observe next state $s_{t+1}^{(i)}$, energy cost $E_t^{(i)}$, temperature deviation $\Delta T^{(i)}$, duration $\Delta t$
19:       Estimate spoilage risk $\sigma_t^{(i)} = S(s_t^{(i)}, a_t^{(i)}, \Delta t)$
20:       Define context vector $\mathrm{ctx}^{(i)} = [\Delta T^{(i)}, E_t^{(i)}, \sigma_t^{(i)}]$
21:       Compute dynamic weights $\omega_j^{(i)} = \dfrac{\alpha_j \cdot \mathrm{ctx}_j^{(i)}}{\sum_{k=1}^{3} \alpha_k \cdot \mathrm{ctx}_k^{(i)}},\ j = 1, 2, 3$ (see the reward sketch after the algorithm)
22:       Compute reward $r_t^{(i)} = -\bigl(\omega_1^{(i)} E_t^{(i)} + \omega_2^{(i)} \sigma_t^{(i)}\bigr) - \rho$
23:       Store transition $(s_t^{(i)}, a_t^{(i)}, r_t^{(i)}, s_{t+1}^{(i)})$ in $R$
24:     end for
25:     Sample mini-batch from $R$
26:     Update critic by minimizing the Bellman loss $L = \bigl(r + \gamma\, Q'(s', \mu'(s')) - Q(s, a)\bigr)^2$ (see the update sketch after the algorithm)
27:     Update actor via the policy gradient $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum \nabla_{a} Q(s, a)\big|_{a = \mu(s)}\, \nabla_{\theta^{\mu}} \mu(s)$
28:     Soft update target networks: $\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$
29:     Clear $B$
30:   end for
31: end for
32: Output: Trained temperature control policy $\mu^{*}(s)$
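The conflict check over the shared intention buffer $B$ (steps 11–16) can be illustrated with a minimal Python sketch. It assumes each agent reports the cooling zone it draws from and the cooling power implied by its proposed setpoint, and that a conflict means the summed requests in a zone exceed a shared capacity; the `Intention` record, the `capacity_kw` threshold, and the penalty value are hypothetical and are not quantities specified in Algorithm 2.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Intention:
    agent_id: int        # crate-level agent that proposed the action
    zone: str            # shared cooling zone / compressor it draws from
    cooling_kw: float    # cooling power implied by the proposed setpoint

def detect_conflicts(buffer: List[Intention], capacity_kw: float) -> Set[int]:
    """Flag agents whose requests overlap on a zone beyond its shared capacity (step 13)."""
    by_zone = defaultdict(list)
    for intent in buffer:
        by_zone[intent.zone].append(intent)
    conflicting: Set[int] = set()
    for intents in by_zone.values():
        if sum(i.cooling_kw for i in intents) > capacity_kw:
            conflicting.update(i.agent_id for i in intents)
    return conflicting

# Steps 14-15: conflicting agents receive the coordination penalty rho in their reward.
buffer = [Intention(0, "zone_A", 3.0), Intention(1, "zone_A", 2.5), Intention(2, "zone_B", 1.0)]
penalised = detect_conflicts(buffer, capacity_kw=5.0)          # -> {0, 1}
rho = {i.agent_id: (0.1 if i.agent_id in penalised else 0.0) for i in buffer}
```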
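Steps 19–22 (context vector, dynamic weights, and reward) can be traced with a small NumPy sketch. The context values, the static coefficients $\alpha_1, \alpha_2, \alpha_3$, and the penalty $\rho$ below are illustrative placeholders, and pairing each cost term with the normalized weight of its own context component is one plausible reading of the $\omega$ indices in steps 21–22.

```python
import numpy as np

def context_weights(ctx: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Step 21: omega_j = alpha_j * ctx_j / sum_k(alpha_k * ctx_k)."""
    weighted = alpha * ctx
    total = weighted.sum()
    return weighted / total if total > 0 else np.full(len(ctx), 1.0 / len(ctx))

def reward(ctx: np.ndarray, alpha: np.ndarray, rho: float = 0.0) -> float:
    """Step 22: negative weighted energy and spoilage cost, minus the coordination penalty."""
    w = context_weights(ctx, alpha)           # ctx = [delta_T, E_t, sigma_t]
    return float(-(w[1] * ctx[1] + w[2] * ctx[2]) - rho)

# Example: 1.5 degC deviation, 0.8 kWh energy use, 0.2 spoilage risk (illustrative values).
ctx = np.array([1.5, 0.8, 0.2])
alpha = np.array([1.0, 1.0, 1.0])             # static coefficients alpha_1..alpha_3 (illustrative)
print(context_weights(ctx, alpha))            # -> [0.6  0.32 0.08]
print(reward(ctx, alpha, rho=0.1))            # -> -(0.32*0.8 + 0.08*0.2) - 0.1 = -0.372
```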
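Steps 25–28 follow the standard DDPG update. Below is a compact PyTorch sketch under assumed settings; the state/action dimensions, hidden width, learning rates, $\gamma$, and $\tau$ are illustrative choices rather than the configuration used in the paper.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, TAU = 4, 1, 0.99, 0.005   # illustrative sizes and hyperparameters

def mlp(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor, critic = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t, critic_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t.load_state_dict(actor.state_dict())      # step 3: theta^mu' <- theta^mu
critic_t.load_state_dict(critic.state_dict())    # step 3: theta^Q'  <- theta^Q
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s_next):
    """One update from a sampled mini-batch (steps 25-28)."""
    # Step 26: critic update toward the Bellman target y = r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        y = r + GAMMA * critic_t(torch.cat([s_next, actor_t(s_next)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Step 27: actor update via the deterministic policy gradient (maximize Q(s, mu(s))).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Step 28: soft-update target networks with rate tau.
    for net, target in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)

# Example call on a random mini-batch of 32 transitions.
s, a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
r, s_next = torch.randn(32, 1), torch.randn(32, STATE_DIM)
ddpg_update(s, a, r, s_next)
```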