Sensors. 2022 Aug 14;22(16):6068. doi: 10.3390/s22166068
Algorithm 1. BENS—Training Algorithm (BENS-T).
  1. Input: BENS framework, scenario emulator, and all applications’ QoS requirements.

  2. For every session i = 1, 2, …, M, perform the following:

  3. Initialize each agent's Q-network Q(p, b), rule-based policy φ(p, b), load β, and reward function W;

  4. For every process instance t = 0, 1, 2, …, F, perform the following:

  5.  Each agent observes its current state Ts;

  6.  With probability λ, randomly select action Ac;

  7.  Otherwise, select action Ac = arg max a∈A Qt(Ps, bs, µs);

  8.  Carry out action at and collect the reward Wins;

  9. Observe the new state Ts+1;

  10. Store the transition (ps, bs, W(ps, bs), Ts+1) in replay memory O;

  11. For every agent, perform the following:

  12.   Randomly sample a mini-batch ks from O;

  13.   Update the Q-network parameters µs;

  14.   Incrementally update the target parameters µs+1;

  15.   Update the policy φ using the maximum Q-value;

  16.   Act according to the updated policy φ;

  17. End for;

  18. End for;

  19. Return: the trained BENS models.
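For orientation, Algorithm 1 follows a standard multi-agent deep Q-learning pattern: ε-greedy action selection (steps 6–7), a shared replay memory O (step 10), per-agent mini-batch updates of the Q-network parameters µs (steps 12–13), and an incrementally updated set of target parameters µs+1 (step 14). The sketch below illustrates that pattern only; it is not the paper's implementation. The environment interface (env.reset()/env.step()), the network sizes, and all hyperparameters (eps for λ, the soft-update rate tau, learning rate, batch size) are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

# Hypothetical state/action dimensions for the slicing environment.
STATE_DIM, N_ACTIONS = 8, 4

class QNetwork(nn.Module):
    """Per-agent Q-network Q(p, b): maps a state to one value per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, x):
        return self.net(x)

def train_bens(env, n_agents, sessions=10, steps=200,
               eps=0.1, gamma=0.99, tau=0.01, batch_size=32):
    # Step 3: initialize each agent's Q-network and a matching target network.
    agents = [QNetwork() for _ in range(n_agents)]
    targets = [QNetwork() for _ in range(n_agents)]
    for q, t in zip(agents, targets):
        t.load_state_dict(q.state_dict())
    optims = [torch.optim.Adam(q.parameters(), lr=1e-3) for q in agents]
    memory = deque(maxlen=10_000)                 # replay memory O

    for _ in range(sessions):                     # step 2: sessions i = 1..M
        states = env.reset()                      # step 5: each agent observes Ts
        for _ in range(steps):                    # step 4: instances t = 0..F
            actions = []
            for q, s in zip(agents, states):
                if random.random() < eps:         # step 6: explore with prob. λ
                    actions.append(random.randrange(N_ACTIONS))
                else:                             # step 7: greedy arg max_a Q
                    with torch.no_grad():
                        x = torch.tensor(s, dtype=torch.float32)
                        actions.append(int(q(x).argmax()))
            # Step 8: execute actions, collect rewards; step 9: observe Ts+1.
            next_states, rewards, done = env.step(actions)
            for s, a, r, s2 in zip(states, actions, rewards, next_states):
                memory.append((s, a, r, s2))      # step 10: store transition in O
            states = next_states

            if len(memory) >= batch_size:
                for q, tgt, opt in zip(agents, targets, optims):
                    batch = random.sample(memory, batch_size)  # step 12: mini-batch ks
                    s, a, r, s2 = map(np.array, zip(*batch))
                    s = torch.tensor(s, dtype=torch.float32)
                    s2 = torch.tensor(s2, dtype=torch.float32)
                    a = torch.tensor(a)
                    r = torch.tensor(r, dtype=torch.float32)
                    # TD target uses the slowly updated target parameters µs+1.
                    y = r + gamma * tgt(s2).max(dim=1).values.detach()
                    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
                    loss = nn.functional.mse_loss(pred, y)
                    opt.zero_grad(); loss.backward(); opt.step()  # step 13: update µs
                    # Step 14: incremental (soft) target update.
                    for p, tp in zip(q.parameters(), tgt.parameters()):
                        tp.data.mul_(1 - tau).add_(tau * p.data)
            if done:
                break
    return agents                                 # step 19: trained models
```

Because the greedy action in step 7 is always taken from the current Q-network while the TD target is computed from the slowly moving target parameters, the updated network implicitly defines the policy φ of steps 15–16; no separate policy object is needed in this simplified sketch.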