Algorithm 1. BENS Training Algorithm (BENS-T).
Input: BENS framework, scenario emulator, and all applications’ QoS requirements.
For every episode i = 1, 2, …, M, do:
Initialize each agent's Q-network Q(p, b), rule-based policy φ(p, b), load β, and experience W;
For every step s = 0, 1, 2, …, F, do:
Each agent observes its current state Ts;
With probability λ, select a random action Ac;
Otherwise, select the greedy action Ac = arg max a∈A Qt(ps, bs, µs);
Execute action Ac and collect the reward W(ps, bs);
Observe the new state Ts+1;
Store the experience W = (ps, bs, W(ps, bs), Ts+1) in replay memory O;
For every agent, do:
Randomly sample a mini-batch ks from O;
Update the online network parameters µs;
Periodically update the target network parameters µs+1;
Update the policy φ toward the maximum Q-value;
Act according to φ;
End for;
End for;
End for;
Return: the trained BENS models.
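The per-agent loop above follows the standard DQN recipe: ε-greedy exploration, an experience replay memory, and a periodically synced target network. The sketch below illustrates that recipe in minimal tabular form, assuming a toy stand-in for the scenario emulator; all identifiers (`env_step`, `Q_target`, the state and action counts) are illustrative and do not come from the BENS paper.

```python
import random
from collections import deque

# Minimal tabular sketch of the per-agent BENS-T loop (illustrative names):
# epsilon-greedy exploration, experience replay, and a periodically synced
# target table standing in for the target-network parameters µs+1.

N_STATES, N_ACTIONS = 4, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.2   # discount, step size, exploration λ
SYNC_EVERY, BATCH = 10, 8               # target-sync period, mini-batch size

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # online Q-values (µs)
Q_target = [row[:] for row in Q]                   # target Q-values (µs+1)
memory = deque(maxlen=100)                         # replay memory O

def env_step(state, action):
    """Toy stand-in for the scenario emulator: action 0 pays reward 1."""
    reward = 1.0 if action == 0 else 0.0
    return random.randrange(N_STATES), reward

random.seed(0)
state = 0
for step in range(1, 201):
    # epsilon-greedy: random action with probability λ, else greedy on Q
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = env_step(state, action)
    memory.append((state, action, reward, next_state))  # store experience

    if len(memory) >= BATCH:
        # sample a mini-batch and move the online table toward the
        # bootstrapped target r + γ · max_a Q_target(s', a)
        for s, a, r, s2 in random.sample(list(memory), BATCH):
            td_target = r + GAMMA * max(Q_target[s2])
            Q[s][a] += ALPHA * (td_target - Q[s][a])
    if step % SYNC_EVERY == 0:
        Q_target = [row[:] for row in Q]               # periodic target sync
    state = next_state

best_action = max(range(N_ACTIONS), key=lambda a: Q[0][a])
print(best_action)
```

Freezing `Q_target` between syncs keeps the bootstrap target stable while the online table changes, which is the role the periodically updated µs+1 plays in the listing above; in the full system each agent would run this loop against the shared replay memory.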