Two Tier Slicing Resource Allocation Algorithm Based on Deep Reinforcement Learning and Joint Bidding in Wireless Access Networks

2022 May 4;22(9):3495. doi: 10.3390/s22093495

Algorithm 2 Dueling DQN Algorithm for Lower Tier Bandwidth Allocation

1: Initialize the action-value function $Q$, the target action-value function $\hat{Q}$, and the replay memory $D$ to capacity $N$;

2: Each MVNO receives a bandwidth $c_{m}$ from the BS;

3: Each MVNO creates an action space $A_{l}$;

4: For $m = 1$ to $M$ do

5: The MVNO randomly chooses an action $a \in A_{l}$ and performs $a$;

6: The MVNO allocates the bandwidth $c_{j}^{m}$ to the users connected to it;

7: Calculate $q_{j}^{m}$ as the state $s$;

8: For $t = 1$ to $T$ do

9: The agent gets the current state $s$;

10: Choose an action $a \in A_{l}$ according to the policy of the Dueling DQN;

11: Calculate the total system utility $f_{m}$ according to (15);

12: Calculate the total reward;

13: The agent allocates the bandwidth to users and computes the state that results from the action selected in this iteration as $s'$;

14: # Train Dueling DQN

15: The agent, i.e., each MVNO, inputs $(s, a, s', r)$ into the Dueling DQN;

16: The agent stores the transition $(s, a, s', r)$ in $D$;

17: The agent samples a random minibatch of transitions $(s_i, a_i, s'_i, r_i)$ from $D$;

18: Set
$$y_i = \begin{cases} r_i, & \text{if the episode terminates at step } i+1 \\ r_i + \gamma \max_{a^{*}} \hat{Q}(s'_i, a^{*}; \theta^{-}, \alpha, \beta), & \text{otherwise} \end{cases}$$

19: The agent performs a gradient descent step on $\left(y_i - Q(s_i, a_i; \theta, \alpha, \beta)\right)^{2}$ with respect to the network parameters $\theta$, $\alpha$, and $\beta$;

20: Every fixed number of steps, reset $\hat{Q} = Q$;

21: End for

22: End for
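The training steps of Algorithm 2 (storing transitions, sampling a minibatch, forming the target, and computing the squared error) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the mean-subtracted aggregation $Q = V + (A - \mathrm{mean}(A))$ from the original dueling architecture, which the algorithm box does not state explicitly. The names `dueling_q_values`, `td_target`, `ReplayMemory`, and `squared_td_errors`, the `terminal` flag stored with each transition, and the callables `q_online` and `q_target` are all hypothetical and introduced only for illustration.

```python
import random
from collections import deque


def dueling_q_values(value, advantages):
    """Combine a scalar state value V and per-action advantages A into Q-values.

    Assumption: mean-subtracted aggregation, Q = V + (A - mean(A)).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + (adv - mean_adv) for adv in advantages]


def td_target(reward, next_q_values, gamma, terminal):
    """Target y: r if the episode terminates, else r + gamma * max_a Q_hat(s', a)."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


class ReplayMemory:
    """Fixed-capacity memory D with uniform random minibatch sampling."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity N is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, s_next, r, terminal=False):
        # The terminal flag is an illustrative addition to the (s, a, s', r) tuple.
        self.buffer.append((s, a, s_next, r, terminal))

    def sample(self, batch_size):
        # Never request more transitions than are currently stored.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def squared_td_errors(batch, q_online, q_target, gamma):
    """Per-transition loss terms (y - Q(s, a))^2.

    q_online and q_target are hypothetical callables that map a state to a
    list of Q-values, one per action.
    """
    errors = []
    for s, a, s_next, r, terminal in batch:
        y = td_target(r, q_target(s_next), gamma, terminal)
        errors.append((y - q_online(s)[a]) ** 2)
    return errors
```

In a full agent, `q_online` and `q_target` would be the Dueling DQN and its periodically synchronized copy $\hat{Q}$, and the squared errors would be averaged and minimized by gradient descent over $\theta$, $\alpha$, and $\beta$. The sketch leaves out the network and the optimizer.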