Sensors. 2023 Mar 12;23(6):3054. doi: 10.3390/s23063054
Algorithm 2: DQN-based dynamic SFC readjustment algorithm
Input: The network state s_t, the set of SFCs {G_1, G_2, ..., G_m}, and the dynamic SFC deployment policy Π_1.
Output: The dynamic SFC readjustment policy Π_2.
1: Initialize the action-value function Q(s_t, a; θ), where θ are randomly initialized neural network weights.
2: Initialize the target action-value function Q̂(s_t, a; θ⁻), with θ⁻ = θ.
3: Initialize the experience pool D with capacity N.
4: for episode in range (EPISODES):
5:  Generate a new collection of SFCs.
6:  Initialize the state s_t.
7:  for step in range (STEPS):
8:     Generate the set of nodes that need to be readjusted based on the state of the underlying network.
9:     With probability ε, select an action a_t at random.
10:    Otherwise, select the action a_t = argmax_a Q(s_t, a; θ).
11:    Execute the readjustment action a_t, updating the state s_t.
12:    Perform deployment with Π_1.
13:    Observe the reward r_t, s_t → s_{t+1}.
14:    Store the transition e_t = (s_t, a_t, r_t, s_{t+1}) in D.
15:    Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D.
16:    Set y_j = r_j if the episode terminates at step j+1; otherwise set y_j = r_j + γ max_{a'} Q̂(s_{j+1}, a'; θ⁻).
17:    Perform a gradient descent step on (y_j − Q(s_j, a_j; θ))² with respect to the network parameters θ.
18:    Every C steps, reset Q̂ = Q.
19:  End.
20: End.
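The following is a minimal Python sketch of the training loop described by Algorithm 2, written with PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the SFCReadjustEnv class, the state/action encoding sizes (STATE_DIM, N_ACTIONS), and all hyperparameters are placeholders, the deployment policy Π_1 and the substrate network are abstracted behind env.step(), and the readjustable-node selection of step 8 is folded into a fixed discrete action space.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes and hyperparameters (assumptions, not the paper's values).
STATE_DIM, N_ACTIONS = 32, 10
EPISODES, STEPS = 500, 100
MEMORY_N, BATCH, GAMMA, EPSILON, C = 10_000, 64, 0.95, 0.1, 50


class QNet(nn.Module):
    """Action-value function Q(s, a; θ) over a flat encoding of the network state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_ACTIONS))

    def forward(self, s):
        return self.net(s)


class SFCReadjustEnv:
    """Placeholder environment. A real one would wrap the substrate network,
    the SFC set {G_1, ..., G_m}, and the deployment policy Π_1 (step 12)."""
    def reset(self):
        return torch.randn(STATE_DIM).tolist()

    def step(self, action):
        next_state = torch.randn(STATE_DIM).tolist()
        reward, done = random.random(), False
        return next_state, reward, done


q_net = QNet()                                    # Q(s, a; θ)
target_net = QNet()                               # Q̂(s, a; θ⁻)
target_net.load_state_dict(q_net.state_dict())    # θ⁻ ← θ
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=MEMORY_N)                   # experience pool D
env = SFCReadjustEnv()
step_count = 0

for episode in range(EPISODES):
    state = env.reset()                           # new SFC collection, initial state
    for step in range(STEPS):
        # ε-greedy selection of a readjustment action (steps 9-10)
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            with torch.no_grad():
                action = int(q_net(torch.tensor(state)).argmax())

        # Readjust, redeploy with Π_1, and observe the outcome (steps 11-13)
        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))  # store e_t in D
        state = next_state

        if len(replay) >= BATCH:
            # Sample a random minibatch of transitions (step 15)
            s, a, r, s2, d = map(torch.tensor, zip(*random.sample(list(replay), BATCH)))
            s, s2, r, d = s.float(), s2.float(), r.float(), d.float()

            # y_j = r_j for terminal transitions, else r_j + γ max_a' Q̂(s_{j+1}, a'; θ⁻)
            with torch.no_grad():
                y = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - d)

            q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            loss = F.mse_loss(q, y)               # (y_j − Q(s_j, a_j; θ))²
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Every C steps, copy the online weights into the target network (step 18)
        step_count += 1
        if step_count % C == 0:
            target_net.load_state_dict(q_net.state_dict())

        if done:
            break

A real implementation would encode the readjustable node set generated in step 8 into the state or use it to mask invalid actions, rather than relying on the fixed action space assumed above.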