Dynamic Service Function Chain Deployment and Readjustment Method Based on Deep Reinforcement Learning

Sensors (Basel). 2023 Mar 12;23(6):3054. doi: 10.3390/s23063054

Algorithm 1: DQN-based dynamic SFC deployment algorithm

Input: The underlying network state $s_t$ and the set of dynamically arriving SFC requests $r_1, r_2, \cdots, r_m$.
Output: Dynamic SFC deployment policy $\Pi_1$.
1: Initialize the action-value function $Q(s_t, a; \theta)$, where $\theta$ are randomly generated neural network weights.
2: Initialize the target action-value function $\hat{Q}(s_t, a; \theta^{-})$, where $\theta^{-} = \theta$.
3: Initialize the experience pool $D$ with capacity $N$.
4: for episode in range(EPISODES):
5: Generate a new collection of SFCs.
6: Initialize the state $s \dots$.
7: for step in range(STEPS):
8: Select the nodes that satisfy the resource and delay requirements.
9: Among the nodes that satisfy the deployment requirements, select the $m$ nodes closest to the last deployed node and add them to the set $\Phi$.
10: With probability $\varepsilon$, select a random action $a_t$.
11: Otherwise, select the action $a_t = \arg\max_{a \in \Phi} Q(s_t, a; \theta)$.
12: Execute action $a_t$ and observe the reward $r_t$ and the next state $s_{t+1}$.
13: Store the transition $e_t = (s_t, a_t, r_t, s_{t+1})$ in $D$.
14: Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$.
15: Set
$$y_j = \begin{cases} r_j, & \text{if } s_{j+1} \text{ is terminal} \\ r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-}), & \text{otherwise} \end{cases}$$
16: Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the network parameters $\theta$.
17: Every $C$ steps, reset $\hat{Q} = Q$ (i.e., set $\theta^{-} = \theta$).
18: end for
19: end for
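Steps 8 and 9 (building the candidate set $\Phi$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the substrate network is an adjacency dictionary, "closest" is measured in hops via breadth-first search, the resource requirement is a single CPU demand, and the delay requirement is a fixed per-hop delay accumulated against the SFC's delay budget. The function names `hop_distances` and `candidate_set` and all of their parameters are hypothetical.

```python
from collections import deque


def hop_distances(adj, src):
    """BFS hop counts from src to every reachable node (adj: node -> list of neighbours)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def candidate_set(adj, free_cpu, last_node, cpu_demand,
                  delay_used, delay_budget, per_hop_delay, m):
    """Hypothetical Phi: up to m feasible nodes closest (in hops) to last_node.

    Assumed feasibility test: enough free CPU for the next VNF, and the delay
    already used plus hops * per_hop_delay stays within the SFC delay budget.
    """
    dist = hop_distances(adj, last_node)
    feasible = [
        n for n, d in dist.items()
        if free_cpu[n] >= cpu_demand
        and delay_used + d * per_hop_delay <= delay_budget
    ]
    # Step 9: keep the m closest feasible nodes; ties broken by node id.
    feasible.sort(key=lambda n: (dist[n], n))
    return feasible[:m]
```

Sorting by `(distance, node id)` only makes ties deterministic; a real substrate would more likely use link latencies and shortest-path delay instead of hop counts.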
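Steps 10 and 11 combine $\varepsilon$-greedy exploration with an action mask: the greedy choice is restricted to $\Phi$. The sketch below is a minimal illustration, assuming the Q-values for the current state are already available as a dictionary keyed by node. Drawing the random action from $\Phi$ as well is an assumption, since step 10 only says "at random". `select_action` is a hypothetical name.

```python
import random


def select_action(q_values, phi, epsilon, rng=random):
    """Epsilon-greedy choice restricted to the candidate set phi (hypothetical helper).

    q_values: mapping node -> Q(s_t, node; theta) for the current state.
    """
    if not phi:
        raise ValueError("no feasible node: the SFC request cannot be placed")
    if rng.random() < epsilon:
        # Step 10 (assumption: the random action is also drawn from phi).
        return rng.choice(phi)
    # Step 11: a_t = argmax over a in phi of Q(s_t, a; theta).
    return max(phi, key=lambda a: q_values[a])
```

Masking to $\Phi$ means infeasible nodes are never chosen, even if their Q-values happen to be high.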
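The experience pool $D$ used in steps 3, 13 and 14 is commonly implemented as a bounded first-in, first-out buffer. The sketch below assumes a `deque` whose maximum length is the capacity $N$; the extra `done` flag is an assumption added so that the terminal case of step 15 can be evaluated. `ReplayMemory` and `Transition` are hypothetical names.

```python
import random
from collections import deque, namedtuple

# Assumption: a terminal flag is stored alongside (s_t, a_t, r_t, s_{t+1}).
Transition = namedtuple("Transition", ["s", "a", "r", "s_next", "done"])


class ReplayMemory:
    """Experience pool D with capacity N; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        # Step 13: store the transition e_t.
        self.buffer.append(Transition(s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        # Step 14: uniform random minibatch without replacement.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from past transitions breaks the correlation between consecutive deployment steps, which is the usual motivation for a replay memory in DQN.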
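The target of step 15, the squared-error update of step 16 and the periodic target synchronization of step 17 can be illustrated with a linear stand-in $Q(s, a; \theta) = \theta_a^{\top} s$ in place of the paper's neural network. This substitution, the plain SGD update, and the function names (`q_value`, `td_target`, `sgd_step`, `sync_target`) are assumptions for illustration only. As in standard DQN, $y_j$ is treated as a constant when differentiating.

```python
def q_value(theta, s, a):
    """Linear stand-in for Q(s, a; theta): dot product of weight row theta[a] with state features s."""
    return sum(w * x for w, x in zip(theta[a], s))


def td_target(r, s_next, done, theta_target, gamma):
    """Step 15: y_j = r_j if s_{j+1} is terminal, else r_j + gamma * max_a' Q_hat(s_{j+1}, a'; theta^-)."""
    if done:
        return r
    return r + gamma * max(q_value(theta_target, s_next, a)
                           for a in range(len(theta_target)))


def sgd_step(theta, theta_target, batch, gamma, lr):
    """Step 16: one gradient-descent step on the mean of (y_j - Q(s_j, a_j; theta))^2.

    Updates theta in place and returns the minibatch loss before the update.
    """
    grads = [[0.0] * len(row) for row in theta]
    loss = 0.0
    for s, a, r, s_next, done in batch:
        y = td_target(r, s_next, done, theta_target, gamma)  # held constant
        err = y - q_value(theta, s, a)
        loss += err ** 2
        for i, x in enumerate(s):
            grads[a][i] += -2.0 * err * x  # d(err^2)/d theta[a][i]
    n = len(batch)
    for a, row in enumerate(grads):
        for i, g in enumerate(row):
            theta[a][i] -= lr * g / n
    return loss / n


def sync_target(theta):
    """Step 17: every C steps, set theta^- = theta (copy, so later updates do not leak)."""
    return [row[:] for row in theta]
```

A training loop would call `sgd_step` on each sampled minibatch and replace `theta_target` with `sync_target(theta)` every $C$ steps; between syncs the target weights stay frozen, which is intended to keep the bootstrapped targets stable.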