2020 Dec 31;21(1):217. doi: 10.3390/s21010217
Algorithm 1 Reinforcement Learning-based Wavelength Allocation Algorithm
Input:   Γ, ε, λ_t
Output:   Wavelength allocation matrix
1: Initialize Process:
2: Based on historical network data Γ = {{H_1, Q_1}, {H_2, Q_2}, …, {H_T, Q_T}}, the BBU initializes a virtual environment. An agent is then created with action set A = {a_1, a_2, …, a_L}. Initialize t = 1, Q_{a_l} = 0 for each a_l, the exploration probability ε, and the learning rate λ_t.
3: Learning Process:
4: for each transmission interval t=1:T do
5:     Generate a random number z uniformly in [0, 1]
6:     if z<ε then
7:         The agent selects a wavelength allocation strategy a_t from A with equal probability.
8:     else
9:         The agent selects the wavelength allocation strategy a_t with the maximum Q-value.
10:     end if
11:     Under a_t, H_t and Q_t, Algorithm 2 is performed to produce a clustering result
12:     The environment feeds back the signal r_{a_t} to the agent
13:     The agent makes an update according to Equation (5)
14: end for
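The learning loop above follows the familiar ε-greedy pattern, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Equation (5) and Algorithm 2 are not reproduced in this section, so the update rule Q ← Q + λ_t (r − Q) is an assumed standard form, and `reward_fn` is a hypothetical stand-in for running Algorithm 2 and reading back the environment's signal. The names `epsilon_greedy_allocation`, `history`, and `learning_rate` are likewise illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple


def epsilon_greedy_allocation(
    history: Sequence[Tuple[object, object]],
    num_actions: int,
    epsilon: float,
    learning_rate: Callable[[int], float],
    reward_fn: Callable[[int, object, object], float],
    rng: random.Random,
) -> List[float]:
    """Return one learned Q-value per candidate allocation strategy."""
    # Step 2: Q_{a_l} = 0 for every action a_l.
    q = [0.0] * num_actions
    # Steps 4-14: one pass over the historical data (H_t, Q_t), t = 1..T.
    for t, (h_t, q_t) in enumerate(history, start=1):
        z = rng.random()  # step 5: z uniform in [0, 1)
        if z < epsilon:
            # Step 7: explore, picking a strategy with equal probability.
            a_t = rng.randrange(num_actions)
        else:
            # Step 9: exploit, picking the strategy with the maximum Q-value
            # (ties resolve to the lowest index).
            a_t = max(range(num_actions), key=lambda a: q[a])
        # Steps 11-12: hypothetical stand-in for Algorithm 2 plus feedback.
        r = reward_fn(a_t, h_t, q_t)
        # Step 13: ASSUMED update rule; the paper's Equation (5) may differ.
        q[a_t] += learning_rate(t) * (r - q[a_t])
    return q


if __name__ == "__main__":
    # Toy environment: three strategies with fixed, made-up rewards.
    toy_rewards = [0.2, 0.9, 0.5]
    q_values = epsilon_greedy_allocation(
        history=[(None, None)] * 2000,
        num_actions=3,
        epsilon=0.1,
        learning_rate=lambda t: 0.1,
        reward_fn=lambda a, h, s: toy_rewards[a],
        rng=random.Random(0),
    )
    print(q_values)
```

Passing the learning rate as a function of t mirrors the λ_t notation in the algorithm's input, so a decaying schedule such as `lambda t: 1.0 / t` can be swapped in without changing the loop.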