|
Algorithm 1 Reinforcement Learning-based Wavelength Allocation Algorithm |
|
Input:
|
|
Output: Wavelength allocation matrix |
| 1: |
Initialization Process: |
| 2: |
Based on historical network data , the BBU initializes a virtual environment. Then an agent is created, whose action set is . Initialize , for each , the exploration probability and learning rate . |
| 3: |
Learning Process: |
| 4: |
for each transmission interval
do
|
| 5: |
Generate a random number z
|
| 6: |
if z is less than the exploration probability
then
|
| 7: |
The agent selects a wavelength allocation strategy with equal probability. |
| 8: |
else
|
| 9: |
The agent selects a wavelength allocation strategy with the maximum
|
| 10: |
end if
|
| 11: |
Under , and , Algorithm 2 is performed to produce a clustering result |
| 12: |
The environment feeds the signal back to the agent |
| 13: |
The agent makes an update according to Equation (5) |
| 14: |
end for |
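
The learning loop in Algorithm 1 follows an epsilon-greedy pattern, which can be sketched in Python as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: Equation (5) and Algorithm 2 are not reproduced in this section, so the update rule here is a generic incremental value update standing in for Equation (5), and `reward_fn` is a hypothetical stand-in for running Algorithm 2 and receiving the environment's feedback signal. The names `epsilon_greedy_allocation`, `epsilon`, `alpha`, and the strategy labels are likewise assumptions made for illustration.

```python
import random


def epsilon_greedy_allocation(actions, reward_fn, intervals,
                              epsilon=0.1, alpha=0.5, seed=0):
    """Epsilon-greedy selection over wavelength allocation strategies.

    actions   : list of candidate strategies (hypothetical labels)
    reward_fn : hypothetical stand-in for Algorithm 2 plus environment feedback
    epsilon   : exploration probability (assumed value)
    alpha     : learning rate (assumed value)
    """
    rng = random.Random(seed)
    # Step 2: initialize one value estimate per action.
    value = {a: 0.0 for a in actions}

    # Step 4: one iteration per transmission interval.
    for _ in range(intervals):
        z = rng.random()                      # Step 5: random number in [0, 1)
        if z < epsilon:                       # Step 6: explore
            action = rng.choice(actions)      # Step 7: uniform choice
        else:                                 # Step 8: exploit
            action = max(actions, key=value.get)  # Step 9: highest estimate
        reward = reward_fn(action)            # Steps 11-12: feedback signal
        # Step 13: generic incremental update, assumed in place of Equation (5).
        value[action] += alpha * (reward - value[action])
    return value


# Usage with made-up, deterministic rewards for three hypothetical strategies.
rewards = {"strategy_a": 0.2, "strategy_b": 1.0, "strategy_c": 0.5}
values = epsilon_greedy_allocation(list(rewards), rewards.get, intervals=500)
best = max(values, key=values.get)
```

A stateless value table is used here purely to keep the sketch short; if Equation (5) depends on a network state, the dictionary would need to be keyed by state and action pairs instead.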