| Algorithm 5: The algorithm for optimizing initial problems. |
| Require: , , , Reward r, Learning rate , |
| Greedy probability and Discount factor . |
| Ensure: Optimal action configuration in each state. |
| 1: Initialize , , , , |
| 2: while n <= do |
| 3: Fix the CRE bias and the number of activated SBSs, and calculate the ABS ratio according to Algorithm 2. Pass the solved ABS ratio to step (4) and step (5); |
| 4: Fix the ABS ratio and the number of activated SBSs, and calculate the CRE bias according to Algorithm 3. Pass the solved CRE bias to step (3) and step (5); |
| 5: Fix the ABS ratio and the CRE bias, and calculate the number of activated SBSs according to Algorithm 4. Pass the solved number of activated SBSs to step (3) and step (4); |
| 6: n ← n + 1; |
| 7: end while |
| 8: Output: the optimized ABS ratio, CRE bias, and number of activated SBSs; |
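The alternating structure of Algorithm 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The three sub-solvers stand in for Algorithms 2, 3, and 4, which are not reproduced here; in this sketch they are replaced by plain grid searches over a hypothetical toy utility. The names `alternating_optimize`, `toy_utility`, the grids, and the iteration cap `max_iters` are all assumptions made for illustration.

```python
from typing import Callable, Tuple


def alternating_optimize(
    solve_abs: Callable[[int, int], float],   # stand-in for Algorithm 2
    solve_cre: Callable[[float, int], int],   # stand-in for Algorithm 3
    solve_sbs: Callable[[float, int], int],   # stand-in for Algorithm 4
    init: Tuple[float, int, int],
    max_iters: int = 10,                      # assumed iteration cap
) -> Tuple[float, int, int]:
    """Update one variable at a time while the other two are held fixed."""
    abs_ratio, cre_bias, num_sbs = init
    n = 1
    while n <= max_iters:
        # Step 3: fix CRE bias and SBS count, solve for the ABS ratio.
        abs_ratio = solve_abs(cre_bias, num_sbs)
        # Step 4: fix ABS ratio and SBS count, solve for the CRE bias.
        cre_bias = solve_cre(abs_ratio, num_sbs)
        # Step 5: fix ABS ratio and CRE bias, solve for the SBS count.
        num_sbs = solve_sbs(abs_ratio, cre_bias)
        n += 1
    return abs_ratio, cre_bias, num_sbs


# Hypothetical toy utility (NOT the paper's objective), maximized at
# abs_ratio = 0.3, cre_bias = 6, num_sbs = 3.
def toy_utility(abs_ratio: float, cre_bias: int, num_sbs: int) -> float:
    return (
        -(10 * abs_ratio - num_sbs) ** 2
        - (cre_bias - 2 * num_sbs) ** 2
        - 10 * (num_sbs - 3) ** 2
    )


# Assumed search grids for the three variables.
ABS_GRID = [i / 10 for i in range(10)]   # 0.0, 0.1, ..., 0.9
CRE_GRID = list(range(0, 13))            # 0, 1, ..., 12
SBS_GRID = list(range(1, 7))             # 1, 2, ..., 6

# Each sub-solver is a grid search over one variable with the others fixed.
best = alternating_optimize(
    solve_abs=lambda b, k: max(ABS_GRID, key=lambda a: toy_utility(a, b, k)),
    solve_cre=lambda a, k: max(CRE_GRID, key=lambda b: toy_utility(a, b, k)),
    solve_sbs=lambda a, b: max(SBS_GRID, key=lambda k: toy_utility(a, b, k)),
    init=(0.5, 0, 1),
)
print(best)
```

Each pass through the loop feeds the freshly solved variable into the next two sub-problems, which mirrors how steps 3 to 5 hand their results to one another. Replacing the grid searches with the learning-based solvers of Algorithms 2 to 4 would leave the outer loop unchanged.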