Table 5. Q table.
Expected reward (Q value) of each action in each cluster, computed with alpha (learning rate) = 0.05 and gamma (discount factor) = 0.2. Both alpha and gamma can take values from 0 to 1.
Cluster | Keep Dose | Lower Dose |
---|---|---|
1 | 0.0 | 0.0 |
2 | -0.0057 | 0.0 |
3 | 0.0 | 0.0 |
4 | -0.00002 | 0.0 |
5 | -0.227 | -2.26 |
6 | -0.021 | 0.0 |
7 | 0.0 | 0.0 |
8 | -0.00015 | 0.0 |
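
As an illustration of how entries like those in Table 5 can arise and be read, the following is a minimal sketch that assumes the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + alpha * [r + gamma * max Q(s', a') - Q(s, a)]. The text here does not confirm the exact update rule used, so this is an assumption. The function names, the reward of -1.0, and the example transition are hypothetical; only alpha, gamma, and the cluster 5 and cluster 2 values come from the table.

```python
ALPHA = 0.05  # learning rate, from the Table 5 caption
GAMMA = 0.2   # discount factor, from the Table 5 caption
ACTIONS = ("keep", "lower")  # Keep Dose, Lower Dose


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one standard tabular Q-learning update in place (assumed rule)."""
    best_next = max(q[next_state].values())      # value of the best next action
    td_target = reward + gamma * best_next       # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Return the action with the highest Q value in the given cluster."""
    return max(q[state], key=q[state].get)


# Q table initialised to zero for clusters 1 to 8 (hypothetical starting point).
q = {cluster: {a: 0.0 for a in ACTIONS} for cluster in range(1, 9)}

# Hypothetical transition: cluster 5, keep dose, reward -1.0, stay in cluster 5.
new_value = q_update(q, 5, "keep", -1.0, 5)

# Reading the reported values: two rows copied from Table 5.
table5 = {
    2: {"keep": -0.0057, "lower": 0.0},
    5: {"keep": -0.227, "lower": -2.26},
}
choice_cluster_5 = greedy_action(table5, 5)
choice_cluster_2 = greedy_action(table5, 2)
```

With a small alpha, a single update moves an entry only 5% of the way toward its target, which is consistent with the small magnitudes in most rows. Under a greedy reading of the two rows shown, keeping the dose has the higher value in cluster 5 and lowering it has the higher value in cluster 2.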