PLoS One. 2019 Dec 31;14(12):e0227324. doi: 10.1371/journal.pone.0227324

Table 5. Q table.

Expected reward (Q value) for each action in each cluster, computed with a learning rate (alpha) of 0.05 and a discount factor (gamma) of 0.2. Both alpha and gamma can range from 0 to 1.

Cluster  Keep Dose  Lower Dose
1         0.0        0.0
2        -0.0057     0.0
3         0.0        0.0
4        -0.00002    0.0
5        -0.227     -2.26
6        -0.021      0.0
7         0.0        0.0
8        -0.00015    0.0
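
To illustrate how alpha and gamma enter a table like this one, the following is a minimal sketch of the standard tabular Q-learning update, together with a helper that reads the greedy action for a cluster off the Q table. It assumes the textbook update rule; the table and caption do not state the reward function, the state transitions, or the exact update the authors used, so the function names (q_update, greedy_action), the action labels, and the reward value in the usage note are hypothetical.

```python
# Hypothetical sketch of textbook tabular Q-learning; not the authors' code.

ALPHA = 0.05  # learning rate, as given in the table caption
GAMMA = 0.2   # discount factor, as given in the table caption

# Action labels are illustrative names for the two table columns.
ACTIONS = ("keep_dose", "lower_dose")

# Q values copied from Table 5, keyed by cluster number.
Q_TABLE = {
    1: {"keep_dose": 0.0, "lower_dose": 0.0},
    2: {"keep_dose": -0.0057, "lower_dose": 0.0},
    3: {"keep_dose": 0.0, "lower_dose": 0.0},
    4: {"keep_dose": -0.00002, "lower_dose": 0.0},
    5: {"keep_dose": -0.227, "lower_dose": -2.26},
    6: {"keep_dose": -0.021, "lower_dose": 0.0},
    7: {"keep_dose": 0.0, "lower_dose": 0.0},
    8: {"keep_dose": -0.00015, "lower_dose": 0.0},
}


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning step in place and return the updated value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Return the action with the highest Q value for a cluster.

    Ties go to the first action listed in ACTIONS.
    """
    return max(ACTIONS, key=lambda a: q[state][a])
```

With the values in the table, greedy_action(Q_TABLE, 5) selects "keep_dose" (because -0.227 is greater than -2.26), while greedy_action(Q_TABLE, 2) selects "lower_dose". A call such as q_update(q, 1, "keep_dose", -1.0, 5), with a made-up reward of -1.0, would move the entry for cluster 1 only 5% of the way toward the discounted target, which is the effect of the small alpha.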