2019 Dec 31;14(12):e0227324. doi: 10.1371/journal.pone.0227324

Table 5. Q table.

Expected reward (Q value) for each action in each cluster, based on alpha (learning rate) = 0.05 and gamma (discount factor) = 0.2. Both alpha and gamma can take values from 0 to 1.

Cluster	Keep Dose	Lower Dose
1	0.0	0.0
2	-0.0057	0.0
3	0.0	0.0
4	-0.00002	0.0
5	-0.227	-2.26
6	-0.021	0.0
7	0.0	0.0
8	-0.00015	0.0
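
To illustrate how alpha and gamma enter a Q table like the one above, the following is a minimal sketch that assumes the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + alpha · (r + gamma · max Q(s′, ·) − Q(s, a)). The source does not spell out the exact update rule, so this is an assumption. The function name `q_update`, the action labels, the reward of -1.0, and the transition from cluster 5 to cluster 6 are all hypothetical and chosen only for illustration; the starting Q values for clusters 5 and 6 are taken from the table.

```python
ALPHA = 0.05  # learning rate, as stated in the caption
GAMMA = 0.2   # discount factor, as stated in the caption

# Two rows copied from the table: cluster -> {action: Q value}.
q_table = {
    5: {"keep": -0.227, "lower": -2.26},
    6: {"keep": -0.021, "lower": 0.0},
}


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one standard tabular Q-learning step in place (assumed rule).

    Moves Q(state, action) a fraction alpha of the way toward the target
    reward + gamma * max over actions of Q(next_state, action).
    Returns the updated Q value.
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical transition: in cluster 5, keep the dose, observe a reward
# of -1.0, and land in cluster 6. The reward and transition are made up.
new_value = q_update(q_table, state=5, action="keep", reward=-1.0, next_state=6)
```

With alpha = 0.05, each update moves a Q value only 5% of the way toward its target, so individual observations shift the table slowly. With gamma = 0.2, the value of the next cluster contributes little compared with the immediate reward.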