2024 Apr 29;10:e1998. doi: 10.7717/peerj-cs.1998

Table 3. DQN training process reward/loss and epsilon.

Epoch  Epsilon  Total step  Reward  Loss    Elapsed time (s)
5      1.00     126         0.12    12.607  0.110
10     0.94     258         0.25    11.230  3.097
15     0.83     370         0.49    10.259  5.964
20     0.63     569         0.63    9.000   11.293
25     0.50     694         0.78    0.832   7.386
30     0.32     877         0.88    0.783   10.308
35     0.17     103         0.89    0.715   8.685
40     0.09     119         0.89    0.618   9.239
45     0.09     134         0.94    0.589   9.268
50     0.09     151         0.98    0.458   9.279