Table 3. Reward, loss, and epsilon during DQN training.
| Epoch | Epsilon | Total steps | Reward | Loss | Elapsed time (s) |
|---|---|---|---|---|---|
| 5 | 1.00 | 126 | 0.12 | 12.607 | 0.110 |
| 10 | 0.94 | 258 | 0.25 | 11.230 | 3.097 |
| 15 | 0.83 | 370 | 0.49 | 10.259 | 5.964 |
| 20 | 0.63 | 569 | 0.63 | 9.000 | 11.293 |
| 25 | 0.50 | 694 | 0.78 | 0.832 | 7.386 |
| 30 | 0.32 | 877 | 0.88 | 0.783 | 10.308 |
| 35 | 0.17 | 103 | 0.89 | 0.715 | 8.685 |
| 40 | 0.09 | 119 | 0.89 | 0.618 | 9.239 |
| 45 | 0.09 | 134 | 0.94 | 0.589 | 9.268 |
| 50 | 0.09 | 151 | 0.98 | 0.458 | 9.279 |