Table 2.
Comparison of experimental training processes.
Policy | Train Mean_length | Train Mean_reward | Train Success_rate | Description
---|---|---|---|---
DRLNDT | 160.0993 | 46.3408 | 0.128 | Transformer + SAC |
Baseline | 192.1297 | 70.02924 | 0.623 | RNN + SAC |
Standard SAC | 251.9917 | 77.8219 | 0.713 | |
DRLNDT-n-10 | 152.487 | 42.1571 | 0.1642 | Historical state length n = 10 |
DRLNDT-n-20 | 77.869 | 13.363 | 0.0009 | Historical state length n = 20 |
DRLNDT-n-30 | 75.24716 | 21.7094 | 0.0195 | Historical state length n = 30 |
DRLNDT-n-40 | 85.8629 | 14.1125 | 0.024 | Historical state length n = 40 |
DRLNDT-n-60 | 107.8153 | 45.3691 | 0.1809 | Historical state length n = 60 |
DRLNDT-n-70 | 133.1698 | 51.724 | 0.1999 | Historical state length n = 70 |
DRLNDT-n-80 | 179.1541 | 16.9834 | 0.00982 | Historical state length n = 80 |
DRLNDT-w-0.125 | 105.7095 | 8.6156 | 0 | Potential_reward_w = 0.125 |
DRLNDT-w-0.175 | 231.223 | 27.2584 | 0.0476 | Potential_reward_w = 0.175 |
DRLNDT-w-0.25 | 107.2249 | 39.07225 | 0.1076 | Potential_reward_w = 0.25 |
DRLNDT-w-0.5 | 111.2946 | 44.93245 | 0.139 | Potential_reward_w = 0.5 |
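To read the two ablation groups in Table 2 (historical state length n and Potential_reward_w) side by side, the rows can be encoded and the best setting in each group picked by training success rate. This is a minimal sketch that assumes success rate is the selection criterion of interest. The names `ROWS` and `best_by_success` are hypothetical helpers. The code only re-reads the values reported in the table and does not reproduce any training run.

```python
# Values transcribed from Table 2 (training-process metrics).
# Each entry: (policy, mean_length, mean_reward, success_rate)
# ROWS and best_by_success are hypothetical helpers, not part of the original work.
ROWS = [
    ("DRLNDT", 160.0993, 46.3408, 0.128),
    ("Baseline", 192.1297, 70.02924, 0.623),
    ("Standard SAC", 251.9917, 77.8219, 0.713),
    ("DRLNDT-n-10", 152.487, 42.1571, 0.1642),
    ("DRLNDT-n-20", 77.869, 13.363, 0.0009),
    ("DRLNDT-n-30", 75.24716, 21.7094, 0.0195),
    ("DRLNDT-n-40", 85.8629, 14.1125, 0.024),
    ("DRLNDT-n-60", 107.8153, 45.3691, 0.1809),
    ("DRLNDT-n-70", 133.1698, 51.724, 0.1999),
    ("DRLNDT-n-80", 179.1541, 16.9834, 0.00982),
    ("DRLNDT-w-0.125", 105.7095, 8.6156, 0.0),
    ("DRLNDT-w-0.175", 231.223, 27.2584, 0.0476),
    ("DRLNDT-w-0.25", 107.2249, 39.07225, 0.1076),
    ("DRLNDT-w-0.5", 111.2946, 44.93245, 0.139),
]


def best_by_success(prefix):
    """Return the row with the highest training success rate among
    policies whose name starts with the given prefix."""
    candidates = [row for row in ROWS if row[0].startswith(prefix)]
    return max(candidates, key=lambda row: row[3])


if __name__ == "__main__":
    # One ablation group per prefix: history length n, then reward weight w.
    for prefix in ("DRLNDT-n-", "DRLNDT-w-"):
        name, _length, _reward, success = best_by_success(prefix)
        print(f"{prefix}: best = {name} (success_rate = {success})")
```

Run as a script, this prints DRLNDT-n-70 (success rate 0.1999) for the history-length group and DRLNDT-w-0.5 (success rate 0.139) for the reward-weight group. Ranking by Mean_reward instead would be a one-line change to the `key` function.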