2024 Mar 19;18:1338189. doi: 10.3389/fnbot.2024.1338189

Table 2.

Comparison of experimental training processes.

Policy          Train                                   Description
                Mean_length  Mean_reward  Success_rate
DRLNDT          160.0993     46.3408      0.128         Transformer + SAC
Baseline        192.1297     70.02924     0.623         RNN + SAC
Standard SAC    251.9917     77.8219      0.713
DRLNDT-n-10     152.487      42.1571      0.1642        Historical state length n = 10
DRLNDT-n-20     77.869       13.363       0.0009        Historical state length n = 20
DRLNDT-n-30     75.24716     21.7094      0.0195        Historical state length n = 30
DRLNDT-n-40     85.8629      14.1125      0.024         Historical state length n = 40
DRLNDT-n-60     107.8153     45.3691      0.1809        Historical state length n = 60
DRLNDT-n-70     133.1698     51.724       0.1999        Historical state length n = 70
DRLNDT-n-80     179.1541     16.9834      0.00982       Historical state length n = 80
DRLNDT-w-0.125  105.7095     8.6156       0             Potential_reward_w = 0.125
DRLNDT-w-0.175  231.223      27.2584      0.0476        Potential_reward_w = 0.175
DRLNDT-w-0.25   107.2249     39.07225     0.1076        Potential_reward_w = 0.25
DRLNDT-w-0.5    111.2946     44.93245     0.139         Potential_reward_w = 0.5