2024 Mar 19;18:1338189. doi: 10.3389/fnbot.2024.1338189

Table 2.

Comparison of experimental training processes.

Policy	Train Mean_length	Train Mean_reward	Train Success_rate	Description
DRLNDT	160.0993	46.3408	0.128	Transformer + SAC
Baseline	192.1297	70.02924	0.623	RNN + SAC
Standard SAC	251.9917	77.8219	0.713
DRLNDT-n-10	152.487	42.1571	0.1642	Historical state length n = 10
DRLNDT-n-20	77.869	13.363	0.0009	Historical state length n = 20
DRLNDT-n-30	75.24716	21.7094	0.0195	Historical state length n = 30
DRLNDT-n-40	85.8629	14.1125	0.024	Historical state length n = 40
DRLNDT-n-60	107.8153	45.3691	0.1809	Historical state length n = 60
DRLNDT-n-70	133.1698	51.724	0.1999	Historical state length n = 70
DRLNDT-n-80	179.1541	16.9834	0.00982	Historical state length n = 80
DRLNDT-w-0.125	105.7095	8.6156	0	Potential_reward_w = 0.125
DRLNDT-w-0.175	231.223	27.2584	0.0476	Potential_reward_w = 0.175
DRLNDT-w-0.25	107.2249	39.07225	0.1076	Potential_reward_w = 0.25
DRLNDT-w-0.5	111.2946	44.93245	0.139	Potential_reward_w = 0.5