2021 Feb 15;104:29–42. doi: 10.1016/j.tranpol.2021.01.008

Table 2.

Parameter settings for TD3.

Parameter                       Value
Learning rate                   0.001–0.005
Discount factor                 0.99
Training batch size             64
Policy delay                    2
Target smoothing coefficient    0.05
Min replay buffer size          150
Max replay buffer size          5000
Step size (day)                 50
Number of iterations            500
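
As an illustration of how the settings in Table 2 might be carried into code, the following is a minimal sketch rather than the authors' implementation. The class and function names (`TD3Config`, `soft_update`, `actor_update_steps`, `can_train`) are hypothetical. Two interpretations are assumptions: the learning rate is set to the lower bound of the reported range, and the "target smoothing coefficient" is read as the Polyak averaging factor used to update the target networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3Config:
    """Hyperparameters copied from Table 2 (field names are hypothetical)."""
    learning_rate: float = 0.001   # table reports a range 0.001-0.005; lower bound is an assumption
    discount_factor: float = 0.99
    batch_size: int = 64
    policy_delay: int = 2          # actor is updated once per this many critic updates
    tau: float = 0.05              # assumed to be the Polyak factor for target networks
    min_replay_size: int = 150
    max_replay_size: int = 5000
    step_size_days: int = 50
    num_iterations: int = 500


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def actor_update_steps(cfg, critic_updates):
    """Critic-update indices (1-based) at which the actor is also updated."""
    return [k for k in range(1, critic_updates + 1) if k % cfg.policy_delay == 0]


def can_train(cfg, buffer_len):
    """Training starts only once the replay buffer holds enough transitions."""
    return buffer_len >= max(cfg.min_replay_size, cfg.batch_size)


if __name__ == "__main__":
    cfg = TD3Config()
    print(actor_update_steps(cfg, 6))
    print(can_train(cfg, 100), can_train(cfg, 150))
    print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau))
```

With a policy delay of 2, the actor and target networks are updated on every second critic update, which is the delayed-update mechanism that gives TD3 its name.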