2022 Sep 8;23:368. doi: 10.1186/s12859-022-04912-7

Table 1.

Important notations in RL

Notation                Meaning
$s \in S$               State
$a \in A$               Action
$r \in R$               Immediate reward
$\gamma$                Discount factor
$G_t$                   The long-term reward: $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$
$\pi_\theta(a|s)$       Actor model with parameters $\theta$; it is a distribution over actions given the state
$V_\omega^\pi(s)$       Critic model with parameters $\omega$; it depends on the policy model and outputs a score for the state $s$
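
As a small illustration of the long-term reward defined in the table, the sketch below computes $G_t$ for a finite list of rewards. The function name `discounted_return` and the sample reward values are hypothetical and not taken from the article; the infinite sum is assumed to be truncated at the end of the episode.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite reward sequence.

    `rewards` holds R_{t+1}, R_{t+2}, ... up to the end of the episode
    (an assumption: the infinite sum is truncated there).
    """
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Hypothetical rewards: 1 + 0.5 * 0 + 0.25 * 2
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # prints 1.5
```

The backward recursion avoids computing powers of $\gamma$ explicitly and gives the same value as the sum in the table.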