2025 Jan 2;25(1):211. doi: 10.3390/s25010211

Table 1.

Description of hyperparameters for $A$ and their respective values.

Hyperparameter	Description	Value 1	Value 2
Alpha ( $α$ )	Learning rate that controls how much the agent learns from each new experience. A higher value accelerates learning but may lead to unstable convergence.	$0.01$	$0.5$
Gamma ( $γ$ )	Discount factor that determines the importance of future rewards. A higher value prioritizes long-term rewards.	$0.9$	$0.5$
Epsilon ( $ϵ$ )	Exploration rate that controls the probability of the agent taking a random action instead of following its policy. A higher value encourages exploration.	$0.2$	$0.015$
Epsilon Decay ( $ϵ_{decay}$ )	Decay rate for the exploration rate ( $ϵ$ ), which controls how $ϵ$ decreases over time, allowing the agent to reduce exploration as it learns.	$0.999$	$0.9$
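
To make the roles of the four hyperparameters concrete, the following is a minimal sketch of how they might enter a tabular Q-learning agent with an ϵ-greedy policy. The table does not state the learning algorithm, so the Q-learning update, the multiplicative per-step decay of $ϵ$, and all names (`Hyperparams`, `CONFIG_1`, `CONFIG_2`, `choose_action`, `q_update`, `decayed_epsilon`) are assumptions made for illustration only; the numeric values are taken from the "Value 1" and "Value 2" columns.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparams:
    alpha: float          # learning rate
    gamma: float          # discount factor
    epsilon: float        # initial exploration rate
    epsilon_decay: float  # multiplicative decay applied to epsilon (assumed)


# Values copied from the "Value 1" and "Value 2" columns of Table 1.
CONFIG_1 = Hyperparams(alpha=0.01, gamma=0.9, epsilon=0.2, epsilon_decay=0.999)
CONFIG_2 = Hyperparams(alpha=0.5, gamma=0.5, epsilon=0.015, epsilon_decay=0.9)


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, hp):
    """One tabular Q-learning update (assumed algorithm, not confirmed by the table)."""
    target = reward + hp.gamma * max(q[next_state])
    q[state][action] += hp.alpha * (target - q[state][action])


def decayed_epsilon(hp, steps):
    """Exploration rate after `steps` multiplicative decays (assumed schedule)."""
    return hp.epsilon * hp.epsilon_decay ** steps


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [1.0, 2.0]]  # toy Q-table: 2 states x 2 actions
    action = choose_action(q[0], CONFIG_1.epsilon, rng)
    q_update(q, state=0, action=action, reward=1.0, next_state=1, hp=CONFIG_1)
    print(q[0], decayed_epsilon(CONFIG_1, 1000), decayed_epsilon(CONFIG_2, 10))
```

Under these assumptions, the first configuration learns slowly, weights future rewards heavily, and keeps exploring for a long time, whereas the second takes large update steps, discounts the future more strongly, and becomes almost purely greedy within a few dozen decay steps.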