Table 1.
Hyperparameter | Description | Value 1 | Value 2 |
---|---|---|---|
Alpha () | Learning rate that controls how much the agent learns from each new experience. A higher value accelerates learning but may lead to unstable convergence. | ||
Gamma () | Discount factor that determines the importance of future rewards. A higher value prioritizes long-term rewards. | ||
Epsilon () | Exploration rate that controls the probability of the agent taking a random action instead of following its policy. A higher value encourages exploration. | ||
Epsilon Decay () | Decay rate for the exploration rate (), which controls how decreases over time, allowing the agent to reduce exploration as it learns. |