Figure 2.
Concept of reinforcement learning. An agent receives current state observations from an environment (1) and, in response, selects an action according to its policy (2). For this action, the agent receives a reward based on a reward function and the state of the environment changes (3). The agent’s goal is to maximize long-term rewards and achieve the optimum possible return.
