| Concept | RL | Active sensing | Comparison |
| --- | --- | --- | --- |
| State | Represents the environment's true information, which cannot be directly observed by the agent | Represents the underlying characteristic of the environment or phenomenon that the agent is observing | Both maintain a belief state (see the belief-update sketch below the table). RL focuses on representing the unobservable true state, while active sensing aims to understand observed characteristics to reduce uncertainty |
| Action | Taken by the agent to influence the environment based on the current belief state | The agent's decisions on where to sample and collect data, typically selected to maximize information gain and/or reduce uncertainty in the belief state | RL actions influence the environment directly, while active sensing actions aim to maximize information gain through data collection |
| Observation | Provides partial information about the hidden state | The data or measurements collected by the sensors | Directly comparable: in both settings, observations are used to infer the underlying state |
| Transition function | An internal model of the environment describing the probability that the state evolves from the current state to the next under a specific action aₜ | A model of the environment describing how its characteristics change when the agent takes a specific action aₜ; typically models the dynamics of what is being sensed | In RL the transition function models state evolution; in active sensing it models how the sensed characteristics change |
| Reward function | A single scalar feedback value indicating how effective a given action was at the current time step; a high-level description of the agent's desired outcomes | The value of information gain, assigned according to how well the latest sensing action reduces uncertainty in the belief state | The RL reward is more general: it is intrinsic to learning and can incorporate the active-sensing reward (see the information-gain sketch below) |
| Discount factor γ | A scalar factor determining how much weight the agent gives to long-term future rewards relative to the immediate reward | A factor that balances exploration of new, unobserved regions against exploitation of previously sensed regions that are already information-rich | In RL, γ weighs future rewards against immediate ones; in active sensing, it balances exploration against exploitation of existing information |
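
The state, action, observation, and transition-function rows all rest on the same machinery: a belief state updated from actions and observations via the transition and observation models. Below is a minimal sketch of that Bayes-filter update, assuming a discrete state space; the array shapes and the randomly generated models `T` and `O` are illustrative assumptions, not taken from the text.

```python
import numpy as np

n_states, n_actions, n_obs = 4, 2, 3
rng = np.random.default_rng(0)

# Transition function T[a, s, s']: probability of moving from state s to s'
# under action a -- state evolution in RL, characteristic dynamics in sensing.
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

# Observation model O[s, o]: probability of observing o in state s -- the
# partial, noisy view of the hidden state that both settings share.
O = rng.dirichlet(np.ones(n_obs), size=n_states)

def belief_update(belief, action, obs):
    """Bayes-filter update after taking `action` and receiving `obs`."""
    predicted = belief @ T[action]        # predict: sum_s b(s) * T[a, s, s']
    posterior = predicted * O[:, obs]     # correct: weight by P(obs | s')
    return posterior / posterior.sum()    # renormalize to a distribution

belief = np.full(n_states, 1.0 / n_states)   # start maximally uncertain
belief = belief_update(belief, action=0, obs=1)
```

In RL this updated belief feeds action selection toward an external reward; in active sensing the update itself is the point, and the next sketch turns its uncertainty reduction into the reward.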
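To make the reward and discount rows concrete, here is a sketch, under the same assumed discrete-belief setup, of an entropy-based information-gain reward and the discounted return it would enter; `gamma=0.95` is an arbitrary illustrative value.

```python
import numpy as np

def entropy(belief):
    """Shannon entropy of a belief vector: the uncertainty to be reduced."""
    p = belief[belief > 0]
    return float(-np.sum(p * np.log(p)))

def information_gain_reward(belief_before, belief_after):
    """Active-sensing reward: uncertainty removed by the latest sensing action."""
    return entropy(belief_before) - entropy(belief_after)

def discounted_return(rewards, gamma=0.95):
    """RL return: gamma weighs long-term rewards against the immediate one."""
    return sum(gamma**t * r for t, r in enumerate(rewards))
```

Feeding `information_gain_reward` into `discounted_return`, possibly alongside a task reward, is one way to read the table's claim that the RL reward can incorporate the active-sensing reward, with γ then trading exploration of unobserved regions against exploitation of well-sensed ones.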