Front Robot AI. 2024 Mar 12;11:1336612. doi: 10.3389/frobt.2024.1336612

TABLE 1.

Table summarizing how the components of the POMDP compare between active sensing formulations and RL formulations.

| POMDP component | Reinforcement learning | Active sensing | Summary |
| --- | --- | --- | --- |
| State S | Represents the environment's true state, which cannot be directly observed by the agent | Represents the underlying characteristics of the environment or phenomenon that the agent is observing | Both maintain a belief state B_t(s_t). RL focuses on representing the unobservable true state, while active sensing aims to understand the observed characteristics in order to reduce uncertainty |
| Action A | Actions are taken by the agent to influence the environment based on the current belief state B_t(s_t) | Actions are the agent's decisions on where to sample and collect data, typically selected to maximize information gain and/or reduce uncertainty in the belief state B_t(s_t) | RL actions influence the environment directly, while active sensing actions aim to maximize information gain through data collection |
| Observation Z | Observations provide partial information about the hidden state | Observations represent the data or measurements collected by the sensors | Directly comparable: in both formulations, observations are used to infer the underlying state S |
| Transition function T: S × A × S | An internal model of the environment describing the probability that the environment evolves from the current state to the next state under a specific action a_t | A model of the environment describing how its characteristics change when the agent takes a specific action a_t; typically models the dynamics of the phenomenon being sensed | In RL, the transition function models state evolution; in active sensing, it models how the environmental characteristics change |
| Reward function R: S × A | A single scalar feedback value that indicates how effective a given action was at the current time step; a high-level description of the agent's desired outcomes | The value of information gain, assigned based on how well the latest sensing action reduces uncertainty in the belief state | The RL reward is a more direct task signal; it is intrinsic to learning and can incorporate the active sensing reward |
| Discount factor γ | A scalar factor that determines how much weight the agent gives to long-term future rewards compared to the immediate reward | A factor that balances exploration of new, unobserved regions against exploitation of data from previously sensed regions with a high level of information | RL balances future rewards against immediate rewards; active sensing balances exploration against exploitation of existing information |
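The correspondence in Table 1 can be made concrete with a small code sketch. The following Python snippet is an illustrative example, not code from the paper: all names (DiscretePOMDP, belief_update, information_gain_reward) and the toy matrices are hypothetical. It assumes a discrete state and observation space and shows the transition model T, the observation model Z, a belief state B_t(s_t) updated by Bayesian filtering, and an active-sensing reward defined as the reduction in belief entropy, in contrast to the task reward R(s, a) used in RL.

```python
import numpy as np

class DiscretePOMDP:
    """Minimal discrete POMDP with the components listed in Table 1.

    transition[a][s, s']        : T(s' | s, a)
    observation_model[a][s', z] : P(z | s', a)
    reward[s, a]                : R(s, a), the task reward used in RL
    gamma                       : discount factor
    """

    def __init__(self, transition, observation_model, reward, gamma=0.95):
        self.T = transition
        self.O = observation_model
        self.R = reward
        self.gamma = gamma

    def belief_update(self, belief, action, observation):
        """Bayes filter: B_{t+1}(s') ∝ P(z | s', a) · Σ_s T(s' | s, a) B_t(s)."""
        predicted = self.T[action].T @ belief                  # prediction step
        updated = self.O[action][:, observation] * predicted   # correction step
        return updated / updated.sum()


def belief_entropy(belief):
    """Shannon entropy of the belief state (uncertainty about S)."""
    b = belief[belief > 0]
    return float(-np.sum(b * np.log(b)))


def information_gain_reward(belief_before, belief_after):
    """Active-sensing reward: how much the latest sensing action reduced uncertainty."""
    return belief_entropy(belief_before) - belief_entropy(belief_after)


# Toy two-state, one-action, two-observation example.
T = {0: np.array([[0.9, 0.1],
                  [0.1, 0.9]])}          # state dynamics under the single "sense" action
O = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}          # sensor model P(z | s')
R = np.zeros((2, 1))                     # task reward (unused in this sketch)

pomdp = DiscretePOMDP(T, O, R)
b0 = np.array([0.5, 0.5])                # maximally uncertain initial belief
b1 = pomdp.belief_update(b0, action=0, observation=0)
print(information_gain_reward(b0, b1))   # positive if the observation reduced uncertainty
```

In an RL formulation, the scalar feedback would come from R(s, a), possibly augmented with the information-gain term, whereas a pure active sensing formulation would choose the next sampling action to maximize the expected information gain, discounted by γ over the sensing horizon.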