Front Robot AI. 2024 Mar 12;11:1336612. doi: 10.3389/frobt.2024.1336612

TABLE 1.

Table summarizing how the components of the POMDP compare between active sensing formulations and RL formulations.

| POMDP component | Reinforcement learning | Active sensing | Summary |
| --- | --- | --- | --- |
| State S | Represents the environment's true state, which cannot be directly observed by the agent | Represents the underlying characteristics of the environment or phenomenon that the agent is observing | Both maintain a belief state B_t(s_t). RL focuses on representing the unobservable true state, while active sensing aims to understand the observed characteristics in order to reduce uncertainty |
| Action A | Actions are taken by the agent to influence the environment based on the current belief state B_t(s_t) | Actions are the agent's decisions on where to sample and collect data, typically selected to maximize information gain and/or reduce uncertainty in the belief state B_t(s_t) | RL actions influence the environment directly, while active sensing actions aim to maximize information gain through data collection |
| Observation Z | Observations provide partial information about the hidden state | Observations represent the data or measurements collected by the sensors | Directly comparable: in both formulations, observations are used to infer the underlying state S |
| Transition function T: S × A × S | An internal model of the environment describing the probability that the environment evolves from the current state to the next state under a specific action a_t | A model of the environment describing how its characteristics change when the agent takes a specific action a_t; typically models the dynamics of the phenomenon being sensed | In RL, the transition function models state evolution; in active sensing, it models how the environmental characteristics change |
| Reward function R: S × A | A single scalar feedback value that indicates how effective a given action was at the current time step; a high-level description of the agent's desired outcomes | The value of information gain, assigned based on how well the latest sensing action reduces uncertainty in the belief state | The RL reward is a more direct task signal; it is intrinsic to learning and can incorporate the active sensing reward |
| Discount factor γ | A scalar factor that determines how much weight the agent gives to long-term future rewards compared to the immediate reward | A factor that balances exploration of new, unobserved regions against exploitation of data from previously sensed regions with a high level of information | RL balances future rewards against immediate rewards; active sensing balances exploration against exploitation of existing information |
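The correspondence in Table 1 can be made concrete with a small code sketch. The following Python snippet is an illustrative example, not code from the paper: all names (DiscretePOMDP, belief_update, information_gain_reward) and the toy matrices are hypothetical. It assumes a discrete state and observation space and shows the transition model T, the observation model Z, a belief state B_t(s_t) updated by Bayesian filtering, and an active-sensing reward defined as the reduction in belief entropy, in contrast to the task reward R(s, a) used in RL.

```python
import numpy as np

class DiscretePOMDP:
    """Minimal discrete POMDP with the components listed in Table 1.

    transition[a][s, s']        : T(s' | s, a)
    observation_model[a][s', z] : P(z | s', a)
    reward[s, a]                : R(s, a), the task reward used in RL
    gamma                       : discount factor
    """

    def __init__(self, transition, observation_model, reward, gamma=0.95):
        self.T = transition
        self.O = observation_model
        self.R = reward
        self.gamma = gamma

    def belief_update(self, belief, action, observation):
        """Bayes filter: B_{t+1}(s') ∝ P(z | s', a) · Σ_s T(s' | s, a) B_t(s)."""
        predicted = self.T[action].T @ belief                  # prediction step
        updated = self.O[action][:, observation] * predicted   # correction step
        return updated / updated.sum()


def belief_entropy(belief):
    """Shannon entropy of the belief state (uncertainty about S)."""
    b = belief[belief > 0]
    return float(-np.sum(b * np.log(b)))


def information_gain_reward(belief_before, belief_after):
    """Active-sensing reward: how much the latest sensing action reduced uncertainty."""
    return belief_entropy(belief_before) - belief_entropy(belief_after)


# Toy two-state, one-action, two-observation example.
T = {0: np.array([[0.9, 0.1],
                  [0.1, 0.9]])}          # state dynamics under the single "sense" action
O = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}          # sensor model P(z | s')
R = np.zeros((2, 1))                     # task reward (unused in this sketch)

pomdp = DiscretePOMDP(T, O, R)
b0 = np.array([0.5, 0.5])                # maximally uncertain initial belief
b1 = pomdp.belief_update(b0, action=0, observation=0)
print(information_gain_reward(b0, b1))   # positive if the observation reduced uncertainty
```

In an RL formulation, the scalar feedback would come from R(s, a), possibly augmented with the information-gain term, whereas a pure active sensing formulation would choose the next sampling action to maximize the expected information gain, discounted by γ over the sensing horizon.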