Notations used in the paper.
$\propto$    proportional to
$\exp(\cdot)$    $\exp(x) = e^{x}$
$\|x\|_2$    Euclidean norm of $x$
$t$    timestep
Const    arbitrary constant
$A$    set of possible actions
$S$    set of possible states
$a \in A$    action
$s \in S$    state
$s_0 \in S$    first state of a trajectory
$s_f \in S$    final state of a trajectory
$s' \in S$    state following a tuple $(s,a)$
$h$    history of interactions $(s_0, a_0, s_1, \dots)$
$\hat{s}$    predicted states
$g \in G$    goal
$s_g \in S$    state used as a goal
$S_b$    set of states contained in $b$
$\tau \in T$    trajectory
$u(\tau)$    function that extracts parts of the trajectory $\tau$
$R(s, a, s')$    reward function
$d_t^{\pi}(s)$    $t$-step state distribution
$d_{0:T}^{\pi}(S)$    stationary state-visitation distribution of $\pi$ over a horizon $T$, $\frac{1}{T}\sum_{t=1}^{T} d_t^{\pi}(S)$
$f$    representation function
$z$    compressed latent variable, $z = f(s)$
$\rho \in P$    density model
$\phi \in \Phi$    forward model
$\phi_T \in \Phi_T$    true forward model
$q_{\omega}$    parameterized discriminator
$\pi$    policy
$\pi_g$    policy conditioned on a goal $g$
$nn_k(S, s)$    $k$-th closest state to $s$ in $S$
$D_{\mathrm{KL}}(p(x)\,\|\,p'(x))$    Kullback–Leibler divergence, $\mathbb{E}_{x \sim p(\cdot)} \log \frac{p(x)}{p'(x)}$
$H(X)$    $-\int_X p(x) \log p(x)\,dx$
$H(X|S)$    $-\int_S p(s) \int_X p(x|s) \log p(x|s)\,dx\,ds$
$I(X;Y)$    $H(X) - H(X|Y)$
$I(X;Y|S)$    $H(X|S) - H(X|Y,S)$
$IG(h, A, S, S', \Phi)$    Information gain, $I(S'; \Phi \mid h, A, S)$
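The last rows define the information-theoretic quantities (Kullback–Leibler divergence, entropy, conditional entropy, mutual information) on which the rest of the paper relies. As a minimal numerical illustration, not taken from the paper, the sketch below evaluates these definitions for discrete distributions with NumPy: the integrals above become sums over finite supports, and all function names are ours.

```python
import numpy as np

def entropy(p_x):
    """H(X) = -sum_x p(x) log p(x)."""
    p_x = np.asarray(p_x, dtype=float)
    nz = p_x > 0                     # convention: 0 log 0 = 0
    return -np.sum(p_x[nz] * np.log(p_x[nz]))

def kl_divergence(p_x, q_x):
    """D_KL(p || q) = E_{x~p} [ log p(x)/q(x) ]; assumes q(x) > 0 wherever p(x) > 0."""
    p_x, q_x = np.asarray(p_x, dtype=float), np.asarray(q_x, dtype=float)
    nz = p_x > 0
    return np.sum(p_x[nz] * np.log(p_x[nz] / q_x[nz]))

def conditional_entropy(p_sx):
    """H(X|S) = -sum_s p(s) sum_x p(x|s) log p(x|s), computed from a joint p(s, x)."""
    p_sx = np.asarray(p_sx, dtype=float)
    p_s = p_sx.sum(axis=1, keepdims=True)                 # marginal p(s)
    p_x_given_s = np.divide(p_sx, p_s,
                            out=np.zeros_like(p_sx), where=p_s > 0)
    nz = p_x_given_s > 0
    return -np.sum(p_sx[nz] * np.log(p_x_given_s[nz]))

def mutual_information(p_xy):
    """I(X;Y) = H(X) - H(X|Y), computed from a joint p(x, y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    return entropy(p_xy.sum(axis=1)) - conditional_entropy(p_xy.T)

# Example: a joint distribution p(x, y) over two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print("H(X)   =", entropy(p_xy.sum(axis=1)))   # ~0.693 nats
print("I(X;Y) =", mutual_information(p_xy))    # ~0.193 nats
```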