Notations used in the paper.

| Symbol | Meaning |
| --- | --- |
| ∝ | proportional to |
| ‖x‖ | Euclidean norm of x |
| t | timestep |
|  | arbitrary constant |
| A | set of possible actions |
| S | set of possible states |
| a | action |
| s | state |
| s_0 | first state of a trajectory |
|  | final state of a trajectory |
| s' | state following a tuple (s, a) |
| h | history of interactions |
| ŝ | predicted states |
| g | goal |
| s_g | state used as a goal |
|  | set of states contained in b |
| τ | trajectory |
|  | function that extracts parts of the trajectory |
| r | reward function |
|  | t-step state distribution |
|  | stationary state-visitation distribution of π over a horizon T |
| f | representation function |
| z | compressed latent variable |
|  | density model |
|  | forward model |
|  | true forward model |
|  | parameterized discriminator |
| π | policy |
| π_g | policy conditioned on a goal g |
|  | k-th closest state to s in S |
| D_KL | Kullback–Leibler divergence |
| IG | information gain |
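For reference, three of the quantities above have standard formulations in the intrinsic-motivation literature, sketched below under the usual conventions. The symbols $d_{\pi}^{T}$ and $\theta$ (model parameters), as well as the discrete-state sums, are illustrative assumptions rather than the paper's own notation.

$$ D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)} \qquad \text{(Kullback–Leibler divergence between $p$ and $q$)} $$

$$ d_{\pi}^{T}(s) = \frac{1}{T} \sum_{t=1}^{T} P(s_t = s \mid \pi) \qquad \text{(state-visitation distribution of $\pi$ over a horizon $T$)} $$

$$ IG = D_{\mathrm{KL}}\big( p(\theta \mid h, s') \,\|\, p(\theta \mid h) \big) \qquad \text{(information gain about $\theta$ from observing $s'$)} $$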