Symbol | Description | Value |
Ns | No. of spatial units | 2 |
Ne | No. of units in recurrent network (autoencoder) | 490 |
Nm | No. of place cells | 980 |
Nv | No. of cortical units (layer 1) | 980 |
Nh | No. of cortical units (layer 2) | 300 |
β | Bout length | Variable (see text) |
tb | Time elapsed within bout | [0, β] |
Rt | Reward administered at time t | {0, 1} |
σϵ | Within-bout variance | 0 (none) or 0.003 |
σφ | Interbout variance | 0 (none) or 0.11 |
σf | Place cell breadth | 0.16 |
ϕ | Sampled reward location (sudden shift) | ∼N(μ, σφ) bound to [0, 1] |
ϵ | Incremental shift | ∼N(μ, σϵ) |
lt | Reward location at time t | [0, 1] |
xt | Agent location at time t | [0, 1] |
si | Place cell centerfield | [0, 1] |
s | Spatial cell activation vector | — |
e(k) | Recurrent network (autoencoder) layer k activation vector | — |
m | Place cell activation vector (memory) | — |
mE | Episodic output | — |
mS | Schematic output | — |
mO | Combined episodic/schematic output | — |
mR | Output from replay event | — |
m̃(xt+1 ∣ ai) | Predicted output given action ai | — |
V | Cortex layer 1 activation vector | — |
H | Cortex layer 2 activation vector | — |
WSE-AE | Spatial encoder to autoencoder weights | — |
WAE-AE | Autoencoder recurrent weights | — |
WAE-PC | Autoencoder to place cell weights | — |
WCTX | Cortical weights | — |
| Agent speed | 0.04 |
at | Action taken at time t | ∈ {N, NW, W, SW, S, SE, E, NE} |
ai | Possible action at time t | ∈ {N, NW, W, SW, S, SE, E, NE} |
αt | Policy unit (episodic, schematic) at time t | [0, 1] |
αRt | Policy unit (random) at time t | [0, 1] |
δt | Temporal difference error | — |
γ | Temporal difference discounting factor | 0.95 |
λ | Learning rate (autoencoder) | δt 0.1 |
| Learning rate (cortex) | 0.00001 |
| Learning rate (place cells, actor) | 0.0075 |
| Learning rate (place cells, critic) | 0.04 |
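
For readers who want to reproduce the setup, the numeric entries of the table can be gathered into a single configuration object. The following is a minimal sketch rather than the authors' code: the class name `ModelParams`, its field names, and the helper `sample_sudden_shift` are hypothetical. It also assumes that σφ is a variance (as the table labels it), so the standard deviation is its square root, and that "bound to [0, 1]" means clipping; the mean μ is passed in by the caller because the table does not define it. The autoencoder learning rate is left out because its table entry is not a plain number.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Numeric values copied from the parameter table (field names are hypothetical)."""
    n_spatial: int = 2                # Ns
    n_autoencoder: int = 490          # Ne
    n_place_cells: int = 980          # Nm
    n_cortex_l1: int = 980            # Nv
    n_cortex_l2: int = 300            # Nh
    var_within_bout: float = 0.003    # sigma_epsilon; 0 means no incremental shift
    var_interbout: float = 0.11       # sigma_phi; 0 means no sudden shift
    place_cell_breadth: float = 0.16  # sigma_f
    agent_speed: float = 0.04
    gamma: float = 0.95               # temporal difference discounting factor
    lr_cortex: float = 1e-5
    lr_actor: float = 0.0075
    lr_critic: float = 0.04


def sample_sudden_shift(mu: float, params: ModelParams, rng: random.Random) -> float:
    """Draw a reward coordinate from N(mu, sigma_phi), then bound it to [0, 1].

    Assumptions: sigma_phi is a variance, and bounding is done by clipping.
    """
    std = math.sqrt(params.var_interbout)
    value = rng.gauss(mu, std)
    return min(1.0, max(0.0, value))


params = ModelParams()
rng = random.Random(0)
samples = [sample_sudden_shift(0.5, params, rng) for _ in range(5)]
```

Setting `var_interbout=0.0` reproduces the "0 (none)" condition in the table: every draw then returns `mu` unchanged.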