. 2023 Jan 18;25(2):188. doi: 10.3390/e25020188
Algorithm 1 Maximum entropy exploration with neural networks
  • Input: $\alpha, N, \theta_0, X_0, k$

  • for $i = 1, \dots, N$ do

  •     Receive context $s_i$ and choose $a_i \sim \pi_i$, where

  •     $\pi_i(a \mid s_i) = \dfrac{e^{\hat{r}_{\theta_{i-1}}(a, s_i)/\alpha}}{\int e^{\hat{r}_{\theta_{i-1}}(a', s_i)/\alpha}\, da'}$

  •     Agent receives reward $r_i$

  •     Add the triplet $\{s_i, a_i, r_i\}$ to the dataset $X$

  •     Every $k$ steps, train the model $\hat{r}_\theta$:

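  •     $\theta_i = \arg\min_\theta \sum_{\{s_j, a_j, r_j\} \in X} \lvert r_j - \hat{r}_\theta(a_j, s_j) \rvert$ (on all other steps, $\theta_i = \theta_{i-1}$)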

  • end for
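The loop above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that are not in the source. The context is a scalar in [0, 1]. The action space is the interval [0, 1], discretised on a uniform grid, so the integral in the normaliser becomes a sum and the spacing $da$ cancels. `env_reward` is a hypothetical toy environment. A small one-hidden-layer `MLP` stands in for the paper's neural network. A fixed number of subgradient-descent steps on the absolute error approximates the arg min. $\theta_0$ is a random initialisation and $X_0$ is empty.

```python
import numpy as np


def env_reward(a, s, rng):
    """Hypothetical toy environment (assumption): noisy reward peaked at a = s."""
    return -(a - s) ** 2 + 0.05 * rng.normal()


class MLP:
    """One-hidden-layer network r_hat_theta(a, s); a stand-in for the paper's model."""

    def __init__(self, hidden=16, seed=0):
        g = np.random.default_rng(seed)
        self.W1 = g.normal(0.0, 1.0, (2, hidden))   # inputs are the columns [a, s]
        self.b1 = np.zeros(hidden)
        self.w2 = g.normal(0.0, 0.1, hidden)
        self.b2 = 0.0

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, h @ self.w2 + self.b2

    def predict(self, a, s):
        """Predicted reward for one or more actions a under a single context s."""
        a = np.atleast_1d(np.asarray(a, dtype=float))
        x = np.column_stack([a, np.full_like(a, s)])
        return self.forward(x)[1]

    def fit_l1(self, data, epochs=500, lr=0.01):
        """Approximate argmin_theta sum_j |r_j - r_hat_theta(a_j, s_j)| by subgradient descent."""
        s, a, r = (np.array(col, dtype=float) for col in zip(*data))
        x = np.column_stack([a, s])
        for _ in range(epochs):
            h, pred = self.forward(x)
            g = -np.sign(r - pred) / len(r)          # subgradient of mean |r - pred| w.r.t. pred
            dz = np.outer(g, self.w2) * (1.0 - h ** 2)  # backprop through tanh (uses old w2)
            self.w2 -= lr * (h.T @ g)
            self.b2 -= lr * g.sum()
            self.W1 -= lr * (x.T @ dz)
            self.b1 -= lr * dz.sum(axis=0)


def boltzmann_policy(model, s, alpha, grid):
    """pi(a|s) proportional to exp(r_hat(a, s) / alpha), normalised over the action grid."""
    logits = model.predict(grid, s) / alpha
    logits -= logits.max()                           # stability shift; the softmax is unchanged
    p = np.exp(logits)
    return p / p.sum()


def max_entropy_exploration(alpha=0.1, N=300, k=50, n_grid=101, seed=0):
    grid = np.linspace(0.0, 1.0, n_grid)
    model = MLP(seed=seed)                           # theta_0: random initialisation (assumption)
    X = []                                           # X_0: empty dataset (assumption)
    g = np.random.default_rng(seed)
    for i in range(1, N + 1):
        s_i = g.uniform()                            # receive context s_i
        p = boltzmann_policy(model, s_i, alpha, grid)  # pi_i uses theta_{i-1}
        a_i = g.choice(grid, p=p)                    # a_i ~ pi_i
        r_i = env_reward(a_i, s_i, g)
        X.append((s_i, a_i, r_i))
        if i % k == 0:                               # every k steps refit r_hat_theta;
            model.fit_l1(X)                          # otherwise theta_i = theta_{i-1}
    return model, X
```

For example, `model, X = max_entropy_exploration(alpha=0.1, N=300, k=50)` runs the loop and returns the last fitted model and the collected triplets. The temperature $\alpha$ sets the exploration level: as $\alpha \to \infty$ the policy tends to uniform over the grid, and as $\alpha \to 0$ it concentrates on the action with the highest predicted reward. Subtracting the maximum logit before exponentiating avoids overflow at small $\alpha$ without changing the normalised probabilities.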