2023 Jan 18;25(2):188. doi: 10.3390/e25020188
Algorithm 2 Contextual bandit with Energy Based Models
  • Input: $N, \theta_0, X_0, K, c, \alpha, a_{\max}, a_{\min}, \eta, \sigma$

  • for $i = 1, \dots, N$ do

  •     Choose $a_i \sim \pi_i$ with SGLD, $\tilde{a}_0 \sim U(a_{\min}, a_{\max})$

  •     for $k = 1, \dots, K$ do

  •         Draw a noise sample $\omega \sim \mathcal{N}(0, \sigma)$

  •         $\tilde{a}_k \leftarrow \tilde{a}_{k-1} - \eta \nabla_x E_{\theta_{i-1}}(\tilde{a}_{k-1}, s_i)/\alpha + \omega$

  •     end for

  •     Play action $\tilde{a}_K$, receive reward $r_i$, update $X$

  •     Every $c$ steps, train $E_\theta$ in batches:

  •     $\theta_i = \arg\min_\theta \sum_{X} \log\left(1 + e^{E_\theta(a^+, s_j) - E_\theta(a^-, s_j)}\right)$

  • end for
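The inner SGLD loop of Algorithm 2 (draw $\tilde{a}_0$ uniformly, then take $K$ noisy gradient steps on the energy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a scalar action and state, treats $\sigma$ as the noise standard deviation, takes the gradient with respect to the action, and uses a hypothetical toy energy $E(a, s) = (a - s)^2$. The names `sgld_sample_action` and `toy_grad_energy` are illustrative only.

```python
import random
from typing import Callable


def sgld_sample_action(
    grad_energy: Callable[[float, float], float],
    state: float,
    a_min: float,
    a_max: float,
    K: int,
    eta: float,
    alpha: float,
    sigma: float,
    rng: random.Random,
) -> float:
    """Draw one action by running K steps of SGLD on the energy E(a, s)."""
    # Initialise the chain uniformly in the action range: a~_0 ~ U(a_min, a_max).
    a = rng.uniform(a_min, a_max)
    for _ in range(K):
        # Gaussian noise omega; sigma is treated as a standard deviation (assumption).
        omega = rng.gauss(0.0, sigma)
        # Step down the energy gradient, scaled by eta / alpha, then add the noise.
        a = a - eta * grad_energy(a, state) / alpha + omega
    return a


# Hypothetical toy energy E(a, s) = (a - s)^2, minimised at a = s; returns dE/da.
def toy_grad_energy(a: float, s: float) -> float:
    return 2.0 * (a - s)


rng = random.Random(0)
action = sgld_sample_action(toy_grad_energy, state=0.5, a_min=-1.0, a_max=1.0,
                            K=100, eta=0.1, alpha=1.0, sigma=0.01, rng=rng)
print(action)
```

With this toy energy each step multiplies the distance to the minimum $a = s$ by $1 - 2\eta/\alpha$ (0.8 here) before adding noise, so after $K = 100$ steps the sample sits close to $s = 0.5$, spread only by the injected Gaussian noise.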
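The periodic training step minimizes $\log(1 + e^{E_\theta(a^+, s_j) - E_\theta(a^-, s_j)})$, which is small when the positive action $a^+$ has lower energy than the negative action $a^-$. A hedged pure-Python sketch of mini-batch gradient descent on this loss is shown below. The one-parameter energy $E_\theta(a, s) = (a - \theta s)^2$, the synthetic buffer, and the way positive/negative pairs are formed (positives at $a = 2s$, negatives shifted by $\pm 0.5$) are all hypothetical stand-ins; a practical $E_\theta$ would be a neural network trained with automatic differentiation.

```python
import math
import random
from typing import Sequence, Tuple

Triple = Tuple[float, float, float]  # (s, a_pos, a_neg)


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Derivative of softplus, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical one-parameter energy E_theta(a, s) = (a - theta * s)^2.
def energy(theta: float, a: float, s: float) -> float:
    return (a - theta * s) ** 2


def denergy_dtheta(theta: float, a: float, s: float) -> float:
    return -2.0 * s * (a - theta * s)


def contrastive_loss(theta: float, triples: Sequence[Triple]) -> float:
    """Mean of log(1 + exp(E(a+, s) - E(a-, s))) over the triples."""
    return sum(softplus(energy(theta, ap, s) - energy(theta, an, s))
               for s, ap, an in triples) / len(triples)


def train(theta: float, buffer: Sequence[Triple], batch_size: int,
          epochs: int, lr: float) -> float:
    """Mini-batch gradient descent on the contrastive loss (batches in buffer order)."""
    for _ in range(epochs):
        for start in range(0, len(buffer), batch_size):
            batch = buffer[start:start + batch_size]
            grad = 0.0
            for s, ap, an in batch:
                d = energy(theta, ap, s) - energy(theta, an, s)
                # d/dtheta softplus(d) = sigmoid(d) * (dE+/dtheta - dE-/dtheta)
                grad += sigmoid(d) * (denergy_dtheta(theta, ap, s)
                                      - denergy_dtheta(theta, an, s))
            theta -= lr * grad / len(batch)
    return theta


rng = random.Random(0)
buffer = []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    a_pos = 2.0 * s  # hypothetical rewarded action
    # Hypothetical negatives: a_pos shifted by +/-0.5, stored next to each other
    # so every even-sized batch holds both members of each pair.
    buffer.append((s, a_pos, a_pos + 0.5))
    buffer.append((s, a_pos, a_pos - 0.5))

theta0 = 0.0
theta = train(theta0, buffer, batch_size=20, epochs=200, lr=0.1)
print(theta, contrastive_loss(theta0, buffer), contrastive_loss(theta, buffer))
```

Because each positive is paired with one negative on either side, the loss in this toy setup is minimized at $\theta = 2$, where the positive action has zero energy, and training from $\theta = 0$ moves $\theta$ there. The analytic gradient relies only on $\frac{d}{dx}\log(1 + e^x) = \sigma(x)$.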