. 2020 Oct 15;7:558531. doi: 10.3389/frobt.2020.558531

Algorithm 1.

LinUCB (Li et al., 2010)

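  0: Inputs: $\alpha \in \mathbb{R}_{+}$
  1: for $t = 1, 2, 3, \ldots, T$ do
  2:      Observe features of all arms $a \in \mathcal{A}_t$: $x_{t,a} \in \mathbb{R}^d$
  3:      for all $a \in \mathcal{A}_t$ do
  4:         if $a$ is new then
  5:            $A_a \leftarrow I_d$ (d-dimensional identity matrix)
  6:            $b_a \leftarrow 0_{d \times 1}$ (d-dimensional zero vector)
  7:         end if
  8:         $\hat{\theta}_a \leftarrow A_a^{-1} b_a$
  9:         $p_{t,a} \leftarrow \hat{\theta}_a^{\top} x_{t,a} + \alpha \sqrt{x_{t,a}^{\top} A_a^{-1} x_{t,a}}$
 10:      end for
 11:      Choose arm $a_t = \arg\max_{a \in \mathcal{A}_t} p_{t,a}$ with ties broken arbitrarily, and observe a real-valued payoff $r_t$
 12:      $A_{a_t} \leftarrow A_{a_t} + x_{t,a_t} x_{t,a_t}^{\top}$
 13:      $b_{a_t} \leftarrow b_{a_t} + r_t x_{t,a_t}$
 14:  end for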
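
The loop in Algorithm 1 can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation: the class name `LinUCB`, the method names `select` and `update`, and the dictionary-based arm interface are all assumptions made for this example. The confidence bonus uses the square root of the quadratic form, as in Li et al. (2010), and ties are broken by keeping the first arm encountered, which is one valid reading of "ties broken arbitrarily".

```python
import numpy as np


class LinUCB:
    """Hypothetical sketch of disjoint LinUCB: one (A_a, b_a) pair per arm."""

    def __init__(self, alpha, d):
        self.alpha = alpha  # exploration weight (the algorithm's only input)
        self.d = d          # feature dimension
        self.A = {}         # arm id -> d x d matrix A_a
        self.b = {}         # arm id -> length-d vector b_a

    def select(self, features):
        """features: dict mapping arm id -> length-d feature vector x_{t,a}."""
        best_arm, best_score = None, -np.inf
        for arm, x in features.items():
            x = np.asarray(x, dtype=float)
            if arm not in self.A:
                # New arm: start from the identity matrix and a zero vector.
                self.A[arm] = np.eye(self.d)
                self.b[arm] = np.zeros(self.d)
            A_inv = np.linalg.inv(self.A[arm])
            theta = A_inv @ self.b[arm]  # ridge-regression estimate
            # Predicted payoff plus the upper-confidence bonus.
            score = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if score > best_score:  # strict ">" keeps the first arm on ties
                best_arm, best_score = arm, score
        return best_arm

    def update(self, arm, x, reward):
        """Fold the observed payoff for the chosen arm into its statistics."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Usage: two arms with orthogonal features; the payoffs are made up.
bandit = LinUCB(alpha=1.0, d=2)
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
arm = bandit.select(feats)
bandit.update(arm, feats[arm], reward=0.0)
```

Inverting $A_a$ on every call keeps the sketch close to the pseudocode; for larger $d$, one would more likely cache the inverse or update it incrementally.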