Sensors. 2021 Mar 25;21(7):2308. doi: 10.3390/s21072308
Algorithm 2 Main Online Learning Algorithm
1: Initialize the time slot count: $t = 0$;
2: **while** $t \le T$ **do**
3:   Increment the iteration index $t = t + 1$;
4:   **for** $f = 1:F$
5:     **if** $t = 1$ (initial time slot)
6:       **then** initialize the super arm for the first trial ($k = 1$) as $\mathcal{A}_1 = \{0_1, \ldots, 0_N\}$,
7:     **else** $\mathcal{A}_1 = S^*$,
8:     **end if**
9:     *Exploration Stage:* Run Algorithm 1
10:    *Estimation Stage:*
11:    Calculate the mean reward vector for the frame $\hat{\mu}_n[f,t] = \left(\hat{\mu}_{n,1}[f,t], \hat{\mu}_{n,2}[f,t], \ldots, \hat{\mu}_{n,J}[f,t]\right)$, where $\hat{\mu}_{n,p}[f,t] = \frac{\sum_{k=1}^{K} \mu_{n,p}[k,f,t]}{K}, \forall p \in J, \forall n \in L_b$.
12:    *Adjustment Stage:*
13:    **if** $\Psi_p$ (number of times the $p$-th arm is played) $\neq 0$
14:      **then** adjust $\bar{\mu}_{n,p}[f,t] = \hat{\mu}_{n,p}[f,t] + \sqrt{\frac{3 \ln K}{2 \Psi_p}}$,
15:    **else** $\bar{\mu}_{n,p}[f,t] = \hat{\mu}_{n,p}[f,t], \forall p \in J, \forall n \in L_b$.
16:    **end if**
17:  **end for**
18:  Average the adjusted mean reward vector over all frames: $\bar{\mu}_n[t] = \left( \frac{\sum_{f \in F} \bar{\mu}_{n,1}[f,t]}{F}, \frac{\sum_{f \in F} \bar{\mu}_{n,2}[f,t]}{F}, \ldots, \frac{\sum_{f \in F} \bar{\mu}_{n,J}[f,t]}{F} \right), \forall n \in L_b$.
19:  *Exploitation Stage:*
20:  Average $\bar{\mu}_n[t]$ over the accumulated number of time slots, as $\bar{\bar{\mu}}_n = \frac{\sum_{t'=1}^{t} \bar{\mu}_n[t']}{t} = \left[\bar{\bar{\mu}}_{n,1}, \bar{\bar{\mu}}_{n,2}, \ldots, \bar{\bar{\mu}}_{n,J}\right], \forall n \in L_b$.
21:  For the next time slot: find the $N$ optimum arm indexes as $p_n^* = \arg\max_p \left(\bar{\bar{\mu}}_{n,p}\right), \forall p \in J, \forall n \in L_b$, and the updated super arm as $S^* \triangleq [p_1^*, p_2^*, \ldots, p_N^*]$.
22: **end while**
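The loop structure of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the dimensions (`T`, `F`, `K`, `J`, `N`), the `sample_reward` stub, and the round-robin arm schedule standing in for Algorithm 1's exploration stage are all assumptions made for the demo; only the estimation, UCB-style adjustment, frame averaging, and exploitation steps follow the listing above.

```python
import math
import random

# Assumed small dimensions for a demo: T time slots, F frames per slot,
# K trials per frame, J arms, N players (all hypothetical values).
T, F, K, J, N = 20, 4, 8, 5, 3

random.seed(0)
true_means = [[random.random() for _ in range(J)] for _ in range(N)]

def sample_reward(n, p):
    """Stub for the per-trial reward of player n on arm p (assumption)."""
    return true_means[n][p] + random.uniform(-0.1, 0.1)

cum_reward = [[0.0] * J for _ in range(N)]  # running sums for exploitation
super_arm = [0] * N                         # initial super arm {0_1,...,0_N}

for t in range(1, T + 1):
    adj_sum = [[0.0] * J for _ in range(N)]  # sums of adjusted means over f
    for f in range(F):
        # Exploration stage stand-in: a round-robin over arms, counting
        # how often each arm is played (Psi_p in the listing).
        plays = [0] * J
        trial_sum = [[0.0] * J for _ in range(N)]
        for k in range(K):
            p = k % J
            plays[p] += 1
            for n in range(N):
                trial_sum[n][p] += sample_reward(n, p)
        # Estimation stage: mean reward over the K trials of the frame,
        # then adjustment stage: add the sqrt(3 ln K / (2 Psi_p)) bonus
        # whenever the arm was played at least once.
        for n in range(N):
            for p in range(J):
                mean = trial_sum[n][p] / K
                if plays[p] != 0:
                    mean += math.sqrt(3 * math.log(K) / (2 * plays[p]))
                adj_sum[n][p] += mean
    # Average the adjusted means over all F frames, accumulate over slots.
    for n in range(N):
        for p in range(J):
            cum_reward[n][p] += adj_sum[n][p] / F
    # Exploitation stage: per player, pick the arm with the highest
    # time-averaged adjusted mean; these indexes form the next super arm.
    super_arm = [max(range(J), key=lambda p: cum_reward[n][p] / t)
                 for n in range(N)]

print(super_arm)  # indexes of the N selected arms after T slots
```

The adjustment term plays the usual role of an upper-confidence bonus: arms played fewer times in a frame get a larger bonus, so the exploitation stage does not prematurely lock onto under-sampled arms.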