2022 Feb 18;22(4):1603. doi: 10.3390/s22041603
Algorithm 1 The M-EXP3 Algorithm with the Modification.
Parameters: η=γα in [0,1] where α>0 is a discount factor
Initialization: $w_i(1) = 1$ for all $i = 1, \dots, K$.
For each time t = 1, 2, …
At time t,
Receive the experts’ advice vectors $B_i(t)$
Calculate, for each action i, the probability
$$P_i(t) = (1-\eta)\sum_{j=1}^{N} \frac{w_{i,j}(t)\,B_i(t)}{W_t} + \frac{\eta}{K} \tag{5}$$

Calculate the sum of the weights of the actions at time t:
$$W_t = \sum_{j=1}^{K} w_j(t) \tag{6}$$

Choose action $I_t$ according to the max distribution $P_i(t)$.
Receive a profit for the action i:
$$g_i(t) \in [0, 1], \tag{7}$$
$$g_i(t) = \begin{cases} g_{i_t}(t)/P_{i_t}(t) & \text{if ACK } (r_j(i)) \text{ is received} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

Update $B_i(t)$ as the reward (here the reward is a function of the expert in addition to the current action)
$$y_i(t) = B_i(t) \cdot g(t) = \sum_{i=1}^{K} B_i(t)\, g_i(t) \tag{9}$$

Update the weight of each expert
$$w_j(t+1) = w_j(t) \exp\left(\frac{\eta}{K}\, y_i(t)\right) \tag{10}$$
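
The steps of Algorithm 1 can be sketched in Python as one round of an EXP4-style update. This is only an illustrative sketch, not the authors' implementation. It assumes one weight per expert, advice vectors that are probability distributions over the K actions, and a normalizer equal to the sum of the expert weights; the indexing in Equations (5), (6) and (10) leaves room for other readings. The function names, the `reward_fn` callback and the `greedy` flag (one possible reading of "max distribution") are hypothetical.

```python
import math
import random


def action_probabilities(weights, advice, eta):
    """Mix expert advice by weight, then blend with uniform exploration (cf. Eq. 5)."""
    num_actions = len(advice[0])
    total = sum(weights)  # normalizer W_t (cf. Eq. 6), assumed to run over experts
    probs = []
    for i in range(num_actions):
        mixed = sum(w * b[i] for w, b in zip(weights, advice)) / total
        probs.append((1.0 - eta) * mixed + eta / num_actions)
    return probs


def update_weights(weights, advice, probs, chosen, gain, ack, eta):
    """Apply Eq. 8 (importance-weighted gain), Eq. 9 and Eq. 10 for one round."""
    num_actions = len(probs)
    est = [0.0] * num_actions
    if ack:  # the gain only counts when an ACK was received
        est[chosen] = gain / probs[chosen]
    new_weights = []
    for w, b in zip(weights, advice):
        y = sum(b_i * g_i for b_i, g_i in zip(b, est))  # expert reward (cf. Eq. 9)
        new_weights.append(w * math.exp(eta * y / num_actions))  # cf. Eq. 10
    return new_weights


def step(weights, advice, eta, reward_fn, rng, greedy=False):
    """Run one round; reward_fn(action) is assumed to return (gain in [0, 1], ack)."""
    probs = action_probabilities(weights, advice, eta)
    if greedy:  # assumption: pick the most probable action
        chosen = max(range(len(probs)), key=probs.__getitem__)
    else:  # standard EXP3/EXP4 behaviour: sample from the distribution
        chosen = rng.choices(range(len(probs)), weights=probs)[0]
    gain, ack = reward_fn(chosen)
    new_weights = update_weights(weights, advice, probs, chosen, gain, ack, eta)
    return chosen, new_weights


# Example round: two experts, two actions; only action 0 is acknowledged.
rng = random.Random(0)
advice = [[0.9, 0.1], [0.2, 0.8]]
chosen, weights = step([1.0, 1.0], advice, 0.1, lambda a: (1.0, a == 0), rng)
```

When no ACK arrives, every estimated gain is zero and the weights stay unchanged, so an expert only gains weight in rounds where an action it recommended was acknowledged.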