Algorithm 1: The M-EXP3 algorithm with the modification.
Parameters: each parameter in its admissible range, including a discount factor.
Initialization: initialize the weight $w_i(1)$ for all $i = 1, \dots, K$.
For each time $t = 1, 2, \dots$:
1. Receive the experts' advice vectors.
2. Calculate, for each action $i$, its selection probability $p_i(t)$.
3. Calculate the sum of the weights of the actions at time $t$: $W_t = \sum_{i=1}^{K} w_i(t)$.
4. Choose an action $i_t$ according to the max distribution, and receive a profit $x_{i_t}(t)$ for it.
5. Update the estimated reward. Here the reward is a function of the expert as well as of the current action.
6. Update the weight of each expert.
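The loop in Algorithm 1 follows the familiar EXP3-with-expert-advice (EXP4-style) pattern, so a minimal Python sketch of that pattern may help make the steps concrete. It rests on several assumptions that are not taken from the algorithm box. Rewards lie in $[0, 1]$ and each advice vector is a probability distribution over the $K$ actions. The action is drawn at random from $p(t)$, as in standard EXP3; if the "max distribution" step is meant to pick the most probable action instead, the draw would become an argmax. The discount factor is applied to each expert's cumulative estimated reward $G_i$, which gives $w_i(t+1) = w_i(t)^{\text{discount}} \exp(\gamma \hat{y}_i(t)/K)$. This is one plausible placement, not necessarily the paper's modification. The names `discounted_exp4`, `advice_fn`, `reward_fn`, `gamma`, and `discount` are hypothetical.

```python
import math
import random
from typing import Callable, List


def discounted_exp4(
    advice_fn: Callable[[int], List[List[float]]],
    reward_fn: Callable[[int, int], float],
    num_actions: int,
    num_experts: int,
    horizon: int,
    gamma: float = 0.1,
    discount: float = 0.99,
    seed: int = 0,
) -> List[float]:
    """Hypothetical EXP4-style bandit loop with a discount factor (sketch).

    Assumptions, not taken from Algorithm 1: rewards lie in [0, 1], each advice
    vector is a probability distribution over actions, and the discount is
    applied to every expert's cumulative estimated reward G_i, so that
    w_i(t) = exp(gamma * G_i / K). Returns the final normalized expert weights.
    """
    rng = random.Random(seed)
    K = num_actions

    def normalized_weights(G: List[float]) -> List[float]:
        # Work in log space and subtract the max for numerical stability;
        # the shift cancels after normalization.
        logs = [gamma * g / K for g in G]
        top = max(logs)
        w = [math.exp(v - top) for v in logs]
        W = sum(w)  # normalizer: sum of the shifted weights
        return [wi / W for wi in w]

    G = [0.0] * num_experts  # G_i = 0 gives equal initial weights
    for t in range(1, horizon + 1):
        advice = advice_fn(t)  # advice[i][j]: expert i's probability for action j
        q = normalized_weights(G)
        # Mix the weighted advice with uniform exploration over the K actions.
        p = [
            (1 - gamma) * sum(q[i] * advice[i][j] for i in range(num_experts))
            + gamma / K
            for j in range(K)
        ]
        j_t = rng.choices(range(K), weights=p, k=1)[0]  # random draw from p(t)
        x = reward_fn(t, j_t)  # profit for the chosen action
        x_hat = x / p[j_t]  # importance-weighted estimate for the chosen action
        for i in range(num_experts):
            # Expert i's estimated reward depends on its advice and the action.
            y_hat = advice[i][j_t] * x_hat
            G[i] = discount * G[i] + y_hat
    return normalized_weights(G)


# Hypothetical toy problem: expert 0 always recommends action 2,
# expert 1 is uniform, and only action 2 pays a reward of 1.
def toy_advice(t: int) -> List[List[float]]:
    return [[0.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]]


def toy_reward(t: int, action: int) -> float:
    return 1.0 if action == 2 else 0.0


weights = discounted_exp4(toy_advice, toy_reward, num_actions=3, num_experts=2, horizon=2000)
print(weights)
```

In the toy run, expert 0's estimated reward is never smaller than expert 1's, so its weight ends up larger. The discount keeps the effective memory to roughly $1/(1 - \text{discount})$ steps. This is what lets the weights track a changing environment rather than being dominated by old rewards.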