1: Initialize: Time slot count: ;
2: while do
3:   Increment the iteration index ;
4:   for
5:     if (initial time slot)
6:       then Initialize the super arm for the first trial () as ,
7:       else ,
8:     end if
9:     Exploration Stage: Run Algorithm 1
10:     Estimation Stage:
11:       Calculate the mean reward vector for the frame , where .
12:     Adjustment Stage:
13:       if (number of times the p-th arm is played)
14:         then adjust ,
15:         else .
16:       end if
17:   end for
18:   Average adjusted mean reward vector over all frames .
19:   Exploitation Stage:
20:     Average over accumulated number of time slots, as .
21:     For the next time slot: find N optimum arm indexes as , and the updated super arm as .
22: end while
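The averaging and selection steps (18, 20 and 21) can be illustrated with a minimal Python sketch. It rests on assumptions that the listing does not confirm: that "optimum" means the arms with the largest averaged reward, and that the average over accumulated time slots is a plain arithmetic mean. The function names are hypothetical. The exploration stage (Algorithm 1) and the adjustment rule of steps 13 to 15 are not reproduced, since their definitions are not given in the listing.

```python
from typing import List, Sequence


def average_over_frames(frame_means: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of the per-frame mean reward vectors (step 18)."""
    n_frames = len(frame_means)
    return [sum(column) / n_frames for column in zip(*frame_means)]


def update_running_average(
    running: Sequence[float], slot_reward: Sequence[float], slot_count: int
) -> List[float]:
    """Fold the current slot's reward into the mean over all slots so far (step 20).

    slot_count includes the current slot, so it starts at 1.
    Assumption: a plain arithmetic mean over time slots.
    """
    return [r + (x - r) / slot_count for r, x in zip(running, slot_reward)]


def select_super_arm(avg_reward: Sequence[float], n: int) -> List[int]:
    """Indexes of the n largest averaged rewards, in ascending order (step 21).

    Assumption: "optimum" arms are those with the highest averaged reward.
    """
    ranked = sorted(range(len(avg_reward)), key=lambda p: avg_reward[p], reverse=True)
    return sorted(ranked[:n])


# Toy data: two frames, four arms, one time slot.
frame_means = [[0.2, 0.8, 0.5, 0.1], [0.4, 0.6, 0.7, 0.1]]
slot_reward = average_over_frames(frame_means)
running = update_running_average([0.0] * 4, slot_reward, slot_count=1)
print(select_super_arm(running, n=2))  # prints [1, 2]
```

The incremental form of the running mean avoids storing the reward vectors of earlier time slots, which keeps the per-slot cost constant in the number of slots.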