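Algorithm 2 Upper confidence bound (UCB)
1: for each treatment i do
2:     Apply treatment i ▹ apply each treatment once
3: for each subsequent round t do
4:     for each treatment i do
5:         Estimate the mean reward of treatment i ▹ estimate mean rewards
6:         Count the # of times treatment i has been applied so far
7:     Select the treatment with the largest upper confidence bound ▹ select and apply action
8:     Apply the selected treatment and observe its reward
9:     Record the observed reward for the applied treatment ▹ update distribution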
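As an illustration, the steps of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the exploration bonus `sqrt(2 ln t / n_i)` is the common UCB1 choice and is assumed here, and the names `ucb`, `pull`, `k`, and `horizon` are hypothetical, introduced only for this example.

```python
import math


def ucb(pull, k, horizon):
    """Run a UCB1-style policy for `horizon` rounds over `k` treatments.

    `pull(i)` is a hypothetical callback that applies treatment i and
    returns the observed reward. The bonus sqrt(2 ln t / n_i) is an
    assumed (standard UCB1) choice. Requires horizon >= k.
    """
    counts = [0] * k      # n_i: number of times treatment i was applied
    totals = [0.0] * k    # running sum of rewards for treatment i
    history = []          # sequence of applied treatments

    # Steps 1-2: apply each treatment once.
    for i in range(k):
        reward = pull(i)
        counts[i] += 1
        totals[i] += reward
        history.append(i)

    # Steps 3-9: in each later round, pick the largest upper bound.
    for t in range(k + 1, horizon + 1):
        scores = []
        for i in range(k):
            mean = totals[i] / counts[i]                      # step 5
            bonus = math.sqrt(2.0 * math.log(t) / counts[i])  # uses n_i from step 6
            scores.append(mean + bonus)
        a = max(range(k), key=scores.__getitem__)             # step 7
        reward = pull(a)                                      # step 8
        counts[a] += 1                                        # step 9
        totals[a] += reward
        history.append(a)

    return counts, history
```

For example, calling `ucb(lambda i: [0.1, 0.9][i], 2, 200)` simulates two treatments with fixed rewards. The policy first tries each treatment once and then applies the better treatment in most of the remaining rounds, while the bonus term keeps it returning to the worse treatment from time to time.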