Table 1.
Regret and suboptimal draws incurred by Thompson sampling and by the UCB approach when selecting new treatments, expressed as a percentage of those incurred under random assignment, in the context-free multi-arm-bandit and contextual-bandit cases. For instance, the first entry in the table means that Thompson sampling incurs only 11.18 ± 5% of the regret incurred under random assignment.
| | Multi-Arm Bandit (Thompson) | Multi-Arm Bandit (UCB) | Contextual Bandit (Thompson) | Contextual Bandit (UCB) |
|---|---|---|---|---|
| Regret | 11.18 ± 5% | 29.57 ± 7% | 11.03 ± 3% | 26.10 ± 4% |
| Suboptimal draws | 35.66 ± 10% | 64.79 ± 13% | 27.37 ± 2% | 44.78 ± 3% |
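As a rough illustration of how such ratios can be obtained in the context-free case, the sketch below runs Thompson sampling (with a Beta-Bernoulli model), a standard UCB1 rule, and uniform random assignment on the same simulated Bernoulli bandit, then reports each method's cumulative regret and number of suboptimal draws as a percentage of the random baseline. This is a minimal sketch, not the experimental setup behind Table 1: the arm success rates, the horizon, and the choice of UCB1 are assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's experiment): compare Thompson
# sampling, UCB1, and random assignment on a Bernoulli bandit, reporting regret
# and suboptimal draws relative to the random baseline.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.45, 0.55, 0.60])   # hypothetical treatment success rates
best = means.max()
T = 10_000                             # assumed horizon

def run(policy):
    """Simulate one run; return (cumulative regret, number of suboptimal draws)."""
    successes = np.zeros(len(means))
    failures = np.zeros(len(means))
    regret, suboptimal = 0.0, 0
    for t in range(T):
        pulls = successes + failures
        if policy == "random":
            arm = int(rng.integers(len(means)))
        elif policy == "thompson":
            # Sample each arm's success probability from its Beta(1,1) posterior
            arm = int(np.argmax(rng.beta(successes + 1, failures + 1)))
        else:  # UCB1
            if np.any(pulls == 0):                     # play each arm once first
                arm = int(np.argmin(pulls))
            else:
                bonus = np.sqrt(2 * np.log(t + 1) / pulls)
                arm = int(np.argmax(successes / pulls + bonus))
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
        suboptimal += int(means[arm] < best)
    return regret, suboptimal

base_regret, base_sub = run("random")
for policy in ("thompson", "ucb"):
    reg, sub = run(policy)
    print(f"{policy:>8}: regret {100 * reg / base_regret:5.1f}% of random, "
          f"suboptimal draws {100 * sub / base_sub:5.1f}% of random")
```

Averaging such runs over many random seeds (and, for the contextual case, adding per-round features and a contextual model such as linear Thompson sampling or LinUCB) would yield ratios with error bars of the kind reported in the table.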