2022 Aug 21;12(8):1277. doi: 10.3390/life12081277

Table 1.

Relative performance of the UCB approach and Thompson sampling for selecting new treatments, expressed as a percentage of the value obtained under random assignment, in the context-free multi-arm-bandit and contextual-bandit cases. Lower values are better. For instance, the first entry in the table means that Thompson sampling incurs only 11.18 ± 5% of the regret incurred under random assignment.

| | Multi-Arm Bandit: Thompson | Multi-Arm Bandit: UCB | Contextual Bandit: Thompson | Contextual Bandit: UCB |
|---|---|---|---|---|
| Regret | 11.18 ± 5% | 29.57 ± 7% | 11.03 ± 3% | 26.10 ± 4% |
| Suboptimal draws | 35.66 ± 10% | 64.79 ± 13% | 27.37 ± 2% | 44.78 ± 3% |
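
To illustrate how ratios of this kind can be computed, here is a minimal sketch for the context-free case only. It is not the authors' experimental setup: the Bernoulli reward model, the uniform Beta(1, 1) prior, the UCB1 index, the arm probabilities, and the names `run_bandit` and `relative_to_random` are all assumptions made for illustration, so the output will not reproduce the figures in the table.

```python
import math
import random


def run_bandit(policy, probs, horizon, rng):
    """Simulate one Bernoulli bandit run; return (regret, suboptimal_draws)."""
    k = len(probs)
    best = max(probs)
    pulls = [0] * k   # number of times each arm was played
    wins = [0] * k    # number of successes observed per arm
    regret = 0.0
    suboptimal = 0
    for t in range(1, horizon + 1):
        if policy == "random":
            arm = rng.randrange(k)
        elif policy == "thompson":
            # Draw from each arm's Beta posterior (assumed Beta(1, 1) prior).
            draws = [rng.betavariate(1 + wins[a], 1 + pulls[a] - wins[a])
                     for a in range(k)]
            arm = max(range(k), key=draws.__getitem__)
        elif policy == "ucb":
            # UCB1 (assumed variant): try every arm once, then pick the
            # arm with the highest mean plus exploration bonus.
            untried = [a for a in range(k) if pulls[a] == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(range(k), key=lambda a: wins[a] / pulls[a]
                          + math.sqrt(2 * math.log(t) / pulls[a]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        regret += best - probs[arm]   # expected (pseudo-)regret of this draw
        if probs[arm] < best:
            suboptimal += 1
    return regret, suboptimal


def relative_to_random(policy, probs, horizon=2000, runs=20, seed=0):
    """Return (regret %, suboptimal-draw %) of `policy` relative to random."""
    rng = random.Random(seed)
    pol_regret = pol_sub = rnd_regret = rnd_sub = 0.0
    for _ in range(runs):
        r, s = run_bandit(policy, probs, horizon, rng)
        pol_regret += r
        pol_sub += s
        r, s = run_bandit("random", probs, horizon, rng)
        rnd_regret += r
        rnd_sub += s
    return 100 * pol_regret / rnd_regret, 100 * pol_sub / rnd_sub


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # hypothetical treatment success probabilities
    for name in ("thompson", "ucb"):
        regret_pct, sub_pct = relative_to_random(name, arms)
        print(f"{name}: regret {regret_pct:.2f}%, "
              f"suboptimal draws {sub_pct:.2f}%")
```

Each percentage divides the policy's accumulated total by the total accumulated under random assignment over the same number of runs, which mirrors how the table entries are described in the caption. A spread such as the ± values in the table would require repeating this over independent seeds and reporting the variation.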