Table 1.
Regret and suboptimal draws incurred by Thompson sampling and by the UCB approach when selecting new treatments, expressed as a percentage of those incurred under random assignment, in the context-free multi-arm-bandit and contextual-bandit cases. For instance, the first entry in the table means that Thompson sampling incurs only 11.18 ± 5% of the regret incurred under random assignment.
| | Multi-Arm Bandit (Thompson) | Multi-Arm Bandit (UCB) | Contextual Bandit (Thompson) | Contextual Bandit (UCB) |
|---|---|---|---|---|
| Regret | 11.18 ± 5% | 29.57 ± 7% | 11.03 ± 3% | 26.10 ± 4% |
| Suboptimal draws | 35.66 ± 10% | 64.79 ± 13% | 27.37 ± 2% | 44.78 ± 3% |
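As a rough illustration of how such ratios can be obtained in the context-free case, the sketch below runs Thompson sampling (with a Beta-Bernoulli model), a standard UCB1 rule, and uniform random assignment on the same simulated Bernoulli bandit, then reports each method's cumulative regret and number of suboptimal draws as a percentage of the random baseline. This is a minimal sketch, not the experimental setup behind Table 1: the arm success rates, the horizon, and the choice of UCB1 are assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's experiment): compare Thompson
# sampling, UCB1, and random assignment on a Bernoulli bandit, reporting regret
# and suboptimal draws relative to the random baseline.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.45, 0.55, 0.60])   # hypothetical treatment success rates
best = means.max()
T = 10_000                             # assumed horizon

def run(policy):
    """Simulate one run; return (cumulative regret, number of suboptimal draws)."""
    successes = np.zeros(len(means))
    failures = np.zeros(len(means))
    regret, suboptimal = 0.0, 0
    for t in range(T):
        pulls = successes + failures
        if policy == "random":
            arm = int(rng.integers(len(means)))
        elif policy == "thompson":
            # Sample each arm's success probability from its Beta(1,1) posterior
            arm = int(np.argmax(rng.beta(successes + 1, failures + 1)))
        else:  # UCB1
            if np.any(pulls == 0):                     # play each arm once first
                arm = int(np.argmin(pulls))
            else:
                bonus = np.sqrt(2 * np.log(t + 1) / pulls)
                arm = int(np.argmax(successes / pulls + bonus))
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
        suboptimal += int(means[arm] < best)
    return regret, suboptimal

base_regret, base_sub = run("random")
for policy in ("thompson", "ucb"):
    reg, sub = run(policy)
    print(f"{policy:>8}: regret {100 * reg / base_regret:5.1f}% of random, "
          f"suboptimal draws {100 * sub / base_sub:5.1f}% of random")
```

Averaging such runs over many random seeds (and, for the contextual case, adding per-round features and a contextual model such as linear Thompson sampling or LinUCB) would yield ratios with error bars of the kind reported in the table.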