Figure 1.
Expected mean net energy gain of an omniscient agent following an optimal strategy over a sequence of T=500 trials. From top to bottom, the curves correspond to 1/η=∞, 100, 20, 10, 2 and 1, where 1/η is the expected number of trials between successive changes in the reward probability. The values are means over different initial reward probabilities (equation (2.4)).