Task design. (A) In the Horizon task, participants chose between two bandits whose payout values were hidden. During the first four choices, the computer revealed the payout three times for one bandit and once for the other, creating an “unequal information” condition. Information-directed exploration was defined as selecting the bandit with only one revealed payout (the more informative option) at the first free choice. Each game included either one or six free choices, making exploration more advantageous in long-horizon games. Figure adapted from Somerville et al. (37). (B) In the Orchard task, participants decided whether to keep harvesting the current tree, whose apples depleted over time, or to switch to the next tree with a full supply of apples. Exploration was measured by the exit threshold, defined as the average of the last two harvests before moving to a different tree. A higher exit threshold indicates that participants left the current tree sooner and thus reflects a higher exploration rate. The travel time (short or long) determined the cost of switching in each orchard, defining “rich” and “poor” foraging environments, respectively. Figure adapted from Lenow et al. (34).
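
The two exploration measures defined in this caption can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names (`is_directed_exploration`, `exit_threshold`), the bandit labels, and the input layout (a list of the four revealed bandits, and a list of per-harvest apple counts for one tree) are all assumptions made for illustration.

```python
from statistics import mean


def is_directed_exploration(revealed_bandits, first_free_choice):
    """Return True if the first free choice is the bandit revealed only once.

    `revealed_bandits` is a hypothetical list of four bandit labels, one per
    revealed payout, in the unequal-information condition (3 vs. 1).
    """
    counts = {b: revealed_bandits.count(b) for b in set(revealed_bandits)}
    if sorted(counts.values()) != [1, 3]:
        raise ValueError("expected a 3-vs-1 unequal-information game")
    # The bandit seen once is the more informative option.
    less_sampled = min(counts, key=counts.get)
    return first_free_choice == less_sampled


def exit_threshold(harvests):
    """Average of the last two harvests at a tree before leaving it.

    `harvests` is a hypothetical list of apple counts, in order, for one tree.
    """
    if len(harvests) < 2:
        raise ValueError("need at least two harvests at the tree")
    return mean(harvests[-2:])


# Example with made-up values: the participant leaves after harvests of 7 and 5.
threshold = exit_threshold([10, 8, 7, 5])
explored = is_directed_exploration(["left", "left", "right", "left"], "right")
```

Under these assumptions, a participant-level exit threshold would be obtained by averaging `exit_threshold` over all trees visited in an orchard; how trees with fewer than two harvests are handled is not stated in the caption, so the sketch simply raises an error for them.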