Task design, experiment 1. (A) Naturalistic tree stimuli were parametrically varied along two dimensions (leafiness and branchiness). (B) All participants engaged in a virtual gardening task with two different gardens (north and south). Via trial and error, they had to learn which type of tree grows best in each garden. (C) Each training trial consisted of a cue, stimulus, response, and feedback period. At the beginning of each trial, an image of one of the two gardens served as a contextual cue. Next, the context was blurred (to direct attention toward the task-relevant stimulus while still providing information about the contextual cue), and the stimulus (tree) appeared in the center of the screen together with a reminder of the key mapping (“accept” vs. “reject,” corresponding to “plant” vs. “don’t plant”). Once the participant had communicated her decision via button press (left or right arrow key), the tree was either planted inside the garden (“accept”) or disappeared (“reject”). In the feedback period, the received and counterfactual rewards were displayed above the tree, with the received reward highlighted, and the tree either grew or shrank in proportion to the received reward. Test trials had the same structure, but no feedback was provided. Key mappings were counterbalanced across participants. (D) Unbeknownst to the participants, there were clear mappings of feature dimensions onto rewards. In experiment 1a (cardinal group), each of the two feature dimensions (branchiness or leafiness) was mapped onto one task rule (north or south). The sign of the rewards was counterbalanced across participants (see Methods). (E) In experiment 1b (diagonal group), feature combinations were mapped onto rewards, yielding nonverbalizable rules. Once again, the sign of the rewards was counterbalanced across participants. (F) Experiments 1a and 1b were between-group designs.
All four groups were trained on 400 trials (200 per task) and evaluated on 200 trials (100 per task). The groups differed in the temporal autocorrelation of the tasks during training, ranging from “blocked 200” (200 trials of one task, thus only one switch) to “interleaved” (randomly shuffled and thus unpredictable task switches). Importantly, all four groups were evaluated on interleaved test trials. The order of tasks for the blocked groups was counterbalanced across participants.
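The training schedules described in (F) can be illustrated with a minimal sketch. It assumes that blocked groups alternate between the two tasks in equal-length blocks and that the interleaved group sees a random shuffle of all training trials. The names `training_schedule`, `count_switches`, and the `block_length`, `first_task`, and `seed` parameters are hypothetical and are not taken from the experiment code; the block lengths of the intermediate groups are not given in this caption and are left as a parameter.

```python
import random

TASKS = ("north", "south")


def training_schedule(block_length=None, n_per_task=200, first_task="north", seed=None):
    """Return the sequence of task labels for one training phase.

    block_length=None gives an interleaved (randomly shuffled) schedule.
    Otherwise the two tasks alternate in blocks of `block_length` trials,
    starting with `first_task` (hypothetical counterbalancing parameter).
    """
    if first_task not in TASKS:
        raise ValueError(f"first_task must be one of {TASKS}")

    if block_length is None:
        # Interleaved: equal number of trials per task, random order.
        rng = random.Random(seed)
        schedule = [task for task in TASKS for _ in range(n_per_task)]
        rng.shuffle(schedule)
        return schedule

    if block_length <= 0 or n_per_task % block_length != 0:
        raise ValueError("block_length must be positive and divide n_per_task")

    second_task = TASKS[1] if first_task == TASKS[0] else TASKS[0]
    schedule = []
    for _ in range(n_per_task // block_length):
        schedule += [first_task] * block_length
        schedule += [second_task] * block_length
    return schedule


def count_switches(schedule):
    """Count trial-to-trial changes of task."""
    return sum(a != b for a, b in zip(schedule, schedule[1:]))


blocked_200 = training_schedule(block_length=200)
interleaved = training_schedule(block_length=None, seed=0)
print(len(blocked_200), count_switches(blocked_200))
print(len(interleaved), count_switches(interleaved))
```

With `block_length=200` the sketch yields 400 training trials and a single task switch, matching the "blocked 200" description. Swapping `first_task` corresponds to counterbalancing the task order across participants.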