A. In a four-arm restless bandit task, subjects choose one of four targets on each trial. B. The value associated with each option changes stochastically, and without any cue, on every trial. Consequently, even after the subject has identified the best target, it pays to occasionally intersperse trials on which an alternative is chosen (exploration) among the more common trials on which the known best option is chosen (exploitation). For example, the subject may choose option A (red) for several trials, but without occasionally sampling the other options it would not learn that option B (blue) is about to overtake A in value. C. In this task, neurons in posterior cingulate cortex show higher tonic firing on explore trials than on exploit trials.
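To make the explore/exploit trade-off in panel B concrete, here is a minimal Python sketch of a four-arm restless bandit. The caption does not specify how values drift or how choices are made, so every mechanism below is an assumption for illustration only: the Gaussian random-walk drift (`sigma`), the binary rewards, the epsilon-greedy chooser, the delta-rule learning rate (`alpha`), and the function names are all hypothetical and are not the procedure used in the experiment. An "explore" trial is counted, following the caption, as a trial on which an arm other than the currently best-estimated one is chosen.

```python
import random


def drift(values, rng, sigma=0.05):
    """Hypothetical drift rule: each arm's reward probability takes an
    independent Gaussian step on every trial, clipped to [0, 1]."""
    return [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in values]


def choose(estimates, epsilon, rng):
    """Epsilon-greedy choice. With probability epsilon, pick one of the
    non-greedy arms (an explore trial); otherwise pick the arm with the
    highest estimated value (an exploit trial)."""
    greedy = max(range(len(estimates)), key=lambda i: estimates[i])
    if rng.random() < epsilon:
        others = [i for i in range(len(estimates)) if i != greedy]
        return rng.choice(others), True
    return greedy, False


def run(n_trials=1000, epsilon=0.1, alpha=0.3, seed=0):
    """Simulate one session; return (total reward, number of explore trials)."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(4)]  # hidden reward probabilities
    estimates = [0.5] * 4                      # agent's running value estimates
    total_reward, n_explore = 0, 0
    for _ in range(n_trials):
        arm, explored = choose(estimates, epsilon, rng)
        reward = 1 if rng.random() < values[arm] else 0
        # Delta-rule update of the chosen arm's estimate only.
        estimates[arm] += alpha * (reward - estimates[arm])
        total_reward += reward
        n_explore += explored
        # Values change on every trial without any cue to the agent.
        values = drift(values, rng)
    return total_reward, n_explore
```

Setting `epsilon=0` gives a purely exploiting agent that never samples alternatives, so its estimates for unchosen arms go stale as their true values drift; that is the situation the red/blue example in panel B describes.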