The schematic illustrates the computational architecture that best accounts for the choice and confidence data. In each context (or state) ‘s’, the agent tracks the option values (Q(s,:)), which are used to decide among alternative courses of action, together with the value of the context (V(s)), which quantifies the average expected value of the decision context. In all contexts, the agent receives the outcome associated with the chosen option (Rc), which is used to update the chosen option value (Q(s,c)) via a prediction error (δc) weighted by a learning rate (αc). In the complete feedback condition, the agent also receives the outcome of the unselected option (Ru), which is used to update the unselected option value (Q(s,u)) via a prediction error (δu) weighted by a learning rate (αu). The available feedback information (Rc and Ru in the complete feedback contexts, with Q(s,u) standing in for the unobserved Ru in the partial feedback contexts) is also used to update the value of the context (V(s)) via a prediction error (δV) weighted by a dedicated learning rate (αV). Option and context values jointly contribute to the generation of confidence judgments.
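The update scheme described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fitted model: the function name `update_values`, the default learning rates, and the use of the mean of the two available signals as the target for V(s) are all hypothetical choices made here for concreteness. The caption does not specify how V(s) enters the option prediction errors or how confidence is computed, so both are left out.

```python
from typing import Dict, List, Optional, Tuple


def update_values(
    Q: Dict[int, List[float]],    # option values Q(s, :) per context
    V: Dict[int, float],          # context values V(s)
    s: int,                       # current context (state)
    c: int,                       # index of the chosen option
    u: int,                       # index of the unchosen option
    r_c: float,                   # outcome of the chosen option (Rc)
    r_u: Optional[float] = None,  # outcome of the unchosen option (Ru), None if partial feedback
    alpha_c: float = 0.3,         # hypothetical default learning rates
    alpha_u: float = 0.1,
    alpha_v: float = 0.2,
) -> Tuple[float, Optional[float], float]:
    """Apply one trial of delta-rule updates in place; return (delta_c, delta_u, delta_v)."""
    # Partial feedback: Q(s,u) stands in for the unobserved Ru.
    other = r_u if r_u is not None else Q[s][u]

    # Assumption: the context target is the mean of the two available signals.
    delta_v = 0.5 * (r_c + other) - V[s]

    # Chosen option: always updated from Rc.
    delta_c = r_c - Q[s][c]
    Q[s][c] += alpha_c * delta_c

    # Unchosen option: updated only when Ru is observed (complete feedback).
    delta_u = None
    if r_u is not None:
        delta_u = r_u - Q[s][u]
        Q[s][u] += alpha_u * delta_u

    V[s] += alpha_v * delta_v
    return delta_c, delta_u, delta_v
```

Calling the function with `r_u=None` corresponds to a partial feedback trial, in which Q(s,u) is left unchanged; passing a value for `r_u` corresponds to a complete feedback trial, in which all three quantities are updated.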