PLoS ONE. 2013 Jul 22;8(7):e69541. doi: 10.1371/journal.pone.0069541

Figure 1. Experiment and Model Summary.

A: The reward signal is derived from the subject's frontal lobe hemodynamics. The Δ[HbO] and Δ[HbD] signals recorded around events are classified using a support vector machine (SVM) to read out their prediction of the subjective desirability of the event. Any classifier is subject to some misclassification noise (green arrow with red imperfections, and vice versa), so the RL agent that uses this signal as reward information must be robust to occasional misclassifications. Gray inset: The error rates achieved by the SVM classifier in this study were added to the win/loss feedback of a model task in which the reinforcement learning agent had to select actions for a rake tool so as to pull a pellet off the front side of a table without knocking it off the back side. The reward signal dictates the adaptation of the action values for the most recently observed state (and thus the adaptation of the agent's control policy on subsequent visits to that state). The agent learns to select the action with the highest expected return given the current state (i.e., the locations of the pellet and the rake tool).

B: Brain MRI of the rhesus macaque used in this study. The T1-weighted image (right panel) was registered to a standard atlas (left panel) to locate the DLPFC region of cortex (indicated by the crosshairs). Skull landmarks were then used to localize and place probe guides during implantation. The lower right subpanel shows a 3D reconstruction of the subject's head, with dots marking the locations of the NIRS probes used as sources (purple) and detectors (red).
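To make the decoding step in panel A concrete, the following Python sketch classifies peri-event hemodynamic epochs into "desirable" versus "undesirable" outcomes with a linear SVM. The feature construction (per-channel means and slopes), kernel choice, epoch dimensions, and simulated data are illustrative assumptions, not the authors' actual pipeline; the point is that the classifier's cross-validated error rate is the quantity that gets passed on to the RL agent as reward noise.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_events, n_channels, n_samples = 200, 8, 50   # assumed epoch dimensions

# Simulated peri-event Delta[HbO]/Delta[HbD] epochs: events x channels x time samples
epochs = rng.normal(size=(n_events, n_channels, n_samples))
labels = rng.integers(0, 2, size=n_events)      # 1 = desirable event, 0 = undesirable
epochs[labels == 1] += 0.3                      # inject a weak class difference

# Simple illustrative features: mean level and early-to-late slope per channel
means = epochs.mean(axis=2)
slopes = epochs[:, :, -10:].mean(axis=2) - epochs[:, :, :10].mean(axis=2)
X = np.hstack([means, slopes])

clf = SVC(kernel="linear", C=1.0)
acc = cross_val_score(clf, X, labels, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f}")
print(f"implied misclassification rate fed to the RL agent: {1 - acc:.2f}")
```

The actual study's features, kernel, and validation scheme may well differ; what carries over to the model task is only the resulting misclassification rate.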
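The gray-inset model task can likewise be sketched as a small tabular Q-learning problem in which the win/loss reward is occasionally flipped at the classifier's error rate. Everything below (table length, error rate, learning parameters) is an assumed minimal setup for illustration, not the paper's implementation.

```python
import random

N_POSITIONS = 10          # pellet position: 0 = front edge, N_POSITIONS-1 = back edge
ACTIONS = [-1, +1]        # rake pulls pellet toward the front (-1) or pushes it back (+1)
ERROR_RATE = 0.1          # assumed probability the decoded reward is misclassified
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_POSITIONS) for a in ACTIONS}

def true_reward(next_state):
    """+1 if the pellet comes off the front edge, -1 if it is knocked off the back."""
    if next_state < 0:
        return 1.0
    if next_state >= N_POSITIONS:
        return -1.0
    return 0.0

def decoded_reward(r):
    """Reward as read out by the classifier: nonzero outcomes are occasionally flipped."""
    if r != 0.0 and random.random() < ERROR_RATE:
        return -r
    return r

def choose_action(s):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for episode in range(2000):
    s = random.randrange(N_POSITIONS)
    while 0 <= s < N_POSITIONS:
        a = choose_action(s)
        s_next = s + a
        r = decoded_reward(true_reward(s_next))
        if 0 <= s_next < N_POSITIONS:
            target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
        else:
            target = r   # terminal: the pellet has left the table
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next

# Greedy policy after learning: should pull toward the front edge from every position
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_POSITIONS)])
```

With a modest flip rate, the greedy policy still converges to pulling the pellet toward the front edge, which is the robustness to occasional misclassifications that the caption describes.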