2018 Mar 26;12:106. doi: 10.3389/fnhum.2018.00106

Figure 5.

DDM and RL Model. In the drift diffusion model (DDM) in (A), single traces show multiple examples of simulated noisy accumulation of evidence toward correct (black) or incorrect (red) decisions. The resulting distributions of reaction times are plotted for correct (top) and incorrect (bottom) trials. The DDM relies on three main parameters: the non-decision time, the threshold θ indicating the bounds to which evidence is accumulated, and the drift rate indicating the rate of evidence accumulation. In the reinforcement learning model (RLM) depicted in (B), an example sequence of 30 learning trials is given in which the left choice is rewarded with probability p = 0.2 and the right choice with probability p = 0.8. Given a choice and reward history (black), the computational model provides the inferred underlying changes in expected value for each option (V, red and blue traces) and the inferred reward prediction error (RPE, yellow). The latent variables from modeling can be used to analyze trial-by-trial voltage. In (C), activity over mid-frontal electrodes is correlated with RPEs from correct trials (modified from Collins and Frank, 2016).
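The two models described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the time step `dt`, the unit noise scale, the symmetric bounds at plus and minus the threshold, the softmax choice rule, the learning rate, and the starting values of 0.5 are all assumptions made for illustration. Only the three DDM parameters, the 30 trials, and the reward probabilities of 0.2 and 0.8 come from the caption.

```python
import math
import random


def simulate_ddm_trial(drift, threshold, non_decision_time, rng,
                       dt=0.001, noise_sd=1.0, max_time=10.0):
    """Simulate one DDM trial (hypothetical helper).

    Returns (correct, reaction_time), or (None, None) if no bound is
    reached within max_time seconds of accumulation.
    """
    evidence = 0.0
    elapsed = 0.0
    while elapsed < max_time:
        # Noisy accumulation: deterministic drift plus Gaussian noise.
        evidence += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        elapsed += dt
        if evidence >= threshold:       # upper bound: correct decision
            return True, non_decision_time + elapsed
        if evidence <= -threshold:      # lower bound: incorrect decision
            return False, non_decision_time + elapsed
    return None, None


def simulate_rl_session(reward_probs, n_trials, learning_rate,
                        inverse_temp, rng):
    """Simulate a learner using a delta rule (assumed update rule).

    Returns a list of (choice, reward, rpe, values_after_update) tuples.
    """
    values = [0.5] * len(reward_probs)  # assumed initial expected values
    history = []
    for _ in range(n_trials):
        # Softmax choice over current expected values (an assumption).
        weights = [math.exp(inverse_temp * v) for v in values]
        choice = rng.choices(range(len(values)), weights=weights)[0]
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Reward prediction error: outcome minus expected value.
        rpe = reward - values[choice]
        values[choice] += learning_rate * rpe
        history.append((choice, reward, rpe, tuple(values)))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    ddm_trials = [simulate_ddm_trial(1.0, 1.0, 0.3, rng) for _ in range(200)]
    finished = [(c, rt) for c, rt in ddm_trials if c is not None]
    accuracy = sum(c for c, _ in finished) / len(finished)
    print(f"DDM accuracy over {len(finished)} trials: {accuracy:.2f}")

    # Left rewarded with p = 0.2, right with p = 0.8, 30 trials (caption).
    session = simulate_rl_session([0.2, 0.8], 30, 0.3, 3.0, rng)
    for trial, (choice, reward, rpe, values) in enumerate(session, start=1):
        print(trial, "LR"[choice], reward, round(rpe, 3), values)
```

In this sketch the per-trial `rpe` values play the role of the latent variable that the caption describes as a regressor for trial-by-trial voltage. Fitting the parameters to observed behavior, rather than simulating from fixed ones, would be a separate step that is not shown here.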