2017 Jul 6;12(7):e0180234. doi: 10.1371/journal.pone.0180234

Fig 1. Overall architecture of the NHRL model, showing the three main components and the functional values flowing between them.

The action values component computes the Q values given the state from the environment (see Fig 2 for more detail). The action selection component determines the highest-valued action and sends the action itself to the environment and the identity of the selected action to the error calculation component (see Fig 3 for more detail). The error calculation component uses the Q values and the environmental reward to calculate the TD error, which it uses to update the Q function in the action values component (see Fig 4 for more detail). Triangular arrowheads indicate a vector value connection, semicircular arrowheads indicate a modulatory connection that drives learning, and circular arrowheads indicate an inhibitory connection. See Fig 6 for a mapping of this abstract architecture onto neuroanatomical structures.
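
The flow of values between the three components can be sketched in a few lines of Python. This is only an abstract, tabular illustration and not the neural implementation of the NHRL model: the class and function names, the lookup-table Q function, the learning rate, the discount factor, and the one-step TD error form (reward plus discounted next value minus current value) are all assumptions made for this example.

```python
from typing import List


class ActionValues:
    """Hypothetical tabular stand-in for the action values component."""

    def __init__(self, n_states: int, n_actions: int, learning_rate: float = 0.1) -> None:
        # Q function stored as a table: one row per state, one entry per action.
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.learning_rate = learning_rate  # assumed value, not from the paper

    def values(self, state: int) -> List[float]:
        """Return a copy of the Q values for the given state."""
        return list(self.q[state])

    def update(self, state: int, action: int, td_error: float) -> None:
        """Apply the learning signal from the error calculation component."""
        self.q[state][action] += self.learning_rate * td_error


def select_action(q_values: List[float]) -> int:
    """Action selection: index of the highest-valued action (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_error(q_selected: float, reward: float, q_next: float, discount: float = 0.9) -> float:
    """Error calculation: assumed one-step TD error, r + discount * Q(s', a') - Q(s, a)."""
    return reward + discount * q_next - q_selected
```

In one pass through the loop, `values(state)` feeds `select_action`, the selected action goes to the environment, and the resulting reward together with the Q values feeds `td_error`, whose output is passed back to `update`.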