2008 Jul 9;2(1):86–99. doi: 10.3389/neuro.01.014.2008

Figure 5.


Two possible realizations of an Actor/Critic network with a dysfunctional Critic. (A) The Critic is unable to learn or represent a meaningful mapping between states and their values (depicted is the extreme case of similar values for all states); thus the prediction error signal δ that is used to train the Actor consists only of the current reward. (B) A deficient prediction error signal disrupts learning in both the Critic and the Actor (depicted is the extreme case of no prediction error whatsoever).
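
The two cases can be illustrated with a minimal sketch, assuming the standard temporal-difference form of the prediction error, δ = r + γV(s') − V(s), and an undiscounted setting (γ = 1). The function names, the dictionary-based value and preference tables, and the flags `critic_learns` and `delta_intact` are hypothetical and introduced only for illustration; they are not taken from the article. With γ < 1 and a constant value c for all states, δ would equal the reward plus a constant offset (γ − 1)c rather than the reward alone.

```python
def td_error(reward, v_current, v_next, gamma=1.0):
    """Temporal-difference prediction error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current


def actor_critic_step(value, pref, state, action, reward, next_state,
                      alpha=0.1, gamma=1.0,
                      critic_learns=True, delta_intact=True):
    """One Actor/Critic update (illustrative only).

    value: dict mapping state -> estimated value (the Critic)
    pref:  dict mapping (state, action) -> action preference (the Actor)
    """
    delta = td_error(reward, value[state], value[next_state], gamma)
    if not delta_intact:
        # Case (B): the prediction error signal itself is absent.
        delta = 0.0
    if critic_learns:
        # Case (A) sets critic_learns=False, so state values stay flat.
        value[state] += alpha * delta
    # The Actor is trained by the same prediction error signal.
    pref[(state, action)] += alpha * delta
    return delta


# Case (A): identical values for all states, Critic cannot learn.
value = {"s0": 0.5, "s1": 0.5}
pref = {("s0", "go"): 0.0}
delta_a = actor_critic_step(value, pref, "s0", "go", reward=1.0,
                            next_state="s1", critic_learns=False)

# Case (B): no prediction error, so neither Critic nor Actor changes.
value_b = {"s0": 0.0, "s1": 2.0}
pref_b = {("s0", "go"): 0.0}
delta_b = actor_critic_step(value_b, pref_b, "s0", "go", reward=1.0,
                            next_state="s1", delta_intact=False)
```

In case (A) the value terms cancel, so `delta_a` equals the reward and the Actor still learns, but only from immediate reward. In case (B) `delta_b` is zero and both tables remain unchanged.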