Each individual expert receives sensory input and makes its own
predictions about the expected value of taking different actions. The
predictions of each expert can then be compared with reality, when the organism
takes an action and experiences an outcome. The difference between predicted and
actual outcomes yields a prediction error. The prediction
errors for each expert are then reported to a “manager”, which uses
them to compute a reliability signal (blue line), corresponding to a
recency-weighted cumulative average of the prediction error for that expert. The
manager uses these reliability signals to compute weights over the experts,
proportional to their relative reliabilities. These weights are used by the
manager to implement a gating of the outputs of each expert (red line),
modulating the degree to which each expert contributes its
“advice” toward the overall control of behavior (black line). The
overall behavioral policy of the organism then corresponds to a combination of
the advice of each expert, weighted by its overall reliability. The present
schematic is agnostic as to the nature of the experts or their number. Four
generic experts are depicted here. For a related mixture-of-experts
implementation in computational reinforcement learning, see Hamrick et al. (2017).
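
The arbitration scheme described in this schematic can be illustrated with a minimal sketch. This is not the implementation of any cited work. The function names, the learning rate `alpha`, the use of unsigned prediction error, and the choice to define reliability as the inverse of the averaged error are all assumptions made for illustration; the schematic itself only specifies that reliability derives from a recency-weighted average of prediction error and that weights are proportional to relative reliability.

```python
# Illustrative sketch only: names, parameters, and the inverse-error
# definition of reliability are assumptions, not taken from the source.

def update_error_traces(traces, predictions, outcome, alpha=0.2):
    """Recency-weighted (exponential) average of each expert's unsigned prediction error."""
    return [(1.0 - alpha) * t + alpha * abs(outcome - p)
            for t, p in zip(traces, predictions)]

def gating_weights(traces, eps=1e-6):
    """Weights proportional to relative reliability (assumed here: 1 / averaged error)."""
    reliabilities = [1.0 / (t + eps) for t in traces]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]

def combine_advice(weights, advice):
    """Overall policy: reliability-weighted combination of each expert's advice."""
    n_actions = len(advice[0])
    return [sum(w * a[k] for w, a in zip(weights, advice))
            for k in range(n_actions)]

# Four generic experts, as in the schematic (all values are made up).
predictions = [1.0, 0.5, 0.0, -1.0]   # each expert's predicted outcome
outcome = 1.0                          # outcome actually experienced
traces = [1.0, 1.0, 1.0, 1.0]          # initial averaged errors

for _ in range(50):                    # repeated experience of the same outcome
    traces = update_error_traces(traces, predictions, outcome)

weights = gating_weights(traces)

# Each row is one expert's advice: preferences over two candidate actions.
advice = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]
policy = combine_advice(weights, advice)
```

In this toy run the first expert predicts the outcome exactly, so its averaged error decays toward zero and it comes to dominate the gating weights, and the combined policy favors the action that expert advises. Other mappings from averaged error to reliability would fit the schematic equally well.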