Figure - PMC


. 2015 Aug 11;5:12874. doi: 10.1038/srep12874


Copyright © 2015, Macmillan Publishers Limited

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/


For φ = π/4 (left) the action pair and is degenerate with respect to the expected reward. For φ = π/8 (middle), i.e., not exactly between two projectors, the agent measures more often in the direction α = 0. Fluctuations in the measurement probabilities do not necessarily show up in the success probability. For comparison, the ensemble averages of 1000 agents after 1000 measurements are given as dashed lines. Larger rewards λ and damping γ (both rescaled by a factor of 10) decrease the timescale of the fluctuations while maintaining approximately the same time average (right). The agent jumps between different preferred actions and stays with each for extended times.
