Figure - PMC

2015 Sep 22;11(9):e1004402. doi: 10.1371/journal.pcbi.1004402

© 2015 Christopoulos, Schrater

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.

Fig 2. The architecture consists of multiple stochastic optimal control schemes, each attached to a particular goal currently present in the field. We illustrate the architecture of the theory with a hypothetical soccer scenario, in which the player in possession of the ball has 3 alternative options (i.e., 3 teammates) located at different distances from the current state x_t. In this situation, the control schemes associated with these options are triggered and generate 3 action plans (u₁ = π₁(x_t), u₂ = π₂(x_t) and u₃ = π₃(x_t)), one to pursue each option. At each time t, the desirability of each policy in terms of action cost and in terms of good value is computed separately, and the two are then combined into an overall desirability. The action cost of each policy is the cost-to-go of the remaining actions that would occur if the policy were followed from the current state x_t to the target. These action costs are converted into a relative desirability that characterizes the probability that implementing this policy will have the lowest cost relative to the alternative policies. Similarly, the good value attached to each policy is evaluated in the goods-space and converted into a relative desirability that characterizes the probability that implementing that policy (i.e., selecting goal i) will result in the highest reward compared to the alternative options, from the current state x_t. These two desirabilities are combined to give what we call the “relative-desirability” value, which reflects the degree to which the individual policy π_i is desirable to follow, at the given time and state, with respect to the other available policies. The overall policy that the player follows is a time-varying weighted mixture of the individual policies, using the desirability values as weighting factors.
Because relative desirability is time- and state-dependent, the weighted mixture of policies produces a range of behavior from “winner-take-all” (i.e., pass the ball) to “spatial averaging” (i.e., keep the ball and delay your decision).
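The weighting scheme described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A softmax stands in for the "probability of lowest cost" and "probability of highest reward" computations, the two desirabilities are combined by a normalized product, squared distance stands in for the cost-to-go, and each policy is a simple proportional controller. All function names, the `temperature` and `gain` parameters, and the numeric values are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn scores into probabilities; higher score -> higher probability."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_desirability(action_costs, good_values, temperature=1.0):
    """Combine cost-based and value-based desirabilities into weights.

    ASSUMPTION: softmax approximates the probability that a policy has the
    lowest cost (or highest reward); the caption does not give the formula.
    ASSUMPTION: the two desirabilities are combined by a normalized product.
    """
    cost_des = softmax([-c for c in action_costs], temperature)  # low cost is good
    value_des = softmax(good_values, temperature)                # high value is good
    combined = [c * v for c, v in zip(cost_des, value_des)]
    total = sum(combined)
    return [w / total for w in combined]


def proportional_policy(state, target, gain=1.0):
    """Hypothetical stand-in for pi_i(x_t): push the state toward the target."""
    return [gain * (g - s) for s, g in zip(state, target)]


def mixed_action(actions, weights):
    """Desirability-weighted mixture of the individual action plans u_i."""
    dim = len(actions[0])
    return [sum(w * u[k] for w, u in zip(weights, actions)) for k in range(dim)]


# Three teammates (targets) at different distances from the current state x_t.
state = [0.0, 0.0]
targets = [[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]]

actions = [proportional_policy(state, g) for g in targets]
# ASSUMPTION: squared distance is used as a crude proxy for the cost-to-go.
costs = [sum((gk - sk) ** 2 for sk, gk in zip(state, g)) for g in targets]
values = [1.0, 1.0, 1.0]  # equal good values, so only action cost matters here

weights = relative_desirability(costs, values)
u = mixed_action(actions, weights)
print("weights:", weights)
print("mixed action:", u)
```

When one weight dominates, the mixed action is close to that single policy's plan ("winner-take-all"). When the weights are comparable, the mixed action is a blend of the plans ("spatial averaging"). Recomputing the weights at every time step as the state changes gives the time-varying mixture the caption describes.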