2018 May 29;12:24. doi: 10.3389/fnbot.2018.00024

Figure 6.

(A) 2D problem used to explain the proposed Relevance Weighted Policy Optimization (RWPO) algorithm. The green x in the lower-left corner of the image marks the start position. The blue lines in the middle represent a wall with a window in its center. The red x in the upper-right corner marks the end position. Given a few initial trajectories (depicted in light gray), the goal of the algorithm is to find a distribution over trajectories that begin at the start position, pass through the center of the window, and reach the end position. (B) Learned relevance functions for the 2D problem. They show that policy parameters close to w1 are more important for beginning at the start position, policy parameters around w5 are more important for passing through the center of the window, and policy parameters close to w10 are more important for reaching the end position.
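
The three objectives of the 2D problem in (A) can be illustrated with a minimal sketch. It is not the RWPO algorithm. It only measures how far a sampled trajectory is from the start position, the window center, and the end position. The coordinates and the function name `objective_errors` are assumptions made for illustration, since the caption gives only the qualitative layout.

```python
import numpy as np

# Hypothetical coordinates: the caption only states lower-left start,
# upper-right end, and a window in the center of the wall.
START = np.array([0.0, 0.0])          # green x
END = np.array([1.0, 1.0])            # red x
WINDOW_CENTER = np.array([0.5, 0.5])  # center of the window

def objective_errors(traj):
    """Return (start, window, end) distances for a trajectory of shape (T, 2)."""
    traj = np.asarray(traj, dtype=float)
    start_err = np.linalg.norm(traj[0] - START)
    end_err = np.linalg.norm(traj[-1] - END)
    # Closest approach of any sampled point to the window center.
    window_err = np.min(np.linalg.norm(traj - WINDOW_CENTER, axis=1))
    return start_err, window_err, end_err

# A straight line meets all three objectives under these coordinates.
straight = np.linspace(START, END, 11)

# A trajectory that bulges upward keeps its endpoints but misses the window center.
bulge = np.sin(np.linspace(0.0, np.pi, 11))[:, None]
detour = straight + np.array([0.0, 0.2]) * bulge

print(objective_errors(straight))
print(objective_errors(detour))
```

In this sketch the detour keeps its start and end errors near zero while its window error grows, which loosely mirrors the idea in (B) that different objectives depend on different parts of the trajectory.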