2022 Sep 5;9:878246. doi: 10.3389/frobt.2022.878246

FIGURE 3.

Simulation results of the proposed learning framework for varying step size α (A,B), trace decay rate λ (C,D), and ϵ-greedy parameter ϵ (E,F), shown as the total number of steps per episode and the return per episode.