Skip to main content
. 2019 May 31;5(6):970–981. doi: 10.1021/acscentsci.9b00055

Figure 3.

Figure 3

Training results. (a, b) ⟨ctot⟩ and ⟨btot⟩ computed using π are plotted versus policy iterations, respectively (solid blue squares). Solid horizontal lines show these quantities for the heuristic policy πsd (red triangles) and the random policy (black circles). The larger cyan square shows ⟨ctot⟩ after each tree had been searched for the best (lowest) target cost. Dashed vertical lines show points when ε was lowered.