Examples of paths obtained with SARSA learning under different strategies. (a, b) Traditional exploration-exploitation (E) during the initial learning stages. (c, d) Exploration-exploitation mixed with path straightening (SE) during the initial learning stages. (e) Learned optimal path in the straightened case (S); when an E component is added (SE), kinks sometimes remain from an exploratory move early on the path (inset in e). (f) Zigzagging learned path in the straightened case (S); the inset shows that adding the E component (SE) reduces zigzagging. (g, h) Divergent paths in the straightened case (S); the inset in (h) shows the divergent pattern when the direction “back” is not forbidden. Default parameters (Table 1) were used. Small numbers at the bottom give the trial number from which each example was taken.