2021 Sep 13;8:738113. doi: 10.3389/frobt.2021.738113

FIGURE 9.

Path progression performance comparison between RL algorithms in training and real-world simulation environments using the simplified reward function. Compared to the previous reward function, the off-policy RL algorithms show a drastic increase in performance. In contrast, PPO now performs considerably worse in the Sorbuoya environment, yet maintains its performance in the other testing environments and is still the best performer overall.