Results for the horizontal and sagittal policies. Panels (a–d) show the horizontal targets, and panels (e–h) show the sagittal targets. (a) Example snapshots for the horizontal targets. (b) Trajectories for the horizontal targets. Black crosses and bullets indicate q_target and q_initial, respectively. (c) Learning progress. Trajectories at iterations 0, 10, 20, 30, 50, 100, 150, 200, 250 and 300 (×10³) are shown. (d) Trajectories for various targets with different θ, including unlearned targets. (e) Example snapshots for the sagittal targets. (f) Trajectories for the sagittal targets. (g) Learning progress. Trajectories at iterations 0, 10, 20, 30, 50, 100, 150, 200, 250 and 300 (×10³) are shown. (h) Trajectories for various targets with different θ, including unlearned targets.