Figure 10.
Training results for UAV target tracking. (a) Cumulative episode reward during training of speed command perception; (b) positions of the UAV and the target; (c) evaluation policy loss during training; (d) evaluation value loss during training.
