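Table 1. Statistical results of navigation experiments of various algorithms.

| Environment | Algorithm | Mean navigation path length (m) | Probability of convergence after finding the target region | Mean number of explorations to first find the target | Mean number of explorations to form the navigation habit |
| --- | --- | --- | --- | --- | --- |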
| 1 | Q-learning | 16.59 | 84.2% | 9.4 | 11.2 |
|  | SARSA | 19.56 | 67.6% | 10.7 | 16.5 |
|  | IAC | 17.44 | 85.8% | 11.9 | 17.3 |
|  | Sn-Plast | 18.97 | 73.3% | 6.6 | 13.8 |
|  | Sn-Plast + PO | 13.02 | 73.3% | 6.6 | 13.8 |
|  | Sn-Plast + PO + PF | 12.48 | 96.1% | 6.4 | 6.7 |
| 2 | Q-learning | 11.87 | 79.9% | 8.5 | 9.3 |
|  | SARSA | 12.62 | 73.5% | 9.8 | 12.4 |
|  | IAC | 11.14 | 81.3% | 7.7 | 8.9 |
|  | Sn-Plast | 12.90 | 80.7% | 6.8 | 12.9 |
|  | Sn-Plast + PO | 7.73 | 80.7% | 6.8 | 12.9 |
|  | Sn-Plast + PO + PF | 7.52 | 97.8% | 7.0 | 7.1 |
| 3 | Q-learning | 16.28 | 76.2% | 9.7 | 14.2 |
|  | SARSA | 17.55 | 71.5% | 9.3 | 22.8 |
|  | IAC | 17.13 | 89.1% | 8.5 | 16.1 |
|  | Sn-Plast | 18.93 | 78.4% | 7.9 | 20.7 |
|  | Sn-Plast + PO | 11.40 | 78.4% | 7.9 | 20.7 |
|  | Sn-Plast + PO + PF | 11.16 | 97.3% | 7.9 | 8.4 |
| 4 | Q-learning | 18.51 | 85.7% | 9.7 | 14.1 |
|  | SARSA | 19.38 | 70.2% | 11.6 | 20.9 |
|  | IAC | 18.96 | 91.5% | 8.1 | 9.6 |
|  | Sn-Plast | 19.73 | 74.8% | 6.2 | 17.4 |
|  | Sn-Plast + PO | 14.06 | 74.8% | 6.2 | 17.4 |
|  | Sn-Plast + PO + PF | 13.70 | 98.2% | 5.9 | 6.1 |
