Table 3.
Comparison with the state of the art on Atari score optimization. For reference, we include the scores of the state of the art in Atari game playing when optimizing for in-game score: Agent57 [2], which in 2020 was the first implementation to surpass the reference “average human” performance on all 57 Atari games. The scores for “Average Human”, “Random” and “Agent57” in the table are taken directly from that paper. This achievement relies on a classic reinforcement learning setup in which not one but two neural networks approximate the value function rather than the policy. As such, a direct network-size comparison is not applicable, though Figure 1 in their paper suggests an estimate (over both networks) of more than 4’300 neurons and 1’600’000 weights. Notice that our method surpasses the random-policy baseline on all games, supporting our “sensible play” thesis, and even overtakes the reference human performance on one game, FishingDerby, using only 18 neurons.
Game | Average Human | Random | Agent57 | Ours
---|---|---|---|---
Berzerk | 2630 | 124 | 61508 | 900 |
Bowling | 161 | 23 | 251 | 82 |
DemonAttack | 1971 | 152 | 143161 | 325 |
Enduro | 861 | 0 | 2368 | 7.4 |
FishingDerby | −39 | −92 | 87 | −10
Frostbite | 4335 | 65 | 541281 | 300 |
Gravitar | 3351 | 173 | 19214 | 1100 |
Kangaroo | 3035 | 52 | 24034 | 1200 |
NameThisGame | 8049 | 2292 | 54387 | 920 |
Phoenix | 7243 | 761 | 908264 | 4600 |
Qbert | 13455 | 164 | 580329 | 1250 |
Seaquest | 42055 | 68 | 999998 | 320 |
SpaceInvaders | 1669 | 148 | 48681 | 830 |
StarGunner | 10250 | 664 | 839574 | 1200 |
TimePilot | 5229 | 3568 | 405425 | 4600 |
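Since FishingDerby scores are negative, the claim of surpassing the average human there is easiest to read through the human-normalized score commonly used in the Atari literature, (agent − random) / (human − random). The snippet below is a minimal sketch of that computation applied to the FishingDerby row of the table; the helper name and the hard-coded values are illustrative and not part of any released code.

```python
def human_normalized_score(agent: float, random_score: float, human: float) -> float:
    """Standard Atari human-normalized score: 0 = random policy, 1 = average human."""
    return (agent - random_score) / (human - random_score)

# FishingDerby values from the table above (scores in this game are negative).
hns = human_normalized_score(agent=-10, random_score=-92, human=-39)
print(f"FishingDerby human-normalized score: {hns:.2f}")  # ~1.55, i.e. above the average human
```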