
Table 3.

Comparison with the state of the art on Atari score optimization. For reference, we include the scores of Agent57 [2], which in 2020 was the first agent to surpass the reference "average human" performance on all 57 Atari games. The scores for "Average human", "Random", and "Agent57" in the table are taken directly from that paper. Agent57 uses a classic reinforcement learning setup in which two neural networks approximate the value function rather than the policy, so a direct comparison of network sizes is not applicable; Figure 1 in their paper suggests an estimate (across both networks) of over 4,300 neurons and more than 1,600,000 weights. Notice that our method surpasses the random-policy baseline on all games, supporting our "sensible play" thesis, and also overtakes reference human performance on one game, FishingDerby, using only 18 neurons.

Game            Average human   Random   Agent57   Ours
Berzerk                  2630      124     61508    900
Bowling                   161       23       251     82
DemonAttack              1971      152    143161    325
Enduro                    861        0      2368    7.4
FishingDerby              -39      -92        87    -10
Frostbite                4335       65    541281    300
Gravitar                 3351      173     19214   1100
Kangaroo                 3035       52     24034   1200
NameThisGame             8049     2292     54387    920
Phoenix                  7243      761    908264   4600
Qbert                   13455      164    580329   1250
Seaquest                42055       68    999998    320
SpaceInvaders            1669      148     48681    830
StarGunner              10250      664    839574   1200
TimePilot                5229     3568    405425   4600
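
To make the comparison above concrete, the following minimal sketch recomputes the human-normalized score for a few rows of the table. This metric is a common convention in the Atari literature, not one reported in the table itself, and the game names and values below are simply transcribed from Table 3 for illustration.

# Hedged sketch (assumption, not part of the paper): human-normalized score,
# where 0.0 corresponds to random play and 1.0 to average-human play.

scores = {
    # game: (average_human, random, ours), values transcribed from Table 3
    "Berzerk":      (2630,  124,  900),
    "FishingDerby": ( -39,  -92,  -10),
    "TimePilot":    (5229, 3568, 4600),
}

def human_normalized(ours, human, rand):
    # Linear rescaling: the random baseline maps to 0, average human to 1.
    return (ours - rand) / (human - rand)

for game, (human, rand, ours) in scores.items():
    print(f"{game:>14}: {human_normalized(ours, human, rand):.2f}")

# Expected output: Berzerk and TimePilot land between 0 and 1 (above random,
# below human), while FishingDerby exceeds 1, matching the caption's claim
# that reference human performance is overtaken on that game.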