Table 3.
Comparison with the state of the art on Atari score optimization. For reference, we include the scores of the state of the art in Atari game playing when optimizing for in-game score: Agent57 [2], which in 2020 was the first implementation to surpass the reference “average human” performance on all 57 Atari games. The scores for “Average Human”, “Random” and “Agent57” in the table are taken directly from that paper. This achievement relies on a classic reinforcement learning setup in which not one but two neural networks approximate the value function rather than the policy. As such, a direct network-size comparison is not applicable, though Figure 1 in their paper suggests an estimate (over both networks) of more than 4’300 neurons and 1’600’000 weights. Notice that our method surpasses the random-policy baseline on all games, supporting our “sensible play” thesis, and even overtakes the reference human performance on one game, FishingDerby, using only 18 neurons.
Game | Average Human | Random | Agent57 | Ours
---|---|---|---|---
Berzerk | 2630 | 124 | 61508 | 900 |
Bowling | 161 | 23 | 251 | 82 |
DemonAttack | 1971 | 152 | 143161 | 325 |
Enduro | 861 | 0 | 2368 | 7.4 |
FishingDerby | −39 | −92 | 87 | −10
Frostbite | 4335 | 65 | 541281 | 300 |
Gravitar | 3351 | 173 | 19214 | 1100 |
Kangaroo | 3035 | 52 | 24034 | 1200 |
NameThisGame | 8049 | 2292 | 54387 | 920 |
Phoenix | 7243 | 761 | 908264 | 4600 |
Qbert | 13455 | 164 | 580329 | 1250 |
Seaquest | 42055 | 68 | 999998 | 320 |
SpaceInvaders | 1669 | 148 | 48681 | 830 |
StarGunner | 10250 | 664 | 839574 | 1200 |
TimePilot | 5229 | 3568 | 405425 | 4600 |
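Since FishingDerby scores are negative, the claim of surpassing the average human there is easiest to read through the human-normalized score commonly used in the Atari literature, (agent − random) / (human − random). The snippet below is a minimal sketch of that computation applied to the FishingDerby row of the table; the helper name and the hard-coded values are illustrative and not part of any released code.

```python
def human_normalized_score(agent: float, random_score: float, human: float) -> float:
    """Standard Atari human-normalized score: 0 = random policy, 1 = average human."""
    return (agent - random_score) / (human - random_score)

# FishingDerby values from the table above (scores in this game are negative).
hns = human_normalized_score(agent=-10, random_score=-92, human=-39)
print(f"FishingDerby human-normalized score: {hns:.2f}")  # ~1.55, i.e. above the average human
```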