2022 Feb 16;602(7897):414–419. doi: 10.1038/s41586-021-04301-9

Extended Data Table 5.

Reward elements


Elements used to construct reward functions. Transforms scale the different reward components. The q95 value is as defined in ref. 54. Each transform takes a good value and a bad value, which usually have a semantic meaning defined by the reward component, and maps the component to the range 0–1. A good value should lead to a reward close to or equal to 1, whereas a bad value should lead to a reward close to or equal to 0. Combiners take a list of values and corresponding weights and return a single value. Any values with a weight of 0 are excluded. Terminations trigger the end of an episode with a large negative reward. Specific implementations are in the Supplementary Data.
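
The transform and combiner roles described above can be illustrated with a minimal Python sketch. It is not the implementation from the Supplementary Data: the function names `linear_transform` and `weighted_mean_combiner` are hypothetical, the clipped linear mapping and the weighted mean are assumed here only because they are the simplest choices that satisfy the stated properties (good value maps to 1, bad value maps to 0, zero-weight values are excluded), and the weights are assumed to be non-negative.

```python
from typing import Sequence


def linear_transform(value: float, good: float, bad: float) -> float:
    """Hypothetical transform: map `value` to [0, 1] so that good -> 1 and bad -> 0.

    Values beyond `good` or `bad` are clipped to 1 or 0 respectively.
    """
    if good == bad:
        raise ValueError("good and bad reference values must differ")
    score = (value - bad) / (good - bad)
    return min(1.0, max(0.0, score))


def weighted_mean_combiner(values: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical combiner: weighted mean over the values with non-zero weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Values with a weight of 0 are excluded before combining.
    kept = [(v, w) for v, w in zip(values, weights) if w != 0]
    if not kept:
        raise ValueError("at least one weight must be non-zero")
    total_weight = sum(w for _, w in kept)
    return sum(v * w for v, w in kept) / total_weight


# Illustrative numbers only: an error of 2 units, where 0 is good and 10 is bad.
component_a = linear_transform(2.0, good=0.0, bad=10.0)
component_b = linear_transform(10.0, good=0.0, bad=10.0)
reward = weighted_mean_combiner([component_a, component_b], [1.0, 0.0])
```

With a weighted mean, a zero weight already contributes nothing to the result, but the explicit filter matters for combiners whose result depends on every input, such as a minimum over components.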