2020 Sep 4;12:53. doi: 10.1186/s13321-020-00454-3

Fig. 3.

The reinforcement learning pathway for systemic generation of molecules (redrawn from You et al. [34]). (a) The state is defined as the current graph G_t and the set of possible atom types C. (b) The GCPN performs message passing to encode the state as node embeddings and estimates the policy function. (c) The action to be performed (a_t) is sampled from the policy function. The environment performs a chemical valency check on the intermediate state and returns (d) the next state G_{t+1} and (e) the associated reward (r_t).
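
The loop described in the caption can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the GCPN implementation: the policy is a uniform random sampler standing in for the graph network, the only action is "attach a new atom to an existing one", and the names `ToyMoleculeEnv`, `random_policy`, `rollout`, the valence table, and the +1/-1 rewards are all hypothetical choices made for the example.

```python
import random

# Illustrative maximum valences for the allowed atom types C (an assumption).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


class ToyMoleculeEnv:
    """Hypothetical environment: holds the graph G_t and checks valency."""

    def __init__(self):
        self.atoms = ["C"]   # node labels; generation starts from one carbon
        self.bonds = []      # edges stored as (i, j, bond_order)

    def state(self):
        # Immutable snapshot of the current graph G_t.
        return (tuple(self.atoms), tuple(self.bonds))

    def used_valence(self, i):
        # Sum of bond orders on atom i.
        return sum(order for a, b, order in self.bonds if i in (a, b))

    def step(self, action):
        """Apply action a_t = (source atom index, new atom type, bond order)."""
        src, atom_type, order = action
        valid = (
            self.used_valence(src) + order <= MAX_VALENCE[self.atoms[src]]
            and order <= MAX_VALENCE[atom_type]
        )
        if not valid:
            # Valency check failed: keep the graph unchanged, penalize.
            return self.state(), -1.0
        self.atoms.append(atom_type)
        self.bonds.append((src, len(self.atoms) - 1, order))
        return self.state(), 1.0  # next state G_{t+1} and reward r_t


def random_policy(state, rng):
    # Stand-in for the learned policy: samples an action uniformly.
    atoms, _ = state
    src = rng.randrange(len(atoms))
    atom_type = rng.choice(sorted(MAX_VALENCE))
    order = rng.choice([1, 2, 3])
    return (src, atom_type, order)


def rollout(steps, seed=0):
    rng = random.Random(seed)
    env = ToyMoleculeEnv()
    state, total_reward = env.state(), 0.0
    for _ in range(steps):
        action = random_policy(state, rng)   # sample a_t from the policy
        state, reward = env.step(action)     # environment returns G_{t+1}, r_t
        total_reward += reward
    return env, total_reward
```

Because invalid actions leave the graph untouched, every intermediate graph produced by `rollout` respects the valence table, which is the property the valency check in the figure is meant to guarantee.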