2019 May 24;11:35. doi: 10.1186/s13321-019-0355-6

Fig. 2.

The workflow of deep reinforcement learning. Each loop consists of four steps: (1) the RNN generator samples a batch of SMILES sequences; (2) each generated molecule, represented as a SMILES string, is encoded into a fingerprint; (3) the QSAR model, which was trained in advance, assigns each molecule a probability score of activity on the A2AR; (4) all of the generated molecules and their scores are sent back to train the generator with the policy gradient method.
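The four steps in the caption can be sketched as a toy loop. This is only an illustration under strong assumptions, not the authors' implementation: the "generator" is a table of per-position logits rather than an RNN, the "fingerprint" is a token-count vector rather than a chemical fingerprint, and `toy_qsar_score` is a hypothetical stand-in for the pre-trained QSAR model. All names, the vocabulary, and the hyperparameters are invented for the example. The update in step 4 is a basic REINFORCE-style policy gradient with a mean-score baseline.

```python
import math
import random

# Toy token set and sequence length; assumptions for illustration only.
VOCAB = ["C", "N", "O", "c", "1", "(", ")", "="]
SEQ_LEN = 8


def softmax(logits):
    """Convert a list of logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(logits, batch_size, rng):
    """Step 1: sample token sequences from the toy generator."""
    probs = [softmax(row) for row in logits]
    indices = range(len(VOCAB))
    return [
        [rng.choices(indices, weights=p)[0] for p in probs]
        for _ in range(batch_size)
    ]


def fingerprint(seq):
    """Step 2: encode a sequence as a fixed-length vector (token counts here)."""
    fp = [0] * len(VOCAB)
    for tok in seq:
        fp[tok] += 1
    return fp


def toy_qsar_score(fp):
    """Step 3: hypothetical scorer returning a value in [0, 1].

    Stand-in rule: the fraction of 'N' tokens. A real QSAR model would be
    a trained classifier applied to a chemical fingerprint.
    """
    return fp[VOCAB.index("N")] / SEQ_LEN


def policy_gradient_step(logits, batch, scores, lr):
    """Step 4: REINFORCE update of the logits, using the mean score as baseline."""
    baseline = sum(scores) / len(scores)
    probs = [softmax(row) for row in logits]  # probabilities used for sampling
    for seq, score in zip(batch, scores):
        advantage = score - baseline
        for pos, tok in enumerate(seq):
            for k in range(len(VOCAB)):
                # Gradient of log softmax w.r.t. logit k: one-hot minus probability.
                grad = (1.0 if k == tok else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad / len(batch)


def train(n_loops=300, batch_size=64, lr=5.0, seed=0):
    """Run the sample -> encode -> score -> update loop; return logits and mean scores."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(SEQ_LEN)]
    history = []
    for _ in range(n_loops):
        batch = sample_batch(logits, batch_size, rng)
        scores = [toy_qsar_score(fingerprint(seq)) for seq in batch]
        policy_gradient_step(logits, batch, scores, lr)
        history.append(sum(scores) / len(scores))
    return logits, history


if __name__ == "__main__":
    _, history = train()
    print("mean score, first loop:", round(history[0], 3))
    print("mean score, last loop:", round(history[-1], 3))
```

Running the script prints the mean score of the first and last batches; under the toy scorer the mean should rise as the generator shifts probability toward the rewarded token, which mirrors how the scored molecules steer the generator in the figure.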