Skip to main content
. 2020 Sep 14;11(40):10959–10972. doi: 10.1039/d0sc04184j

Fig. 4. Examples of chemical routes that mUCT-dc-V method can solve within 30 s while PUCT-V cannot. The value of the P(s,a) given by the policy network and the ranking of the template among the top 50 templates are given. The unique advantage of mUCT-dc-V method is that the P(s,a) value is not explicitly used, therefore even if the P(s,a) value is extremely small as a result of the imperfect policy network, the valid template will still be explored by the tree expander, which is not the case in PUCT-V method. The value network here is Round 1 RL value network. The policy network used by both MCTS variants are the same. The restrictions for both MCTS variants are the same: top 50 templates given by policy network are considered, maximum depth is 10, and minimum plausibility is 0.75 (see Methods). The affected functional groups in each step are marked in blue. The buyable compounds are framed in green.

Fig. 4