Skip to main content
. 2020 Sep 14;11(40):10959–10972. doi: 10.1039/d0sc04184j

Fig. 1. The process of Monte Carlo Tree Search in synthesis planning. Following the notations of MDP, a molecule (or state) is denoted as s, and a template (or retrosynthetic disconnection action) is denoted as a. In the selection phase, starting from the target molecule, the most “promising” template is recursively chosen by selecting the template with the highest upper confidence bound (UCB(s,a)) value until a leaf node is reached. A policy network is used to narrow down the search beam in each template selection step. In the expansion phase, the leaf node is expanded by applying the selected template. New leaf nodes (precursors) that are not visited by the tree expander before are generated. Once the new leaf nodes are encountered, in the evaluation step, a value network is used to evaluate the values of the leaf nodes (if the node is buyable, the value is set to 1). Then in the backpropagation step, upward along the tree, the visit count N(s,a) of each compound-template (s,a) pairs, or edges, are updated. The Q(s,a) value (see Table 1) is recalculated as well and used to recompute UCB(s,a) values in the next selection step. With the updated values, the tree expander goes back to the selection phase, starting selecting the most promising template for the target molecule (root node) again. Here circles denote compounds. (Blue) not commercially available; (Green) commercially available.

Fig. 1