
Fig. 3

The mechanism of the updated exploration strategy. Shown are the agent net GA, the mutation net GM (red), and the crossover net GC (blue). In the training loop, GM is fixed, GC is updated iteratively, and GA is trained at each epoch. For each position, a random number between 0 and 1 is generated: if it is larger than the mutation rate (ε), the probability distribution for token sampling is determined by the combination of GA and GC; otherwise, it is determined by GM.
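The per-position sampling rule described in the caption can be illustrated with a minimal sketch. The network names (agent_net, crossover_net, mutation_net), their assumed output of per-token probability distributions, and the way GA and GC are combined (simple averaging here) are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def sample_next_token(agent_net, crossover_net, mutation_net, prev_tokens, epsilon=0.1):
    """Sketch of the exploration strategy in Fig. 3 for one generation step.

    Assumes each net maps the tokens generated so far to a (batch, vocab)
    tensor of probabilities; the combination of G_A and G_C is taken as a
    plain average for illustration.
    """
    with torch.no_grad():
        p_agent = agent_net(prev_tokens)      # distribution from G_A
        p_cross = crossover_net(prev_tokens)  # distribution from G_C
        p_mut = mutation_net(prev_tokens)     # distribution from G_M (fixed)

    # Combination of agent and crossover distributions (assumed: mean).
    p_explore = 0.5 * (p_agent + p_cross)

    # One uniform random number in [0, 1) per sequence in the batch.
    u = torch.rand(prev_tokens.size(0), 1)

    # If u > epsilon, sample from the agent/crossover combination;
    # otherwise sample from the mutation net.
    probs = torch.where(u > epsilon, p_explore, p_mut)
    return torch.multinomial(probs, num_samples=1)  # next token index per sequence
```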