Skip to main content
. 2023 Jun 8;15:59. doi: 10.1186/s13321-023-00719-7

Fig. 3.

Fig. 3

Optimization of molecules for drug-likeness and synthesizability produced by a fixed language model, adaptive language model, or fixed language model without a genetic algorithm based optimization scheme. Two datasets (GDB9 and a custom dataset with the top scoring molecules for drug-likeness and synthesizability) are used as initial data. In the Fitness vs Generations subplots, the y-axis is the average fitness of the population over six runs. The related standard deviations are small compared to the mean values in the order of 0.1%-0.2%. The fixed approach (blue) results in a faster increase in fitness, along with greater valid and accepted molecules for the GDB9 dataset. For the top dataset, however, the adaptive approach leads to a faster increase in fitness along with greater accepted molecules. Both the adaptive and fixed approaches outperform the baseline of a fixed language model without the genetic algorithm. The histograms show synthesizability and drug-likeness of the final population after six generations for each approach