2021 May 26;12:3156. doi: 10.1038/s41467-021-23415-2

Table 4.

Distributional results on ChEMBL. Results for LSTM, Graph MCTS [52], AAE [67], ORGAN [62] and VAE [49] (with a bidirectional GRU [53] as encoder and an autoregressive GRU [53] as decoder) are taken from Brown et al. [46].

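| Representation | Model | Validity | Uniqueness | Novelty | KL Div | Fréchet Dist |
|---|---|---|---|---|---|---|
| SMILES | AAE | 0.822 | 1.000 | 0.998 | 0.886 | 0.529 |
| SMILES | ORGAN | 0.379 | 0.841 | 0.687 | 0.267 | 0.000 |
| SMILES | VAE | 0.870 | 0.999 | 0.974 | 0.982 | 0.863 |
| SMILES | LSTM | 0.959 | 1.000 | 0.912 | 0.991 | 0.913 |
| SMILES | Transformer Sml (ours) | 0.920 | 0.999 | 0.939 | 0.968 | 0.859 |
| SMILES | Transformer Reg (ours) | 0.961 | 1.000 | 0.846 | 0.977 | 0.883 |
| Graph | Graph MCTS | 1.000 | 1.000 | 0.994 | 0.522 | 0.015 |
| Graph | NAT GraphVAE | 0.830 | 0.944 | 1.000 | 0.554 | 0.016 |
| Graph | MGM (ours, proposed) | 0.849 | 1.000 | 0.722 | 0.987 | 0.845 |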

NAT GraphVAE [25] stands for non-autoregressive graph VAE. Models labelled 'ours' were trained by us and then used for generation. Our masked graph model (MGM) results correspond to a 1% masking rate and training graph initialization, the configuration with the highest geometric mean across all five benchmark metrics. (See the Supplementary Discussion section of the Supplementary Information for details.) All five metrics (validity, uniqueness, novelty, KL Div and Fréchet Dist) take values between 0 and 1; ↑ indicates that higher is better.
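
The geometric mean mentioned in the note above can be illustrated with a minimal sketch. It assumes an unweighted geometric mean over the five metrics, which the note does not spell out. The authors used this criterion to choose among masked graph model configurations; applying it to the rows of Table 4, as done here, is only an illustration. The names `TABLE_4`, `geometric_mean` and `rank_by_geometric_mean` are hypothetical and do not come from the paper.

```python
import math

# Metric values copied from Table 4, in the order:
# (validity, uniqueness, novelty, KL Div, Fréchet Dist)
TABLE_4 = {
    "AAE": (0.822, 1.000, 0.998, 0.886, 0.529),
    "ORGAN": (0.379, 0.841, 0.687, 0.267, 0.000),
    "VAE": (0.870, 0.999, 0.974, 0.982, 0.863),
    "LSTM": (0.959, 1.000, 0.912, 0.991, 0.913),
    "Transformer Sml": (0.920, 0.999, 0.939, 0.968, 0.859),
    "Transformer Reg": (0.961, 1.000, 0.846, 0.977, 0.883),
    "Graph MCTS": (1.000, 1.000, 0.994, 0.522, 0.015),
    "NAT GraphVAE": (0.830, 0.944, 1.000, 0.554, 0.016),
    "MGM": (0.849, 1.000, 0.722, 0.987, 0.845),
}


def geometric_mean(values):
    """Return the unweighted geometric mean of non-negative values.

    Assumption: all metrics are weighted equally.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    # A single zero makes the whole product, and hence the mean, zero.
    if any(v == 0 for v in values):
        return 0.0
    # Average the logs and exponentiate; equivalent to the n-th root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def rank_by_geometric_mean(table):
    """Return (model, score) pairs sorted from highest to lowest score."""
    scores = {name: geometric_mean(metrics) for name, metrics in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_geometric_mean(TABLE_4):
        print(f"{name:16s} {score:.3f}")
```

Because the geometric mean multiplies the metrics, one near-zero value (for example the Fréchet Dist of ORGAN or Graph MCTS) pulls the combined score down sharply, which an arithmetic mean would not do.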