Table 4.
Distributional results on ChEMBL. LSTM, Graph MCTS [52], AAE [67], ORGAN [62] and VAE [49] (with a bidirectional GRU [53] as encoder and an autoregressive GRU [53] as decoder) results are taken from Brown et al. [46].
Representation | Model | Valid | Uniq | Novel | KL Div | Fréchet Dist |
---|---|---|---|---|---|---|
SMILES | AAE | 0.822 | 1.000 | 0.998 | 0.886 | 0.529 |
SMILES | ORGAN | 0.379 | 0.841 | 0.687 | 0.267 | 0.000 |
SMILES | VAE | 0.870 | 0.999 | 0.974 | 0.982 | 0.863 |
SMILES | LSTM | 0.959 | 1.000 | 0.912 | 0.991 | 0.913 |
SMILES | Transformer Sml (ours) | 0.920 | 0.999 | 0.939 | 0.968 | 0.859 |
SMILES | Transformer Reg (ours) | 0.961 | 1.000 | 0.846 | 0.977 | 0.883 |
Graph | Graph MCTS | 1.000 | 1.000 | 0.994 | 0.522 | 0.015 |
Graph | NAT GraphVAE | 0.830 | 0.944 | 1.000 | 0.554 | 0.016 |
Graph | MGM (ours proposed) | 0.849 | 1.000 | 0.722 | 0.987 | 0.845 |
NAT GraphVAE [25] stands for non-autoregressive graph VAE. Models labelled ‘ours’ were trained by us and then used for generation. Our masked graph model (MGM) results correspond to a 1% masking rate with training graph initialization, the configuration that yields the highest geometric mean across all five benchmark metrics. (See the Supplementary Discussion section of the Supplementary Information for details.) All five metrics (validity, uniqueness, novelty, KL Div and Fréchet Dist) take values between 0 and 1, and higher is better (↑) for each.
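The geometric mean used as the selection criterion can be illustrated with a minimal sketch. It applies the aggregate to three rows copied from the table above; it does not reproduce the selection over masking rates and initializations, which is described in the Supplementary Information. The names `TABLE_4` and `geometric_mean` are hypothetical and chosen only for this example.

```python
import math

# Metric values copied from Table 4, in the order:
# validity, uniqueness, novelty, KL Div, Fréchet Dist.
# TABLE_4 is a hypothetical name used only for this illustration.
TABLE_4 = {
    "LSTM": (0.959, 1.000, 0.912, 0.991, 0.913),
    "ORGAN": (0.379, 0.841, 0.687, 0.267, 0.000),
    "MGM": (0.849, 1.000, 0.722, 0.987, 0.845),
}


def geometric_mean(values):
    """Return the n-th root of the product of n non-negative values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    return math.prod(values) ** (1.0 / len(values))


for name, metrics in TABLE_4.items():
    print(f"{name}: {geometric_mean(metrics):.3f}")
```

Because the geometric mean is a product, a single metric of zero (such as the Fréchet Dist of ORGAN) drives the aggregate to zero, so it penalizes models that fail badly on any one metric more strongly than an arithmetic mean would.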