Table 5.
Target Condition | Model | G-mean | Unique Count | Property Value | KLD Score |
---|---|---|---|---|---|
MolWt = 120 | NAT GraphVAE | 0.623 | 3048 | 124.47 ± 7.58 | 0.843 |
MolWt = 120 | MGM | 0.522 | 8800 | 120.02 ± 7.66 | 0.811 |
MolWt = 120 | MGM - Final Step | 0.404 | 8509 | 119.42 ± 7.67 | 0.761 |
MolWt = 120 | Dataset | — | — | — | 0.679 |
MolWt = 125 | NAT GraphVAE | 0.565 | 2326 | 127.21 ± 7.05 | 0.827 |
MolWt = 125 | MGM | 0.561 | 9983 | 125.00 ± 8.48 | 0.850 |
MolWt = 125 | MGM - Final Step | 0.354 | 9293 | 122.48 ± 7.20 | 0.936 |
MolWt = 125 | Dataset | — | — | — | 0.835 |
MolWt = 130 | NAT GraphVAE | 0.454 | 1204 | 129.12 ± 6.79 | 0.614 |
MolWt = 130 | MGM | 0.501 | 9465 | 128.85 ± 8.85 | 0.705 |
MolWt = 130 | MGM - Final Step | 0.369 | 8892 | 126.85 ± 7.43 | 0.789 |
MolWt = 130 | Dataset | — | — | — | 0.695 |
LogP = −0.4 | NAT GraphVAE | 0.601 | 2551 | −0.409 ± 0.775 | 0.739 |
LogP = −0.4 | MGM | 0.424 | 9506 | −0.349 ± 0.503 | 0.803 |
LogP = −0.4 | MGM - Final Step | 0.300 | 9495 | −0.337 ± 0.523 | 0.876 |
LogP = −0.4 | Dataset | — | — | — | 0.811 |
LogP = 0.2 | NAT GraphVAE | 0.562 | 2188 | 0.051 ± 0.746 | 0.803 |
LogP = 0.2 | MGM | 0.378 | 9524 | 0.200 ± 0.468 | 0.846 |
LogP = 0.2 | MGM - Final Step | 0.376 | 9487 | 0.202 ± 0.462 | 0.895 |
LogP = 0.2 | Dataset | — | — | — | 0.816 |
LogP = 0.8 | NAT GraphVAE | 0.515 | 1837 | 0.588 ± 0.759 | 0.807 |
LogP = 0.8 | MGM | 0.418 | 9360 | 0.769 ± 0.473 | 0.826 |
LogP = 0.8 | MGM - Final Step | 0.300 | 9294 | 0.745 ± 0.442 | 0.857 |
LogP = 0.8 | Dataset | — | — | — | 0.797 |
The results shown here correspond to the best mean property value (MGM) or the final sampling iteration with initialization chosen according to the better geometric mean among the five GuacaMol metrics (MGM - Final Step). Results for the NAT GraphVAE baseline model (ref. 25), which we trained ourselves, are also shown. ‘Dataset’ rows refer to molecules sampled from the dataset with MolWt within ± 1 of the target for the MolWt conditions and LogP within ± 0.1 of the target for the LogP conditions. G-mean refers to the geometric mean of validity, uniqueness and novelty.
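As a rough illustration of two quantities defined in this caption, the sketch below computes a geometric mean of validity, uniqueness and novelty, and selects property values lying within a tolerance of a target, as described for the ‘Dataset’ rows. The function names `g_mean` and `within_window` are hypothetical, and the code assumes the three metrics are given as fractions in [0, 1]; it is not the authors' evaluation code, and the table's values cannot be reproduced from it because the individual metrics are not listed here.

```python
import math


def g_mean(validity, uniqueness, novelty):
    """Geometric mean of three scores (hypothetical helper).

    Assumes each score is a fraction in [0, 1]; the scale used by the
    authors is not stated in the caption.
    """
    scores = (validity, uniqueness, novelty)
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    # Cube root of the product of the three scores.
    return math.prod(scores) ** (1.0 / len(scores))


def within_window(values, target, tolerance):
    """Keep property values with |value - target| <= tolerance (hypothetical helper).

    Mirrors the 'Dataset' row description, e.g. tolerance=1 for MolWt
    and tolerance=0.1 for LogP.
    """
    return [v for v in values if abs(v - target) <= tolerance]


if __name__ == "__main__":
    # Illustrative inputs only, not values from the table.
    print(g_mean(1.0, 0.5, 0.25))
    print(within_window([118.2, 119.6, 120.4, 122.9], target=120, tolerance=1))
```

Because a geometric mean is a product under a root, a single low score pulls the result down sharply, and any score of zero makes the result zero.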