2019 Nov 21;11:71. doi: 10.1186/s13321-019-0393-0

Table 3.

Best models trained on subsets of GDB-13 after the hyperparameter optimization

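| Set | SMILES | Time (hh:mm) | % GDB-13 | Valid | Unif | Comp | Closed | UCC |
|-----|--------|--------------|----------|-------|------|------|--------|-----|
| 1M | Canonical | 4:08 | 72.8 | 0.994 | 0.879 | 0.836 | 0.861 | 0.633 |
| 1M | Rand. unr. | 31:47 | 80.9 | 0.995 | 0.970 | 0.929 | 0.876 | 0.790 |
| 1M | Rand. unr. no DA | 1:37 | 77.0 | 0.987 | 0.957 | 0.795 | 0.883 | 0.672 |
| 1M | Rand. rest. | 7:19 | 83.0 | 0.999 | 0.977 | 0.953 | 0.925 | 0.860 |
| 1M | Rand. rest. no DA | 1:21 | 78.2 | 0.992 | 0.957 | 0.829 | 0.898 | 0.712 |
| 1M | DS branch | 1:33 | 72.1 | 0.987 | 0.881 | 0.828 | 0.834 | 0.608 |
| 1M | DS rings | 1:11 | 68.6 | 0.979 | 0.852 | 0.788 | 0.798 | 0.535 |
| 1M | DS both | 1:05 | 68.4 | 0.979 | 0.851 | 0.785 | 0.796 | 0.532 |
| 10K | Canonical | 0:04 | 38.8 | 0.905 | 0.666 | 0.445 | 0.426 | 0.126 |
| 10K | Rand. rest. | 0:36 | 62.3 | 0.974 | 0.882 | 0.715 | 0.598 | 0.377 |
| 1K | Canonical | 0:01 | 14.5 | 0.504 | 0.611 | 0.167 | 0.133 | 0.014 |
| 1K | Rand. rest. | 0:04 | 34.1 | 0.812 | 0.790 | 0.392 | 0.276 | 0.085 |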

See “Methods” section for a description of the ratios

The best results for each training set size are indicated in italics

Set: benchmark training set size. SMILES: SMILES variant, including randomized variants with and without data augmentation (DA). Time: training time in hh:mm. % GDB-13: percentage of unique GDB-13 molecules generated in a 2-billion sample drawn with replacement. Valid: fraction of valid SMILES. Unif: uniformity ratio. Comp: completeness ratio. Closed: closedness ratio. UCC: UCC ratio.
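
The Methods section that defines the ratios is not reproduced here. As a reading aid, the tabulated values are consistent, to within rounding, with the UCC column being the product of the Unif, Comp and Closed columns. This is an inference from the numbers in the table, not a statement of the authors' definition. The following minimal Python sketch checks it; the names `ROWS`, `ucc_product` and `max_abs_deviation` are illustrative only.

```python
# Rows transcribed from Table 3:
# (set, SMILES variant, Unif, Comp, Closed, reported UCC)
ROWS = [
    ("1M", "Canonical", 0.879, 0.836, 0.861, 0.633),
    ("1M", "Rand. unr.", 0.970, 0.929, 0.876, 0.790),
    ("1M", "Rand. unr. no DA", 0.957, 0.795, 0.883, 0.672),
    ("1M", "Rand. rest.", 0.977, 0.953, 0.925, 0.860),
    ("1M", "Rand. rest. no DA", 0.957, 0.829, 0.898, 0.712),
    ("1M", "DS branch", 0.881, 0.828, 0.834, 0.608),
    ("1M", "DS rings", 0.852, 0.788, 0.798, 0.535),
    ("1M", "DS both", 0.851, 0.785, 0.796, 0.532),
    ("10K", "Canonical", 0.666, 0.445, 0.426, 0.126),
    ("10K", "Rand. rest.", 0.882, 0.715, 0.598, 0.377),
    ("1K", "Canonical", 0.611, 0.167, 0.133, 0.014),
    ("1K", "Rand. rest.", 0.790, 0.392, 0.276, 0.085),
]


def ucc_product(unif: float, comp: float, closed: float) -> float:
    """Assumed relation: UCC as the product of the three ratios."""
    return unif * comp * closed


def max_abs_deviation(rows) -> float:
    """Largest gap between the assumed product and the reported UCC."""
    return max(
        abs(ucc_product(unif, comp, closed) - reported)
        for _set, _variant, unif, comp, closed, reported in rows
    )


if __name__ == "__main__":
    for set_size, variant, unif, comp, closed, reported in ROWS:
        computed = ucc_product(unif, comp, closed)
        print(f"{set_size:>3} {variant:<18} computed={computed:.3f} reported={reported:.3f}")
    print(f"max deviation: {max_abs_deviation(ROWS):.4f}")
```

Because the inputs are themselves rounded to three decimals, small differences in the last digit are expected and do not contradict the assumed relation.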