Table 3.
Set | SMILES | Time | % GDB-13 | Valid | Unif | Comp | Closed | UCC |
---|---|---|---|---|---|---|---|---|
1M | Canonical | 4:08 | 72.8 | 0.994 | 0.879 | 0.836 | 0.861 | 0.633 |
1M | Rand. unr. | 31:47 | 80.9 | 0.995 | 0.970 | 0.929 | 0.876 | 0.790 |
1M | Rand. unr. no DA | 1:37 | 77.0 | 0.987 | 0.957 | 0.795 | 0.883 | 0.672 |
1M | Rand. rest. | 7:19 | 83.0 | 0.999 | 0.977 | 0.953 | 0.925 | 0.860 |
1M | Rand. rest. no DA | 1:21 | 78.2 | 0.992 | 0.957 | 0.829 | 0.898 | 0.712 |
1M | DS branch | 1:33 | 72.1 | 0.987 | 0.881 | 0.828 | 0.834 | 0.608 |
1M | DS rings | 1:11 | 68.6 | 0.979 | 0.852 | 0.788 | 0.798 | 0.535 |
1M | DS both | 1:05 | 68.4 | 0.979 | 0.851 | 0.785 | 0.796 | 0.532 |
10K | Canonical | 0:04 | 38.8 | 0.905 | 0.666 | 0.445 | 0.426 | 0.126 |
10K | Rand. rest. | 0:36 | 62.3 | 0.974 | 0.882 | 0.715 | 0.598 | 0.377 |
1K | Canonical | 0:01 | 14.5 | 0.504 | 0.611 | 0.167 | 0.133 | 0.014 |
1K | Rand. rest. | 0:04 | 34.1 | 0.812 | 0.790 | 0.392 | 0.276 | 0.085 |
See the “Methods” section for a description of the ratios.
The best results for each training set size are indicated in italics.
Set: benchmark training set size; SMILES: SMILES variant, including randomized variants with and without data augmentation (DA); Time: training time in hh:mm; % GDB-13: percentage of unique molecules from GDB-13 generated in a sample of 2 billion drawn with replacement; Valid: fraction of valid SMILES; Unif: uniformity ratio; Comp: completeness ratio; Closed: closedness ratio; UCC: UCC ratio.
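
The UCC values in the table appear to be consistent with the product of the uniformity, completeness and closedness ratios. The sketch below assumes that definition (the “Methods” section remains the authoritative source) and checks it against the tabulated rows. The names `ucc` and `ROWS` are illustrative and not taken from the original work.

```python
# Assumption: UCC = uniformity * completeness * closedness.
# This is inferred from the tabulated values, not quoted from the Methods.

def ucc(unif: float, comp: float, closed: float) -> float:
    """Return the product of the three ratios (assumed UCC definition)."""
    return unif * comp * closed

# (set, SMILES variant, Unif, Comp, Closed, reported UCC), copied from Table 3.
ROWS = [
    ("1M", "Canonical", 0.879, 0.836, 0.861, 0.633),
    ("1M", "Rand. unr.", 0.970, 0.929, 0.876, 0.790),
    ("1M", "Rand. unr. no DA", 0.957, 0.795, 0.883, 0.672),
    ("1M", "Rand. rest.", 0.977, 0.953, 0.925, 0.860),
    ("1M", "Rand. rest. no DA", 0.957, 0.829, 0.898, 0.712),
    ("1M", "DS branch", 0.881, 0.828, 0.834, 0.608),
    ("1M", "DS rings", 0.852, 0.788, 0.798, 0.535),
    ("1M", "DS both", 0.851, 0.785, 0.796, 0.532),
    ("10K", "Canonical", 0.666, 0.445, 0.426, 0.126),
    ("10K", "Rand. rest.", 0.882, 0.715, 0.598, 0.377),
    ("1K", "Canonical", 0.611, 0.167, 0.133, 0.014),
    ("1K", "Rand. rest.", 0.790, 0.392, 0.276, 0.085),
]

# Largest absolute gap between the recomputed product and the reported UCC.
max_deviation = max(
    abs(ucc(unif, comp, closed) - reported)
    for _, _, unif, comp, closed, reported in ROWS
)

if __name__ == "__main__":
    for name, variant, unif, comp, closed, reported in ROWS:
        print(f"{name:>3} {variant:<18} product={ucc(unif, comp, closed):.3f} reported={reported:.3f}")
    print(f"max deviation: {max_deviation:.4f}")
```

Because the three input ratios are themselves rounded to three decimals, the recomputed product can differ from the reported UCC in the last digit; across these rows the gap stays below 0.002.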