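General information about the three databases (CheML, eMolecules, and ZINC) and the CheML rows labeled by generative model. All columns except the number of molecules are reported as mean ± standard deviation.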
| Database | Number of molecules | Exact molecular mass (Da) | Number of atoms | Number of chiral centers | Number of rings | Number of bridgehead atoms | Number of heterocycles |
|---|---|---|---|---|---|---|---|
| CheML | 2 899 276 | 373 ± 213 | 26.3 ± 15 | 0.23 ± 0.60 | 2.55 ± 1.02 | 0.023 ± 0.248 | 1.18 ± 0.93 |
| eMolecules | 26 394 586 | 331 ± 87.7 | 22.8 ± 6.46 | 0.0944 ± 0.489 | 2.63 ± 1.16 | 0.036 ± 0.319 | 1.35 ± 0.97 |
| ZINC | 30 000 000 | 321 ± 24.5 | 22.7 ± 2.0 | 1.399 ± 1.017 | 2.55 ± 0.90 | 0.076 ± 0.391 | 1.72 ± 0.95 |
| CheML JT-VAE | 1 399 265 | 323 ± 55 | 22.7 ± 3.9 | 0.0 ± 0.0 | 2.69 ± 0.93 | 0.024 ± 0.25 | 1.41 ± 0.89 |
| CheML RNN | 962 245 | 475 ± 336 | 33.8 ± 23.8 | 0.30 ± 0.66 | 2.37 ± 1.12 | 0.0176 ± 0.23 | 0.80 ± 0.84 |
| CheML GrammarVAE | 239 206 | 326 ± 59 | 22.7 ± 4.25 | 0.86 ± 0.90 | 2.63 ± 0.93 | 0.0238 ± 0.247 | 1.37 ± 0.92 |
| CheML ChemVAE | 99 273 | 333 ± 63 | 22.9 ± 4.6 | 0.81 ± 0.9 | 2.72 ± 1.01 | 0.0540 ± 0.37 | 1.48 ± 0.95 |
| CheML MolCycleGAN | 60 856 | 330 ± 61 | 23.0 ± 4.4 | 0.94 ± 1.00 | 2.75 ± 0.98 | 0.0400 ± 0.32 | 1.45 ± 0.96 |
| CheML ORGAN | 50 262 | 273 ± 58 | 18.6 ± 4.1 | 0.28 ± 0.57 | 2.20 ± 0.75 | 0.029 ± 0.26 | 0.77 ± 0.72 |
| CheML ORGANIC | 42 609 | 222 ± 60 | 15.8 ± 4.39 | 0.74 ± 0.82 | 1.39 ± 0.58 | 0.0022 ± 0.068 | 0.46 ± 0.55 |
| CheML SSVAE | 42 606 | 355 ± 70 | 24.9 ± 4.8 | 0.0 ± 0.0 | 2.89 ± 0.92 | 0.034 ± 0.26 | 1.18 ± 0.87 |
| CheML CDN | 2415 | 385 ± 413 | 26.8 ± 27.8 | 0.44 ± 0.74 | 2.70 ± 1.28 | 0.167 ± 0.7 | 1.35 ± 1.07 |
| CheML CVAE | 539 | 304 ± 19 | 22.3 ± 1.48 | 0.87 ± 1.26 | 2.19 ± 0.52 | 0.0074 ± 0.122 | 0.69 ± 0.69 |
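
Each "mean ± standard deviation" cell in the table summarizes one per-molecule descriptor over a whole set. The following is a minimal sketch of how such a cell could be computed, not the authors' pipeline. The helper name `summarize`, the toy ring counts, and the use of the population standard deviation are all assumptions made for illustration; the source does not state which estimator was used.

```python
from statistics import mean, pstdev


def summarize(values, digits=2):
    """Format a list of per-molecule descriptor values as 'mean ± std'.

    Assumption: population standard deviation (pstdev). Swap in
    statistics.stdev if the sample estimator is wanted instead.
    """
    m = mean(values)
    s = pstdev(values)
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Hypothetical ring counts for five molecules (toy values, not real data).
ring_counts = [2, 3, 2, 4, 1]
print(summarize(ring_counts))  # prints "2.40 ± 1.02"
```

For sets with millions of molecules the difference between the population and sample estimators is negligible, but for the smallest sets in the table (a few hundred molecules) it can show up in the last reported digit.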