2024 Jun 21;16:73. doi: 10.1186/s13321-024-00863-8

Table 2.

Datasets used in the combined dataset

Dataset Number of SMILES
ZINC 15 [31] 5M
QM9 [32, 33] 134k
ZINC 250k [34] 250k
RedDB [35] 31k
OPV [36] 91k
PubchemQC 2017/2020 [37, 38] 5.3M
CEP [39] subset [40] 20k
ChEMBL [41–44] 2.3M
Combined (OrganiX13) 13.1M (Total) / 12.5M (After removing duplicates)
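
As a rough consistency check on the totals in Table 2, the per-dataset counts can be summed and compared with the reported combined size. The sketch below uses only the rounded figures listed in the table, so the result is approximate; the variable names are illustrative and not taken from the paper.

```python
# Rounded SMILES counts as listed in Table 2 (approximate by construction).
counts = {
    "ZINC 15": 5_000_000,
    "QM9": 134_000,
    "ZINC 250k": 250_000,
    "RedDB": 31_000,
    "OPV": 91_000,
    "PubchemQC 2017/2020": 5_300_000,
    "CEP subset": 20_000,
    "ChEMBL": 2_300_000,
}

# Sum of the rounded per-dataset counts.
total = sum(counts.values())

# Totals as reported in the last row of the table.
reported_total = 13_100_000
reported_unique = 12_500_000

# Entries dropped by duplicate removal, based on the reported totals.
removed = reported_total - reported_unique
removed_fraction = removed / reported_total

print(f"Sum of rounded counts: {total:,}")
print(f"Removed as duplicates: {removed:,} ({removed_fraction:.1%})")
```

The summed counts agree with the reported 13.1M total to within rounding, and the reported totals imply that roughly 0.6M entries were removed as duplicates.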