ZINC [1] | Database of commercially available compounds together with very simple estimated molecular properties for virtual screening. | 1.4 billion | Inherently biased by currently synthesizable chemical space. Consequently, the molecular shapes have been shown to be highly biased against sphere-like molecules.
QM9 [2] | Electronic properties estimated using density functional theory (DFT) simulations. | 134 thousand | Biased towards small molecules containing only the elements C, H, N, O and F.
PubChemQC [3, 4] | Geometries and electronic properties of molecules with short string representations taken from PubChem. | 221 million | Biased towards small molecules that have been reported in the literature before.
Tox21 [5] | Toxicological properties of molecules with respect to 12 different assays. | 13 thousand | Biased towards environmental compounds and approved drugs.
ToxCast [6] | High-throughput screening and computational data, based on cell assays, for the toxicology of molecules from industry, consumer products and the food industry. | 1.8 thousand | Biased towards molecules used in industry, consumer products and the food industry.
ClinTox [7] | Drugs and drug candidates that made it to clinical trials and were either approved or failed. | 1.5 thousand | Biased towards drugs that made it to clinical trials.
SIDER [8] | Recorded adverse drug reactions of marketed drugs. | 1.4 thousand | Biased towards marketed drugs.
ChEMBL [9] | Bioactive small molecules and their activities extracted from the literature, from clinical trials and from other databases. | 2.0 million | Biased towards compounds for which bioactivity was published in the scientific literature.
DUD-E [10] | Ligand binding affinities against 102 distinct target proteins with both strong and weak binders. | 23 thousand | Biased towards molecules that have been synthesized and evaluated for binding affinity.
AqSolDB [11] | Aqueous solubility data of organic molecules taken from 9 different datasets. | 10 thousand | Biased towards organic molecules with relatively high aqueous solubility.
Olfaction Prediction Challenge [11] | Olfactory perception of organic molecules at different concentrations. | 0.5 thousand | Biased towards small and volatile organic molecules. Results biased by familiarity of smells.
FreeSolv [12] | Experimental and computed hydration free energies of small and neutral molecules. | 0.6 thousand | Biased towards small and neutral molecules whose hydration free energies have been studied in the literature both computationally and experimentally.
ESOL [13] | Experimental aqueous solubility data combining datasets for small molecules from the literature, medium-sized molecules used as pesticides, and larger proprietary compounds from the pharmaceutical industry. | 2.9 thousand | Each subgroup has a different bias because each comes from a different application domain.
Lipophilicity [14, 15] | Experimental n-octanol/water distribution coefficients (buffered at pH 7.4) of organic molecules taken from other databases. | 1.1 thousand | Biased towards molecules with distribution coefficients between −10 and 10.
PubChem Bioassay [16] | Bioactivity outcomes from high-throughput screenings of molecules. | 2.3 million | Biased towards molecules of interest and molecules that are synthesizable.
PDBbind [17, 18] | Experimental binding affinities for biomolecular complexes deposited in the Protein Data Bank (PDB). | 21.4 thousand | Biased towards complexes with available crystal structures.
BBBP [19] | Blood-brain barrier penetration partition coefficients for molecules collected from the literature. | 2.1 thousand | Biased towards molecules studied in the literature for blood-brain barrier penetration.
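
Some of the biases listed above translate directly into applicability-domain checks that can be run before a model trained on one of these datasets is applied to a new molecule. As a minimal sketch, assuming molecules are given as plain molecular formulas without parentheses, isotopes or charges, the element restriction noted for QM9 could be tested as follows. The helper names `elements_in_formula` and `in_qm9_element_domain` are hypothetical and are not part of any tooling shipped with the datasets.

```python
import re

# Element set noted for QM9 in the table: C, H, N, O and F only.
QM9_ELEMENTS = frozenset({"C", "H", "N", "O", "F"})

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
_ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def elements_in_formula(formula: str) -> set:
    """Return the element symbols found in a simple formula such as 'C6H6'.

    Assumption: no parentheses, isotopes or charges; counts are ignored.
    """
    return {symbol for symbol, _count in _ELEMENT_TOKEN.findall(formula)}


def in_qm9_element_domain(formula: str) -> bool:
    """True if the formula is non-empty and uses only C, H, N, O and F."""
    elements = elements_in_formula(formula)
    return bool(elements) and elements <= QM9_ELEMENTS


if __name__ == "__main__":
    for formula in ("C6H6", "C2H5Cl", "CHF3"):
        print(formula, in_qm9_element_domain(formula))
```

A check like this covers only the element restriction. The size bias of QM9 and the shape or provenance biases of the other datasets would need separate criteria, for example a heavy-atom count or a descriptor-based similarity measure.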