2024 Nov 28. Online ahead of print. doi: 10.1039/d4sc04064c

Molecular databases, datasets, repositories and dataset repositories that contain QC data. The table is divided into six categories, describing the type of data resource (database, dataset, repository, dataset repository) and the chemical space covered (organic, organic and inorganic, transition metal complexes). An ‘-sp’ in the ‘Method’ column denotes single-point calculations, often preceded by a geometry relaxation using a less computationally intensive method, such as xTB. Computational methods mentioned: semi-empirical (xTB and PM6/PM7), Hartree–Fock, DFT, TD-DFT, Gaussian-4 theory using second-order Møller–Plesset perturbation theory (G4MP2), complete active space self-consistent field (CASSCF), and coupled-cluster (CC).

Name | Size | Method | Source | Content

Organic molecular databases
CEPDB [101,102] | 2.3M | DFT | Enumerated | Organic compounds for photovoltaics
Materials Project [8–11] | 1.0M | DFT | ICSD & others | 153k bulk materials (main data), and 222k organic molecules, 4k battery materials, 25k battery electrolytes, 20k MOFs, 560k catalyst surfaces, 41k synthesis recipes
OCELOT [103,104] | 56k | DFT | CSD, community | Crystalline organic semiconductors

Organic + inorganic molecular datasets
PubChemQC [21,99,100] | 86M | PM6 + DFT-sp | PubChem | Organic and organometallic molecules containing first-row transition metals
SPICE [105] | 1.1M | DFT | Literature, PubChem, DES370K | Conformations of small molecules, dimers, dipeptide, and solvated amino acids
DES370K [106] | 370k | DFT + CC-sp | Literature | 370k data points of dimer interactions of 392 mostly organic molecules
Alexandria library [107] | 2.7k | DFT | PubChem, ChemSpider | Mostly organic molecules
CCCBDB [108] | 2.2k | DFT | Literature | Gas-phase atoms and small molecules
QuestDB [109,110] | >500 | CC & others | Literature | Vertical excitation energies for small- and medium-sized molecules

Organic molecular datasets
GEOM [111] | 37M | xTB | AICures, QM9 | 37M conformers of 450k organic molecules
Transition1x [112] | 10M | DFT-sp | Grambow et al. [113] | Molecular configurations along the potential energy surface of 11 961 reactions
ANI-1x [114] | 5.0M | DFT | GDB11, ChEMBL, generated | Small molecules
QM7-X [115] | 4.2M | DFT | QM7 | Equilibrium and non-equilibrium structures of small organic molecules
QMugs [116] | 2.0M | xTB + DFT-sp | ChEMBL | 2M conformers of 665k biologically relevant organic molecules
WS22 [117] | 1.2M | DFT | Literature | 1.2M data points of equilibrium and non-equilibrium geometries of 10 species
VQ24 [118] | 836k | DFT & xTB | Generated | Enumerated molecules with up to 5 heavy atoms from C, N, O, F, Si, P, S, Cl, Br
Frag20 [119] | 566k | DFT | ZINC, PubChem | Small organic molecules from ZINC and PubChem
ANI-1ccx [114] | 500k | DFT + CC-sp | ANI-1x | Subset of ANI-1x recomputed with CC-sp
John et al. [120] | 240k | DFT | PubChem | Open- and closed-shell small organic molecules
QM-symex [121,122] | 173k | DFT & TD-DFT | Generated | Includes point group and excited states of small molecules
QM9 [123] | 134k | DFT | GDB-17 | Small organic molecules with up to 9 heavy atoms
Kim et al. [124] | 134k | G4MP2 | QM9 | Refinement of QM9
Narayanan et al. [125] | 133k | G4MP2 | QM9 | Refinement of QM9
FORMED [126] | 117k | xTB, DFT-sp & TD-DFT | CSD | Organic molecules from the CSD
OE62 [127] | 62k | DFT | CSD | Organic molecules from the CSD
MQMspin [128] | 13k | DFT & CASSCF | QM9 | Small organic carbene molecules
HOPV15 [129] | 6.0k | DFT | Literature | 6k conformers of 353 p-type molecules for organic photovoltaics + exp. data
VERDE Materials DB [130,131] | 1.8k | DFT | Generated | Light-responsive π-conjugated organic molecules
HAB79 [132] | 921 | DFT & CASSCF | Literature | Benchmark dataset for DFT

Transition metal complex (TMC) datasets
tmQM [133] | 80k | xTB + DFT-sp | CSD | Monometallic TMCs
tmQMg [134] | 60k | DFT | tmQM | Subset of tmQM with full DFT and graphs from natural bond orders
SC1MC-2022 [135] | 7.0k | Hartree–Fock | Generated | TMCs assembled from ligands
OHLDB [136] | 1.4k | DFT | Enumerated | Homoleptic TMCs
divTMC [137] | 855 | DFT | CSD | Octahedral TMCs assembled from monodentate ligands
16OSTM10 [138] | 160 | DFT | CSD | Open-shell TMCs for conformer benchmark
ROST61 [139] | 61 | CC | Literature | Open-shell TMCs for DFT functional benchmark
MOR41 [140] | 41 | CC | Literature | Closed-shell TMCs for DFT functional benchmark

Organic + inorganic molecular repositories
NOMAD [66–69] | 12M | DFT & others | Submissions, MP, OQMD, AFLOW, and others | 9M bulks, 75k surfaces; 5k 2D, 33k 1D materials, 2.8M organic and inorganic molecules
ioChem-BD [36,37] | 356k | DFT mixed | Submissions | 38k materials and 318k molecules, chemically diverse

Organic + inorganic molecular dataset repositories
QCarchive [141,142] | 47 sets | Mixed | Mixed | Datasets from publications
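
The sizes in the table use abbreviated suffixes (k for thousand, M for million), and the caption's '-sp' marker identifies methods that include a single-point step. As a minimal sketch, assuming one wants to compare entries programmatically, the Python below converts a few size strings copied from the table into integer counts and flags the methods that contain '-sp'. The helper names `parse_size` and `uses_single_point` and the `ROWS` list are illustrative assumptions and are not part of any listed resource. A size of '>500' is treated as its lower bound, and entries such as '47 sets' are deliberately not handled.

```python
import re

# Size suffixes as used in the table: k/K = thousand, M = million.
_MULTIPLIERS = {"": 1, "k": 1_000, "K": 1_000, "M": 1_000_000}


def parse_size(text: str) -> int:
    """Convert a size such as '2.3M', '370K', '>500' or '921' to an integer.

    A leading '>' is dropped, so the result is a lower bound in that case.
    """
    match = re.fullmatch(r">?\s*(\d+(?:\.\d+)?)\s*([kKM]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    return round(float(number) * _MULTIPLIERS[suffix])


def uses_single_point(method: str) -> bool:
    """Return True if the method string carries the '-sp' marker."""
    return "-sp" in method


# (name, size, method) triples copied from a handful of table rows.
ROWS = [
    ("PubChemQC", "86M", "PM6 + DFT-sp"),
    ("GEOM", "37M", "xTB"),
    ("DES370K", "370K", "DFT + CC-sp"),
    ("tmQM", "80k", "xTB + DFT-sp"),
    ("QuestDB", ">500", "CC & others"),
    ("MOR41", "41", "CC"),
]

# Rank the sample rows from largest to smallest.
ranked = sorted(ROWS, key=lambda row: parse_size(row[1]), reverse=True)

# Names of the sample entries whose method includes a single-point step.
sp_names = [name for name, _, method in ROWS if uses_single_point(method)]

if __name__ == "__main__":
    for name, size, method in ranked:
        print(f"{name:10s} {parse_size(size):>12,d}  {method}")
    print("single-point:", ", ".join(sp_names))
```

Sizes are kept as strings in `ROWS` so that the sample stays a literal copy of the table. Conversion happens only at comparison time.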