2021 Oct 28;12:739684. doi: 10.3389/fmicb.2021.739684

TABLE 1.

Representative biochemical datasets used in deep learning studies.

| Dataset | Description | URL | References |
| --- | --- | --- | --- |
| ZINC | ZINC database contains over 230 million compounds. | http://zinc.docking.org/ | Bai et al., 2020; Choi et al., 2020; Stokes et al., 2020; Ton et al., 2020 |
| ChEMBL | ChEMBL (version 27) chemical database contains over 1.9 million specific compounds. | https://www.ebi.ac.uk/chembl/ | Stokes et al., 2020 |
| Drug Target Commons (DTC) | DTC crowdsourcing database contains 204,901 annotated bioactivity data points among 4,276 compounds and 1,007 specific protein targets. | https://drugtargetcommons.fimm.fi/ | Beck et al., 2020 |
| BindingDB | BindingDB database of measured binding affinities contains 2,061,017 binding data points for 8,160 protein targets and 907,259 small molecules. | http://www.bindingdb.org/bind/index.jsp | Beck et al., 2020 |
| DrugBank | DrugBank pharmaceutical database contains detailed molecular information about drugs, their mechanisms, interactions, and targets. | https://go.drugbank.com/releases/latest | Choi et al., 2020; Zeng et al., 2020 |
| PDBbind | PDBbind database provides binding data for 21,382 biomolecular complexes, including protein-ligand (17,679), nucleic acid-ligand (136), protein-nucleic acid (973), and protein-protein (2,594) complexes. | http://www.pdbbind.org.cn | Bai et al., 2020 |