2021 Oct 28;12:739684. doi: 10.3389/fmicb.2021.739684

TABLE 1.

Representative biochemical datasets used in deep learning studies.

Dataset	Description	URL	References
ZINC	ZINC database contains over 230 million compounds.	http://zinc.docking.org/	Bai et al., 2020; Choi et al., 2020; Stokes et al., 2020; Ton et al., 2020
ChEMBL	The ChEMBL chemical database (version 27) contains over 1.9 million distinct compounds.	https://www.ebi.ac.uk/chembl/	Stokes et al., 2020
Drug Target Commons (DTC)	The crowdsourced DTC database contains 204,901 annotated bioactivity data points covering 4,276 compounds and 1,007 distinct protein targets.	https://drugtargetcommons.fimm.fi/	Beck et al., 2020
BindingDB	BindingDB, a database of measured binding affinities, contains 2,061,017 binding data points for 8,160 protein targets and 907,259 small molecules.	http://www.bindingdb.org/bind/index.jsp	Beck et al., 2020
DrugBank	The DrugBank pharmaceutical database contains detailed molecular information about drugs and their mechanisms, interactions, and targets.	https://go.drugbank.com/releases/latest	Choi et al., 2020; Zeng et al., 2020
PDBbind	PDBbind database provides binding data of 21,382 biomolecular complexes, including protein-ligand (17,679), nucleic acid-ligand (136), protein-nucleic acid (973), and protein-protein complexes (2,594).	http://www.pdbbind.org.cn	Bai et al., 2020
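
The PDBbind row reports a total alongside a per-category breakdown, so the figures can be cross-checked with simple arithmetic. The following is a minimal sketch of that check; the counts are copied from the table, while the variable names are illustrative assumptions and not part of any PDBbind tooling.

```python
# Complex counts per category, copied from the PDBbind row of Table 1.
# The dictionary and variable names are hypothetical, chosen for illustration.
pdbbind_counts = {
    "protein-ligand": 17_679,
    "nucleic acid-ligand": 136,
    "protein-nucleic acid": 973,
    "protein-protein": 2_594,
}
reported_total = 21_382  # total number of biomolecular complexes stated in the table

# Sum the categories and compare against the reported total.
computed_total = sum(pdbbind_counts.values())
print("breakdown matches reported total:", computed_total == reported_total)

# Share of each category, useful when judging class imbalance before training.
shares = {name: count / computed_total for name, count in pdbbind_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The category counts sum to the reported 21,382 complexes, and the shares show that protein-ligand complexes make up the large majority of the entries.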