2023 Dec 13;15(9):3130–3139. doi: 10.1039/d3sc04185a

Data sets used to train the five selected machine learning-based docking methods. All five DL-based methods were trained on subsets of the PDBbind General Set (the 2019 release for DeepDock, the 2020 release for the others).

| Method | Training and validation set | Number of complexes |
| --- | --- | --- |
| DeepDock | PDBbind 2019 General Set, excluding complexes included in CASF-2016 and those that fail pre-processing | 16 367 |
| DiffDock, EquiBind | PDBbind 2020 General Set, keeping complexes published before 2019 and excluding those with ligands found in the test set | 17 347 |
| TankBind | PDBbind 2020 General Set, keeping complexes published before 2019 and excluding those that fail pre-processing | 18 755 |
| Uni-Mol | PDBbind 2020 General Set, excluding complexes where protein sequence identity (MMseqs2) with CASF-2016 is above 40% and ligand fingerprint similarity is above 80% | 18 404 |
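
The two kinds of filtering rule listed above (a time-based cutoff combined with ligand exclusion, and a joint sequence-identity and ligand-similarity cutoff) can be illustrated with a minimal Python sketch. This is not the code used by any of the five methods. The `Complex` record, its field names, the function names, and the toy entries are all hypothetical, and the similarity values are assumed to have been precomputed elsewhere (for example with MMseqs2 and a fingerprint library). The sketch also assumes that a complex is dropped only when both thresholds are exceeded, which is one reading of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """Hypothetical record for one protein-ligand complex."""
    pdb_id: str
    release_year: int
    ligand_id: str
    seq_identity_to_test: float       # max identity to any test protein, 0..1
    ligand_similarity_to_test: float  # max fingerprint similarity, 0..1


def temporal_ligand_filter(complexes, test_ligands, cutoff_year=2019):
    """Keep complexes released before cutoff_year whose ligand is not in the test set."""
    return [
        c for c in complexes
        if c.release_year < cutoff_year and c.ligand_id not in test_ligands
    ]


def similarity_filter(complexes, max_identity=0.40, max_similarity=0.80):
    """Drop complexes that exceed BOTH thresholds (assumed reading of the rule)."""
    return [
        c for c in complexes
        if not (
            c.seq_identity_to_test > max_identity
            and c.ligand_similarity_to_test > max_similarity
        )
    ]


# Made-up toy entries, not real PDBbind records.
toy = [
    Complex("toy1", 2015, "LIG_A", 0.20, 0.10),
    Complex("toy2", 2020, "LIG_B", 0.20, 0.10),  # released after the cutoff
    Complex("toy3", 2017, "LIG_C", 0.95, 0.90),  # close to a test complex
    Complex("toy4", 2018, "LIG_D", 0.95, 0.30),  # similar protein, different ligand
]
test_ligands = {"LIG_C"}

time_split = temporal_ligand_filter(toy, test_ligands)
sim_split = similarity_filter(toy)

print([c.pdb_id for c in time_split])  # ['toy1', 'toy4']
print([c.pdb_id for c in sim_split])   # ['toy1', 'toy2', 'toy4']
```

The toy output shows that the two rules exclude different complexes: the time-based rule removes recent entries regardless of similarity, while the similarity rule keeps recent entries but removes near-duplicates of test complexes.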