Data sets used to train the five selected machine learning-based docking methods. All five are deep learning (DL)-based and were trained on subsets of the PDBbind 2020 General Set.
| Method | Training and validation set | Number of complexes |
|---|---|---|
| DeepDock | PDBbind 2019 General Set, excluding complexes included in CASF-2016 and those that fail pre-processing | 16 367 |
| DiffDock, EquiBind | PDBbind 2020 General Set, keeping complexes published before 2019 and excluding those with ligands found in the test set | 17 347 |
| TankBind | PDBbind 2020 General Set, keeping complexes published before 2019 and excluding those that fail pre-processing | 18 755 |
| Uni-Mol | PDBbind 2020 General Set, excluding complexes where protein sequence identity (MMseqs2) with CASF-2016 is above 40% and ligand fingerprint similarity is above 80% | 18 404 |
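
The time-based split described for DiffDock and EquiBind (keep complexes published before 2019, drop those whose ligand appears in the test set) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the `Complex` record, its field names, the function `time_split_training_set`, and the toy entries are all hypothetical, and "ligand found in test set" is assumed here to mean an exact match on a ligand identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """Hypothetical record for one protein-ligand complex."""
    pdb_id: str      # placeholder identifier, not a real PDB entry
    year: int        # publication year of the complex
    ligand_id: str   # placeholder ligand identifier


def time_split_training_set(complexes, test_ligands, cutoff_year=2019):
    """Keep complexes published before cutoff_year whose ligand
    does not occur in the test set (exact identifier match, an assumption)."""
    return [
        c for c in complexes
        if c.year < cutoff_year and c.ligand_id not in test_ligands
    ]


# Toy records invented for illustration only.
toy = [
    Complex("aaaa", 2015, "LIG1"),
    Complex("bbbb", 2018, "LIG2"),  # dropped: ligand is in the test set
    Complex("cccc", 2019, "LIG3"),  # dropped: not published before 2019
    Complex("dddd", 2017, "LIG4"),
]

kept = time_split_training_set(toy, test_ligands={"LIG2"})
print([c.pdb_id for c in kept])
```

A similarity-based filter such as the one listed for Uni-Mol would replace the exact-match test with thresholds on sequence identity and fingerprint similarity, which requires external tools and is not shown here.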