2024 Nov 11;16:126. doi: 10.1186/s13321-024-00923-z

Table 2.

Summary statistics of the different datasets analysed in this study

| Dataset | Type | # Structures | # Sites | # Ligands | Overlap (%) | Methods |
|---|---|---|---|---|---|---|
| LIGYSIS | NEW | 3448 | 8244 | 65,116+ | | |
| LIGYSISNI | NEW | 2275 | 4572 | 38,595 | | |
| sc-PDBFULL | TRAIN | 17,594+ | 17,594+ | 17,594 | 801 (9.7) | VN-EGNN, GrASP, PUResNet, DeepPocket |
| bMOADSUB | TRAIN | 5899 | 11,184 | 11,184 | 606 (7.6) | IF-SitePred |
| CHEN11 | TRAIN | 244 | 479 | 479 | 40+ (0.5) | PRANK, P2Rank |
| PDBbindREF | TEST | 5316 | 5316 | 5316 | 310 (3.8) | VN-EGNN |
| SC6K | TEST | 6147 | 6147 | 6147 | 259 (3.1) | DeepPocket |
| HOLO4K | TEST | 4009 | 10,175 | 10,175 | 207 (2.5) | ALL* |
| COACH420 | TEST | 413 | 624 | 624 | 41 (0.5) | VN-EGNN, GrASP, DeepPocket, P2Rank, PUResNet |
| JOINED | TEST | 557 | 752 | 752 | 110 (1.3) | PRANK |

LIGYSIS is our reference dataset, and LIGYSISNI is its subset with no ion (NI) ligand binding sites. sc-PDBFULL, bMOADSUB and CHEN11 constitute the training datasets, whereas PDBbindREF, SC6K, HOLO4K, COACH420 and JOINED are test sets. # Structures, # Sites and # Ligands give the number of PDB structures, ligand sites and total ligands in each dataset. Note that for LIGYSIS and LIGYSISNI, 3448 and 2775 are the numbers of human structural segments considered, each represented by a single chain. For each segment, all biologically relevant ligand-binding structures were considered: N = 23,321 (LIGYSIS) and N = 19,012 (LIGYSISNI). For LIGYSIS, the number of ligands, or protein–ligand complexes, differs from the number of sites because data from multiple structures of the same protein are aggregated into unique sites, i.e., a LIGYSIS site often includes multiple ligands. Overlap is the number of LIGYSIS binding sites represented by at least one protein–ligand complex in a given dataset; the percentage relative to LIGYSIS is given in parentheses. Methods lists the ligand site predictors that use each dataset for training or testing. Only the original version of each dataset is considered in the analysis, e.g., HOLO4K is analysed, but not HOLO4KMlig, HOLO4KMlig+, HAP or HAP-small. The same goes for the Mlig and Mlig+ versions of COACH420, sc-PDBSUB and sc-PDBRICH. ALL* represents all the methods compared in this work except for PRANK, fpocket, PocketFinder+, Ligsite+ and Surfnet+. For # Structures, # Sites and # Ligands, the highest values are marked with a bold “+” superscript and the lowest with “”. The convention is reversed for Overlap.
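
As a worked illustration of the Overlap column, the short Python sketch below recomputes a few of the percentages. It assumes that the denominator is the total number of LIGYSIS sites (8244), which the caption implies but does not state explicitly. The function name `overlap_percentage` and the variable names are hypothetical and are not taken from the study's code.

```python
# Minimal sketch, assuming Overlap (%) = overlap sites / total LIGYSIS sites.
# All names here are illustrative; the counts are copied from Table 2.

LIGYSIS_SITES = 8244     # "# Sites" for LIGYSIS
LIGYSIS_LIGANDS = 65116  # "# Ligands" for LIGYSIS


def overlap_percentage(overlap_sites: int, reference_sites: int = LIGYSIS_SITES) -> float:
    """Return the share of reference sites covered, in percent, to one decimal."""
    if reference_sites <= 0:
        raise ValueError("reference_sites must be positive")
    return round(100 * overlap_sites / reference_sites, 1)


# Overlap counts for three of the datasets in Table 2
overlaps = {"sc-PDBFULL": 801, "CHEN11": 40, "COACH420": 41}
percentages = {name: overlap_percentage(n) for name, n in overlaps.items()}

for name, pct in percentages.items():
    print(f"{name}: {overlaps[name]} LIGYSIS sites covered ({pct}%)")

# Aggregating ligands from several structures into unique sites means that
# the mean number of ligands per LIGYSIS site is well above one.
ligands_per_site = LIGYSIS_LIGANDS / LIGYSIS_SITES
print(f"Mean ligands per LIGYSIS site: {ligands_per_site:.1f}")
```

Under this assumption the sketch gives 9.7% for sc-PDBFULL and 0.5% for both CHEN11 and COACH420, in line with the values in parentheses in the table.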