Benchmark results on the CASF-2016 and CSAR NRC-HiQ datasets. R and ρ denote the Pearson correlation coefficient and Spearman's rank correlation coefficient, respectively. The top-scoring pose was used for the docking success rate, and the top 1% cutoff was used for the average enrichment factor (EF) and the screening success rate. ΔVinaRF20 (ref. 71) was excluded from the comparison because it was fine-tuned on the PDBbind 2017 data, which includes ∼50% of the data in the CASF-2016 test set. Values in parentheses for the CSAR NRC-HiQ benchmarks are for test sets from which targets with more than 60% protein sequence similarity to the training set were excluded. The results of the 3D CNN-based model, the 3D GNN-based model, and the PIGNet models were averaged over 4-fold models. The highest value in each column is shown in bold.
Model | CASF-2016 docking success rate | CASF-2016 screening average EF | CASF-2016 screening success rate | CASF-2016 scoring R | CASF-2016 ranking ρ | CSAR NRC-HiQ set 1 R | CSAR NRC-HiQ set 2 R |
---|---|---|---|---|---|---|---|
AutoDock Vina8 | 84.6% | 7.7 | 29.8% | 0.604 | 0.528 | — | — |
GlideScore-SP13 | 84.6% | 11.4 | 36.8% | 0.513 | 0.419 | — | — |
ChemPLP@GOLD15 | 83.2% | 11.9 | 35.1% | 0.614 | 0.633 | — | — |
KDEEP39 | 29.1% | — | — | 0.701 | 0.528 | — | — |
AK-Score (single)39 | 34.9% | — | — | 0.719 | 0.572 | — | — |
AK-Score (ensemble)39 | 36.0% | — | — | 0.812 | 0.670 | — | — |
AEScore45 | 35.8% | — | — | 0.800 | 0.640 | — | — |
Δ-AEScore45 | 85.6% | 6.16 | 19.3% | 0.790 | 0.590 | — | — |
3D CNN-based model | 48.2% | 3.9 | 10.1% | 0.687 | 0.580 | 0.738(0.756) | 0.804(0.837) |
3D GNN-based model | 67.7% | 10.2 | 28.5% | 0.667 | 0.604 | 0.514(0.566) | 0.627(0.723) |
PIGNet (single) | 85.8% | 18.5 | 50.0% | 0.749 | 0.668 | 0.774(0.798) | 0.799(0.863) |
PIGNet (ensemble) | 87.0% | 19.6 | 55.4% | 0.761 | 0.682 | 0.768(0.798) | 0.800(0.857) |
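
The correlation and enrichment metrics named in the caption can be illustrated with a minimal sketch. It is not the benchmark's official evaluation code: the function names, the `higher_is_better` flag, and the toy inputs are assumptions made for illustration, and the EF is taken to be the number of true binders found in the top fraction divided by the total number of true binders times that fraction, as is commonly done for CASF-style screening tests.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def enrichment_factor(scores, is_binder, fraction=0.01, higher_is_better=True):
    """EF at a given top fraction (assumed definition, see the lead-in).

    Set higher_is_better=False when scores are energies, where lower
    values indicate stronger predicted binding.
    """
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, is_binder), key=lambda p: p[0],
                    reverse=higher_is_better)
    binders_in_top = sum(1 for _, flag in ranked[:n_top] if flag)
    total_binders = sum(1 for flag in is_binder if flag)
    return binders_in_top / (total_binders * fraction)


if __name__ == "__main__":
    # Toy data only; these are not values from the benchmark.
    predicted = [5.1, 6.3, 7.0, 8.2, 4.4]
    measured = [5.0, 6.0, 7.5, 8.0, 4.9]
    print("R   =", pearson_r(predicted, measured))
    print("rho =", spearman_rho(predicted, measured))

    toy_scores = list(range(100))
    toy_labels = [i >= 95 for i in range(100)]
    print("EF1% =", enrichment_factor(toy_scores, toy_labels))
```

The rank-based coefficient depends only on the ordering of the predictions, which is why the caption reports it for the ranking test, while the Pearson coefficient is reported for the scoring test.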