2022 Feb 7;13(13):3661–3673. doi: 10.1039/d1sc06946b

Benchmark results on the CASF-2016 and CSAR NRC-HiQ datasets. R and ρ denote the Pearson correlation coefficient and Spearman's rank correlation coefficient, respectively. The top 1 score was used for the docking success rate, and the top 1% cutoff was used for the average enrichment factor (EF) and the screening success rate. ΔVinaRF20 (ref. 71) was excluded from the comparison because it was fine-tuned on the PDBbind 2017 data, which includes ~50% of the data in the CASF-2016 test set. Numbers in parentheses in the CSAR NRC-HiQ columns are for test sets that exclude targets with protein sequence similarity higher than 60% to the training set. The results of the 3D CNN-based model, the 3D GNN-based model, and PIGNet were averaged over 4-fold models. The highest value in each column is shown in bold.

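| Model | CASF-2016 docking success rate | CASF-2016 screening average EF | CASF-2016 screening success rate | CASF-2016 scoring R | CASF-2016 ranking ρ | CSAR NRC-HiQ set 1 R | CSAR NRC-HiQ set 2 R |
|---|---|---|---|---|---|---|---|
| AutoDock Vina (ref. 8) | 84.6% | 7.7 | 29.8% | 0.604 | 0.528 | - | - |
| GlideScore-SP (ref. 13) | 84.6% | 11.4 | 36.8% | 0.513 | 0.419 | - | - |
| ChemPLP@GOLD (ref. 15) | 83.2% | 11.9 | 35.1% | 0.614 | 0.633 | - | - |
| KDEEP (ref. 39) | 29.1% | - | - | 0.701 | 0.528 | - | - |
| AK-Score (single) (ref. 39) | 34.9% | - | - | 0.719 | 0.572 | - | - |
| AK-Score (ensemble) (ref. 39) | 36.0% | - | - | 0.812 | 0.67 | - | - |
| AEScore (ref. 45) | 35.8% | - | - | 0.800 | 0.640 | - | - |
| Δ-AEScore (ref. 45) | 85.6% | 6.16 | 19.3% | 0.790 | 0.590 | - | - |
| 3D CNN-based model | 48.2% | 3.9 | 10.1% | 0.687 | 0.580 | 0.738 (0.756) | 0.804 (0.837) |
| 3D GNN-based model | 67.7% | 10.2 | 28.5% | 0.667 | 0.604 | 0.514 (0.566) | 0.627 (0.723) |
| PIGNet (single) | 85.8% | 18.5 | 50.0% | 0.749 | 0.668 | 0.774 (0.798) | 0.799 (0.863) |
| PIGNet (ensemble) | 87.0% | 19.6 | 55.4% | 0.761 | 0.682 | 0.768 (0.798) | 0.800 (0.857) |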
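
The metrics named in the caption (Pearson's R, Spearman's ρ, and the enrichment factor at a top 1% cutoff) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names are hypothetical, the sketch assumes that a higher score means a stronger predicted binder, and it uses the common definition of EF (hit rate in the selected top fraction divided by the hit rate in the whole library), which may differ in rounding details from the benchmark's own scripts.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def enrichment_factor(scores, is_binder, fraction=0.01):
    """EF at the given top fraction for one target.

    Assumption: higher score = better, and at least one candidate is selected.
    """
    n = len(scores)
    n_top = max(1, int(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_in_top = sum(is_binder[i] for i in ranked[:n_top])
    total_binders = sum(is_binder)
    # Hit rate in the selection divided by hit rate in the whole library.
    return (hits_in_top / n_top) / (total_binders / n)
```

An average EF such as the one reported in the table would then be the mean of `enrichment_factor` over all targets in the screening set, while R and ρ would be computed between predicted and experimental binding affinities.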