Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2011 Nov 15.
Published in final edited form as: Proteins. 2010 Nov 15;78(15):3111–3114. doi: 10.1002/prot.22830

Protein-Protein Docking Benchmark Version 4.0

Howook Hwang 1, Thom Vreven 1, Joël Janin 2, Zhiping Weng 1,*
PMCID: PMC2958056  NIHMSID: NIHMS228065  PMID: 20806234

Abstract

We updated our protein-protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are non-redundant at the family-family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Benchmark 4.0 thus provides 176 unbound-unbound cases that can be used for protein-protein docking method development and assessment. 17 of the newly added cases are enzyme-inhibitor complexes, and we found no new antigen-antibody complexes. Classifying the new cases according to expected difficulty for protein-protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/

Keywords: protein-protein docking, protein complexes, protein-protein interactions, complex structure

Introduction

During the last decade, the computational protein-protein docking field has advanced considerably. In part, this is due to the efforts of making algorithms available to the community through web servers and/or downloadable packages18, the community-wide CAPRI experiment9, and the development of publically available benchmarks of protein-protein complexes.10,11

A protein-protein docking benchmark provides the community with a set of non-redundant protein-protein complexes for which the complex structure and the constituent unbound structures are availabe. A benchmarks forms a subset of the Protein Data Bank (PDB)12, and provides a standard dataset that can be used for systematic comparison of docking algorithms. Quantity and diversity of interactions covered in a benchmark can be improved by tracking updates in PDB.

Eight years ago we introduced the first protein-protein docking benchmark,10 and we updated twice, in 2005 (Benchmark 2.0) and 2008 (Benchmark 3.0).13,14 Recently Kastritis and Bonvin collected experimentally measured protein-protein binding affinities (Kd’s) of 81 test cases in Benchmark 3.0.15 Since the last release, the number of entries in the PDB has increased by more than 13,000. This enables us to release a new update to the Benchmark.

Materials and methods

Data collection

We collected candidate structures from the PDB in a semiautomatic way with the same resolution cutoffs for X-ray structures (3.25 Å) and chain length (minimum of 30 residues) as described previously.10,13,14 Unlike the previous release, we now also consider structures determined with nuclear magnetic resonance (NMR) for the unbound forms of the proteins. We still excluded NMR structures for complexes, to preclude the possibility that they were generated with aid of docking algorithms. We used the biological assembly information from the PDB to distinguish crystal contacts from biological complexes. This initial pass yielded 47,767 unbound structures and 8,654 complex structures that represent hetero complexes of at least 2 interacting chains. The unbound forms of both binding partners were available for 1,667 complex structures, and we used the Structural Classification of Proteins (SCOP)16 database (version 1.75) to check this set for redundancy at the family level. Two complexes were deemed redundant if both proteins in one complex were in the same SCOP families as the two proteins in the other complex, respectively. This yielded 109 complexes that were non-redundant with the complexes in the previous release of the Benchmark and amongst themselves. (PDB entries without SCOP unique identifier sunid17 were excluded from the bound candidate list to remove possible redundancy.) Finally, we used literature information to eliminate obligate complexes18, which further reduced the list to 52 complexes.

When we found multiple candidates for an unbound structure, we selected one structure based on a combination of several considerations: highest sequence similarity with the bound structure, highest resolution, and lowest number of missing residues in protein-protein interface area. For an ensemble of multiple candidate entries for NMR structures, we selected the model that had the lowest interface RMSD (I-RMSD; defined below) with the bound form. The final structure files that are on the benchmark website include cofactors that were present in the original PDB files, and in the case of an NMR structure, all the models that were provided in the original file.

Classification

As done for the previous releases of the Benchmark, we classify the new entries according to expected difficulty for protein-protein docking algorithms, based on the structural difference between the bound and the unbound forms of the binding partners:14

Rigid body:

I-RMSD1.5andfnon-nat0.4

Medium difficulty:

[1.5<I-RMSD22]or[I-RMSD1.5andfnon-nat>0.4]

Difficult:

I-RMSD>2.2

We define I-RMSD as the root-mean-square distance between the unbound and the bound structures, superposed onto each other, calculated using the Cα atoms of the interface residues of both binding partners. In line with Mendez et al.19, fnat and fnon-nat are the fractions of native residue contacts and non-native residue contacts, respectively, of the superposed unbound structures.

Results and discussion

The 52 new cases are listed in Table 1. The entire updated Benchmark is reported in Table S1 in Supplementary Materials. 1OYV is a 1:2 complex of a two-headed inhibitor and subtilisin.20 We split this complex into two cases for the Benchmark that represent the interaction between chain Aof subtilisin and chain I (inhibitor) and the interaction between chain B of subtilisin and chain I, respectively. In addition to the aforementioned properties, the tables also report the change in accessible surface area (ASA) upon complexation, which is a measure for the size of the interface between the binding partners.

Table 1.

New cases in the protein-protein docking Benchmark 4.0.

Complex Cat.a PDB ID 1 Protein 1 PDB ID 2b Protein 2 RMSD (Å) DASA (Å2)c
Rigid Body (33)
1CLV_A:I E 1JAE_A α-amylase 1QFD_A(1) α-amylase inhibitor 0.86 2086
1FLE_E:I E 9EST_A Elastase 2REL_A(4) Elafin 1.02 1762
1GL1_A:I E 1K2I_1 α-chymotrypsin 1PMC_A(6) Protease inhibitor LCMI II 1.21 1590
1GXD_A:C E 1CK7_A proMMP2 type IV collagenase 1BR9_A Metalloproteinase inhibitor 2 1.39 2445
1JTG_B:A E 3GMU_B β-Lactamase inhibitory protein 1ZG4_A β-lactamase TEM-1 0.49 2599
1OC0_A:B E 1B3K_A Plasminogen activator inhibitor-1 2JQ8_A(4) Vitronectin Somatomedin B domain 1 1312
1OYV_A:I E 1SCD_A Subtilisin Carlsberg 1PJU_A Two-headed tomato inhibitor-II 0.7 1929
1OYV_B:I E 1SCD_A Subtilisin Carlsberg 1PJU_A Two-headed tomato inhibitor-II 0.5 1279
2ABZ_B:E E 3I1U_A Carboxypeptidase A1 1ZFI_A(1) Leech carboxypeptidase inhibitor 0.9 1443
2J0T_A:D E 966C_A MMP1 Interstitial collagenase 1D2B_A(20) Metalloproteinase inhibitor 1 1.23 1476
2OUL_A:B E 3BPF_A Falcipain 2 2NNR_A Chagasin 0.53 1932
3SGQ_E:I E 2QA9_E Streptogrisin B 2OVO_A Ovomucoid inhibitor third domain 0.39 1210
1FCC_AB:C O 1FC1_AB Fc domain of IgG1 MO6 2IGG_A(3) Strep. protein G C2 fragment 0.93 1354
1FFW_A:B O 3CHY_A Chemotaxis protein CheY 1FWP_A Chemotaxis protein CheA 1.43 1166
1H9D_A:B O 1EAN_A Runx1 domain of CBF α1 1ILF_A(1) Dimerisation domain of CBF-β 1.32 2121
1HCF_AB:X O 1B98_AM Neurotrophin-4 1WWB_X TrkB-d5 growth factors receptor 0.88 2135
1JWH_CD:A O 3EED_AB Casein kinase II β chain 3C13_A Casein kinase II α chain 1.27 1451
1OFU_XY:A O 1OFT_AB SulA (PA3008) 2VAW_A Cell division protein FtsZ 1.1 1583
1PVH_A:B O 1BQU_A IL6 receptor βchain D2-D3 domains 1EMR_A Leukemia inhibitory factor 0.34 1403
1RV6_VW:X O 1FZV_AB PIGF receptor binding domain 1QSZ_A Flt1 protein domain 2 1.09 1625
1US7_A:B O 2FXS_A Heat shock protein 82 N-ter domain 2W0G_A HSP 90 co-chaperone CDC 37 C-ter domain 1.06 1095
1WDW_BD:A O 1V8Z_AB Tryptophan synthase β chain 1 1GEQ_A Tryptophan synthase α chain 1.29 3147
1XU1_ABD:T O 1U5Y_ABD TNF domain of APRIL 1XUT_A(11) TNF receptor superfamily member 13B TACI CRD2 domain 1.3 1696
1ZHH_A:B O 1JX6_A Autoinducer 2-binding periplasmic protein LuxP 2HJE_A Autoinducer 2 sensor kinase/phosphatase LuxQ 1.31 2189
2A5T_A:B O 1Y20_A NMDA receptor R1-4A subunit ligand-binding core 2A5S_A NMDA receptor R2A subunit ligand-binding core 1.28 1892
2A9K_A:B O 1U90_A Ras-related protein Ral-A 2C8B_X Mono-ADP-ribosyltransferase C3 0.85 1750
2B4J_AB:C O 1BIZ_AB Integrase (HIV-1) 1Z9E_A(1) PC4 and SFRS1 interacting protein 0.99 1273
2FJU_B:A O 2ZKM_X Phospholipase β 2 1MH1_A Rac GTPase 1.04 1245
2G77_A:B O 1FKM_A GTPase-activating protein Gyp1 1Z06_A Ras-related protein Rab-33B 1.75 2524
2OOR_AB:C O 1L7E_AB NAD(P) transhydrogenase subunit α part 1 1E3T_A NAD(P) transhydrogenase subunit β 1.42 2065
2VDB_A:B O 3CX9_A Serum albumin 2J5Y_A Peptostreptococcal albumin-binding protein GA module 0.47 1797
3BP8_AB:C O 1Z6R_AB Mlc transcription regulator 3BP3_A PTS glucose-specific enzyme EIICB 0.45 1390
3D5S_A:C O 1C3D_A Complement C3d fragment 2GOM_A Fibrinogen-binding protein C-ter domain 0.56 1620
Medium Difficult (11)
1JIW_P:I E 1AKL_A Alkaline metalloproteinase 2RN4_A(1) Proteinase inhibitor 2.07 1997
4CPA_A:I E 8CPA_A Carboxypeptidase A 1H20_A(9) Potato carboxypeptidase inhibitor 1.97 1175
1LFD_B:A O 5P21_A Ras 1LXD_A RalGDS Ras-interacting domain 1.79 1167
1MQ8_A:B O 1IAM_A ICAM-1 domains 1-2 1MQ9_A Integrin α-L I domain 1.76 1252
1R6Q_A:C O 1R6C_X Clp protease subunit ClpA 2W9R_A Clp protease adaptor protein ClpS 1.67 1651
1SYX_A:B O 1QGV_A Spliceosomal U5 15 kDa protein 1L2Z_A(1) CD2 receptor binding protein 2 C- ter fragment 1.64 1292
2AYO_A:B O 2AYN_A Ubiquitin carboxyl-terminal hydrolase 14 2FCN_A Ubiquitin 1.62 3026
2J7P_A:D O 1NG1_A SRP GTPase Ffh 2IYL_D Cell division protein FtsY 1.93 3008
2OZA_B:A O 3HEC_A MAP kinase 14 3FYK_X MAP kinase-activated protein kinase 2 1.89 6247
2Z0E_A:B O 2D1I_A Cysteine protease Atg4B 1V49_A(1) Microtubule-associated proteins 1A/1B light chain 3B 2.15 2477
3CPH_G:A O 3CPI_G Ras-related protein Sec4 1G16_A Rab GDP-dissociation inhibitor 2.12 1684
Difficult (8)
1F6M_A:C E 1CL0_A Thioredoxin reductase 2TIR_A Thioredoxin 1 4.9 1821
1ZLI_A:B E 1KWM_A Carboxypeptidase B 2JTO_A(6) Tick carboxypeptidase inhibitor 2.53 2083
2O3B_A:B E 1ZM8_A NucA nuclease 1J57_A NuiA nuclease inhibitor 3.13 1675
1JK9_B:A O 1QUP_A CCS metallochaperone 2JCW_A SOD1 superoxide dismutase 4.87 2130
1JZD_AB:C O 1JZO_AB DsbC disulfide bond isomerase 1JPE_A DsbD disulfide bond isomerase 2.71 2026
1ZM4_A:B O 1N0V_C Elongation factor 2 1XK9_A Diphtheria toxin A catalytic domain 2.94 1554
2I9B_E:A O 1YWH_A Urokinase plasminogen activator surface receptor 2I9A_A Urokinase-type plasminogen activator 3.79 2370
2IDO_A:B O 1J54_A DNA polymerase III ε exonuclease domain 1SE7_A(1) HOT protein (P1 phage) 2.79 1953
a

Complex category labels: E = Enzyme/Inhibitor or Enzyme/Substrate, O = Other.

b

NMR model numbers from are shown in parenthesis.

c

Change in accessible surface area (ΔASA) upon complex formation, defined as the ASA of Protein 1 plus the ASA of Protein 2 minus the ASA of the Complex. ASA is calculated using NACCESS.

Benchmark 4.0 includes 121 rigid body cases (33 new), 30 cases of medium difficulty (11 new), and 25 difficult cases (8 new). According to biochemical function, we have 52 enzyme-inhibitor (17 new), 25 antibody-antigen, and 99 complexes with other function (35 new). We did not find new antibody-antigen complexes. In this update of the Benchmark, we included 16 cases that involve NMR unbound structures. Among them, 11 cases are classified as rigid body, 4 cases of medium difficulty, and 1 case as difficult. Thus the expected difficulty for docking algorithms using NMR structures in the benchmark is similar to the expected difficulty using X-ray structures. If we would consider NMR structures for the bound complexes, we would have included seven more cases (1GGR, 1J6T, 1O2F, 1P9D, 1UR6, 2ODG, 3EZA). Although one can argue that exclusion of complex NMR structures from the Benchmark should be decided on a case-by-case basis, we decided to simply leave all out since inclusion would only lead to a small increase of the Benchmark.

Table 2 summarizes the average I-RMSD, fnat and fnon-nat for the different classes of docking difficulty. The numbers in Table 2 indicate that the new cases in Benchmark 4.0 (in parentheses) have generally higher I-RMSD for rigid body cases and cases of medium difficulty, which predicts the new test cases to be more challenging for computational docking. Also, the fraction of rigid body cases in the new cases is 0.63, somewhat lower than the 0.71 in Benchmark 3.0. Thus the new cases are expected to be more difficult for protein-protein docking algorithms and this must be taken into account when assessing docking algorithms, since performance will depend on the benchmark version utilized.

Table 2.

Statistics of the three classes of difficulty in the entire Benchmark 4.0 and the new cases (in parentheses).

I-RMSD fnat fnon-nat Number
Rigid-body 0.90 (1.12) 0.79 (0.80) 0.21 (0.19) 121 (33)
Medium 1.76 (1.86) 0.63 (0.66) 0.35 (0.27) 30 (11)
Difficult 3.76 (3.45) 0.51 (0.60) 0.51 (0.41) 25 (8)

In summary, Benchmark 4.0 includes 52 new cases and a higher number of new rigid-body and medium difficulty cases show larger conformational changes upon binding than cases in the previous release. This is especially useful for the development of protein-protein docking algorithms that incorporate protein flexibility, a problem that has recently received much attention but still remains a major challenge.21

Supplementary Material

Supp Table s1

Acknowledgments

This work was funded by NIH grant R01 GM084884 awarded to ZW.

References

  • 1.Vakser IA. Protein docking for low-resolution structures. Protein Eng. 1995;8(4):371–377. doi: 10.1093/protein/8.4.371. [DOI] [PubMed] [Google Scholar]
  • 2.Comeau SR, Gatchell DW, Vajda S, Camacho CJ. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics. 2004;20(1):45–50. doi: 10.1093/bioinformatics/btg371. [DOI] [PubMed] [Google Scholar]
  • 3.Mandell JG, Roberts VA, Pique ME, Kotlovyi V, Mitchell JC, Nelson E, Tsigelny I, Ten Eyck LF. Protein docking using continuum electrostatics and geometric fit. Protein Eng. 2001;14(2):105–113. doi: 10.1093/protein/14.2.105. [DOI] [PubMed] [Google Scholar]
  • 4.Chen R, Li L, Weng Z. ZDOCK: an initial-stage protein-docking algorithm. Proteins. 2003;52(1):80–87. doi: 10.1002/prot.10389. [DOI] [PubMed] [Google Scholar]
  • 5.Ritchie DW, Kozakov D, Vajda S. Accelerating and focusing protein-protein docking correlations using multi-dimensional rotational FFT generating functions. Bioinformatics. 2008;24(17):1865–1873. doi: 10.1093/bioinformatics/btn334. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Dominguez C, Boelens R, Bonvin AM. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc. 2003;125(7):1731–1737. doi: 10.1021/ja026939x. [DOI] [PubMed] [Google Scholar]
  • 7.de Vries SJ, van Dijk M, Bonvin AM. The HADDOCK web server for data-driven biomolecular docking. Nat Protoc. 5(5):883–897. doi: 10.1038/nprot.2010.32. [DOI] [PubMed] [Google Scholar]
  • 8.Lyskov S, Gray JJ. The RosettaDock server for local protein-protein docking. Nucleic Acids Res. 2008;36(Web Server issue):W233–238. doi: 10.1093/nar/gkn216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ. CAPRI: a Critical Assessment of PRedicted Interactions. Proteins. 2003;52(1):2–9. doi: 10.1002/prot.10381. [DOI] [PubMed] [Google Scholar]
  • 10.Chen R, Mintseris J, Janin J, Weng Z. A protein-protein docking benchmark. Proteins. 2003;52(1):88–91. doi: 10.1002/prot.10390. [DOI] [PubMed] [Google Scholar]
  • 11.Gao Y, Douguet D, Tovchigrechko A, Vakser IA. DOCKGROUND system of databases for protein recognition studies: Unbound structures for docking. Proteins. 2007;69(4):845–851. doi: 10.1002/prot.21714. [DOI] [PubMed] [Google Scholar]
  • 12.Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z. Protein-Protein Docking Benchmark 2.0: an update. Proteins. 2005;60(2):214–216. doi: 10.1002/prot.20560. [DOI] [PubMed] [Google Scholar]
  • 14.Hwang H, Pierce B, Mintseris J, Janin J, Weng Z. Protein-protein docking benchmark version 3.0. Proteins. 2008;73(3):705–709. doi: 10.1002/prot.22106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Kastritis PL, Bonvin AM. Are scoring functions in protein-protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res. 9(5):2216–2225. doi: 10.1021/pr9009854. [DOI] [PubMed] [Google Scholar]
  • 16.Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247(4):536–540. doi: 10.1006/jmbi.1995.0159. [DOI] [PubMed] [Google Scholar]
  • 17.Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res. 2002;30(1):264–267. doi: 10.1093/nar/30.1.264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Mintseris J, Weng Z. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci U S A. 2005;102(31):10930–10935. doi: 10.1073/pnas.0502667102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Mendez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind predictions of protein-protein interactions: current status of docking methods. Proteins. 2003;52(1):51–67. doi: 10.1002/prot.10393. [DOI] [PubMed] [Google Scholar]
  • 20.Barrette-Ng IH, Ng KK, Cherney MM, Pearce G, Ryan CA, James MN. Structural basis of inhibition revealed by a 1:2 complex of the two-headed tomato inhibitor-II and subtilisin Carlsberg. J Biol Chem. 2003;278(26):24062–24071. doi: 10.1074/jbc.M302020200. [DOI] [PubMed] [Google Scholar]
  • 21.Zacharias M. Accounting for conformational changes during protein-protein docking. Curr Opin Struct Biol. 2010;20(2):180–186. doi: 10.1016/j.sbi.2010.02.001. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp Table s1

RESOURCES