Bioinformatics. 2008 Sep 23;24(22):2634–2635. doi: 10.1093/bioinformatics/btn497

Dockground protein–protein docking decoy set

Shiyong Liu 1, Ying Gao 1, Ilya A Vakser 1,2,*
PMCID: PMC2579708  PMID: 18812365

Abstract

Summary: A protein–protein docking decoy set is built for the Dockground unbound benchmark set. The GRAMM-X docking scan was used to generate 100 non-native and at least one near-native match per complex for 61 complexes. The set is a publicly available resource for the development of scoring functions and knowledge-based potentials for protein docking methodologies.

Availability: The decoys are freely available for download at http://dockground.bioinformatics.ku.edu/UNBOUND/decoy/decoy.php

Contact: vakser@ku.edu

1 INTRODUCTION

Computational techniques for structural modeling of protein–protein interactions are rapidly developing, both in terms of methodology and computing power (Gray, 2006; Vajda and Camacho, 2004). An important activity in the field of protein–protein docking is the community-wide Critical Assessment of Predicted Interactions (CAPRI; http://capri.ebi.ac.uk; Wodak, 2007), which allows comparison of different computational methods on a set of prediction targets.

A number of databases of protein–protein complexes have been compiled and used to investigate physicochemical and structural preferences at protein–protein interfaces (Davis and Sali, 2005; Douguet et al., 2006; Gao et al., 2007; Keskin et al., 2004; Kundrotas and Alexov, 2007; Lu et al., 2003). It is essential that protein–protein databases be comprehensive, automatically updated and fully searchable, like those in the DOCKGROUND project (Douguet et al., 2006; Gao et al., 2007).

Benchmark sets of complexes with both bound and unbound structures have been developed for validation of docking approaches (Gao et al., 2007; Mintseris et al., 2005). These sets contain ∼100 crystallographically determined pairs of proteins. Decoy sets of structures (false-positive matches) are an important resource for developing intermolecular potentials and scoring functions, because reliable docking procedures have to distinguish between decoys and correct matches. Development of protein–protein docking decoys started in our lab in 1998. The number of decoys was further expanded by Sternberg and co-workers, and then by Baker, Gray and co-workers (RosettaDock, http://depts.washington.edu/bakerpg), Weng and co-workers (ZDOCK, http://zlab.bu.edu) and others. Currently available decoy sets are typically ranked by scoring functions that involve force field terms, statistical potentials, etc. The ZDOCK set contains tens of thousands of matches per complex, which complicates testing and optimization of computationally expensive scoring functions. The RosettaDock set consists of minimized structures with replaced side chains, targeted for high-resolution (post-refinement) scoring, which may be inappropriate for low-resolution scoring of post-scan/pre-refinement complexes with structural clashes and gaps. Some complexes in the above sets do not contain near-native matches. The decoy set presented in this article, built within the DOCKGROUND project (http://dockground.bioinformatics.ku.edu), consists of post-scan matches based on shape complementarity alone and contains 100 decoys plus near-native matches for each complex. It is thus an unbiased set, optimally suited for testing and optimization of post-scan scoring functions.

2 METHODS

The docking was performed by our GRAMM-X FFT docking procedure (Tovchigrechko and Vakser, 2005). The procedure performs exhaustive sampling of the translation/rotation space with a soft Lennard–Jones potential and is based on our GRAMM algorithm, which has been extensively published and validated over the years (Katchalski-Katzir et al., 1992; Vakser, 1995, 1997; Vakser et al., 1999). In the scan stage, the grid translation step was 1.5 Å and the rotation step was 6°.
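
An FFT scan relies on the correlation theorem: for one fixed ligand orientation, the scores for all grid translations come out of a few FFTs instead of an explicit loop over positions. The sketch below is a minimal NumPy illustration of that idea in the spirit of Katchalski-Katzir et al. (1992), assuming simple shape grids (receptor surface cells = 1, receptor interior = a large negative penalty, ligand cells = 1). It does not reproduce GRAMM-X's soft Lennard–Jones potential; the helper names and the −15 interior penalty are hypothetical choices for illustration. A full scan would map both proteins onto a grid (1.5 Å spacing in this work) and repeat the correlation for every sampled rotation (6° steps here).

```python
import numpy as np


def receptor_scoring_grid(mask: np.ndarray, interior_penalty: float = -15.0) -> np.ndarray:
    """Shape-complementarity receptor grid (hypothetical helper).

    Surface cells (occupied cells with at least one empty face neighbour) score 1,
    buried cells score `interior_penalty`, empty cells score 0.
    Neighbours are found with np.roll, i.e. with periodic boundaries.
    """
    occupied = mask.astype(bool)
    has_empty_neighbour = np.zeros_like(occupied)
    for axis in range(occupied.ndim):
        for shift in (1, -1):
            has_empty_neighbour |= ~np.roll(occupied, shift, axis=axis)
    surface = occupied & has_empty_neighbour
    grid = np.zeros(mask.shape)
    grid[surface] = 1.0
    grid[occupied & ~surface] = interior_penalty
    return grid


def fft_translational_scan(receptor_grid: np.ndarray, ligand_grid: np.ndarray) -> np.ndarray:
    """Score all cyclic translations of the ligand for one fixed orientation.

    Returns C with C[t] = sum_x R[x] * L[x + t] (indices modulo the grid shape).
    In practice the grids should be zero-padded so that wrap-around does not
    create artificial overlaps.
    """
    if receptor_grid.shape != ligand_grid.shape:
        raise ValueError("receptor and ligand grids must have the same shape")
    r_hat = np.fft.fftn(receptor_grid)
    l_hat = np.fft.fftn(ligand_grid)
    # Correlation theorem for real inputs: DFT(C) = conj(DFT(R)) * DFT(L).
    return np.real(np.fft.ifftn(np.conj(r_hat) * l_hat))
```

For one orientation, the best-scoring shift is `np.unravel_index(np.argmax(scores), scores.shape)`; a docking scan would keep the top-scoring shifts from each rotation as candidate matches.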

The DOCKGROUND project is an expanding resource for the development of docking techniques and studies of protein interfaces (http://dockground.bioinformatics.ku.edu; Douguet et al., 2006; Gao et al., 2007). The docking decoys were built for the DOCKGROUND unbound docking benchmark set, Version 2, which contains structures with crystallographically determined bound (co-crystallized) and unbound (crystallized separately) forms. The benchmark set was built using the following selection criteria: sequence identity between bound and unbound structures >97%; sequence identity between complexes <30%; exclusion of homomultimers (only complexes with sequence identity between chains <70% were retained); and exclusion of crystal packing complexes and structures in the wrong format. The set contained 99 complexes in total.
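
The selection criteria can be read as a per-complex filter followed by a redundancy reduction. The sketch below shows one way to encode them. The `CandidateComplex` record and its field names, the `identity` callable and the greedy first-come order are all hypothetical: the text does not specify how the <30% between-complex identity rule was applied or how identities were computed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CandidateComplex:
    """Hypothetical summary record for one candidate bound/unbound complex."""
    pdb_id: str
    bound_unbound_identity: float  # % sequence identity, bound vs unbound chains
    interchain_identity: float     # % sequence identity between the two components
    is_crystal_packing: bool
    has_format_errors: bool


def passes_per_complex_criteria(c: CandidateComplex) -> bool:
    return (c.bound_unbound_identity > 97.0
            and c.interchain_identity < 70.0   # excludes homomultimers
            and not c.is_crystal_packing
            and not c.has_format_errors)


def select_benchmark(candidates: Iterable[CandidateComplex],
                     identity: Callable[[CandidateComplex, CandidateComplex], float],
                     max_between_complex_identity: float = 30.0) -> List[CandidateComplex]:
    """Greedy non-redundant selection (assumed order): keep a complex only if it
    passes the per-complex criteria and is <30% identical to every complex kept so far."""
    kept: List[CandidateComplex] = []
    for c in candidates:
        if not passes_per_complex_criteria(c):
            continue
        if all(identity(c, k) < max_between_complex_identity for k in kept):
            kept.append(c)
    return kept
```

With a greedy pass the result depends on input order; sorting candidates by a quality measure (for example, resolution) first would be one reasonable, but assumed, choice.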

The GRAMM-X scan was applied to this set to build docking decoys. The following characteristics from the CAPRI evaluation protocol were computed for 500 000 matches per complex: the RMSD of the ligand backbone atoms (the ligand being the smaller component of the complex and the receptor the larger one); the RMSD of the backbone atoms of the interface residues; the number of native residue–residue contacts in the predicted complex divided by the number of contacts in the native complex; and the number of non-native residue–residue contacts in the predicted complex divided by the total number of contacts in the predicted complex. Matches with ligand RMSD < 5.0 Å were defined as near-native. The set contains the 100 lowest-energy non-native structures and at least one near-native structure per complex. The decoy set comprises 61 complexes, only those for which at least one near-native match was found.
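
The per-match quantities and the decoy selection rule above can be sketched as follows. Assumptions: the ligand coordinates of a match and of the native complex are already in the same receptor frame with one-to-one atom correspondence; contacts are given as sets of (receptor residue, ligand residue) pairs; a lower GRAMM-X score means a better (lower-energy) match; and every match that is not near-native counts as non-native. All function names are hypothetical.

```python
import numpy as np


def ligand_rmsd(pred_backbone, native_backbone) -> float:
    """Ligand backbone RMSD; assumes a shared receptor frame and matched atoms."""
    diff = np.asarray(pred_backbone, dtype=float) - np.asarray(native_backbone, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def contact_fractions(pred_contacts, native_contacts):
    """Return (f_nat, f_non_nat) from sets of (receptor_res, ligand_res) pairs.

    f_nat:     native contacts reproduced in the prediction / native contacts
    f_non_nat: predicted contacts absent from the native / predicted contacts
    """
    pred, native = set(pred_contacts), set(native_contacts)
    f_nat = len(pred & native) / len(native) if native else 0.0
    f_non_nat = len(pred - native) / len(pred) if pred else 0.0
    return f_nat, f_non_nat


def build_decoy_set(matches, near_native_cutoff=5.0, n_decoys=100):
    """Split (score, ligand_rmsd) matches into near-native and decoy lists.

    Returns None when no near-native match exists (complex left out of the set);
    otherwise (near_native, decoys), where decoys are the n_decoys best-scoring
    (lowest-score) non-native matches.
    """
    matches = list(matches)
    near_native = [m for m in matches if m[1] < near_native_cutoff]
    if not near_native:
        return None
    non_native = sorted((m for m in matches if m[1] >= near_native_cutoff),
                        key=lambda m: m[0])
    return near_native, non_native[:n_decoys]
```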

3 RESULTS

The RMSD between the bound and unbound structures reflects the degree of conformational change upon complex formation. Table 1 shows the average statistics for the three groups of complexes. The average Cα RMSDs between bound and unbound structures are small (0.9–1.9 Å). This agrees with earlier estimates indicating that the majority of protein complexes undergo small backbone conformational changes between bound and unbound forms (Gao et al., 2007).

Table 1.

Average statistics on protein–protein docking decoys

Classification     Ligand RMSDᵃ  Receptor RMSDᵇ  Near-native RMSDᶜ  Hitsᵈ  Number of complexes
enzyme/inhibitor   1.69          1.49            2.77               9.1    21
antibody/antigen   1.04          0.92            3.37               7.4    5
others             1.46          1.87            3.23               7.8    35

ᵃUnbound/bound ligand Cα RMSD (Å).

ᵇUnbound/bound receptor Cα RMSD (Å).

ᶜLigand backbone RMSD (Å) of the near-native match closest to the native structure.

ᵈAverage number of near-native matches per complex.
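
The unbound/bound Cα RMSDs in footnotes a and b require superimposing the two conformations before the deviation is measured. The text does not state how this was done; the sketch below uses the standard Kabsch (SVD) superposition and assumes the Cα atoms of the unbound and bound structures have already been matched residue by residue. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition.

    p, q: Cα coordinates of the same residues in two conformations
    (e.g. unbound and bound), in the same residue order.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)             # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                      # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, d])  # forbid improper rotations (reflections)
    rotation = vt.T @ correction @ u.T   # rotation mapping p_c onto q_c
    p_fit = p_c @ rotation.T
    return float(np.sqrt(((p_fit - q_c) ** 2).sum(axis=1).mean()))
```

The reflection correction matters: without it, a mirror-image structure could be "superimposed" perfectly, which is not physically meaningful for proteins.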

GRAMM-X was unable to detect near-native matches in complexes with large conformational changes (primarily domain shifts). Such complexes are therefore not present in the decoy set.

The native structures, as opposed to the near-native ones, were deliberately excluded from the set because they are never achievable in practical docking and thus would be an unrealistic reference point for the development of docking methodologies. An example of docking decoys for a particular complex is shown in Figure 1. The popular scoring functions ZRANK (http://zdock.bu.edu/software.php) and DFIRE (http://sparks.informatics.iupui.edu) placed a near-native structure among the top 10 matches in 40–50% of complexes.
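
The ZRANK/DFIRE result above is a top-N success rate over complexes. A minimal sketch of that bookkeeping follows; it takes scores produced by any external scoring function (it does not call ZRANK or DFIRE), assumes lower scores are better by default, and uses hypothetical function names.

```python
def rank_matches(scored, lower_is_better=True):
    """Order (score, is_near_native) pairs best-first; return the near-native flags."""
    ordered = sorted(scored, key=lambda m: m[0], reverse=not lower_is_better)
    return [is_near_native for _, is_near_native in ordered]


def top_n_success_rate(ranked_by_complex, n=10):
    """Fraction of complexes with at least one near-native match among the top n.

    ranked_by_complex maps a complex id to its best-first list of near-native flags.
    """
    if not ranked_by_complex:
        return 0.0
    hits = sum(1 for flags in ranked_by_complex.values() if any(flags[:n]))
    return hits / len(ranked_by_complex)
```

Applied to this set, each complex would contribute its 100 decoys plus its near-native matches, and a rate of 0.4–0.5 at `n=10` would correspond to the figure quoted above.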

Fig. 1.

Example of docking decoys. Matches, each represented by the ligand's center of mass, are shown for the 1e96 enzyme–inhibitor complex. The receptor (green) and the ligand (cyan) are shown in the co-crystallized configuration. The native match is shown in yellow (not part of the decoy set), 10 near-native matches in red and 100 non-native matches in blue.

4 CONCLUSION

A protein–protein docking decoy set is built for the DOCKGROUND unbound benchmark set. The GRAMM-X docking scan was used to generate 100 non-native and at least one near-native match per complex for 61 complexes. The set is a publicly available resource for the development of scoring functions and knowledge-based potentials for protein docking methodologies.

Funding

National Institutes of Health (grant R01 GM074255).

Conflict of Interest: none declared.

ACKNOWLEDGEMENTS

The authors wish to thank Andrey Tovchigrechko for assistance with GRAMM-X docking.

REFERENCES

1. Davis FP, Sali A. PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics. 2005;21:1901–1907. doi: 10.1093/bioinformatics/bti277.
2. Douguet D, et al. DOCKGROUND resource for studying protein-protein interfaces. Bioinformatics. 2006;22:2612–2618. doi: 10.1093/bioinformatics/btl447.
3. Gao Y, et al. DOCKGROUND system of databases for protein recognition studies: unbound structures for docking. Proteins. 2007;69:845–851. doi: 10.1002/prot.21714.
4. Gray JJ. High-resolution protein–protein docking. Curr. Opin. Struct. Biol. 2006;16:183–193. doi: 10.1016/j.sbi.2006.03.003.
5. Katchalski-Katzir E, et al. Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl Acad. Sci. USA. 1992;89:2195–2199. doi: 10.1073/pnas.89.6.2195.
6. Keskin O, et al. A new, structurally nonredundant, diverse data set of protein–protein interfaces and its implications. Protein Sci. 2004;13:1043–1055. doi: 10.1110/ps.03484604.
7. Kundrotas PJ, Alexov E. PROTCOM: searchable database of protein complexes enhanced with domain–domain structures. Nucleic Acids Res. 2007;35:D575–D579. doi: 10.1093/nar/gkl768.
8. Lu H, et al. Development of unified statistical potentials describing protein-protein interactions. Biophys. J. 2003;84:1895–1901. doi: 10.1016/S0006-3495(03)74997-2.
9. Mintseris J, et al. Protein-protein docking benchmark 2.0: an update. Proteins. 2005;60:214–216. doi: 10.1002/prot.20560.
10. Tovchigrechko A, Vakser IA. Development and testing of an automated approach to protein docking. Proteins. 2005;60:296–301. doi: 10.1002/prot.20573.
11. Vajda S, Camacho CJ. Protein–protein docking: is the glass half-full or half-empty? Trends Biotechnol. 2004;22:110–116. doi: 10.1016/j.tibtech.2004.01.006.
12. Vakser IA. Protein docking for low-resolution structures. Protein Eng. 1995;8:371–377. doi: 10.1093/protein/8.4.371.
13. Vakser IA. Evaluation of GRAMM low-resolution docking methodology on the hemagglutinin-antibody complex. Proteins. 1997;(Suppl. 1):226–230.
14. Vakser IA, et al. A systematic study of low-resolution recognition in protein-protein complexes. Proc. Natl Acad. Sci. USA. 1999;96:8477–8482. doi: 10.1073/pnas.96.15.8477.
15. Wodak SJ. From the Mediterranean coast to the shores of Lake Ontario: CAPRI's premiere on the American continent. Proteins. 2007;69:697–698. doi: 10.1002/prot.21805.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
