PBSword: a web server for searching similar protein–protein binding sites

Bin Pang; Xingyan Kuang; Nan Zhao; Dmitry Korkin; Chi-Ren Shyu

doi:10.1093/nar/gks527

Nucleic Acids Res. 2012 Jun 9;40(Web Server issue):W428–W434. doi: 10.1093/nar/gks527

PMCID: PMC3394332 PMID: 22689645

Abstract

PBSword is a web server designed for efficient and accurate comparison and search of geometrically similar protein–protein binding sites in a large-scale database. The basic idea of PBSword is that each protein binding site is first represented by a high-dimensional vector of ‘visual words’, which characterizes both the global and local shape features of the binding site. The server then uses a scalable indexing technique to search for binding sites whose visual word representations are similar to that of the query binding site. Our system returns ranked binding sites in a short time from a database of 194 322 domain–domain binding sites. PBSword supports queries by protein ID and by new structures uploaded by users. PBSword is a useful tool for investigating functional connections among proteins based on the local structures of binding sites and has potential applications in protein–protein docking and drug discovery. The system is hosted at http://pbs.rnet.missouri.edu.

INTRODUCTION

Identifying similar protein–protein binding sites (PBSs) plays an important role in understanding protein–protein interaction mechanisms (1) and has a potential impact on protein function prediction, protein–protein docking, drug discovery and evolutionary studies (2–6). As data repositories of protein–protein interfaces continue to grow, numerous databases have been developed to organize and classify the interaction data at different subunit levels (chain or domain) (7,8). Recent examples include SCOPPI (9), PIBASE (10), IntAct (11), DOMMINO (12), iPfam (13) and SCOWLP (14). These databases usually provide basic services for looking up a specific protein–protein interface by protein identifier, or a group of interfaces by protein family classification [e.g. SCOP (15) or CATH (16)]. In this case, similar binding sites are retrieved in terms of the overall sequence and fold similarity of the proteins, which might not meet the requirements of binding site comparison and functional annotation, as a similar fold does not necessarily imply a similar function, and proteins of different folds may acquire similar functions. In contrast to overall fold similarity, the local structure of the binding site is highly likely to be connected to the function of the protein (17,18). However, the structural comparison of a query binding site against a large-scale database of binding sites can be very challenging and time consuming. The community therefore has a pressing need for advanced services that search for similar functional sites for a newly discovered or existing protein based on the local patterns of its binding site.

One of the most important tasks in constructing such a structure-based search engine is to develop an efficient and accurate method for comparing binding sites. Early work mainly used global sequence and structure alignment tools. However, these tools usually concentrate on the similarity of entire proteins and may ignore the local structure of the binding site. To overcome this issue, one class of approaches is based on the alignment of local structures or functional groups on the protein surface [e.g. iAlign (19) and I2ISiteEngine (20)] to provide an accurate comparison between binding sites. This approach is usually computationally expensive. To accelerate the process, another class of approaches compares binding sites using features extracted from surfaces or structures [e.g. distance distribution (21) or moment invariants (22)] without explicit alignment. These methods, mainly designed for protein–ligand binding sites (23), have not been extensively evaluated on datasets of protein–protein binding sites, which are known to have some unique characteristics, such as relatively large and planar surfaces (1).

To meet the requirements of both efficiency and accuracy, PBSword was developed to provide the community with a web server for searching similar protein binding sites in terms of ‘visual words’. The basic idea of PBSword originates from the classic information retrieval method for comparing the similarity of documents based on their word frequency profiles, which has been successfully applied in web search engines. In the PBSword server, we extend this text comparison method and propose a novel approach that integrates the frequency of visual words with the local spatial relationships among them to represent protein binding sites. By loading the visual word representations of the database binding sites into a scalable indexing tree, the PBSword server achieves high throughput while preserving reasonably high precision of binding site comparison.

The key features of the PBSword server include the following: (i) the binding site comparison method introduces a novel feature extraction algorithm and online database indexing; (ii) the binding site database is based on interactions between domains, which are defined using the latest SCOP version (24); (iii) for each binding site retrieved from the database, a 3-dimensional (3D) view of the structure and surface, as well as physicochemical properties, is presented; (iv) the efficiency has been significantly enhanced to meet the requirements of searching a large-scale protein binding site database.

MATERIALS AND METHODS

The system architecture of the PBSword server, as shown in Figure 1, contains four modules: (i) database management and preprocessing; (ii) query interfaces; (iii) search engine and (iv) visualization of retrieval results. A system tutorial can be viewed at the PBSword website.

Database management and preprocessing

The PBSword database contains domain–domain binding sites of known protein structures. The structural data are extracted from the Protein Data Bank (PDB) (25). If a PDB entry has more than one structure model, the first model is used in the current implementation of the database. For domain assignment, the most recent release (June 2009) of the manually curated SCOP database is used. For each PDB structure, each pair of subunits (i.e. domains) is analysed to determine whether they interact with each other using the following definition: if any atom of a residue in one protein subunit is within 6 Å of any atom of a residue in another protein subunit, the two residues are defined as a contact residue pair. Currently, the entire PBSword database contains 194 322 redundant binding sites selected from 3123 SCOP families. Two nonredundant (nr) databases, denoted NR40 and NR60, are constructed using sequence similarity thresholds of 40% and 60%, respectively.
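The contact definition above can be illustrated with a minimal Python sketch. It is not the PBSword implementation: the dictionary layout, the residue labels, the function name `contact_pairs` and the inclusive reading of "within 6 Å" are all assumptions made for illustration, and the coordinates are toy values.

```python
from itertools import product
from math import dist

CONTACT_CUTOFF = 6.0  # Å, the cutoff stated in the text


def contact_pairs(subunit_a, subunit_b, cutoff=CONTACT_CUTOFF):
    """Return sorted (residue_a, residue_b) pairs that are in contact.

    Each subunit is a dict mapping a residue label to a list of
    (x, y, z) atom coordinates (a hypothetical layout). Two residues
    are in contact if any atom pair lies within `cutoff` Å
    (assumed inclusive here).
    """
    pairs = []
    for (res_a, atoms_a), (res_b, atoms_b) in product(
        subunit_a.items(), subunit_b.items()
    ):
        if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
            pairs.append((res_a, res_b))
    return sorted(pairs)


# Toy coordinates, not taken from any PDB entry.
domain_a = {"A10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], "A11": [(0.0, 20.0, 0.0)]}
domain_b = {"B5": [(6.5, 0.0, 0.0)], "B6": [(30.0, 0.0, 0.0)]}

pairs = contact_pairs(domain_a, domain_b)
print(pairs)
```

The all-against-all loop is quadratic in the number of atoms; a production pipeline would more likely use a spatial grid or tree, which is omitted here for brevity.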

The database preprocessing workflow consists of the following three steps (see the top-middle block of Figure 1). First, we select feature points from the surface of each database binding site and extract the corresponding geometric features. Second, a visual vocabulary is built by clustering a large number (∼7 × 10⁵) of feature point descriptors collected from the nr dataset. The nr dataset is selected from the entire database by applying a cutoff of 40% sequence identity within each SCOP family using Cluster Database at High Identity with Tolerance (CD-HIT) (26). The clustering method is k-means, and each cluster is summarized by a representative descriptor, which is regarded as a visual word and added to the final vocabulary. The size of the vocabulary is determined by k, which is set to 1000 in the PBSword server. Third, each feature point from a database binding site surface is assigned, according to its descriptor, to the nearest visual word in the vocabulary. This allows each binding site to be represented by its distribution of visual words. Note that all of these steps are performed offline for the database binding sites. Owing to page limitations, interested readers are referred to our algorithm article for further details and discussion (27).
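The third step, mapping descriptors to their nearest visual word and counting occurrences, can be sketched as follows. This is a simplified illustration only: it assumes Euclidean distance between descriptors, uses a three-word toy vocabulary instead of the k = 1000 vocabulary of the server, and covers only word frequencies, not the local spatial relationships that the PBSword representation also encodes. The function names are hypothetical.

```python
from math import dist


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (Euclidean, assumed)."""
    return min(range(len(vocabulary)), key=lambda i: dist(descriptor, vocabulary[i]))


def word_histogram(descriptors, vocabulary):
    """Count how many feature-point descriptors map to each visual word."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    return counts


# Toy 2D data; real descriptors are higher-dimensional and the
# vocabulary would come from k-means cluster representatives.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.2), (4.0, 4.5), (1.1, 0.8)]

histogram = word_histogram(descriptors, vocabulary)
print(histogram)
```

Because the vocabulary is fixed after clustering, every binding site ends up as a vector of the same length, which is what makes the later index-based search possible.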

Query interfaces

There are two query methods, ‘query by structure’ and ‘query by ID’, as shown in the top-left and top-right blocks of Figure 1, respectively. Using an Internet browser, a user can upload a new protein structure in PDB format or provide the ID of a protein contained in the PBSword database to find similar protein binding sites. The target database can be (i) redundant; (ii) NR40 or (iii) NR60.

For a query by structure, we follow the same steps used for the database binding sites: extract the features of the query binding site, map them to the nearest visual words and generate the visual word representation. The word representation of the query binding site is then sent to the search engine.

For a query by protein ID, users can provide (i) SCOP IDs for the interacting subunits or (ii) the PDB ID and chain ID of the subunit under investigation. For the second option, the chain ID of the interacting partner is optional. If it is omitted, PBSword searches the database for matching binding sites and allows the user to select one from the list of matches. After the query binding site is selected, the corresponding word representation is sent to the search engine.
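The second query option can be pictured as a simple filter over binding site identifiers. The sketch below assumes identifiers of the form <PDB-ID>_<SCOP-domain>_<Chain-ID>_<partner SCOP-domain>_<partner Chain-ID>, the naming scheme this article uses for binding sites; the function `match_binding_sites`, the case-insensitive PDB ID comparison and the in-memory list are illustrative assumptions, not the server's actual lookup.

```python
def match_binding_sites(identifiers, pdb_id, chain_id, partner_chain_id=None):
    """Return identifiers matching a PDB ID and chain, optionally a partner chain.

    When partner_chain_id is None, every binding site of the given chain
    is returned so that the user can pick one from the list.
    """
    matches = []
    for identifier in identifiers:
        pdb, _domain, chain, _partner_domain, partner_chain = identifier.split("_")
        if pdb.lower() != pdb_id.lower() or chain != chain_id:
            continue
        if partner_chain_id is not None and partner_chain != partner_chain_id:
            continue
        matches.append(identifier)
    return matches


database = [
    "1m3d_78535_B_78538_C",    # example identifier from this article
    "1m3d_78535_B_99999_D",    # made-up entry, for illustration only
    "1t61_106535_D_106538_E",  # example identifier from this article
]

candidates = match_binding_sites(database, "1m3d", "B")
exact = match_binding_sites(database, "1m3d", "B", partner_chain_id="C")
print(candidates, exact)
```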

Search engine

When the redundant PBSword database is selected as the target, the online binding site search is performed on two customized indexing trees, one for each query method (query by ID and query by 3D structure), to avoid time-consuming one-by-one feature similarity calculations. In this case, the query protein binding site is represented by a data point in the visual word (or feature) space populated by the database binding sites, as described in the previous two subsections. Thus, searching for similar binding sites in the database is analogous to identifying the n nearest neighbors in the feature space. Such a search can be completed in O(log N) time, where N is the total number of binding sites in the database. When the NR40 or NR60 database is selected, the binding site similarity and associated z-score are calculated for each database binding site.
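For the nonredundant databases, where every database binding site is scored, the search reduces to a linear scan. The following sketch shows one way such a scan could look. The article does not specify the similarity measure or how the z-score is computed, so the Euclidean distance, the population standard deviation and the function name `rank_with_zscores` are assumptions; the indexing trees used for the redundant database are not reproduced here.

```python
from math import dist
from statistics import mean, pstdev


def rank_with_zscores(query, database, top_n=100):
    """Score every database vector against the query and rank the results.

    `database` maps a binding site ID to its visual word vector.
    Returns up to top_n tuples (site_id, distance, z_score), closest first.
    A strongly negative z-score marks a site much closer than average.
    """
    distances = {site_id: dist(query, vec) for site_id, vec in database.items()}
    mu = mean(distances.values())
    sigma = pstdev(distances.values())
    ranked = sorted(distances.items(), key=lambda item: item[1])[:top_n]
    return [
        (site_id, d, (d - mu) / sigma if sigma > 0 else 0.0)
        for site_id, d in ranked
    ]


# Toy three-word histograms with hypothetical site names.
query_vector = (1, 2, 1)
toy_database = {
    "site_a": (1, 2, 1),
    "site_b": (0, 3, 1),
    "site_c": (4, 0, 0),
}

results = rank_with_zscores(query_vector, toy_database)
for site_id, d, z in results:
    print(f"{site_id}: distance={d:.3f}, z={z:.2f}")
```

A scan of this kind costs time proportional to N, which is acceptable for the smaller NR40 and NR60 sets but is the cost the tree index avoids on the full redundant database.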

Retrieval results visualization

The visualization of retrieval results includes seven parts: (i) structure and surface display, (ii) ranked list, (iii) sequence, (iv) SCOP classification, (v) properties of the binding site, (vi) properties of each binding site residue and (vii) property statistics of a SCOP family. The binding site properties mainly include accessible surface area (ASA), polarity, hydrophobicity, hydrogen bonds, planarity and gap index, which were originally defined in (28,29). For completeness, we briefly describe how these properties are calculated. The ASA of the binding site, ASA_bs, is calculated as follows:

$$\mathrm{ASA}_{bs} = \mathrm{ASA}_1 - \mathrm{ASA}_2$$

where ASA₁ and ASA₂ are the ASAs of the subunit in the absence and in the presence of its interacting partner, respectively. The ASA is calculated using NACCESS (30), an implementation of the method proposed in (31). A residue is defined as a binding site residue if it loses 1.0 Å² of ASA when the partner subunit is present. The polarity of the binding site is defined as follows (28):

$$\mathrm{Polarity} = \frac{\mathrm{ASA}_{polar}}{\mathrm{ASA}_{bs}}$$

where ASA_polar represents the difference in the ASA of polar atoms in the absence and in the presence of the interacting partner. Hydrophobicity is measured using the method proposed in (29). The number of hydrogen bonds is calculated using the program HBPLUS (32). Planarity is defined as the root mean squared deviation of all binding site atoms from the best-fit plane through them, which is calculated using the PRINCIP program from the SURFNET package (33). The gap index is defined as follows (28):

$$\mathrm{Gap\ index} = \frac{\text{gap volume}}{\text{interface ASA}}$$

where gap volume is a measure of the closeness of the interface between the two subunits and is calculated using the SURFNET package (33).
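The three ratio-type properties can be computed from a handful of numbers once the ASA values and gap volume are available. The sketch below assumes that ASA_bs is the ASA lost on binding (ASA₁ minus ASA₂), that polarity is the share of ASA_bs contributed by polar atoms (expressed here as a percentage, because the result pages report a percentage of polarity) and that the gap index is gap volume divided by interface ASA, following the usual definition of Jones and Thornton. The function names and input values are hypothetical; the real inputs would come from NACCESS and SURFNET.

```python
def binding_site_asa(asa_free, asa_bound):
    """ASA buried on binding: ASA without the partner minus ASA with it (Å²)."""
    return asa_free - asa_bound


def polarity_percent(asa_polar, asa_bs):
    """Share of the binding site ASA contributed by polar atoms, in percent."""
    return 100.0 * asa_polar / asa_bs


def gap_index(gap_volume, interface_asa):
    """Gap volume (Å³) per unit of interface ASA (Å²); the result is in Å."""
    return gap_volume / interface_asa


# Toy numbers chosen for easy arithmetic, not measured values.
asa_bs = binding_site_asa(asa_free=5200.0, asa_bound=4400.0)
polarity = polarity_percent(asa_polar=280.0, asa_bs=asa_bs)
gap = gap_index(gap_volume=2400.0, interface_asa=asa_bs)

print(asa_bs, polarity, gap)
```

A smaller gap index indicates a more tightly packed interface, since less empty volume is left per unit of buried surface.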

The retrieval results for an example query binding site, 1m3d_78535_B_78538_C, are shown in the top-left panel of Figure 2a. In PBSword, we adopt the identifier scheme of (22) to name each binding site: <PDB-ID>_<SCOP-domain of the binding site>_<Chain-ID of the binding site>_<SCOP-domain of the binding partner>_<Chain-ID of the binding partner>. Accordingly, each subunit is named <PDB-ID>_<SCOP-domain ID>_<Chain-ID>. For each query, a set of 100 top-ranked binding sites is returned to the user, eight per page. To visualize the search results, a 3D structure and surface view of the top-ranked result is displayed to the user. The user can select any of the ranked results from the top-right panel. The top-left panel in Figure 2a presents the structure and surface view of the top-ranked result, 1t61_106535_D_106538_E, which is generated by clicking on the thumbnail image in the top-right panel. In addition, users can (i) show or hide the structures of the two subunits by clicking the checkboxes and (ii) choose different display themes for the binding site, such as an opaque or translucent surface, by clicking on the buttons. The ranked list of protein binding sites can be downloaded from the result pages.
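As an illustration of the naming scheme and result paging described above, the following sketch parses a binding site identifier into its five fields, derives the two subunit names and computes how many pages are needed for 100 results at eight per page. The class `BindingSiteId` and the helper `page_count` are hypothetical names introduced here, not part of PBSword.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class BindingSiteId:
    pdb_id: str
    domain: str
    chain: str
    partner_domain: str
    partner_chain: str

    @classmethod
    def parse(cls, text):
        """Split an identifier of five underscore-separated fields."""
        parts = text.split("_")
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
        return cls(*parts)

    def subunit(self):
        """Name of the subunit carrying the binding site."""
        return f"{self.pdb_id}_{self.domain}_{self.chain}"

    def partner_subunit(self):
        """Name of the interacting partner subunit."""
        return f"{self.pdb_id}_{self.partner_domain}_{self.partner_chain}"


def page_count(total_results=100, per_page=8):
    """Number of result pages; the last page may be only partly filled."""
    return ceil(total_results / per_page)


site = BindingSiteId.parse("1m3d_78535_B_78538_C")
print(site.subunit(), site.partner_subunit(), page_count())
```

With the defaults above, the last of the 13 pages holds the remaining four results.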

The sequence panel (Figure 2b) shows the sequence information of a subunit and its partner. For easy identification, the binding site residues are shown in red font and the residues with intermolecular hydrogen bonds are underlined. Users can use the ‘residue checkbox’ under each residue to interact with the 3D structure view shown in Figure 2a. Clicking on a ‘residue checkbox’ highlights the designated residue. Hyperlinks pointing to the protein’s corresponding entries in PDB, PDBsum (34), SCOPPI (9) and SCOP (24) are also provided.

The SCOP classification panel (Figure 3a) shows the description of the corresponding SCOP class, fold, superfamily, family and species for the two subunits. The properties of the binding site and its interacting partner are shown in Figure 3b, including the number of binding site residues, ASA, percentage of ASA, percentage of polarity, percentage of hydrophobicity, planarity, number of hydrogen bonds and gap index. By clicking on the SCOP family hyperlink in the row ‘Statistics of family’, the user can view the histogram and summary statistics of each property for that SCOP family (see Figure 3d). The properties of each binding site residue, shown in Figure 3c, include the ASA of all atoms and of polar atoms for a specific residue, the percentage of ASA relative to the entire binding site ASA and the number of intermolecular hydrogen bonds. The family statistics panel (Figure 3d) shows summary statistics of the properties of binding sites belonging to a SCOP family, including the total number of binding sites, the amino acid composition, and the mean (standard deviation) and histogram of the binding site properties. The properties include ASA, percentage of polarity, percentage of hydrophobicity, planarity, hydrogen bonding and gap index. In this panel, hydrogen bonding is defined as the number of hydrogen bonds per 100 Å² of ASA.
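The hydrogen bonding measure and the per-family summary can be sketched in a few lines. The article does not state whether the reported standard deviation is the sample or the population form, so the sample form used below is an assumption, as are the function names and the toy values.

```python
from statistics import mean, stdev


def hbonds_per_100(n_hbonds, asa):
    """Hydrogen bonds normalized to 100 Å² of binding site ASA."""
    return 100.0 * n_hbonds / asa


def summarize(values):
    """Mean and sample standard deviation of one property across a family."""
    return mean(values), stdev(values)


# Toy family of three binding sites: (hydrogen bond count, ASA in Å²).
family = [(6, 800.0), (3, 600.0), (10, 1000.0)]
densities = [hbonds_per_100(n, asa) for n, asa in family]

family_mean, family_sd = summarize(densities)
print(densities, family_mean, family_sd)
```

Normalizing by ASA makes binding sites of different sizes comparable, which a raw hydrogen bond count would not.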

For a query by ID, PBSword retrieval results are generated in real time. For a query by protein structure, however, the system usually takes minutes to generate the surface and extract features, depending on the size of the query binding site. Our system provides the following two options for users: (i) The PBSword server returns a session ID for the query, along with an estimated execution time, after the query protein structure has been uploaded. The user can then bookmark the session ID link and check the result page a few minutes later. (ii) If the user provides an email address when the query protein structure is uploaded, the PBSword server sends the ranked results to the user’s email account.

Performance evaluation

We applied the PBSword algorithm to SCOPPI binding site classification and compared its performance with a feature-based method, moment invariants (MI) (22), and an alignment-based method, iAlign (19), on an nr database of 2819 protein binding sites selected from SCOPPI 1.69 (22). Our experimental results show that the PBSword algorithm achieves classification accuracy comparable with that of iAlign and improves on the accuracy of MI by 36% on the nr dataset. At the same time, the PBSword algorithm is substantially more efficient than the alignment-based method. For example, the PBSword algorithm takes 0.31 s for a one-against-all search on the nr dataset, whereas iAlign takes 1016 s for a complete scan. In the PBSword server, efficiency is further enhanced by using the indexing trees to organize the visual word representations of the database, which makes it possible to retrieve the 100 best-matching sites for a query binding site without exhaustively performing one-against-all comparisons over all 194 322 binding sites in the database.
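The timing figures quoted above imply a speedup of roughly three orders of magnitude, which a short calculation makes concrete. The per-site averages assume the total time divides evenly over the 2819 sites, and the base-2 logarithm is only one way to read the logarithmic search cost mentioned in this article; both are illustrative assumptions.

```python
from math import log2

ialign_seconds = 1016.0   # complete scan with iAlign (from the text)
pbsword_seconds = 0.31    # one-against-all search with PBSword (from the text)
nr_sites = 2819           # size of the nr benchmark set (from the text)
all_sites = 194_322       # size of the full redundant database (from the text)

speedup = ialign_seconds / pbsword_seconds
ialign_per_site = ialign_seconds / nr_sites
pbsword_per_site = pbsword_seconds / nr_sites

# Rough depth of a balanced binary index over the full database.
index_depth = log2(all_sites)

print(f"speedup: about {speedup:.0f}x")
print(f"iAlign: {ialign_per_site:.2f} s per site; PBSword: {pbsword_per_site * 1e3:.2f} ms per site")
print(f"log2(N) for the full database: about {index_depth:.1f}")
```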

DISCUSSION

Searching for similar protein binding sites in a large-scale dataset is important for various biological applications. The PBSword web server presented in this article combines an efficient and accurate search engine with a user-friendly interface and an informative visualization of retrieval results. Our server returns retrieval results in a short time while preserving high accuracy. We expect this web server to benefit the life sciences community by revealing functional and evolutionary connections between proteins based on the local similarity of their binding sites.

Finally, we emphasize that PBSword, as a feature-based method for comparing binding sites, is not designed to replace existing alignment-based methods (e.g. iAlign). Instead, it complements structure comparison methods and offers an efficient way to filter out dissimilar binding sites.

FUNDING

National Science Foundation [DBI-0845196 to D.K.]; Shumaker Endowment in Bioinformatics (to B.P., X.K. and N.Z.). Funding for open access charge: University of Missouri Shumaker Endowment in Bioinformatics.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors are grateful to the researchers and groups who made the following software packages and databases available for use in PBSword: the Jmol package (http://jmol.sourceforge.net/) for generating views of proteins, SCOP for constructing the database, the MSMS, NACCESS, HBPLUS and SURFNET software for generating surfaces and properties of binding sites, and the PDB (http://www.pdb.org) for maintaining the tertiary structures. The authors acknowledge funding from the National Science Foundation.

REFERENCES

1. Bahadur R, Zacharias M. The interface of protein-protein complexes: analysis of contacts and prediction of interactions. Cell. Mol. Life Sci. 2008;65:1059–1072. doi: 10.1007/s00018-007-7451-x.
2. Bradford J, Needham C, Bulpitt A, Westhead D. Insights into protein–protein interfaces using a Bayesian Network Prediction method. J. Mol. Biol. 2006;362:365–386. doi: 10.1016/j.jmb.2006.07.028.
3. Henschel A, Kim W, Schroeder M. Equivalent binding sites reveal convergently evolved interaction motifs. Bioinformatics. 2006;22:550–555. doi: 10.1093/bioinformatics/bti782.
4. Tuncbag N, Gursoy A, Guney E, Nussinov R, Keskin O. Architectures and functional coverage of protein-protein interfaces. J. Mol. Biol. 2008;381:785–802. doi: 10.1016/j.jmb.2008.04.071.
5. Wu CH, Huang H, Nikolskaya A, Hu Z, Barker WC. The iProClass integrated database for protein functional analysis. Comput. Biol. Chem. 2004;28:87–96. doi: 10.1016/j.compbiolchem.2003.10.003.
6. Zhao N, Pang B, Shyu CR, Korkin D. Structural similarity and classification of protein interaction interfaces. PLoS One. 2011;6:e19554. doi: 10.1371/journal.pone.0019554.
7. Tuncbag N, Kar G, Keskin O, Gursoy A, Nussinov R. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform. 2009;10:217–232. doi: 10.1093/bib/bbp001.
8. De Las Rivas J, Fontanillo C. Protein-protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput. Biol. 2010;6:e1000807. doi: 10.1371/journal.pcbi.1000807.
9. Winter C, Henschel A, Kim WK, Schroeder M. SCOPPI: a structural classification of protein-protein interfaces. Nucleic Acids Res. 2006;34:D310–D314. doi: 10.1093/nar/gkj099.
10. Davis FP, Sali A. PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics. 2005;21:1901–1907. doi: 10.1093/bioinformatics/bti277.
11. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012;40:D841–D846. doi: 10.1093/nar/gkr1088.
12. Kuang X, Han JG, Zhao N, Pang B, Shyu CR, Korkin D. DOMMINO: a database of macromolecular interactions. Nucleic Acids Res. 2012;40:D501–D506. doi: 10.1093/nar/gkr1128.
13. Finn RD, Marshall M, Bateman A. iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 2005;21:410–412. doi: 10.1093/bioinformatics/bti011.
14. Teyra J, Doms A, Schroeder M, Pisabarro MT. SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces. BMC Bioinformatics. 2006;7:104. doi: 10.1186/1471-2105-7-104.
15. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995;247:536–540. doi: 10.1006/jmbi.1995.0159.
16. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH—a hierarchic classification of protein domain structures. Structure. 1997;5:1093–1108. doi: 10.1016/s0969-2126(97)00260-8.
17. Yin S, Proctor EA, Lugovskoy AA, Dokholyan NV. Fast screening of protein surfaces using geometric invariant fingerprints. Proc. Natl. Acad. Sci. U S A. 2009;106:16622–16626. doi: 10.1073/pnas.0906146106.
18. Gao M, Skolnick J. Structural space of protein-protein interfaces is degenerate, close to complete, and highly connected. Proc. Natl. Acad. Sci. U S A. 2010;107:22517–22522. doi: 10.1073/pnas.1012820107.
19. Gao M, Skolnick J. iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics. 2010;26:2259–2265. doi: 10.1093/bioinformatics/btq404.
20. Shulman-Peleg A, Mintz S, Nussinov R, Wolfson H. In: Algorithms in Bioinformatics. Jonassen I, Kim J, editors. Vol. 3240. Berlin/Heidelberg: Springer; 2004. pp. 194–205.
21. Das S, Kokardekar A, Breneman CM. Rapid comparison of protein binding site surfaces with property encoded shape distributions. J. Chem. Inf. Model. 2009;49:2863–2872. doi: 10.1021/ci900317x.
22. Sommer I, Muller O, Domingues F, Sander O, Weickert J, Lengauer T. Moment invariants as shape recognition technique for comparing protein binding sites. Bioinformatics. 2007;23:3139–3146. doi: 10.1093/bioinformatics/btm503.
23. Das S, Krein MP, Breneman CM. PESDserv: a server for high-throughput comparison of protein binding site surfaces. Bioinformatics. 2010;26:1913–1914. doi: 10.1093/bioinformatics/btq288.
24. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008;36:D419–D425. doi: 10.1093/nar/gkm993.
25. Berman HM. The Protein Data Bank: a historical perspective. Acta Crystallogr. A. 2008;64:88–95. doi: 10.1107/S0108767307035623.
26. Li W, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics. 2001;17:282–283. doi: 10.1093/bioinformatics/17.3.282.
27. Pang B, Zhao N, Korkin D, Shyu C-R. Fast protein binding site comparisons using visual words representation. Bioinformatics. 2012;28:1345–1352. doi: 10.1093/bioinformatics/bts138.
28. Jones S, Marin A, Thornton JM. Protein domain interfaces: characterization and comparison with oligomeric protein interfaces. Protein Eng. 2000;13:77–82. doi: 10.1093/protein/13.2.77.
29. Jones S, Thornton JM. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. 1997;272:121–132. doi: 10.1006/jmbi.1997.1234.
30. Hubbard SJ, Thornton JM. 1993. ‘NACCESS’, computer program.
31. Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 1971;55:379–400. doi: 10.1016/0022-2836(71)90324-x.
32. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 1994;238:777–793. doi: 10.1006/jmbi.1994.1334.
33. Laskowski RA. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph. 1995;13:323–330, 307–308. doi: 10.1016/0263-7855(95)00073-9.
34. Laskowski RA. PDBsum new things. Nucleic Acids Res. 2009;37:D355–D359. doi: 10.1093/nar/gkn860.
