Abstract
PocketQuery (http://pocketquery.csb.pitt.edu) is a web interface for exploring the properties of protein–protein interaction (PPI) interfaces with a focus on the discovery of promising starting points for small-molecule design. PocketQuery rapidly focuses attention on the key interacting residues of an interaction using a ‘druggability’ score that estimates how likely it is that the chemical mimicry of a cluster of interface residues would result in a small-molecule inhibitor of the interaction. These residue clusters are chemical starting points that can be seamlessly exported to a pharmacophore-based drug discovery workflow. PocketQuery is updated on a weekly basis to contain all applicable PPI structures deposited in the Protein Data Bank and allows users to upload their own custom structures for analysis.
INTRODUCTION
Protein–protein interactions (PPIs) are essential for biological function and are an emerging class of therapeutic targets (1,2). PPIs have proven to be a challenging target for drug discovery (3) and have performed poorly in high-throughput screens (4), possibly due to the historical bias of existing chemical libraries (5). One successful approach for targeting PPIs is the rational design of small molecules that mimic the interaction of a few key residues at the protein–protein interface (6–8). These residues are typically deeply buried ‘anchor’ residues (9) and/or ‘hot-spot’ residues (10,11). The prediction of such residues has received a great deal of attention, and approaches use structural features (9,12–14), sequence conservation (15–18), or, most successfully, a multi-feature consensus approach (19–24). Although most predictors focus on identifying individual residues, analyzing clusters of nearby residues has been found to be more informative when identifying chemical starting points for the design of small-molecule PPI inhibitors (25). PocketQuery provides an interface for exploring high-level features of a PPI interface and rapidly focusing attention on the key clusters of residues that are likely small-molecule inhibitor starting points.
There are numerous online resources for analyzing and predicting the properties of PPIs (26). In particular, several web servers support the exploration and visualization of properties of PPI interface residues such as sequence conservation (27), surface area calculations (14,28,29), and predicted hot spots (14,22). PocketQuery provides a 3D interface to explore all of these properties, and interface residue properties are precomputed for all PPI structures in the Protein Data Bank (PDB) resulting in the immediate availability of most structures of interest.
PocketQuery is unique in its focus on identifying small-molecule starting points from PPI structure in the form of clusters of interface residues. A cluster of co-located residues provides greater specificity than a single residue, and, unlike the full collection of interface residues, the molecular interactions of a small cluster are accessible to small molecules. Clusters are ranked according to a ‘druggability’ score, where high-scoring clusters likely delineate a potential binding site on a receptor surface. PocketQuery is complementary to methods that identify binding sites through an analysis of the receptor (30). Unlike receptor-only methods, PocketQuery requires the full PPI structure, but the residues identified by PocketQuery not only delineate a putative binding site on the protein, they also define a set of molecular interactions selected for by evolution. These interactions define a pharmacophore that can be seamlessly exported into a virtual screening workflow.
MATERIALS AND METHODS
PocketQuery contains an analysis of every PPI structure in the PDB and is updated on a weekly basis as new structures are made available. Structures are not filtered by experimental method or resolution; users must make their own determination as to whether a structure is of high enough quality to support a meaningful analysis. The first biological assembly deposited in the PDB is analyzed. If no biological assembly is available (e.g. for an NMR structure), then the first model of the asymmetric unit is analyzed. Additionally, users may submit their own structures for analysis. Very large oligomeric structures (such as viral capsids) are reduced to a single monomer and its neighbors to reduce the computational overhead of the analysis. Each structure is preprocessed with CHARMM version 31b1 (31) to add missing atoms, including hydrogens, and optimize hydrogen bonding.
The following energetic, structural and evolutionary properties are computed for each interface residue of a PPI:
ΔGFC: an estimate of the change of free energy (kcal/mol) for a residue upon complexation. Computed using FastContact (12). More negative values indicate a stronger interaction.
ΔΔGR: an estimate of the change in free energy of an alanine mutation. Computed using Rosetta (13). More positive values indicate the mutation destabilizes the complex and thus the original residue has a stronger interaction.
ΔSASA: the change in solvent accessible surface area (SASA) of a residue. This is the difference between the SASA of the bound conformation of a chain in the complexed state and the bound conformation as an independent chain. Computed using naccess (http://www.bioinf.manchester.ac.uk/naccess/).
ΔSASA%: the relative ΔSASA as computed by naccess. Expressed as a percentage.
Cons: a conservation score computed using Scorecons (32). A higher score indicates a higher degree of conservation.
Rate: an evolutionary rate computed using Rate4Site (33). A higher score indicates a higher rate and lower degree of conservation.
The full protocol for computing these properties is reported elsewhere (25). Any residue with ΔSASA > 0.05 Å² is treated as an interface residue. All possible clusters of interface residues with a maximal span of 12 Å are computed. The cluster properties include the aggregated residue properties (minimum, maximum, average and total values) as well as the types of residues in the cluster, the size of the cluster (number of residues) and the maximal distance between cluster residues.
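The interface filter, cluster enumeration and property aggregation described above can be illustrated with a minimal Python sketch. It is not the PocketQuery implementation (the full protocol is in reference 25). The `Residue` record, the use of a single representative coordinate per residue to measure span, the brute-force enumeration and the `max_size` cap are all assumptions made for illustration, and the residue values are invented.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist

SASA_CUTOFF = 0.05  # Å^2; interface threshold quoted in the text
MAX_SPAN = 12.0     # Å; maximal cluster span quoted in the text


@dataclass(frozen=True)
class Residue:
    """Hypothetical per-residue record; field names are illustrative only."""
    name: str
    xyz: tuple    # one representative coordinate per residue (assumption)
    dsasa: float  # change in SASA, Å^2
    dg_fc: float  # FastContact-style energy estimate, kcal/mol


def interface_residues(residues):
    """Keep residues whose change in SASA exceeds the cutoff."""
    return [r for r in residues if r.dsasa > SASA_CUTOFF]


def span(cluster):
    """Largest pairwise distance in a cluster (0 for a single residue)."""
    return max((dist(a.xyz, b.xyz) for a, b in combinations(cluster, 2)),
               default=0.0)


def enumerate_clusters(residues, max_size=3):
    """Brute-force every subset of interface residues within the span limit.

    max_size is an assumption that keeps this toy enumeration small.
    """
    iface = interface_residues(residues)
    for size in range(1, max_size + 1):
        for cluster in combinations(iface, size):
            if span(cluster) <= MAX_SPAN:
                yield cluster


def aggregate(cluster, attr):
    """Minimum, maximum, average and total of one residue property."""
    values = [getattr(r, attr) for r in cluster]
    return {"min": min(values), "max": max(values),
            "avg": sum(values) / len(values), "total": sum(values)}


# Invented example residues placed along a line for easy checking.
residues = [
    Residue("RES1", (0.0, 0.0, 0.0), 120.0, -4.0),
    Residue("RES2", (6.0, 0.0, 0.0), 150.0, -5.0),
    Residue("RES3", (10.0, 0.0, 0.0), 90.0, -2.0),
    Residue("RES4", (20.0, 0.0, 0.0), 30.0, -1.0),
    Residue("RES5", (3.0, 0.0, 0.0), 0.0, 0.0),  # buried area unchanged: not interface
]

for cluster in enumerate_clusters(residues):
    names = "+".join(r.name for r in cluster)
    print(names, round(span(cluster), 1), aggregate(cluster, "dg_fc")["total"])
```

Exhaustive enumeration grows combinatorially with interface size, so a production pipeline would need a more careful strategy; the sketch only shows what the span constraint and the four aggregates mean.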
Most significantly, PocketQuery provides a consensus score for each cluster that is derived from a structural analysis of protein–ligand and protein–protein structures (25). The score, which ranges from zero to one, is the output of a support vector machine trained using structures of small molecules that bind at the PPI interface. When these ligand-bound structures are aligned with the PPI structure, the small molecule overlaps a cluster of residues in the PPI, and these overlapping residues delineate a small-molecule binding site at the PPI interface. Clusters with properties that are consistent with these overlapping clusters receive a higher score. That is, a high score suggests that the chemical mimicry of the cluster would be a good starting point for the design of a small-molecule inhibitor of the interaction, and the score provides a rough indication of the ‘druggability’ of the PPI. Since this druggability score is derived from a machine learning structural analysis, its quality and relevance should only increase as more protein–ligand and protein–protein structures are used in its computation. Consequently, the score for all PocketQuery clusters will be updated on a biennial basis to benefit from the constantly increasing amount of structural information.
The PocketQuery interface is implemented using JavaScript and the Java-based Jmol (http://www.jmol.org/) molecular viewer. A modern, standards compliant web browser with a recent Java plugin is required.
POCKETQUERY INTERFACE
PocketQuery provides an easy-to-use graphical interface for searching for clusters, displaying and browsing the clusters that are the results of a search, and visualizing the molecular structure and properties of clusters within the 3D PPI structure. The full results of a search as well as the structures of specific clusters can be exported into common, analyzable file formats.
Search
The PocketQuery search interface is shown in Figure 1. The interface is divided into a Search panel on the left, where the criteria for the search can be entered, and a table of result clusters on the right. The search criteria include structure-level information, such as the PDB id and chain, and all the computed cluster properties. An arbitrary number of search criteria may be conjunctively joined to precisely filter the results. For instance, searching for all clusters with a cluster size equal to one and a total ΔGFC (FastContact energy) < −3 would identify all predicted individual hot-spot residues. If a PDB id is not provided, a PDB-wide search is performed. A PDB-wide search can be narrowed by specifying keywords that must match within the PDB title or keywords fields.
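The conjunctive joining of search criteria can be sketched in a few lines of Python. This is only an illustration of the logic, not the server's query code: the field names (`size`, `total_dG_FC`), the triple format for a criterion and the cluster records are hypothetical.

```python
import operator

# Comparison operators a search criterion may use (illustrative set).
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def matches(cluster, criteria):
    """True only if the cluster satisfies every (field, op, value) criterion."""
    return all(OPS[op](cluster[field], value) for field, op, value in criteria)


def search(clusters, criteria):
    """Return the clusters that pass all criteria (logical AND)."""
    return [c for c in clusters if matches(c, criteria)]


# Invented cluster records with hypothetical field names.
example_clusters = [
    {"id": 1, "size": 1, "total_dG_FC": -4.2},
    {"id": 2, "size": 1, "total_dG_FC": -1.0},
    {"id": 3, "size": 3, "total_dG_FC": -9.5},
]

# The hot-spot example from the text: size equal to one and total ΔG_FC < -3.
hot_spots = search(example_clusters,
                   [("size", "=", 1), ("total_dG_FC", "<", -3.0)])
print([c["id"] for c in hot_spots])
```

Because the criteria are joined with AND, each added criterion can only shrink the result set, which is what makes stacking filters a predictable way to narrow a PDB-wide search.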
Figure 1.
The PocketQuery search interface. The Search panel on the left allows the user to set an arbitrary number of conjunctively joined filters, such as specifying a minimum average ΔSASA for each cluster. Preset filtering criteria are available (bottom left) or custom searches may be created and shared using the Load/Save buttons. The matching clusters from the PPI structures deposited in the PDB are displayed in the right panel.
As shown in the lower left of Figure 1, a number of preset settings for search criteria are available. Additionally, the search settings may be saved to or restored from a file to enable collaborative sharing of search results.
Clusters
The cluster results table is shown in the right of Figures 1, 2 and 3 and is always visible. Each row of the table corresponds to a single cluster of residues and each column to a cluster property. Clusters may be sorted by any available property by clicking the column heading. As each cluster has dozens of associated properties, it is unwieldy to view them all at once. Instead, as shown in the lower right of Figure 2, different views of the data may be selected to focus in on different aspects of the interface. For example, the Residue Centric view, which is shown in Figure 3 and is most appropriate when searching for only clusters of size one, displays the residue names and sequence identifiers, but omits the cluster size and maximum distance (which are always one and zero for single-residue clusters). The full search results, consisting of all the computed properties for every matching cluster, can be downloaded as a comma separated text file through the Save Results button.
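Once the results have been saved, the comma-separated file can be post-processed outside the browser. The sketch below, which uses only the Python standard library, loads such a file and sorts it by one numerical column. The column names (`cluster_id`, `size`, `score`) and the inline data are assumptions; the actual header row of the exported file should be checked before use.

```python
import csv
import io


def load_clusters(handle):
    """Read a comma-separated results file into a list of dict rows."""
    return list(csv.DictReader(handle))


def sort_by(rows, column, descending=True):
    """Sort rows by a numerical column, highest value first by default."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


# Invented stand-in for a saved results file; column names are hypothetical.
saved_results = io.StringIO(
    "cluster_id,size,score\n"
    "1,3,0.42\n"
    "2,2,0.91\n"
    "3,1,0.10\n"
)

rows = load_clusters(saved_results)
ranked = sort_by(rows, "score")
print([row["cluster_id"] for row in ranked])
```

For a real export, `open(path, newline="")` would replace the `io.StringIO` object.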
Figure 2.
Browsing the PocketQuery search results. The Clusters panel on the right displays the table of clusters matching the specified search criteria. Different consolidated views of the many available cluster properties can be selected, as shown in the bottom right. Results may be sorted by any numerical property. Here they are shown sorted by a ‘druggability’ score, where a higher score indicates the similarity of a cluster to residue clusters known to delineate small-molecule binding sites at a PPI interface. Selecting a result cluster in the results table brings up a molecular viewer panel featuring a 3D Jmol interface of the PPI structure. The residues of the selected cluster(s) and their properties are shown in the bottom of the molecular viewer panel.
Figure 3.
Visualizing the properties of a protein–protein interface. A variety of display styles are available for the receptor protein, ligand protein and selected cluster(s). The receptor surface is shown color mapped by partial charge. The color of the interface residues of the ligand protein (including the cluster residues) may be color mapped to any of the computed properties, such as energy estimates, sequence conservation or ΔSASA.
Clusters may be visualized in the context of the PPI by clicking on the corresponding row. As shown in Figure 2, this launches a molecular viewer in the left panel of the interface, replacing the search panel. If multiple clusters from the same chain are selected (i.e. using the Ctrl and Shift keys) as shown in Figure 3, their residues are merged into a single cluster in the molecular viewer panel. The residue properties of the displayed cluster are shown in the bottom of the molecular viewer panel, as shown in Figure 2.
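The merging of several selected clusters into a single displayed cluster amounts to a union of their residues. A minimal Python sketch follows, assuming a hypothetical record layout (`chain`, `residues`) and assuming that a selection spanning different chains is simply rejected; the text does not say how the interface handles that case.

```python
def merge_clusters(selected):
    """Union the residues of clusters selected from one chain, keeping order."""
    chains = {c["chain"] for c in selected}
    if len(chains) != 1:
        # Assumption: mixed-chain selections are not merged.
        raise ValueError("selected clusters must come from the same chain")
    merged = []
    for cluster in selected:
        for residue in cluster["residues"]:
            if residue not in merged:  # skip residues shared between clusters
                merged.append(residue)
    return {"chain": chains.pop(), "residues": merged}


# Invented selection of two overlapping clusters on a hypothetical chain "B".
selection = [
    {"chain": "B", "residues": ["RES1", "RES2"]},
    {"chain": "B", "residues": ["RES2", "RES3"]},
]
print(merge_clusters(selection))
```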
Visualization
The molecular viewer may be manipulated using the standard Jmol controls. Additionally, the viewer panel, shown in Figure 3, can be used to customize the visual styles of the receptor protein, ligand protein and cluster residues. The ligand protein is the single chain that contains the selected cluster whereas the receptor protein consists of all other chains in the PPI complex. The default display style is for the receptor and ligand protein residues to be shown as wireframes with the cluster residues shown as sticks, as shown in Figure 2. The receptor protein is displayed with a rendered surface that is color mapped by the partial charge of the residues (the surface may be omitted by setting the transparency to 100%). The interface residues of the ligand protein may be colored by residue property, as shown in Figure 3. Property values are mapped to a rainbow (ROYBG) spectrum where the smallest values are always red and the largest blue.
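The mapping of property values onto a red-to-blue spectrum can be approximated as a linear sweep of hue. The Python sketch below uses the standard-library `colorsys` module and assumes an HSV ramp from hue 0 (red) to hue 2/3 (blue). The viewer's own color scheme may place intermediate colors differently, so only the endpoints are taken from the text.

```python
import colorsys


def property_to_rgb(value, lowest, highest):
    """Map a property value to an RGB triple: lowest is red, highest is blue.

    Assumes a linear hue sweep red -> yellow -> green -> blue in HSV space.
    """
    if highest == lowest:
        fraction = 0.0  # degenerate range: color everything red
    else:
        fraction = (value - lowest) / (highest - lowest)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
    hue = fraction * (2.0 / 3.0)             # 0 is red, 2/3 is blue
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


# Invented per-residue values (e.g. a change in SASA) for illustration.
values = {"RES1": 0.0, "RES2": 50.0, "RES3": 100.0}
low, high = min(values.values()), max(values.values())
for name, value in values.items():
    print(name, property_to_rgb(value, low, high))
```

Because the smallest value is always red regardless of the property, the meaning of red differs between properties: for ΔGFC it marks the most favorable (most negative) energy, whereas for ΔSASA it marks the least buried residue.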
Export
Once a cluster has been identified as a promising starting point for small-molecule design, it can be saved as a PDB file or exported directly into a pharmacophore-based virtual screening workflow. This functionality is provided under the Export tab of the molecular viewer panel. Export is available for two online virtual screening search engines: AnchorQuery (http://anchorquery.csb.pitt.edu) and ZINCPharmer (http://zincpharmer.csb.pitt.edu). Both AnchorQuery and ZINCPharmer screen for compounds using a 3D pharmacophore, the spatial arrangement of the essential features (such as hydrophobic regions or hydrogen bonds) of an interaction. A candidate pharmacophore is derived directly from the cluster–receptor structure exported by PocketQuery and can be easily refined by the user.
AnchorQuery is a specialized interactive pharmacophore search technology. AnchorQuery includes a library of > 1.5 billion conformers of > 21 million novel chemical compounds. These compounds are accessible through one-step multi-component reaction chemistry. The compounds are designed to be biased to target PPIs by virtue of always containing a functional mimic of a specific amino acid (the ‘anchor’ feature of the pharmacophore). AnchorQuery provides rapid access to a large chemical space specialized for targeting PPIs, but synthesis is required for experimental validation.
ZINCPharmer is a general interactive pharmacophore search technology for the ZINC database (34). ZINCPharmer is routinely updated to search the most recent set of purchasable compounds from ZINC. The size of this library is on the order of 100–200 million conformations of 10–20 million compounds. Although historical collections of compounds are thought to be poorly suited for targeting PPIs (5), compounds found through ZINCPharmer have the advantage that they may be immediately purchased for experimental validation.
DISCUSSION
PocketQuery includes precomputed properties for all the PPI structures in the PDB resulting in the immediate availability of almost all structures of interest. Not only are multiple properties computed, such as energy estimates and sequence conservation, but these properties are also computed with multiple methods. Property values can be displayed both numerically and visually color-mapped directly onto the PPI structure. This allows for a critical assessment of the potential contribution of an interface residue in the formation and stabilization of the PPI complex.
PocketQuery is a valuable tool for the interactive, user-guided exploration of PPI interfaces. The analysis of clusters of residues, combined with the consensus ‘druggability’ score, rapidly focuses attention on the key residues of the interaction. Once an appropriate set of residues is identified, PocketQuery provides an entry point into an interactive online drug discovery workflow for the development of PPI inhibitors. PocketQuery is freely available at http://pocketquery.csb.pitt.edu.
FUNDING
National Institutes of Health (NIH) [R01GM097082]. Funding for open access charge: NIH [1R21GM087617].
Conflict of interest statement. None declared.
REFERENCES
1. Wells J, McClendon C. Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature. 2007;450:1001–1009. doi: 10.1038/nature06526.
2. Dömling A. Small molecular weight protein-protein interaction antagonists–an insurmountable challenge? Curr. Opin. Chem. Biol. 2008;12:281–291. doi: 10.1016/j.cbpa.2008.04.603.
3. Whitty A, Kumaravel G. Between a rock and a hard place? Nat. Chem. Biol. 2006;2:112–118. doi: 10.1038/nchembio0306-112.
4. Macarron R. Critical review of the role of HTS in drug discovery. Drug Discov. Today. 2006;11:277–279. doi: 10.1016/j.drudis.2006.02.001.
5. Sperandio O, Reynès CH, Camproux AC, Villoutreix BO. Rationalizing the chemical space of protein-protein interaction inhibitors. Drug Discov. Today. 2010;15:220–229. doi: 10.1016/j.drudis.2009.11.007.
6. Popowicz G, Dömling A, Holak T. The structure-based design of Mdm2/Mdmx–p53 inhibitors gets serious. Angew. Chem. Int. Ed. 2011;50:2680–2688. doi: 10.1002/anie.201003863.
7. Liu S, Wu S, Jiang S. HIV entry inhibitors targeting gp41: from polypeptides to small-molecule compounds. Curr. Pharm. Des. 2007;13:143–162. doi: 10.2174/138161207779313722.
8. Christ F, Voet A, Marchand A, Nicolet S, Desimmie BA, Marchand D, Bardiot D, Van der Veken NJ, Van Remoortel B, Strelkov SV. Rational design of small-molecule inhibitors of the LEDGF/p75-integrase interaction and HIV replication. Nat. Chem. Biol. 2010;6:442–448. doi: 10.1038/nchembio.370.
9. Rajamani D, Thiel S, Vajda S, Camacho CJ. Anchor residues in protein-protein interactions. PNAS. 2004;101:11287. doi: 10.1073/pnas.0401942101.
10. Clackson T, Wells JA. A hot spot of binding energy in a hormone-receptor interface. Science. 1995;267:383–386. doi: 10.1126/science.7529940.
11. Moreira I, Fernandes P, Ramos M. Hot spots: a review of the protein–protein interface determinant amino-acid residues. Proteins Struct. Funct. Bioinf. 2007;68:803–812. doi: 10.1002/prot.21396.
12. Camacho C, Zhang C. FastContact: rapid estimate of contact and binding free energies. Bioinformatics. 2005;21:2534. doi: 10.1093/bioinformatics/bti322.
13. Kortemme T, Kim D, Baker D. Computational alanine scanning of protein-protein interfaces. Sci. STKE. 2004;2004:pl2. doi: 10.1126/stke.2192004pl2.
14. Meireles L, Dömling AS, Camacho CJ. ANCHOR: a web server and database for analysis of protein-protein interaction binding pockets for drug discovery. Nucleic Acids Res. 2010;38:W407. doi: 10.1093/nar/gkq502.
15. Bromberg Y, Rost B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics. 2008;24:i207. doi: 10.1093/bioinformatics/btn268.
16. Keskin O, Ma B, Nussinov R. Hot regions in protein-protein interactions: the organization and contribution of structurally conserved hot spot residues. J. Mol. Biol. 2005;345:1281–1294. doi: 10.1016/j.jmb.2004.10.077.
17. Ofran Y, Rost B. Protein-protein interaction hotspots carved into sequences. PLoS Comput. Biol. 2007;3:e119. doi: 10.1371/journal.pcbi.0030119.
18. Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 1996;257:342–358. doi: 10.1006/jmbi.1996.0167.
19. Cho K, Kim D, Lee D. A feature-based approach to modeling protein–protein interaction hot spots. Nucleic Acids Res. 2009;37:2672. doi: 10.1093/nar/gkp132.
20. Lise S, Archambeau C, Pontil M, Jones D. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods. BMC Bioinformatics. 2009;10:365. doi: 10.1186/1471-2105-10-365.
21. Darnell S, Page D, Mitchell J. An automated decision-tree approach to predicting protein interaction hot spots. Proteins Struct. Funct. Bioinf. 2007;68:813–823. doi: 10.1002/prot.21474.
22. Zhu X, Mitchell JC. KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features. Proteins Struct. Funct. Bioinf. 2011;79:2671–2683. doi: 10.1002/prot.23094.
23. Tuncbag N, Gursoy A, Keskin O. Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy. Bioinformatics. 2009;25:1513. doi: 10.1093/bioinformatics/btp240.
24. Guney E, Tuncbag N, Keskin O, Gursoy A. HotSprint: database of computational hot spots in protein interfaces. Nucleic Acids Res. 2008;36:D662. doi: 10.1093/nar/gkm813.
25. Koes DR, Camacho CJ. Small-molecule inhibitor starting points learned from protein-protein interaction inhibitor structure. Bioinformatics. 2012;28:784–791. doi: 10.1093/bioinformatics/btr717.
26. Tuncbag N, Kar G, Keskin O, Gursoy A, Nussinov R. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief. Bioinform. 2009;10:217–232. doi: 10.1093/bib/bbp001.
27. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics. 2003;19:163. doi: 10.1093/bioinformatics/19.1.163.
28. Gabdoulline RR, Wade RC, Walther D. MolSurfer: a macromolecular interface navigator. Nucleic Acids Res. 2003;31:3349–3351. doi: 10.1093/nar/gkg588.
29. Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J. CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res. 2006;34:W116–W118. doi: 10.1093/nar/gkl282.
30. Henrich S, Salo-Ahen O, Huang B, Rippmann F, Cruciani G, Wade R. Computational approaches to identifying and characterizing protein binding sites for ligand design. J. Mol. Recogn. 2010;23:209–219. doi: 10.1002/jmr.984.
31. Brooks BR, Bruccoleri RE, Olafson BD. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983;4:187–217.
32. Valdar WSJ. Scoring residue conservation. Proteins Struct. Funct. Bioinf. 2002;48:227–241. doi: 10.1002/prot.10146.
33. Mayrose I, Graur D, Ben-Tal N, Pupko T. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol. Biol. Evol. 2004;21:1781. doi: 10.1093/molbev/msh194.
34. Irwin JJ, Shoichet BK. ZINC - a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005;45:177–182. doi: 10.1021/ci049714.