Proceedings of the National Academy of Sciences of the United States of America. 2015 Dec 11;112(52):15910–15915. doi: 10.1073/pnas.1518946112

Detection of secondary binding sites in proteins using fragment screening

R Frederick Ludlow, Marcel L Verdonk, Harpreet K Saini, Ian J Tickle, Harren Jhoti
PMCID: PMC4703025  PMID: 26655740

Significance

The regulation of proteins in biological systems is essential to their function, and nature has evolved a diverse array of mechanisms to achieve such regulation. Indeed, the primary function of a protein may be regulated by interaction with endogenous ligands or other protein partners binding at secondary sites. In this study, we report that fragment screening using X-ray crystallography can identify such secondary sites that may have a biological function, which in turn implies that the opportunities for modulating protein function with small molecules via such sites are far more widespread than previously assumed. Many of the secondary sites we discovered were previously unknown and therefore offer potential for novel approaches to modulate these protein targets.

Keywords: protein structure, protein function, X-ray crystallography, fragment-based drug design

Abstract

Proteins need to be tightly regulated, as they control biological processes in most normal cellular functions. The precise mechanisms of regulation are rarely completely understood but can involve the binding of endogenous ligands and/or partner proteins at specific locations on a protein that can modulate function. Often, these additional secondary binding sites are separate from the primary binding site, which, for an enzyme for example, may bind a substrate. In previous work, we uncovered several examples in which secondary binding sites were discovered on proteins using fragment screening approaches. In each case, we were able to establish that the newly identified secondary binding site was biologically relevant, as the binding of a small molecule at that site modulated function. In this study, we investigate how often secondary binding sites are located on proteins by analyzing 24 protein targets for which we have performed a fragment screen using X-ray crystallography. Our analysis shows that, surprisingly, the majority of proteins contain secondary binding sites based on their ability to bind fragments. Furthermore, sequence analysis of these previously unknown sites indicates high conservation, which suggests that they may have a biological function, perhaps via an allosteric mechanism. Comparing the physicochemical properties of the secondary sites with known primary ligand-binding sites also shows broad similarities, indicating that many of the secondary sites may be druggable with small molecules that could provide new opportunities to modulate potential therapeutic targets.


Allosteric and other noncompetitive regulatory binding sites on proteins have long been of interest to biologists and to drug discovery scientists (1–4). The intricate mechanisms by which proteins are regulated often involve binding at these sites, which can be viewed as “secondary sites” separate from the primary site where, for example, an enzyme performs its catalytic function. Indeed, given the considerable size of most proteins, it could be argued that it would be inefficient for evolution not to have produced multiple binding sites that could be used to regulate function. Furthermore, drug molecules that target such sites can offer an orthogonal mechanism for modulating the biological activity of a target protein and may have improved selectivity (5–7) and resistance profiles (8, 9). The different mechanisms by which orthosteric and allosteric inhibitors operate also provide different opportunities (and pitfalls) for drug development (10). In particular, ligands targeting these sites can provide a mechanism for activating enzymes, something that is unlikely to be achievable by binding to the catalytic site. Examples are the allosteric glucokinase activators currently being investigated for treating type 2 diabetes (11).

Experimentally identifying biologically relevant secondary sites is challenging: although the existence of some sites may be inferred, for example from the dependency on a protein–protein interaction (PPI), other sites may currently have no known function or endogenous ligand. Despite the interest in these sites, locating them on proteins and confirming their biological effect remain difficult tasks. Assay formats that detect binding at a certain site (e.g., displacement of a fluorescently tagged or radiolabeled ligand) will not necessarily detect binding to a noncompetitive site. Conversely, some functional assay formats and direct ligand-binding methods (e.g., isothermal titration calorimetry, surface plasmon resonance) may detect the effect of a small molecule binding to a secondary site but be unable to distinguish it from a competitive ligand. In addition, many computational methods have been developed to predict ligand-binding sites from protein sequence conservation, 3D structure (12), or both (13). These tools have had some success in predicting sites from the apo structure (14), including the presence of cryptic pockets (15). However, identifying ligand-induced pockets remains a largely unsolved problem. Clearly, strategies to identify secondary sites more readily would be of great importance to help unravel the mechanisms by which protein function is regulated and could provide novel approaches for therapeutic intervention.

The emergence of fragment-based drug discovery as an approach to identify novel small-molecule drug candidates has been well documented in recent years (16). In particular, screening of fragment libraries using X-ray crystallography has been shown to be an effective approach to sample chemical space, revealing previously unobserved pockets and ligand binding modes. The approach allows detection of direct binding of the fragment to the target, is very sensitive (Kd > 10 mM, depending on the solubility of the compound), and can also identify ligand-induced pockets (17), as well as preformed sites. The immediate availability of the crystal structure is also of great use in assessing the potential druggability of a newly identified site, facilitating the decision to undertake further biological work to determine whether the site is likely to have a functional role. Finally, given a diverse fragment library, the process allows a chemically unbiased screen of the available protein surface, which maximizes the probability of finding alternative binding sites.

Indeed, our laboratory has previously reported the successful application of fragment screening using X-ray crystallography to identify secondary (allosteric) sites on three separate targets. The first of these studies involved the discovery of an allosteric inhibitor of the viral protease-helicase protein HCV NS3 (18). In this case, fragment screening against the full-length protein identified a previously unknown allosteric binding site at the interface of the two domains. Close inspection of the HCV NS3 crystal structures led to the hypothesis that compounds binding at this site would stabilize an inactive conformation of the protein and would thus inhibit its protease function, which was confirmed experimentally. In a second example, a fragment screen against the enzyme pyruvate kinase M2 (PKM2) (19) revealed L-serine bound to a previously uncharacterized binding pocket. It was subsequently shown that L-serine binding to this allosteric site activates PKM2. Finally, the allosteric bicarbonate binding site of human soluble adenylate cyclase (SolAC) was also characterized using X-ray crystallography (20, 21), and a subsequent fragment screen against this target identified inhibitors that occupy this site.

In other studies, Arnold and coworkers (22) described a fragment screen against HIV-1 reverse transcriptase in which seven previously unobserved binding sites were identified. An enzyme assay confirmed that three of the alternative-site ligands were inhibitory. Jahnke et al. (23) at Novartis used NMR fragment screening followed by X-ray crystallography to identify a previously unknown and druggable allosteric pocket on the protein farnesyl pyrophosphate synthase (FPPS). Development of the fragment hits resulted in a number of cell-active submicromolar inhibitors of FPPS. Novartis scientists have also reported the results of an NMR-based fragment screen against c-Abl (24) showing binding in the allosteric myristoyl pocket (25). Finally, a fragment tethering approach was used by Shokat and coworkers to identify an allosteric pocket on the oncogenic G12C mutant of K-Ras (Kirsten rat sarcoma) (26). Notably, this pocket could not be observed in the apo X-ray structure because that part of the protein is disordered.

Although there have been several previous reports confirming the presence of biologically relevant secondary binding sites on proteins, it is unknown whether this is a general feature of proteins. To answer this question, we analyzed in-house data from 24 previous fragment-based drug discovery campaigns against a wide variety of protein targets in which X-ray crystallography had been used as a primary screening technique. To the best of our knowledge, this is the first analysis of this kind, and our results indicate that secondary binding sites are indeed present in a majority of the proteins analyzed. In most cases, these secondary binding sites had not been previously identified, which has significant implications for the potential to generate new chemical tools to probe unexploited biological mechanisms. We therefore propose that fragment screening against proteins represents a useful and well-validated method for identifying secondary, potentially biologically relevant, sites.

Results

Analysis of Fragment Screening Campaigns.

In this analysis, we gathered all X-ray crystal structures of protein targets for which at least 100 different fragments had been screened in-house using X-ray crystallography. This resulted in 5,590 holo X-ray crystal structures, comprising 4,950 distinct compounds and 24 protein targets. The exact protocols and screening cascades that produced the data presented here varied according to the target, but in broad terms our approach has been described by Hartshorn et al. (27, 28). In some cases we had collected data for different mutant forms of a protein; in these cases, data from all forms were used. Putative sites were identified through a combination of predictive tools [e.g., LIGSITE (29)], peak-finding methods, and visual inspection of the electron density. Once sites were defined, ligands were placed into the electron density automatically for the vast majority of the structures using our Autosolve software (28).

We were careful to exclude any incidental buffer molecules, inorganic ions, and cryoprotectants from our analysis. We also excluded a significant number of occupied sites that we considered likely to be artifacts of the crystal environment, such as ligands bound to a site formed by the protein and a symmetry-related molecule. Finally, for each protein target, we used visual inspection to cluster the multiple ligands identified into discrete sites. Sites were divided into “primary sites” and “secondary sites.” For each target, one primary site was selected based on knowledge of the protein function: for enzymes, we selected the active site, and for PPI targets, we selected the main PPI site. Small-molecule cofactor sites that were systematically occupied in our X-ray fragment screen (e.g., the glutathione site in PGDS) were also assigned as primary sites. All remaining sites were defined as secondary sites. Some of the secondary sites have a known biological function, but for the majority of these sites the function is unknown.

Although we did not have a strict definition for how big or small a site could be, we tried to be conservative in our estimate of the number of sites per target. For example, multiple subpockets within a larger pocket would be counted as a single site.

The results of this analysis are presented in Table 1. A total of 53 distinct sites were observed across 24 protein targets, an average of 2.2 sites per target, with at least two sites observed for the majority (67%) of targets. These numbers should be seen as a lower bound: in addition to our use of conservative site definitions, some sites may have been occluded by crystal contacts in the particular crystal system used for the fragment screen, and other sites may bind chemotypes that were not represented in the screening library.

Table 1.

Number of ligand-binding sites detected across 24 in-house fragment screening campaigns

Target | No. of sites observed | Target | No. of sites observed
MetAP2 | 2 | FGFR1 | 1
MELK | 2 | hRAS | 2
PGDS (r)* | 4 | KEAP1 (m)* | 2
Lp-PLA2 | 2 | PKM2 | 6
CDK2 | 3 | ERK2 | 2
Urokinase | 1 | eIF4E | 2
PARP1 (m)* | 1 | XIAP | 1
HCV NS3 | 4 | DNA ligase (s.a)* | 3
iNOS (m)* | 1 | ATAD2 | 2
SolAC | 1 | BACE1 | 1
JAK2 | 1 | HSPA2 | 5
HSP90A | 2 | SETD2 | 2

In total, 53 sites were discovered, with an average of 2.2 ± 1.2 sites per target.

*Mouse (m), rat (r), and Staphylococcus aureus (s.a) proteins were used for these targets.

Although we did identify an allosteric bicarbonate site for SolAC (20), we considered this too close to the main active site to be counted as a separate site for this analysis.
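As a quick arithmetic check of the totals quoted in the text and the table footnote, the per-target counts in Table 1 can be tallied in a few lines of Python. The dictionary is simply our transcription of the table, and the function name `summarize` is illustrative.

```python
# Site counts transcribed from Table 1 (one entry per fragment-screening campaign).
SITE_COUNTS = {
    "MetAP2": 2, "MELK": 2, "PGDS": 4, "Lp-PLA2": 2, "CDK2": 3, "Urokinase": 1,
    "PARP1": 1, "HCV NS3": 4, "iNOS": 1, "SolAC": 1, "JAK2": 1, "HSP90A": 2,
    "FGFR1": 1, "hRAS": 2, "KEAP1": 2, "PKM2": 6, "ERK2": 2, "eIF4E": 2,
    "XIAP": 1, "DNA ligase": 3, "ATAD2": 2, "BACE1": 1, "HSPA2": 5, "SETD2": 2,
}


def summarize(counts: dict[str, int]) -> tuple[int, float, float]:
    """Return (total sites, mean sites per target, fraction of targets with >= 2 sites)."""
    values = list(counts.values())
    total = sum(values)
    mean = total / len(values)
    multi = sum(1 for v in values if v >= 2) / len(values)
    return total, mean, multi


if __name__ == "__main__":
    total, mean, multi = summarize(SITE_COUNTS)
    print(f"{total} sites, {mean:.1f} per target, {multi:.0%} with >= 2 sites")
```

The tally reproduces the 53 sites, the mean of 2.2 sites per target, and the 16 of 24 targets (67%) with at least two sites.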

Examples of Newly Identified Sites.

Fig. 1 shows all primary and secondary sites that we identified for four of the targets reported in Table 1. The first example is CDK2, where the same fragment (compound 1) binds to two different pockets on the target: one molecule binds to the known ATP binding site, and the other interacts with a region on the C-terminal lobe, well away from the active site. Interestingly, a derivative of D-luciferin has recently been observed to bind in this site through a very similar hydrogen-bonding motif (Inset) (30). Compound 2 was also observed to bind at two locations, in this case at the ATP site and at a highly conserved secondary site (Fig. 1A, green ligand) in the region where CDK2 interacts with cyclin A (PDB 1FIN) (31). Ligand binding has been previously reported at this site by Betzi et al. (32) (PDB 3PXF), who proposed that disrupting the CDK2–cyclin A complex via this site could lead to allosteric inhibitors of CDK2.

Fig. 1.

Overlaid binding sites observed for four targets in this study. (A) CDK2 (PDB 5FP5) with compound 1 (yellow, cyan, PDB 5FP5) and compound 2 (green, PDB 5FP6). (Inset) Overlay of the D-luciferin derivative from PDB 4D1Z and compound 1. (B) HSPA2 protein from PDB 5FPE with fragments 3 (dark gray and cyan), 4 (green, PDB 5FPM), 5 (light gray, PDB 5FPD), and 6 (orange, PDB 5FPN). (C) DNA ligase (PDB 5FPO) with fragments 7 (green, PDB 5FPR) and 8 (yellow, cyan, PDB 5FPO). (D) Protease (gray) and helicase (blue) domains of HCV NS3 in complex with 8 (gray, PDB 4B6E), 9 (green, PDB 5FPS), 10 (yellow, PDB 5FPY), and 11 (cyan, PDB 5FPT). (Inset) Overlay of our fragment 11 (cyan, PDB 5FPT) with compound 12 reported by Boehringer Ingelheim (yellow, PDB 4OJQ).

The second example (Fig. 1B) is the heat shock protein HSPA2 (also known as HSP70), for which we observed fragments binding at five distinct sites. One of these sites is the known nucleotide binding cleft (dark gray ligand), but we are unaware of any previous reports of the other four sites. Rodina et al. (33) have previously reported allosteric inhibitors of HSPA2, although none of the sites we observed overlaps with the site postulated in their report. The third example (Fig. 1C) is NAD-dependent DNA ligase, an antibacterial target (34). In addition to the known adenosine and nicotinamide binding sites (yellow and green ligands, respectively), a fragment was observed in another, previously unknown site. Finally, Fig. 1D shows four fragments bound to the HCV NS3 protein. The allosteric tunnel site discussed in the Introduction is shown (gray ligand), as well as fragments bound to the ATP site (green) and to two distinct regions of the nucleic acid binding groove (cyan and yellow). Compound 12, originally reported by workers at Schering-Plough (35), also binds within the nucleic acid binding groove, as confirmed by a recent X-ray structure published by researchers at Boehringer Ingelheim (14). Our compound 11 is very similar to compound 12 and adopts the same binding mode (PDB 4OJQ; Inset).

The data presented in Table 1 represent all of the sites we have observed throughout our work on these targets that we believe are genuine features of the protein rather than a product of crystal packing. Based on our previous experience with HCV NS3, SolAC, and PKM2, where we were able to demonstrate unequivocally that the secondary sites were biologically relevant, we would anticipate that many, if not most, of these newly discovered sites would be similarly relevant. However, experimentally validating all of these secondary sites is impractical, so we instead analyzed each in terms of three properties: (i) the level of evolutionary conservation of the residues within the site, (ii) the conformational flexibility of the residues lining the site, and (iii) the area and polarity of the molecular surfaces buried upon binding.

Evolutionary Conservation of Binding Sites.

The first analysis we performed was an assessment of the sequence conservation of pocket residues. For each protein target, we identified orthologous proteins from other vertebrate species. Our expectation is that orthologs (e.g., human and mouse CDK2) will share similar regulatory mechanisms, whereas we do not expect such regulatory mechanisms to be as strongly conserved among paralogs (e.g., human CDK2 and human p38) (36, 37). For a protein site with no biological function, we would expect the sequence conservation of pocket residues between the human protein and an ortholog to be similar to the overall sequence conservation between the two proteins. Conversely, relatively high sequence conservation within a site suggests that there is some evolutionary pressure to conserve that pocket, which we take to be suggestive of biological function. We therefore generated a multiple-sequence alignment (MSA) for each of the targets presented here against orthologous proteins from closely related species (overall sequence identity greater than 80%). Orthologs for each target were identified by BLASTP searches (E value < 0.01) against SwissProt/TrEMBL (38) protein sequences from the mammalian, vertebrate, and rodent databases. (For DNA ligase and HCV NS3, we used the bacterial and viral databases, respectively.) Only the top BLASTP hit from each species was considered, and an MSA of the query and hit sequences was constructed using MAFFT (39). Next, we compared the global sequence identity of each ortholog with the sequence identity within each of the identified binding sites. [To define the site, we selected the 20 protein residues closest to a representative ligand. Where possible, we used the native ligand (e.g., nucleotide); in other cases, we took the largest synthetic ligand with Mr < 500 for which we had an X-ray structure.]
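A minimal sketch of the site-versus-global comparison described above follows. It assumes that atom coordinates are already parsed into plain tuples, that "closest" means the minimum atom-atom distance between a residue and the ligand, and that identity is counted over the ungapped query positions of one query/ortholog pair extracted from the MSA. The paper does not specify these details, and the function names are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def closest_residues(residue_atoms: dict[int, list[Coord]],
                     ligand_atoms: list[Coord], n: int = 20) -> list[int]:
    """Return the n residue numbers whose nearest atom lies closest to the ligand.

    Assumption: residue-ligand distance = minimum atom-atom distance.
    """
    def min_dist(res: int) -> float:
        return min(math.dist(a, b) for a in residue_atoms[res] for b in ligand_atoms)

    ranked = sorted(residue_atoms, key=min_dist)  # stable sort keeps input order on ties
    return sorted(ranked[:n])


def identities(query_aln: str, ortholog_aln: str, site: set[int]) -> tuple[float, float]:
    """Return (global %, site %) identity for one query/ortholog pair taken from an MSA.

    `site` holds 1-based positions in the ungapped query sequence (an assumption:
    structure residue numbers would first need mapping onto this numbering).
    Identity is counted over query (non-gap) positions, also an assumption.
    """
    q_pos = same_global = same_site = 0
    for q, o in zip(query_aln, ortholog_aln):
        if q == "-":
            continue  # insertion in the ortholog: no query residue to compare
        q_pos += 1
        if q == o:
            same_global += 1
            if q_pos in site:
                same_site += 1
    return 100 * same_global / q_pos, 100 * same_site / len(site)


if __name__ == "__main__":
    # Toy alignment: the five "site" positions are fully conserved; overall identity is 80%.
    g, s = identities("MKTAYIAKQR", "MKTGYIA-QR", {1, 2, 3, 5, 6})
    print(f"global {g:.0f}%, site {s:.0f}%")
```

In the toy alignment, the site identity (100%) exceeds the global identity (80%). This is the pattern the analysis interprets as evidence of evolutionary pressure on a pocket.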

Fig. 2 shows the local, site-based sequence identity for each ortholog plotted against the global sequence identity of that ortholog; sequence identities are relative to the reference sequence (i.e., the sequence used in the X-ray crystal structure). Fig. 2A shows data for primary sites. For comparison, Fig. S1 shows the expected distribution if site conservation equaled global residue conservation. In the vast majority (90%) of cases, the site-based sequence identity is greater than the global sequence identity. This indicates that, as expected, the primary sites of the targets in our test set are highly conserved. Fig. 2B shows a similar analysis for all of the secondary sites identified in the protein targets, including the three examples discussed in the Introduction. Interestingly, the trend is very similar to that of the primary sites: in the majority (74%) of cases, the sequence identity within the secondary site is higher than the global sequence identity. In other words, the secondary sites that support fragment binding generally tend to be more conserved across orthologous proteins. Even when we exclude those secondary sites with a known biological function (Fig. S2), the sequence conservation within sites is still higher than the global sequence identity in 71% of cases (P < 0.0001). [The probability of observing these results by chance can be calculated by bootstrapping (Supporting Information); in all cases, the results are significant at P < 0.0001.] This suggests that many of the secondary sites may indeed have a biological function. On the whole, the sequence conservation for the secondary sites is slightly lower than for the primary sites. There are several possible explanations. It is likely that some of the sites simply have no biological function: the cluster of HSPA2 points at the bottom of Fig. 2B suggests that there is at least one very poorly conserved site in our dataset (also see Fig. S3). Another explanation is that the regulatory sites of these proteins are less conserved between the species included here, as they may have diverged to bind other endogenous ligands or perform other functions.
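The bootstrap actually used is described in the Supporting Information and is not reproduced here. As one hypothetical way to build the null expectation of Fig. S1 (site conservation equal to global conservation), random residue sets of the same size as a site can be drawn from the same alignment and compared with the observed site identity. The function `random_site_pvalue` and its parameters are illustrative assumptions, not the authors' procedure.

```python
import random


def random_site_pvalue(identical: list[bool], site_size: int, observed: float,
                       n_draws: int = 10_000, seed: int = 0) -> float:
    """Estimate how often a random residue set is at least as conserved as the real site.

    `identical[k]` says whether query position k is identical in the ortholog.
    Random sets of `site_size` positions are drawn without replacement, modelling the
    null hypothesis that site residues are no more conserved than any other residues.
    """
    rng = random.Random(seed)
    positions = range(len(identical))
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(positions, site_size)
        pct = 100 * sum(identical[k] for k in draw) / site_size
        if pct >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)  # add-one correction avoids reporting P = 0


if __name__ == "__main__":
    # Toy ortholog: 80% global identity, and a 20-residue site that is 100% identical.
    flags = [True] * 80 + [False] * 20
    print(random_site_pvalue(flags, 20, 100.0))
```

A fully conserved 20-residue site in an ortholog that is 80% identical overall is rarely matched by chance. A site that is exactly as conserved as the whole protein is matched most of the time.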

Fig. 2.

Ortholog site vs. global sequence identity for (A) primary sites (active sites, known cofactors, etc.) and (B) secondary sites. Each point represents a single ortholog sequence; jitter has been applied for clarity. A reference plot (the distribution expected if there were no particular conservation of the site residues) is shown in Supporting Information (Fig. S1).

Fig. S1.

Null hypothesis plot for ortholog site vs. global sequence identity. Both primary and secondary sites are combined here. Each point represents a single ortholog sequence, jitter applied for clarity.

Fig. S2.

Site sequence identity plotted against global sequence identity for all secondary sites with unknown function.

Fig. S3.

Site sequence identity plotted against global sequence identity for the five fragment binding sites in HSPA2. Site 1: ATP site, compound 3 (dark gray fragment, PDB 5FPE); 2: cyan fragment, compound 3, PDB 5FPE; 3: compound 4, green fragment, PDB 5FPM; 4: compound 5, light gray, PDB 5FPD; and 5: compound 6, orange, PDB 5FPN.

Mobility of Binding Sites.

Although some conformational movement of residues is often required on ligand binding, we might more generally expect ligand-binding sites to be relatively rigid compared with other surface residues, as this reduces the entropic cost of forming the complex. We investigated the mobility of residues in both the primary and secondary sites using a computational approach. The method is based on a knowledge-based force field we developed and a Voronoi tessellation algorithm described by McConkey et al. (40) (Supporting Information). The resulting atomic mobility index (AMI) ranges from 0.0 Å for completely buried atoms to 2.8 Å for completely solvent-exposed atoms with the potential to be highly mobile. The algorithm was run on all structures in our test set after removing all ligands and water molecules, and the results are shown in Fig. 3A. As expected, we observed a significant difference between the AMI values for general surface atoms and those in primary ligand-binding sites. More importantly, the secondary sites we discovered in our fragment screening campaigns also exhibited lower AMI values than general protein surface atoms. For all 27 primary sites, the mean mobility index of the site is lower than that of general surface atoms (P < 0.0001). For secondary sites, the mean mobility index is lower than that of general surface atoms for 23 of 26 sites (P < 0.0001). Although this is a simplistic model of local residue flexibility, the result suggests that, like primary ligand-binding sites, most of the secondary sites we discovered are locally more structured than general protein surface atoms.
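The text does not state which test produced these P values. As one hedged illustration, a one-sided sign test treats each site as a coin flip under the null hypothesis that site and surface mobility do not differ. It gives P values below 0.0001 for both counts. The choice of test here is our assumption.

```python
from math import comb


def sign_test_p(successes: int, n: int) -> float:
    """One-sided exact sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Counts of sites whose mean AMI is below the general surface mean, from the text.
    print(f"primary:   {sign_test_p(27, 27):.1e}")   # 27 of 27 sites
    print(f"secondary: {sign_test_p(23, 26):.1e}")   # 23 of 26 sites
```

For 23 of 26 sites, the tail probability is 2952/2^26, about 4.4 × 10^-5. For 27 of 27, it is 1/2^27.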

Fig. 3.

Flexibility of primary (green)- and secondary (red)-site atoms, compared with general protein surface atoms (blue). Results are shown for (A) the mean computed local atomic mobility index (AMI) for the structures in our test set, and (B) the mean normalized B factors as observed in X-ray apo structures. Sites for which we developed a potent lead compound are shown as filled circles. Empty circles denote sites against which we have not attempted, or have so far been unable, to develop potent lead compounds.

We also analyzed the atomic displacement parameters, or “B factors,” of residues in the primary and secondary sites to explore their mobility. A similar analysis was performed by Yuan et al. (41), who reported a tendency for active-site residues in apo structures to have lower normalized B factors than non–active-site residues. We calculated the chain-normalized residual B factors [after subtracting the translation, libration, screw-rotation (TLS) component; Supporting Information] for all surface atoms and the mean B factors for each site, and these are plotted in Fig. 3B. Contrary to the observations of Yuan et al., we observed no significant difference between the B factors of residues involved in ligand binding and those of other surface residues. This discrepancy may simply reflect the types of targets included in the two studies. Another explanation could be that, as far as we are aware, Yuan et al. did not correct for the TLS component of the B factors. We would suggest that this correction should be included, because the TLS component represents rigid-body atomic displacements that may not be correlated with the flexibility of individual residues (also see Fig. S4). The fact that the B factors of primary- and secondary-site atoms are not lower than those of general protein surface atoms indicates either that unoccupied ligand-binding sites genuinely are not considerably more structured than other parts of the protein surface, or that experimental B factors from X-ray crystal structures are poor estimates of the flexibility of protein atoms in solution. Long-timescale molecular-dynamics simulations or NMR structures might provide a better assessment of whether ligand-binding sites on proteins are more structured than other surface areas, but this is beyond the scope of this work.
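The normalization step can be sketched as follows. The sketch assumes that per-atom TLS contributions have already been computed by a refinement program. It also assumes that "chain-normalized" means converting the residual B factors of one chain to z-scores. The Supporting Information may use a different normalization, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def normalized_residual_b(b_total: list[float], b_tls: list[float]) -> list[float]:
    """Chain-normalized residual B factors for the atoms of one chain.

    Subtracts each atom's TLS (rigid-body) contribution, then converts the residuals
    to z-scores over the chain. The z-score normalization is an assumption.
    """
    residual = [bt - bl for bt, bl in zip(b_total, b_tls)]
    mu, sigma = mean(residual), pstdev(residual)
    if sigma == 0:
        return [0.0] * len(residual)  # TLS explains everything: no residual spread
    return [(r - mu) / sigma for r in residual]


def site_mean(z: list[float], site_atoms: list[int]) -> float:
    """Mean normalized B factor over the atom indices that line a site."""
    return mean(z[i] for i in site_atoms)
```

The guard for zero spread covers the case the text warns about. If the B factors are dominated by rigid-body motion, little per-residue information remains once the TLS component is removed.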

Fig. S4.

Normalized B factors of all surface atoms for each target (blue) and the subset of surface atoms in primary and secondary sites (green and red, respectively).

Physical Properties of the Sites.

We also explored the potential of the secondary sites to support the binding of drug-like molecules. Our approach was to compare the secondary sites with the primary sites in terms of the extent of burial on fragment binding and of physical properties such as the polarity of the protein surface. Fig. S5 shows the fraction of ligand area that is buried on binding to the protein, Fb, for primary and secondary sites. On average, ligands binding to secondary sites are slightly less buried than ligands binding to primary sites: Fb = 0.85 ± 0.09 for primary sites and Fb = 0.74 ± 0.14 for secondary sites. However, when we consider the Fb values for sites against which we have developed lead series (filled circles, Fig. S5A), many of the secondary sites fall within this range. Furthermore, the fraction of buried protein surface that is polar in nature, Fp, is shown in Fig. S5B, and again, primary and secondary sites are highly similar: Fp = 0.30 ± 0.10 for primary sites and Fp = 0.31 ± 0.12 for secondary sites. By comparison, the average polarity of the entire protein surface is significantly higher, Fp = 0.45 ± 0.03, which indicates that both primary and secondary sites are lipophilic in nature.
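The two surface descriptors reduce to simple area ratios. The sketch below assumes that solvent-accessible surface areas have already been computed for the free and bound ligand. It also needs, for each protein atom, the area buried by the ligand (apo minus complex), and it treats N and O atoms as polar. These inputs and the polar-atom definition are our assumptions, not necessarily those used for Fig. S5.

```python
def fraction_ligand_buried(ligand_sasa_free: float, ligand_sasa_bound: float) -> float:
    """Fb: fraction of the ligand's solvent-accessible area buried in the complex."""
    return 1.0 - ligand_sasa_bound / ligand_sasa_free


def fraction_polar_contact(buried: list[tuple[str, float]],
                           polar_elements: frozenset[str] = frozenset({"N", "O"})) -> float:
    """Fp: polar share of the protein surface area buried by the ligand.

    `buried` holds (element, buried area) pairs for the protein atoms in contact.
    """
    total = sum(area for _, area in buried)
    polar = sum(area for element, area in buried if element in polar_elements)
    return polar / total


if __name__ == "__main__":
    # Hypothetical areas in square angstroms.
    print(fraction_ligand_buried(300.0, 45.0))                             # Fb
    print(fraction_polar_contact([("C", 35.0), ("N", 10.0), ("O", 5.0)]))  # Fp
```

With these hypothetical areas, Fb is about 0.85 and Fp is 0.3, close to the primary-site averages quoted above.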

Fig. S5.

Fraction of the ligand area that is buried on binding to the protein (Left) and fraction of the protein area contacting the ligand that is polar (Right). Results are shown for the primary (green) and secondary (red) sites that were part of our analysis. Filled circles represent sites against which we have been able to develop potent lead compounds. The horizontal line on the Right represents the average fraction of the total protein surface area that is polar.

Finally, we analyzed the physical properties of fragments that bind primary sites compared with those that bind secondary sites. We generated distributions for four properties: heavy-atom count, ClogP, and number of H-bond donors and acceptors (Fig. S6). For each property, the distribution for primary-site binders was highly similar to the distribution for secondary sites. Taken together, these results indicate that secondary sites exhibit features that are in general similar to primary sites and may indeed be druggable.

Fig. S6.

Histograms for primary sites (blue) and secondary sites (red) depicting ligand properties ClogP, heavy-atom count (HAC), number of H-bond donors (NDON), and number of H-bond acceptors (NACC).

Discussion

Proteins are complex molecules, and their appropriate regulation in biological processes is essential to maintain normal cellular functions. The mechanisms by which proteins are controlled vary widely, and in this study we have provided evidence that most proteins may have more than one binding site for endogenous ligands and/or other proteins that regulate their function. With hindsight, this finding may not be too surprising, given that the typical size of proteins should accommodate more than one binding site and that evolution often drives biological systems to maximize efficiency. However, the existence of multiple binding sites has not, to date, been broadly recognized as a general feature of the majority of proteins. The targets that we investigated have a wide range of activities, but nearly all are enzymes, which may require more extensive levels of regulation. In most cases, the primary site of the enzyme has been previously well characterized, and strategies to inhibit its function would typically involve developing compounds that bind at the primary site, for example, ATP-competitive inhibitors of protein kinases. In contrast, the secondary sites that we discovered may allow the development of compounds that inhibit (or activate) the enzyme via an alternative mechanism, be that allosteric regulation or inhibition of other binding events. Notably, the majority of the proteins in this study exhibited a secondary site, which suggests that opportunities to modulate protein function without directly inhibiting the primary site may be more widespread than currently perceived.

Although we have clearly identified multiple secondary sites on a variety of proteins at which fragment binding has been observed, translating these to useful drugs will require significant effort and is dependent on many factors. For example, whether the site allows development of drug molecules that are orthosteric in nature and compete with a natural ligand or alternatively are more allosteric in nature, stabilizing a particular conformation, will be dependent on the specifics of the particular protein target. In addition, the strategies required in either case will be different (10).

Given that generating experimental confirmation of a biological role for each of these newly discovered secondary sites is impractical, we instead used computational analysis to assess their nature. First, our sequence analysis of orthologs supports their potential biological relevance, as it is unlikely that the sequences would be conserved by evolution if they were not performing an important function. We are not, however, suggesting that all of the secondary sites will have a biological role. Indeed, the sequence conservation for some of the sites is not strong, and we would suggest that these sites are less likely to be functionally relevant. For example, for HSPA2, a cluster of orthologs indicates lower sequence conservation of a particular secondary site (shown in Fig. 1B and Fig. S3), which may indicate that this site is not functionally relevant or has evolved to bind a different endogenous ligand. However, in the majority of cases, the secondary sites exhibit good sequence conservation, which is consistent with their performing an important biological role. Furthermore, our analysis of the local mobility and physical properties of these secondary sites suggests that many are in general similar in nature to primary sites that are known to be druggable. Specifically, primary and secondary sites exhibit comparable mobility, suggesting that they may be preorganized to a similar degree, and the fraction of ligand surface buried and the polarity of the protein surface are not dissimilar between the two types of sites.

In this study, we have also demonstrated that fragment-based screening using X-ray crystallography can be effective in identifying alternative, potentially biologically relevant, secondary sites on protein targets. Because the approach is unbiased, this underlines the potential of fragment-based screening to sample chemical space efficiently and reveal previously unknown interactions. As shown in the examples, such as CDK2, it is not unusual for a particular fragment to bind in more than one pocket of a single protein, or indeed in more than one protein. We emphasize that this is not nonspecific binding; rather, because a fragment has low molecular complexity, it can bind to complementary protein motifs at several locations. This interpretation is consistent with the electron densities of the fragments typically being very well defined and with the interactions they make being highly efficient (27), even though their binding affinities are very low, with Kd often in the millimolar range. Indeed, these crystal structures of protein–fragment complexes are a good basis for developing novel chemical tools that would probe the function of these secondary binding sites and experimentally confirm their role, if any, in biological processes.

Calculation of Atomic Mobility Indices

For each structure, all ligand atoms and water molecules were removed, and symmetry-related atoms within 8 Å were added. Next, a Voronoi algorithm was applied to all atoms in the system. This places each atom at the center of a polyhedron whose faces correspond to either covalent bonds or nonbonded contacts with the atoms surrounding the central atom. The particular tessellation algorithm we used here [described by McConkey et al. (40)] places the faces of the polyhedra at a distance defined as follows:

$$d_p(i) = \frac{d^2(i,j) + [r(i) + r_w]^2 - [r(j) + r_w]^2}{2\,d(i,j)},$$

where $d_p(i)$ is the distance from atom $i$ to the contact face, $d(i,j)$ is the distance between atoms $i$ and $j$, $r(i)$ and $r(j)$ are the van der Waals radii of atoms $i$ and $j$, respectively, and $r_w$ is the van der Waals radius of a water molecule (1.4 Å). No contact faces are generated between atoms for which $d(i,j) > r(i) + r(j) + 2r_w$; where such faces are absent, the polyhedron is capped with a spherical cap, and these caps are the solvent-exposed parts of the atom. For the contact area between two atoms, instead of calculating the area of the polygonal contact face, the algorithm calculates the area projected through the contact polygon onto a sphere of radius $r(i) + r_w$ around the atom.
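As a minimal sketch of the face-placement rule (not the McConkey et al. implementation itself), the Python function below evaluates the contact-face distance for a single atom pair. Geometrically, this is the radical plane of two spheres whose radii are the van der Waals radii inflated by the water radius, so the face distances measured from atoms i and j always sum to d(i,j). The function name `contact_face_distance` and the example radii (1.7 Å for carbon, 1.52 Å for oxygen) are illustrative assumptions.

```python
from typing import Optional

WATER_RADIUS = 1.4  # r_w in angstroms, as stated in the text


def contact_face_distance(d_ij: float, r_i: float, r_j: float,
                          r_w: float = WATER_RADIUS) -> Optional[float]:
    """Distance d_p(i) from atom i to the face it shares with atom j.

    Hypothetical helper. Returns None when the atoms are too far apart
    to share a face, i.e. when d(i,j) > r(i) + r(j) + 2*r_w.
    """
    if d_ij > r_i + r_j + 2.0 * r_w:
        return None
    # Each atom's radius is inflated by the water radius before placing the face.
    big_r_i = r_i + r_w
    big_r_j = r_j + r_w
    return (d_ij ** 2 + big_r_i ** 2 - big_r_j ** 2) / (2.0 * d_ij)


# Two carbons 3.8 A apart: the face sits at the midpoint.
print(contact_face_distance(3.8, 1.7, 1.7))
# Carbon-oxygen pair: the face shifts toward the smaller oxygen atom.
print(contact_face_distance(3.2, 1.7, 1.52))
```

For two identical atoms the face lies at the midpoint (1.9 Å for the carbon pair above); for unequal radii it moves toward the smaller atom, so the larger atom claims more of the shared space.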

For each atom, its covalent, nonbonded, and solvent-exposed areas are then used to estimate the extent to which the local environment of the atom allows it to move. To this end, we defined the following parameter, which we refer to as the atomic mobility index (AMI):

$$\mathrm{AMI}(i) = \frac{1}{a_t(i)}\left( a_e(i)\,2r_w + \sum_j a(i,j)\,[d(i,j) - d_{\mathrm{clash}}(i,j)] + \sum_k a(i,k)\,\mathrm{AMI}(k) \right),$$

where $a_e(i)$ is the solvent-exposed area of atom $i$ and $a_t(i)$ is its total area (covalent, nonbonded, and exposed). The first summation is over all nonbonded contacting atoms $j$, whereas the second summation is over all covalently bonded atoms $k$. $a(i,j)$ and $a(i,k)$ are the contact areas for the nonbonded contacts and for the covalent bonds, respectively, and $d_{\mathrm{clash}}(i,j)$ is the clash distance for nonbonded contacts between atoms $i$ and $j$. The summation over the covalent bonds involves a recursive calculation of the AMIs of the covalently bonded atoms. This recursion is carried out up to 10 bonds out from the central atom, by which point the AMI of the central atom has converged. The AMIs can vary from 0.0 Å, for atoms whose contacts are all at their clash distances, to 2.8 Å ($2r_w$) for completely solvent-exposed atoms.
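The AMI definition can be sketched in Python as follows, under stated assumptions: per-atom areas and contacts are taken as already computed (for example, by the Voronoi step), the data layout of the hypothetical `AtomAreas` class is ours, and the 10-bond recursion is implemented as repeated sweeps in which the AMIs of atoms beyond the recursion depth are taken as zero. The text does not say whether the original recursion excludes paths that double back to the central atom, so this sketch does not exclude them, and its exact off-by-one truncation convention is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

R_W = 1.4  # water radius r_w in angstroms


@dataclass
class AtomAreas:
    """Hypothetical per-atom input for the AMI calculation."""
    total_area: float    # a_t(i): covalent + nonbonded + exposed area
    exposed_area: float  # a_e(i): solvent-exposed area
    # (a(i,j), d(i,j), d_clash(i,j)) for each nonbonded contact j
    nonbonded: List[Tuple[float, float, float]] = field(default_factory=list)
    # (k, a(i,k)) for each covalently bonded atom k
    covalent: List[Tuple[int, float]] = field(default_factory=list)


def atomic_mobility_indices(atoms: List[AtomAreas], depth: int = 10) -> List[float]:
    """Evaluate AMI(i) for every atom by truncated recursion over covalent bonds.

    Each sweep extends the recursion one bond further; atoms beyond the
    recursion depth contribute an AMI of zero.
    """
    # Parts of AMI(i) that do not depend on the neighbours' AMIs.
    base = []
    for atom in atoms:
        local = atom.exposed_area * 2.0 * R_W
        local += sum(a * (d - d_clash) for a, d, d_clash in atom.nonbonded)
        base.append(local)

    ami = [0.0] * len(atoms)  # values assigned beyond the recursion depth
    for _ in range(depth):
        # Every atom is updated from the previous sweep's values.
        ami = [
            (base[i] + sum(a * ami[k] for k, a in atom.covalent)) / atom.total_area
            for i, atom in enumerate(atoms)
        ]
    return ami


# Two bonded atoms, each otherwise fully solvent exposed.
pair = [AtomAreas(10.0, 8.0, covalent=[(1, 2.0)]),
        AtomAreas(10.0, 8.0, covalent=[(0, 2.0)])]
print(atomic_mobility_indices(pair))
```

As long as each atom's covalent area is smaller than its total area, successive sweeps contract toward a fixed point, which fits the statement that the central AMI has converged by 10 bonds. In the example, the two bonded, otherwise exposed atoms approach the 2.8 Å upper limit.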

Statistical Significance of Site Sequence Identity Results

The global sequence identity of each ortholog against its reference target was used to randomly assign each of its site residues as either “conserved” or “not conserved.” For example, if the global sequence identity of an ortholog is 90%, then each site residue has a 90% chance of being assigned conserved and a 10% chance of being assigned not conserved. This process was repeated $N$ times, and for each repeat we recorded the fraction $F_r$ of cases for which the site conservation is higher than the global conservation. $F_r$ was then compared with the observed fraction $F_o$ of cases for which the site conservation is higher than the global conservation. The P value was then calculated simply as $P = n/N$, where $n$ is the number of repeats for which $F_r \ge F_o$; after $N = 10{,}000$ repeats, no repeat had yet reached this threshold (that is, $n = 0$, so $P < 10^{-4}$).
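A minimal Monte Carlo sketch of this test follows. It assumes that each ortholog is summarized by its global identity (as a fraction between 0 and 1) and its number of site residues, and that an ortholog counts toward $F_r$ when its simulated site identity is strictly higher than its global identity. The function name, its input format, and the fixed random seed are hypothetical choices for illustration.

```python
import random
from typing import List, Tuple


def site_conservation_p_value(orthologs: List[Tuple[float, int]],
                              observed_fraction: float,
                              n_repeats: int = 10_000,
                              seed: int = 0) -> float:
    """Monte Carlo P value for 'site conservation exceeds global conservation'.

    orthologs: (global_identity, n_site_residues) for each ortholog.
    observed_fraction: F_o, the observed fraction of orthologs whose site
        identity is higher than their global identity.
    Returns n / N, where n counts repeats with F_r >= F_o.
    """
    rng = random.Random(seed)
    n_hits = 0
    for _ in range(n_repeats):
        higher = 0
        for global_identity, n_site in orthologs:
            # Each site residue is "conserved" with probability equal to
            # the ortholog's global identity.
            conserved = sum(rng.random() < global_identity for _ in range(n_site))
            if conserved / n_site > global_identity:
                higher += 1
        f_r = higher / len(orthologs)
        if f_r >= observed_fraction:
            n_hits += 1
    return n_hits / n_repeats


# 20 orthologs at 50% global identity, 10 site residues each, all observed
# to be more conserved at the site than globally (F_o = 1.0).
print(site_conservation_p_value([(0.5, 10)] * 20, 1.0, n_repeats=1000))
```

When no repeat reaches $F_r \ge F_o$, the function returns 0.0, which is best reported as $P < 1/N$ rather than as a P value of exactly zero.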

Null Hypothesis for Local Conservation

In addition to the significance calculations described above, this section puts the results we present in Fig. 2 of the main article into perspective by generating the null hypothesis plot shown in Fig. S1. In this plot, we have taken the global identity for each ortholog and used it to randomly assign each residue in each site as either conserved or not conserved. For each site, this results in a site conservation that simply reflects the underlying global sequence identity. It is clear from Fig. S1 that this distribution is very different from the one presented in Fig. 2 of the main article, which further supports the conclusion that many of the alternative sites we identified by fragment screening are likely to have a biological function.

Site Conservation of Secondary Sites Without a Known Biological Function

As described in the main text, the secondary sites were further subdivided into those with and without known biological function. The sequence conservation for the sites with no known function is shown in Fig. S2.

Site Conservation for HSPA2

Fig. S3 shows the percentage site sequence identity vs. global sequence identity for all orthologs (>80% global identity) identified for protein HSPA2. The color of the points matches the color of the fragments in Fig. 1B. Site 4 (light gray), which binds compound 5, is very poorly conserved, whereas residues in the other four sites are more highly conserved among orthologs than those elsewhere on the protein.

B-Factor Processing and Normalization

It is important that, like Yuan and colleagues (41), we use apo structures for this analysis, because bound ligands generally reduce the B factors of the protein atoms they interact with (42). Hence, for each site in our dataset, we attempted to identify an apo structure in which the site was well resolved, unoccupied, and not occluded by crystal contacts (i.e., contacts with other chains or symmetry-related molecules). We were able to find suitable apo structures for 21 of 27 primary and/or cofactor sites, as well as for 20 of 24 secondary sites. For each site, we selected the holo structure containing the smallest ligand observed at that location. Residues contacting that ligand were noted, and the corresponding site residues in the apo structure were recorded. The comparisons presented below are between the surface atoms belonging to the site residues and the reference set of all surface atoms. We define surface atoms as any atom with some solvent-exposed area, excluding atoms that contact other protein chains or lie within 8 Å of a symmetry-related protein molecule.
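The surface-atom definition above can be expressed as a simple filter. In this sketch, the `SurfaceCandidate` record, the precomputed per-atom exposed areas and other-chain contact flags, and the reading of "within 8 Å" as an inclusive cutoff on atom-to-atom distance to any symmetry-related atom are all assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class SurfaceCandidate:
    """Hypothetical per-atom record for the surface-atom filter."""
    coord: Coord
    exposed_area: float         # solvent-exposed area, e.g. from the Voronoi step
    contacts_other_chain: bool  # True if the atom touches another protein chain


def select_surface_atoms(atoms: Sequence[SurfaceCandidate],
                         symmetry_coords: Sequence[Coord],
                         cutoff: float = 8.0) -> List[int]:
    """Indices of atoms that count as surface atoms under the stated definition."""
    selected = []
    for idx, atom in enumerate(atoms):
        # Keep only exposed atoms that do not touch another chain.
        if atom.exposed_area <= 0.0 or atom.contacts_other_chain:
            continue
        # Exclude atoms within `cutoff` angstroms of any symmetry-related atom.
        if any(math.dist(atom.coord, s) <= cutoff for s in symmetry_coords):
            continue
        selected.append(idx)
    return selected
```

The site atoms would then be the members of this set that belong to the recorded site residues, and the full set serves as the reference.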

All of the apo structures were re-refined using a translation, libration, screw-rotation (TLS) model. This allowed us to decouple large motions of protein (sub)domains from the local motion of individual atoms. For each atom, a residual B factor was calculated by subtracting the TLS component using the program TLSANL (43). The residual B factors were normalized by calculating their standard score with respect to the rest of the chain; that is, for each atom, $B_{\mathrm{norm}}$ is obtained by subtracting the mean chain B factor, $\langle B \rangle_{\mathrm{chain}}$, from the atom B factor, $B$, and dividing by the SD for the chain, $\sigma_{B,\mathrm{chain}}$:

$$B_{\mathrm{norm}} = \frac{B - \langle B \rangle_{\mathrm{chain}}}{\sigma_{B,\mathrm{chain}}}.$$

Without applying the TLS correction, we did see a small difference in B factors between the site atoms and the reference surface atoms (Fig. S4); however, this signal appears to be merely a function of the position of the site residues. The TLS contribution to the B factor is higher for residues further from the center of displacement of each rigidly modeled TLS domain, and in practice the surface residues tend to lie further from that center than the site residues do, so they have correspondingly higher B factors.
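A sketch of the normalization step is given below. It assumes that the per-atom TLS contributions have already been extracted (for example, from TLSANL output, whose parsing is not shown) and uses the population SD for $\sigma_{B,\mathrm{chain}}$, because the text does not state which estimator was used. The function name is hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def normalized_residual_b(b_factors: List[float], b_tls: List[float],
                          chain_ids: List[str]) -> List[float]:
    """Per-chain standard scores of residual (TLS-subtracted) B factors."""
    # Residual B factor: observed B minus its TLS component.
    residual = [b - t for b, t in zip(b_factors, b_tls)]

    by_chain: Dict[str, List[float]] = {}
    for value, chain in zip(residual, chain_ids):
        by_chain.setdefault(chain, []).append(value)

    # Mean and population SD of the residual B factors within each chain.
    stats: Dict[str, Tuple[float, float]] = {
        chain: (statistics.fmean(values), statistics.pstdev(values))
        for chain, values in by_chain.items()
    }
    return [(value - stats[chain][0]) / stats[chain][1]
            for value, chain in zip(residual, chain_ids)]


print(normalized_residual_b([30.0, 40.0, 50.0], [20.0, 20.0, 20.0], ["A", "A", "A"]))
```

A chain whose residual B factors are all identical has zero SD and would raise `ZeroDivisionError` here, so such chains would need separate handling.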

Surface Properties of Ligands and Binding Sites

We calculated the fraction of total ligand surface area contacting the protein for all ligands, as well as the fraction of the protein contact area that is polar in nature. These properties are shown in Fig. S5.
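One way to compute these two quantities is sketched below. The text does not specify how the areas were obtained, so this sketch assumes per-atom ligand surface areas computed with and without the protein present, and treats nitrogen and oxygen atoms as the polar protein atoms; both choices and the function names are assumptions.

```python
from typing import AbstractSet, Sequence


def fraction_ligand_surface_buried(sasa_free: Sequence[float],
                                   sasa_bound: Sequence[float]) -> float:
    """Fraction of the ligand's total surface area that contacts the protein.

    sasa_free: per-atom ligand surface areas with the protein absent.
    sasa_bound: the same atoms' surface areas in the complex.
    """
    total = sum(sasa_free)
    return (total - sum(sasa_bound)) / total


def polar_fraction_of_contact(contact_areas: Sequence[float],
                              elements: Sequence[str],
                              polar_elements: AbstractSet[str] = frozenset({"N", "O"})) -> float:
    """Fraction of the protein contact area contributed by polar atoms."""
    total = sum(contact_areas)
    polar = sum(a for a, e in zip(contact_areas, elements) if e in polar_elements)
    return polar / total


print(fraction_ligand_surface_buried([10.0, 10.0], [2.0, 3.0]))
print(polar_fraction_of_contact([5.0, 3.0, 2.0], ["C", "N", "O"]))
```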

Physical Properties of Ligands Bound at Primary and Secondary Sites

For each bound ligand, we calculated ClogP, heavy-atom count, and the number of H-bond donors and acceptors. Histograms of these are shown in Fig. S6. Data are normalized such that each site contributes equally to the histogram (e.g., ligands from a site with 2 hits will have 10 times the weight of ligands from a site with 20 hits).
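The per-site weighting can be sketched as a weighted histogram in which each ligand receives weight 1/(number of hits at its site): a ligand from a site with 2 hits gets 0.5 and one from a site with 20 hits gets 0.05, the tenfold ratio given in the example above. Dividing by the number of sites so that the bars sum to 1 is an additional assumption, as are the bin convention and the hypothetical function name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def site_weighted_histogram(values: Sequence[float], site_ids: Sequence[str],
                            bin_edges: Sequence[float]) -> List[float]:
    """Histogram in which every site contributes a total weight of 1.

    Each ligand is weighted by 1 / (number of hits at its site); the bin totals
    are then divided by the number of sites. Values outside the bins are dropped.
    """
    hits_per_site: Dict[str, int] = defaultdict(int)
    for site in site_ids:
        hits_per_site[site] += 1

    n_bins = len(bin_edges) - 1
    counts = [0.0] * n_bins
    for value, site in zip(values, site_ids):
        for b in range(n_bins):
            lo, hi = bin_edges[b], bin_edges[b + 1]
            # Half-open bins [lo, hi), with the last bin also including hi.
            if lo <= value < hi or (b == n_bins - 1 and value == hi):
                counts[b] += 1.0 / hits_per_site[site]
                break

    n_sites = len(hits_per_site)
    return [c / n_sites for c in counts]


# Heavy-atom counts: site A has 2 hits, site B has 20 hits.
heavy_atoms = [10, 12] + [20] * 20
sites = ["A", "A"] + ["B"] * 20
print(site_weighted_histogram(heavy_atoms, sites, [0, 15, 25]))
```

Even though site B contributes ten times as many ligands, both bins receive the same total weight, because each site's ligands share a single unit of weight.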

Acknowledgments

We thank all of our crystallographers, as well as all of the other scientists at Astex, past and present, for the tremendous amount of work that has gone into generating the >5,000 X-ray crystal structures of protein–fragment complexes, which has enabled us to perform our analyses. In particular, we thank Marc O’Reilly, Joe Patel, Dominic Tisi, Tom Davies, and Puja Pathuri for enabling and providing the crystal structures for the examples shown in Fig. 2. We also thank Chris Murray and Glyn Williams for helpful comments on the manuscript.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org (PDB ID codes 5FP5, 5FP6, 5FPD, 5FPE, 5FPM, 5FPN, 5FPO, 5FPR, 5FPS, 5FPT, and 5FPY).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1518946112/-/DCSupplemental.

References

1. Hardy JA, Wells JA. Searching for new allosteric sites in enzymes. Curr Opin Struct Biol. 2004;14(6):706–715. doi: 10.1016/j.sbi.2004.10.009.
2. Conn PJ, Christopoulos A, Lindsley CW. Allosteric modulators of GPCRs: A novel approach for the treatment of CNS disorders. Nat Rev Drug Discov. 2009;8(1):41–54. doi: 10.1038/nrd2760.
3. Nussinov R, Tsai CJ. Allostery in disease and in drug discovery. Cell. 2013;153(2):293–305. doi: 10.1016/j.cell.2013.03.034.
4. Hauske P, Ottmann C, Meltzer M, Ehrmann M, Kaiser M. Allosteric regulation of proteases. ChemBioChem. 2008;9(18):2920–2928. doi: 10.1002/cbic.200800528.
5. Langmead CJ, Christopoulos A. Functional and structural perspectives on allosteric modulation of GPCRs. Curr Opin Cell Biol. 2014;27:94–101. doi: 10.1016/j.ceb.2013.11.007.
6. Lewis JA, Lebois EP, Lindsley CW. Allosteric modulation of kinases and GPCRs: Design principles and structural diversity. Curr Opin Chem Biol. 2008;12(3):269–280. doi: 10.1016/j.cbpa.2008.02.014.
7. Ohren JF, et al. Structures of human MAP kinase kinase 1 (MEK1) and MEK2 describe novel noncompetitive kinase inhibition. Nat Struct Mol Biol. 2004;11(12):1192–1197. doi: 10.1038/nsmb859.
8. Di Santo R. Inhibiting the HIV integration process: Past, present, and the future. J Med Chem. 2014;57(3):539–566. doi: 10.1021/jm400674a.
9. Adrián FJ, et al. Allosteric inhibitors of Bcr-abl-dependent cell proliferation. Nat Chem Biol. 2006;2(2):95–102. doi: 10.1038/nchembio760.
10. Nussinov R, Tsai CJ. The different ways through which specificity works in orthosteric and allosteric drugs. Curr Pharm Des. 2012;18(9):1311–1316. doi: 10.2174/138161212799436377.
11. Grimsby J, Berthel SJ, Sarabu R. Glucokinase activators for the potential treatment of type 2 diabetes. Curr Top Med Chem. 2008;8(17):1524–1532. doi: 10.2174/156802608786413483.
12. Kozakov D, et al. The FTMap family of web servers for determining and characterizing ligand-binding hot spots of proteins. Nat Protoc. 2015;10(5):733–755. doi: 10.1038/nprot.2015.043.
13. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009;5(12):e1000585. doi: 10.1371/journal.pcbi.1000585.
14. LaPlante SR, et al. Integrated strategies for identifying leads that target the NS3 helicase of the hepatitis C virus. J Med Chem. 2014;57(5):2074–2090. doi: 10.1021/jm401432c.
15. Bowman GR, Geissler PL. Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites. Proc Natl Acad Sci USA. 2012;109(29):11681–11686. doi: 10.1073/pnas.1209309109.
16. Murray CW, Rees DC. The rise of fragment-based drug discovery. Nat Chem. 2009;1(3):187–192. doi: 10.1038/nchem.217.
17. Murray CW, Verdonk ML, Rees DC. Experiences in fragment-based drug discovery. Trends Pharmacol Sci. 2012;33(5):224–232. doi: 10.1016/j.tips.2012.02.006.
18. Saalau-Bethell SM, et al. Discovery of an allosteric mechanism for the regulation of HCV NS3 protein function. Nat Chem Biol. 2012;8(11):920–925. doi: 10.1038/nchembio.1081.
19. Chaneton B, et al. Serine is a natural ligand and allosteric activator of pyruvate kinase M2. Nature. 2012;491(7424):458–462. doi: 10.1038/nature11540.
20. Saalau-Bethell SM, et al. Crystal structure of human soluble adenylate cyclase reveals a distinct, highly flexible allosteric bicarbonate binding pocket. ChemMedChem. 2014;9(4):823–832. doi: 10.1002/cmdc.201300480.
21. Kleinboelting S, et al. Crystal structures of human soluble adenylyl cyclase reveal mechanisms of catalysis and of its activation through bicarbonate. Proc Natl Acad Sci USA. 2014;111(10):3727–3732. doi: 10.1073/pnas.1322778111.
22. Bauman JD, et al. Detecting allosteric sites of HIV-1 reverse transcriptase by X-ray crystallographic fragment screening. J Med Chem. 2013;56(7):2738–2746. doi: 10.1021/jm301271j.
23. Jahnke W, et al. Allosteric non-bisphosphonate FPPS inhibitors identified by fragment-based discovery. Nat Chem Biol. 2010;6(9):660–666. doi: 10.1038/nchembio.421.
24. Jahnke W, et al. Binding or bending: Distinction of allosteric Abl kinase agonists from antagonists by an NMR-based conformational assay. J Am Chem Soc. 2010;132(20):7043–7048. doi: 10.1021/ja101837n.
25. Zhang J, et al. Targeting Bcr-Abl by combining allosteric with ATP-binding-site inhibitors. Nature. 2010;463(7280):501–506. doi: 10.1038/nature08675.
26. Ostrem JM, Peters U, Sos ML, Wells JA, Shokat KM. K-Ras(G12C) inhibitors allosterically control GTP affinity and effector interactions. Nature. 2013;503(7477):548–551. doi: 10.1038/nature12796.
27. Hartshorn MJ, et al. Fragment-based lead discovery using X-ray crystallography. J Med Chem. 2005;48(2):403–413. doi: 10.1021/jm0495778.
28. Mooij WT, et al. Automated protein-ligand crystallography for structure-based drug design. ChemMedChem. 2006;1(8):827–838. doi: 10.1002/cmdc.200600074.
29. Hendlich M, Rippmann F, Barnickel G. LIGSITE: Automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model. 1997;15(6):359–363, 389. doi: 10.1016/s1093-3263(98)00002-3.
30. Rothweiler U, et al. Luciferin and derivatives as a DYRK selective scaffold for the design of protein kinase inhibitors. Eur J Med Chem. 2015;94:140–148. doi: 10.1016/j.ejmech.2015.02.035.
31. Jeffrey PD, et al. Mechanism of CDK activation revealed by the structure of a cyclinA-CDK2 complex. Nature. 1995;376(6538):313–320. doi: 10.1038/376313a0.
32. Betzi S, et al. Discovery of a potential allosteric ligand binding site in CDK2. ACS Chem Biol. 2011;6(5):492–501. doi: 10.1021/cb100410m.
33. Rodina A, et al. Identification of an allosteric pocket on human hsp70 reveals a mode of inhibition of this therapeutically important protein. Chem Biol. 2013;20(12):1469–1480. doi: 10.1016/j.chembiol.2013.10.008.
34. Howard S, et al. Fragment-based discovery of 6-azaindazoles as inhibitors of bacterial DNA ligase. ACS Med Chem Lett. 2013;4(12):1208–1212. doi: 10.1021/ml4003277.
35. Gesell J, McCoy M, Senior M, Wang Y-S, Wyss D. In: Modern Magnetic Resonance. Webb GA, editor. Springer; Dordrecht, The Netherlands: 2006. pp. 1419–1428.
36. Tharmalingam S, Burns AR, Roy PJ, Hampson DR. Orthosteric and allosteric drug binding sites in the Caenorhabditis elegans mgl-2 metabotropic glutamate receptor. Neuropharmacology. 2012;63(4):667–674. doi: 10.1016/j.neuropharm.2012.05.029.
37. Hudson JW, Golding GB, Crerar MM. Evolution of allosteric control in glycogen phosphorylase. J Mol Biol. 1993;234(3):700–721. doi: 10.1006/jmbi.1993.1621.
38. Bairoch A, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005;33(Database issue):D154–D159. doi: 10.1093/nar/gki070.
39. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–3066. doi: 10.1093/nar/gkf436.
40. McConkey BJ, Sobolev V, Edelman M. Quantification of protein surfaces, volumes and atom-atom contacts using a constrained Voronoi procedure. Bioinformatics. 2002;18(10):1365–1373. doi: 10.1093/bioinformatics/18.10.1365.
41. Yuan Z, Zhao J, Wang ZX. Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng. 2003;16(2):109–114. doi: 10.1093/proeng/gzg014.
42. Yang CY, Wang R, Wang S. A systematic analysis of the effect of small-molecule binding on protein flexibility of the ligand-binding sites. J Med Chem. 2005;48(18):5648–5650. doi: 10.1021/jm050276n.
43. Howlin B, Butler SA, Moss DS, Harris GW, Driessen HPC. TLSANL: TLS parameter-analysis program for segmented anisotropic refinement of macromolecular structures. J Appl Cryst. 1993;26:622–624.

