Proceedings of the National Academy of Sciences of the United States of America. 2006 Oct 11;103(43):15835–15840. doi: 10.1073/pnas.0605224103

Chemical screening methods to identify ligands that promote protein stability, protein crystallization, and structure determination

Masoud Vedadi*, Frank H Niesen, Abdellah Allali-Hassani*, Oleg Y Fedorov, Patrick J Finerty Jr*, Gregory A Wasney*, Ron Yeung*, Cheryl Arrowsmith*, Linda J Ball, Helena Berglund, Raymond Hui*, Brian D Marsden, Pär Nordlund, Michael Sundstrom, Johan Weigelt, Aled M Edwards*,§
PMCID: PMC1595307  PMID: 17035505

Abstract

The 3D structures of human therapeutic targets are enabling for drug discovery. However, their purification and crystallization remain rate determining. In individual cases, ligands have been used to increase the success rate of protein purification and crystallization, but the broad applicability of this approach is unknown. We implemented two screening platforms, based on either fluorimetry or static light scattering, to measure the increase in protein thermal stability upon binding of a ligand without the need to monitor enzyme activity. In total, 221 different proteins from humans and human parasites were screened against one or both of two sorts of small-molecule libraries. The first library comprised different salts, pH conditions, and commonly found small molecules and was applicable to all proteins. The second comprised compounds specific for protein families of particular interest (e.g., protein kinases). In 20 cases, including nine unique human protein kinases, a small molecule was identified that stabilized the proteins and promoted structure determination. The methods are cost-effective, can be implemented in any laboratory, promise to increase the success rates of purifying and crystallizing human proteins significantly, and identify new ligands for these proteins.

Keywords: chemical biology, crystallography, human


Structural, functional, and chemical genomics (proteomics) are disciplines that aim to determine the biochemical, cellular, and physiological functions of proteins on a genome scale. Many of the central, important experimental approaches that are involved, such as protein-based screens for small-molecule inhibitors, depend on the availability of purified and active proteins. To meet this demand, many large projects are devoted to developing methods to generate large numbers of purified proteins. However, the task is proving challenging: on average, for proteins from prokaryotes, only 50–70% of soluble proteins and 30% of membrane proteins can be readily expressed in recombinant form, and only 30–50% of these expressed proteins can be purified to homogeneity (1, 2). The success rates for human proteins are predicted to be significantly lower.

To improve the general rates of protein purification, efforts have focused largely on alterations of the recombinant host, the expression conditions, changes of the construct encoding the protein, and the purification conditions. It is also known that the expression and purification of a protein can be improved significantly by the addition of a specific ligand, which serves to stabilize the protein, thereby reducing its propensity to unfold, aggregate, or succumb to proteolysis. This parameter has not been studied systematically, although in individual cases the addition of a specific ligand has had dramatic effects. For example, the recombinant expression of the guinea pig and human forms of the enzyme 11β-hydroxysteroid dehydrogenase-1 in bacteria was increased dramatically by the addition of an inhibitor of the enzyme to the growing cells (3) [X. Wu, K. L. Kavanagh, and U. Oppermann, personal communication; Protein Data Bank (PDB) ID code 2BEL]. Altering the composition of the purification buffer can also significantly influence protein stability. A classic example is the case of DnaB, whose enzymatic activity had a half-life of only a few minutes at 4°C in the consensus purification buffer at the time, but could be stabilized for hours at 60°C after a systematic screen for optimal solution conditions (4). The use of the optimized solution conditions allowed for purification of DnaB in large amounts and its crystallization.

The systematic identification and use of ligands or solution conditions that maximally stabilize a protein might significantly improve the success rates of genome-scale protein purification, crystallization, and functional characterization. Perhaps the simplest way to accomplish this task would be to extend the example of DnaB in which the sensitivity of the protein to heat denaturation would be monitored as a function of the solution conditions and temperature (5). Ligands that interact preferentially (specifically or nonspecifically) with the native state of a protein would increase the thermal stability, provided that the ligand concentration exceeds its KD value (6).

We applied both fluorescence- and light-scattering-based approaches to measure the thermal stability of 221 recombinant proteins in the presence and absence of a range of chemicals. In both methods, purified proteins were subjected to gradually increasing temperature, and the shift in melting temperature (Tm) between the presence and absence of a bound ligand was measured. The extent of the temperature shift is believed to be proportional to the affinity of the ligand for a given protein, i.e., for a given binding pocket with regard to the enthalpy of unfolding, ΔUH (6, 7). In the two screening methods implemented here, the denaturation process was monitored differently. The first measured fluorescence from a dye whose emission properties changed upon interaction with unfolded protein. This use of environmentally sensitive dyes to monitor thermal unfolding was reported in 1997 (8) and adapted to microplate format to enable high throughput in 2001 by Salemme and coworkers (9). The plot of fluorescence intensity versus temperature has a sigmoidal shape for a two-state unfolding mechanism and can be described by the same equations used to describe thermal denaturation monitored by differential scanning calorimetry. The second measured the denaturation and subsequent aggregation of unfolded proteins by using static light scattering. The use of light scattering to monitor protein stability was first described by Kurganov in 2002 (10). As implemented, both methods require relatively small amounts of protein, can be performed in hours, can be used to study hundreds of conditions in parallel, and can be readily adapted to commercially available instruments.
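As an illustration of how a transition temperature can be read from a melt curve, the sketch below generates a synthetic two-state curve with a Boltzmann sigmoid and estimates Tm as the temperature of steepest signal increase. The function names, the slope parameter, and the synthetic values are assumptions made for this example; the paper does not state which fitting procedure was used.

```python
import math

def two_state_signal(temp_c, tm_c, slope_c, low=0.0, high=1.0):
    """Boltzmann sigmoid, a common empirical model of a two-state melt curve."""
    return low + (high - low) / (1.0 + math.exp((tm_c - temp_c) / slope_c))

def estimate_tm(temps, signal):
    """Return the midpoint of the temperature interval with the steepest rise."""
    best_slope, best_tm = float("-inf"), None
    for i in range(len(temps) - 1):
        slope = (signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = 0.5 * (temps[i] + temps[i + 1])
    return best_tm

# Synthetic curve (illustrative values only, not data from the paper).
temps = [25.0 + 0.5 * i for i in range(121)]  # 25.0 to 85.0 °C in 0.5 °C steps
curve = [two_state_signal(t, tm_c=53.0, slope_c=1.5) for t in temps]
tm = estimate_tm(temps, curve)
print(f"Estimated Tm: {tm:.2f} °C")
```

The derivative approach needs no curve fitting, but its resolution is limited by the temperature step; fitting the sigmoid directly would be the usual alternative when a finer estimate is needed.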

The fluorescence and light-scattering approaches were applied to recombinant proteins from humans and parasites in two experimental formats. In the first, the proteins were screened against a set of “generic” solution conditions designed to identify stabilizing conditions comprising salts, pH, and simple additives, such as nucleotides (Table 2, which is published as supporting information on the PNAS web site). In the second, which was targeted to proteins for which the activity was known, proteins were screened against a library of small molecules selected to be likely candidates for binding (e.g., protein kinases were screened against a library of known inhibitors from the patent literature). Our aims were to characterize the methods by analyzing a statistically meaningful number of proteins, determine the frequency with which more optimal solution conditions and small-molecule inhibitors could be identified by each method, and determine the frequency with which these improved conditions were able to promote protein purification and/or crystallization.

Results

Two different screening methods, which we termed differential scanning fluorimetry (DSF) and differential static light scattering (DSLS), were used to identify solution conditions or ligands that stabilized a protein against thermal denaturation. The DSF screening format has been described, using six proteins as test cases (9). The DSLS format has been described in the patent literature (11); we report on its application (Fig. 3, which is published as supporting information on the PNAS web site). Our first aim was to compare and contrast the two screening methods, as applied to a significant number of different human and parasitic proteins and performed on commercially available systems. Our second aim was to assess whether and how often the preferred solutions or ligands would facilitate protein purification or crystallization.

General Properties of the Two Methods.

Initially, the properties of the two methodologies used were determined based on the analysis of dozens of different proteins on three different, commercially available platforms. For simplicity, we report the results from two representative proteins, citrate synthase and the cytosolic sulfotransferase 1C1 (SULT1C1), to highlight the behavior and dependence of the Tm or temperature of aggregation (Tagg) on the instrumentation and the addition of ligands.

Variability of the instrumentation.

Two different commercial multiwell-format PCR instruments and one fluorescence plate reader were used to determine the observed Tm. Another multiwell-format commercial instrument (Stargazer, Harbinger Biotech, Toronto, Canada) was used to measure the observed Tagg. Pig heart citrate synthase (Roche, Indianapolis, IN), a commercially available protein with well defined properties, was used as a standard to determine the reproducibility of the observed Tm for each instrument (Table 3, which is published as supporting information on the PNAS web site). The reproducibility of the Tm measured with both PCR devices (Mx3005p and iCycler) and with the fluorescence plate reader (FluoDia T70) was similar (0.2°C and 0.5°C, respectively). A standard deviation of 0.4°C was determined in the Tagg for the light-scattering method over hundreds of measurements. With the exception of both PCR devices, the observed Tm and Tagg varied slightly between the sides and middle of the plates, likely because of uneven heat distribution (Fig. 4, which is published as supporting information on the PNAS web site). Although the variation could be modeled for each individual instrument, we elected not to do so because the variability (<0.3°C) was much smaller than the temperature shift (>2°C) that we observed for the binding of known, specific ligands of micromolar affinity.

The average values for the Tm and Tagg determined for citrate synthase [Tm of 52.4 ± 0.5°C (fluorescence plate reader), Tm of 53.0 ± 0.2°C (PCR), and Tagg of 53.2 ± 0.4°C] compare well with the Tm for citrate synthase determined by either circular dichroism (53.3 ± 0.1°C) or differential scanning calorimetry (53.8 ± 0.3°C) under the same solution conditions. The observed Tm and Tagg values for a given protein were highly reproducible. However, for any given protein, the absolute values of Tm and Tagg often differed by a few degrees depending on the experimental conditions and the instrumentation (see Table 1); for example, the Tagg was more greatly influenced by the protein concentration. In general, under the experimental conditions used here, and with the described instrumentation, the Tm and Tagg values were within 4°C for ≈50% of proteins tested. There were larger variations, sometimes up to 15°; these occurred with proteins that had unusually high initial fluorescence readings. It is likely that these proteins had exposed hydrophobic patches in their initial states, perhaps as a result of partial unfolding.

Table 1.

Analysis of 61 different proteins by both DSF (Tm) and DSLS (Tagg)

Protein name Annotation Tagg, StarGazer (°C) Tm, FluoDia (°C) Tagg − Tm (°C) Initial fluorescence Maximum fluorescence Species
CP-PFA0260c Hypothetical protein 57.0 ± 0.2 56.9 ± 1.2 0.1 140,403 630,420 Cryptosporidium parvum
MAL7P1.161 Dynein light chain, putative 63.1 ± 0.3 63.0 ± 0.2 0.1 680,030 2,691,643 Plasmodium falciparum
PY01515:H1-I328 Putative orotidine-monophosphate-decarboxylase 58.2 ± 0.3 58.0 ± 0.3 0.2 268,642 1,877,034 Plasmodium yoelii
SIRT3-03 Sirtuin 3 49.3 ± 0.1 49.5 ± 0.2 −0.2 1,535,111 4,373,321 Human
PKN-PFI1195c Hypothetical protein 51.1 ± 0.3 50.8 ± 0.5 0.3 282,567 2,989,412 Plasmodium knowlesi
CP-PF13 0129 Ribosomal protein L6 homologue, putative 45.9 ± 0.8 46.3 ± 0.4 −0.4 2,762,196 5,191,841 Cryptosporidium parvum
PFI1760w:L50-V214 Hypothetical protein 58.5 ± 0.1 59.0 ± 0.1 −0.5 2,154,195 3,999,060 Plasmodium falciparum
CP-PF13 0341 DNA-directed RNA polymerase 2, putative 41.0 ± 0.2 41.9 ± 0.4 −0.9 1,894,164 4,356,378 Cryptosporidium parvum
Sult 1B1-01 Sulfotransferase family, cytosolic, 1B, member 1 48.1 ± 0.1 49.0 ± 0.3 −0.9 462,840 5,979,320 Human
PF14 0477:M1-L297 Signal recognition particle 54 kDa protein, putative 44.6 ± 0.1 43.6 ± 0.2 1 3,213,411 5,033,276 Plasmodium falciparum
CP-PFE1470w Cell cycle regulator protein, putative 38.1 ± 0.1 39.3 ± 0.2 −1.2 1,043,223 5,558,803 Cryptosporidium parvum
CP-PFI0775w Glycolipid transfer protein, putative 51.3 ± 0.3 52.6 ± 0.4 −1.3 2,913,864 7,241,443 Cryptosporidium parvum
TgTwinScan 7042:M1-R163 Ubiquitin-conjugating enzyme, putative 43.3 ± 0.1 44.8 ± 0.2 −1.5 3,377,942 6,850,677 Toxoplasma gondii
Sult 1A3 Sulfotransferase 47.3 ± 0.1 45.7 ± 0.6 1.6 1,201,464 4,149,913 Human
LCMT1-03 Leucine carboxyl methyltransferase; CGI-68 protein 46.1 ± 0.9 44.5 ± 0.1 1.6 2,569,126 4,056,262 Human
PF13 0131 Hypothetical protein 45.1 ± 0.1 46.8 ± 0.1 −1.7 3,195,670 7,649,142 Plasmodium falciparum
PY01469 Dynein light chain-related 58.8 ± 0.5 60.6 ± 0.4 −1.8 3,080,115 4,755,282 Plasmodium yoelii
Sult 1C2-01 Sulfotransferase family, cytosolic, 1C, member 2 45.2 ± 0.1 43.4 ± 0.1 1.8 395,241 1,675,990 Human
Sult14A Sulfotransferase 60.7 ± 0.3 58.5 ± 0.9 2.2 453,323 3,069,279 Human
TgTwinScan 3341:P66-L222 Ubiquitin-conjugating enzyme e2, putative 51.9 ± 0.3 54.2 ± 0.2 −2.3 926,623 5,174,241 Toxoplasma gondii
AD003-02 Methyltransferase, hypothetical 45.9 ± 0.1 43.6 ± 0.1 2.3 1,086,214 4,943,999 Human
CP-PF11_0208 Phosphoglycerate mutase, putative 58.7 ± 0.2 61.2 ± 0.1 −2.5 259,635 3,977,006 Cryptosporidium parvum
ppi60.477.641 Human peptidylprolyl isomerase domain and WD repeat cont 52.7 ± 0.3 55.2 ± 0.1 −2.5 243,298 2,687,218 Human
PBG-MAL13P1.227 Ubiquitin-conjugating enzyme, putative 52.2 ± 0.1 48.8 ± 0.5 3.4 492,176 2,003,484 Plasmodium berghei
CP-PF14 0083 Ribosomal protein S8e, putative 50.2 ± 0.2 46.6 ± 0.2 3.6 1,488,902 2,123,956 Cryptosporidium parvum
PY02252 Deoxyribose-phosphate aldolase 50.8 ± 0.3 46.9 ± 0.2 3.9 3,185,388 6,961,045 Plasmodium yoelii
PFE1595c:Y90-Y226 Hypothetical protein 56.4 ± 0.3 60.4 ± 0.5 −4 724,354 2,692,007 Plasmodium falciparum
PFE1600w:D118-I388 Hypothetical protein 53.0 ± 0.3 48.9 ± 0.7 4.1 933,893 4,689,603 Plasmodium falciparum
PY02076 Adenosine deaminase 46.0 ± 0.2 41.8 ± 0.3 4.2 1,848,540 5,815,292 Plasmodium yoelii
CHAT 08 Choline acetyltransferase 42.9 ± 0.3 38.6 ± 0.2 4.3 921,440 2,943,115 Human
PBG-MAL13P1.204 Exoribonuclease PH, putative 48.8 ± 0.3 44.3 ± 0.3 4.5 2,263,480 4,229,781 Plasmodium berghei
Sult 1C3-01 Sulfotransferase 39.4 ± 0.2 34.8 ± 0.3 4.6 2,273,487 4,760,180 Human
PFL0660w:V10-G83 Dynein light chain 1 61.0 ± 0.4 65.8 ± 0.1 −4.8 242,958 2,973,879 Plasmodium falciparum
PY07267 Dynein 14-kDa light chain, flagellar outer arm., putative 48.0 ± 0.4 52.8 ± 0.3 −4.8 1,665,569 4,916,198 Plasmodium yoelii
HSA9761-02 Putative dimethyladenosine transferase 55.8 ± 0.3 49.3 ± 0.4 6.5 2,100,672 3,291,114 Human
ppi63.7.179c Peptidylprolyl isomerase 46.4 ± 0.3 39.8 ± 0.1 6.6 700,230 5,121,437 Human
PFD1185w:N47-Y283 Hypothetical protein 67.0 ± 0.7 60.2 ± 0.9 6.8 619,068 3,050,360 Toxoplasma gondii
Sult 1E1-01 Sulfotransferase 45.9 ± 0.1 38.6 ± 0.4 7.3 1,823,734 4,796,702 Human
ppi65.280.457 Peptidylprolyl isomerase-like 2 isoform b 43.1 ± 0.1 35.4 ± 0.3 7.7 2,355,281 3,921,091 Human
PFE1600w:N68-Q509 Hypothetical protein 58.1 ± 0.1 48.4 ± 0.6 9.7 636,832 2,707,644 Plasmodium falciparum
COMT 09 Catechol-O-methyltransferase NI 47.3 ± 0.7 NA 648,417 1,991,950 Human
COMT-02 Catechol-O-methyltransferase No Tagg 46.6 ± 0.3 NA 719,047 1,836,093 Human
ppi40.90.301c Peptidylprolyl isomerase E isoform 1 No Tagg 57.9 ± 0.4 NA 379,627 2,987,857 Human
PDE9A-03 Phosphodiesterase 9A No Tagg 38.7 ± 0.3 NA 2,543,185 4,951,075 Human
CP-PFL0595c Glutathione peroxidase 64.5 ± 0.2 NI NA 744,088 3,046,504 Cryptosporidium parvum
PV-MAL13P1.227:M17-C163 Ubiquitin-conjugating enzyme 62.9 ± 0.6 NI NA 246,679 2,113,208 Plasmodium vivax
PV-PF14 0053:E30-M309 Ribonucleotide reductase small subunit 42.7 ± 0.1 HF NA 9,204,732 6,863,139 Plasmodium vivax
PFF0625w:M1-G420 Nucleolar GTP-binding protein 1, putative 38.3 ± 0.1 HF NA 1,031,949 1,542,887 Plasmodium falciparum
TgGlmHMM 3960:M1-L260 UMP-CMP kinase, putative 56.3 ± 0.4 HF NA 1,734,186 1,871,153 Toxoplasma gondii
CP-MAL13P1.135 Snare protein homologue, putative 62.9 ± 0.3 HF NA 12,793,351 2,130,593 Cryptosporidium parvum
PBG-PF10_0087 Diphthine synthase 54.9 ± 0.3 HF NA 4,334,843 5,512,029 Plasmodium berghei
PF07 0062:N544-R632 GTP-binding translation elongation factor No Tagg No Tm NA 454,159 361,466 Plasmodium falciparum
PFB0985c:K70-L153 Hypothetical protein No Tagg No Tm NA 509,929 223,057 Plasmodium falciparum
PV-PF10 0245:H470-N641 Glucosamine-fructose-6-phosphate aminotransferase No Tagg No Tm NA 170,686 358,571 Plasmodium vivax
CP-PF10_0066 Hypothetical protein No Tagg No Tm NA 143,126 242,015 Cryptosporidium parvum
PY00693:D10-D201 Cyclophilin-like protein No Tagg No Tm NA 150,342 531,035 Plasmodium yoelii
PKN-PF14 0017 Lysophospholipase No Tagg No Tm NA 653,102 273,327 Plasmodium knowlesi
PY02905 60S acidic ribosomal protein P2 No Tagg HF NA 4,838,677 1,362,400 Plasmodium yoelii
PDE4D-01 Phosphodiesterase 4D, Drosophila No Tagg HF NA 5,514,751 3,449,558 Human
CP-PFC0400w 60S Acidic ribosomal protein P2 No Tagg HF NA 1,722,331 1,037,984 Cryptosporidium parvum
CP-PF14 0323 Calmodulin No Tagg HF NA 1,490,456 948,218 Cryptosporidium parvum

A total of 61 proteins were screened by DSF and DSLS under the same solution conditions. In some instances, either the Tagg or Tm parameters could not be measured. NI, the curve was not interpretable; HF, the protein/dye mixture exhibited high initial fluorescence; NA, not applicable.
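The difference column of Table 1 can be recomputed from the Tagg and Tm cells. The sketch below parses a handful of rows copied from the table and returns the difference, or `None` when either value is missing (NI, HF, No Tagg, No Tm). The helper names and the regular expression are assumptions made for this example.

```python
import re

# A few rows copied from Table 1: (protein, Tagg cell, Tm cell).
rows = [
    ("CP-PFA0260c", "57.0 ± 0.2", "56.9 ± 1.2"),
    ("SIRT3-03", "49.3 ± 0.1", "49.5 ± 0.2"),
    ("PFE1600w:N68-Q509", "58.1 ± 0.1", "48.4 ± 0.6"),
    ("COMT-02", "No Tagg", "46.6 ± 0.3"),
    ("CP-PF14 0323", "No Tagg", "HF"),
]

# Matches cells of the form "mean ± sd"; anything else counts as not measured.
VALUE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)\s*$")

def parse_value(cell):
    """Return (mean, sd) in °C, or None for NI / HF / No Tagg / No Tm."""
    match = VALUE.match(cell)
    return (float(match.group(1)), float(match.group(2))) if match else None

def difference(tagg_cell, tm_cell):
    """Return Tagg minus Tm rounded to 0.1 °C, or None if either is missing."""
    tagg, tm = parse_value(tagg_cell), parse_value(tm_cell)
    if tagg is None or tm is None:
        return None
    return round(tagg[0] - tm[0], 1)

diffs = {name: difference(tagg, tm) for name, tagg, tm in rows}
for name, diff in diffs.items():
    print(name, "NA" if diff is None else diff)
```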

The effect of known ligands on Tagg and Tm.

We tested the effects of ligands on both Tm and Tagg with the human cytosolic sulfotransferase 1C1 (SULT1C1), which catalyzes the transfer of a sulfate group from 3′-phosphoadenosine-5′-phosphosulfate (PAPS) to a variety of substrates. The reaction produces a sulfonated substrate and 3′-phosphoadenosine-5′-phosphate (PAP).

For DSF, human SULT1C1 was aliquoted into each well of a 384-well plate at 100 μg/ml in the presence of SYPRO orange and different concentrations of PAP, and the plate was heated from 27°C to 75°C. The observed Tm of SULT1C1 in the absence of PAP was 48.4 ± 0.2°C. There was a significant increase in the observed Tm in the presence of PAP; the lowest concentration of PAP that stabilized SULT1C1 by >3°C was 87 μM (Fig. 1A). This concentration of PAP also caused a similar increase in the Tagg (Fig. 1B). The stability of the protein increased as the concentration of PAP was increased to 9 mM, at which the ΔTm and ΔTagg approached plateaus of ≈8°C (Fig. 1 C and D). Our results support the findings of Matulis et al. (6) and Bullock et al. (7) that ligand binding increases protein thermal stability, and that the effect is proportional to the concentration and affinity of the ligand.

Fig. 1.

Analysis of Sult1C1 and its binding to PAP by DSF and DSLS. (A and B) Thermostability of Sult1C1 in the presence of 0 (●), 0.08 (○), 0.35 (▾), 1.4 (▵), 5.6 (■), 11.2 (□), 22.5 (⧫), and 45 (◇) mM PAP measured by DSF (A) and DSLS (B). (C and D) The increases in the thermal stability, ΔTm and ΔTagg, as a function of the concentration of PAP, measured by DSF and DSLS, are shown in C and D, respectively.

Reproducibility of ΔTagg and ΔTm.

To use the two methods and the selected hardware as screening platforms, the ΔTm and ΔTagg must be reproducible. Accordingly, the Tm and Tagg for SULT1C1 were measured up to 12 times in the presence and absence of 0.5 mM of PAP. The protein consistently showed greater stability in the presence of PAP (Tagg = 53.1 ± 0.2°C; Tm = 53.8 ± 0.5°C) than in its absence (Tagg = 48.4 ± 0.3°C; Tm = 48.4 ± 0.2°C). The resulting ΔTagg and ΔTm were 4.7 ± 0.5°C and 5.4 ± 0.7°C, respectively, suggesting that the two methods in these formats are able to measure these parameters reproducibly and can be used to screen proteins for the binding of new ligands.
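The reported shifts follow directly from the paired measurements above. The sketch below recomputes ΔTagg and ΔTm and shows two ways of combining the standard deviations: a simple sum, which reproduces the reported ±0.5°C and ±0.7°C, and addition in quadrature, shown only for comparison. Which convention the authors used is an assumption here, and the function name is hypothetical.

```python
import math

def shift(with_ligand, without_ligand):
    """Return (delta, summed_sd, quadrature_sd) from two (mean, sd) pairs in °C."""
    (t1, s1), (t0, s0) = with_ligand, without_ligand
    delta = round(t1 - t0, 1)
    summed_sd = round(s1 + s0, 1)                 # simple sum of the two SDs
    quadrature_sd = round(math.hypot(s1, s0), 2)  # sqrt(s1**2 + s0**2)
    return delta, summed_sd, quadrature_sd

# Values for SULT1C1 with and without 0.5 mM PAP, taken from the text.
d_tagg = shift((53.1, 0.2), (48.4, 0.3))
d_tm = shift((53.8, 0.5), (48.4, 0.2))
print("dTagg:", d_tagg)
print("dTm:  ", d_tm)
```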

General Applicability of the Methods.

As anticipated from previous studies on small numbers of proteins (11, 12), we found that increases in both the Tm and Tagg were correlated with binding of ligands. The specific instruments used here could measure the transitions reproducibly within 0.5°C. We were then interested to determine the fraction of proteins to which the two methods could be applied, to assess whether these methods could be applied broadly.

The Tm and Tagg were determined and compared for 61 different proteins (Table 1). For 40 proteins, both a Tm and a Tagg could be measured reproducibly and with thermal envelopes that conformed to the prototypical melting transitions. Neither a Tagg nor a Tm could be measured for 10 proteins, presumably because of high thermal stability or some other property of the protein that was incompatible with the method (e.g., not properly folded). There were 11 proteins that could be analyzed only with one or the other method; for 7 proteins only a Tagg could be measured and for 4 proteins only a Tm could be measured. All proteins that did not display an interpretable Tm displayed aberrantly high initial fluorescence in the presence of SYPRO orange. It is possible that these proteins contain hydrophobic binding pockets/cavities accessible to the dye. Of the 40 proteins for which both a Tm and a Tagg could be measured, the difference between Tagg and Tm varied depending on the protein; for 16 proteins Tagg was lower than Tm, whereas for 24 proteins Tagg was higher than Tm. It is possible that aggregation kinetics or a stabilization effect by the dye account for these differences.

Application of Screening Methods.

We applied the screening platforms to identify ligands or buffer conditions that might stabilize proteins and aid protein purification and/or crystallization. Two types of small-molecule screens were implemented. In the first, the proteins were screened against a set of common solution conditions and sets of physiologically relevant ligands, such as nucleotides and cofactors. In the second, proteins were screened against libraries of small molecules that were designed especially for the protein or protein family being investigated. For example, protein kinases were screened against a set of previously identified and validated inhibitors.

Screening against solutions containing ranges of pH and salt.

A total of 221 proteins were screened by using one of the methods against buffers covering a pH range from 6 to 9 and two different salt concentrations (100 and 500 mM NaCl). In >50% of the cases a condition was identified that stabilized the protein by >4°C against thermal denaturation compared with the original buffer (Hepes buffer, pH 7.5/150 mM NaCl) (Table 4, which is published as supporting information on the PNAS web site). Although it was not possible to extract unifying trends, we did observe that most proteins were stabilized in this assay by the addition of higher concentrations of salt. However, 27% of proteins were more stable in lower concentrations of salt.
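A buffer screen of this kind reduces to comparing each condition's transition temperature with that of the reference buffer. The sketch below picks the most stabilizing condition from a small grid and reports it only if the shift exceeds 4°C. All temperatures, the choice of grid points, and the function name are hypothetical; the paper states only the pH range (6 to 9) and the two NaCl concentrations.

```python
def best_condition(measured, reference_t, min_shift=4.0):
    """Return ((pH, NaCl mM), shift) for the most stabilizing condition,
    or None if no condition beats the reference by more than min_shift."""
    condition, temp = max(measured.items(), key=lambda item: item[1])
    shift = round(temp - reference_t, 1)
    return (condition, shift) if shift > min_shift else None

# Hypothetical transition temperatures (°C), keyed by (pH, NaCl in mM).
measured = {
    (6.0, 100): 44.1, (6.0, 500): 45.0,
    (7.5, 100): 46.2, (7.5, 500): 47.9,
    (9.0, 100): 52.3, (9.0, 500): 49.0,
}
reference_t = 46.5  # hypothetical value in the reference buffer

result = best_condition(measured, reference_t)
print(result)
```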

In several instances the identification of a stabilizing solution contributed to the ability to purify, concentrate, or crystallize the protein. For example, the E2 ubiquitin-conjugating enzyme from Cryptosporidium parvum was purified and concentrated to 7 mg/ml for crystallization trials in standard buffer (Hepes, pH 7.5 in 500 mM NaCl). A buffer screen using DSLS found that the protein was more stable in low salt at pH 9, and the use of this buffer enabled the protein to be concentrated to 28 mg/ml. Using DSF, we identified improved purification conditions for human RGS6 (at pH 6.5), human RGS16 (at pH 9.0), and human RGS17 (at pH 8.5). None of these RGS proteins could be concentrated under standard conditions, but the use of the optimized conditions allowed them to be concentrated to >10 mg/ml, crystals to be formed, and the structures to be determined [PDB ID codes: 2ES0 (RGS6), 2BT2 (RGS16), and 1ZV4 (RGS17)].

Calpain 1 could be purified and crystallized under standard conditions, but the crystals diffracted poorly (3.0–3.2 Å). A buffer screen showed that the protein was more resistant to aggregation under lower salt conditions, and the use of these conditions during purification led to a different crystal form that was of higher quality and easier to reproduce, which led to a structure at higher resolution (2.4 Å; PDB ID code 2ARY). Purified Trb2 kinase domain constructs aggregated and visibly precipitated out of solution before and after gel filtration in standard buffer conditions (20 mM Hepes, pH 7.5/0.5 M NaCl/2 mM DTT). By diluting the protein and screening different buffer conditions, we found the optimal buffer to be 100 mM NaCl, 20 mM bicine (pH 9.0), and 1 mM DTT; under these conditions, the Trb2 kinase domain was soluble and readily concentrated to 20–30 mg/ml.

Screening against a library of physiologically relevant compounds.

Physiologically relevant small molecules provide a potentially rich source of compounds for stabilizing proteins. Accordingly, we generated libraries that comprised physiologically relevant compounds (PHY library) and other molecules that might be predicted to be “generic” stabilizers of proteins, such as detergents and metals. One representative library comprised 160 compounds that included amino acids, nucleotides, nucleosides, sugars, cofactors, divalent cations, common substrates and products, and some other additives (Table 2). To minimize the number of screens and the protein used, the compounds were combined in different groups of two to six compounds. If a group of compounds was shown to stabilize the proteins, the protein was then rescreened against the individual compounds (deconvolution).
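The pooling and deconvolution strategy described above can be sketched as follows. Compounds are screened in pools, and only the members of a stabilizing pool are rescreened individually. The pool size, the 2°C threshold, the compound names, and the toy measurement function (which assumes a pool behaves like its best single compound) are all assumptions made for this example; in practice `measure_shift` would be an experimental DSF or DSLS measurement.

```python
def make_pools(compounds, pool_size):
    """Split the library into consecutive pools of at most pool_size compounds."""
    return [compounds[i:i + pool_size] for i in range(0, len(compounds), pool_size)]

def deconvolute(compounds, measure_shift, pool_size=4, threshold=2.0):
    """Screen pools first, then rescreen members of stabilizing pools one by one."""
    hits = []
    for pool in make_pools(compounds, pool_size):
        if measure_shift(pool) > threshold:
            for compound in pool:
                if measure_shift([compound]) > threshold:
                    hits.append(compound)
    return hits

# Toy stand-in for the experiment: hypothetical shifts (°C) for two compounds.
true_shift = {"ATP": 5.0, "L-Phe": 3.0}

def toy_measure(pool):
    """Assume a pool's shift equals that of its most stabilizing member."""
    return max(true_shift.get(compound, 0.0) for compound in pool)

library = ["glucose", "ATP", "NAD", "MgCl2", "L-Phe",
           "glycine", "AMP", "CaCl2", "biotin"]
hits = deconvolute(library, toy_measure)
print(hits)
```

Pooling saves protein and plate wells only when most pools are inactive, which fits the observation that more than half of the library compounds never stabilized any protein.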

There are several examples in which the use of these libraries contributed directly to a crystal structure. For example, the C2 domain of PrkCh was purified under standard conditions but could not be concentrated beyond 2 mg/ml. We found that the addition of 5% glycerol stabilized the protein and allowed it to be concentrated to >4 mg/ml and subsequently crystallized. Interestingly, contrary to our expectation, the addition of glycerol to proteins did not have a general stabilizing effect. Among a subset of 28 proteins tested for the influence of glycerol, only 8 were stabilized by >2°C (at pH 7.5). Pyruvate kinase was purified and crystallized, but the crystals were difficult to optimize. DSLS showed that L-phenylalanine stabilized the protein, and the inclusion of L-Phe in the crystallization buffer at 10 mM facilitated the formation of crystals diffracting to 2.2 Å, from which the structure was solved (PDB ID code 1ZJH). Fe-superoxide dismutase was crystallized but the crystals were of poor quality. The inclusion of 5 mM MnCl2, which was found to stabilize the protein with DSLS, permitted the growth of crystals that diffracted to 2.2 Å, from which the structure was solved (PDB ID code 2AWP). Human Cdc2-like kinase, CLK1, could not be sufficiently concentrated for crystallization. Addition of a mixture of L-arginine and L-glutamic acid (13) enabled concentration to 10 mg/ml, thus providing the means to solve the structure in the presence of a specific inhibitor (PDB ID code 1Z57).

The methodology has also been applied to determine the concentrations of known substrates and cofactors that are optimal for crystallization trials. For example, the bifunctional PAPS synthetase was known to bind ATP. Initial attempts to crystallize PAPS synthetase in the presence of up to 5 mM ATP were unsuccessful. With DSLS, the PAPS synthetase was titrated against higher concentrations of ADP and ATP, and this experiment indicated that much higher concentrations of ADP and ATP were required to saturate the enzyme under the conditions tested (Fig. 2). The protein was then crystallized in the presence of 100 mM ATP, the resulting crystals diffracted to 2.4 Å, and the structure was subsequently solved (PDB ID code 2AX4). ADP was found in the active site of the crystallized enzyme, suggesting that ATP was hydrolyzed in the crystallization trials.

Fig. 2.

Dependence of the thermostability of PAPS synthetase on the concentration of ATP, measured by DSLS. PAPS synthetase was incubated with increasing concentrations of ATP, and the ΔTagg was measured by DSLS in duplicate.

Unbiased small-molecule screens can also guide the experiment in unanticipated directions. Adenosine deaminase was subjected to crystallization trials in the presence and absence of adenine, but no crystals could be obtained. In the screen of physiological compounds, which includes nucleotides and deoxynucleotides, deoxyguanosine was identified as the strongest stabilizer of adenosine deaminase. Although the nucleoside was not found in the structure, crystals of adenosine deaminase that diffracted to 2.0 Å were obtained in the presence of deoxyguanosine (PDB ID code 2AMX).

The components of the library of physiologically relevant compounds were not uniformly active. More than 50% of the compounds were never shown to stabilize any protein. A few additives were frequently identified as stabilizers, raising the possibility of their being false positives. However, among these compounds were those that might be predicted to act as general, nonspecific protein-stabilizing compounds, including n-dodecyl-β-D-maltoside and a mixture of 50 mM L-arginine and 50 mM L-glutamic acid, which stabilized 26% and 16% of all proteins against thermal denaturation (>4°C), respectively. The promiscuous stabilization by L-arginine and L-glutamic acid confirms the report from Golovanov et al. (13). The promiscuity of these ligands suggests that they may prove to be useful additives for crystallization screens.

Focused libraries for specific proteins and protein families.

For some proteins, there may be considerable prior knowledge about the compounds that are likely to bind, and in these instances it may be useful to generate a protein-specific library of compounds. The most direct path for creating such a library is to explore the academic and patent literature, the PDB, and other databases, such as BRENDA (www.brenda.uni-koeln.de), to identify substrates, inhibitors and/or cofactors that have been shown to bind the protein or closely related proteins. In many instances, these compounds are available from commercial suppliers.

Protein family-specific libraries were created for a number of human enzyme families (e.g., deacetylases, sulfotransferases, protein kinases, methyltransferases, and oxidoreductases). In many cases, the use of these libraries identified compounds that facilitated protein crystallization. In one example, purified NAD-dependent deacetylase sirtuin 5 (SIRT5) was screened against a set of compounds known to bind deacetylases, and suramin was shown to stabilize the protein. SIRT5 was then cocrystallized with suramin, and crystals that diffracted to 2 Å were obtained, and the structure was solved (PDB ID code 2FZQ).

For protein kinases, the use of a library of inhibitors proved to be a very effective strategy for increasing the success rate in producing well diffracting crystals. Our ≈500-compound library comprised mostly compounds that mimic the binding mode of adenine (14, 15). To date, this library has been used to screen 32 serine-threonine protein kinases, and for 84% of them at least one compound was identified that caused a Tm shift of >4°C (O.Y.F., A. Bullock, F.H.N., B.M., and S. Knapp, unpublished work). In 9 of 12 cases in which we determined the structure of the catalytic domain, the use of the inhibitor in crystallization trials directly contributed to obtaining the crystal structure (Table 5, which is published as supporting information on the PNAS web site). For example, Cdc2-like kinase, CLK1, was cocrystallized with 10Z-hymenialdisine (PDB ID code 1Z57). Among the different examples, electron density corresponding to the ligand could be seen in all proteins except PDB ID codes 2FK9, 1ZJH, and 2AMX.

Correlation of protein stabilization and affinity of binding.

The thermal stabilization assays were used primarily to identify compounds that could promote protein purification or crystallization. We observed that the increases in the transition temperatures, ΔTm and ΔTagg, were highly reproducible when measured with DSF and DSLS, respectively [using the proteins SULT1C1, PIM-1 (7), and CLK1]. We have not undertaken a systematic effort to correlate the degree of temperature shift with binding affinities, although inhibition data of selected compounds for several proteins, including three protein kinases (PIM-1, CLK1, and CLK3), showed that Tm shifts >4°C translate into values for IC50 <1 μM. In at least one instance, the degree of stabilization was correlated with the relative affinity; in studies of a set of compounds derived from one scaffold, there was a correlation between the Tm shift and binding affinities (7). Operationally, we have observed that temperature shifts >2°C are experimentally reproducible, but that higher temperature shifts (>4°C) are better correlated with positive outcomes in protein crystallization.
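The operational thresholds above can be written down as a small triage rule. The following Python sketch is only an illustration: the function name, the category labels, and the example temperatures are assumptions introduced here and are not part of the screening software used in this work. Only the 2°C and 4°C cut-offs come from the text.

```python
from typing import Tuple

# Cut-offs quoted in the text; everything else in this sketch is hypothetical.
REPRODUCIBLE_SHIFT_C = 2.0   # shifts above this were experimentally reproducible
STRONG_SHIFT_C = 4.0         # shifts above this correlated better with crystallization


def classify_shift(t_reference_c: float, t_with_ligand_c: float) -> Tuple[float, str]:
    """Return the temperature shift (ligand minus reference) and a triage label.

    Works equally for Tm (DSF) and Tagg (DSLS) values, in degrees Celsius.
    The labels are illustrative names, not terminology from the article.
    """
    delta = t_with_ligand_c - t_reference_c
    if delta > STRONG_SHIFT_C:
        label = "strong"
    elif delta > REPRODUCIBLE_SHIFT_C:
        label = "reproducible"
    else:
        label = "within_noise"
    return delta, label


if __name__ == "__main__":
    # Hypothetical reference Tm of 48.0 °C and three hypothetical ligand wells.
    for t_ligand in (49.0, 51.0, 53.5):
        print(t_ligand, classify_shift(48.0, t_ligand))
```

Both comparisons are strict, matching the ">" in the text, so a shift of exactly 4°C is labelled "reproducible" rather than "strong".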

Discussion

The use of small-molecule ligands to promote protein purification, concentration, and crystallization contributed significantly to our ability to generate crystal structures. Of the 200 protein structures that have been determined within the Structural Genomics Consortium as of March 2006 (http://sgc.utoronto.ca/SGC-WebPages/sgc-structures.php), ≈100 were crystallized in the presence of a ligand, and ≈20 of the structures were determined in the presence of a ligand whose identity could not have been predicted a priori (Table 5). Clearly, the use of small-molecule chemical screens will be an important contributor to success for structural genomics in general and specifically for the structural biology of human proteins.

One of the main goals of both chemical biology and drug discovery is to generate specific and selective agonists and/or antagonists for each human protein or for specific sets of human proteins. The availability of large numbers of purified human proteins from protein families provided by structural genomics efforts, focused chemical libraries that are designed for each family, and readily implemented and cost-effective screening technologies such as those described in this article will facilitate the creation of a dataset that maps the intersection of each human protein with the small-molecule universe.

Materials and Methods

Cloning, Expression, and Purification.

Proteins were cloned, expressed, and purified as described at www.thesgc.com.

Aggregation-Based Screening Using Static Light Scattering.

Temperature-dependent aggregation was measured by using static light scattering (StarGazer) (11, 12). Fifty microliters of protein (0.4 mg/ml) was heated from 27°C to 80°C at a rate of 1°C per min in each well of a clear-bottom 384-well plate (Nunc, Rochester, NY) under a variety of solution conditions. Incident light was shone on the protein drop from beneath at an angle of 30°. Protein aggregation was monitored by measuring the intensity of the scattered light every 30 s with a CCD camera. The pixel intensities in a preselected region of each well were integrated to generate a value representative of the total amount of scattered light in that region. These total intensities were then plotted against temperature for each sample well and fitted to the Boltzmann equation by nonlinear regression. The point of inflection of each fitted curve was defined as the Tagg (Fig. 3).
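As an illustration of the curve-fitting step, the sketch below fits a Boltzmann sigmoid, I(T) = bottom + (top − bottom)/(1 + exp((T_mid − T)/slope)), to intensity-versus-temperature data and reports T_mid, the inflection point, as the transition temperature. It is not the internally developed software referred to in this article. The parameter names, the grid of trial slopes, and the 0.1°C step are assumptions, and a simple grid search combined with linear least squares stands in for a general nonlinear regression routine.

```python
import math


def boltzmann(t, bottom, top, t_mid, slope):
    """Boltzmann sigmoid rising from `bottom` to `top`; inflection at `t_mid`."""
    return bottom + (top - bottom) / (1.0 + math.exp((t_mid - t) / slope))


def fit_boltzmann(temps, signal,
                  slopes=(0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0), step=0.1):
    """Least-squares fit by grid search over (t_mid, slope).

    For each trial pair the baseline and amplitude enter the model linearly,
    so they are obtained in closed form by ordinary least squares.
    The slope grid and step size are arbitrary choices for this sketch.
    """
    n = len(temps)
    t_lo, t_hi = min(temps), max(temps)
    n_steps = int(round((t_hi - t_lo) / step))
    sum_y = sum(signal)
    best = None
    for i in range(n_steps + 1):
        t_mid = t_lo + i * step
        for slope in slopes:
            s = [1.0 / (1.0 + math.exp((t_mid - t) / slope)) for t in temps]
            sum_s = sum(s)
            sum_ss = sum(x * x for x in s)
            sum_sy = sum(x * y for x, y in zip(s, signal))
            det = n * sum_ss - sum_s ** 2
            if abs(det) < 1e-12:
                continue  # sigmoid is numerically constant over the data
            amp = (n * sum_sy - sum_s * sum_y) / det
            if amp <= 0:
                continue  # only rising transitions are considered
            bottom = (sum_y - amp * sum_s) / n
            sse = sum((bottom + amp * x - y) ** 2 for x, y in zip(s, signal))
            if best is None or sse < best[0]:
                best = (sse, bottom, bottom + amp, t_mid, slope)
    if best is None:
        raise ValueError("no rising sigmoidal transition found")
    _, bottom, top, t_mid, slope = best
    return {"bottom": bottom, "top": top, "t_mid": t_mid, "slope": slope}
```

With readings every 30 s at 1°C per min, `temps` would hold values spaced 0.5°C apart between 27°C and 80°C, and `fit_boltzmann(temps, intensities)["t_mid"]` would give the estimated Tagg. A well that shows no rising transition raises `ValueError`, which corresponds loosely to proteins that do not display a Tagg.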

Before initiating any screen, the Tagg was determined for each protein to assess the suitability for the method (≈20% of proteins did not display a Tagg). For the screen, the Tagg was determined in the presence of different compounds in comparison to the reference. The concentrations of compounds that were used ranged from 100 μM to 1 mM, depending on the expected affinity and the necessity to limit the concentration of DMSO to 2%. The higher concentrations (1 mM) were used for compounds that were expected to bind with weaker affinity, such as the compounds from our library of physiological compounds (Table 2). Ligand binding was detected by monitoring the increase in Tagg in the presence of the ligand. Compounds that caused a >2°C increase in Tagg were observed to be significantly outside of the range of experimental error. Intensities were plotted as a function of temperature by using a software package developed internally.
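The trade-off between compound concentration and DMSO content is simple dilution arithmetic. The sketch below assumes that compounds are kept as stocks in neat DMSO. The stock concentrations and function names are hypothetical and are not taken from this article; only the 2% ceiling and the 100 μM to 1 mM range come from the text.

```python
def final_dmso_percent(stock_mM, final_uM, dmso_percent_in_stock=100.0):
    """Percent (v/v) DMSO carried into the assay by diluting a DMSO stock.

    stock_mM: compound concentration of the stock (hypothetical value).
    final_uM: desired compound concentration in the assay well.
    """
    stock_fraction = (final_uM / 1000.0) / stock_mM  # share of well volume that is stock
    return stock_fraction * dmso_percent_in_stock


def min_stock_mM(final_uM, max_dmso_percent=2.0):
    """Lowest neat-DMSO stock concentration that keeps DMSO at or below the limit."""
    return (final_uM / 1000.0) / (max_dmso_percent / 100.0)


if __name__ == "__main__":
    # Under these assumptions, 1 mM compound from a 50 mM stock gives 2% DMSO.
    print(final_dmso_percent(stock_mM=50.0, final_uM=1000.0))
    # 100 uM compound from a 10 mM stock gives 1% DMSO.
    print(final_dmso_percent(stock_mM=10.0, final_uM=100.0))
    print(min_stock_mM(final_uM=1000.0))
```

Under these assumptions, screening at 1 mM within a 2% DMSO limit requires a stock of at least 50 mM, which is one reason the highest concentrations may not be practical for poorly soluble compounds.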

Fluorescence-Based Screening.

A fluorescence microplate reader (FluoDia T70, Photon Technology International, Lawrenceville, NJ) or one of two real-time PCR devices (Mx3005p from Stratagene, La Jolla, CA, or iCycler from Bio-Rad, Hercules, CA) was used to monitor protein unfolding (Table 3) by the increase in the fluorescence of the fluorophore SYPRO Orange (Invitrogen, Carlsbad, CA). Protein samples (10 μM or 25–100 μg/ml) in Hepes buffer (pH 7.5) containing 150 mM NaCl and the appropriate concentration of ligand in a reaction volume of 20–25 μl were incubated in 96- or 384-well microplates (MJ Research, Cambridge, MA) in the fluorescence plate reader or in 96-well PCR microplates (ABGene, Surrey, U.K.) in the RT-PCR devices. For experiments testing for favorable solution conditions, the concentration of all buffers used was 100 mM.

Before initiating a full screen, each protein was scanned to assess the suitability for the method (≈25% of the protein constructs did not display a melting curve that allowed derivation of the midpoint of transition, Tm) and to determine the lowest concentration of protein that generated a strong signal. Compound concentrations within the screens varied between 10 μM and 1 mM, depending on the anticipated affinity and the requirement to limit the concentration of DMSO to <2%. For scans in the fluorescence plate reader, 10 μl of mineral oil (Sigma, St. Louis, MO) was layered on top of the protein solution to prevent evaporation. Optical foil was used to cover the plates in the RT-PCR devices. The samples were heated at 1°C per min, from 25°C to 75°C or 100°C, depending on the instrument. The fluorescence intensity was measured every 1–3°C. The rate of heating was found to affect the observed Tm, but not the degree to which the Tm changed upon binding of a ligand (16).

Fluorescence intensities were plotted as a function of temperature by using the same, internally developed software package as was used for the static light scattering data.

Storage of Compounds.

Compounds were stored as described at www.sgc.utoronto.ca/SGC-WebPages/Toronto-Technology.php/sgct-compoundstorage.pdf.

Supplementary Material

Supporting Information
pnas_0605224103_index.html (126.6KB, html)

Acknowledgments

We thank all members of the Structural Genomics Consortium who contributed their proteins and expertise. The Structural Genomics Consortium is a registered charity (no. 1097737) funded by the Canada Foundation for Innovation, the Canadian Institutes for Health Research, Genome Canada through the Ontario Genomics Institute, Ontario Challenge Fund, GlaxoSmithKline, Ontario Innovation Trust, Swedish Foundation for Strategic Research, Vinnova, the Knut and Alice Wallenberg Foundation, and the Wellcome Trust.

Abbreviations

DSF: differential scanning fluorimetry
DSLS: differential static light scattering
Tm: melting temperature
Tagg: temperature of aggregation
PDB: Protein Data Bank
PAPS: 3′-phosphoadenosine-5′-phosphosulfate
PAP: 3′-phosphoadenosine-5′-phosphate

Footnotes

The authors declare no conflict of interest.

This article is a PNAS direct submission.

References

  • 1. Christendat D, Yee A, Dharamsi A, Kluger Y, Savchenko A, Cort JR, Booth V, Mackereth CD, Saridakis V, Ekiel I, et al. Nat Struct Biol. 2000;7:903–909. doi: 10.1038/82823.
  • 2. Dobrovetsky E, Lu ML, Andorn-Broza R, Khutoreskaya G, Bray JE, Savchenko A, Arrowsmith CH, Edwards AM, Koth CM. J Struct Funct Genomics. 2005;6:33–50. doi: 10.1007/s10969-005-1363-5.
  • 3. Elleby B, Svensson S, Wu X, Stefansson K, Nilsson J, Hallen D, Oppermann U, Abrahmsen L. Biochim Biophys Acta. 2004;1700:199–207. doi: 10.1016/j.bbapap.2004.05.003.
  • 4. Arai K, Yasuda S, Kornberg A. J Biol Chem. 1981;256:5247–5252.
  • 5. Murphy KP. Methods Mol Biol. 2001;168:1–16. doi: 10.1385/1-59259-193-0:001.
  • 6. Matulis D, Kranz JK, Salemme FR, Todd MJ. Biochemistry. 2005;44:5258–5266. doi: 10.1021/bi048135v.
  • 7. Bullock AN, Debreczeni JE, Fedorov OY, Nelson A, Marsden BD, Knapp S. J Med Chem. 2005;48:7604–7614. doi: 10.1021/jm0504858.
  • 8. Poklar N, Lah J, Salobir M, Macek P, Vesnaver G. Biochemistry. 1997;36:14345–14352. doi: 10.1021/bi971719v.
  • 9. Pantoliano MW, Petrella EC, Kwasnoski JD, Lobanov VS, Myslik J, Graf E, Carver T, Asel E, Springer BA, Lane P, Salemme FR. J Biomol Screen. 2001;6:429–440. doi: 10.1177/108705710100600609.
  • 10. Kurganov BI. Biochemistry (Mosc). 2002;67:409–422. doi: 10.1023/a:1015277805345.
  • 11. Senisterra G, Markin E, Yamazaki K, Hui R. 20040072356. US Patent Appl. 2004.
  • 12. Senisterra G, Hui R, Vedadi M. 2005079526. US Patent Appl. 2005.
  • 13. Golovanov AP, Hautbergue GM, Wilson SA, Lian LY. J Am Chem Soc. 2004;126:8933–8939. doi: 10.1021/ja049297h.
  • 14. Pierce AC, Sandretto KL, Bemis GW. Proteins. 2002;49:567–576. doi: 10.1002/prot.10259.
  • 15. Bleicher KH, Bohm HJ, Muller K, Alanine AI. Nat Rev Drug Discov. 2003;2:369–378. doi: 10.1038/nrd1086.
  • 16. Lo MC, Aulabaugh A, Jin G, Cowling R, Bard J, Malamas M, Ellestad G. Anal Biochem. 2004;332:153–159. doi: 10.1016/j.ab.2004.04.031.

Supplementary Materials

Supporting Information
pnas_0605224103_index.html (126.6KB, html)
pnas_0605224103_4.pdf (43.4KB, pdf)
pnas_0605224103_3.pdf (54.7KB, pdf)
pnas_0605224103_1.pdf (705.6KB, pdf)
pnas_0605224103_2.pdf (47.6KB, pdf)

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
