Abstract
Using a pseudo-atom approach, the three-dimensional crystallographic phases for the protein crambin (a = 40.76, b = 18.49, c = 22.33 Å, β = 90.61°, space group P21) were determined to 6 Å by direct methods. First, the centrosymmetric h0ℓ set was assigned phases by symbolic addition, and the initial solution was then refined by Fourier methods. Phase values of strong reflections were then permuted, and the decision to change the phase value for two of these was made by consulting a cross-correlation of the experimental density histogram to the theoretical or known histogram for the protein. The two-dimensional basis was then extended by the Sayre equation into three dimensions by assigning a phase to a third allowed hkℓ origin-defining reflection and an algebraic value to another axial reflection. The correct solution was again identified by the histogram correlation, yielding a solution in which the mean phase error for all 98 reflections was 61.5° or 23.1° for the 21 most intense reflections. A parallel study with another protein indicates this method may have general utility.
Keywords: protein crystallography, direct methods, Sayre equation, density histograms, x-ray diffraction
Significant strides have been made recently in solving the crystal structures of proteins at atomic resolution by direct methods for crystallographic phase determination. The concept of atomic resolution may denote either the true resolution of all atomic components of the polypeptide backbone (1) or the resolution of just the heavy atoms used, e.g., to obtain an anomalous scattering signal (2), because it is merely the Rayleigh criterion for resolving these scattering entities that seems to be important.
There is also considerable interest in attacking the phase problem for macromolecules at low resolution. One reason for this is that an accurate phase assignment to relatively high-resolution data can sometimes lead to an ambiguous definition of the molecular envelope boundaries (3), particularly if the low-resolution diffraction maxima are unrecorded or not assigned accurate phase values. Although conventional direct phasing methods seem to retain their validity within the first scattering intensity envelope (4) (e.g., to a resolution near 6 Å where the average intensity has a nodal value near zero), the phase problem in this region remains a challenge. For example, there are no conservative geometric rules relating density regions in the macromolecule at low resolution of the type that are similar to the reasonable chemical bonding constraints between atoms. Thus, one cannot identify readily which attempt at a solution is “correct” just from its appearance.
Partial success in two-dimensional ab initio phase determination at low resolution (e.g., with electron crystallographic data) has resulted from several approaches to the problem, including maximum entropy and likelihood techniques (5, 6). Globular approximations have also been successful in two ways. In x-ray crystallography, random glob generators coupled with a suitable figure of merit have determined molecular envelopes from three-dimensional data, after clustered trial solutions have been averaged (7). A reciprocal space approach has also been explored. Starting from a pseudo-atomic distribution of density in the protein, the unit cell was re-scaled so that an atomic scattering factor could be used to normalize the observed intensity data, following a suggestion made by Harker (8). This approach has also been found to be effective for phasing two-dimensional electron diffraction data sets from proteins with a large α-helix content (9–12), where the atomistic assumption has an obvious application. In this paper the extension of this methodology to three dimensions is explored with x-ray data for the phase determination of crambin at 6 Å.
Data and Methods
Diffraction Data.
Calculated structure factor amplitudes and phases from the protein crambin (13) (Mr = 6,287.82) were taken from an atomic model that accurately simulates the hydration, as well as the protein structure, and hence the density of the crystal is well modeled (1, 14). The space group is P21 where a = 40.76, b = 18.49, c = 22.33 Å, β = 90.61°. There are 98 unique hkℓ diffraction maxima within the 6 Å limiting resolution used in this study. In a parallel study, 6 Å data (104 reflections) from monoclinic rubredoxin (15) (P21: a = 19.97, b = 41.45, c = 24.41 Å, β = 108.39°, Mr = 7465.23), a model with accurately predicted solvent structure (14), were similarly treated.
It was assumed that the distribution of density could be simulated by pseudo-atomic globs and that the Fourier transform of these globular subunits could be modeled, as shown earlier (9–12), after a 10-fold reduction of the unit cell dimensions, by an atomic scattering factor. (For the [010] projection, a pseudo-atom model with coordinates chosen at identified glob centers was found to fit the published phases by a mean error of 28.4°; for the [001] projection the mean error, 15.3°, was somewhat better. For rubredoxin, corresponding values are: [010]: 18.0°; [001]: 29.1°.) In this case the electron scattering factor for carbon (16) was used as the model for the glob transform. Normalized structure factors (17) were generated, therefore, from $|E_h|^2 = I_h/(\varepsilon \sum f_c^2)$. The atomic scattering factor was also used for (zonal) structure factor calculations during Fourier refinement.
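The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian `gaussian_form_factor` is a hypothetical stand-in for the tabulated carbon scattering factor, `n_globs` is an assumed number of identical pseudo-atoms, the final rescaling to a unit mean is a common convention that the text does not state, and the toy reflection list is not crambin data.

```python
import math

def gaussian_form_factor(s, b=0.5):
    """Hypothetical glob form factor f(s) = exp(-b * s**2); s = sin(theta)/lambda in 1/A."""
    return math.exp(-b * s * s)

def normalized_e_squared(reflections, n_globs, cell_scale=10.0,
                         form_factor=gaussian_form_factor):
    """Return |E_h|^2 = I_h / (eps * sum f^2), rescaled so that the mean is 1.

    reflections: list of (d_spacing_in_angstrom, intensity, epsilon) tuples.
    n_globs:     assumed number of identical pseudo-atoms in the unit cell.
    cell_scale:  factor by which the cell, and hence every d-spacing, is shrunk.
    """
    raw = []
    for d, intensity, eps in reflections:
        # sin(theta)/lambda = 1/(2d); shrinking the cell divides d by cell_scale.
        s = cell_scale / (2.0 * d)
        sum_f2 = n_globs * form_factor(s) ** 2
        raw.append(intensity / (eps * sum_f2))
    mean = sum(raw) / len(raw)          # assumed convention: <|E|^2> = 1
    return [value / mean for value in raw]

# Toy input (not crambin data); the last reflection carries epsilon = 2.
toy = [(20.0, 100.0, 1), (10.0, 50.0, 1), (6.0, 10.0, 2)]
e2 = normalized_e_squared(toy, n_globs=8)
```

Passing the form factor as a callable keeps the sketch independent of any particular scattering-factor table, so a tabulated curve could be substituted without changing the rest of the calculation.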
Phase Determination.
The Σ1- and Σ2-three-phase structure invariants were then generated from the three-dimensional normalized structure factors. For space group P21 (b-axis unique), the phase relationships of symmetry-equivalent reflections are given in the International Tables for X-ray Crystallography (18). The minimal information required for a basis set (19) to assign phase terms to all other reflections was assessed by a convergence test (20). By symbolic addition (21), the centrosymmetric (h0ℓ) data were then assigned phase values [including setting two permissible origin-defining reflections (19), accepting some from Σ1-estimates and assigning some algebraic values], followed by Fourier refinement. The phases of the most intense reflections in the final phase list were then permuted, and several figures of merit (see below) were used to evaluate the need for further changes to this zonal set. After completion, the strongest reflections were used as a basis for expansion into the three-dimensional data set via the Sayre equation (22): $F_h = (\theta/v)\sum_k F_k F_{h-k}$. A third origin-defining reflection (19) was then defined as well as an algebraic phase term to be permuted. For the Sayre expansions, a correct value was given for $F_{000}$, but an estimate that minimized the negative density of ensuing electron density maps could be used just as well to stabilize the convolution (23).
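A single pass of the Sayre convolution can be sketched as follows. The function names, the dictionary layout, and the toy one-dimensional basis are illustrative assumptions, and the scale factor θ/v is simply set to 1. The sketch expects the caller to supply every symmetry-related reflection; only Friedel mates are generated here.

```python
import cmath
import math

def add_friedel_mates(reflections):
    """Add F(-h) = conj(F(h)) for every reflection that lacks its Friedel mate."""
    full = dict(reflections)
    for (h, k, l), f in reflections.items():
        full.setdefault((-h, -k, -l), f.conjugate())
    return full

def sayre_extend(known, targets, theta_over_v=1.0):
    """Estimate F_h = (theta/v) * sum_k F_k * F_(h-k) from the known terms only."""
    estimates = {}
    for h in targets:
        total = 0j
        for k, f_k in known.items():
            h_minus_k = (h[0] - k[0], h[1] - k[1], h[2] - k[2])
            f_hk = known.get(h_minus_k)
            if f_hk is not None:        # skip pairs with an unphased partner
                total += f_k * f_hk
        estimates[h] = theta_over_v * total
    return estimates

# Toy basis: two phased reflections (0 and 180 degrees) plus Friedel mates.
basis = add_friedel_mates({(1, 0, 0): 1 + 0j, (2, 0, 0): -1 + 0j})
estimate = sayre_extend(basis, [(3, 0, 0)])[(3, 0, 0)]
phase_deg = math.degrees(cmath.phase(estimate))
```

Because each new term is a sum over all available products, one inaccurate pair shifts the estimated phase less than it would in a single three-phase relationship.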
Figures of Merit.
Often, it was necessary to pick an optimal phase solution from two or more choices. Several figures of merit were evaluated for this purpose. One that has often been used is the Luzzati (24) test for electron density map flatness: $\langle \Delta\rho^4 \rangle$, where $\Delta\rho = \rho - \bar{\rho}$. (Ideally, the best phase solution minimizes this figure of merit.) When $F_{000} = 0.0$ for calculating these maps, the average term $\bar{\rho}$ is also zero. With an atomistic estimate of glob centers used for a structure factor calculation, a Patterson correlation coefficient (25), $\sum m_o m_c / (\sum m_o^2 \sum m_c^2)^{1/2}$, where $m_o = |F_o^2| - \langle |F_o^2| \rangle$, etc., was also evaluated. (The subscripts o and c denote “observed” and “calculated” values, respectively.) Finally, it was assumed that the density histogram (26) $v(t)$ could also be determined a priori for this protein (4, 7, 27). Its expected appearance at low resolution (7, 26, 27) is shown in Fig. 1a. Given an observed density histogram $v(t)$ from the electron density map calculated for any phase solution, a figure of merit is immediately suggested because the cross-correlation function (28) $\psi_{12}(\tau) = \int v_1(t + \tau)\, v_2(t)\, dt$ of an experimental histogram (trial phase set) with the expected distribution should approximate the autocorrelation function $\psi(\tau) = \int v(t + \tau)\, v(t)\, dt$ (Fig. 1b) as the phase error decreases. Although the autocorrelation function is maximally peaked at the zero value, a test for skewness of the cross-correlation function, comparing the normalized sum of differences between $\psi(\tau)$ and $\psi(-\tau)$, was more useful. The minimum value of skewness was sought because the sum of differences in the mirror-symmetric autocorrelation function is zero. This criterion is the reverse of the one used to predict the correctness of an experimental density histogram: i.e., where its skewness is a desirable property (29).
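The first two figures of merit can be written compactly. The sketch below assumes that the map is supplied as a flat list of sampled density values and that the observed and calculated amplitudes are given in matching order; the function names are illustrative only.

```python
import math

def luzzati_flatness(density):
    """Mean fourth power of the density deviation, <(rho - mean(rho))**4>."""
    mean = sum(density) / len(density)
    return sum((rho - mean) ** 4 for rho in density) / len(density)

def patterson_correlation(f_obs, f_calc):
    """Correlation of m = |F|^2 - <|F|^2> between observed and calculated data."""
    i_obs = [f * f for f in f_obs]
    i_calc = [f * f for f in f_calc]
    mean_o = sum(i_obs) / len(i_obs)
    mean_c = sum(i_calc) / len(i_calc)
    m_o = [i - mean_o for i in i_obs]
    m_c = [i - mean_c for i in i_calc]
    numerator = sum(a * b for a, b in zip(m_o, m_c))
    denominator = math.sqrt(sum(a * a for a in m_o) * sum(b * b for b in m_c))
    return numerator / denominator
```

A lower `luzzati_flatness` value is preferred, whereas `patterson_correlation` approaches 1 as the calculated intensities track the observed ones; it is unaffected by an overall scale factor between the two amplitude lists.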
The appearance of the histograms for maps generated with different amounts of phase error has been depicted by Lunin (27), where increasing phase error leads to a more symmetric, Gaussian-like, distribution. The histogram for crambin based on phased 6 Å x-ray data actually resembled a theoretical case in which some slight phase error was present (27).
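The histogram test can be illustrated with a discrete version of the cross-correlation. This is a minimal sketch, assuming that both histograms share the same bins and that the asymmetry is normalized by ψ(0), as in the definition of S used in the Results; the bin range, function names, and toy histograms are hypothetical.

```python
def density_histogram(values, lo, hi, n_bins):
    """Normalized frequency histogram; values outside [lo, hi] go to the end bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        index = int((v - lo) / width)
        index = max(0, min(index, n_bins - 1))
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cross_correlation(v1, v2):
    """psi12(lag) = sum_t v1[t + lag] * v2[t] for equally binned histograms."""
    n = len(v1)
    return {lag: sum(v1[t + lag] * v2[t] for t in range(n) if 0 <= t + lag < n)
            for lag in range(-(n - 1), n)}

def asymmetry(psi):
    """S = sum_i |psi(tau_i) - psi(tau_-i)| / psi(0); zero for an autocorrelation."""
    max_lag = max(psi)
    return sum(abs(psi[lag] - psi[-lag]) for lag in range(1, max_lag + 1)) / psi[0]

expected = [0.1, 0.4, 0.3, 0.2]     # toy "known" histogram
trial = [0.2, 0.3, 0.4, 0.1]        # toy histogram from a trial phase set
s_auto = asymmetry(cross_correlation(expected, expected))
s_trial = asymmetry(cross_correlation(trial, expected))
```

With these toy histograms `s_auto` vanishes and `s_trial` does not, which is the behavior the selection criterion relies on: the trial phase set giving the smallest S is preferred.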
Results
For the 10 h0ℓ Σ2-phase invariant sums with the largest values of $A = (2/\sqrt{N})|E_h E_k E_{-h-k}|$ (N being the number of scatterers in the unit cell), the average value of $\varphi = \phi_h + \phi_k + \phi_{-h-k}$ is 72° when the value 0° is expected (17). Nevertheless, the origin was defined by setting ϕ(303) = 0 and ϕ(20 3̄) = 0. Four Σ1-estimates were also accepted: namely, a 0° estimate for (402) and a 180° estimate for reflections (400), (20 2̄), and (40 2̄). Two reflections, (40 3̄) and (302), were assigned algebraic values. This generated four phase sets with 16 terms, from which electron density maps were calculated, monitoring the value of the Luzzati (24) figure of merit, which tests density flatness. Coordinates of globs in maps from the two solutions with the lowest value of $\langle \Delta\rho^4 \rangle$ lay near one another, so these were averaged for the structure factor calculation to estimate the complete h0ℓ phase list. This was followed by three cycles of Fourier refinement. In this phase set, 11 of the 38 reflections had incorrect phase assignments [including the shift of the weak ϕ(303) term], most of which are associated with medium or weak intensity reflections. The phases of the most intense reflections were then individually permuted, and the asymmetry of the test $\psi_{12}(\tau)$ cross-correlation functions was evaluated after the histograms were obtained for the ensuing h0ℓ maps (including the $F_{000}$ term in the calculation). A separate symbolic phasing of hk0 data had also been carried out. From an ambiguous assignment indicated for an intense reflection, the ϕ(300) phase term was shifted from 0 to 180° to test for the best solution based on the properties of $\psi_{12}(\tau)$. After this test, which resulted in a phase shift, the next reflection most likely to be shifted in phase was determined in the same way: i.e., the ϕ(401) value was changed from 0 to 180°. As shown in Table 1, there were no remaining phase errors for reflections in which |Fh| ≥ 200.0. These 12 largest phased structure factors were reserved for a basis set to be expanded by the Sayre equation.
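The four starting phase sets follow from letting each of the two algebraic symbols take the centrosymmetric values 0° or 180°. The sketch below only enumerates those combinations for the eight starting reflections named in the text; propagating them through the Σ2 relationships to reach all 16 terms is not shown, and the function name is illustrative.

```python
from itertools import product

# Origin-defining and Sigma-1 phases quoted in the text (degrees).
FIXED = {(3, 0, 3): 0, (2, 0, -3): 0, (4, 0, 2): 0,
         (4, 0, 0): 180, (2, 0, -2): 180, (4, 0, -2): 180}
# Reflections given algebraic (symbolic) phase values.
SYMBOLS = [(4, 0, -3), (3, 0, 2)]

def trial_phase_sets(fixed, symbols, choices=(0, 180)):
    """Return one phase dictionary per combination of symbol values."""
    sets = []
    for values in product(choices, repeat=len(symbols)):
        trial = dict(fixed)
        trial.update(zip(symbols, values))
        sets.append(trial)
    return sets

phase_sets = trial_phase_sets(FIXED, SYMBOLS)
```

Each dictionary in `phase_sets` would then be used to compute a map whose figure of merit is compared with the others.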
The electron density maps for this projection are compared in Fig. 2. (For rubredoxin, symbolic addition finds 13 entirely correct phases for the h0ℓ data; Fourier refinement yields only 3 errors for all 20 predicted phases, all associated with weak reflections.)
Table 1.
h0ℓ | \|F\| | ϕ (°) | h0ℓ | \|F\| | ϕ (°) |
---|---|---|---|---|---|
−601 | 118.5 | 180 | 101 | 174.5 | 180* |
−501 | 255.5 | 180 | 102 | 139.2 | 180* |
−502 | 72.6 | 0 | 103 | 27.0 | 180 |
−401 | 174.6 | 180 | 200 | 278.1 | 180 |
−402 | 93.1 | 180 | 201 | 93.4 | 0* |
−403 | 216.8 | 0 | 202 | 56.0 | 0* |
−301 | 52.9 | 0 | 203 | 291.5 | 180 |
−302 | 129.0 | 180 | 300 | 217.6 | 180 |
−303 | 122.6 | 0 | 301 | 166.5 | 180 |
−201 | 149.6 | 0 | 302 | 183.3 | 180 |
−202 | 204.8 | 180 | 303 | 62.5 | 180* |
−203 | 107.7 | 0 | 400 | 340.4 | 180 |
−101 | 283.2 | 0 | 401 | 282.6 | 180 |
−102 | 2.4 | 180 | 402 | 170.4 | 0 |
−103 | 78.9 | 0 | 500 | 179.8 | 180 |
001 | 307.6 | 180 | 501 | 78.4 | 180 |
002 | 190.3 | 0* | 502 | 116.1 | 0* |
003 | 103.8 | 0 | 600 | 32.8 | 180 |
100 | 308.3 | 0 | 601 | 115.6 | 0* |
*Incorrect assignment.
For the three-dimensional phase expansion via the Sayre equation (22), the value of a third origin-defining phase, ϕ(312) = −158.9°, as permitted by the space group (19), was included. [This value, taken from the previous x-ray determination (1), was used only to facilitate comparison to the published phase values.] In addition, an algebraic term was permuted through ±45° and ±135° for ϕ(020), which has an actual phase value of 141.1°, to generate four three-dimensional phase sets. Selection of the best solution was again based on the optimal value of the cross-correlation function $\psi_{12}(\tau)$, using density histograms constructed from sections at y/b = 0.0 and 0.333. The minimum of this cross-correlation asymmetry, $S = \sum_i |\psi(\tau_i) - \psi(\tau_{-i})|/\psi(0)$, again defined this best solution unequivocally, as shown in Table 2, where a summary of mean phase errors for different classes of reflections is also given. Examples of the experimental density histograms and their cross-correlation functions are shown in Fig. 1. It is clear that the best phase accuracy is found for the strongest reflections, but the mean phase errors for other classes of reflections are again much better than the random estimate and in accord with other favorable phase predictions for proteins at low resolution (3). The electron density maps calculated with the most intense reflections reveal that much of the polypeptide backbone is covered correctly with an electron density envelope (Fig. 3). (For rubredoxin, a similar expansion gives an overall mean error of 75.6° for all 104 phases, but only 24.8° or 56.3° for the 14 or 26 most intense reflections, respectively.)
Table 2.
ϕ(020) | 135° | 45° | −45° | −135° | No. of reflections |
---|---|---|---|---|---|
Mean phase error 〈\|Δϕ\|〉 (°) | | | | | |
\|F\| ≥ 200 | 23.1 | 38.5 | 54.0 | 38.0 | 21 |
100 < \|F\| < 200 | 67.2 | 78.0 | 93.9 | 79.2 | 39 |
\|F\| ≤ 100 | 76.9 | 90.1 | 78.9 | 66.8 | 38 |
All reflections | 61.5 | 74.2 | 79.5 | 65.6 | 98 |
Newly phased reflections | 70.9 | 85.6 | 91.7 | 75.6 | 85 |
Figures of merit | | | | | |
S | 0.27 | 0.56 | 0.49 | 0.79 | |
F | 1.05 | 0.94 | 0.96 | 0.98 | |
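The mean phase errors 〈|Δϕ|〉 summarized in Table 2 depend on comparing angles modulo 360°. A minimal sketch of that calculation is given here; the function names are illustrative, and the only values taken from the text are the four trial ϕ(020) settings and the published value of 141.1°.

```python
def phase_difference(trial_deg, reference_deg):
    """Absolute phase difference in degrees, wrapped into the range 0 to 180."""
    wrapped = (trial_deg - reference_deg + 180.0) % 360.0 - 180.0
    return abs(wrapped)

def mean_phase_error(trial, reference):
    """Average |delta phi| over the reflections present in both dictionaries."""
    common = [hkl for hkl in trial if hkl in reference]
    return sum(phase_difference(trial[hkl], reference[hkl]) for hkl in common) / len(common)

# Error of each permuted phi(020) value against the published 141.1 degrees.
errors_020 = {value: phase_difference(value, 141.1) for value in (135, 45, -45, -135)}
```

The wrapping step matters for pairs such as −135° and 141.1°, which differ by 83.9° rather than 276.1°.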
Discussion
The phase determination outlined above was not exactly a straightforward process. First of all, it is probably unrealistic to expect the globular model to be accurate for three-dimensional potential distributions. The best fit of a glob model would be anticipated for individual projections. Nevertheless, when the zonal Σ2 phase invariant sums were evaluated with published phase values, there were 4 values of $\varphi = \phi_h + \phi_k + \phi_{-h-k}$ differing by 180° for the 10 ranked according to the largest value of A. The evaluation of the hk0 reflections was somewhat more favorable. Although there were two values of φ near 180° in the list of the top ranked triples, these were found for the eighth and ninth ranked invariants. In the top 10 ranked hkℓ phase invariants, however, there were 5 with a φ value near 180°, including the triple invariant with the largest A value. It was, therefore, somewhat of a surprise to find that the values of the four most probable Σ1 invariants were correctly predicted.
The second problem encountered in this phase determination is the difficulty with finding a truly robust figure of merit for identifying the best phase solution among several. This problem has been echoed by other investigators (4, 30). Although the Luzzati criterion for density flatness is often qualitatively useful, it cannot be relied upon for fine distinctions: e.g., when individual phases are being permuted. Despite favorable indications in earlier two-dimensional determinations (9–12), the Patterson correlation coefficient is also unreliable. In this study, this figure of merit was not suitable for picking out the best phase solution when the h0ℓ set was refined by Fourier methods. In fact, the Luzzati figure of merit was a better criterion.
In this study, the cross-correlation of the observed density histogram with the expected value has been the most reliable means of determining the best phase set. Although an experimental histogram for crambin was used in the initial discrimination of phase solutions via cross-correlation with the trial histograms, subsequent evaluations of the three-dimensional phase sets, via the ideal histogram in Fig. 1a or one with a slight amount of phase error (see ref. 27), did not affect the identification of the best solution. However, there was an additional problem with the evaluation of skewness in the cross-correlation functions because it was not altogether clear where to locate the peak origin. Following the definition of correlation coefficients in signal analysis (31), the origin problem might be resolved when the Fourier transform of the cross-correlation function, $U(f) = \sum_i I(f_i)$, is utilized in a figure of merit $F = U_{AB}(f)/U_{AA}(f)$, where $I(f_i)$ are the intensity values of the frequency components at $f_i$, for either the cross-correlation function (AB) or the autocorrelation function (AA). As shown in Table 2, the maximum value of this ratio is also useful for detecting the best phase set. (A maximum value of F greater than 1.00 indicates some error in the Fourier transform calculation.)
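One reading of this transform-based ratio is sketched below. It assumes that A is the expected histogram and B the trial histogram, that I(f_i) is the squared modulus of each discrete Fourier component of the correlation function, and that both histograms share the same bins. These choices, like the function names and toy histograms, are assumptions made for illustration and are not specified in the text.

```python
import cmath

def cross_correlation(a, b):
    """psi(lag) = sum_t a[t + lag] * b[t], returned for lags -(n-1) .. (n-1)."""
    n = len(a)
    return [sum(a[t + lag] * b[t] for t in range(n) if 0 <= t + lag < n)
            for lag in range(-(n - 1), n)]

def spectral_power(signal):
    """U = sum_i I(f_i), with I the squared modulus of each DFT component."""
    m = len(signal)
    total = 0.0
    for f in range(m):
        component = sum(x * cmath.exp(-2j * cmath.pi * f * t / m)
                        for t, x in enumerate(signal))
        total += abs(component) ** 2
    return total

def spectral_ratio(expected, trial):
    """F = U_AB / U_AA for an expected histogram (A) and a trial histogram (B)."""
    u_ab = spectral_power(cross_correlation(expected, trial))
    u_aa = spectral_power(cross_correlation(expected, expected))
    return u_ab / u_aa

expected = [0.1, 0.4, 0.3, 0.2]         # toy "known" histogram
flat = [0.25, 0.25, 0.25, 0.25]         # toy trial histogram
ratio_same = spectral_ratio(expected, expected)
ratio_flat = spectral_ratio(expected, flat)
```

Because the summed intensities are independent of where the correlation peak sits, this quantity avoids the origin ambiguity noted above. By Parseval's theorem the same sums could be obtained directly from the squared correlation values without computing a transform.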
Seemingly, the best tactic for three-dimensional phase determination at low resolution is to take great pains first of all to obtain the best phase set possible for a zonal projection. For three-dimensional phase determination, the Sayre equation is preferable to a symbolic addition approach because it averages over several possible contributors to a given phase, and, despite the inaccuracy of individual three-phase invariants, it seems to provide a useful result, especially for the strongest reflections in the data set. It may also be preferable to retain the structure factor magnitudes in this convolution rather than their normalized values because the shape of the phenomenological scattering factor is only approximately valid, leading to inaccurate prediction of |Eh| magnitudes. (This problem with the amplitude transform of the globular model is indicated by relatively high crystallographic residuals for the h0ℓ and hk0 data sets when the carbon scattering factor approximation is used: R = 0.54 and 0.40, respectively.)
It appears, therefore, that ab initio phase determinations for macromolecules at low resolution may not be an impossible goal. Tests of other representative structures in other space groups must be carried out to determine whether there are general truths to be found rather than episodically favorable outcomes; the experimental amplitudes from these proteins should also be evaluated in future work. Appraisal of optimal figures of merit, seemingly the greatest challenge facing us now for the identification of best solutions, is also a prime consideration. The evaluation of density histograms is certainly worth pursuing further. For example, a recent study (32) demonstrating that a three-dimensional low-resolution structure determination of trigonal rubredoxin might be feasible from observed x-ray structure factor amplitudes was somewhat surprising because the low-resolution data were strongly affected by the high ammonium sulfate concentration of the solvent space. If the crystallographic phases from the x-ray model (33) were applied to the 6 Å amplitudes, the map density histogram again followed the distribution expected for other proteins. This result indicates that the same endpoint could be exploited as a figure of merit for phase determination, perhaps explaining why phasing attempts with these experimental data were so promising.
Acknowledgments
Thanks are due to Prof. M. M. Teeter for providing the structure factor magnitudes and phases to the Hauptman-Woodward Medical Research Institute that were used for this determination and to Dr. M. P. McCourt for generating three-dimensional electron density maps. Research was funded by a grant from the National Institute of General Medical Sciences (Grant GM-46733), which is gratefully acknowledged.
Footnotes
Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.060019197.
Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.060019197
References
- 1. Weeks C M, Hauptman H A, Smith G D, Blessing R H, Teeter M M, Miller R. Acta Crystallogr D. 1995;51:33–38. doi: 10.1107/S090744499400925X.
- 2. Smith G D, Nagar B, Rini J M, Hauptman H A, Blessing R H. Acta Crystallogr D. 1998;54:799–804. doi: 10.1107/s0907444997018805.
- 3. Podjarny A D, Schevitz R W, Sigler P B. Acta Crystallogr A. 1981;37:662–668.
- 4. Podjarny A D, Urzhumtsev A G. Methods Enzymol. 1997;276:641–658. doi: 10.1016/S0076-6879(97)76084-1.
- 5. Gilmore C J, Nicholson W V, Dorset D L. Acta Crystallogr A. 1996;52:937–946. doi: 10.1107/s0108767396008744.
- 6. Gilmore C J, Nicholson W V. In: Direct Methods for Solving Macromolecular Structures. Fortier S, editor. Dordrecht, the Netherlands: Kluwer; 1998. pp. 455–462.
- 7. Lunin V Yu, Lunina N L, Petrova T E, Vernoslova E A, Urzhumtsev A G, Podjarny A D. Acta Crystallogr D. 1995;51:896–903. doi: 10.1107/S0907444995005075.
- 8. Harker D. Acta Crystallogr. 1953;6:731–736.
- 9. Dorset D L. Proc Natl Acad Sci USA. 1997;94:1791–1794. doi: 10.1073/pnas.94.5.1791.
- 10. Dorset D L. Acta Crystallogr A. 1997;53:445–455. doi: 10.1107/s0108767397003280.
- 11. Dorset D L. Acta Crystallogr A. 1998;54:290–295. doi: 10.1107/s0108767397016905.
- 12. Dorset D L, Jap B K. Acta Crystallogr D. 1998;54:615–621. doi: 10.1107/s0907444997018982.
- 13. Teeter M M, Roe S, Heo N H. J Mol Biol. 1993;230:292–311. doi: 10.1006/jmbi.1993.1143.
- 14. Blessing R H, Guo D Y, Langs D A. Acta Crystallogr D. 1996;52:257–266. doi: 10.1107/S0907444995014053.
- 15. Dauter Z, Sieker L C, Wilson K S. Acta Crystallogr B. 1992;48:42–59. doi: 10.1107/s0108768191010613.
- 16. Doyle P A, Turner P S. Acta Crystallogr A. 1968;24:390–397.
- 17. Hauptman H A. Crystal Structure Determination. The Role of the Cosine Seminvariants. New York: Plenum; 1972.
- 18. Henry N F M, Lonsdale K, editors. International Tables for X-Ray Crystallography Vol. 1: Symmetry Groups. (3rd ed.) Birmingham, U.K.: Kynoch; 1969. p. 375.
- 19. Rogers D. In: Theory and Practice of Direct Methods in Crystallography. Ladd M F C, Palmer R A, editors. New York: Plenum; 1980. pp. 23–92.
- 20. Germain G, Main P, Woolfson M M. Acta Crystallogr B. 1970;26:274–285.
- 21. Karle J, Karle I L. Acta Crystallogr. 1966;21:849–859.
- 22. Sayre D. Acta Crystallogr. 1952;5:60–65.
- 23. Dorset D L, Kopp S, Fryer J R, Tivol W F. Ultramicroscopy. 1995;57:59–89. doi: 10.1016/0304-3991(94)00161-f.
- 24. Luzzati V, Mariani P, Delacroix H. Makromol Chem Macromol Symp. 1988;15:1–17.
- 25. Drenth J. Principles of Protein X-Ray Crystallography. New York: Springer; 1994. p. 224.
- 26. Zhang K Y J, Main P. Acta Crystallogr A. 1990;46:41–46.
- 27. Lunin V Yu. Acta Crystallogr D. 1993;49:90–99. doi: 10.1107/S0907444992009247.
- 28. Gaskill J D. Linear Systems, Fourier Transforms, and Optics. New York: Wiley; 1978. pp. 172–176.
- 29. Podjarny A D, Yonath A. Acta Crystallogr A. 1977;33:655–661.
- 30. Lunin V Yu, Lunina N L, Petrova T E, Urzhumtsev A G, Podjarny A D. Acta Crystallogr D. 1998;54:726–734. doi: 10.1107/s0907444997012456.
- 31. Mason S J, Zimmermann H J. Electronic Circuits, Signals, and Systems. New York: Wiley; 1960. pp. 197–222.
- 32. Dorset D L, McCourt M P. Z Kristallogr. 2000; in press.
- 33. Watenpaugh K D, Sieker L C, Herriott J R, Jensen L H. Acta Crystallogr B. 1973;29:943–956.