Abstract
An X-ray free electron laser is a new source of X-rays some 10 × 10^9 times brighter than any previous X-ray source, giving rise to the possibility of structure determination of individual biological particles without crystallization. Some of the earliest samples used in the X-ray free electron laser are viruses because they are about the largest of reproducible bioparticles. We show how common virus near-symmetries can be exploited to find a first approximation to their structures to give a starting point for a perturbation approach to determine their structures.
I. INTRODUCTION
As demonstrated in a recent experiment,1 a single virus particle may be injected into a very short and bright pulsed beam from a coherent X-ray free electron laser (XFEL) and is capable of diffracting enough intensity to form a high-quality diffraction pattern before suffering significant damage.2 Experiments have suggested the feasibility of this approach for both single particles3 and nanocrystals.4 Therefore, a stream of particles can be directed into an XFEL beam, and diffraction patterns recorded sequentially, each corresponding to a different particle orientation. There have also been other significant applications of XFEL radiation, such as the discovery from wide-angle X-ray scattering (WAXS) of a “protein quake” that might explain the utilization of solar energy during photosynthesis.5 Here, we develop a method to recover the 3D structure of a particle without knowing the orientation of the particle contributing to each diffraction pattern. Though the method we describe assumes a form of symmetry (icosahedral or helical) suggested by Caspar and Klug6 to be usual with regular viruses, we acknowledge that many viruses deviate from this symmetry with their internal genetic material1 or with external features such as spikes7 or hair.1 Icosahedral or helical symmetry often remains the symmetry associated with the bulk of the capsid. For example, the low-resolution image of the capsid in Fig. 3 of Ref. 1 appears to be largely icosahedral with little evidence of the hair that is known to cover this virus. 
This suggests that a symmetric structure like those deduced here from simulated XFEL data may form an excellent first guess to such a structure which can later be modified by a form of perturbation theory as used in quantum mechanics, or else form an excellent support as a starting point in an iterative algorithm8 that constrains the reciprocal space data to the measured angular correlations, and the real-space image to a dynamic support based on the shrink-wrap algorithm9 with no symmetry restrictions. Our approach in our earlier paper10 was to find the optimum set of coefficients that ensured positivity of the diffraction intensities. This is not a very strong constraint, and there are a variety of combinations of signs which are consistent with this constraint. A stronger constraint and one which tends to determine these signs more uniquely is that which makes them agree with the magnitudes and signs of the triple correlations11 derived from the ensemble of diffraction patterns. In this paper, we derive precise expressions for these triple correlations in terms of the relevant expansion coefficients of the diffraction volume, and show how this constraint may be used for determining the signs of the expansion coefficients in the cases of both icosahedral and helical symmetry. The resulting 3D diffraction volume can be phased iteratively with algorithms like charge flipping12,13 or standard fiber diffraction phasing algorithms (e.g., Refs. 14 and 15) in the case of a helical virus.
II. AVERAGE ANGULAR CORRELATIONS
For a set of experimental diffraction patterns, the average pair correlation function may be defined by
| (1) |
where q and q′ refer to two resolution shells, which can be identified with concentric circles on individual diffraction patterns, and the average is taken over diffraction patterns. It should be noted that the angular correlations themselves (Eq. (1)) are between intensities on the same diffraction pattern, and are thus insensitive to shot-to-shot intensity variations, a known problem with an X-ray laser. Of course, Eq. (1) also invokes an average over different diffraction patterns. However, such an average is much less sensitive to shot-to-shot variations than correlations between different diffraction patterns would be. All that is required is a reasonable statistical distribution of the shot-to-shot intensity variations. This method of determining a diffraction volume from random particle orientations works from averages over all particle orientations, and at no stage involves determining the relative orientations of the particles contributing to the individual diffraction patterns.
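As a minimal sketch of the averaging just described, the following Python computes a pair angular correlation for rings of intensity sampled at equally spaced azimuthal angles. The exact normalization of Eq. (1) is not reproduced here; the function names, the sampling layout, and the toy data are illustrative assumptions.

```python
def ring_correlation(ring_q, ring_qp):
    """Circular cross-correlation of two rings from ONE diffraction pattern.

    ring_q[i] and ring_qp[i] hold intensities on shells q and q' at the same
    azimuthal angle phi_i (equally spaced samples; an assumed layout).
    """
    n = len(ring_q)
    return [sum(ring_q[i] * ring_qp[(i + k) % n] for i in range(n)) / n
            for k in range(n)]


def average_pair_correlation(patterns):
    """Average the per-pattern correlations over all diffraction patterns."""
    n = len(patterns[0][0])
    total = [0.0] * n
    for ring_q, ring_qp in patterns:
        for k, value in enumerate(ring_correlation(ring_q, ring_qp)):
            total[k] += value
    return [t / len(patterns) for t in total]


# Toy data: the same two rings seen at two different in-plane rotations.
ring_a = [1.0, 3.0, 2.0, 0.5, 1.0, 3.0, 2.0, 0.5]
ring_b = [0.2, 0.9, 0.4, 0.1, 0.2, 0.9, 0.4, 0.1]
rotated_a = ring_a[3:] + ring_a[:3]
rotated_b = ring_b[3:] + ring_b[:3]
c2 = average_pair_correlation([(ring_a, ring_b), (rotated_a, rotated_b)])
```

Because both rings of a pattern rotate together, each pattern contributes the same correlation whatever its in-plane rotation, so no orientation has to be assigned before averaging.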
It can be shown that, in terms of the spherical harmonic expansion coefficients of the diffraction volume, this angular correlation reduces to16,17
| (2) |
which is independent of , where
$$ B_l(q,q') = \sum_{m} I_{lm}(q)\, I^{*}_{lm}(q') \tag{3} $$
and
| (4) |
where the two angular arguments refer to the scattering angles of the incident radiation corresponding to the resolution shells q and q′. Note that, in the above equations, the quantity B_l, which is easily recoverable from the set of measured diffraction patterns using the orthogonality property of Legendre polynomials, is a quadratic function of the spherical harmonic expansion coefficients I_lm(q). If the expansion coefficients could be recovered directly, it would be possible to reconstruct the diffraction volume via
$$ I(\mathbf{q}) = \sum_{lm} I_{lm}(q)\, Y_{lm}(\hat{\mathbf{q}}) \tag{5} $$
Unfortunately, the quantities B_l(q, q′), in general, do not determine the coefficients I_lm(q) uniquely.16,17 For example, the coefficients depend on the extra magnetic quantum number m, which is not specified by B_l. However, for particles of known symmetry, each l quantum number may be associated with known values of m in some known ratio. This is certainly true of icosahedral and helical viruses, and under such circumstances, as we show below, it is possible to extract the coefficients I_lm(q) from the measured B_l.
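The role of Legendre orthogonality can be illustrated with a small sketch. It assumes, purely for illustration, a correlation that is a Legendre series in a single variable x on [-1, 1]; the actual kernel of Eq. (4) depends on the scattering angles and is not reproduced here. The function names and the toy coefficients are hypothetical.

```python
def legendre(l_max, x):
    """Return [P_0(x), ..., P_lmax(x)] using the Bonnet recurrence."""
    p = [1.0, x]
    for l in range(1, l_max):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: l_max + 1]


def project_onto_legendre(f, l_max, n=4000):
    """c_l = (2l + 1)/2 * integral of f(x) P_l(x) over [-1, 1] (midpoint rule)."""
    h = 2.0 / n
    coeffs = [0.0] * (l_max + 1)
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        fx = f(x)
        for l, pl in enumerate(legendre(l_max, x)):
            coeffs[l] += (2 * l + 1) / 2.0 * fx * pl * h
    return coeffs


# Toy "measured" correlation built from known coefficients (stand-ins for B_l).
true_coeffs = {0: 2.0, 2: -0.5, 6: 0.3}


def toy_correlation(x):
    p = legendre(6, x)
    return sum(c * p[l] for l, c in true_coeffs.items())


recovered = project_onto_legendre(toy_correlation, 6)
```

Each coefficient is obtained independently of the others, which is why the B_l can be read off shell pair by shell pair without any fitting.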
We begin by considering the icosahedral case.
III. ICOSAHEDRAL PARTICLE
The recovery of the charge density of an icosahedral virus from pair angular correlations has been reported in a previous work.10 Since the quantities extractable from the pair correlations depend only on the angular momentum quantum number, l, the use of icosahedral harmonics (which depend on the same quantum number, as they are sums over spherical harmonics of magnetic quantum numbers m in a known proportion) allows their magnitudes to be determined by these quantities alone. Although icosahedral harmonic expansion coefficients for real intensities are real, there remains an ambiguity of sign. In our previous work,10 we determined these signs by a positivity constraint on the intensities. This is not a very strong constraint and does not always determine the signs of the icosahedral harmonic expansion coefficients uniquely. Here, we describe an alternative, and stricter, way of constraining these signs by adjusting them to agree with the magnitudes and signs of the triple angular correlations.11 As we will see, this approach is equally applicable to both great classes of virus symmetry noted by Caspar and Klug.6 Icosahedral harmonics are defined by
| (6) |
where the alm are a set of known coefficients, tabulated as in, e.g., Ref. 18. A general diffraction volume of icosahedral symmetry can be described by an expansion of the form
| (7) |
where the amplitudes g_l(q) denote the precise combinations of the different icosahedral harmonics needed to specify the 3D diffraction volume of each particle. The quantities g_l(q) depend only on l and q, precisely the parameters specifying the quantities determinable by experiment.
Indeed, in terms of the quantities g_l(q), it is easy to see that
$$ B_l(q,q') = g_l(q)\, g_l(q') \tag{8} $$
Consequently, the magnitudes of the coefficients g_l(q) may be determined from the shell-diagonal parts of B_l via
$$ |g_l(q)| = \sqrt{B_l(q,q)} \tag{9} $$
Although icosahedral harmonics are known to be real and consequently the expansion coefficients may be chosen to be real to represent a real diffraction volume, there is still an uncertainty of sign. Fortunately, it is possible to determine these signs from other quantities determinable from the experimental data, namely, the triple angular correlations defined by11
| (10) |
It should be noted that these two-point triple correlations are functions of products of intensities at only two pixels, so that averaging over perhaps a million diffraction patterns is likely to give converged values even for XFEL diffraction patterns of typical proteins. The current application is to an entire virus, which is perhaps two orders of magnitude larger than a typical protein, so the problem of weak scattered intensities probably does not arise. It can be shown16 that the triple correlation can be written as
| (11) |
In this case,
| (12) |
where
| (13) |
Here, is a coefficient defined by
| (14) |
It should be noted that Tl may be extracted from the measurable quantities C3 as before, using the orthogonality property of Legendre functions. We may write
$$ I_{lm}(q) = g_l(q)\, a_{lm} \tag{15} $$
where g_l(q) is a real function of q, and a_lm is real. In Eq. (5), I_lm(q) is the expansion coefficient with respect to complex spherical harmonics. This differs from most previous works, which expand the intensity in terms of real spherical harmonics.10 In Eq. (15), a_lm is related to the tabulated values of Ref. 18 by
| (16) |
The z axis has been chosen to be along a five-fold axis of the icosahedron, and the zx plane chosen along a mirror plane cutting the pentagon in the xy plane between two vertices. Such a choice ensures that all quantities in Eq. (15) are real. Note that the real square root in Eq. (9) exists because B_l(q, q) is real and non-negative by the definition in Eq. (3). In a previous work,10 we determined the signs of g_l(q) by the positivity condition on the intensity. Here, we make use of the triple correlations, which in general provide a more stringent condition on the signs. Because of Friedel's rule,
$$ I(\mathbf{q}) = I(-\mathbf{q}) \tag{17} $$
l must be even. The simple form in Eq. (15) is valid only for l < 30, and a_lm is non-zero only for selected values of l.18 The allowed even values of l for non-degenerate icosahedral harmonics are 0, 6, 10, 12, 16, 18, 20, 22, 24, 26, and 28. For a given shell, specified by a value of q, the number of possible combinations of the signs of g_l is 2^N, where N is the number of allowed values of l. We can determine these signs by an exhaustive search over all possible combinations of signs in order to fit the T_l of Eq. (12). By means of Eq. (8),
$$ g_l(q) = \frac{B_l(q,q_{\mathrm{ref}})}{g_l(q_{\mathrm{ref}})} \tag{18} $$
This relation can be used to propagate the signs from the reference shell to other shells, characterized by a general q. However, numerical instability may occur if g_l(qref) in Eq. (18) is close to zero. This instability can be overcome by using several different shells as reference shells. For each l, we go through all shells and pick out the one with the largest magnitude of g_l(q). That shell is then used as the qref for that particular l. To demonstrate this method, simulated experimental data for B and T were obtained directly from the Protein Data Bank (PDB) file of the icosahedral virus Paramecium bursaria Chlorella virus (PBCV-1). The sign combinations which give the best fit between the theoretical (Eq. (12)) and experimental (Eq. (11)) shell-diagonal components of T for all the reference shells were found. The signs can then be propagated to other shells from the chosen reference shells by means of the non-shell-diagonal B's. With all the signs determined, the I_lm(q), and hence the diffraction volume, can be calculated. The charge-flipping method12,13 was used for phasing the diffraction volume to give the charge density of PBCV-1 shown in Fig. 1. Since lmax ≈ qmax R, where R = 1000 Å is the radius of the virus, and lmax = 28 for PBCV-1, we have qmax = 28/1000 = 0.028 Å−1. Therefore, this method is restricted to a resolution of 2π/qmax, or about 220 Å. Fig. 1 was calculated for this resolution. As expected, it is icosahedrally symmetric.
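The reference-shell search and the sign propagation described above can be sketched as follows. This is a toy illustration under stated assumptions, not the calculation used for PBCV-1: the function `toy_triple` and its random `coupling` array are hypothetical stand-ins for the real triple-correlation expression, which involves known angular coupling coefficients, and the toy ignores how q enters that expression. Only the propagation step follows directly from the product form of the pair correlations.

```python
import itertools
import math
import random


def toy_triple(g, coupling):
    """Toy shell-diagonal triple correlation: cubic in the coefficients g."""
    n = len(g)
    return [g[k] * sum(coupling[k][i][j] * g[i] * g[j]
                       for i in range(n) for j in range(n))
            for k in range(n)]


def exhaustive_signs(magnitudes, t_measured, coupling):
    """Test all 2^N sign patterns; return the one with the smallest misfit."""
    best, best_misfit = None, math.inf
    for signs in itertools.product((1, -1), repeat=len(magnitudes)):
        g = [s * m for s, m in zip(signs, magnitudes)]
        misfit = sum((a - b) ** 2
                     for a, b in zip(toy_triple(g, coupling), t_measured))
        if misfit < best_misfit:
            best, best_misfit = list(signs), misfit
    return best


def reference_shell(b_l):
    """Shell with the largest diagonal B_l(q, q), i.e. the largest |g_l(q)|."""
    return max(range(len(b_l)), key=lambda q: b_l[q][q])


def propagate(b_l, q_ref, g_ref):
    """g_l(q) = B_l(q, q_ref) / g_l(q_ref) for every shell q."""
    return [b_l[q][q_ref] / g_ref for q in range(len(b_l))]


# Toy "true" coefficients g_true[l][q] for 3 values of l on 4 shells.
g_true = [[0.9, -0.2, 0.05, 0.4],
          [-0.1, 0.7, -1.2, 0.3],
          [0.02, -0.6, 0.5, 1.1]]
# Pair correlations B_l(q, q') = g_l(q) g_l(q'), the only "pair" data used.
b = [[[gq * gp for gp in row] for gq in row] for row in g_true]

rng = random.Random(0)
coupling = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
            for _ in range(3)]

q_ref = [reference_shell(bl) for bl in b]
mag_ref = [math.sqrt(bl[q][q]) for bl, q in zip(b, q_ref)]

# Simulated triple-correlation "measurement" on the reference shells.
t_measured = toy_triple([g_true[l][q_ref[l]] for l in range(3)], coupling)
signs = exhaustive_signs(mag_ref, t_measured, coupling)

# Propagate the signed reference values to all shells.
g_found = [propagate(b[l], q_ref[l], signs[l] * mag_ref[l]) for l in range(3)]
```

Choosing the reference shell with the largest diagonal element keeps the divisor in the propagation step well away from zero, which is the numerical concern raised above.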
FIG. 1.
Charge density of the Chlorella virus PBCV-1, obtained by applying the charge-flipping algorithm to the diffraction volume determined, as described in the text, from quantities extractable from angular correlations of the intensities of XFEL diffraction patterns.
The degeneracy of icosahedral harmonics for l between 30 and 44 is only two-fold.19 Between l = 30 and l = 44, therefore, one may write
| (19) |
where the matrix O and its transpose O^T are single-parameter (2 × 2) orthogonal matrices that cannot be determined from B_l alone, since O O^T = I, where I is the identity matrix. However, they can be determined by an optimization algorithm from the triple correlations T_l, where they appear in triplicate and therefore do not cancel. Further work is in progress on this idea. Although achieving even higher resolution would require the use of many-fold degenerate icosahedral harmonics, there does not seem to be any reason, in principle, why these cannot also be found by such techniques, because all the q cross-diagonal T_l's can be determined from experimental data. To put it another way, although there are extra parameters associated with the extra quantum number n labeling the degenerate icosahedral harmonics, we can also exploit the extra information in the q off-diagonal triple correlations, which we do not exploit currently. In other words, we are currently using only a fraction of the data available to us by working only with the q-diagonal values of the T_l quantities.
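The reason the pair correlations cannot fix the orthogonal matrix, while a cubic quantity can, is illustrated by the toy sketch below. It assumes, for illustration only, that each shell carries a pair of real coefficients for the two degenerate harmonics and that the pair correlation is their inner product; the precise form of Eq. (19) is not reproduced, and `toy_cubic` is a hypothetical stand-in for a triple correlation.

```python
import math


def rotate_pairs(coeffs, angle):
    """Apply the same 2 x 2 rotation O(angle) to the pair (g1, g2) on each shell."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * g1 - s * g2, s * g1 + c * g2) for g1, g2 in coeffs]


def pair_correlation(coeffs):
    """B(q, q') = g1(q) g1(q') + g2(q) g2(q'): quadratic, so O cancels."""
    return [[a1 * b1 + a2 * b2 for (b1, b2) in coeffs] for (a1, a2) in coeffs]


def toy_cubic(coeffs):
    """A cubic quantity (stand-in for a triple correlation): O does not cancel."""
    return sum(g1 ** 3 + g2 ** 3 for g1, g2 in coeffs)


shells = [(1.0, 0.5), (0.3, -0.8), (-0.6, 0.2)]   # toy values on three shells
rotated = rotate_pairs(shells, 0.7)

b_original = pair_correlation(shells)
b_rotated = pair_correlation(rotated)
cubic_change = toy_cubic(rotated) - toy_cubic(shells)
```

The rotated and unrotated coefficients give the same pair correlations but different cubic sums, which is the sense in which third-order data carry the missing parameter.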
IV. HELICAL VIRUS
By the theory of Cochran, Crick, and Vand (CCV), a helix with u subunits or proteins per period should have an azimuthal symmetry that is an integral multiple of u-fold.20,21 For the tobacco mosaic virus (TMV), with a radius of 100 Å, u = 49. For a resolution of 12 Å, lmax = 48. Since |m| ≤ l ≤ 48 < 49, the only multiple of u available is m = 0; that is, only zero-fold (full azimuthal) symmetry is possible below this resolution. Consequently, only the m = 0 term exists in Eq. (3), which reduces to
$$ B_l(q,q') = I_{l0}(q)\, I_{l0}(q') \tag{20} $$
From Eq. (20), we have
$$ |I_{l0}(q)| = \sqrt{B_l(q,q)} \tag{21} $$
which has exactly the same form as in the icosahedral case except now the angular momentum goes up to 48, and we have a much larger number of possible sign combinations. Again, we make use of triple correlation to determine the signs. The theoretical expression for the triple correlations may be written
| (22) |
To demonstrate this method, the simulated experimental data for B and T of TMV were obtained directly from CCV theory. In this case, since the number of sign combinations is too large for an exhaustive search, we use the simulated annealing (SA) method instead to minimize the difference between the theoretical and experimental triple correlations on a reference shell. In the case of a helical virus, it turns out that the magnitude of I_l0(q) does not decay rapidly as a function of l. Therefore, any shell can be chosen as the reference shell without causing numerical instability. The SA algorithm was started at a sufficiently high temperature, giving a high acceptance rate (about 97%), with a random sign combination at the reference shell; the temperature was then slowly decreased with a cooling rate of 0.9. At each temperature T, the signs were flipped one at a time, and the new configuration was always accepted if the energy decreased, i.e., if the energy change ΔE was negative. In the present case, ΔE is defined as the change in the R factor between the theoretical and experimental (q-diagonal) triple correlations on the reference shell. On the other hand, if the energy increased, i.e., if ΔE was positive, the configuration was accepted with a probability given by the Metropolis criterion.22 At each temperature, 500 initial sweeps were performed on all signs to thermalize the system. Each sweep consists of successive flips over all the signs. Then 500 additional sweeps were performed and the R factors sampled and averaged. The temperature was lowered, and the same procedure was repeated until the acceptance rate was very low or the average R factor stayed constant for 4 successive temperatures.
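A simplified sketch of such an annealing loop is given below. It is not the implementation used for TMV: the triple-correlation model `toy_triple` and its random `coupling` array are hypothetical, the sweep counts are far smaller than the 500 + 500 quoted above, and a fixed number of temperature steps replaces the stopping criterion described in the text. The single-sign flips, the Metropolis acceptance rule, and the cooling rate of 0.9 follow the description.

```python
import math
import random


def toy_triple(signs, magnitudes, coupling):
    """Toy q-diagonal triple correlation, cubic in the signed coefficients."""
    g = [s * m for s, m in zip(signs, magnitudes)]
    n = len(g)
    return [g[k] * sum(coupling[k][i][j] * g[i] * g[j]
                       for i in range(n) for j in range(n))
            for k in range(n)]


def r_factor(signs, magnitudes, t_measured, coupling):
    """R factor between model and 'measured' triple correlations."""
    model = toy_triple(signs, magnitudes, coupling)
    return (sum(abs(a - b) for a, b in zip(model, t_measured))
            / sum(abs(b) for b in t_measured))


def anneal_signs(start, magnitudes, t_measured, coupling, temperature=1.0,
                 cooling=0.9, n_temperatures=30, sweeps=20, seed=0):
    """Metropolis annealing over sign patterns; returns the best pattern seen."""
    rng = random.Random(seed)
    signs = list(start)
    energy = r_factor(signs, magnitudes, t_measured, coupling)
    best_signs, best_energy = list(signs), energy
    for _ in range(n_temperatures):
        for _ in range(sweeps):
            for k in range(len(signs)):          # one sweep flips each sign once
                signs[k] = -signs[k]
                trial = r_factor(signs, magnitudes, t_measured, coupling)
                delta = trial - energy
                if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                    energy = trial               # accept the flip
                    if energy < best_energy:
                        best_signs, best_energy = list(signs), energy
                else:
                    signs[k] = -signs[k]         # reject: undo the flip
        temperature *= cooling
    return best_signs, best_energy


# Toy problem with six coefficients and a known sign pattern.
data_rng = random.Random(1)
magnitudes = [data_rng.uniform(0.5, 1.5) for _ in range(6)]
coupling = [[[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
            for _ in range(6)]
true_signs = [1, -1, -1, 1, -1, 1]
t_measured = toy_triple(true_signs, magnitudes, coupling)

start = [1] * 6
best_signs, best_energy = anneal_signs(start, magnitudes, t_measured, coupling)
```

Keeping track of the best pattern visited is a convenience added in this sketch; the procedure described above instead monitors the acceptance rate and the averaged R factor to decide when to stop.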
The final optimized combination of signs was used to find the I_l0's at the reference shell. Again, after the optimum sign combination was found from the triple correlations on a reference shell, the signs were propagated to other shells by means of the non-diagonal pair correlation matrix in Eq. (20). Having thus obtained the signs of the coefficients in addition to their magnitudes, the diffraction volume was calculated by
$$ I(\mathbf{q}) = \sum_{l} I_{l0}(q)\, Y_{l0}(\hat{\mathbf{q}}) \tag{23} $$
The comparison between this recovered intensity and the original CCV intensity along the layer lines is shown in Fig. 2. The recovered diffraction volume was found to consist of discrete layer planes perpendicular to the helical axis (q_z axis) at positions q_z = 2πλ/c, where λ = 0, 1, 2, etc., and c is the period, as expected from fiber diffraction. As the recovered intensity along the layer lines is the same as that in a conventional fiber diffraction experiment, the standard fiber diffraction methods for phasing fiber diffraction intensities can be applied to obtain the structure of TMV.14,15
FIG. 2.
Fiber diffraction intensity versus radial wave vector. The line is from the current theory; the dots are from CCV theory. λ is the layer-line index, with the wave vector component along the axis given by q_z = 2πλ/c, where c is the period of the helix.
Finally, the real-space image in Fig. 3 was reconstructed from the diffraction volume of a single c-repeat unit, using the same charge-flipping algorithm that we used for the icosahedral virus, and the image was then repeated five times along the helical axis to produce a structure of five c-repeat units. It should be pointed out that methods of reconstructing a real-space structure from layer-line intensities have been perfected over the years in the field of fiber diffraction. Therefore, it is sufficient for us to reconstruct the correct layer-line intensities by this method to enable “fiber diffraction without fibers.”20
FIG. 3.
Real space image reconstructed from the quantities B and T capable of being recovered from XFEL diffraction patterns of randomly oriented particles in SO(3) by assuming m = 0 for the layer line intensities. The reconstructed image of a single unit cell was then repeated five times along the c-axis to produce this image. Reprinted with permission from H.-C. Poon et al., Phys. Rev. Lett. 110, 265505 (2013). Copyright 2013 American Physical Society.
V. CONCLUSION
We have shown that for particles with either icosahedral or helical symmetry, the symmetries or near-symmetries associated with almost all regular viruses,6 the spherical harmonic components of the intensity can be obtained directly, up to a sign, from the pair correlations B. To resolve the sign ambiguity, the triple correlations of the experimental intensities are exploited. For the helical TMV, we demonstrated how the fiber diffraction intensity can be recovered without the need to align the randomly oriented fibers by experimental means. In both cases, a reconstructed real-space image was obtained by an iterative phasing algorithm applied to the recovered diffraction volume.
ACKNOWLEDGMENTS
We acknowledge support for this work from a National Science Foundation Science and Technology Center (NSF Grant No. 1231306).
References
- 1. Ekeberg T. et al., Phys. Rev. Lett. 114, 098102 (2015). DOI: 10.1103/PhysRevLett.114.098102
- 2. Neutze R. et al., Nature 406, 752 (2000). DOI: 10.1038/35021099
- 3. Chapman H. N. et al., Nat. Phys. 2, 839 (2006). DOI: 10.1038/nphys461
- 4. Chapman H. N. et al., Nature 470, 73 (2011). DOI: 10.1038/nature09750
- 5. Arnlund D. et al., Nat. Methods 11, 923 (2014). DOI: 10.1038/nmeth.3067
- 6. Caspar D. L. D. and Klug A., Cold Spring Harbor Symp. Quant. Biol. 27, 1 (1962). DOI: 10.1101/SQB.1962.027.001.005
- 7. Cherrier M. V. et al., Proc. Natl. Acad. Sci. USA 106, 11085 (2009). DOI: 10.1073/pnas.0904716106
- 8. Donatelli J. J., Zwart P. H., and Sethian J. A., private communication (2015).
- 9. Marchesini S. et al., Phys. Rev. B 68, 140101 (2003). DOI: 10.1103/PhysRevB.68.140101
- 10. Saldin D. K., Schwander P., Poon H. C., Uddin M., and Schmidt M., Opt. Express 19, 17318 (2011). DOI: 10.1364/OE.19.017318
- 11. Kam Z., J. Theor. Biol. 82, 15 (1980). DOI: 10.1016/0022-5193(80)90088-0
- 12. Oszlányi G. and Süto A., Acta Crystallogr., A 60, 134 (2004). DOI: 10.1107/S0108767303027569
- 13. Oszlányi G. and Süto A., Acta Crystallogr., A 61, 147 (2005). DOI: 10.1107/S0108767304027746
- 14. Namba K. and Stubbs G., Science 231, 1401 (1986). DOI: 10.1126/science.3952490
- 15. Pattanayek R. and Stubbs G., J. Mol. Biol. 228, 516 (1992). DOI: 10.1016/0022-2836(92)90839-C
- 16. Kam Z., Macromolecules 10, 927 (1977). DOI: 10.1021/ma60059a009
- 17. Saldin D. K., Shneerson V. L., Fung R., and Ourmazd A., J. Phys. Condens. Matter 21, 134014 (2009). DOI: 10.1088/0953-8984/21/13/134014
- 18. Jack A. and Harrison S. C., J. Mol. Biol. 99, 15 (1975). DOI: 10.1016/S0022-2836(75)80155-0
- 19. Zheng Y. and Doerschuk P., SIAM J. Math. Anal. 32, 538 (2000). DOI: 10.1137/S0036141098341770
- 20. Poon H. C., Schwander P., Uddin M., and Saldin D. K., Phys. Rev. Lett. 110, 265505 (2013). DOI: 10.1103/PhysRevLett.110.265505
- 21. Cochran W., Crick F. H., and Vand V., Acta Crystallogr. 5, 581 (1952). DOI: 10.1107/S0365110X52001635
- 22. Metropolis N., Rosenbluth A. W., Rosenbluth M. N., Teller A. H., and Teller E., J. Chem. Phys. 21, 1087 (1953). DOI: 10.1063/1.1699114



