Significance
HIV-1 particle assembly is driven by the viral Gag protein, which oligomerizes into a hexameric array on the inner surface of the viral envelope, forming a truncated spherical lattice containing large and small gaps. Gag is then cut by the viral protease, disassembles, and rearranges to form the mature, infectious virus. Here, we present structures and molecular dynamics simulations of the edges of the immature Gag lattice. Our analysis shows that Gag dimers are the basic assembly unit of the HIV-1 particle, lattice edges are partial hexamers, and partial hexamers are prone to structural changes allowing protease to cut Gag. These findings provide insights into assembly of the immature virus, its structure, and how it disassembles during maturation.
Keywords: HIV-1, cryo-electron tomography, molecular dynamics simulations, virus assembly, maturation
Abstract
The CA (capsid) domain of immature HIV-1 Gag and the adjacent spacer peptide 1 (SP1) play a key role in viral assembly by forming a lattice of CA hexamers, which adapts to viral envelope curvature by incorporating small lattice defects and a large gap at the site of budding. This lattice is stabilized by intrahexameric and interhexameric CA-CA interactions, which are important in regulating viral assembly and maturation. We applied subtomogram averaging and classification to determine the oligomerization state of CA at lattice edges and found that CA forms partial hexamers. These structures reveal the network of interactions formed by CA-SP1 at the lattice edge. We also performed atomistic molecular dynamics simulations of CA-CA interactions stabilizing the immature lattice and partial CA-SP1 helical bundles. Free energy calculations reveal increased propensity for helix-to-coil transitions in partial hexamers compared to complete six-helix bundles. Taken together, these results suggest that the CA dimer is the basic unit of lattice assembly, partial hexamers exist at lattice edges, these are in a helix-coil dynamic equilibrium, and partial helical bundles are more likely to unfold, representing potential sites for HIV-1 maturation initiation.
The polyprotein Gag is the main structural component of HIV-1, consisting of the MA (matrix), CA (capsid), NC (nucleocapsid), and p6 domains as well as the spacer peptides SP1 and SP2 (1). Gag is produced in infected host cells and trafficked to the plasma membrane, where it assembles into a hexagonal lattice via its CA domain and recruits other viral proteins and the viral RNA genome (1, 2). Assembly of the curved Gag lattice is commensurate with membrane bending at the site of assembly, after which recruitment of Endosomal Sorting Complex Required for Transport III (ESCRT-III) components by the p6 domain of Gag induces membrane scission and release of the immature virus particle (2). The hexagonal Gag lattice accommodates curvature in the growing bud by incorporating vacancy defects (3). The activity of ESCRT-III is timed such that the final immature lattice is incomplete, giving rise to an additional large gap in the lattice, resulting in a truncated spherical shape (4–6).
During or after budding, the viral protease is activated and cleaves this immature Gag lattice into its component domains, which leads to structural rearrangement within the virus particle (2). The released CA domains assemble to form a closed, conical capsid around the condensed ribonucleoprotein (RNP) complex of the mature virus (1, 7). Maturation is required for the virion to become infectious (1).
Within the immature virus particle, the N-terminal domain of CA (CANTD) forms trimeric interactions linking three Gag hexamers while the C-terminal domain of CA (CACTD) forms dimeric interactions mediated by helix 9 of CA, linking two Gag hexamers together (8). The CACTD additionally forms intrahexamer interactions around the sixfold axis of the hexamer (8, 9). Amphipathic helices formed by the C-terminal residues of CACTD and the N-terminal residues of SP1 junction assemble into a six-helix bundle (6HB), thereby imposing hexagonal order on the CA domains, via classical knobs-in-holes packing mediated by exposed hydrophobic side chains, as also seen in coiled coils (9, 10). In combination, these relatively weak interactions give rise to a very dynamic, reversible assembly process that prevents the assembling lattice from becoming trapped in kinetically unfavorable states (11). This robust assembly behavior is consistent with icosahedral viruses (12–15). It is not surprising, therefore, that the energetics of Gag assembly are tightly controlled and highly dependent on scaffolding effects from the viral RNA and the membrane-interacting MA domain of Gag in order to ensure productive viral assembly (11, 16). Analysis of the diffusion pattern of fluorescently labeled Gag supports the notion that Gag is trafficked to the site of assembly as low-order multimers, although it is still unclear whether these are Gag dimers, trimers, or other multimeric forms of Gag (16, 17).
The primary assembly unit of the Gag lattice remains largely unknown. We can identify two hypothetical ways in which the lattice could assemble. First, the lattice could grow by addition of Gag hexamers (or sets of six component monomers), such that the CA-SP1 junction is assembled within a hexameric 6HB at all positions in the lattice. In this case, interfaces between hexamers would be unoccupied at the edge of the lattice. From a purely energetic perspective, this appears most reasonable. Second, the lattice could form via addition of Gag dimers or Gag trimers (or equivalently from sets of either two or three component monomers). This would maintain, for example, the dimeric CA-CA interhexamer interactions but leave incomplete hexamers at the lattice edges, including unoccupied hexamer-forming interfaces along the CA-SP1 bundle. It additionally remains unclear whether the unoccupied Gag-Gag interfaces at the lattice edges are simply exposed, or whether they are stabilized by alternative conformations of individual domains or proteins, or by other binding partners. Understanding the structure of the edge of the immature Gag lattice therefore has implications for understanding the mechanism of virus assembly.
Viral assembly, budding, and maturation are tightly linked, and disrupting the kinetics of any of these processes can give rise to defects in maturation and formation of noninfectious viral particles (1, 18, 19). The rate-limiting proteolytic cleavage site in the maturation process resides within the CA-SP1 6HB (20). Unfolding of the helical bundle is required to allow proteolytic cleavage to proceed (21–23), but the exact mechanism for protease access to this site is not known. The spatial localization of proteolytic processing within the context of the immature Gag lattice is relevant: Does the protease act on Gag within the lattice, or does it act on the edges of the Gag lattice, causing a cascade of lattice disruption? At the lattice edge, is the substrate for the protease with a 6HB or within an incomplete hexamer? Understanding the structure of the edge of the immature Gag lattice therefore has implications for understanding the mechanism of virus maturation.
High-resolution immature Gag structures have previously been determined directly from purified viruses by cryo-electron tomography (cryo-ET) and subtomogram averaging (10). These structures represent an average hexamer within the immature lattice, with a full complement of six Gag hexamer neighbors. Here, we have applied subtomogram classification and averaging approaches to an existing immature virus dataset (10) in order to determine the structures of Gag assemblies at lattice edges. We also applied atomistic molecular dynamics simulations to assess the roles of the different CA-CA interactions in immature lattice stabilization and predict the properties of the structures we observe at lattice edges. Together, our results suggest that the basic unit of immature HIV-1 assembly is a Gag dimer and partial CA-SP1 helical bundles are present at the edges of the assembled lattice and may be substrates for initiation of maturation.
Results
Cryo-ET to Reveal the Structure of Gag at the Lattice Edge.
As a starting point for analysis of the edges of the immature lattice, we took the cryo-ET dataset from which Schur et al. (10) previously determined a 4.2-Å map of immature HIV-1 CA and CA-SP1 directly from purified viruses (Electron Microscopy Data Bank [EMDB] accession no. EMD-4017) (Fig. 1A). The data were partially reprocessed to ensure that as much of the Gag layer was retained in the dataset as possible (Fig. 1B). The coordinates of complete or partial Gag hexamers were computationally analyzed to identify those in the vicinity of lattice edges (Fig. 1B), which were subsequently used as input for further image classification.
Image classification of subtomograms aims to sort them based upon differences in macromolecular structure but is complicated by noise in the data and the missing wedge problem (24–26): missing information in Fourier space due to physical limitations on the angular range across which a sample can be tilted in the electron microscope. Computational methods are required to compensate for this missing information (25). We employed two different classification approaches to achieve good separation of structural classes and to validate our results. These approaches were: 1) wedge-masked difference principal component analysis (WMD PCA) (26) and 2) multireference alignment and classification using synthetic references (27). These classification approaches are described in more detail in Materials and Methods and in SI Appendix, Fig. S1. Both classification approaches sorted the immature hexamers at the edge of the lattice, according to whether one, two, or three neighboring hexamers were missing. We did not identify hexamers lacking four or five neighboring hexamers, which could imply either that hexamer species lacking four or five neighbors do not exist at the edge of the lattice or these species exist but are excluded from the dataset because they do not align to a hexameric reference. Class membership differed between the two classification approaches when applied to the same input data, but this is not unusual for classification of noisy, missing wedge-affected data. Both approaches converged to similar structural classes (Fig. 2 and SI Appendix, Figs. S2 and S3). These structural classes illustrate the variety of structures present at the lattice edge.
When we analyzed the appearance of the CACTD region of hexamers for which one neighboring hexamer was missing, we found that they were missing one CACTD. Similarly, hexamers for which two or three neighboring hexamers were missing lacked two or three CACTDs (only four or three copies of CACTD were visible). We considered the hypothesis that the missing CACTDs are present but are highly mobile and, therefore, appear absent. We rejected this hypothesis because the CANTD and CA-SP1 regions, which might act as tethers for a flexible CACTD, are also absent (see below). Gag therefore appears to be behaving as a dimer—when one CACTD is absent, its dimeric partner is also absent. Note that hexamers lacking Gag subunits do not appear to relax into multimers with higher symmetry, e.g., hexamers missing one Gag subunit are not equivalent to Gag pentamers. These data imply that CA dimers are the basic assembly unit of the Gag lattice and the edge of the Gag lattice therefore consists of partial hexamers assembled from Gag dimers.
When one CACTD dimer is absent, the symmetry of the lattice means that one CANTD will be missing from each of two CANTD trimers. We observed that when one CACTD dimer is missing from a hexamer, the density for two CANTD trimers is missing. These data imply that when a CANTD trimer is lacking one member, the remaining two CANTD are no longer stabilized in their positions in the lattice and become mobile, hence they are not resolved (Fig. 2).
Molecular Dynamics of Lattice Edges.
We performed atomistic molecular dynamics simulations to assess the impact of the dimer and trimer interfaces on the structural stability of CA-SP1 hexamers. We characterized stability by assessing the root-mean-squared deviation (rmsd) from the atomic model (Protein Data Bank [PDB] ID code 5L93) at Cα resolution within each of the 12 α-helices (we denote the CA-SP1 junction as helix 12 or H12). A larger rmsd value qualitatively indicates a greater mean shift from the atomic model while a larger distribution of rmsd values suggests more structural variability; the median and interquartile range (IQR) of the rmsd are distinct, yet related, indicators of disorder. Hereafter, we will define a protein segment having an increasingly large median and IQR as being “more disordered.” As a baseline, the rmsd of a complete Gag hexamer (Fig. 3A) exhibited a median (IQR) of 3.3 (1.1) Å per helix.
We next considered an incomplete hexamer with two neighbors missing as observed in our cryo-EM dataset. In the clockwise-most CA-SP1 monomer, the CACTD is dimerized, while the CANTD has one, rather than two, of its trimer contacts (Fig. 3B). This CA-SP1 monomer has a median (IQR) of around 3.7 (1.2) Å per helix in the CACTD (helices 8–11), which suggests that the CACTD structure is similar to that of the complete CA-SP1 hexamer. The median (IQR) of the CANTD (helices 1–7), however, increases to around 5.4 (2.9) Å per helix, consistent with increasing disorder throughout the CANTD due to the absence of one trimer contact. Interestingly, removal of the second CANTD trimer binding partner (Fig. 3C) results in a further shift of the median (IQR) of the CANTD to around 17.9 (4.6) Å (most evident for H5 and H6) while the median (IQR) of the CACTD domain persists around 4.6 (1.2) Å per helix. Removal of the CACTD dimerization partner causes a significant increase in the median (IQR) to 17.5 (11.6) Å and 8.2 (2.4) Å per helix for both CANTD and CACTD, respectively (Fig. 3D).
Taken together, our observations show that the loss of CANTD trimer contacts induces more disorder throughout the CANTD while the loss of CACTD dimer contacts induces more disorder in both the CANTD and CACTD. These findings are consistent with the importance of the CACTD dimerization interface and, to a lesser extent, the CANTD trimerization interface, in stabilizing immature CA-SP1 hexamers. Moreover, our computational analysis supports our cryo-EM observations, which suggest that CA dimers act as the basic assembly unit of the Gag lattice.
Cryo-ET Suggests Structured CA-SP1 Regions in Incomplete Hexamers.
In our density maps of incomplete lattice edge hexamers, we observe that when one or more Gag subunits are missing from a hexamer, the CA-SP1 6HB density becomes slightly weaker but does not disappear from the maps (Figs. 2 and 4). The resolution of the maps is insufficient to characterize the bundle structure in hexamers missing Gag subunits, but the size and position of the density suggest that helical secondary structure is being maintained (Fig. 4)—once uncoiled, this region would not be expected to give rise to significant density. Additionally, the density for the loop between H10 and H11 in the CACTD was observed to be weaker when the neighboring CACTD in the hexamer was absent, suggesting increased flexibility in the part of the CACTD directly upstream of a partial helical bundle (Fig. 4).
The presence of density in the CA-SP1 6HB region suggests that it can still exist in a stable form even when fewer than six helices are present and that the bundle structure can adapt to loss of a helix without becoming completely disordered. This raises the question of how the bundle accommodates loss of up to half of its constituent helices while retaining some ordered packing, given that a crystal structure of this region in a full CA-SP1 6HB exhibits classical knobs-in-holes packing of the hydrophobic residues exposed along the amphipathic CA-SP1 helix (9).
Molecular Dynamics to Assess Helix-Coil Transitions in the 6HB.
Molecular dynamics simulations provided further insight into the structure and thermodynamics of the CA-SP1 6HB in both complete and incomplete hexamers. In all three incomplete hexamer cases studied above, we find that despite a median rmsd of around 8.0 Å for H12 (the CA-SP1 junction), the IQR remains low at 1.6 Å (Fig. 3 B–D and SI Appendix, Fig. S4). Within our simulated timescale (around 410 ns), H12 maintains an α-helical secondary structure but tends to be distorted with respect to the 6HB quaternary structure. The same H12 in complete hexamers, however, has a small median and IQR of around 4.3 and 1.6 Å, respectively, and maintains both its α-helical and quaternary structure. These two observations suggest that the loss of neighboring CA-SP1 monomers distorts the quaternary structure of the 6HB while the secondary structure is maintained. However, it is known from NMR spectroscopy experiments that the CA-SP1 region exists in a helix-coil equilibrium, even within complete hexamers (28). To assess the free energy of the helix-coil transition in both complete and incomplete hexamers, we performed Well-Tempered Metadynamics (WT-MetaD) simulations (see details in Materials and Methods).
We projected the free energy onto two coordinates. The first is the alpha-beta similarity (ABsim), which quantifies the number of phi and psi dihedral angles (ϕ) throughout H12 that are consistent with an α-helix; when the CA-SP1 junction is completely helical (or nonhelical), ABsim is 30 (or 0). The second is the first component (tIC) from time-structure independent component analysis (tICA), which is a dimensional reduction technique used to identify the slowest varying linear projections of data. By construction, the first tIC refers to the slowest collective mode as described by changes to ϕ. Formal definitions and details on both of these coordinates can be found in Materials and Methods.
We compare the two-dimensional projected free energy surfaces for a helix in a complete hexamer, a helix with two neighboring helices in an incomplete hexamer, and the clockwise-most helix (i.e., an exposed helix with one neighbor) in an incomplete hexamer in Fig. 5 A–C, respectively. The free energy surfaces for a helix with two adjacent helices in complete (Fig. 5A) and incomplete (Fig. 5B) hexamers appear to be qualitatively similar. The minimum free energy paths shown in Fig. 5 A and B indicate that the free energy barrier heights for the helix-to-coil transition are comparable (around 7 and 6 kcal/mol, respectively). The free energy surface for the clockwise-most helix in the incomplete hexamer (Fig. 5C), however, is notably different. In particular, the free energy surface exhibits lower barrier heights, such as the 3 kcal/mol helix-to-coil barrier seen in its minimum free energy path (Fig. 5C). We also note that this helix appears to undergo a helix-to-coil transition following a different structural route, as indicated by the positional differences of the minimum free energy path in configurational space with respect to the former two minimum free energy paths.
The two observed helix-to-coil unfolding routes are as follows. The first path is what we term the “swing-out” route, in which the helix begins unfolding below residue S368 and above residue R361 (residues V362-M367 have a propensity to stay helical) while remaining within the helical bundle, after which the helical segment escapes from the bundle and unfolds while solvated (depicted in Fig. 5D). The second path is what we term the “in situ unraveling” route, in which the helix processively unfolds from the bottom of the helix while contacts with the adjacent helix are maintained, after which the helix completely unfolds and detaches (depicted in Fig. 5E). Our simulations suggest that the primary unfolding pathway for helices with two neighbors (in both complete and incomplete hexamers) is the “swing-out” route while the primary unfolding pathway for the exposed helix in incomplete hexamers is the “in situ unraveling” route.
Inositol hexakisphosphate (IP6) is a small molecule that binds within the central pore of immature CA-SP1 hexamers and is known to be an assembly cofactor for the immature virus (29). To test the importance of IP6 on the helix-coil transition, we conducted our simulations both in the presence (all data described above) and absence of IP6 (SI Appendix, Fig. S5). Our simulations show that the absence of IP6 induces modest quantitative differences; the helix-to-coil transition free energy barriers are 3–4 kcal/mol for helices with two neighbors in complete and incomplete hexamers (compared to 6–7 kcal/mol computed in the presence of IP6). The exposed helix in the incomplete hexamer has a transition barrier of 1–2 kcal/mol without IP6 (compared to 3 kcal/mol with IP6). Hence, IP6 tends to reduce the propensity for helix-to-coil transitions, likely by stabilizing the helical bundle, but does not compensate for the increased helix-to-coil transitions expected in partial hexamers. Similarly, we find that helices in partial hexamers are more likely to explore both swing-out and in situ unraveling transition pathways. We conclude from our simulations that the helix-to-coil transition for the exposed helix in an incomplete 6HB is more amenable to unfolding than that of complete hexamers by virtue of an alternative unfolding pathway with a free energy barrier height that is reduced by up to 4 kcal/mol.
Discussion
Implications for Immature Virus Assembly.
Our subtomogram averaging analysis shows structures present at the discontinuous edges of the immature Gag lattice. These structures have been derived from viruses with an inactive protease purified 44 h posttransfection. They therefore most likely do not represent transient intermediates but stable end states. We observe that growth of the lattice ceases such that the lattice edge does not form at the boundary between Gag hexamers (Fig. 6A), but instead it forms at the boundaries between Gag dimers (Fig. 6 B and C). In other words, incomplete Gag hexamers exist at the lattice edge, while all Gag monomers we observe are dimerized. These observations suggest that Gag lattice growth proceeds by the addition of Gag dimers.
Our observations are consistent with the severe assembly phenotypes of mutations in the dimer interface such as WM184,185AA (30, 31). They are also consistent with the observation that constructs in which oligomerization-promoting NC is replaced with a dimerizing leucine zipper domain are competent for assembly (32).
Each Gag dimer contributes to two CANTD trimers and, in agreement with this, we observed loss of density for two CANTD trimers in our maps for each missing dimer. These results show that CANTD domains that are not fully trimerized are not packed in an ordered manner into the Gag lattice.
Our observations suggest that Gag dimers are the key assembly unit during immature virus assembly. The addition of Gag dimers to a growing lattice will continue until addition of further Gag dimers is unfavorable due to constraints of lattice geometry, or until the available Gag is depleted. This could occur such that growth typically arrests where each edge dimer has at least one monomer which is part of a complete hexamer. This would result in partial Gag hexamers at the lattice edges generally having one to three members (Fig. 6B). Alternatively, this could occur such that growth typically arrests where each edge dimer has both constituent monomers bound to a partial hexamer. This would result in partial hexamers with three to five members (Fig. 6C), and these are the classes which we observed. This observation suggests that the binding affinity of a dimer where both constituent monomers make interactions within partial bundles is higher than the binding affinity when only one monomer makes interactions, even if it completes a helical bundle. Furthermore, binding of a dimer such that both constituent monomers make interactions may be favored due to the avidity effect of having two compatible binding sites. This suggests the following hierarchy of association events during assembly: If a dimer binds via one monomer, it creates a site where a second dimer can bind with both monomers and the higher binding affinity makes it likely that this second binding site will be occupied (Fig. 6D). As a result, we observe that where a site exists on the growing lattice edge that can accommodate a Gag dimer such that both component monomers join adjacent partial hexamers, then a Gag dimer is generally bound to that site.
Our findings support coarse-grained molecular dynamics that previously simulated immature Gag lattice assembly as the addition of Gag dimers through 6HB interactions (11). These simulations observed the association of Gag dimers at edges that cyclically varied between the edge cases presented in Fig. 6 B and C. However, Gag dimers tended to favor the formation of trimer-of-dimers to maximize 6HB contacts, i.e., the state depicted in Fig. 6C, which further suggests that the association constant of the second binding site (k2) depicted in Fig. 6D is larger than that of the first binding site (k1). Partial hexamers at edges were predicted to form throughout the assembly until passivated by Gag addition and predicted to persist as lattice defects when kinetically trapped (11), consistent with our observations here.
Overall, our data suggest that immature HIV-1 assembly proceeds by recruitment of Gag dimers into the growing lattice via formation of intrahexameric interactions. The conservation of the arrangement of the CACTD structures among the retroviruses, contrasting with the variable arrangement of the CANTD (8, 33, 34), suggests to us that this assembly route is likely to be conserved.
Implications for Virus Maturation.
Unfolding of the CA-SP1 helical bundle appears to be the main structural determinant of the transition from the immature Gag lattice to a mature CA lattice (35). The 6HB formed by the CA-SP1 junction has been shown to exist in a helix-coil equilibrium, and this equilibrium is likely to limit access of the protease to the cleavage site within the bundle (28). Cleavage makes the transition irreversible. We observed that the CA-SP1 helical bundle was still somewhat ordered even when up to 3 Gag subunits were missing from a hexamer. As the resolution of our structures is insufficient to unambiguously resolve the structures of these partial 6HBs, we performed molecular dynamics simulations to assess their structural stability. Consistent with a previous study that found SP1 to prefer a helical conformation when locally concentrated by Gag oligomerization (36), our simulations show that partial helical bundles can remain ordered but that there is an increased probability of uncoiling of the CA-SP1 helix of the Gag molecule at the edge (from a top-down view) of the partial bundle. Coordination of the 6HB by IP6 seems to hinder uncoiling (by 2–4 kcal/mol) but is insufficient to completely inhibit uncoiling, especially in partial bundles. These observations suggest that CA-SP1 cleavage may initiate stochastically within partial hexamers at lattice edges, which would then cause local disassembly of the lattice as it undergoes structural maturation. This, in turn, would destabilize hexamers immediately adjacent to the maturation event by removing the interhexamer interactions that are involved in both dimer and trimer formation in the immature lattice, and these hexamers would then be more likely to undergo maturation compared to those in the middle of the lattice. This would proceed toward the middle of the lattice as a “wave” of maturation from one or more initiation sites at lattice edges, promoting lattice disassembly and proceeding to consume CA-SP1 inwards from that site.
Materials and Methods
Generation of the Cryo-EM Dataset.
A previously published cryo-ET dataset of purified, immature HIV-1 viral particles, which yielded a 4.2-Å structure of the immature hexamer (10) (EMDB accession no. EMD-4017), was used as a starting point for analysis of immature Gag lattice edges. This dataset consists of 74 tomograms containing 484 viruses previously used for structural determination, with an unbinned pixel size of 1.35 Å/pixel.
To ensure data completeness, we used roughly aligned subtomogram positions from an intermediate step in the processing of the dataset above (10), immediately prior to the exclusion of subtomograms based on cross-correlation value. These positions had been generated by three successive iterations of alignment and averaging of 8× binned subtomograms against an initial sixfold symmetric reference. As subtomogram extraction positions were oversampled for the initial angular search, duplicate subtomograms that had aligned onto the same positions were removed from the dataset by applying a pairwise distance criterion of four binned pixels (4.32 nm). The remaining positions were then visualized in UCSF Chimera using a custom plugin as described in Qu et al. (33), and misaligned subtomogram positions were removed by manual inspection. Misaligned positions were defined as those positions not conforming to the geometry of the hexagonal lattice, for example those that were substantially rotated out of the plane of the lattice. We did not exclude any positions based on cross-correlation coefficient (CCC), since this could result in partial hexamers being removed from the dataset if they correlated less strongly to the sixfold hexameric reference.
Selection and Preparation of Lattice Edge Positions.
The remaining 178,750 aligned subtomogram positions were then analyzed to identify the edges of the immature Gag lattice. A custom MATLAB script was used to identify every possible pattern of missing neighbors around each hexamer position in the lattice map of each virus in the dataset. Using this script, we identified 62,815 potential edge positions and oriented all of them to place the predicted gap in the lattice in a single direction. Noncontiguous gap classes were discarded. As the number of subtomograms missing four or five neighbors was very low, we retained only subtomograms with one, two, or three contiguous neighboring hexamers missing. This resulted in a dataset of subtomogram positions containing 57,134 points, which we pooled.
The coordinates of the oriented subtomogram positions for edge hexamers were scaled for use with 4× binned data. Subtomograms were extracted from 4× binned tomograms with a box edge size of 72 binned pixels, corresponding to 388.8 Å in each dimension. One iteration of angular refinement was then performed against the final 4× binned average previously generated by Schur et al (10). The alignment was performed using an 8 × 2° angular search range for all Euler angles, a 32.4-Å low-pass filter, C6 symmetry, and a mask around the central hexamer and all six neighboring positions. The resulting subtomogram positions were used as the starting point for image-based classification. The average of the aligned subtomograms was also generated for subsequent use in wedge-masked difference map and multireference alignment-based classification.
Classification by PCA of WMD.
We adapted wedge-masked difference map-based subtomogram classification (hereafter referred to as WMD PCA), originally described by Heumann, Hoenger, and Mastronarde (26) for use with subtomogram averaging scripts based on the TOM (37), AV3 (25), and Dynamo (38) packages.
To generate the missing wedge mask, 100 subtomograms were extracted from empty “noise” regions in each tomogram and normalized to a mean gray value of 0 with a variance of 1. Their amplitude spectra were calculated and averaged to generate a Fourier weight for that tomogram that describes missing information in Fourier space due to the missing wedge and the CTF—this is the wedge mask for that tomogram.
Each subtomogram was rotated into the reference frame according to the angles calculated during angular refinement. The same rotation was applied to the corresponding wedge volume. The subtomogram and normalized reference were Fourier transformed, low-pass filtered to 30 Å, multiplied by the wedge mask, and inverse Fourier transformed to generate the wedge-weighted volume. A real-space mask was then applied to the weighted volumes so that only the central hexamer and its six immediate neighbors were considered for difference map calculation. The gray values under the mask were again normalized and the weighted and masked subtomogram volume was subtracted from the weighted and masked average of all rotated subtomograms in order to generate a wedge-masked difference map.
The difference map voxels under the masked region of interest were then stored as an m × n matrix, where m is the number of voxels under the mask and n is the number of subtomograms in the dataset. Singular value decomposition was performed on the matrix of difference map voxels to decompose the voxel matrix D according to the relationship D = USVT. The first 30 left singular vectors of the matrix were obtained from the matrix U, reshaped to match the mask, and stored as the first 30 eigenvolumes of the dataset. SVT was stored as this provides the corresponding eigencoefficients for use in clustering the data.
Eigenvolumes were inspected manually to identify those corresponding to structural differences between the average structure and the subtomograms, rather than those describing residual differences due to the orientation of the subtomograms relative to the missing wedge. A subset of eigenvolumes was selected. The eigencoefficients corresponding to these selected eigenvolumes were used as input for k-means clustering in MATLAB with 10 replicates and k = 30. The subtomograms in the dataset were then grouped into classes based on these clusters, and the average of each class was generated. Class averages were inspected visually, and classes containing one, two, or three as well as no missing hexameric neighbors around the central hexamer were identified, with some classes rotated in-plane by one hexamer position (i.e., 60°). The in-plane rotation angle of the subtomogram positions in these rotated classes was adjusted in order to match the configurations seen in the other classes. Multiple classes were identified as missing one, two, and three neighboring hexamers, and these classes were pooled into larger, single classes with one, two, and three missing neighboring hexamers before generation of the class averages (SI Appendix, Fig. S2C). Image processing statistics are summarized in SI Appendix, Table S1.
Classification by Multireference Alignment.
Subtomograms were divided into equal-sized subsets according to odd and even particle number and averaged in order to produce two starting averages, each with half of the data. The odd and even half-references were then multiplied in MATLAB by masks constructed in order to down-weight one, two, or three hexamers around the central hexamer (SI Appendix, Fig. S3). The original, nonmultiplied references were also included to allow identification of complete hexamers with a full complement of six neighboring hexamers. Subtomograms were then aligned against all of these artificial references for six iterations, with a low-pass filter of 29.9 Å, a real-space mask passing the central hexamer as well as three of its contiguous neighbors in the gap direction, and a restricted in-plane angular search allowing only rotation of 60° in each direction. The reference to which each subtomogram aligned with the highest CCC was used to assign the class of that subtomogram. Simulated annealing was used for stochastic sampling in order to allow subtomograms to escape from local minima between alignment iterations, using a scaling factor of 0.3 with the approach described by Hrabe et al (27). Class membership had converged onto stable classes by the sixth iteration, and the results of this alignment iteration were used for subsequent structure generation and analysis.
Generation of Class Averages.
The pooled classes from WMD PCA classification, as well as the final classes from multireference classification, were then unbinned to regenerate the class averages from 2× binned subtomograms as the final structures.
Using the subset of data (39% of the total) that was assigned to the same class by both WMD-PCA and multireference alignment, we obtained similar structures to those obtained using the individual methods, although the resolution was lower due to the reduced amount of data.
Molecular Dynamics Simulations and Analysis.
The initial all-atom protein configuration was adopted from an atomic model (PDB ID code 5L93); systems with missing monomers were initialized by deleting relevant monomers. Myo-IP6 (fully deprotonated) was randomly positioned between the two rings of lysine (K290 and K359) in the central pore region and each of the six incomplete pore regions along the exterior protein interface. All proteins were solvated by water and 150 mM NaCl in a rhombic dodecahedron simulation domain large enough to contain a 1.5-nm layer of water perpendicular to each exterior protein interface. Energy minimization was performed using steepest descent until the maximum force was less than 1,000 kJ mol−1 nm−1. Equilibration was performed with harmonic restraints (using a 1,000 kJ mol−1 nm−2 spring constant) on each heavy atom throughout the protein for 10 ns in the constant NVT ensemble using stochastic velocity rescaling (39) at 310 K and a damping time of 0.1 ps. Restraints were then removed and each system was allowed to equilibrate for 100 ns in the constant NPT ensemble using a Nosé–Hoover thermostat (40) (2-ps damping time) and a Parrinello–Rahman barostat (41) (10-ps damping time) at 310 K and 1 bar.
Two types of production runs were performed. Simulations to characterize the flexibility in protein structure were performed over 300 ns with configurations saved every 20 ps. Rigid structural alignment of the protein complex in each frame was performed with the CTD domains of the interior CA-SP1 hexamer in reference to the atomic model. The atomic model was also used as reference to compute rmsd values after alignment.
Simulations to characterize free energies were performed over 2,490 ns and used the following two coordinates. The first is the alpha-beta similarity (ABsim), which is given by:
[1] |
where i denotes an index over the considered dihedrals and residues, ϕi is the dihedral angle, and ϕiref is the reference dihedral angle. Here, we include the phi and psi angles of the CA-SP1 junction (Gag residues 356–370) and set ϕiref to −60°, such that when the junction is completely α-helical, ABsim is 30.
The second are components identified from tICA (42), which is a technique used to identify linear projections of data (i.e., linear combinations of features) that capture the slowest varying motions by maximizing the autocorrelation function; here, we use cos(ϕi) as our feature set and a lag time (τ) of 35 ns to construct a covariance matrix:
[2] |
Each tIC is each eigenvector identified by solving the generalized eigenvalue problem. Here, we consider the first tIC (i.e., the slowest mode or the eigenvector corresponding to the largest eigenvalue) as our second free energy coordinate.
The WT-MetaD algorithm (43) was used with a Gaussian bias (using a height and width of 0.5 kJ/mol and 0.1) deposited every 1 ps along the ABsim coordinate with a bias factor of 25 kBT. The resultant free energy surfaces were projected onto different coordinates using the Tiwary–Parrinello reweighting algorithm (44). The MSMBuilder python library (45) was used to perform tICA on a separate molecular dynamics trajectory of a peptide representing the CA-SP1 junction. The second tIC was also considered as an additional coordinate for dimensional reduction of the underlying free energy surface (SI Appendix, Fig. S6) and shows quantitatively similar minimum free energy paths.
All simulations were prepared and simulated with GROMACS 2016 (46) using the CHARMM36m forcefield (47). A timestep of 2 fs was used in all simulations with hydrogen-containing bonds constrained using the LINCS algorithm (48). Metadynamics simulations were performed using the PLUMED 2.4 plugin (49). Table 1 summarizes relevant statistics for each simulated system.
Table 1.
System | Size, no. of atoms | Trajectory length, ns | Replicas |
18-mer CA-SP1 + 7 IP6 | 552,932 | 410 | 1 |
12-mer CA-SP1 + 7 IP6 | 499,154 | 410 | 1 |
11-mer CA-SP1 + 7 IP6 | 498,519 | 410 | 1 |
10-mer CA-SP1 + 7 IP6 | 498,184 | 410 | 1 |
6-mer CA-SP1 + IP6 | 248,655 | 2,600 | 4 |
MetaD on interior helix | |||
4-mer CA-SP1 + IP6 | 241,709 | 2,600 | 4 |
MetaD on interior helix | |||
4-mer CA-SP1 + IP6 | 241,709 | 2,600 | 4 |
MetaD on exposed helix |
Model Fitting and Lattice Maps.
An atomic model derived from a high-resolution cryo-EM structure of the immature Gag CA domain hexamer (PDB ID code 5L93) was used for all model fitting into the cryo-EM density maps (10). The coordinates for the CANTD and CACTD from this model were fit as rigid bodies into each corresponding position in the density maps using UCSF Chimera (50). This was done for each of the density maps containing hexamers with one, two, or three missing CA protomers from the central hexamer position. The atomic models shown in positions where there was no CA density (Fig. 2) were positioned for illustration purposes using the corresponding density map of the average complete hexamer from this dataset (10).
Lattice maps showing the positions and orientations of aligned subtomograms within the coordinate system of the original tomograms were plotted in UCSF Chimera using a custom plugin (33).
Supplementary Material
Acknowledgments
We acknowledge Kun Qu for providing advice and assistance in data processing and classification of subtomogram data and Jake Grimmett and Toby Darling from Medical Research Council Laboratory of Molecular Biology Scientific Computing for providing technical support. A.J.P. acknowledges support from the Ruth L. Kirschstein National Research Service Award Postdoctoral Fellowship F32-AI150477 by the National Institutes of Health. G.A.V. acknowledges support from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health Grant R01-AI150492. This work used computational resources provided by the Extreme Science and Engineering Discovery Environment, which is supported by National Science Foundation Grant ACI-1548562. J.A.G.B. acknowledges funding from the European Molecular Biology Laboratory and MRC Grant MC_UP_1201/16.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2020054118/-/DCSupplemental.
Data Availability.
The structures described in this paper have been deposited in the Electron Microscopy Data Bank (https://www.ebi.ac.uk/pdbe/emdb) under accession codes EMD-12147, EMD-12148, EMD-12149, EMD-12150, EMD-12151, EMD-12152 and EMD-12153.
References
- 1.Pornillos O., Ganser-Pornillos B. K., Maturation of retroviruses. Curr. Opin. Virol. 36, 47–55 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Sundquist W. I., Kräusslich H.-G., HIV-1 assembly, budding, and maturation. Cold Spring Harb. Perspect. Med. 2, a006924 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Briggs J. A. G., et al. , Structure and assembly of immature HIV. Proc. Natl. Acad. Sci. U.S.A. 106, 11090–11095 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Wright E. R., et al. , Electron cryotomography of immature HIV-1 virions reveals the structure of the CA and SP1 Gag shells. EMBO J. 26, 2218–2226 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Carlson L. A., et al. , Cryo electron tomography of native HIV-1 budding sites. PLoS Pathog. 6, e1001173 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Johnson D. S., Bleck M., Simon S. M., Timing of ESCRT-III protein recruitment and membrane scission during HIV-1 assembly. eLife 7, e36221 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Mattei S., Glass B., Hagen W. J. H., Kräusslich H.-G., Briggs J. A. G., The structure and flexibility of conical HIV-1 capsids determined within intact virions. Science 354, 1434–1437 (2016). [DOI] [PubMed] [Google Scholar]
- 8.Schur F. K. M., et al. , Structure of the immature HIV-1 capsid in intact virus particles at 8.8 Å resolution. Nature 517, 505–508 (2015). [DOI] [PubMed] [Google Scholar]
- 9.Wagner J. M., et al. , Crystal structure of an HIV assembly and maturation switch. eLife 5, e17063 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Schur F. K. M., et al. , An atomic model of HIV-1 capsid-SP1 reveals structures regulating assembly and maturation. Science 353, 506–508 (2016). [DOI] [PubMed] [Google Scholar]
- 11.Pak A. J., et al. , Immature HIV-1 lattice assembly dynamics are regulated by scaffolding from nucleic acid and the plasma membrane. Proc. Natl. Acad. Sci. U.S.A. 114, E10056–E10065 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Endres D., Zlotnick A., Model-based analysis of assembly kinetics for virus capsids or other spherical polymers. Biophys. J. 83, 1217–1230 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Hagan M. F., Chandler D., Dynamic pathways for viral capsid assembly. Biophys. J. 91, 42–54 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Rapaport D. C., Role of reversibility in viral capsid growth: A paradigm for self-assembly. Phys. Rev. Lett. 101, 186101 (2008). [DOI] [PubMed] [Google Scholar]
- 15.Hagan M. F., Elrad O. M., Jack R. L., Mechanisms of kinetic trapping in self-assembly and phase transformation. J. Chem. Phys. 135, 104115 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Yang Y., et al. , Roles of Gag-RNA interactions in HIV-1 virus assembly deciphered by single-molecule localization microscopy. Proc. Natl. Acad. Sci. U.S.A. 115, 6721–6726 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Inamdar K., Floderer C., Favard C., Muriaux D., Monitoring HIV-1 assembly in living cells: Insights from dynamic and single molecule microscopy. Viruses 11, 72 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Lee S.-K., Potempa M., Swanstrom R., The choreography of HIV-1 proteolytic processing and virion assembly. J. Biol. Chem. 287, 40867–40874 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Pettit S. C., Sheng N., Tritch R., Erickson-Viitanen S., Swanstrom R., The regulation of sequential processing of HIV-1 Gag by the viral protease. Adv. Exp. Med. Biol. 436, 15–25 (1998). [DOI] [PubMed] [Google Scholar]
- 20.Pettit S. C., Lindquist J. N., Kaplan A. H., Swanstrom R., Processing sites in the human immunodeficiency virus type 1 (HIV-1) Gag-Pro-Pol precursor are cleaved by the viral protease at different rates. Retrovirology 2, 66 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Prabu-Jeyabalan M., Nalivaika E., Schiffer C. A., Substrate shape determines specificity of recognition for HIV-1 protease: Analysis of crystal structures of six substrate complexes. Structure 10, 369–381 (2002). [DOI] [PubMed] [Google Scholar]
- 22.Alvizo O., Mittal S., Mayo S. L., Schiffer C. A., Structural, kinetic, and thermodynamic studies of specificity designed HIV-1 protease. Protein Sci. 21, 1029–1041 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Lee S.-K., et al. , Context surrounding processing sites is crucial in determining cleavage rate of a subset of processing sites in HIV-1 Gag and Gag-Pro-Pol polyprotein precursors by viral protease. J. Biol. Chem. 287, 13279–13290 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Bartesaghi A., et al. , Classification and 3D averaging with missing wedge correction in biological electron tomography. J. Struct. Biol. 162, 436–450 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Förster F., Pruggnaller S., Seybert A., Frangakis A. S., Classification of cryo-electron sub-tomograms using constrained correlation. J. Struct. Biol. 161, 276–286 (2008). [DOI] [PubMed] [Google Scholar]
- 26.Heumann J. M., Hoenger A., Mastronarde D. N., Clustering and variance maps for cryo-electron tomography using wedge-masked differences. J. Struct. Biol. 175, 288–299 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Hrabe T., et al. , PyTom: A python-based toolbox for localization of macromolecules in cryo-electron tomograms and subtomogram analysis. J. Struct. Biol. 178, 177–188 (2012). [DOI] [PubMed] [Google Scholar]
- 28.Wang M., et al. , Quenching protein dynamics interferes with HIV capsid maturation. Nat. Commun. 8, 1779 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Dick R. A., et al. , Inositol phosphates are assembly co-factors for HIV-1. Nature 560, 509–512 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Joshi A., Nagashima K., Freed E. O., Mutation of dileucine-like motifs in the human immunodeficiency virus type 1 capsid disrupts virus assembly, gag-gag interactions, gag-membrane binding, and virion maturation. J. Virol. 80, 7939–7951 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Ono A., Waheed A. A., Joshi A., Freed E. O., Association of human immunodeficiency virus type 1 gag with membrane does not require highly basic sequences in the nucleocapsid: Use of a novel gag multimerization assay. J. Virol. 79, 14131–14140 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Crist R. M., et al. , Assembly properties of human immunodeficiency virus type 1 gag-leucine zipper chimeras: Implications for retrovirus assembly. J. Virol. 83, 2216–2225 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Qu K., et al. , Structure and architecture of immature and mature murine leukemia virus capsids. Proc. Natl. Acad. Sci. U.S.A. 115, E11751–E11760 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Schur F. K. M., Dick R. A., Hagen W. J. H., Vogt V. M., Briggs J. A. G., The structure of immature virus-like rous sarcoma virus gag particles reveals a structural role for the p10 domain in assembly. J. Virol. 89, 10294–10302 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Mattei S., et al. , High-resolution structures of HIV-1 Gag cleavage mutants determine structural switch for virus maturation. Proc. Natl. Acad. Sci. U.S.A. 115, E9401–E9410 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Datta S. A. K., et al. , Dimerization of the SP1 region of HIV-1 gag induces a helical conformation and association into helical bundles: Implications for particle assembly. J. Virol. 90, 1773–1787 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Nickell S., et al. , TOM software toolbox: Acquisition and analysis for electron tomography. J. Struct. Biol. 149, 227–234 (2005). [DOI] [PubMed] [Google Scholar]
- 38.Castaño-Díez D., Kudryashev M., Arheit M., Stahlberg H., Dynamo: A flexible, user-friendly development tool for subtomogram averaging of cryo-EM data in high-performance computing environments. J. Struct. Biol. 178, 139–151 (2012). [DOI] [PubMed] [Google Scholar]
- 39.Bussi G., Donadio D., Parrinello M., Canonical sampling through velocity rescaling. J. Chem. Phys. 126, 014101 (2007). [DOI] [PubMed] [Google Scholar]
- 40.Martyna G. J., Klein M. L., Tuckerman M., Nosé–Hoover chains: The canonical ensemble via continuous dynamics. J. Chem. Phys. 97, 2635–2643 (1992). [Google Scholar]
- 41.Parrinello M., Rahman A., Crystal structure and pair potentials: A molecular-dynamics study. Phys. Rev. Lett. 45, 1196–1199 (1980). [Google Scholar]
- 42.M Sultan M., Pande V. S., tICA-Metadynamics: Accelerating metadynamics by using kinetically selected collective variables. J. Chem. Theory Comput. 13, 2440–2447 (2017). [DOI] [PubMed] [Google Scholar]
- 43.Barducci A., Bussi G., Parrinello M., Well-tempered metadynamics: A smoothly converging and tunable free-energy method. Phys. Rev. Lett. 100, 020603 (2008). [DOI] [PubMed] [Google Scholar]
- 44.Tiwary P., Parrinello M., A time-independent free energy estimator for metadynamics. J. Phys. Chem. B 119, 736–742 (2015). [DOI] [PubMed] [Google Scholar]
- 45.Harrigan M. P., et al. , MSMBuilder: Statistical models for biomolecular dynamics. Biophys. J. 112, 10–15 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Abraham M. J., et al. , GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19–25 (2015). [Google Scholar]
- 47.Huang J., et al. , CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 14, 71–73 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Hess B., Bekker H., Berendsen H. J. C., Fraaije J. G. E. M., LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 18, 1463–1472 (1997). [Google Scholar]
- 49.Tribello G. A., Bonomi M., Branduardi D., Camilloni C., Bussi G., PLUMED 2: New feathers for an old bird. Comput. Phys. Commun. 185, 604–613 (2014). [Google Scholar]
- 50.Pettersen E. F., et al. , UCSF Chimera–A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004). [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The structures described in this paper have been deposited in the Electron Microscopy Data Bank (https://www.ebi.ac.uk/pdbe/emdb) under accession codes EMD-12147, EMD-12148, EMD-12149, EMD-12150, EMD-12151, EMD-12152 and EMD-12153.