Biophysical Journal. 2020 Nov 28;120(6):1097–1104. doi: 10.1016/j.bpj.2020.10.048

A multiscale coarse-grained model of the SARS-CoV-2 virion

Alvin Yu 1, Alexander J Pak 1, Peng He 1, Viviana Monje-Galvan 1, Lorenzo Casalino 2, Zied Gaieb 2, Abigail C Dommer 2, Rommie E Amaro 2, Gregory A Voth 1
PMCID: PMC7695975  PMID: 33253634

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the COVID-19 pandemic. Computer simulations of complete viral particles can provide theoretical insights into large-scale viral processes including assembly, budding, egress, entry, and fusion. Detailed atomistic simulations are constrained to shorter timescales and require billion-atom simulations for these processes. Here, we report the current status and ongoing development of a largely “bottom-up” coarse-grained (CG) model of the SARS-CoV-2 virion. Data from a combination of cryo-electron microscopy (cryo-EM), x-ray crystallography, and computational predictions were used to build molecular models of structural SARS-CoV-2 proteins, which were then assembled into a complete virion model. We describe how CG molecular interactions can be derived from all-atom simulations, how viral behavior difficult to capture in atomistic simulations can be incorporated into the CG models, and how the CG models can be iteratively improved as new data become publicly available. Our initial CG model and the detailed methods presented are intended to serve as a resource for researchers working on COVID-19 who are interested in performing multiscale simulations of the SARS-CoV-2 virion.

Significance

This study reports the construction of a molecular model for the SARS-CoV-2 virion and details our multiscale approach toward model refinement. The resulting model and methods can be applied to and enable the simulation of SARS-CoV-2 virions.

Introduction

The onset of the global coronavirus disease 2019 (COVID-19) pandemic has brought intense investigation into the molecular components of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) encoded by the virus’s 30-kb genome. Structural biology efforts using cryo-electron microscopy (cryo-EM) and x-ray crystallographic techniques are currently reporting new structures of viral proteins every week (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), and computational structure prediction efforts are targeting unresolved sections of the genome using a variety of protein folding algorithms. Computational and experimental studies are underway to find new molecular therapeutics that can inhibit viral activity or further elucidate the mechanisms of action of SARS-CoV-2 proteins (13, 14, 15, 16). The computer simulation of large-scale SARS-CoV-2 processes such as virion assembly, budding, entry, and fusion will remain intrinsically challenging to investigate using all-atom (AA) molecular dynamics (MD), owing to the computational cost of meaningfully simulating the hundreds of millions to billions of atoms involved.

A holistic model of the SARS-CoV-2 virion can provide insight into the mechanisms of large-scale viral processes and the collective behavior of macromolecules involved in viral replication and infectivity. SARS-CoV-2 virions contain four main structural proteins: the spike (S), membrane (M), nucleocapsid (N), and envelope (E) proteins (17). S proteins are glycosylated trimers that mediate fusion and entry, in part by attaching enclosed fusion peptide sequences into the membranes of host cells (18). M proteins appear as dimeric complexes embedded within the virion envelope and are believed to anchor ribonucleoprotein complexes to the envelope (19,20). N proteins associate with and organize RNA into ribonucleoprotein structures found in the interior of virions (21,22). Lastly, E proteins are thought to form pentameric ion channels that are found at the lipid bilayers of virion membranes and contribute to viral budding (23).

In this work, we construct a largely “bottom-up” coarse-grained (CG) model of the SARS-CoV-2 virion from the currently available structural and atomistic simulation data on SARS-CoV-2 proteins. In general, this model serves as a resource for researchers working on COVID-19 and as a platform to incorporate computational and experimental data. This model also enables new multiscale studies of SARS-CoV-2 processes to possibly help find treatment and prevention strategies against COVID-19. Atomistic trajectory and experimental structural data deposited in the National Science Foundation (NSF) Molecular Sciences Software Institute (MolSSI) will be incorporated as they become publicly available (24). In this work, we detail several of our CG methods used to iteratively develop a CG model for the full SARS-CoV-2 virion, in which molecular interactions between CG particles are derived using a combination of phenomenological, experimental, and atomistic simulation approaches.

Methods

Building models from structural data

We first constructed atomic models of the structural proteins of the SARS-CoV-2 virion (Fig. 1). AA models of the open and closed state of the S protein were built based on the cryo-EM structures of the spike ectodomain (Protein Data Bank, PDB: 6VYB, 6VXX) (5), respectively, and atomic models of the N protein were constructed based on the x-ray crystallographic structure of the nucleocapsid N-terminal domain (NTD) (PDB: 6M3M) (27). Glycosylation sites were modeled using Glycan Reader & Modeler in CHARMM-GUI (28) and the site-specific glycoprofile derived from mass spectrometry and cryo-EM analysis (29,30). Homology models for the S-protein stalk, including the HR2 and TM domains, were assembled as α-helical trimeric bundles using MODELER (31) on the basis of secondary structure assignments in JPred4 (32). Homology models for the SARS-CoV-2 N protein C-terminal domain (CTD) were created from the x-ray crystallographic structure of the SARS-CoV N protein CTD (PDB: 2CJR) (33). Missing amino acid backbones in loop regions were built in MODELER, and side chain rotamers were built using SCWRL4 (34). We used atomic models for the M-protein dimer (25) and the pentameric E ion channel (26) that were developed by homology.

Figure 1.


Viral proteins of SARS-CoV-2. The genome of SARS-CoV-2 is shown in the top panel. Nonstructural proteins (NSPs) encoded in the open reading frame (ORF) 1ab are colored in orange, and the full genome is in teal. (A) AA models of the structural proteins of SARS-CoV-2 consisting of the S, E, M, and N proteins are given. Asterisks indicate homology-modeled protein structures for M and E (25,26). (B) A schematic of the virion surface from cryo-EM images of the virion is given, adapted from (19). To see this figure in color, go online.

AA protein models (see discussion below) were subsequently simulated and coarse grained to generate the CG models (see Fig. 2 and sections below). A previously developed CG model for lipids was used, consisting of three CG beads per lipid and distinct bead types for lipid headgroups and hydrophobic tails (35). A single-component CG lipid bilayer was generated in a spherical configuration and equilibrated using CG MD simulations under constant NVT conditions in LAMMPS (36). We note that in the future, more complex CG lipid models (37) can be added. Transmembrane segments of component membrane proteins were visually identified and assigned based on secondary structure motifs. Individual lipids on the outer leaflet of the spherical bilayer were randomly selected and used as initial positions for embedding spike, membrane, and envelope proteins. For each initial position, the center of mass of the transmembrane domain was aligned with the center of the lipid bilayer, and the principal axis of the protein was aligned with the vector normal to the lipid bilayer. Transmembrane regions were then substituted for the overlapping CG lipids to embed the proteins. The procedure was iterated until the numbers of spike, membrane, and envelope proteins on the virion surface were approximately consistent with currently available experimental estimates of ∼25, 1000, and 20 per virion, respectively, from cryo-EM and biochemical data (38, 39, 40). The diameter of the membrane envelope is ∼100 nm, or ∼140 nm when the S proteins on the virion surface are included. As higher-resolution experimental data are released, the overall structure of this model can be refined.
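The embedding step described above (align the transmembrane center of mass with the bilayer midplane, then align the protein's principal axis with the local bilayer normal) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure as implemented by the authors: the function names, the equal-mass treatment of CG beads, a sphere centered at the origin, and the convention that the protein extends away from its transmembrane segment are all hypothetical choices made here. Removal of the overlapping CG lipids is omitted.

```python
import numpy as np

def principal_axis(coords):
    """Direction of largest spatial extent (equal bead masses assumed)."""
    centered = coords - coords.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def rotation_between(a, b):
    """Rotation matrix taking unit vector a onto unit vector b (Rodrigues formula)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Antiparallel case: rotate by pi about any axis perpendicular to a.
        perp = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(perp) < 1e-8:
            perp = np.cross(a, [0.0, 1.0, 0.0])
        perp = perp / np.linalg.norm(perp)
        return 2.0 * np.outer(perp, perp) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def embed_protein(coords, tm_mask, lipid_pos, bilayer_mid_radius):
    """Place a CG protein at the site of a selected outer-leaflet lipid.

    coords: (n_beads, 3) CG bead positions; tm_mask: boolean mask of
    transmembrane beads; lipid_pos: position of the selected lipid on a
    sphere centered at the origin; bilayer_mid_radius: radius of the
    bilayer midplane (hypothetical parameter).
    """
    normal = lipid_pos / np.linalg.norm(lipid_pos)
    tm_com = coords[tm_mask].mean(axis=0)
    axis = principal_axis(coords)
    # The SVD leaves the sign of the axis undefined; orient it from the
    # transmembrane segment toward the rest of the protein.
    if np.dot(coords.mean(axis=0) - tm_com, axis) < 0:
        axis = -axis
    rot = rotation_between(axis, normal)
    rotated = (coords - tm_com) @ rot.T
    return rotated + bilayer_mid_radius * normal
```

In a full workflow this function would be called once per randomly selected lipid until the target protein counts are reached, with a distance check to skip sites that overlap previously embedded proteins.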

Figure 2.


CG models of the SARS-CoV-2 structural proteins. (A) The CG model of the S-protein trimer in the open state is shown. The protein monomers are depicted as pink, green, and cyan beads, respectively; the monomer in pink has an exposed receptor binding domain. Each of the 22 (× 3) N-linked glycans is depicted as a gray bead. (B) The CG model of the pentameric E protein is depicted as orange beads. (C) The CG M dimer model is depicted as yellow and blue spheres, overlaid on top of the AA model of the M dimer. Each monomer has 36 CG sites, and the red lines indicate the approximate positions of the transmembrane region. (D) The CG model of the N protein CTD helix in complex with viral RNA is shown. The N protein helix and bonds derived from the hENM are depicted in cyan, and the RNA is depicted as orange beads. To see this figure in color, go online.

AA MD simulations of the S protein

Two glycosylated models of the open and closed spike were inserted into a symmetric 225 Å × 225 Å lipid bilayer mimicking the composition of the endoplasmic reticulum (ER)-Golgi intermediate compartment (41,42). The lipid patch was built using CHARMM-GUI. The complete protein-membrane system was solvated using the TIP3P water model (43) and neutralized with chloride and sodium ions to maintain a 150-mM concentration. Each system contained ∼1.7 million atoms. Minimization and equilibration were performed using the CHARMM36 force field (44,45) and NAMD 2.14 (46). Production runs were performed in the NPT ensemble using a Langevin thermostat at 310 K and Nosé-Hoover Langevin barostat at 1 atm. All production runs used a 2-fs timestep and the SHAKE algorithm. Multiple replicas of AA MD simulations of the open (3×) and closed (3×) systems were performed on NSF Frontera at the Texas Advanced Computing Center (TACC), achieving an aggregate sampling of 3.0 and 1.8 μs, respectively.

CG model of the S protein

The CG model of the glycosylated S protein (Fig. 2 A) was parameterized from the AA MD simulations described above (47). Reference statistics used conformations sampled equally from both open and closed states, with AA trajectories spanning 3.0 and 1.8 μs, respectively. First, the protein was mapped to CG beads using essential dynamics coarse graining (EDCG) (48). We used 60 and 50 CG beads for the S1 and S2 domains, respectively, and the 22 N-linked glycans were each mapped to a single bead. Intraprotein interactions were represented as a heteroelastic network model (hENM) with bond energies $k(r - r_0)^2$, where $k$ is the spring constant of a particular CG bond and $r_0$ is the equilibrium bond length. These parameters were optimized using the hENM method (49). Interprotein interactions within the S-trimers were composed of excluded volume, attractive, and screened electrostatic terms. For excluded volume interactions, a phenomenological soft cosine potential, $A[1 + \cos(\pi r / r_c)]$, was used, where $A$ = 25 kcal/mol and $r_c$ is the onset for excluded volume. Attractive, nonbonded interactions between interprotein contacts were modeled as the sum of two Gaussian potentials, $A_1 \exp[-(r_{ij} - r_1)^2 / (2\sigma_1^2)] + A_2 \exp[-(r_{ij} - r_2)^2 / (2\sigma_2^2)]$, where $r_1$ and $\sigma_1$ are the mean and standard deviation determined by a fit to the pair correlation between CG sites $i$ and $j$ through least-squares regression. The constants $A_1$ and $A_2$ were optimized using relative-entropy minimization (REM). Screened electrostatics were modeled using Yukawa potentials, $\frac{q_i q_j}{4\pi\varepsilon_r\varepsilon_0 r_{ij}} \exp(-\kappa r_{ij})$, where $q_i$ is the aggregate charge of CG site $i$, $\kappa$ = 1.274 nm$^{-1}$ is the inverse Debye length for 0.15 M NaCl, and $\varepsilon_r$ is the effective dielectric constant of the protein environment, approximated as 17.5 (50).
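The three nonbonded terms above are simple enough to write out directly. The following Python sketch evaluates them with NumPy, assuming distances in Å and energies in kcal/mol. The function names, the default cutoff `r_c = 10.0` Å, and the choice to set the soft cosine term to zero beyond `r_c` are assumptions made for illustration; only $A$ = 25 kcal/mol, $\kappa$ = 1.274 nm$^{-1}$ (0.1274 Å$^{-1}$), and $\varepsilon_r$ = 17.5 come from the text.

```python
import numpy as np

# Coulomb constant in kcal*Angstrom/(mol*e^2), so that q_i*q_j/r gives kcal/mol.
COULOMB = 332.0637

def soft_cosine(r, a=25.0, r_c=10.0):
    """Excluded volume: A[1 + cos(pi r / r_c)] for r < r_c, zero otherwise.
    r_c = 10.0 is a placeholder value, not taken from the paper."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_c, a * (1.0 + np.cos(np.pi * r / r_c)), 0.0)

def double_gaussian(r, a1, r1, sigma1, a2, r2, sigma2):
    """Sum of two Gaussians; negative amplitudes give attractive wells."""
    r = np.asarray(r, dtype=float)
    g1 = a1 * np.exp(-(r - r1) ** 2 / (2.0 * sigma1 ** 2))
    g2 = a2 * np.exp(-(r - r2) ** 2 / (2.0 * sigma2 ** 2))
    return g1 + g2

def yukawa(r, q_i, q_j, eps_r=17.5, kappa=0.1274):
    """Screened electrostatics; kappa in 1/Angstrom (1.274 nm^-1)."""
    r = np.asarray(r, dtype=float)
    return COULOMB * q_i * q_j / (eps_r * r) * np.exp(-kappa * r)
```

Because the soft cosine term is finite at $r = 0$ (it equals $2A$), overlapping beads are pushed apart gently rather than with a diverging force, which is convenient when relaxing an assembled structure.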

CG models of the M and E proteins

AA simulations of the M-protein dimer were performed using homology models and a membrane model based on the ER. The membrane model included PC:PE:PI:PS:Chol lipids (0.45:0.10:0.23:0.10:0.12 mol fraction) as an initial approximation to the ER-Golgi intermediate compartment (ERGIC). The protein-membrane systems were solvated and neutralized in a similar fashion as described previously. The simulations were equilibrated for 400 ns before a 4-μs production run on Anton 2. All simulations were run in the constant NPT ensemble at 310 K and 1 atm using the CHARMM36m force field. A CG model containing ∼5 residues per CG bead was mapped from the reference statistics of the AA MD simulations using the EDCG and hENM approaches (Fig. 2 C). A CG model for the E protein was developed by linearly mapping the amino acid sequence to particles at a resolution of 1 CG bead per five amino acids (Fig. 2 B).
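The linear mapping used for the E protein can be illustrated with a few lines of Python. This sketch assumes that each residue is represented by a single position (for example its Cα atom) and that a bead sits at the unweighted mean of its residues; both choices, and the name `linear_map`, are assumptions for illustration rather than details given in the text.

```python
import numpy as np

def linear_map(residue_coords, residues_per_bead=5):
    """Map consecutive blocks of residues to one CG bead each.

    residue_coords: (n_residues, 3) array, one position per residue.
    The final bead may contain fewer residues if n_residues is not a
    multiple of residues_per_bead.
    """
    residue_coords = np.asarray(residue_coords, dtype=float)
    starts = range(0, len(residue_coords), residues_per_bead)
    return np.array([residue_coords[s:s + residues_per_bead].mean(axis=0)
                     for s in starts])
```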

CG model of the N protein

Several studies suggest that the CTD of the N protein assembles into a helix that contains two RNA binding grooves (21,51). Based on these studies, we constructed atomic and CG models of the viral ribonucleoprotein complex (vRNP) by iterating between CG and AA simulations. We first constructed an atomic model of the N protein CTD helix with two RNA binding grooves by stacking three copies of the CTD octamer structure (PDB: 2CJR), which is composed of four CTD dimers and homology modeled from the x-ray crystallographic structure of the SARS-CoV N protein CTD (33). The CTD helix was simulated in the CHARMM36m force field for 400 ns. We then constructed the CTD helix model using EDCG combined with hENM, followed by manually placing CG RNA beads into the groove of the helix (Fig. 2 D). The positions of the CG beads were used as restraints to build an atomic model of the vRNP complex. Finally, the vRNP model was relaxed and simulated in the CHARMM36m force field for 400 ns. It is important to note that recent cryo-EM studies have found granule-like densities within the virion for the vRNP complex (22). Structural detail into how CTD oligomers (including the previously proposed helical model) and RNA fit into these densities will likely require higher-resolution images.

Deriving CG molecular interactions from AA simulations

Several computational approaches have been developed to build or refine CG models using data from AA or fine-grained simulations. Our approach to coarse-graining the SARS-CoV-2 virion is to couple several CG methods in a hierarchical fashion. CG sites or “beads” are mapped from atomic structures using EDCG, a method designed to preserve the principal modes of motion sampled during atomic-level simulations (48). In EDCG, a given CG mapping operator, $M_R^N: r^n \rightarrow R^N$, that relates the configurations of the atomistic trajectory ($r^n$) to that of the CG model ($R^N$) is variationally optimized using simulated annealing. Typically, the mapping is constructed so that contiguous segments of a protein’s primary amino acid sequence are mapped to distinct CG sites. For a fixed number of CG beads, $N$, the sets of atoms that are mapped to CG sites are adjusted to minimize the target residual:

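$$\chi^2 = \frac{1}{3N} \sum_{I=1}^{N} \left\langle \sum_{i,j} \left| \Delta r_i - \Delta r_j \right|^2 \right\rangle_t : \; i, j \in I, \; j \geq i, \tag{1}$$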

where $I = 1, \ldots, N$ is the CG site index; the brackets, $\langle \cdot \rangle_t$, denote a time-averaged quantity; the sum over $i, j$ is a sum over all unique pairs in the set of atoms belonging to the CG site, $I$; and $\Delta r_i = x_i - \langle x_i \rangle_t$ is the displacement of atom $i$ from the atom’s mean position, $\langle x_i \rangle_t$. Note that the residual is small when the displacements, $\Delta r_i$ and $\Delta r_j$, are similar, i.e., the motions of atoms in the same CG site are correlated. A new map is constructed and either accepted or rejected according to a Metropolis-Hastings criterion (i.e., accepted if $\chi_{n+1}^2 < \chi_n^2$; otherwise, accepted with probability $\rho = \exp[-(\chi_{n+1}^2 - \chi_n^2)/T]$ and rejected otherwise, where $n$ is the number of iterations for simulated annealing and $T$ is the coupling to a fictitious temperature that is gradually lowered during optimization).
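To make the EDCG optimization concrete, the sketch below evaluates the residual of Eq. 1 for a contiguous mapping and anneals the segment boundaries with the Metropolis-Hastings rule described above. It is a simplified illustration, not the EDCG implementation of (48): the function names, the linear cooling schedule, the single-boundary move set, and the use of one position per atom or residue are assumptions made here.

```python
import numpy as np

def edcg_residual(disp, boundaries):
    """Residual of Eq. 1 for a contiguous mapping.

    disp: (n_frames, n_atoms, 3) displacements from each atom's mean position.
    boundaries: increasing indices [0, b1, ..., n_atoms]; CG site I holds
    atoms boundaries[I] .. boundaries[I+1]-1.
    """
    n_cg = len(boundaries) - 1
    total = 0.0
    for site in range(n_cg):
        seg = disp[:, boundaries[site]:boundaries[site + 1], :]
        m = seg.shape[1]
        # Identity: sum_{i<j} |d_i - d_j|^2 = m * sum_i |d_i|^2 - |sum_i d_i|^2
        sum_sq = (seg ** 2).sum(axis=(1, 2))
        sq_sum = (seg.sum(axis=1) ** 2).sum(axis=1)
        total += np.mean(m * sum_sq - sq_sum)
    return total / (3.0 * n_cg)

def anneal_mapping(disp, n_cg, n_iter=2000, t0=1.0, seed=0):
    """Simulated annealing over segment boundaries (requires n_cg >= 2)."""
    rng = np.random.default_rng(seed)
    n_atoms = disp.shape[1]
    bounds = np.linspace(0, n_atoms, n_cg + 1).astype(int)
    chi = edcg_residual(disp, bounds)
    best_bounds, best_chi = bounds.copy(), chi
    for it in range(n_iter):
        temp = t0 * (1.0 - it / n_iter) + 1e-12  # linear cooling (assumed)
        trial = bounds.copy()
        k = rng.integers(1, n_cg)                # pick an interior boundary
        trial[k] += rng.choice([-1, 1])          # shift it by one atom
        if not (trial[k - 1] < trial[k] < trial[k + 1]):
            continue                             # would create an empty site
        chi_new = edcg_residual(disp, trial)
        if chi_new < chi or rng.random() < np.exp(-(chi_new - chi) / temp):
            bounds, chi = trial, chi_new
            if chi < best_chi:
                best_bounds, best_chi = bounds.copy(), chi
    return best_bounds, best_chi
```

For a trajectory in which two groups of atoms each move as a rigid unit, the optimal two-site map places the boundary between the groups and the residual drops to zero, which is the sense in which EDCG groups atoms with correlated motion.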

After defining the AA ↔ CG map, intramolecular interactions within a single polypeptide chain are treated using elastic network models (ENMs) to capture protein flexibility. In the hENM method (49), the bonded topology of the CG model is constructed by assigning effective harmonic bonds to all pairs of particles within a tunable distance cutoff; initially, every bond between particles $i$ and $j$ has the same force constant, $k_{ij}$. The harmonic force constants are then optimized by first computing the normal modes of this model, that is, by solving the eigenvalue problem,

$$H v_k = \omega_k^2 M v_k, \tag{2}$$

where $H$ is the Hessian, $H_{i,j} = \left.\frac{\partial^2 V}{\partial q_i \partial q_j}\right|_m$, $M$ is the diagonal matrix for the masses of the particles, and $\omega_k$ is the frequency of the $k$th mode of motion, $v_k$. Note that this is the solution to the equation of motion

$$M \frac{d^2 q}{dt^2} + H q = 0, \tag{3}$$

where $q$ is the generalized coordinate, and that for $N$ classically interacting particles near the potential energy minimum, $q_m$,

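$$V(q) = V(q_m) + \sum_i \left.\frac{\partial V}{\partial q_i}\right|_m (q_i - q_{i,m}) + \frac{1}{2} \sum_{i,j} \left.\frac{\partial^2 V}{\partial q_i \partial q_j}\right|_m (q_i - q_{i,m})(q_j - q_{j,m}) + O\left((q - q_m)^3\right), \tag{4}$$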

$V(q_m)$ is a constant, and $\left.\frac{\partial V}{\partial q_i}\right|_m$ is zero. Using the normal modes, mean-squared fluctuations $\langle \Delta r_{ij}^2 \rangle = \langle (x_{ij} - \langle x_{ij} \rangle)^2 \rangle$ for each $i, j$ pair can be computed by rescaling the amplitudes according to an equipartition of energy that reflects the temperature of the atomistic data. Harmonic force constants for each bond in the CG ENM are then iteratively adjusted so that fluctuations in the CG model match those of the atomistic data, i.e.,

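$$\frac{1}{k_{ij}^{n+1}} = \frac{1}{k_{ij}^{n}} - \alpha \left( \langle \Delta r_{ij}^2 \rangle_{CG} - \langle \Delta r_{ij}^2 \rangle_{AA} \right), \tag{5}$$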

where n is the number of iterations and α is a parameter that controls the magnitude of the adjustment for each iteration.
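The hENM loop of Eqs. 2 to 5 can be illustrated on a deliberately small toy problem. The Python sketch below works with a one-dimensional network of unit-mass sites, bond energies of the form $\frac{1}{2}k_{ij}x^2$, and thermal energy `KBT` = 0.616 kcal/mol (about 310 K); all of these, along with the function names, are assumptions chosen to keep the example short. A production hENM fit operates on three-dimensional CG coordinates with a distance cutoff, and convergence for networks with many overlapping bonds depends on the choice of $\alpha$.

```python
import numpy as np

KBT = 0.616  # kcal/mol near 310 K (assumed units)

def bond_fluctuations(n_sites, bonds, k, kbt=KBT):
    """Mean-squared bond fluctuations of a 1D harmonic network.

    The Hessian is diagonalized (normal modes, unit masses) and each
    nonzero mode contributes kbt / eigenvalue by equipartition.
    """
    hessian = np.zeros((n_sites, n_sites))
    for (i, j), kij in zip(bonds, k):
        hessian[i, i] += kij
        hessian[j, j] += kij
        hessian[i, j] -= kij
        hessian[j, i] -= kij
    evals, evecs = np.linalg.eigh(hessian)
    keep = evals > 1e-8 * evals.max()      # drop the rigid translation mode
    cov = kbt * (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T
    return np.array([cov[i, i] + cov[j, j] - 2.0 * cov[i, j] for i, j in bonds])

def henm_update(k, fluct_cg, fluct_aa, alpha):
    """One application of Eq. 5 to every bond."""
    return 1.0 / (1.0 / k - alpha * (fluct_cg - fluct_aa))

def fit_henm(n_sites, bonds, fluct_aa, k0, alpha, n_iter=10, kbt=KBT):
    """Iterate Eq. 5 a fixed number of times starting from uniform k0."""
    k = np.array(k0, dtype=float)
    for _ in range(n_iter):
        fluct_cg = bond_fluctuations(n_sites, bonds, k, kbt)
        k = henm_update(k, fluct_cg, fluct_aa, alpha)
    return k
```

For a linear chain with nearest-neighbor bonds only, each bond fluctuates independently with variance $k_BT/k_{ij}$, so choosing $\alpha = 1/k_BT$ recovers the target force constants in a single iteration. Networks with loops couple the bonds, and a smaller $\alpha$ with more iterations is then the safer choice.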

For the intermolecular interactions between proteins, nonbonded CG interactions are determined either using force matching (a.k.a. multiscale CG) (52,53) or REM approaches (54,55). In multiscale CG, the CG potential is constructed from a linearly independent basis set

$$U(R^N) = \sum_{D=1}^{N_D} \phi_D U_D(R^N), \tag{6}$$

where the functional forms for the basis potentials, $U_D$ (e.g., B-splines, Lennard-Jones, etc.), and the number of them, $N_D$, are determined by the user. The coefficients $\{\phi_D\}$ are variationally optimized such that the following residual is minimized:

$$\chi^2 = \frac{1}{3N} \left\langle \sum_{I=1}^{N} \left| f_I(r^n) - F_I\left(M_R^N(r^n)\right) \right|^2 \right\rangle_t, \tag{7}$$

where $F_I(R^N) = -\nabla_I U(R^N)$ is the CG force and $f_I(r^n)$ is the atomistic force on the CG site $I$. Similarly, in the REM approach, the objective function for minimization is the Kullback-Leibler divergence, which provides a metric for the differences between the atomistic and CG probability distributions

$$S_{rel} = \int \rho_{AA}(r^n) \log\left( \frac{\rho_{AA}(r^n)}{\rho_{CG}(R^N)} \right) dr^n + \langle S_{map} \rangle_{AA}, \tag{8}$$

where $\rho_{AA}(r^n) = Z_{AA}^{-1} e^{-\beta U_{AA}(r^n)}$ and $\rho_{CG}(R^N) = Z_{CG}^{-1} e^{-\beta U_{CG}(R^N)}$ in the canonical ensemble and $Z$ is the configurational partition function. Furthermore, the relative entropy can be expressed as a difference between the potential energy and free energy of the atomistic and CG ensembles:

$$S_{rel} = \beta \langle U_{CG} - U_{AA} \rangle_{AA} - \beta (A_{CG} - A_{AA}) + \langle S_{map} \rangle_{AA}, \tag{9}$$

where $A = -k_B T \log Z$. Minimization of the relative entropy is performed using iterative Newton-Raphson techniques. It is important to note, however, that the quality and fidelity of such CG models are determined by the molecular behavior sampled in the underlying AA simulations.
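Both variational schemes can be demonstrated on toy problems. In the Python sketch below, `force_match` solves the linear least-squares problem implied by Eqs. 6 and 7 when the CG force is linear in the coefficients $\{\phi_D\}$, and `rem_newton_harmonic` applies Newton-Raphson to the relative entropy of Eq. 9 for a one-parameter harmonic CG potential, $U_{CG} = \frac{1}{2}\lambda x^2$, whose CG averages are known analytically. The function names, the Lennard-Jones-like basis, and the harmonic model are assumptions for illustration; in practice the CG ensemble averages come from CG simulations rather than closed-form expressions.

```python
import numpy as np

def lj_like_basis_forces(r):
    """Forces -dU_D/dr for two basis potentials, U_1 = r^-12 and U_2 = r^-6
    (a hypothetical basis chosen for this example)."""
    r = np.asarray(r, dtype=float)
    return np.column_stack([12.0 * r ** -13, 6.0 * r ** -7])

def force_match(basis_forces, ref_forces):
    """Least-squares coefficients phi minimizing |ref - basis @ phi|^2.

    basis_forces: (n_samples, n_basis); ref_forces: (n_samples,) mapped
    atomistic forces projected onto the same coordinate.
    """
    phi, *_ = np.linalg.lstsq(basis_forces, ref_forces, rcond=None)
    return phi

def rem_newton_harmonic(x_aa, beta, lam0, n_iter=20):
    """Newton-Raphson REM for U_CG = 0.5 * lam * x^2 in one dimension.

    Gradient: beta * (<dU/dlam>_AA - <dU/dlam>_CG), with dU/dlam = x^2 / 2
    and <x^2>_CG = 1 / (beta * lam).
    Curvature: beta^2 * Var_CG(dU/dlam) = 1 / (2 * lam^2).
    Converges for 0 < lam0 < 2 / (beta * <x^2>_AA).
    """
    mean_sq = np.mean(np.asarray(x_aa, dtype=float) ** 2)
    lam = float(lam0)
    for _ in range(n_iter):
        grad = beta * (0.5 * mean_sq - 0.5 / (beta * lam))
        hess = 0.5 / lam ** 2
        lam -= grad / hess
    return lam
```

The harmonic example has the closed-form solution $\lambda = 1/(\beta \langle x^2 \rangle_{AA})$, i.e., the CG model that reproduces the mapped atomistic variance, which makes it a convenient check on the update rule.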

Incorporating new behavior in CG simulations

Macromolecular complexes such as virions undergo a wide range of behavior, including physical and chemical transitions, that will be difficult to capture through AA simulations alone or even with experimental techniques. This is especially true for processes that involve large conformational changes that are not sampled effectively in AA simulations, whether because of the long timescales required, free energy barriers, or inherent limitations of the simulation force field. For instance, the S protein of SARS-CoV-2 has two proteolytic cleavage sites (at the S1-S2 and S2′ locations), and binds to the host cell receptor, angiotensin-converting enzyme 2 (ACE-2). Cleavage and binding events trigger dramatic conformational changes in the spike that result in the insertion of the fusion peptide into the host cell membrane. High-resolution structural studies of the S-ACE-2 complex have made protein binding simulations amenable to enhanced sampling techniques at the AA level (4,56). The proteolytic cleavage of the spike and large-scale conformational shift toward fusion peptide insertion, however, are more difficult to sample in atomistic simulations. To address these issues, one can use CG molecular simulation techniques that allow CG particles to adaptively switch discrete “states” and interactions, such as ultra-coarse-graining (UCG) (57, 58, 59). In the limit of infrequent internal state switches, UCG implements microscopically reversible state changes that are coupled to a Metropolis-Hastings-like criterion:

$$K_{ij} = k_{ij} \min\left[ \frac{k_{ji}}{k_{ij}} e^{-\beta (U_j - U_i)}, 1 \right] \tag{10}$$

where $K_{ij}$ is the instantaneous switching rate from state $i$ to $j$, $U_j - U_i$ is the CG effective potential energy difference between states $j$ and $i$, and the rates $k_{ij}$ and $k_{ji}$ are model parameters either treated as input or calculated from atomistic simulations. This approach is similar to hybrid kinetic Monte Carlo and MD methods but with a spatial kinematic component, and it can be used to examine the transitions of the spike (i.e., “states”) that lead to the fusion of SARS-CoV-2 with host cells.
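Eq. 10 translates directly into code. The Python sketch below computes the instantaneous switching rate and, as an additional assumption not stated in the text, converts it into a per-step switching probability for a time step `dt` by treating the switch as a Poisson process. The function names are hypothetical, and the exponent is evaluated in log space only to avoid overflow for large energy differences.

```python
import math

def ucg_switch_rate(k_ij, k_ji, u_i, u_j, beta):
    """Instantaneous rate K_ij of Eq. 10 for switching from state i to j."""
    log_x = math.log(k_ji / k_ij) - beta * (u_j - u_i)
    factor = 1.0 if log_x >= 0.0 else math.exp(log_x)
    return k_ij * factor

def switch_probability(rate, dt):
    """Probability of at least one switch during dt (Poisson assumption)."""
    return -math.expm1(-rate * dt)
```

With this form the forward and backward rates satisfy $K_{ij}/K_{ji} = e^{-\beta(U_j - U_i)}$ for any choice of $k_{ij}$ and $k_{ji}$, so the state populations are weighted by the Boltzmann factor of the CG effective potential energies of the two states.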

Experiments can probe longer timescales than are available from AA MD simulations. In recent cryo-EM images of SARS-CoV-2 particles, the S1 domain of the S protein was found to transiently open and close to bind the ACE-2 receptor (3,5), which are subtle conformational changes that are difficult to sample in atomistic simulations. For these conformational changes—in the case that they cannot be treated as discrete state switches—plastic network models (60) or multiconfiguration coarse graining (MCCG) methods (61) can be used to construct a CG model that continuously transitions from one state to the next. For plastic network models, two known experimental configurations of the protein are used to build a multibasin ENM that represents deviations away from each of the individual conformational minima. A phenomenological interaction Hamiltonian is constructed that couples and mixes the ENMs between the two structural endpoints. In MCCG, the primary difference is that the coupling terms in the Hamiltonian are constructed from a two-state mixing approach, derived on the basis of a mapped potential of mean force that is explicitly computed from AA simulation data along collective variables that distinguish between the two (or more) conformational states at a CG level.
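A two-basin elastic network of the kind described above can be sketched as follows. The mixing rule used here, the lower eigenvalue of a 2 × 2 matrix with the two single-basin ENM energies on the diagonal and a constant coupling `eps` off the diagonal, is one common phenomenological choice and is an assumption of this example, as are the function names and the uniform force constant. In MCCG the coupling would instead be derived from a potential of mean force computed from AA data.

```python
import numpy as np

def enm_energy(x, x_ref, pairs, k):
    """Harmonic ENM energy of configuration x about reference structure x_ref.

    pairs: (n_bonds, 2) integer array of bonded site indices.
    k: uniform force constant (assumed for simplicity).
    """
    d = np.linalg.norm(x[pairs[:, 0]] - x[pairs[:, 1]], axis=1)
    d0 = np.linalg.norm(x_ref[pairs[:, 0]] - x_ref[pairs[:, 1]], axis=1)
    return 0.5 * k * np.sum((d - d0) ** 2)

def two_basin_energy(u1, u2, eps):
    """Lower eigenvalue of [[u1, eps], [eps, u2]]: a smooth minimum of the
    two basins that reduces to min(u1, u2) as eps -> 0."""
    return 0.5 * (u1 + u2) - 0.5 * np.sqrt((u1 - u2) ** 2 + 4.0 * eps ** 2)
```

Near either reference structure the mixed surface follows the corresponding single-basin ENM, while the coupling lowers and smooths the barrier region where the two energies cross.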

Phenomenological CG models

An alternative (and sometimes necessary “top-down”) route to deriving CG models is to construct a model Hamiltonian and then analyze the model’s resulting behavior in the context of the assumed interactions. Typically, parameterization of such models is designed to fit or reproduce particular observables measured in experimental data and perhaps particular sets of AA simulation data. These can be performed based on variational optimization of some system-specific functional that depends on the experimental observable. Model Hamiltonian approaches have the advantage that physical intuition is clearer but are not systematic because each new problem requires a different treatment for the set of interactions involved. Furthermore, these approaches often require orthogonal experiments to validate the underlying model. Such coarse-graining methods are, however, especially useful in cases for which atomistic simulation is difficult or infeasible to obtain on the system or if the bottom-up methods described above are not expected to yield converged results for the effective CG potential, owing to limited atomistic sampling.

Results and discussion

Here, we present results from the first CG simulations of the SARS-CoV-2 virion (Fig. 3). It should be noted that these are early results, and we can thus expect additional simulations to become available from this model as more experimental data and AA simulations become available for the various virion components. In addition, the overall CG methodology and modeling of the virion will continue to evolve and are works in progress.

Figure 3.


A multiscale model of the SARS-CoV-2 virion. (A) Exterior view of the SARS-CoV-2 virion is given. (B) Interior view of the SARS-CoV-2 virion is given. S-protein trimers are depicted in teal, with the glycosylation sites represented as black spheres. M-protein dimers are in blue, with pentameric E ion channels in orange. The density of S, M, and E proteins was chosen to be consistent with experiments (38, 39, 40). The diameter of the membrane envelope is ∼100 nm, or ∼140 nm when the S proteins on the virion surface are included. To see this figure in color, go online.

A CG MD simulation was performed on the complete CG virion model using LAMMPS for $10 \times 10^6$ CG time steps (see Video S1). The system was energy minimized using conjugate gradient descent. A temperature of 300 K was maintained with a Langevin thermostat, with a damping constant ($t_{damp}$ = 10 ps) and 100-fs timestep. Statistics were collected every 100 CG time steps. Several radial distribution functions (RDFs) or pair correlations between CG particles were computed for the MD trajectories of individual S proteins and compared to the mapped AA reference statistics from which the models were derived (Fig. 4 A). In general, the CG model captured the positions and peaks in the pair correlation functions; however, error in the fine structure of the peaks was also present, indicating that refinement involving the addition of more expressive CG basis potentials (e.g., splines) may be necessary.

Figure 4.


Analysis of the CG MD simulations of the SARS-CoV-2 virion. (A) RDFs showing the comparison between mapped AA reference statistics and the CG spike model during the MD simulations are given. The measured RDFs are for CG particles that were mapped from the following AA residues of 1) S1,RBD [S459-D467] and S1,NTD [W104-L118], 2) S1,preRBD [E309-R319] and S2 [A852-L861], 3) S2,CH [A1015-K1028], and 4) S2,CTD [Y1215-V1228]. (B) Principal modes of motion of the SARS-CoV-2 virion computed from the CG MD simulation are shown. Arrows are colored from blue to red, indicating the direction of movement (see Videos S2, S3, and S4 for PC1–3, respectively). The first principal component (PC1) accounts for 51% of the total variation observed during simulation, whereas the second (PC2) and third (PC3) account for 12.5 and 7%, respectively. To see this figure in color, go online.

Video S1. CGMD Simulation of the SARS-CoV-2 Virion for $1 \times 10^6$ CG Timesteps

We performed principal component analysis (PCA) on a subset of the CG particles to examine collective modes of motion of the virion (Fig. 4 B). The Cartesian coordinates of one particle for every 15 CG lipids, one for every M and E protein, and one for every 3 S particles were extracted from the trajectory data and used to compute the covariance matrix, $c_{i,j} = \frac{1}{N-1} \sum_{t=1}^{N} \Delta r_i(t) \Delta r_j(t)$, where $\Delta r_i(t) = x_i(t) - \langle x_i \rangle_t$ is the mean-free position vector of particle $i$. The highest-variance eigenmode, PC1 (see Video S2), corresponds to splaying motions in the S1-S2 domain of the S protein and accounts for 51% of the total variance seen during the simulations. Similarly, PC2 (see Video S3) accounts for 12.5% of the variance and corresponds to rocking motions of the S1-S2 domain, whereas PC3 (see Video S4) accounts for 7.0% of the total variance and corresponds to twisting of the S1-S2 during CG MD. In general, there was a high degree of variance in the S protein, and these correlated modes of motion are likely informative of how the virion collectively utilizes spike proteins to explore and detect receptors. Longer CG simulations with more expressive CG models will likely be required to uncover additional modes of motion in the virion, including modes that involve the structural M, N, and E proteins.
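The PCA described above amounts to diagonalizing the covariance matrix of mean-free coordinates. The following Python sketch, using NumPy, shows one way to do this for a trajectory array; the function name `pca` and the array layout are assumptions, and any rigid-body alignment of frames or subsampling of particles (for example `traj[:, ::15]` to keep one of every 15 particles) is left to the caller.

```python
import numpy as np

def pca(traj):
    """Principal components of a CG trajectory.

    traj: (n_frames, n_particles, 3) Cartesian coordinates.
    Returns (fractions, modes): the fraction of total variance carried by
    each component in descending order, and the matching eigenvectors as
    columns of a (3 * n_particles, 3 * n_particles) array.
    """
    x = np.asarray(traj, dtype=float).reshape(len(traj), -1)
    dx = x - x.mean(axis=0)                 # mean-free coordinates
    cov = dx.T @ dx / (len(x) - 1)          # covariance matrix c_ij
    evals, evecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    return evals / evals.sum(), evecs
```

The first entry of `fractions` is the quantity quoted in the text as the share of total variance captured by PC1, and projecting the mean-free coordinates onto a column of `modes` gives the time series of motion along that component.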

Video S2. Mode of Motion of the Virion along Principal Component 1
Video S3. Mode of Motion of the Virion along Principal Component 2
Video S4. Mode of Motion of the Virion along Principal Component 3

Conclusions

This work provides an initial CG molecular model of the SARS-CoV-2 virion and details a bottom-up CG approach capable of further refining the model as new atomistic and experimental data become available. Currently, the lipid envelope is described using a particle-based phenomenological model with a soft tunable bending modulus well suited for large-scale membrane deformations, whereas the M and E proteins are modeled as rigid bodies. Intraspike interactions were developed using REM approaches on the basis of extensive, microsecond AA simulations of the spike protein. The N protein is modeled on the basis of AA simulations of helical oligomers in complex with RNA. Cross-interactions between the lipids and structural proteins used attractive Gaussian potentials between the hydrophobic lipid tails and the transmembrane domains of membrane proteins. This virion model will be iteratively refined and improved as structural, biochemical, and AA trajectory data are publicly released. The construction of an integrated CG model from individual atomistic simulations will also benefit from new developments in systematic methods for ensuring consistency between CG models developed from the reference statistics of those simulations. In particular, methods that variationally optimize in a “divide and conquer” fashion on the basis of joint statistics will likely improve model fidelity. Nonetheless, despite these noted challenges, we find that the behavior of SARS-CoV-2 structural proteins is coupled in the virion.

CG simulations of viral processes have helped elucidate a wide range of mechanisms in viruses. For example, in HIV, CG simulations contributed to the understanding of the self-assembly of the capsid (62) and innate immune sensor recognition and block of viral activity (63), as well as its inhibition by drug molecules (64). Atomistic simulations of ligand binding have also revealed a variety of unexpected, drug-targetable protein-ligand interaction sites (65, 66, 67, 68, 69, 70, 71). It is likely that molecular probes into the processes involving a holistic model of the SARS-CoV-2 virion can help reveal new routes to combat the virus by exploiting viral mechanisms involving large-scale behavior.

The CG virion model is available at https://doi.datacite.org/dois/10.34974%2Fq8ya-wh69 and https://github.com/alvinyu33/sars-cov-2-public. The model will be periodically updated with new versions as data are added and the model refined.

Author contributions

A.Y., A.J.P., P.H., V.M.-G., and G.A.V. designed research. A.Y. performed modeling and analysis on the virion. A.Y. and A.J.P. performed modeling and analysis on the S protein. A.Y. and P.H. performed modeling and analysis on the N protein. A.Y. and V.M.-G. performed modeling and analysis on the M protein. L.C., Z.G., A.C.D., and R.E.A. contributed AA simulation data on the S protein. A.Y., A.J.P., P.H., V.M.-G., L.C., Z.G., A.C.D., R.E.A., and G.A.V. wrote the manuscript.

Acknowledgments

This work was supported in part by the NSF through NSF RAPID grant CHE-2029092 (A.J.P., P.H., and G.A.V.); in part by the National Institute of General Medical Sciences of the National Institutes of Health through grant R01 GM063796 (V.M.-G. and G.A.V.); and in part by National Institutes of Health GM132826, NSF RAPID MCB-2032054, an award from RCSA Research, and a UC San Diego Moore’s Cancer Center 2020 SARS-COV-2 seed grant (L.C., Z.G., A.C.D., and R.E.A.). A.Y. acknowledges support from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under grant F32 AI150208. A.J.P. acknowledges support from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under grant F32 AI150477. Computational resources were provided by the Research Computing Center at the University of Chicago, Frontera at the Texas Advanced Computing Center funded by the NSF grant (OAC-1818253), and the Pittsburgh Supercomputing Center through the Anton 2 machine under grant R01GM116961 from the National Institutes of Health and the specific allocation PSCA17046P. The Anton 2 machine at PSC was generously made available by D.E. Shaw Research.

Editor: Tamar Schlick.

Footnotes

Supporting material can be found online at https://doi.org/10.1016/j.bpj.2020.10.048.

References

  • 1. Berman H.M., Westbrook J., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
  • 2. Berman H., Henrick K., Nakamura H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003;10:980. doi: 10.1038/nsb1203-980.
  • 3. Wrapp D., Wang N., McLellan J.S. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367:1260–1263. doi: 10.1126/science.abb2507.
  • 4. Yan R., Zhang Y., Zhou Q. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science. 2020;367:1444–1448. doi: 10.1126/science.abb2762.
  • 5. Walls A.C., Park Y.-J., Veesler D. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 2020;181:281–292.e6. doi: 10.1016/j.cell.2020.02.058.
  • 6. Shang J., Ye G., Li F. Structural basis of receptor recognition by SARS-CoV-2. Nature. 2020;581:221–224. doi: 10.1038/s41586-020-2179-y.
  • 7. Hillen H.S., Kokic G., Cramer P. Structure of replicating SARS-CoV-2 polymerase. Nature. 2020;584:154–156. doi: 10.1038/s41586-020-2368-8.
  • 8. Kern D.M., Sorum B., Brohawn S.G. Cryo-EM structure of the SARS-CoV-2 3a ion channel in lipid nanodiscs. bioRxiv. 2020 doi: 10.1101/2020.06.17.156554.
  • 9. Surya W., Li Y., Torres J. Structural model of the SARS coronavirus E channel in LMPG micelles. Biochim. Biophys. Acta Biomembr. 2018;1860:1309–1317. doi: 10.1016/j.bbamem.2018.02.017.
  • 10. Frick D.N., Virdi R.S., Silvaggi N.R. Molecular basis for ADP-ribose binding to the Mac1 domain of SARS-CoV-2 nsp3. Biochemistry. 2020;59:2608–2615. doi: 10.1021/acs.biochem.0c00309.
  • 11. Rut W., Lv Z., Olsen S.K. Activity profiling and structures of inhibitor-bound SARS-CoV-2-PLpro protease provides a framework for anti-COVID-19 drug design. bioRxiv. 2020 doi: 10.1101/2020.04.29.068890.
  • 12. Zhang L., Lin D., Hilgenfeld R. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science. 2020;368:409–412. doi: 10.1126/science.abb3405.
  • 13. Sztain T., Amaro R., McCammon J.A. Elucidation of cryptic and allosteric pockets within the SARS-CoV-2 protease. bioRxiv. 2020 doi: 10.1101/2020.07.23.218784.
  • 14. Babuji Y., Blaiszik B., Wagner R. Targeting SARS-CoV-2 with AI- and HPC-enabled lead generation: a first data release. arXiv. 2020 https://arxiv.org/abs/2006.02431 arXiv:2006.02431.
  • 15. Zimmerman M.I., Porter J.R., Bowman G.R. Citizen scientists create an exascale computer to combat COVID-19. bioRxiv. 2020 doi: 10.1101/2020.06.27.175430.
  • 16. Woo H., Park S.-J., Im W. Developing a fully glycosylated full-length SARS-CoV-2 spike protein model in a viral membrane. J. Phys. Chem. B. 2020;124:7128–7137. doi: 10.1021/acs.jpcb.0c04553.
  • 17. Masters P.S. In: Advances in Virus Research. Maramorosch K., Shatkin A.J., editors. Academic Press; 2006. The molecular biology of coronaviruses; pp. 193–292.
  • 18. Cai Y., Zhang J., Chen B. Distinct conformational states of SARS-CoV-2 spike protein. Science. 2020;369:1586–1592. doi: 10.1126/science.abd4251.
  • 19. Neuman B.W., Adair B.D., Buchmeier M.J. Supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy. J. Virol. 2006;80:7918–7928. doi: 10.1128/JVI.00645-06.
  • 20. Siu Y.L., Teoh K.T., Nal B. The M, E, and N structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles. J. Virol. 2008;82:11318–11330. doi: 10.1128/JVI.01052-08.
  • 21. Chang C.K., Hou M.-H., Huang T.H. The SARS coronavirus nucleocapsid protein – forms and functions. Antiviral Res. 2014;103:39–50. doi: 10.1016/j.antiviral.2013.12.009.
  • 22. Yao H., Song Y., Li S. Molecular architecture of the SARS-CoV-2 virus. Cell. 2020;183:730–738.e13. doi: 10.1016/j.cell.2020.09.018.
  • 23. Schoeman D., Fielding B.C. Coronavirus envelope protein: current knowledge. Virol. J. 2019;16:69. doi: 10.1186/s12985-019-1182-0.
  • 24. Amaro R.E., Mulholland A.J. A community letter regarding sharing biomolecular simulation data for COVID-19. J. Chem. Inf. Model. 2020;60:2653–2656. doi: 10.1021/acs.jcim.0c00319.
  • 25. Heo L., Feig M. Modeling of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins by machine learning and physics-based refinement. bioRxiv. 2020 doi: 10.1101/2020.03.25.008904.
  • 26. Srinivasan S., Cui H., Korkin D. Structural genomics of SARS-CoV-2 indicates evolutionary conserved functional regions of viral proteins. Viruses. 2020;12:360. doi: 10.3390/v12040360.
  • 27. Kang S., Yang M., Chen S. Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites. Acta Pharm. Sin. B. 2020;10:1228–1238. doi: 10.1016/j.apsb.2020.04.009.
  • 28. Jo S., Kim T., Im W. CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem. 2008;29:1859–1865. doi: 10.1002/jcc.20945.
  • 29. Watanabe Y., Allen J.D., Crispin M. Site-specific glycan analysis of the SARS-CoV-2 spike. Science. 2020;369:330–333. doi: 10.1126/science.abb9983.
  • 30. Shajahan A., Supekar N.T., Azadi P. Deducing the N- and O- glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2. Glycobiology. 2020 doi: 10.1093/glycob/cwaa042. cwaa042. Published online May 4, 2020.
  • 31. Webb B., Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Bioinformatics. 2016;54:5.6.1–5.6.37. doi: 10.1002/cpbi.3.
  • 32. Drozdetskiy A., Cole C., Barton G.J. JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 2015;43:W389–W394. doi: 10.1093/nar/gkv332.
  • 33. Chen C.-Y., Chang C.K., Huang T.H. Structure of the SARS coronavirus nucleocapsid protein RNA-binding dimerization domain suggests a mechanism for helical packaging of viral RNA. J. Mol. Biol. 2007;368:1075–1086. doi: 10.1016/j.jmb.2007.02.069.
  • 34. Krivov G.G., Shapovalov M.V., Dunbrack R.L., Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins. 2009;77:778–795. doi: 10.1002/prot.22488.
  • 35. Grime J.M.A., Madsen J.J. Efficient simulation of tunable lipid assemblies across scales and resolutions. arXiv. 2019 https://arxiv.org/abs/1910.05362 arXiv:1910.05362.
  • 36. Plimpton S. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 1995;117:1–19.
  • 37. Pak A.J., Dannenhoffer-Lafage T., Voth G.A. Systematic coarse-grained lipid force fields with semiexplicit solvation via virtual sites. J. Chem. Theory Comput. 2019;15:2087–2100. doi: 10.1021/acs.jctc.8b01033.
  • 38. Ke Z., Oton J., Briggs J.A.G. Structures and distributions of SARS-CoV-2 spike proteins on intact virions. Nature. 2020 doi: 10.1038/s41586-020-2665-2. Published online August 17, 2020.
  • 39. Turoňová B., Sikora M., Beck M. In situ structural analysis of SARS-CoV-2 spike reveals flexibility mediated by three hinges. Science. 2020;370:203–208. doi: 10.1126/science.abd5223.
  • 40. Bar-On Y.M., Flamholz A., Milo R. SARS-CoV-2 (COVID-19) by the numbers. eLife. 2020;9:e57309. doi: 10.7554/eLife.57309.
  • 41. van Meer G., Voelker D.R., Feigenson G.W. Membrane lipids: where they are and how they behave. Nat. Rev. Mol. Cell Biol. 2008;9:112–124. doi: 10.1038/nrm2330.
  • 42. Casares D., Escribá P.V., Rosselló C.A. Membrane lipid composition: effect on membrane and organelle structure, function and compartmentalization and therapeutic avenues. Int. J. Mol. Sci. 2019;20:2167. doi: 10.3390/ijms20092167.
  • 43. Jorgensen W.L., Chandrasekhar J., Klein M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935.
  • 44. Best R.B., Zhu X., Mackerell A.D., Jr. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles. J. Chem. Theory Comput. 2012;8:3257–3273. doi: 10.1021/ct300400x.
  • 45. Huang J., Rauscher S., MacKerell A.D., Jr. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods. 2017;14:71–73. doi: 10.1038/nmeth.4067.
  • 46. Phillips J.C., Braun R., Schulten K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005;26:1781–1802. doi: 10.1002/jcc.20289.
  • 47. Casalino L., Gaieb Z., Amaro R.E. Beyond shielding: the roles of glycans in the SARS-CoV-2 spike protein. ACS Cent. Sci. 2020;6:1722–1734. doi: 10.1021/acscentsci.0c01056.
  • 48. Zhang Z., Lu L., Voth G.A. A systematic methodology for defining coarse-grained sites in large biomolecules. Biophys. J. 2008;95:5073–5083. doi: 10.1529/biophysj.108.139626.
  • 49. Lyman E., Pfaendtner J., Voth G.A. Systematic multiscale parameterization of heterogeneous elastic network models of proteins. Biophys. J. 2008;95:4183–4192. doi: 10.1529/biophysj.108.139733.
  • 50. Li L., Li C., Alexov E. On the dielectric “constant” of proteins: smooth dielectric function for macromolecular modeling and its implementation in DelPhi. J. Chem. Theory Comput. 2013;9:2126–2136. doi: 10.1021/ct400065j.
  • 51. Klein S., Cortese M., Chlanda P. SARS-CoV-2 structure and replication characterized by in situ cryo-electron tomography. bioRxiv. 2020 doi: 10.1101/2020.06.23.167064.
  • 52. Izvekov S., Voth G.A. A multiscale coarse-graining method for biomolecular systems. J. Phys. Chem. B. 2005;109:2469–2473. doi: 10.1021/jp044629q.
  • 53. Noid W.G., Chu J.-W., Andersen H.C. The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models. J. Chem. Phys. 2008;128:244114. doi: 10.1063/1.2938860.
  • 54. Shell M.S. The relative entropy is fundamental to multiscale and inverse thermodynamic problems. J. Chem. Phys. 2008;129:144108. doi: 10.1063/1.2992060.
  • 55. Chaimovich A., Shell M.S. Coarse-graining errors and numerical optimization using a relative entropy framework. J. Chem. Phys. 2011;134:094112. doi: 10.1063/1.3557038.
  • 56. Lan J., Ge J., Wang X. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature. 2020;581:215–220. doi: 10.1038/s41586-020-2180-5.
  • 57. Dama J.F., Sinitskiy A.V., Voth G.A. The theory of ultra-coarse-graining. 1. General principles. J. Chem. Theory Comput. 2013;9:2466–2480. doi: 10.1021/ct4000444.
  • 58. Davtyan A., Dama J.F., Voth G.A. The theory of ultra-coarse-graining. 2. Numerical implementation. J. Chem. Theory Comput. 2014;10:5265–5275. doi: 10.1021/ct500834t.
  • 59. Katkar H.H., Davtyan A., Voth G.A. Insights into the cooperative nature of ATP hydrolysis in actin filaments. Biophys. J. 2018;115:1589–1602. doi: 10.1016/j.bpj.2018.08.034.
  • 60. Maragakis P., Karplus M. Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase. J. Mol. Biol. 2005;352:807–822. doi: 10.1016/j.jmb.2005.07.031.
  • 61. Sharp M.E., Vázquez F.X., Voth G.A. Multiconfigurational coarse-grained molecular dynamics. J. Chem. Theory Comput. 2019;15:3306–3315. doi: 10.1021/acs.jctc.8b01133.
  • 62. Grime J.M.A., Dama J.F., Voth G.A. Coarse-grained simulation reveals key features of HIV-1 capsid self-assembly. Nat. Commun. 2016;7:11568. doi: 10.1038/ncomms11568.
  • 63. Yu A., Skorupka K.A., Voth G.A. TRIM5α self-assembly and compartmentalization of the HIV-1 viral capsid. Nat. Commun. 2020;11:1307. doi: 10.1038/s41467-020-15106-1.
  • 64. Pak A.J., Grime J.M.A., Voth G.A. Off-pathway assembly: a broad-spectrum mechanism of action for drugs that undermine controlled HIV-1 viral capsid formation. J. Am. Chem. Soc. 2019;141:10214–10224. doi: 10.1021/jacs.9b01413.
  • 65. Schames J.R., Henchman R.H., McCammon J.A. Discovery of a novel binding trench in HIV integrase. J. Med. Chem. 2004;47:1879–1881. doi: 10.1021/jm0341913.
  • 66. Dror R.O., Green H.F., Shaw D.E. Structural basis for modulation of a G-protein-coupled receptor by allosteric drugs. Nature. 2013;503:295–299. doi: 10.1038/nature12595.
  • 67. Yu A., Lau A.Y. Glutamate and glycine binding to the NMDA receptor. Structure. 2018;26:1035–1043.e2. doi: 10.1016/j.str.2018.05.004.
  • 68. Yu A., Lau A.Y. Energetics of glutamate binding to an ionotropic glutamate receptor. J. Phys. Chem. B. 2017;121:10436–10442. doi: 10.1021/acs.jpcb.7b06862.
  • 69. Yu A., Salazar H., Lau A.Y. Neurotransmitter funneling optimizes glutamate receptor kinetics. Neuron. 2018;97:139–149.e4. doi: 10.1016/j.neuron.2017.11.024.
  • 70. Yu A., Alberstein R., Lau A.Y. Molecular lock regulates binding of glycine to a primitive NMDA receptor. Proc. Natl. Acad. Sci. USA. 2016;113:E6786–E6795. doi: 10.1073/pnas.1607010113.
  • 71. Yu A., Lee E.M.Y., Voth G.A. Atomic-scale characterization of mature HIV-1 capsid stabilization by inositol hexakisphosphate (IP6). Sci. Adv. 2020;6:eabc6465. doi: 10.1126/sciadv.abc6465.

Supplementary Materials

Video S1. CGMD Simulation of the SARS-CoV-2 Virion for 1 × 10^6 CG Timesteps
Video S2. Mode of Motion of the Virion along Principal Component 1
Video S3. Mode of Motion of the Virion along Principal Component 2
Video S4. Mode of Motion of the Virion along Principal Component 3

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society