Abstract
Since the identification of the SARS‐CoV‐2 virus as the causative agent of the current COVID‐19 pandemic, considerable effort has been spent characterizing the interaction between the Spike protein receptor‐binding domain (RBD) and the human angiotensin converting enzyme 2 (ACE2) receptor. This has provided a detailed picture of the end point structure of the RBD‐ACE2 binding event, but what remains to be elucidated is the conformation and dynamics of the RBD prior to its interaction with ACE2. In this work, we utilize molecular dynamics simulations to probe the flexibility and conformational ensemble of the unbound state of the receptor‐binding domain from SARS‐CoV‐2 and SARS‐CoV. We have found that the unbound RBD has a localized region of dynamic flexibility in Loop 3 and that mutations identified during the COVID‐19 pandemic in Loop 3 do not affect this flexibility. We use a loop‐modeling protocol to generate and simulate novel conformations of the CoV2‐RBD Loop 3 region that sample conformational space beyond the ACE2 bound crystal structure. This has allowed for the identification of interesting substates of the unbound RBD that are lower energy than the ACE2‐bound conformation, and that block key residues along the ACE2 binding interface. These novel unbound substates may represent new targets for therapeutic design.
Keywords: coronavirus, molecular dynamics simulation, protein conformation, protein dynamics, protein modeling, SARS‐CoV‐2, spike glycoprotein
1. INTRODUCTION
The ongoing COVID‐19 pandemic is caused by the novel coronavirus SARS‐CoV‐2 (CoV2), first detected in Wuhan, China in late 2019. 1 The CoV2 genome encodes 29 proteins. Among these proteins is the membrane‐anchored spike glycoprotein, a class I membrane fusion protein. The spike protein complex is composed of a homo‐trimeric assembly of monomers containing 1273 residues and 22 N‐linked glycans 2 and is responsible for SARS‐CoV‐2 attachment and entry into host‐cells. The virus attachment to human host‐cells is mediated by the interaction of the viral Spike protein's receptor‐binding domain (RBD) with the host‐cell angiotensin‐converting enzyme 2 (ACE2) receptor (Figure 1). Disruption of the binding interface between the Spike protein RBD and host‐cell ACE2 receptor would provide a means of preventing SARS‐CoV‐2 infection at the very first step.
Several key aspects of the binding interaction between the Spike protein RBD and the human ACE2 receptor have been characterized for both SARS‐CoV‐2 and SARS‐CoV (CoV1), the coronavirus responsible for a previous pandemic in 2002/2003. 3 These include determining the identity of the binding residues that mediate the interaction between the viral Spike RBD and ACE2, the nature of these residue‐level interactions, and the overall strength of the interaction. Both the CoV1‐RBD and CoV2‐RBD binding sites for ACE2 adopt a similar interface (Figure 1), consisting of long, unstructured stretches of 14 residues, which form a range of stabilizing hydrogen bonds, van der Waals contacts, and salt bridges with ACE2. 4 , 5 , 6 The RBD binding interface in general contains four loops (Figure 2A) that have the potential to be dynamic and flexible, both in an unbound state and after binding to the ACE2 receptor. Several groups have previously investigated the conformational dynamics and flexibility of the RBD when in complex with the ACE2 receptor through molecular dynamics (MD) simulations, identifying residues 472–490 (Loop 3) and residues 495–506 (Loop 4), near the binding interface within the receptor binding motif (RBM), as the most flexible regions within the Spike RBD. 5 , 7 , 8 In the case of CoV2, three residues within the flexible Loop 3 of the RBD (F486, N487, and Y489) were identified to participate in stabilizing interactions with ACE2. 4 Interestingly, several antibodies developed to target the RBD have also been found to bind to the flexible Loop 3 (Figure S1A). When antibodies interact with Loop 3, the distribution of Loop 3 conformations is broader than when Loop 3 is not part of the binding interface (Figure S1B). This suggests that Loop 3 has an inherent conformational flexibility that is not observed from static structures of the RBD‐ACE2 complex.
The role that protein dynamics play in mediating protein–protein binding is important not only for understanding the basic mechanisms of binding, but also for the design of protein‐binding therapeutics. 9 , 10
Although much progress has been made in understanding the interaction between the Spike RBD and ACE2, what remains to be elucidated is the flexibility and conformational dynamics of the RBD in an unbound state. The internal motions of proteins play a key role in their interactions and functionality, a fact that is often lost in static structures derived from electron microscopy and X‐ray crystallography. Understanding the conformational ensemble of RBD states without a binding partner may reveal novel targets not observed in static structures of the RBDs, which will aid in the design of therapeutics targeting this important binding domain. In this work, we utilize MD simulations to probe the flexibility and conformational sampling of the SARS‐CoV and SARS‐CoV‐2 RBDs in an unbound state. We focus on the Loop 3 region of the RBD, which contains several residues that participate in stabilizing interactions with ACE2 and is a hot‐spot of several common single amino acid mutations that have been identified during the ongoing COVID‐19 pandemic. We find that Loop 3 represents a localized area of dynamic flexibility in an unbound state, and our simulations suggest that this flexibility is resilient to perturbation by mutations. Finally, using loop‐modeling to probe novel conformations of the Loop 3 region, we have identified interesting substates of the unbound RBD that block the binding interface and are lower energy than the conformation of the RBD bound to ACE2, and thus may represent enticing targets for therapeutic intervention.
2. RESULTS
2.1. Microsecond timescale MD simulations of wild‐type CoV1 and CoV2 RBDs reveal localized flexibility in Loop 3
In the context of the full trimeric Spike protein complex, the unbound binding‐competent state of the RBD corresponds to Spike protein structures with RBDs in an “up” configuration (e.g., PDB ID 6vsb). In this “up” state, the RBD is standing separate and free from the other domains of the protein complex, and in particular the ACE2‐binding interface of the RBD is entirely solvent exposed. As such, multi‐microsecond MD simulations of a single RBD in solution were recorded in order to explore the conformational flexibility and dynamics of the unbound wild‐type spike protein RBDs from CoV1 and CoV2. The initial coordinates used in the MD trajectories were taken from the high‐resolution structures determined by X‐ray diffraction of CoV1‐RBD in complex with a neutralizing antibody (PDB: 2dd8) and of CoV2‐RBD in complex with the human ACE2 receptor (PDB: 6m0j; Figure 1). Glycosylation of the RBD was not included since the two glycosylation sites that have been identified are at the very N‐terminal region of the RBD (N331 and N343), where they are far from the residues that make up the ACE2 binding domain. 11
While it contains a well ordered β‐sheet core, much of the RBD is unstructured (Figure 1) and in particular 4 different loops make up the binding interface (Figure 2A, green and pink) of the RBD with ACE2. Residues 438–450 (CoV2) or residues 425–437 (CoV1) make up Loop 1, residues 455–470 (CoV2) or residues 442–457 (CoV1) make up Loop 2, residues 471–491 (CoV2) or residues 458–477 (CoV1) make up Loop 3, and residues 495–508 (CoV2) or residues 481–494 (CoV1) make up Loop 4. An assessment of the residue level propensity for disorder, using the Protein Disorder prediction System (PrDOS) webserver, 12 indicates that while none of these regions is considered intrinsically disordered, the loop regions of both CoV1‐RBD and CoV2‐RBD do show an increased disordered propensity (Figure 2B) relative to the rest of the RBD. PrDOS was used without template‐based prediction and thus reports only on the disorder probability of the local amino acid composition. A prediction false positive rate of 5% was used, and values above the 50% threshold (dotted line) indicate regions of predicted disorder.
As observed from an analysis of root‐mean‐square deviation (RMSD) with respect to the starting structure from 4 μs MD trajectories (Figure S2A), both CoV1‐RBD and CoV2‐RBD remain in a stable equilibrium conformation over the time‐course of the MD trajectories, with average RMSD values of 1.42 Å for CoV1 and 1.39 Å for CoV2. However, the RMSD of CoV1‐RBD shows several large fluctuations, suggesting that CoV1‐RBD is more conformationally flexible than CoV2‐RBD. Indeed, this is observed in the calculated per‐residue root‐mean‐square fluctuation (RMSF) profiles (Figure 3A) and in snapshots along the MD trajectory (Figure 3B). The CoV2‐RBD does show a localized area of increased flexibility in residues 369–373 relative to CoV1‐RBD, and both CoV1 and CoV2 RBDs have substantial flexibility in the Loop 3 region from ~471 to 491 (Figure 3A, inset), which is part of the large ACE2 binding interface. The average conformations obtained from the MD simulations of CoV1‐RBD and CoV2‐RBD are quite similar (Figure 3C), with the major differences localized to Loop 3 and centered on the conserved disulfide bond between residues 480 and 488 (Figure 3C, enlargement).
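For readers less familiar with the two metrics used above, the following is a minimal sketch of how RMSD and per‐atom RMSF can be computed from coordinates. It assumes the frames have already been superimposed on a common reference; the function names and the two‐atom toy trajectory are illustrative and are not taken from the analysis in this work, which was carried out with CPPTRAJ.

```python
import math

def rmsd(frame, reference):
    """RMSD between two conformations given as lists of (x, y, z) tuples.

    Assumes both conformations are already superimposed.
    """
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))

def rmsf(trajectory):
    """Per-atom RMSF about the trajectory-averaged position of each atom."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Average position of atom i over all frames
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its average position
        msf = sum(
            (frame[i][k] - mean[k]) ** 2 for frame in trajectory for k in range(3)
        ) / n_frames
        fluctuations.append(math.sqrt(msf))
    return fluctuations

# Toy trajectory (hypothetical): atom 0 is rigid, atom 1 moves by +/-1 Å along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_trajectory)
toy_rmsd = rmsd(toy_trajectory[1], toy_trajectory[0])
```

RMSD reports one number per frame relative to a reference structure, whereas RMSF reports one number per atom (or residue) over the whole trajectory, which is why the latter localizes flexibility to Loop 3.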
2.2. Conformational flexibility of Loop 3 in the free RBD
In order to understand the interaction mechanisms of the RBD with binding partners and for design of therapeutics, one needs to understand the conformations accessible in the free state of the RBD prior to binding. The MD simulations starting from the crystal structures show that Loop 3 does display the highest flexibility within the RBD and can sample conformations that are quite different from the crystal structure, yet on average there is not a large deviation from the starting structure. This might result from being trapped near the starting point of the crystallized ACE2‐bound state of the RBD. There are no high‐resolution crystal structures of the unbound CoV2 RBD. Many of the cryo‐EM structures of the trimeric Spike protein that show the RBD in an unbound and binding‐competent (“up”) state contain missing residues corresponding to the important ACE2 binding interface, and in particular the Loop 3 region, suggesting that this region is flexible and can adopt a variable conformational ensemble prior to binding the ACE2 receptor. To better probe the conformational flexibility of the RBD in the free state, and especially of the regions of missing residues observed in cryo‐EM structures, we conducted MD simulations of several unique loop‐model structures of the Loop 3 region (Figure 4). The KinematicMover algorithm within PyRosetta was used to generate 100 new conformations of the Loop 3 region of the CoV2 RBD (PDB: 6m0j) and 100 new conformations of the Loop 3 region (residues 458–477) of the CoV1 RBD (PDB: 2dd8), making sure to maintain the disulfide bonds (CoV2: C480–C488, CoV1: C467–C474). Five of these new conformations were then chosen at random, and subjected to energy minimization and relaxation protocols in PyRosetta as described in Section 4. 
The five energy minimized loop‐models were used as starting structures for 750 ns of MD simulation, and snapshots from these simulations are shown along the outer edges of Figure 4, along with the average structures from each of the five simulations overlaid in the center of the figure and an enlargement of the Loop 3 region shown in the boxes at the bottom.
The loop modeling shows that Loop 3 can take on a vast range of conformations. In general, the loop models represent an increase in free energy, but both CoV1‐RBD and CoV2‐RBD have one loop model that represents an average conformation with lower free energy compared to the crystal structure. The average difference in free energy between the conformations sampled in the simulations from crystal structures versus the simulations of the different loop‐models is summarized in Table 1. These ΔG values were calculated as the difference between the average Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) free energy for each loop model simulation and the average MM/GBSA free energy from the 4 μs simulations described in the previous section. The RMSD of the conformations sampled during each MD simulation was calculated with respect to the starting structure of that specific unique loop model simulation. These RMSDs show large variation between different loop models for both CoV2 (Figure S2C) and CoV1 (Figure S2D), although the CoV1 loop models (1.8–2.7 Å) are clustered at a smaller RMSD average than the CoV2 loop models (2.0–4.8 Å). To better deconvolute the contribution of Loop 3, the backbone RMSD with respect to the starting structure was re‐calculated ignoring the Loop 3 residues (Figure S2C,D) and also by only considering the Loop 3 residues (Figure S2C,D). When viewed in this fashion, it becomes clear that the increase in RMSD observed in the loop‐modeled simulations relative to the crystal structure simulations is a result of the increased flexibility of the new loop conformations. Indeed, when viewed on a per‐residue basis (Figure S3), the overall RMSF profiles of the loop models maintain the same topology as the wild‐type simulations, while displaying drastic increases in the RMSF values of Loop 3 for both CoV1 and CoV2 (maximum RMSF ~8–9 Å, compare to Figure 3A).
The range of conformations probed in the loop modeling, some of which were more energetically favorable than the crystal structure, suggests that Loop 3 is capable of sampling a variety of conformations in solution. Future experimental studies will be necessary to further probe and define the conformational dynamics of the RBD, and especially the Loop 3 region.
TABLE 1.
Loop model | CoV1‐RBD ΔG (kcal/mol) | CoV2‐RBD ΔG (kcal/mol)
---|---|---
1 (black) | 13.2 | 6.7
2 (green) | 12.1 | 15.4
3 (purple) | 31.7 | 14.2
4 (red) | −0.04 | 6.5
5 (blue) | 10.4 | −3.5
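The ΔG values in Table 1 are differences between trajectory‐averaged MM/GBSA free energies. A minimal sketch of that arithmetic is given here; the per‐frame energies are hypothetical placeholders for illustration only and are not data from this work.

```python
def mean(values):
    """Arithmetic mean of a sequence of per-frame energies."""
    return sum(values) / len(values)

def delta_g(loop_model_energies, crystal_energies):
    """Average loop-model energy minus average crystal-structure energy.

    A negative value means the loop model is, on average, lower in energy
    than the ensemble sampled from the crystal structure.
    """
    return mean(loop_model_energies) - mean(crystal_energies)

# Hypothetical per-frame MM/GBSA energies in kcal/mol (illustration only)
crystal_run = [-100.0, -102.0, -98.0]
loop_model_run = [-104.0, -103.0, -105.0]
example_delta_g = delta_g(loop_model_run, crystal_run)
```

With this sign convention, positive entries in Table 1 correspond to loop models that are less favorable than the crystal‐structure ensemble, and negative entries to loop models that are more favorable.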
2.3. Analysis of spike protein mutations accumulated during the COVID‐19 pandemic highlights a mutational hotspot in Loop 3
It is important to consider how mutations perturb the conformation and dynamics of the RBD, especially during an ongoing pandemic as mutations continue to accumulate. 13 , 14 Continued identification and evaluation of mutants is crucial in order to better understand the evolving nature of the pandemic, and to ensure that the treatments and vaccines whose primary target is the Spike protein continue to be effective. As part of a large collaboration to review and characterize the evolution of the SARS‐CoV‐2 proteome in three‐dimensions, an analysis of the SARS‐CoV‐2 genomes deposited into the GISAID database 15 at the end of June 2020 was conducted. A full description of the methods used to analyze the mutations of all of the SARS‐CoV‐2 proteins, including the Spike protein, can be found elsewhere, 16 and the raw data is made freely available. 17 Based on that analysis of 33 290 viral genomes, there are several interesting trends in the mutations accumulated in the Spike protein RBD. First, 444 (1.3%) contained a mutation in the RBD of the Spike protein; of these, 144 unique sequence variants were identified. The identified mutations account for substitutions of 78 individual residues in the RBD (residues: 330–527), with the top 5 substitutions listed in Figure 2C. Among the flexible loop regions, Loop 1 (residues 438–450) contains five unique mutations, Loop 2 (residues 455–470) contains nine unique mutations, Loop 3 (residues 471–491) contains 21 unique mutations, and Loop 4 contains three unique mutations. While all of the flexible regions of the RBD have residues that have been found to be mutated in the current COVID‐19 pandemic, Loop 3 seems to be a particular hotspot of mutation, with 13 out of 20 residues having at least 1 mutation identified. The top 4 most common mutants of Loop 3, based on number of genomes containing these mutants, were chosen to be studied in more detail: T478I, S477N, V483A, and G476S (Figure 2D).
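The per‐loop bookkeeping described above can be sketched as follows. The loop boundaries are the CoV2‐RBD residue ranges defined in Section 2.1, and the genome counts are those quoted in the text; the helper names are hypothetical, and this is not the pipeline used for the published analysis.

```python
from collections import Counter

# CoV2-RBD loop boundaries as defined in the text (inclusive residue numbers)
COV2_LOOPS = {
    "Loop 1": (438, 450),
    "Loop 2": (455, 470),
    "Loop 3": (471, 491),
    "Loop 4": (495, 508),
}

def residue_number(mutation):
    """Extract the residue number from a label such as 'T478I'."""
    return int(mutation[1:-1])

def loop_of(mutation, loops=COV2_LOOPS):
    """Return the name of the loop containing the mutated residue, else None."""
    position = residue_number(mutation)
    for name, (start, end) in loops.items():
        if start <= position <= end:
            return name
    return None

def tally_by_loop(mutations):
    """Count unique mutations per loop (None collects residues outside all loops)."""
    return Counter(loop_of(m) for m in set(mutations))

# The four Loop 3 mutants selected for simulation
selected = ["T478I", "S477N", "V483A", "G476S"]
selected_tally = tally_by_loop(selected)

# Fraction of analyzed genomes carrying an RBD mutation (444 of 33 290)
rbd_fraction = 444 / 33290
```

Applying `loop_of` to the four selected mutants places all of them in Loop 3, and `rbd_fraction` reproduces the roughly 1.3% quoted above.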
2.4. MD simulations reveal that the flexible Loop 3 of CoV2‐RBD is resilient to localized mutations
Based on the four common mutations identified in Loop 3 of the RBD, MD simulations of the single mutants G476S, S477N, T478I, or V483A were performed to observe how they affect the RBD's conformation and dynamics. Using the same starting crystal structure as the wild‐type simulation (PDB: 6m0j), four new starting structures were created by mutating the relevant residue in PyRosetta. 18 These new structures were then subjected to the same energy minimization and equilibration conditions as the wild‐type structure (see Section 4), before collecting 2 μs‐long MD simulations for each under the same conditions as used for the wild‐type simulation. Analysis of the backbone RMSD over the time‐course of each simulation (Figure S2B) shows that all of the mutant structures remain in a relatively stable equilibrium from their respective starting points (average RMSD: G476S 1.65 Å; S477N 1.48 Å; T478I 1.46 Å; and V483A 1.44 Å). A closer look at the fluctuations of the backbone atoms again illustrates similar conformations as observed for the wild‐type simulation through the MD snapshots (Figure 5A). The per‐residue RMSF profiles of the mutant simulations show that there is no significant difference in backbone flexibility between the four mutants in Loop 3 (Figure 5B), although T478I appears to be marginally more perturbative than the other mutants, slightly increasing the flexibility of the Loop 3 region. The average structures show that there is virtually no difference between the backbones of the wild‐type or mutants (Figure 5C). This suggests that the conformational flexibility of Loop 3 is resilient to single mutations, and this resiliency may account for the higher number of mutations observed in this region.
2.5. Cluster analysis of the RBD conformational ensembles from MD trajectories
To better examine the conformational states of the RBD binding interface that were sampled during the MD simulations, and to identify binding and nonbinding conformational states, we performed a cluster analysis on each of the wild‐type and mutant simulations using a hierarchical agglomerative (hieragglo) algorithm. 19 Using an epsilon (ε) cutoff of 1.9 Å, the 22 500 conformations of the CoV1‐RBD and CoV2‐RBD were separated into 51 and 14 clusters respectively, whereas the CoV2‐RBD mutants clustered into fewer groups (G476S: four clusters, S477N: four clusters, T478I: two clusters, and V483A: seven clusters). The average RMSD of the residues in the RBD binding interface with respect to the starting crystal structure was then calculated for each cluster. Clusters with low RMSD then represent conformations that are very similar to the crystal structures of RBD bound to ACE2, while clusters with large RMSD correspond to conformations that are very different from these receptor‐bound states. Figure 6 shows the RBD binding interface of the average conformation of each cluster with the smallest RMSD (i.e., most similar to the bound state) in blue and the largest RMSD (i.e., least similar to the bound state) in pink for each of the MD simulations presented. Interestingly, the biggest difference in conformation is observed with the structures of the largest RMSD clusters of the loop models from both CoV1‐RBD and CoV2‐RBD, where a large portion of Loop 3 is curled back over the binding interface (Figure 6A,B). This conformation of the free RBD may block the binding interface and prevent interactions with ACE2.
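To illustrate how an ε cutoff determines the number of clusters, the sketch below implements average‐linkage hierarchical agglomerative clustering on a precomputed pairwise distance matrix, stopping once the two closest clusters are farther apart than ε. It is a simplified stand‐in for the CPPTRAJ implementation used in this work (no sieving, no structural alignment), and the four‐frame distance matrix is a hypothetical toy example.

```python
from itertools import combinations

def average_linkage(cluster_a, cluster_b, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(dist, epsilon):
    """Merge the closest pair of clusters until that pair is farther apart than epsilon."""
    clusters = [[i] for i in range(len(dist))]  # every frame starts as its own cluster
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance
        (a, b), closest = min(
            (
                ((a, b), average_linkage(clusters[a], clusters[b], dist))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if closest > epsilon:
            break  # remaining clusters are all separated by more than epsilon
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters

# Toy example: four frames placed on a line, so distance = |x_i - x_j| (in Å)
positions = [0.0, 1.0, 5.0, 5.5]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
toy_clusters = agglomerate(toy_dist, epsilon=1.9)
```

In this toy case the frames at 0.0 and 1.0 Å merge, as do those at 5.0 and 5.5 Å, but the two resulting clusters are 4.75 Å apart on average and therefore remain separate at ε = 1.9 Å. A trajectory that yields fewer clusters at a fixed ε is sampling a narrower range of conformations by this measure.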
3. DISCUSSION
Because of the importance of the spike RBD in the initial binding of the SARS‐CoV‐2 viral particle to a host cell, it is important to have an understanding of the conformation and dynamics at all stages of the binding event, including in the unbound state. In particular, modeling and identifying conformations of the free state are informative about the range of conformations that can be targeted by therapeutics. However, many of the RBD structures that have been determined and deposited into the PDB are either incomplete, mainly missing residues in the loop sections of the RBD, or are in a bound state in complex with the ACE2 receptor or various neutralizing antibodies. Much of the large binding interface of the RBD does not adopt well‐defined secondary structure elements, but instead consists of random coil loops. These loop regions (Figure 2A) are not predicted to fall under the definition of intrinsically disordered regions (Figure 2B). However, it is interesting to note that the loops tend to have higher predicted disorder propensity than the rest of the RBD (Figure 2B), a propensity that may be evolutionarily conserved. 20 In addition, the missing residues of these loop regions in many cryo‐EM structures of the spike protein suggest that these loops may be dynamic and sample a conformational ensemble distinct from the bound state. Indeed, in all of the structures investigated in this work the residues within the loop regions show large RMSF values, with Loop 3 having the largest. The fact that Loop 3 is so flexible is quite interesting since this loop is directly adjacent to the binding interface with ACE2, and even provides some stabilizing contacts to residues on ACE2. In order to probe the conformational flexibility of Loop 3 and characterize the unbound conformational space of the RBD beyond the ACE2‐bound state, it was necessary to perturb the loop away from the stable low‐energy state of the RBD‐ACE2 crystal structure.
These loop models again showed very high flexibility of Loop 3 (i.e., large RMSF), while maintaining the same average flexibility in other regions of the RBD as observed in the simulation starting from the CoV1 or CoV2 crystal structures (compare Figure S3 with Figure 3A). This indicates that the bulk of the RBD structure is resilient to change even in the presence of large conformational flexibility of Loop 3. In addition, some of these loop models represent conformations that are more energetically favorable for the unbound RBD in solution (Table 1) on average.
The large and relatively flat binding interface between the RBD and ACE2 represents an interesting protein–protein interaction that provides a challenging target for traditional small molecule therapeutics that typically bind to well‐defined binding pockets on targets such as enzymes. Instead, with the conformational flexibility afforded to the binding interface of the RBD the identification of lowly populated or transient cryptic binding sites should be considered. Cryptic binding sites are difficult to determine in the unbound apo state of a protein, but are generally found in and around dynamic and flexible protein regions, where the inherent conformational fluctuation allows for cryptic sites to become accessible. 21 , 22 , 23 , 24 By comparing the conformations that the CoV1‐RBD and CoV2‐RBD sampled during our MD simulations to the corresponding crystal structure of ACE2 bound RBD, we were able to identify conformations of the dynamic and flexible loop regions that were distinctly different from the bound state of the RBD. In particular, the MD simulations of the different loop models sampled conformations of the CoV2‐RBD that contained stabilizing interactions between the sidechain of Q493 and the backbone of F486, helping to fold Loop 3 over the binding interface of the RBD (Figure 6B) and which would block the normal RBD binding interface with ACE2. Such examples of conformations that can be sampled by the RBD ensemble in solution, which provide natural interruption of the protein–protein binding interface between RBD and ACE2, represent potential targets that create transient and/or cryptic binding sites that can be exploited by therapeutic design.
The impact of mutations on the structure and conformational flexibility of the spike protein, especially during an ongoing pandemic, is of particular concern when designing therapeutics against SARS‐CoV‐2. For example, based on recent structures of the D614G spike protein obtained by cryo‐EM it is now becoming clear that the D614G mutation interferes with a stabilizing interaction between monomers of the trimeric assembly, 25 providing increased infectivity of the virus by ensuring that all three of the RBDs in the spike protein have the flexibility to adopt binding‐competent, open conformations. 25 , 26 , 27 There is thus a persistent need to maintain a current understanding of the impact of mutations that are manifesting during the course of the current pandemic. Several common RBD mutants have been identified previously and their effects on binding to ACE2 have been probed, including N439K, T478I, V483A, G476S, S494P, V483F, and A475V. 13 Ghorbani et al. characterized these mutants in the context of the full RBD‐ACE2 complex (PDB: 6m0j) through MD simulations, showing a stable and overall similar RMSD among the wild‐type and mutants in the extended loop forming the binding interface with ACE2. 13 The binding free energy between the RBD and ACE2 was also found to be consistent between the wild‐type and the mutants with the exception of T478I, which had a binding free energy 6 kcal/mol higher than the wild‐type. These results are consistent with experimental data from deep mutational scanning and flow cytometry, which found that all naturally occurring mutants have a similar degree of expression and a similar binding affinity for ACE2 as in the wild‐type. 14 Our own analysis of the SARS‐CoV‐2 proteome evolution 16 during the current pandemic has identified G476S, S477N, T478I, and V483A as mutations that have appeared in the RBD and are clustered within Loop 3 (Figure 2D). 
Similar to the simulations of the RBD‐ACE2 complex, 13 the MD simulations of the mutants of the unbound RBD in this work do not show large perturbing effects on the average conformational state. This suggests that while these mutants may have an impact on the stability of the binding interface with ACE2, they do not greatly perturb the conformational state of the RBD in solution and may even serve to reduce the conformational sampling of the unbound RBD.
The ongoing COVID‐19 pandemic caused by SARS‐CoV‐2 has focused the collective scientific community to quickly provide both knowledge and action to help alleviate the effects of this crisis. In this work, our data indicate that common mutations identified in the Loop 3 region of the CoV2‐RBD are fairly nonperturbing and do not affect its conformational flexibility and sampling in an unbound solution state, suggesting a therapeutic designed to target this region may be broadly applicable to RBDs with mutations in this region. In addition, we have identified unique conformations of the unbound CoV2‐RBD in solution that naturally block the binding interface with ACE2 and may be interesting targets for drug design to interfere with RBD‐ACE2 binding. We hope that these results will help to catalyze future identification of therapies relevant to CoV2 or to future coronaviruses that may emerge.
4. MATERIALS AND METHODS
4.1. Preparation of initial structural models
The structures used to model the wild‐type RBDs were taken from the Coronavirus Structural Taskforce (https://github.com/thorn-lab/coronavirus_structural_task_force), which further refined the high‐resolution structures determined by X‐ray diffraction of CoV1‐RBD in complex with a neutralizing antibody (PDB: 2dd8) and of CoV2‐RBD in complex with the human ACE2 receptor (PDB: 6m0j). In order to isolate the RBD for subsequent MD simulations, the protein modeling platform PyRosetta 18 was employed to remove the ACE2 receptor residues and RBD glycans from the model, leaving only the clean RBD residues. The four selected mutants (G476S, S477N, T478I, and V483A) were then generated from the clean wild‐type RBD structure by creating a decoy of the wild‐type structure in PyRosetta and restricting for the selected mutation. These mutant decoys were then relaxed based on the ref2015_cst score function within PyRosetta. 28 One‐hundred energy minimized decoys for each mutant were generated in this protocol, and the lowest energy decoy for each mutant RBD was selected as the starting structure for MD simulation.
4.2. Rosetta loop modeling
Loop 3 variant structures of the wild‐type RBD were generated using the lowest energy decoy of the wild‐type RBD, using the same protocol as described for the mutant models. The loop being modeled was defined from residues 472–490 in PyRosetta with jumps in the foldtree introduced at residue 470, 481, and 492. The PyRosetta KinematicMover 18 , 28 was then used to search for a different conformation in the loop carbon backbone with residues 472 and 490 as pivots. Only conformations maintaining the critical disulfide between C480 and C488 were selected to output a decoy, and this protocol was run until 100 decoys had been generated. All 100 decoys were then relaxed based on the ref2015_cst score function using PyRosetta. 28 Once again, 100 energy minimized structures for each initial loop decoy were generated and five loop structures were chosen at random for MD simulations. The lowest energy decoy of each of the five loop structures was used for MD simulation.
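The selection step described above (accept a candidate loop conformation only if the disulfide is maintained, and repeat until 100 decoys are collected) can be sketched generically as follows. This is not PyRosetta code: the distance cutoff, the dictionary layout, and the function names are all assumptions made for illustration, and the criterion actually used to judge whether the disulfide was maintained is not specified in the text.

```python
import math

MAX_SG_SG_DISTANCE = 2.5  # Å; hypothetical cutoff, not a value from the text

def disulfide_intact(sg_a, sg_b, cutoff=MAX_SG_SG_DISTANCE):
    """True if the two cysteine sulfur atoms are within the assumed bonding distance."""
    return math.dist(sg_a, sg_b) <= cutoff

def collect_decoys(candidates, target):
    """Keep candidates with an intact disulfide until `target` decoys are collected."""
    kept = []
    for candidate in candidates:
        if disulfide_intact(candidate["SG_480"], candidate["SG_488"]):
            kept.append(candidate)
            if len(kept) == target:
                break
    return kept

# Toy candidates with hypothetical sulfur coordinates (Å)
toy_candidates = [
    {"name": "a", "SG_480": (0.0, 0.0, 0.0), "SG_488": (2.0, 0.0, 0.0)},  # intact
    {"name": "b", "SG_480": (0.0, 0.0, 0.0), "SG_488": (6.0, 0.0, 0.0)},  # broken
    {"name": "c", "SG_480": (1.0, 1.0, 1.0), "SG_488": (1.0, 3.0, 1.0)},  # intact
    {"name": "d", "SG_480": (0.0, 0.0, 0.0), "SG_488": (0.0, 0.0, 2.0)},  # intact, not reached
]
kept_names = [c["name"] for c in collect_decoys(toy_candidates, target=2)]
```

In the protocol described above the candidates would come from repeated applications of the KinematicMover and `target` would be 100; here a short list and a target of 2 keep the example self‐contained.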
4.3. MD simulations
All of the water molecules in the initial X‐ray structure were removed. Each protein was immersed in a truncated octahedral box of OPC water molecules 29 with the box border at least 20 Å away from any atoms of the RBD. Each system was neutralized by adding 2 Cl− counter ions. The protein was treated with the ff19SB force field. 30 The simulations were performed with the GPU‐enabled CUDA version of the pmemd module in the AMBER 2018 package. 31 Prior to MD simulation, the systems were subjected to energy minimization and equilibration. The minimization started with 1000 steps of steepest descent minimization followed by 1000 steps of conjugate gradient minimization. The system was heated from 0 to 300 K over 100 ps with protein position restraints of 10 kcal mol−1 Å−2. Then a series of equilibrations (each lasting 10 ns) were performed at constant temperature of 300 K and pressure of 1 atm with protein position restraints that were incrementally released (10, 1, 0.1, and 0 kcal mol−1 Å−2). Periodic boundary conditions were used, and electrostatic interactions were calculated by the particle mesh Ewald method, 32 , 33 with the nonbonded cutoff set to 9 Å. The SHAKE algorithm 34 was applied to bonds involving hydrogen, and a 2 fs integration step was used. Pressure was held constant at 1 atm with a relaxation time of 2.0 ps. The temperature was held at 300 K with Langevin dynamics and a collision frequency of 5.0 ps−1. The production runs were 4 μs for wild‐type CoV1‐RBD and wild‐type CoV2‐RBD, 2 μs for the CoV2‐RBD mutants, and 750 ns for the CoV1‐RBD and CoV2‐RBD loop models.
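As a quick consistency check on the simulation lengths, the number of integration steps implied by the 2 fs time step can be worked out as below. The dictionary keys and variable names are illustrative, and the total equilibration time assumes one 10 ns stage per listed restraint value, which is an interpretation of the protocol rather than a stated fact.

```python
TIMESTEP_FS = 2  # integration step from the text, in femtoseconds

# Production lengths from the text, expressed in femtoseconds
PRODUCTION_FS = {
    "wild-type RBDs (4 us)": 4_000_000_000,
    "CoV2-RBD mutants (2 us)": 2_000_000_000,
    "loop models (750 ns)": 750_000_000,
}

def n_steps(length_fs, timestep_fs=TIMESTEP_FS):
    """Number of MD integration steps needed to cover a given simulated time."""
    return length_fs // timestep_fs

production_steps = {name: n_steps(fs) for name, fs in PRODUCTION_FS.items()}

# Position-restraint schedule during equilibration (kcal mol^-1 A^-2), from the text
EQUILIBRATION_RESTRAINTS = [10.0, 1.0, 0.1, 0.0]
STAGE_LENGTH_NS = 10
# Assumption: one 10 ns stage per restraint value
total_equilibration_ns = STAGE_LENGTH_NS * len(EQUILIBRATION_RESTRAINTS)
```

Under these assumptions, each wild‐type production run corresponds to 2 × 10^9 steps, each mutant run to 1 × 10^9 steps, and each loop‐model run to 3.75 × 10^8 steps, preceded by 40 ns of staged equilibration.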
All analyses of MD trajectories, including the RMSD, root‐mean‐square fluctuation (RMSF), hierarchical agglomerative clustering, and the extraction of representative structures from trajectories, were performed using CPPTRAJ 35 as implemented in AMBER18. Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) free energy calculations were performed using MMPBSA.py 36 as packaged with AMBER18, using the modified GBn model (igb = 8) 37 , 38 and averaged over 7500 frames of each simulation. Visualization of structures was performed with UCSF Chimera. 39 Cluster analysis was performed on the ACE2 binding interface of the SARS‐CoV RBD (residues 432–492) and the SARS‐CoV‐2 RBD (residues 445–506) using the average‐linkage hierarchical agglomerative method. 19 Coordinate RMSD was used as the distance metric. The critical distance ε value was set to 1.9 Å and the sieve value was set to 10. Only the backbone C, CA, and N atoms were used in the clustering.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
AUTHOR CONTRIBUTIONS
Jonathan K. Williams, Baifan Wang, Andrew Sam, David A. Case, and Jean Baum designed research. Jonathan K. Williams, Baifan Wang, and Andrew Sam performed research. Jonathan K. Williams, Baifan Wang, Andrew Sam, and Cody L. Hoop analyzed data. All authors contributed to writing and editing the manuscript.
ACKNOWLEDGMENT
This work was supported by a National Institutes of Health Grant GM136431 (Jean Baum) and Rutgers University Center for COVID‐19 Response and Pandemic Preparedness (CCRP2) research grants (Jean Baum and David A. Case).
Williams JK, Wang B, Sam A, Hoop CL, Case DA, Baum J. Molecular dynamics analysis of a flexible loop at the binding interface of the SARS‐CoV‐2 spike protein receptor‐binding domain. Proteins. 2022;90(5):1044‐1053. doi: 10.1002/prot.26208
Funding information: Rutgers University Center for COVID‐19 Response and Pandemic Preparedness, Grant/Award Number: CCRP2; National Institutes of Health, Grant/Award Number: GM136431
DATA AVAILABILITY STATEMENT
All data and protocols are available upon reasonable request to the corresponding author.
REFERENCES
- 1. Chen Y, Liu Q, Guo D. Emerging coronaviruses: genome structure, replication, and pathogenesis. J Med Virol. 2020;92(4):418‐423.
- 2. Watanabe Y, Allen JD, Wrapp D, McLellan JS, Crispin M. Site‐specific glycan analysis of the SARS‐CoV‐2 spike. Science. 2020;369(6501):330‐333.
- 3. Cui J, Li F, Shi Z‐L. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17(3):181‐192.
- 4. Laurini E, Marson D, Aulic S, Fermeglia M, Pricl S. Computational alanine scanning and structural analysis of the SARS‐CoV‐2 spike protein/angiotensin‐converting enzyme 2 complex. ACS Nano. 2020;14(9):11821‐11830.
- 5. Dehury B, Raina V, Misra N, Suar M. Effect of mutation on structure, function and dynamics of receptor binding domain of human SARS‐CoV‐2 with host cell receptor ACE2: a molecular dynamics simulations study. J Biomol Struct Dyn. 2020;1‐15.
- 6. Wan Y, Shang J, Graham R, Baric RS, Li F. Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade‐long structural studies of SARS coronavirus. J Virol. 2020;94(7):e00127‐e00120.
- 7. Nguyen HL, Lan PD, Thai NQ, Nissley DA, O'Brien EP, Li MS. Does SARS‐CoV‐2 bind to human ACE2 more strongly than does SARS‐CoV? J Phys Chem B. 2020;124(34):7336‐7347.
- 8. Ali A, Vijayan R. Dynamics of the ACE2–SARS‐CoV‐2/SARS‐CoV spike protein interface reveal unique mechanisms. Sci Rep. 2020;10(1):14214.
- 9. Lee Y, Lazim R, Macalino SJY, Choi S. Importance of protein dynamics in the structure‐based drug discovery of class A G protein‐coupled receptors (GPCRs). Curr Opin Struct Biol. 2019;55:147‐153.
- 10. Śledź P, Caflisch A. Protein structure‐based drug design: from docking to molecular dynamics. Curr Opin Struct Biol. 2018;48:93‐102.
- 11. Grant OC, Montgomery D, Ito K, Woods RJ. Analysis of the SARS‐CoV‐2 spike protein glycan shield reveals implications for immune recognition. Sci Rep. 2020;10(1):14991.
- 12. Ishida T, Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007;35(2):W460‐W464.
- 13. Ghorbani M, Brooks BR, Klauda JB. Critical sequence hotspots for binding of novel coronavirus to angiotensin converter enzyme as evaluated by molecular simulations. J Phys Chem B. 2020;124(45):10034‐10047.
- 14. Starr TN, Greaney AJ, Hilton SK, et al. Deep mutational scanning of SARS‐CoV‐2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell. 2020;182(5):1295‐1310.
- 15. Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data ‐ from vision to reality. EuroSurveillance. 2017;22(13).
- 16. Lubin JH, Zardecki C, Dolan EM, et al. Evolution of the SARS‐CoV‐2 proteome in three dimensions (3D) during the first six months of the COVID‐19 pandemic. bioRxiv. 2020.2012.2001.406637. 2020.
- 17. Lubin JH, Zardecki C, Dolan EM, et al. Evolution of the SARS‐CoV‐2 Proteome in Three Dimensions (3D) during the First Six Months of the COVID‐19 Pandemic—Supplementary tables and models. In: Zenodo; 2020.
- 18. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script‐based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26(5):689‐691.
- 19. Shao J, Tanner SW, Thompson N, Cheatham TE. Clustering molecular dynamics trajectories: 1. Characterizing the performance of different clustering algorithms. J Chem Theory Comput. 2007;3(6):2312‐2334.
- 20. Sanyal D, Chowdhury S, Uversky VN, Chattopadhyay K. An exploration of the SARS‐CoV‐2 spike receptor binding domain (RBD) – a complex palette of evolutionary and structural features. bioRxiv. 2020:2020.2005.2031.126615.
- 21. Jubb H, Blundell TL, Ascher DB. Flexibility and small pockets at protein–protein interfaces: new insights into druggability. Prog Biophys Mol Biol. 2015;119(1):2‐9.
- 22. Makley LN, Gestwicki JE. Expanding the number of ‘Druggable’ targets: non‐enzymes and protein–protein interactions. Chem Biol Drug Des. 2013;81(1):22‐32.
- 23. Vajda S, Beglov D, Wakefield AE, Egbert M, Whitty A. Cryptic binding sites on proteins: definition, detection, and druggability. Curr Opin Chem Biol. 2018;44:1‐8.
- 24. Beglov D, Hall DR, Wakefield AE, et al. Exploring the structural origins of cryptic sites on proteins. Proc Natl Acad Sci. 2018;115(15):E3416‐E3425.
- 25. Yurkovetskiy L, Wang X, Pascal KE, et al. Structural and functional analysis of the D614G SARS‐CoV‐2 spike protein variant. Cell. 2020;183(3):739‐751.
- 26. Korber B, Fischer WM, Gnanakaran S, et al. Tracking changes in SARS‐CoV‐2 spike: evidence that D614G increases infectivity of the COVID‐19 virus. Cell. 2020;182(4):812‐827.
- 27. Zhang L, Jackson CB, Mou H, et al. The D614G mutation in the SARS‐CoV‐2 spike protein reduces S1 shedding and increases infectivity. bioRxiv. 2020.2006.2012.148726. 2020.
- 28. Alford RF, Leaver‐Fay A, Jeliazkov JR, et al. The Rosetta all‐atom energy function for macromolecular modeling and design. J Chem Theory Comput. 2017;13(6):3031‐3048.
- 29. Izadi S, Anandakrishnan R, Onufriev AV. Building water models: a different approach. J Phys Chem Lett. 2014;5(21):3863‐3871.
- 30. Tian C, Kasavajhala K, Belfon KAA, et al. ff19SB: amino‐acid‐specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J Chem Theory Comput. 2020;16(1):528‐552.
- 31. AMBER 2018 [computer program]. University of California, San Francisco; 2018.
- 32. Darden T, York D, Pedersen L. Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J Chem Phys. 1993;98(12):10089‐10092.
- 33. Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. J Chem Phys. 1995;103(19):8577‐8593.
- 34. Ryckaert J‐P, Ciccotti G, Berendsen HJC. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n‐alkanes. J Comput Phys. 1977;23(3):327‐341.
- 35. Roe DR, Cheatham TE. PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J Chem Theory Comput. 2013;9(7):3084‐3095.
- 36. Miller BR, McGee TD, Swails JM, Homeyer N, Gohlke H, Roitberg AE. MMPBSA.py: an efficient program for end‐state free energy calculations. J Chem Theory Comput. 2012;8(9):3314‐3321.
- 37. Nguyen H, Roe DR, Simmerling C. Improved generalized born solvent model parameters for protein simulations. J Chem Theory Comput. 2013;9(4):2020‐2034.
- 38. Mongan J, Simmerling C, McCammon JA, Case DA, Onufriev A. Generalized born model with a simple, robust molecular volume correction. J Chem Theory Comput. 2007;3(1):156‐169.
- 39. Pettersen EF, Goddard TD, Huang CC, et al. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605‐1612.