Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2021 Oct 11;118(43):e2100943118. doi: 10.1073/pnas.2100943118

Distant residues modulate conformational opening in SARS-CoV-2 spike protein

Dhiman Ray a, Ly Le a,b,1, Ioan Andricioaei a,c,2
PMCID: PMC8639331  PMID: 34615730

Significance

The novel coronavirus (SARS-CoV-2) pandemic resulted in the largest public health crisis in recent times. Significant drug design effort against SARS-CoV-2 is focused on the receptor-binding domain (RBD) of the spike protein, although this region is highly prone to mutations causing therapeutic resistance. We applied deep data analysis methods on all-atom molecular dynamics simulations to identify key non-RBD residues that play a crucial role in spike−receptor binding and infection. Because the non-RBD residues are typically conserved across multiple coronaviruses, they can be targeted by broad-spectrum antibodies and drugs to treat infections from new strains that might appear during future epidemics.

Keywords: molecular biophysics, molecular dynamics, virus structure, statistical mechanics, COVID-19

Abstract

Infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) involves the attachment of the receptor-binding domain (RBD) of its spike proteins to the ACE2 receptors on the peripheral membrane of host cells. Binding is initiated by a down-to-up conformational change in the spike protein, the change that presents the RBD to the receptor. To date, computational and experimental studies that search for therapeutics have concentrated, for good reason, on the RBD. However, the RBD region is highly prone to mutations, and is therefore a hotspot for drug resistance. In contrast, we here focus on the correlations between the RBD and residues distant to it in the spike protein. This allows for a deeper understanding of the underlying molecular recognition events and prediction of the highest-effect key mutations in distant, allosteric sites, with implications for therapeutics. Also, these sites can appear in emerging mutants with possibly higher transmissibility and virulence, and preidentifying them can give clues for designing pan-coronavirus vaccines against future outbreaks. Our model, based on time-lagged independent component analysis (tICA) and protein graph connectivity network, is able to identify multiple residues that exhibit long-distance coupling with the RBD opening. Residues involved in the most ubiquitous D614G mutation and the A570D mutation of the highly contagious UK SARS-CoV-2 variant are predicted ab initio from our model. Conversely, broad-spectrum therapeutics like drugs and monoclonal antibodies can target these key distant-but-conserved regions of the spike protein.


The COVID-19 pandemic continues its spread, with more than 160 million confirmed cases worldwide, including 4.1 million deaths by the end of July 2021, according to the World Health Organization. The etiological agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a member of the Coronaviridae family, which includes SARS-CoV-1 (2002−2004) and Middle East respiratory syndrome coronavirus (since 2012), viruses with which SARS-CoV-2 has a sequence identity of 79.6% and 50%, respectively (1, 2). Expressed on the surface of SARS-CoV-2, the spike (S) protein plays a crucial role in infection. It binds to the host angiotensin-converting enzyme 2 (ACE2) through the S protein’s receptor-binding domain (RBD), thereby facilitating viral entry into host cells (3, 4). Therefore, the spike protein is a preponderant target for inhibitors that impede SARS-CoV-2 infection. Assessing the genomic variability of SARS-CoV-2 reveals a moderate mutation rate compared to other RNA viruses, around 1.12 ×103 nucleotide substitutions per site per year (5) [but much larger than DNA viruses (6)]; the SARS-CoV-2 mutation rate is at the same level as SARS-CoV-1 (7). Significantly, the spike protein has been demonstrated to be particularly susceptible to acquiring mutations (5, 810). More specifically, a study analyzing 10,022 SARS-CoV-2 genomes from 68 countries revealed 2,969 different missense variants, with 427 variants in the S protein alone (5). This suggests a strong propensity to form new strains with higher virulence and more complicated epidemiology; the dominant D614G mutation and the recent B.1.1.7 mutant are notable examples (11). Spike protein variability can thus render currently existing therapeutic agents ineffective in combating SARS-CoV-2 and other probable SARS epidemics in the future. It therefore is of fundamental value to understand, in microscopic detail, the role of spike mutations in the structural dynamics that triggers infection. Large-scale screening of therapeutic molecules and antibodies is underway, aiming to target the spike protein and, consequently, to prevent infection. Most of the experimental (1215) and computational (1618) efforts for inhibitor design focus on the RBD, despite the fact that this region is highly mutation prone [see, for example, Verkhivker (19), Spinello et al. (20), and our own sequence alignment study in SI Appendix] and can cause resistance to therapeutics. For instance, the mutations in the RBD observed in the emerging viral lineage in South Africa resulted in up to a 10-fold reduction in the neutralization capacity of conventional antibody therapy (21).

However, domains in the spike other than the RBD are also possible targets for inhibition. The human immune system started generating antibodies specific to residues outside the RBD even at the early stages of the pandemic. Liu et al. (22) extracted, from infected patients, multiple COVID-19 neutralizing antibodies, a fraction of which bind to non-RBD epitopes of the S protein, such as the N-terminal domain (NTD). Moreover, a separate group of antibodies, present in the bloodstream of uninfected humans (particularly, and importantly, in children), were observed to bind specifically to the S2 stem region. This structure has virtually identical sequence in all coronavirus strains, including the ones causing the common cold (23). It is the presence of such universal antibodies, selective to the conserved regions of the spike, that is hypothesized to cause the absence of severe infections in children. These observations motivate an effort toward designing small-molecule drugs and antibodies targeted toward residues far from the RBD in the three-dimensional (3D) structure which, consequently, are less prone to mutation.

A number of studies explored druggable hotspots in the spike protein that modulate, via allostery, ACE2 binding (20, 2426). A concurrent computational study established that RBD mutations appearing in the new strains of SARS-CoV-2 can, via allosteric effects, increase binding affinity of the spike to the human ACE2 receptor as well as impair the binding interactions of neutralizing antibodies (20). Hydrogen deuterium exchange mass spectrometry experiments revealed the dampening and the increase of conformational motion at the stalk region and the proteolytic cleavage site, respectively, upon RBD binding to the ACE2 receptor (26). Although this is a noteworthy approach toward targeting non-RBD residues, the S-ACE2−bound complex, once formed, already can initiate infection. So inhibitors which bind to the ACE2-bound spike are only partially effective in preventing viral entry. We, therefore, focus our study on the effect of distant residues on the dynamics of the structural transition in the spike protein leading to the RBD up conformation, that is, before it becomes posed for binding (Fig. 1). This is because RBD opening plays an obligatory role in the infection by displaying the RBD to the ACE2 receptor (2731). Therapeutics inhibiting this structural transition can therefore prevent ACE2 binding altogether, providing a higher degree of barrier toward the infection.

Fig. 1.

Fig. 1.

The RBD opening transition in the SARS-CoV-2 spike protein: The RBD of the chain shown in green is undergoing the down-to-up transition leading to the binding of the human ACE2 receptor.

The down-to-up transition of the S-protein RBD has been subjected to extensive cryogenic electron microscopy (cryo-EM) studies that elucidated the atomic resolution structures of the closed [Protein Data Bank (PDB) ID: 6VXX (32) and 6XR8 (33)], partially open [PDB ID: 6VSB (34)], and fully open [PDB ID: 6VYB (32)] states. Attempts have been made to study the dynamics of RBD opening using force field−based classical molecular dynamics (MD) simulation (35, 36). However, the direct observation of an RBD opening transition is beyond the scope of atomistic MD simulation, primarily because of the large size of the system and the long timescales involved. Yet, combining multiple short trajectories totaling up to 1.03 ms, the Folding@Home project could resolve various intermediates of the RBD opening transition (36). Alternatively, Gur et al. (30) used steered MD (SMD) in combination with unbiased MD simulations to uncover the mechanism of the conformational opening process. They obtained a free energy landscape and delineated the effect of salt bridges on the transition. Moreover, structure-based coarse-grained modeling could be employed to explore the conformational landscape of multiple configurations of the spike trimer (one-up-two-down, two-up-one-down, three-up, and three-down) (37) and, in a separate study, to identify allosteric communication pathways in the spike protein (19). Despite providing qualitative insight, the absence of explicit solvent and atomistic detail and the exclusion of the glycan shield [which plays dynamic roles beyond shielding itself (35, 38)] obscured a quantitative mechanistic picture of the conformational opening transition.

We took a three-pronged approach to identify the distant residues that show correlated motion coupled to RBD opening and closing. First, as an initial exploration, we pulled the RBD of the closed state at nonequilibrium, to generate a putative open structure. Using potential of mean force calculations, obtaining a nonequilibrium work profile, this structure, as expected, led to a value higher than the equilibrium free energy (20 kcal/mol; SI Appendix, Fig. S1). This is akin to a single-molecule pulling experiment (39). However, we emphasize that we do not use the reaction coordinate (RC) to derive any quantitative data. Given the artificiality of the result obtained by following a single degree of freedom, we resorted to the use of the more sophisticated time-lagged independent component analysis (tICA) coordinates to capture the motion. We then designed and employed a way to identify residues important for the conformational change by quantifying the correlation of the backbone torsion angles of the protein with the slowest degrees of freedom, representing the down-to-up transition of the RBD. Thirdly, we took an alternative route to study allosteric connections, by constructing a protein graph connectivity network model that uses the mutual information metric computed from MD trajectories. Taken together, the three approaches resulted in the prediction of a handful of residues in non-RBD regions of the S protein that play a crucial role in the conformational rearrangements of the spike. These residues suggest possible future mutational hotspots, as well as targets for designing inhibitors that can reduce the flexibility of the RBD, leading to reduced receptor binding capability. In fact, the most ubiquitous spike protein mutation to date, the D614G (40), and the A570D mutation in the recently emerged highly contagious UK SARS-CoV-2 strain (41) both appeared among our predicted set of residues, the latter having been discovered after the completion of our calculations.

Results

Correlation between RBD Opening and Backbone Dihedral Angle.

Multiple unbiased trajectories were propagated from different regions of the S-protein RBD opening conformational space, priorly explored by SMD and umbrella sampling (details in SI Appendix). Three of those trajectories were assigned as the closed, partially open, and fully open states, based on the position of free energy minima along the US RC (see SI Appendix, Fig. S1). The stability of these three conformations were ensured by inspection of the trajectories. The cumulative simulation data were projected onto a feature space composed of pairwise distances between residues from RBD and from other parts of the spike near the RBD (details in SI Appendix); these distances increase during the down-to-up transition. The projected trajectory data were subjected to principal component analysis (PCA) (42) and tICA (4345). The former method calculates the degrees of freedom with the highest variance in the system, while the latter obtains the ones with the longest timescale. These methods quantify large conformation changes in complex biomolecules. The goal of performing PCA and tICA is to find one or two coordinates which best describe the RBD opening motion. As one continuous trajectory hardly samples the transition event, multiple short trajectories spanning a large range of the configuration space were used (46).

The first principal component (PC) and the first time-lagged independent component (tIC), obtained from PCA and, respectively, from tICA analysis, could both distinguish the closed and open states, although the partially open and fully open states could not be distinguished within the first two PCs (SI Appendix, Fig. S2). Yet, the projections of the open- and closed-state trajectories along the first two principal components are in agreement with the results of previous long multimicrosecond spike protein simulations (35). Because of the clear distinction between the closed, the open, and the intermediate, partially open states (Fig. 2B), we chose the first two tICs for our subsequent analysis.

Fig. 2.

Fig. 2.

(A) Structure of spike protein with the residues in RBD shown in green color. Non-RBD residues strongly correlated with the RBD opening motion are represented by spheres (color code: chain A, yellow; chain B, blue; chain C, red). The RBD of chain A is performing down-to-up conformational change. (B) The projection of all unbiased trajectories along the two slowest degrees of freedom (tICs) obtained from tICA analysis. (C) Average number of hydrogen bonds for the highest correlated residues/residue pairs (Table 1) in the closed and the partially open states. (D) Normalized distribution of representative backbone dihedral angles strongly correlated with tIC 1 and tIC 2 (Table 1). The distributions are calculated from closed and partially open state trajectories. The corresponding values of the dihedral angles in the PDB structures are marked in the plot for reference.

Large-scale conformational changes in proteins are, at a fundamental level, stemming from complex combinations of transitions between various states of the protein backbone torsional angles ϕ and ψ. These combinations add up to global displacements, which set the timescale for internal friction (47) and gauge the paradoxically large number of conformational states accessible to a protein as it folds (48). On one hand, concerted transitions of the backbone torsions typically lead to large-scale motion. On the other hand, exponential divergence in nonlinear dynamical systems (49) is such that only certain dihedrals are likely to predominantly effect the conformational changes. We therefore hypothesize that there exist specific residues in the spike protein for which the transition in backbone dihedral states results in the opening of the RBD. Moreover, we conjecture that, given the significant mutation rate, and because of the selection pressure on the virus, the residues with large impact in collective motions that facilitate infectivity are likely to be selected in spike mutants. To test this hypothesis, we calculated the Pearson correlation coefficients of the sines and cosines of all the ϕ and ψ backbone torsion angles of all the residues with the first two tICs. The magnitude of the correlation is found to be significantly large only for a handful of torsion angles, whereas the majority show near-zero correlation (SI Appendix, Fig. S3).

We defined a metric called “correlation score” (CS) for each torsion angle in each residue. The value of the metric is computed as

CS(θ)=|C(cos(θ),IC1)|+|C(sin(θ),IC1)|+|C(cos(θ),IC2)|+|C(sin(θ),IC2)|, [1]

where θ, IC1, and IC2 are the vectors containing the time series of the angle θ, the tIC1, and the tIC2, respectively, and C(x,y) is the Pearson correlation coefficient of datasets x and y. The CS metric can take values from zero to four, and a higher value indicates that the particular torsion angle shows a highly correlated (or anticorretated) motion with the slowest conformational change which, in this case, is the RBD transition. We avoided summing over the Φ and Ψ angles of the same residue, or over residues of different protein chains, as it might average out the contributions from each angle and consequently obscure the process of specifying the role of each individual residue in the conformational transition.

We sorted the residues based on the CSs of their torsion angles, and a list is provided in SI Appendix. The highest values of correlation scores are shown primarily by pairs of consecutive residues, with ψ of the first and the ϕ of the second residue, as depicted in Table 1. This suggests that two consecutive torsion angles in certain regions of the protein are the most highly correlated with the RBD opening motion. Most dihedrals belonged to residues in the loop structure joining the RBD with the S2 stem, as this region is a hinge for the opening of the RBD. This correlation does not exclude causation, since change in the conformational state for two subsequent torsion angles can induce crankshaft motion (50) in the backbone which, propagating along the chains, leads to a change in protein structure.

Table 1.

A list of residues for which the backbone torsional angles are strongly correlated with the first two tIC components

Residue Chain Predominant Angle CS
Gln613 A, B, C ψ 2.62, 2.61, 2.59
Asp614 A, B, C ϕ 2.56, 2.54, 2.46
Cys1082 B ϕ 2.25
His1083 B ψ 2.54
Asp1084 B ϕ 2.49
Pro600 A, B, C ψ 1.56, 1.48, 1.52
Gly601 A, B, C ϕ 2.37 2.31, 2.33
Ile569 A, B, C ψ 2.40, 2.49, 2.46
Ala570 A, B, C ϕ 1.56, 2.22, 2.16
Cys136 B ψ 1.42
Asn137 B ψ,ϕ 1.70, 1.80
Phe833 B ψ 2.37
Ile834 A, B, C ψ 1.73, 2.40, 2.12
A ϕ 1.19
Ser112 C ψ,ϕ 1.67, 2.23

The correlation score for each of the dihedral angles (see text and SI Appendix) is also reported.

The distribution of some of the dihedral angles, with highest CSs, in the closed and partially open states, are depicted in Fig. 2D. Similar plots for all torsion angles with CS>2.0 are provided in SI Appendix. We compared the dihedral angle distribution only beween the closed and the partially open states, as the significance of the artificially prepared fully open structure should not be overemphasized. This structure, generated from the closed state using SMD and US (SI Appendix), is only an approximation for a more exact RBD up structure that binds the ACE2 receptor. In literature, the structure with PBD ID 6VSB is sometimes referred to as the open conformation (35), which, in this work, we refer to as the “partially open” state. So, in the rest of the paper, when we attempt to compare the behavior of a chosen set of residues between closed and open states at an atomistic detail level, we only include the closed and partially open configurations. The fact that the highly correlated residues follow a distinctly different distribution in the backbone torsion angle space (Fig. 2D) indicates that a handful of non-RBD residues can play a pertinent role in the conformational change of the spike and, consequently, in the viral infection. Interestingly, the correlated torsion angles span over all three chains of the spike trimer, namely A (the one undergoing the RBD transition), B, and C (Fig. 2A and Table 1), hinting at the potential role of interresidue couplings ranging over long distances in presenting the RBD to the ACE2 receptor. The residues exhibiting highest correlation scores (Table 1), particularly Gln613, Asp614, Pro600, Gly601, Ile569, and Ala570, are present in the linker region joining the RDB with the S2 domain, which, as mentioned above, is the hinge for the opening motion that presents the RBD to the receptor (51). The Phe833 and Ile834 residues, although technically part of S2 domain, can significantly impact the dynamics of the hinge or linker due to their proximity in 3D structure. Similar arguments are applicable for the NTD domain residues such as Cys136 and Asn137 from chain B and Ser112 from chain C, which are able to impact the RBD due to their structural proximity. Interestingly, residues near the stem region, including Cys1082, His1083, and Asp1084, appear in our list as strongly correlated and can potentially be used as a target for broad-spectrum antibody or vaccine design targeting the stem region (52, 53).

When the virus mutates these particular residues in a way that increases its virulence, this increase stems from the propensity of the RBD to “flip” open and thereby increase ACE2 binding. As evidence, we highlight the example of the D614G mutation, which is already observed in numerous strains of the SARS-CoV-2 all over the world (10, 40). Cryo-EM studies have indicated that the D614G mutation is, by itself, capable of altering the conformational dynamics of spike protein by stabilizing an RBD up state over the down conformation (54, 55). D614 is one of the top-ranked residues predicted from our model for the potential to play a crucial role in RBD opening. A glycine residue has the least backbone torsion barrier for conformational transition in ϕψ space, due to the absence of a side chain. Replacing an Asp residue, which has higher barriers to such transitions, with a glycine can increase the flexibility of the backbone, significantly impacting the probability of observing an RBD up conformation. To understand whether this can be the reason why this particular mutation was selected to become so widespread, a comparison of the geometric and “chemical” effects of Gly should be assessed. To this end, we performed additional simulations of the open and closed states of the D614G mutant. Indeed, our simulations of the D614G mutant spike indicate that, unlike the wild-type (WT) system for which a significantly different dihedral angle distribution exists, there is no difference between the closed and the partially open configurations in terms of the torsion angle space explored by residue G614 (SI Appendix, Fig. S14). The glycine residue at 614 position also experienced different degrees of hydrogen bonding and electrostatic interactions (see below).

A wide range of spike protein mutant sequences have been characterized, each with varying degrees of abundance. A relatively rare mutation, A570V, resulted in a decrease of the overall stability of the spike protein in all three states, based on the FoldX empirical force field (10, 56). Free energy values (10) were obtained from only structural data, and no dynamical information was considered in that study. Yet it is worth noting that the change in total and solvation free energies, due to this mutation, were substantially different for the closed and open states, resulting in a change in ΔG for RBD opening. But, as the side chains of Ala and Val are similar in terms of steric bulk, this mutation is unlikely to significantly impact RBD dynamics. As it likely did not increase the evolutionary advantage of the virus by increasing infectivity, this mutation, only occurring in one strain (10) so far, did not become as prevalent as D614G.

On the contrary, an A570D mutation is observed in the same residue in the newly emerged and highly infectious B.1.1.7 strain in the United Kingdom (41, 57, 58). This mutation is likely to play a pertinent role in infection, as it replaces a hydrophobic amino acid with a charged one. This leads to a significant difference in the conformational dynamics of the 570 residue and consequently impacts the large-scale RBD opening motion. Structural biology experiments have established that a mutation in the A570 residue alters the propensity of RBD opening by modulating the hydrophobic interaction of the hinge region with the S2 core (59). Coarse-grained modeling studies explained this observation by noting that A570 is part of a regulatory switch that triggers the conformation change necessary for receptor binding (60).

The B.1.1.7 also shows a P681H mutation close to the highly correlated N679 residue predicted from our model (SI Appendix, Fig. S5). As this mutation replaces a structurally rigid proline residue, it can possibly impact the conformational space accessible to nearby residues, including N679.

Giving pause for thought, these results indicate that mutations in the highest correlated residues (Table 1) can, in fact, have significant physiological impact in changing the course of the pandemic. Therefore, we provide a list of residues (Table 1 and SI Appendix, Figs. S3–S10), future mutations of which could impact RBD dynamics and consequently change the transmissibility or virulence of SARS-CoV-2. (See Data Availability for access to the raw correlation coefficient data for all residues.) Yet, care should be taken with assigning the predominant role in infection to a single, non-RBD domain residue in the UK variant; several other mutations are present that could modulate the binding affinity to the ACE2 receptor (particularly N501Y in the RBD−ACE2 binding interface). However, mutations outside the RBD can indeed play a key role in infection by disproportionately favoring an RBD “up” structure (52, 53, 59).

For a more detailed understanding, we compared the average number of hydrogen bonds per residue group from Table 1 for the closed and the partially open trajectories (Fig. 2C). We observed significant changes in the number of hydrogen bonds in residue groups: Q613/D614, I569/A570, C1082/H1083/D1084, and C136/N137 (Fig. 3). In the closed state, the carboxylate side chain of D614 residue forms hydrogen bonds with K854 and T859 which are lost in the RBD up configuration. These hydrogen bonds will be absent in D614G mutant and likely reduce the energy cost of the conformational transition. Particularly, the loss of hydrogen bond with T859 has been attributed to the higher stability of the RBD up structure in D614G mutant, by Mansbach et al. (61). Our simulations also indicate that there is a loss of one hydrogen bond in the Q613/D614 residues going from RBD down to RBD up conformation in the WT spike. But such loss of hydrogen bonding is not observed in the case of the D614G mutant (SI Appendix). On the contrary, formation of new hydrogen bonds is observed in the other three residue groups (Fig. 3) which can be enhanced or reduced by mutating the residues involved.

Fig. 3.

Fig. 3.

Representative snapshots of the hydrogen bonding pattern of some of the groups of residues from Table 1 and Fig. 1C. Upper corresponds to the closed state, and Lower shows the partially open state.

Additionally, the nonbonded interaction energies (electrostatic and van der Waals [vdW]) of the residues, from Table 1, differ significantly in the two conformations. Unsurprisingly, the D614 is energetically stabilized in the closed conformation in comparison to the open state, due to additional hydrogen bonds. In the D614G mutant, from our analysis, this stabilization is significantly lower in comparison to the WT (SI Appendix, Fig. S15 and Additional results). However, residues P600/G601 are more stabilized in the open state in comparison to the closed state via favorable Coulomb and vdW interactions. A similar effect is observed in C136 for electrostatic energy but is somewhat compensated for by the opposite trend in vdW energy. A570 and N137 have lower electrostatic energy in the closed state despite having fewer hydrogen bonds. In short, non-RDB residues that experience a different amount of nonbonded interaction with the rest of the protein or show different hydrogen bonding patterns in the “RBD down” and “RBD up” state can impact the relative stability of those two conformations when mutated into residues with different properties.

Mutual Information and Network Model.

A different approach to characterize the coupling between distant regions in a protein is to calculate the cross-correlations between the positions of different residues in 3D space. This method is often used to study allosteric effects upon ligand binding (6267). Conventional implementations compute the dynamic cross-correlation map (DCCM) of the position vectors of the Cα atoms (68). However, DCCM ignores correlated motions in orthogonal directions (67). This problem can be avoided by using a linear mutual information (LMI)-based cross-correlation metric, which we use in the current study (62, 69). The cross-correlation matrix elements, Cij, are given by

Cij=(1e(2/d)Iij)1/2, [2]

where Iij is the LMI computed as

Iij=Hi+HjHij, [3]

with H as the Shannon entropy function,

Hi=p(xi)lnp(xi)dxiHij=p(xi,xj)lnp(xi,xj)dxidxj, [4]

where, for two residues i and j, xi,xj are the 3D Cartesian vectors of atomic coordinates of the corresponding Cα atom, whereas p(xi) and p(xi,xj) indicate, respectively, the marginal probability density for xi and the joint probability density of xi and xj (62, 69). The change in cross-correlation between the apo and holo states of a protein is a gauge that traces allosteric communication in the protein by monitoring the changes in the local correlations between protein residues (62). The spike protein conformational change is not an allosteric process by strict definition, as it does not involve the binding of an effector. However, comparing the LMI cross-correlations between the RBD down and up states can help identify residues which behave differently in different protein conformations. More importantly, the difference of correlation between the closed state and a small perturbed structure toward the conformational opening can hint at the residues which gain or lose contact in the beginning stage of the opening transition and consequently initiate the large-scale motion. So we assigned one of the unbiased trajectories as “slightly open,” indicative of an early stage structure in the pathway of opening. We included the unbiased trajectory, corresponding to this structure, in our subsequent analysis, in which we compared the correlation heat map of all residues in the closed form with the three open states, namely the partially open, fully open, and slightly open state, in order to understand the coupling between RBD opening and protein residue fluctuations.

Overall change in LMI correlation is clearly larger for the fully open state in comparison to the partially open state, as evident from the higher appearance of reddish color (Fig. 4 AD). Unsurprisingly, the change is largest for the RBD and proximal residues encompassing the NTD region (residues 100 to 300) in all chains (Fig. 4 EG), as they lose direct contact during the opening motion. Residues in the RBD−S2 linker region in chains A and C (residues 524 to 700) show a large gain in correlation in the initial stage of RBD opening (“slightly open”) despite being not directly in contact with the RBD in the closed form. On the pathway from close to open, relevant correlation changes are already found in the slightly open state when comparing to the closed state (Fig. 4E), with significant changes for the important RBD−S2 linker region. However, more details appear (in other distant regions) as the transition approaches the partially open and fully open states; compare Fig. 4 F and G.

Fig. 4.

Fig. 4.

(AD) Cross-correlation matrices for four states (A, closed; B, slightly open; C, partially open; D, fully open) computed using LMI. (EG) Difference in correlation matrix elements for the (E) slightly, (F) partially, and (G) fully open states with respect to the closed state. RBD regions form highly correlated blocks (D and G), indicating that these residues are largely decoupled from the rest of the protein. Still, signatures of long-distance correlated motion are detectable.

Overall, these results are consistent with the dihedral angle correlations, described in the previous section: The residues in the loop region next to the RBD exhibit a change in the values of the backbone dihedral angles upon the down-to-up transition. The change in the correlation coefficient (ΔCorrelation) is also large for the RBDs and NTDs of chains B and C, which are in close proximity to chain A of the RBD in the closed state. Additionally, linker residues of chain B show significant gain in correlation upon transitioning to the fully open state. Some residues that gain or lose correlation (blue or red coloration in Fig. 4 EG) are situated at the opposite end (S2 region) of the spike, indicating the presence of long-range correlated motion. This long-distance correlation can indeed be a cumulative effect of many small local fluctuations on the way toward the RBD, along structural patches connecting these sites, allowing “distant” residues to shed their impact on the structural transition in RBD. These pathways can be revealed, for example, by the method of Ota and Agard (70), who used energy flow or vibrational energy relaxation to trace them.

For a more profound insight, we built a protein connectivity graph network model. In it, the Cα atoms of each amino acid are the nodes, and the correlations between them are the edges connecting the nodes. The number of nodes in our system is n = 3,438, which makes it one of the largest systems studied previously with this method (6267) [comparable to the work by Saltalamacchia et al. (71) on a splicosome complex involving 4,804 Cα atoms and 270 phosphorus atoms in the network]. We then calculated the betweenness centrality (BC), a graph theoretical measure that provides a way to quantify the amount of information that flows via the nodes and edges of a network. If a node i is working as a bridge between two other nodes along the shortest path joining them, then the BC of node i is given by

BC(i)=stnstigst, [5]

where gst is the total number of geodesics (shortest paths) joining nodes s and t, out of which nsti paths pass through node i (63). The change of BC in the dynamics of the spike protein has been recently observed using coarse-grained simulation methods (19). Despite the coarseness of the model, a handful of residues participating in the information-propagating pathway could be identified directly from the BC values. In the current work, we used the difference in BC as a metric to identify key residues which gain or lose relative importance along the allosteric information pathway. The difference in the normalized BC is measured by comparing the number for the partially open and fully open states with the closed conformation (i.e., BCslightly openBCclosed, BCpartially openBCclosed, and BCfully openBCclosed for every residue in the spike protein) from our all atom trajectories with explicit solvation. Importantly, our model also includes the highly relevant glycan shield, which was shown to modulate the conformational dynamics of the RBD by favoring a down conformation and functioning as a gate for the conformational opening, beyond its general role in shielding (35, 38). However, glycans were not included in the network analysis. While, in principle, it is valid to consider their role, their motion occurs on timescales that are much faster than those of the protein backbone and would be averaged out of any correlation calculation.

For slightly open, partially open, and fully open states, the residues with significant (e.g., >0.1) change in BC are mostly from the NTD region or RBD region of the B and C chains (Figs. 5 and 6). This suggests that the allosteric information flows through the nearby NTDs and RBDs, and mutations in this region can break the allosteric network (63) and affect the functionality of the spike protein. SARS-CoV-2 neutralizing antibodies were indeed observed to bind these regions of the spike (22). The identified residues are in close proximity to the RBD of chain A, the one undergoing the down-to-up transition.

Fig. 5.

Fig. 5.

Residues with large changes in normalized BC due to RBD opening. (Upper) Largest positive changes; (Lower) negative changes; chain index (A, B, or C) mentioned before the residue number. The location of the residue (NTD, RBD, linker, or S2) is also mentioned next to the residue number.

Fig. 6.

Fig. 6.

(A) Structures of the slightly open, partially open, and fully open states of the spike protein, with residues colored according to the difference of BC with respect to the closed conformation. Red indicates most negative and blue indicates most positive values of BC difference. The RBD of chain A that undergoes opening motion is colored in green. The NTD and S2 domains are also indicated. (B) The values of the difference in centrality with respect to closed state plotted as a function of residue indices. Residues with a BC difference of >0.2 are marked with a circle.

So the connectivity captured in the BC data is primarily due to short-range coupled fluctuations. Such couplings are broken when the RBD and NTD move apart, leading to the change in BC. For the same reason, the BC of the RBD of chain A increases in the fully open state as its internal vibrations become more independent of the rest of the protein.

In a culmination of the above, the most interesting aspect is the strikingly large change in BC of the residues which are distant from the RBD in 3D structure. Significant gain or loss of BC is observed in residues 607, 624, 713, 757, 896, and 1,097. The first three residues are present in the linker region joining the RBD with the S2 domain, while the other three are in the S2 itself. The linker region has a strong impact on the dynamics of the RBD, as we already established from the dihedral angle analysis. The allosteric network analysis reinforces this conclusion. Moreover, the large change of BC in the S2 domain indicates a complex long-range information flow connecting the RBD with the core residues of the protein. Electrostatic and vdW energy analysis, similar to that mentioned in the previous section, has been performed on the residues with changes in BC greater than 0.2. The interaction energy of the S2 domain residues such as I896 and G757 is significantly different for the closed and open states along some of the NTD residues like N188, V6, and S98. This has substantial implications for pharmaceutical design, as mutations within the NTD and the S2 domain can impact the receptor-binding propensity of the viral spike. These results also suggest that therapeutics targeted toward the S2 and toward the RBD−S2 linker can be effective in preventing COVID-19 infection, without complications stemming from the high rate of mutations in the RBD.

Concluding Discussion

To tame the raging pandemic, we need to be able to control the fundamental dynamics of the spike protein. Its motion is key to the infection machinery of SARS-CoV-2. By understanding the role of residues in its structure, we can anticipate the effect of new mutations and customize treatments ahead of time. To curb the negative impact of rapid mutations, we here focused on the allosteric effect of protein residues that are away from its rapidly mutating epitope (the RBD) on the conformational change needed for infection.

We performed MD simulations, with unsupervised machine learning (tICA) and graph theory−based analysis to identify the role of physically distant residues in the dynamics of the RBD in the SARS-CoV-2 spike protein. We correlated the protein backbone torsion angles with the slowest degrees of freedom encompassing the structural transition of the RBD. With this approach, we were able to elucidate a small number of distant non-RBD residues which strongly influence the conformational change of the spike, change that, in turn, leads to binding to the ACE2 receptor and then to infection. Residues in the linker between the RBD and the S2 stem work as a hinge by driving the down-to-up RBD transition via backbone torsional changes. Out of the most correlated residues, D614 ranks close to the top. The D614G mutation is currently observed in SARS-CoV-2, and is becoming widespread among infected patients throughout the world. In vitro experiments also established that the single point mutation D614G is capable of altering the “down” to “up” conformational dynamics of the spike (55). The D614G mutant prefers the one RBD “up” state 7 times more than the “three-down” configuration, while they are equally likely in the WT strain (54). The specific role of the D614G mutation was recently also established by coarse-grained MD study (72). Our model predicted the D614 residue as a key player in RBD dynamics from physics-based atomistic simulations without any prior knowledge of the mutation profile of the spike. With glycine being more flexible in its backbone torsion compared to aspartate, this mutation, according to our hypothesis, will facilitate the attainment of the partially open state transiting from the closed structure. Another mutation, A570V, was observed within our predicted residues, but did not yet become as widespread as D614G, likely because it did not have substantial evolutionary advantage. However, a different mutation at residue 570 (A570D) has indeed appeared in the recently emerged more-contagious B.1.1.7 strain of SARS-CoV-2 (41). The consistency with the mutation profile confirms that our dihedral angle−based analysis can not only find out distant residues impacting RBD dynamics but can also predict residues where future mutations can increase infection capability.

A cross-correlation metric based on LMI was also employed to understand the long-distance coupled motion between RBD and non-RBD residues. The change in LMI correlation primarily takes place in the residues adjacent to the RBD, but we could also distinctly observe long-distance effects. BC of each residue of the spike was computed from a Cα-based graph network model for all three conformational states. The residues showing the largest changes in the BC are concentrated in NTD, RBD, and also the linker regions joining the RBD with the rest of the protein, and also in the S2 domain. Dynamic allostery has been shown to impact the dynamics and consequently the binding strength of ACE2 receptor and antibodies to the mutated spike RBD (20). However, significant change in centrality measures of the non-RBD domains in our study suggests that RBD dynamics is also impacted by long-distance allosteric effects within the spike protein itself; this emerges as a result of the collective internal fluctuations of the amino acid residues. Experiments have not yet confirmed the role of S2 mutations for RBD opening dynamics and for its binding affinity to the ACE2 receptor. However, whenever one observes a significant change in infectivity or virulence in the newly emerging strains of SARS CoV-2, it stems from a combined effect of multiple mutations in the S1 and S2 domains. As our computational approach has revealed that certain residues in the S2 domain can potentially modify the propensity of the conformational transition, the next logical step is to mutate those residues only and observe the effect, which hopefully will be addressed in future experimental studies. Moreover, the S2 domain is also involved in the “tectonic” conformational rearrangement required for the piercing of the host cell membrane (73), which we, however, do not study in the current work.

Although our theoretical predictions of relevant residues, connected to the changes in the RDB dynamics and effectively leading to higher virulence, can seem somewhat speculative, similar MD studies on different proteins (74) could reveal single key residues in the hinge domain driving major conformational changes, which subsequently have been confirmed by single-molecule experiments (75). We also point out that our simulation of the WT spike can, by fiat, only identify the residue to be mutated, but not the amino acid to which the residue will be mutated. However, both experimental and computer simulation studies have already established that D614G, A570D, and a few other mutations transform the dynamics of the RBD and favor an “up” state (54, 55, 59, 61, 72). In the current work, we highlighted a set of residues which show high correlation with RBD opening motion. We refrain from performing additional simulations by mutating those residues, as our method cannot predict the end result of the mutation, and scanning over all possible mutations would be computationally expensive. However, we do show that the dihedral angle preference, interaction energy, and hydrogen bonding pattern of concerned residues change significantly due to a D614G mutation. Furthermore, the topic of whether the RBD opening motion increases infectivity can only be resolved in vivo, given the complexity of the viral entry process. A number of experiments and computational studies indicated that binding to ACE2 is feasible only when the RBD is “up” (2731). This indicates that the down-to-up conformational change is indeed necessary for binding to ACE2. While we did not thoroughly study the dynamics of actual mutant spike proteins, the role of specific point mutations in the dynamics of RBD of the spike protein has been explored using MD simulation in the literature (60, 72).

From the point of view of immediate therapeutic interventions, this study opens up the possibility of designing inhibitors that bind to the regions outside the RBD, thereby preventing infection by freezing RBD dynamics via steric restrictions on specific distant residues. Such treatments are less likely to be affected by the evolutionary adaptations in RBD sequence that the virus performs frequently to evade the immune response. In a starker context, future mutations in these key residues can potentially change the infection rate and virulence, giving rise to new strains and significantly altering the course of the pandemic. Our study and future work in this direction can make the scientific community better prepared for such scenarios and can help in efficient prevention of future outbreaks.

Materials and Methods

The details of MD simulations, tICA analysis, and mutual information-based network analysis are provided in SI Appendix. A brief outline is included below.

System Preparation and Simulation Details.

Glycosylated and solvated structures of closed [PDB ID: 6VXX (32)] and partially open [PDB ID: 6VSB (34)] spike head trimers were obtained from the Chemistry at HARvard Molecular Mechanics–Graphical User Interface (CHARMM-GUI) COVID-19 archive (76). The glycans included in the simulation are those from table 1 of ref. 76. All simulations were performed using the CHARMM36m force field (77). For the purpose of the current work, we considered residues 324 to 518 as the RBD. After minimization and short equilibration, SMD simulations were performed to induce the opening of the closed state and closing of the partially open state of the RBD. RCs were chosen to represent the distance of the RBD from its closed-state position. Multiple structures were chosen from the two SMD trajectories, and two independent US simulations were performed for PDB IDs 6VXX and 6VSB. The RC was restrained by a harmonic potential with force constant of 1 kcalmol−1Å−2 for 35 windows and 26 windows for the first and the second set, respectively. The colvars module (78) was used for SMD and US calculations. Free energy profiles were computed using weighted histogram analysis method (79).

Unbiased Simulations.

US trajectory frames were sampled from the regions near the open, closed, and partially open intermediate states judging the free energy value. Unbiased simulations were performed starting from these frames, resulting in 39 trajectories, each 40 ns long. Three of these trajectories were identified as stable conformations corresponding to the closed, partially open, and fully open structures. These three trajectories were extended to 80 ns. A cumulative 1.7-μs of unbiased simulation data was generated and used in subsequent analysis. An additional 40-ns simulation was performed for the D614G mutant spike for the closed and the partially open states. The structures of the mutated species were generated using UCSF Chimera package (80).

The tICA and Mutual Information.

The tICA and PCA were performed using the pyEMMA package (81) on the entire set of unbiased trajectory data. The feature space for PCA and tICA consisted of pairwise distances between specific residues in and around RBD of chain A and the NTD and core domains.

The LMI-based correlation was computed for the closed, slightly open, partially open, and fully open state trajectories. A graph theory−based network model was constructed with the Cα atom of each residue as node. The edge length between nodes was computed from the cross-correlation values using a previously described procedure (62). BC of each residue was computed for each of the three trajectories and compared. All the LMI and network analysis was performed using the bio3D package (82).

Sequence Alignment.

Iterative sequence alignment of the 67 strains of SARS-CoV-2 spike protein sequences from the PDB database was performed using the Multiple Alignment using Fast Fourier Transform with Database of Aligned Structural Homologs (MAFFT-DASH) program (83) using the G-INS-i algorithm. The sequence of PDB ID 6VXX was used as the template. The alignment was analyzed with the ConSurf server (84) to derive conservation scores for each residue position in the alignment.

Acknowledgments

This work was supported by NSF Grant MCB 2028 443. D.R. acknowledges support by the Molecular Science Software Institute seed COVID-19 fellowship funded by NSF Grant OAC-1547 580. We thank Trevor Gokey for a stimulating discussion. The work has benefited from the computational resources of the University of California, Irvine High Performance Computing cluster.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission. V.S.B. is a guest editor invited by the Editorial Board.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2100943118/-/DCSupplemental.

Data Availability

The MD trajectories are available from Zenodo at doi.org/10.528c1/zenodo.5052691. The codes and the residue correlation data used in this study are available from GitHub at https://github.com/dhimanray/COVID-19-correlation-work. All further details about the methods and the data are available within the article and SI Appendix.

Note Added in Proof

After the paper was accepted we became aware of a concurrent study by Mugnai et al. in long-range allosteric communication between the SARS-CoV-2 spike and ACE2 receptor (85).

References

  • 1.Zhou P., et al., A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Lu R., et al., Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding. Lancet 395, 565–574 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Ou X., et al., Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV. Nat. Commun. 11, 1620 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Tai W., et al., Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: Implication for development of RBD protein as a viral attachment inhibitor and vaccine. Cell. Mol. Immunol. 17, 613–620 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Koyama T., Platt D., Parida L., Variant analysis of SARS-CoV-2 genomes. Bull. World Health Organ. 98, 495–504 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Peck K. M., Lauring A. S., Complexities of viral mutation rates. J. Virol. 92, e01031-17 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Zhao Z., et al., Moderate mutation rate in the SARS coronavirus genome and its implications. BMC Evol. Biol. 4, 21 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.van Dorp L., et al., Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 83, 104351 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Kaushal N., et al., Mutational frequencies of SARS-CoV-2 genome during the beginning months of the outbreak in USA. Pathogens 9, 565 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Laha S., et al., Characterizations of SARS-CoV-2 mutational profile, spike protein stability and viral transmission. Infect. Genet. Evol. 85, 104445 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Isabel S., et al., Evolutionary and structural analyses of SARS-CoV-2 D614G spike protein mutation now documented worldwide. Sci. Rep. 10, 14031 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Rogers T. F., et al., Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model. Science 369, 956–963 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Li W., et al., Rapid identification of a human antibody with high prophylactic and therapeutic efficacy in three animal models of SARS-CoV-2 infection. Proc. Natl. Acad. Sci. U.S.A. 117, 29832–29838 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Zost S. J., et al., Potently neutralizing and protective human antibodies against SARS-CoV-2. Nature 584, 443–449 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Jiang S., Zhang X., Du L., Therapeutic antibodies and fusion inhibitors targeting the spike protein of SARS-CoV-2. Expert Opin. Ther. Targets 25, 415–421 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Smith M., Smith J. C., Repurposing therapeutics for COVID-19: Supercomputer-based docking to the SARS-CoV-2 viral spike protein and viral spike protein-human ACE2 interface. ChemRxiv [Preprint] (2020). 10.26434/chemrxiv.11871402.v4. Accessed 1 December 2020. [DOI]
  • 17.Chowdhury R., Sai Sreyas Adury V., Vijay A., Singh R. K., Mukherjee A., Atomistic de-novo inhibitor generation-guided drug repurposing for SARS-CoV-2 spike protein with free-energy validation by well-tempered metadynamics. Chem. Asian J. 16, 1634–1642 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Han Y., Král P., Computational design of ACE2-based peptide inhibitors of SARS-CoV-2. ACS Nano 14, 5143–5147 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Verkhivker G. M., Molecular simulations and network modeling reveal an allosteric signaling in the SARS-CoV-2 spike proteins. J. Proteome Res. 19, 4587–4608 (2020). [DOI] [PubMed] [Google Scholar]
  • 20.Spinello A., Saltalamacchia A., Borišek J., Magistrato A., Allosteric cross-talk among spike’s receptor-binding domain mutations of the SARS-CoV-2 South African variant triggers an effective hijacking of human cell receptor. J. Phys. Chem. Lett. 12, 5987–5993 (2021). [DOI] [PubMed] [Google Scholar]
  • 21.Greaney A. J., et al., Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies. Cell Host Microbe 29, 463–476.e6 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Liu L., et al., Potent neutralizing antibodies against multiple epitopes on SARS-CoV-2 spike. Nature 584, 450–456 (2020). [DOI] [PubMed] [Google Scholar]
  • 23.Ng K. W., et al., Preexisting and de novo humoral immunity to SARS-CoV-2 in humans. Science 370, 1339–1343 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Di Paola L., Hadi-Alijanvand H., Song X., Hu G., Giuliani A., The discovery of a putative allosteric site in the SARS-CoV-2 spike protein using an integrated structural/dynamic approach. J. Proteome Res. 19, 4576–4586 (2020). [DOI] [PubMed] [Google Scholar]
  • 25.Olotu F. A., Omolabi K. F., Soliman M. E., Leaving no stone unturned: Allosteric targeting of SARS-CoV-2 spike protein at putative druggable sites disrupts human angiotensin-converting enzyme interactions at the receptor binding domain. Inform. Med. Unlocked 21, 100451 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Raghuvamsi P. V., et al., SARS-CoV-2 S protein:ACE2 interaction reveals novel allosteric targets. eLife 10, e63646 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Shang J., et al., Cell entry mechanisms of SARS-CoV-2. Proc. Natl. Acad. Sci. U.S.A. 117, 11727–11734 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Gui M., et al., Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding. Cell Res. 27, 119–129 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Yuan Y., et al., Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains. Nat. Commun. 8, 15092 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Gur M., et al., Conformational transition of SARS-CoV-2 spike glycoprotein between its closed and open states. J. Chem. Phys. 153, 075101 (2020). [DOI] [PubMed] [Google Scholar]
  • 31.Peng C., et al., Computational study of the strong binding mechanism of SARS-CoV-2 spike and ACE2. ChemRxiv [Preprint] (2020). 10.26434/chemrxiv.11877492.v2. Accessed 1 December 2020. [DOI]
  • 32.Walls A. C., et al., Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292.e6 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Cai Y., et al., Distinct conformational states of SARS-CoV-2 spike protein. Science 369, 1586–1592 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Wrapp D., et al., Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Casalino L., et al., Beyond shielding: The roles of glycans in the SARS-CoV-2 spike protein. ACS Cent. Sci. 6, 1722–1734 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Zimmerman M. I., et al., SARS-CoV-2 simulations go exascale to predict dramatic spike opening and cryptic pockets across the proteome. Nat. Chem. 13, 651–659 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Roy S., Jaiswar A., Sarkar R., Dynamic asymmetry exposes 2019-nCoV prefusion spike. J. Phys. Chem. Lett. 11, 7021–7027 (2020). [DOI] [PubMed] [Google Scholar]
  • 38.Sztain T., et al., A glycan gate controls opening of the SARS-CoV-2 spike protein. Nat. Chem., 10.1038/s41557-021-00758-3 (2021). [DOI] [PMC free article] [PubMed]
  • 39.Bustamante C., Alexander L., Maciuba K., Kaiser C. M., Single-molecule studies of protein folding with optical tweezers. Annu. Rev. Biochem. 89, 443–470 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Korber B.et al.; Sheffield COVID-19 Genomics Group , Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus. Cell 182, 812–827.e19 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Rambaut A., et al., Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. https://virological.org/t/preliminary-genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-the-uk-defined-by-a-novel-set-of-spike-mutations/563. Accessed 2 January 2021.
  • 42.Jolliffe I., Principal Component Analysis (Springer-Verlag, New York, NY, ed. 2, 2002). [Google Scholar]
  • 43.Molgedey L., Schuster H. G., Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 72, 3634–3637 (1994). [DOI] [PubMed] [Google Scholar]
  • 44.Pérez-Hernández G., Paul F., Giorgino T., De Fabritiis G., Noé F., Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 139, 015102 (2013). [DOI] [PubMed] [Google Scholar]
  • 45.Schwantes C. R., Pande V. S., Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9. J. Chem. Theory Comput. 9, 2000–2009 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Caves L. S., Evanseck J. D., Karplus M., Locally accessible conformations of proteins: Multiple molecular dynamics simulations of crambin. Protein Sci. 7, 649–666 (1998). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Echeverria I., Makarov D. E., Papoian G. A., Concerted dihedral rotations give rise to internal friction in unfolded proteins. J. Am. Chem. Soc. 136, 8708–8713 (2014). [DOI] [PubMed] [Google Scholar]
  • 48.Levinthal C., How to fold graciously. Mossbauer Spectrosc. Biol. Syst. 67, 22–24 (1969). [Google Scholar]
  • 49.Eckmann J. P., Ruelle D., Ergodic theory of chaos and strange attractors. Rev. Mod. Phys. 57, 617–656 (1985). [Google Scholar]
  • 50.Fadel A. R., Jin D. Q., Montelione G. T., Levy R. M., Crankshaft motions of the polypeptide backbone in molecular dynamics simulations of human type-alpha transforming growth factor. J. Biomol. NMR 6, 221–226 (1995). [DOI] [PubMed] [Google Scholar]
  • 51.Fallon L., et al., Free energy landscapes from SARS-CoV-2 spike glycoprotein simulations suggest that rbd opening can be modulated via interactions in an allosteric pocket. J. Am. Chem. Soc. 143, 11349–11360 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Sauer M. M., et al., Structural basis for broad coronavirus neutralization. Nat. Struct. Mol. Biol. 28, 478–486 (2021). [DOI] [PubMed] [Google Scholar]
  • 53.Nachbagauer R., et al., A chimeric hemagglutinin-based universal influenza virus vaccine approach induces broad and long-lasting immunity in a randomized, placebo-controlled phase I trial. Nat. Med. 27, 106–114 (2021). [DOI] [PubMed] [Google Scholar]
  • 54.Yurkovetskiy L., et al., Structural and functional analysis of the d614g SARS-CoV-2 spike protein variant. Cell 183, 739–751.e8 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Gobeil S. M. C., et al., D614G mutation alters SARS-CoV-2 spike conformation and enhances protease cleavage at the S1/S2 junction. Cell Rep. 34, 108630 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Schymkowitz J., et al., The FoldX web server: An online force field. Nucleic Acids Res. 33, W382-8 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.European Centre for Disease Prevention and Control , Threat Assessment Brief: Rapid increase of a SARS-CoV-2 variant with multiple spike protein mutations observed in the United Kingdom. https://www.ecdc.europa.eu/en/publications-data/threat-assessment-brief-rapid-increase-sars-cov-2-variant-united-kingdom. Accessed 5 January 2021.
  • 58.Meng B.et al.; COVID-19 Genomics UK (COG-UK) Consortium , Recurrent emergence of SARS-CoV-2 spike deletion H69/V70 and its role in the Alpha variant B.1.1.7. Cell Rep. 35, 109292 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Henderson R., et al., Controlling the SARS-CoV-2 spike glycoprotein conformation. Nat. Struct. Mol. Biol. 27, 925–933 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Verkhivker G. M., Di Paola L., Dynamic network modeling of allosteric interactions and communication pathways in the SARS-CoV-2 spike trimer mutants: Differential modulation of conformational landscapes and signal transmission via cascades of regulatory switches. J. Phys. Chem. B 125, 850–873 (2021). [DOI] [PubMed] [Google Scholar]
  • 61.Mansbach R. A., et al., The SARS-CoV-2 Spike variant D614G favors an open conformational state. Sci. Adv. 7, eabf3671 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Rivalta I., et al., Allosteric pathways in imidazole glycerol phosphate synthase. Proc. Natl. Acad. Sci. U.S.A. 109, E1428–E1436 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Negre C. F. A., et al., Eigenvector centrality for characterization of protein allosteric pathways. Proc. Natl. Acad. Sci. U.S.A. 115, E12201–E12208 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.McClendon C. L., Friedland G., Mobley D. L., Amirkhani H., Jacobson M. P., Quantifying correlations between allosteric sites in thermodynamic ensembles. J. Chem. Theory Comput. 5, 2486–2502 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.McClendon C. L., Kornev A. P., Gilson M. K., Taylor S. S., Dynamic architecture of a protein kinase. Proc. Natl. Acad. Sci. U.S.A. 111, E4623–E4631 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Ahuja L. G., Aoto P. C., Kornev A. P., Veglia G., Taylor S. S., Dynamic allostery-based molecular workings of kinase:peptide complexes. Proc. Natl. Acad. Sci. U.S.A. 116, 15052–15061 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Bowerman S., Wereszczynski J., Detecting allosteric networks using molecular dynamics simulation. Methods Enzymol. 578, 429–447 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Ichiye T., Karplus M., Collective motions in proteins: A covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 11, 205–217 (1991). [DOI] [PubMed] [Google Scholar]
  • 69.Lange O. F., Grubmüller H., Generalized correlation for biomolecular dynamics. Proteins 62, 1053–1061 (2006). [DOI] [PubMed] [Google Scholar]
  • 70.Ota N., Agard D. A., Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J. Mol. Biol. 351, 345–354 (2005). [DOI] [PubMed] [Google Scholar]
  • 71.Saltalamacchia A., et al., Decrypting the information exchange pathways across the spliceosome machinery. J. Am. Chem. Soc. 142, 8403–8411 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Verkhivker G. M., Agajanian S., Oztas D., Gupta G., Computational analysis of protein stability and allosteric interaction networks in distinct conformational forms of the SARS-CoV-2 spike D614G mutant: Reconciling functional mechanisms through allosteric model of spike regulation. J. Biomol. Struct. Dyn., 10.1080/07391102.2021.1933594 (2021). [DOI] [PubMed] [Google Scholar]
  • 73.Walls A. C., et al., Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion. Proc. Natl. Acad. Sci. U.S.A. 114, 11157–11162 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Wereszczynski J., Andricioaei I., Free energy calculations reveal rotating-ratchet mechanism for DNA supercoil relaxation by topoisomerase IB and its inhibition. Biophys. J. 99, 869–878 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Seol Y., Zhang H., Pommier Y., Neuman K. C., A kinetic clutch governs religation by type IB topoisomerases and determines camptothecin sensitivity. Proc. Natl. Acad. Sci. U.S.A. 109, 16125–16130 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Woo H., et al., Developing a fully glycosylated full-length SARS-CoV-2 spike protein model in a viral membrane. J. Phys. Chem. B 124, 7128–7137 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Huang J., et al., CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 14, 71–73 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Fiorin G., Klein M. L., Hénin J., Using collective variables to drive molecular dynamics simulations. Mol. Phys. 111, 3345–3362 (2013). [Google Scholar]
  • 79.Roux B., The calculation of the potential of mean force using computer simulations. Comput. Phys. Commun. 91, 275–282 (1995). [Google Scholar]
  • 80.Pettersen E. F., et al., UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004). [DOI] [PubMed] [Google Scholar]
  • 81.Scherer M. K., et al., PyEMMA 2: A software package for estimation, validation, and analysis of Markov models. J. Chem. Theory Comput. 11, 5525–5542 (2015). [DOI] [PubMed] [Google Scholar]
  • 82.Grant B. J., Rodrigues A. P., ElSawy K. M., McCammon J. A., Caves L. S., Bio3d: An R package for the comparative analysis of protein structures. Bioinformatics 22, 2695–2696 (2006). [DOI] [PubMed] [Google Scholar]
  • 83.Rozewicki J., Li S., Amada K. M., Standley D. M., Katoh K., MAFFT-DASH: Integrated protein sequence and structural alignment. Nucleic Acids Res. 47, W5–W10 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Ashkenazy H., et al., ConSurf 2016: An improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44, W344–W350 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Mugnai M. L., Templeton C., Elber R., Thirumalai D.., Role of long-range allosteric communication in determining the stability and disassembly of SARS-COV-2 in complex with ACE2. bioRxiv [Preprint] (2020). 10.1101/2020.11.30.405340. Accessed 31 August 2021. [DOI]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The MD trajectories are available from Zenodo at doi.org/10.528c1/zenodo.5052691. The codes and the residue correlation data used in this study are available from GitHub at https://github.com/dhimanray/COVID-19-correlation-work. All further details about the methods and the data are available within the article and SI Appendix.

After the paper was accepted we became aware of a concurrent study by Mugnai et al. in long-range allosteric communication between the SARS-CoV-2 spike and ACE2 receptor (85).


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES