PLOS Computational Biology. 2021 Apr 1;17(4):e1008790. doi: 10.1371/journal.pcbi.1008790

Computational epitope map of SARS-CoV-2 spike protein

Mateusz Sikora 1,2,#, Sören von Bülow 1,#, Florian E C Blanc 1,#, Michael Gecht 1,#, Roberto Covino 1,3,#, Gerhard Hummer 1,4,*
Editor: Alexander MacKerell
PMCID: PMC8016105  PMID: 33793546

Abstract

The primary immunological target of COVID-19 vaccines is the SARS-CoV-2 spike (S) protein. S is exposed on the viral surface and mediates viral entry into the host cell. To identify possible antibody binding sites, we performed multi-microsecond molecular dynamics simulations of a 4.1 million atom system containing a patch of viral membrane with four full-length, fully glycosylated and palmitoylated S proteins. By mapping steric accessibility, structural rigidity, sequence conservation, and generic antibody binding signatures, we recover known epitopes on S and reveal promising epitope candidates for structure-based vaccine design. We find that the extensive and inherently flexible glycan coat shields a surface area larger than expected from static structures, highlighting the importance of structural dynamics. The protective glycan shield and the high flexibility of its hinges give the stalk overall low epitope scores. Our computational epitope-mapping procedure is general and should thus prove useful for other viral envelope proteins whose structures have been characterized.

Author summary

The SARS-CoV-2 virus has caused a global health crisis. The spike protein exposed at its surface is key for infection and the primary antibody target. However, spike is covered by highly mobile glycan molecules that could impair antibody binding. To identify accessible epitopes, we performed molecular dynamics simulations of an atomistic model of glycosylated spike embedded in a membrane. By combining extensive simulations with bioinformatics analyses, we recovered known antibody binding sites and identified several epitope candidates as targets for further vaccine development.

Introduction

The ongoing COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, has emerged as the most challenging global health crisis within a century [1]. Vaccination is the most promising strategy to end the pandemic. As for other enveloped viruses [2], the primary vaccine target is the trimeric spike (S) protein on the envelope of SARS-CoV-2. S mediates viral entry into the target cell [3–7]. After binding to the human angiotensin-converting enzyme 2 (ACE2) receptor, the ectodomain of S undergoes a drastic transition from a prefusion to a postfusion conformation. This transition drives the fusion between viral and host membranes, which triggers internalization of SARS-CoV-2 via endocytic and possibly non-endocytic pathways [8, 9]. Locking the prefusion conformation of S or blocking its interaction with ACE2 would prevent cell entry and infection, a task achieved by a growing number of neutralizing antibodies [10–16].

Structure-based rational design promises improvements in vaccine efficacy [17] and could lead to therapeutic cocktails that minimize the risk of immune evasion by using epitopes on non-overlapping regions of S [18]. A detailed understanding of the exposed viral surface will, therefore, be instrumental [17].

Thanks to the extraordinary response of the global scientific community, we already have atomistic structures of S [6, 7, 19, 20] and detailed views of the viral envelope [21–24]. However, static structures do not capture conformational changes of S or the motion of the highly dynamic glycans covering it. Molecular dynamics (MD) simulations add a dynamic picture of S and its glycan shield [23, 25–27]. Intriguingly, several groups have shown experimentally that glycans not only shield the S protein but also play a role in the infection mechanism [26, 28, 29].

Here, we report on a 2.5 μs-long MD simulation of a full-length atomistic model of four S trimers in the prefusion conformation, amounting to 10 μs of S dynamics. The model includes the transmembrane domain (TMD) embedded in a complex lipid bilayer, along with realistic post-translational modification patterns, i.e., glycosylation of the ectodomain and palmitoylation of the TMD. Although our S protein model was developed independently, we have recently shown that the model and its structural dynamics are in quantitative agreement with recent high-resolution electron cryo-tomography (cryoET) reconstructions [23]. Intact virions present their S proteins either individually, in small groups, or in large clusters, in a strikingly random distribution [23]. To quantify the effect of this heterogeneous distribution, we analyzed the accessibility of S epitopes both in isolation (i.e., with surrounding S proteins removed) and in the dense packing of our simulation model.

We identify epitope candidates on SARS-CoV-2 S by combining information on steric accessibility and structural flexibility with bioinformatic assessments of sequence conservation and epitope characteristics. We recover known epitopes in the ACE2 receptor-binding domain (RBD) and identify several epitope candidates on the spike surface that are exposed, structured, and conserved in sequence. In particular, target sites for antibodies emerge in the functionally important S2 domain harboring the fusion machinery.

Results

Model of full-length S

To search for possible epitopes, we constructed a detailed structural model of glycosylated full-length S. Whereas high-resolution structures of the S head are available [6, 7], the stalk and membrane anchor have so far not been resolved at the atomic level.

We built a model of the complete S by combining experimental structural data and bioinformatic predictions. Our full-length model of the S trimer consists of the large ectodomain (residues 1-1137) forming the head, two coiled-coil (CC) domains, denoted CC1 (residues 1138-1158) and heptad repeat 2 (HR2, residues 1167-1204), forming the stalk, the α-helical TMD (residues 1212-1237) with flanking amphipathic helices (AH, 1243-1255) and multiple palmitoylated cysteines, and a short C-terminal domain (CTD, residues 1256-1273); see S1(A) Fig for domain definitions. We modeled the glycosylation pattern as recently revealed for overexpressed S [30] (see S1(B) Fig). Despite passage through an intact Golgi, expressed glycans closely resemble those of native SARS-CoV S [31].

As shown [23], the model fits high-resolution cryoET electron density data of S proteins on the surface of virions extracted from a culture of infected cells remarkably well. It also captures the stalk domain with its three flexible hinges between the S head and CC1, CC1 and HR2, and HR2 and the TMD. The tomographic maps also confirmed the extensive glycosylation of the model [23].

Multi-microsecond atomistic MD simulations reveal dynamics of S and its glycan shield

We performed a 2.5 μs long atomistic MD simulation of a viral membrane patch with four flexible S proteins, embedded at a distance of about 15 nm [32, 33] (Fig 1). During the simulation, the four S proteins remained folded (S2(G)–S2(J) Fig) and stably anchored in the membrane with well-separated TMDs.

Fig 1. View of the simulated atomistic model containing four glycosylated and membrane-anchored S proteins in a hexagonal simulation box.


Three proteins are shown in surface representation with glycans represented as green sticks. One protein is shown in cartoon representation, with the three chains colored individually and glycans omitted for clarity. Water is shown as a transparent blue surface and ions are omitted for clarity. Two simulation box edges are not drawn for better visibility.

The S heads tilted dynamically and interacted with their neighbors (S1 Movie). High-resolution cryoET images [23] and a recent MD study [26] independently revealed significant head tilting associated with the flexing of the joints in the stalk, in strong support of our observations. Being highly mobile, the glycans cover most of the S surface (Fig 2A–2C).

Fig 2. S glycan dynamics from MD simulations.


Time-averaged glycan electron density isosurfaces are shown at high (A), medium (B), and low (C) contour levels, respectively. The blue-to-white protein surface indicates high-to-low accessibility in ray analysis. (Inset) Snapshots (sticks) of a biantennary, core-fucosylated and sialylated glycan at position 1098 along the MD trajectory.

Antibody binding sites predicted from accessibility, rigidity, sequence conservation, and sequence signature

Accessibility of the S ectodomain

Antibody binding requires at least transient access to epitopes. The glycan shield covering the surface proteins of enveloped viruses can sterically hinder access to these binding sites, helping SARS-CoV-2 to evade a robust immune response [34]. We assessed the accessibility of S on the viral membrane and the surface coverage by glycans by (i) ray and (ii) antigen-binding fragment (Fab) docking analyses of the S configurations in our MD simulations. In the ray analysis, we illuminated the protein model by diffuse light; in the Fab docking analysis, we performed rigid-body Monte Carlo simulations of S configurations taken from the MD simulations together with the SARS-CoV-2 antibody CR3022 Fab to quantify how easily a Fab antibody could access the surface of S. To account for protein and glycan mobility, we performed both analyses individually for 4 × 250 snapshots taken at 10 ns time intervals from the 2.5 μs MD simulation with four glycosylated S proteins.

The dynamic glycan shield effectively covers the S surface (Fig 3A and 3B). Even though glycans cover only a small fraction of the protein surface at any given moment (Fig 1), their high mobility leads to a strong steric shielding of S (Fig 2). A comparison of the ray (S3(A)–S3(C) Fig) and Fab docking results (S3(D)–S3(I) Fig) for glycosylated and unglycosylated S illustrates this effect. We consider ray and docking analyses to be complementary: The ray analysis provides an upper bound to the accessibility because the thin rays can penetrate more easily through the glycan shield than antibodies, whereas the rigid-body docking gives a lower bound because it does not take into account any induced fit from interactions between glycans and antibody. Importantly, the two methods are consistent in identifying regions of high and low accessibility (Fig 4A and 4B). Ray and docking analyses show that glycans cause a reduction in accessibility by about 34% and 80%, respectively (Table B in S1 Text). The most marked effect occurs in the HR2 coiled coil close to the membrane. Without glycosylation, HR2 is fully accessible; with glycosylation, HR2 becomes inaccessible to Fab docking. Whereas small molecules may interact with the HR2 protein stalk, antibodies are blocked from surface access, in agreement with recent independent simulations [26].

Fig 3. Epitopes identified from MD simulations and bioinformatics analyses.


Accessibility scores from (A) ray analysis and (B) Fab rigid-body docking are combined with (C) rigidity scores, all averaged over 4 × 2.5 μs of S protein MD simulations. Also included are (D) a sequence conservation score [35], and (E) BepiPred-2.0 epitope sequence-signature prediction. (F) Combined epitope score. (G) Binding sites of known neutralizing antibodies. Higher color intensity in A-F indicates a higher score and higher color intensity in G indicates sites binding to multiple different antibodies.

Fig 4. Epitope scores of the S ectodomain.


Panels (A-F) and colors as in Fig 3. All values are filtered and normalized (see Methods). Labels E1–E9 in (F) highlight candidate epitopes. Green lines indicate glycosylation sites. Black rectangles show known antibody binding sites, also indicated in black along the S sequence in the bottom box.

On SARS-CoV-2 virions, S proteins occasionally form dense clusters, which may enhance the avidity of the interactions with human host cells [23]. To quantify the effect of crowding, we compared the epitope accessibility of S from the ray and docking analyses in the dense simulation system (S3(C) and S3(I) Fig) and in isolation (S3(B) and S3(H) Fig), i.e., with the other proteins removed from the MD system. Overall, protein crowding reduced the accessibility of S by another ∼ 5% in the ray analysis and ∼ 6% in the Fab docking analysis, resulting in a combined accessibility reduction by glycans and crowded proteins of ∼ 39% and ∼ 86%, respectively.

Rigidity of S

Structured epitopes are expected to bind strongly and specifically to antibodies. By contrast, mobile regions tend to become structured in the bound state, which entails a loss in entropy, and they may not retain their structure when presented in a vaccine construct. With the aim of eliciting a robust immune response, we chose to include rigidity in our epitope score. Here, we focus on motions of domains on the scale of about 1 nm. We analyzed large-scale conformational dynamics associated with the flexible hinges in the stalk and membrane anchor in another paper [23]. We determined the root-mean-square fluctuations (RMSF) by superimposing protein structures and converted the RMSF into a rigidity score, as described in Methods.

The surface of S presents both dynamic and rigid regions (Fig 4C). Interestingly, the RBD and its surroundings are comparably flexible, consistent with the experimental finding of large differences in the structure of the three peptide chains in open and closed states [7]. By contrast, the protein surface of the S2 domain covering the fusion machinery is relatively rigid (Fig 4C), possibly to safeguard this functionally critical domain in the metastable prefusion conformation.

Sequence conservation

Targeting epitopes whose sequences are highly conserved will ensure efficacy across strains and prevent the virus from escaping immune pressure through mutations with minimal fitness penalty. We estimated the sequence conservation from the naturally occurring variations at each amino acid position in the sequences collected and curated by the GISAID initiative (https://www.gisaid.org/). The analysis of 30,426 amino acid sequences revealed that S is overall highly conserved, with no mutation recorded for 52% of the amino acid positions. As conservation score, we mapped the entropy at each position to the interval between zero and one (see Methods). Even surface regions are mostly well conserved in sequence (Fig 4D).

Sequence-based immunogenicity predictor

Conserved, rigid, and accessible regions present good candidates for binding of protein partners in general. To complement this information, we assessed the immunogenic potential based on sequence signatures targeted by antibodies. The epitope-like motifs in the S sequence identified by using the BepiPred 2.0 server [36] lie scattered across the S ectodomain and include known epitopes (Figs 3E and 4E), but also contain buried regions inaccessible to antibodies.

Consensus epitope score

We combined our accessibility, rigidity, conservation, and immunogenicity scores into a single consensus epitope score (Figs 3F and 4F). By taking the product of all individual scores, we ensured that epitope candidates have high scores in all features. This stringent requirement eliminates many candidate sites, mostly because accessibility scores (Fig 4A and 4B) and the rigidity score (Fig 4C) show opposite trends, in line with the extensive occurrence of flexible loops on the S surface.

Using our consensus score, we identified nine epitope candidates (E1–E9; Fig 5 and Table 1). Epitope candidates E3–E6 recover known epitopes (Figs 3F, 3G and 4F), in some cases achieving residue-level accuracy (S4 Fig); in addition, we identify epitope candidates E1, E2, and E7–E9. All epitope candidates reside in the structured head of S. By contrast, low accessibility and high flexibility in the hinges [23] give the stalk low overall epitope scores.

Fig 5. S epitope candidates.


(A) Top view of S represented as in Fig 3F. Epitope candidates are labeled according to Table 1. (B) Side view with coloring and labels as in A. (C) Zoom-ins on epitope candidates (E1, E2, E7–E9) in a cartoon representation and colored as in A. Residues with an epitope consensus score >0.2 are shown in yellow licorice representation.

Table 1. Epitope candidates shown in Figs 3–5.
Epitope Residues
E1 15-28, 63-79, 247-260
E2 97, 178-189, 207-219
E3 137-164
E4 332-346
E5 403-406, 438, 440-451, 495-506
E6 452-476, 479-482, 484-494
E7 527-537
E8 603-605, 633-642, 656-661, 674-693
E9 808-814

In our simulation, two of the RBD domains were sampled in the closed conformation and one in the open conformation. To assess the effect of open and closed states, we computed the accessibility and consensus scores for all eight protein chains with closed RBD conformation (S10 Fig). A comparison with Fig 4 highlights that the only sizable differences in accessibility occur in the RBD region.

Crowding of S causes a significant drop in the score of epitope candidates E1 and E2, whereas it has only little effect on the candidates E3–E4 and E7–E8, and no effect on the candidates E5–E6 and E9 (S5 Fig).

Collective behavior of S

Despite the remarkably random distribution of S at the surface of the virions, densely populated patches are not uncommon (cf. Fig 5G in [23]). Multiple S will likely come in contact if simultaneously bound to a single ACE2 dimer or when cross-linked by antibodies. Taking advantage of our simulation setup, we analyzed interactions between S. During 2.5 μs, we observed a number of contacts involving both glycans and protein surfaces, which resulted in partial jamming of three of the S in a characteristic triangular arrangement of S head domains, with the fourth S only weakly interacting with two of its neighbors (S9(A) Fig). In the jammed state, glycans formed an extensive network of interactions, as can be seen in S9(B) Fig. To quantify the relative roles of particular residues and glycan moieties, we computed a contact map of inter-S interactions (S9(C) and S9(D) Fig). We found the majority of protein-mediated contacts to reside in the unstructured loops within NTD regions. Glycan-mediated contacts concentrate in the sequons located in the NTD and RBD areas and at the bottom of the S head. Despite their size, glycans located on the stalk were not involved in the inter-S contacts. Instead, they remained relatively shielded by the much more spacious head domains.

Discussion

Known antibody interactions validate the epitope identification procedure

A rapidly growing number of studies report on antibody binding to the S protein [10–16] and provide us with excellent reference data to validate our strategy for epitope identification. The focus in these studies has been on antibodies binding to the exposed RBD of S to achieve a high degree of neutralization by blocking binding to the ACE2 receptor. Yuan et al. structurally characterized the binding of SARS-CoV-neutralizing antibody CR3022 to the SARS-CoV-2 S protein ectodomain [10, 11, 37]. Their structure reveals an epitope distal to the ACE2 binding site that requires at least two of the S protomers to be in the open conformation to permit binding without steric clashes. Interestingly, while our simulations do not probe the doubly open configuration, the epitope reported by Yuan et al. [10] is still successfully identified with a significant consensus score. Moreover, epitopes for other reported antibodies H014 [12], CB6 [13], P2B-2F6 [14], S309 [16], and 4A8 [38] also match regions of high consensus score. In particular, our candidate epitopes E5 and E6 overlap with the reported binding sites in the RBD for neutralizing antibodies [39–43]. We conclude that our epitope-identification methodology is robust.

Dependence on detailed glycosylation pattern

Mass spectrometry on recombinant S indicated extensive glycosylation [30] with oligomannose, and sialylated and fucosylated hybrid and complex glycans. Despite recent cryoET images of intact viral particles confirming occupancy of the majority of sequons and revealing glycan branching [23], the extent and composition of glycans in situ remains poorly understood. Pre-Golgi budding of the virions [44], overloading cells with polysaccharide production and high density of viral glycans (also reported in HIV [45]) can all contribute to non-canonical and not fully matured glycans.

We addressed this uncertainty by repeating our docking accessibility analysis for different glycosylation patterns (S1(B) Fig). In addition to the “full” glycan pattern used in the simulations, we analyzed the accessibility in a resampled simulation with all sites occupied by the mannose-type glycans (Mannose-5, Man5). Remarkably, the reduced glycan shield impedes Fab accessibility almost as effectively (∼75%) as the full shield (∼80%), even if epitopes E7–E9 become somewhat more exposed with shorter glycans (S3(D)–S3(H) Fig). Interestingly, the largest and most processed complex glycans are found on the flexible stalk [30], suggesting that this region is critical for the viral cycle and must be shielded from the immune system. Overall, we conclude that even a light glycan coverage might hinder the antibody accessibility of the protein in a significant manner.

Structural and dynamic characteristics of candidate epitopes

Epitopes E1–E3 are part of the N-terminal domain (NTD, residues 1-291), which is formed mostly of antiparallel β sheets. All three epitopes include flexible loops and folded β strands (S6(A) Fig). Interestingly, epitope E2 includes residue 207 (Table 1) and is in close proximity to residue 177, both of which have been reported by Schoof and co-workers [43] to be involved in binding an allosteric nanobody. We propose that E2 could represent the full binding site of this nanobody, which has not been mapped completely. Thus, our method may also be used to complement experimental characterizations of epitopes. Epitope E4 is located on a two-turn α-helix flanked by a short twin α-helix and lying on a five-strand antiparallel β-sheet. This arrangement provides the epitope with remarkable stability (S6(B) Fig). Epitopes E5 and E6 are located on the apical part of S in the RBD, and are composed mostly of flexible loops. E5 and E6 jointly span a contiguous surface in chain A, which is in the open conformation. By contrast, in the closed chains B and C, this surface is altered and E6 is buried (S6(C) Fig). The epitope E7 is part of a stable helix that connects neighboring β-sheets (S6(D) Fig). E8 comprises two quite long and flexible loops (residues 634-641 and 674-693), and two shorter and less flexible ones (S6(E) Fig). Finally, E9 is located on a short and flexible loop (S6(F) Fig).

Glycans as epitopes

Even though glycans sterically hinder the accessibility of the surface of S, antibodies can in some instances tolerate the close proximity to glycans [34]. Moreover, glycans themselves can be part of epitopes in SARS-CoV-2 S [16, 26] and HIV-1 Env [46]. While this could open up possibilities for epitope binding, the natural variability of the glycan shield [30], along with its extensive structural dynamics demonstrated here, currently preclude a systematic search for glycan-involving epitopes. Moreover, with human and viral proteins carrying chemically equivalent glycan coats, the risk of autoreaction is significant [46]. Therefore, we concentrated here on sterically accessible amino acid epitopes.

Conclusions

We identified epitope candidates on the SARS-CoV-2 S protein surface by combining accurate atomistic modelling, multi-microsecond MD simulations, and a range of bioinformatics analysis methods. We concentrated on sites that are accessible to antibodies, unencumbered by the glycan shield, and fairly rigid. We also required these sites to be conserved in sequence and to display signatures expected to elicit an immune response. From all these features, we determined a combined consensus epitope score that predicts nine distinct epitope sites. Validating our methodology, we recovered five epitopes that overlap with experimentally characterized epitopes, including a “cryptic” site [10].

Highly dynamic glycans cover the S surface to a great extent and could produce immunogenic shielding by suppressing some interaction modes with antibodies. Even though the instantaneous surface coverage of the glycans is low, over time relatively few well-placed glycans cover most of the protein surface. In particular, only three N-glycosylation sites per protein chain suffice to shield the stalk domain and block antibody binding to this functionally critical part of the protein. New and conflicting reports emerge on the glycan types on the S surface [30, 47], with glycan composition possibly varying from host to host. We considered both light and heavy glycan coverages in our analysis, which should encompass most of the glycan variability. We obtained an excellent correspondence in the glycan coverage in a direct comparison to high-resolution tomographic maps of S proteins on intact virions [23]. We found that even the light glycosylation significantly hinders the interaction between antibodies and S through steric effects.

The different epitopes we predicted provide starting points for engineering stable immunogenic constructs that robustly elicit the production of antibodies. A fragment-based epitope presentation avoids the many challenges of working with full-length S, a multimeric and highly dynamic membrane protein, whose prefusion structure is likely metastable [48]. Epitopes E1, E2, E3, and E8 are particularly promising candidates. They are located on distinct S domains that could fold independently and present these epitopes in a native-like manner [49]. Mutational escape by SARS-CoV-2 can lead to loss of neutralization of specific antibodies [18]. The use of antibody cocktails targeting spatially distinct epitopes on S should suppress the development of resistance [18, 42, 50]. The approach we introduced in this paper is general and can be extended to predict epitopes for other viral proteins. In particular, we envision an integrated analysis of diverse betacoronaviruses, with the ultimate aim of producing a vaccine that guarantees broad protection against multiple members of this virus family.

Methods

Full-length molecular model of SARS-CoV-2 S glycoprotein

Our simulation system contained four membrane-embedded SARS-CoV-2 S proteins assembled from available resolved structures and models for the missing parts (S7 Fig). The spike head was modeled based on a recently determined structure (PDB ID: 6VSB [6]) with one RBD domain in an open conformation and glycans modeled according to [30]. The stalk connecting the S head to the membrane was modeled de novo as trimeric coiled coils, consistent with an experimental structure of the HR2 domain in SARS-CoV S (PDB ID: 2FXP [51]). The TMD as well as the cytosolic domain were modeled de novo. See S8 Fig for a view of the final model.

Molecular dynamics simulations

We assembled four membrane-embedded full-length S proteins to form one large membrane patch. To maximize sampling while maintaining a biologically plausible S density [32, 33], we set the initial distance between centers of mass of the stalks of neighboring S to about 15 nm. This guaranteed at least 1 nm of spacing between any two of S’s most extended glycans and thus no contacts between S in the initial configuration (cf. S5(A) Fig). Patches of comparably high density have been observed in experiments [23]. The full simulation system consisted of ∼4.1 million atoms. After 300 ns of equilibration, we performed production simulations of the four S proteins for 2.5 μs in the NpT ensemble with GROMACS 2019.6. We used the CHARMM36m protein and glycan force fields, in combination with the TIP3P water model, and sodium and chloride ions (150 mM). Time series of several key parameters show that the system remained stable throughout the simulation (S2 Fig).

Rigidity analysis

We quantified the local rigidity in terms of RMSF values. For each frame and each chain, the Cα atoms were rigid-body aligned to the average structure. For these aligned structures, the Cα RMSF was calculated. Then, for each residue of interest, we quantified the local flexibility as the average RMSF values of residues within 15 Å distance, weighted by the relative surface area of each residue [52]. These flexibility profiles were averaged over the four spike copies and three chains. The local rigidity was then defined as the reciprocal of the flexibility.
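The rigidity calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the trajectory is assumed to be already rigid-body aligned to the average structure, distances between residues are measured between average Cα positions, and the function names `rmsf` and `local_rigidity` as well as the input `rsa` (relative surface area per residue) are hypothetical.

```python
import numpy as np

def rmsf(aligned):
    """Per-residue RMSF of pre-aligned C-alpha coordinates.

    aligned: array of shape (n_frames, n_residues, 3), in Angstrom.
    """
    mean = aligned.mean(axis=0)
    sq_dev = ((aligned - mean) ** 2).sum(axis=2)  # squared displacement per frame
    return np.sqrt(sq_dev.mean(axis=0))

def local_rigidity(aligned, rsa, cutoff=15.0):
    """Reciprocal of the surface-weighted mean RMSF within `cutoff` Angstrom.

    rsa: relative surface area of each residue (assumed input, shape (n_residues,)).
    """
    fluct = rmsf(aligned)
    mean = aligned.mean(axis=0)
    # Pairwise distances between average residue positions (assumption).
    dist = np.linalg.norm(mean[:, None, :] - mean[None, :, :], axis=2)
    # A neighbor contributes its surface area as weight; others contribute zero.
    weights = (dist <= cutoff) * rsa[None, :]
    flexibility = (weights * fluct[None, :]).sum(axis=1) / weights.sum(axis=1)
    return 1.0 / flexibility
```

In a full analysis, the resulting profiles would additionally be averaged over the four spike copies and three chains, as stated in the text. The dense pairwise distance matrix is chosen for brevity; a neighbor search would scale better for large systems.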

Accessibility analysis

The accessibility of the S protein surfaces was probed by illuminating the protein in diffuse light, as detailed below, and by rigid-body docking of the Fab of the antibody CR3022 [10], as detailed in the S1 Text.

For the illumination analysis, rays of random orientation emanate from a half-sphere with radius 25 nm around the center of mass of the protein. They are absorbed by the first heavy atom they pass within 1.5 Å. Structures of single S collected at 10 ns intervals from the simulation of four S embedded in the membrane were each probed with 10⁶ rays. To quantify the effect of glycosylation, the analysis was performed with and without including the glycan shield. In addition, the effect of protein crowding on the ray accessibility was probed by considering all protein atoms of other S proteins with a minimum distance ≤3 nm from the illuminated S.
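The ray-absorption idea can be illustrated with a brute-force sketch. Several details here are assumptions made for illustration only: ray origins are drawn uniformly on the upper half-sphere, ray directions are drawn uniformly at random, the geometric center of the atoms stands in for the center of mass, and the names `first_hit` and `ray_hits` are hypothetical. All lengths are in nm, so the 1.5 Å absorption distance becomes 0.15.

```python
import numpy as np

def first_hit(atoms, origin, direction, absorb=0.15):
    """Index of the first atom a ray passes within `absorb` nm, or None."""
    rel = atoms - origin
    t = rel @ direction                      # distance along the ray to closest approach
    perp2 = (rel ** 2).sum(axis=1) - t ** 2  # squared perpendicular distance to the ray
    # Point where the ray enters the absorption sphere of each atom.
    t_entry = t - np.sqrt(np.clip(absorb ** 2 - perp2, 0.0, None))
    ok = (perp2 <= absorb ** 2) & (t_entry > 0.0)
    if not ok.any():
        return None
    idx = np.flatnonzero(ok)
    return int(idx[np.argmin(t_entry[idx])])

def ray_hits(atoms, n_rays, radius=25.0, absorb=0.15, seed=0):
    """Count how many random rays each atom absorbs."""
    rng = np.random.default_rng(seed)
    center = atoms.mean(axis=0)  # geometric center (assumption)
    hits = np.zeros(len(atoms), dtype=int)
    for _ in range(n_rays):
        v = rng.normal(size=3)
        v /= np.linalg.norm(v)
        v[2] = abs(v[2])                     # restrict origins to the upper half-sphere
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)               # uniformly random unit direction
        hit = first_hit(atoms, center + radius * v, d, absorb)
        if hit is not None:
            hits[hit] += 1
    return hits
```

Running the same count with and without glycan atoms in `atoms` mirrors the comparison described in the text. The loop costs one pass over all atoms per ray, so a production analysis with 10⁶ rays per snapshot would likely need spatial indexing or vectorization over rays.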

Sequence variability analysis

To estimate the evolutionary variability of the S protein, we analyzed the aligned amino acid sequences released by the GISAID initiative on 25 May 2020 (https://www.gisaid.org/). We first built the consensus sequence with the most common amino acid (the mode) at each position across the whole data set. We then kept only 1273 amino acid long sequences, and filtered out corrupted sequences by discarding those having a Hamming distance from the consensus larger than 0.2. With the remaining 30,426 sequences, we estimated the conservation at each position [35]. Our conservation score is defined as the normalized difference between the maximum possible entropy and the entropy of the observed amino acid distribution at a given position, $\mathrm{cons}(i) = 1 + \frac{\sum_{k} p_k(i) \log p_k(i)}{\log 20}$, where $p_k(i)$ is the probability of observing amino acid $k$ at position $i$ in the sequence.
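As a minimal sketch of this procedure, the following Python functions build a consensus sequence, filter sequences by their distance from it, and evaluate the entropy-based conservation score. The sketch assumes equal-length, gap-free input strings and interprets the Hamming distance as the fraction of mismatched positions; both are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

def consensus(seqs):
    """Most common residue (the mode) at each aligned position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def filter_by_hamming(seqs, max_fraction=0.2):
    """Keep sequences whose mismatch fraction to the consensus is <= max_fraction."""
    ref = consensus(seqs)
    kept = []
    for seq in seqs:
        mismatches = sum(a != b for a, b in zip(seq, ref))
        if mismatches / len(ref) <= max_fraction:
            kept.append(seq)
    return kept

def conservation(seqs):
    """cons(i) = 1 + sum_k p_k(i) log p_k(i) / log 20 for each position i."""
    n = len(seqs)
    scores = []
    for col in zip(*seqs):
        entropy = -sum((c / n) * math.log(c / n) for c in Counter(col).values())
        scores.append(1.0 - entropy / math.log(20))
    return scores
```

A position with a single observed residue has zero entropy and scores 1, whereas a position with all 20 amino acids equally likely would score 0. A position split evenly between two residues scores 1 − log 2 / log 20 ≈ 0.77.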

Sequence-based epitope predictions

We estimated epitope probabilities with the BepiPred 2.0 webserver (http://www.cbs.dtu.dk/services/BepiPred/), using an Epitope Threshold of 0.5 [36]. BepiPred 2.0 uses a random forest model trained on known epitope-antibody complexes.

Consensus score for epitope prediction

We integrated the information of the different analyses into the consensus epitope score. We first applied a 3D Gaussian filter with σ = 5 Å to the ray and docking scores. We then mapped each score to the interval [0, 1], with outliers mapped to the extremes listed in Table A in S1 Text. Finally, we multiplied the individual scores together to obtain the consensus score, which was also mapped to [0, 1].
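The three steps (smoothing, rescaling, multiplication) can be sketched as follows. This is an illustration under assumptions rather than the exact procedure: the 3D Gaussian filter is interpreted as a Gaussian-weighted average of per-residue scores over residue coordinates, the clipping bounds passed in `bounds` are placeholders for the values in Table A of S1 Text (not reproduced here), and the final mapping to [0, 1] is done by dividing by the maximum. All function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(values, coords, sigma=5.0):
    """Gaussian-weighted average of per-residue values in 3D (coords in Angstrom)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ values) / w.sum(axis=1)

def rescale(values, low, high):
    """Map values linearly to [0, 1]; outliers are clipped to the extremes."""
    return np.clip((values - low) / (high - low), 0.0, 1.0)

def consensus_score(scores, bounds):
    """Product of rescaled scores, normalized so the maximum equals 1.

    scores: dict name -> per-residue array; bounds: dict name -> (low, high).
    """
    product = np.ones(len(next(iter(scores.values()))), dtype=float)
    for name, values in scores.items():
        product *= rescale(np.asarray(values, dtype=float), *bounds[name])
    peak = product.max()
    return product / peak if peak > 0 else product
```

Because the scores are multiplied rather than averaged, a residue that scores zero in any single feature receives a consensus score of zero, which matches the stated requirement that candidates score highly in all features.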

Supporting information

S1 Text. Detailed Modelling Procedures, Detailed Methods, Consensus Score Parameters.

(PDF)

S1 Fig. Spike domains and glycosylation.

(A) Domains of S. (B) Glycosylation pattern of S. Sequons are indicated with the respective glycans in a schematic representation for a fully glycosylated system (“full”) and for resampled simulations containing only mannose-5 (“Man5”).

(TIF)

S2 Fig. Time series of various key parameters monitored during the simulation.

(A) Total potential energy, (B) Lennard-Jones energy, (C) Coulomb energy, (D-F) temperature, pressure, and volume of the simulation box. (G-J) Root-mean-square deviation (RMSD) over the course of the simulation, calculated for Cα carbons of the S body, CC1, HR2, and TMD, with respect to a reference configuration obtained after 300 ns of equilibration. Values for four spike proteins are shown with distinct colors.

(TIF)

S3 Fig. Impact of the glycosylation pattern on ray (A-C) and docking (D-G) accessibility.

(A-C) Number of ray hits without glycans (“no glycans”), with full glycans (“full”, S1(B) Fig), and with full glycans and S protein crowding (“full CR”). (D-G) Monte Carlo rigid-body docking hits without glycans (“no glycans”), with Man5 glycans (“Man5”, S1(B) Fig) and with full glycans (“full”), as well as with full glycans and S protein crowding (“full CR”).

(TIF)

S4 Fig. Comparison of the epitope candidates E3–E6 with previously characterized epitopes.

Glycans are shown in green licorice representation. Left panels: Epitope candidates shown in cartoon representation with purple color intensity indicating epitope consensus scores. Residues with epitope consensus score >0.1 are shown in licorice representation. Right panels: Epitopes described in previous works shown in cartoon and licorice representation, with higher purple color intensity indicating reported binding to multiple distinct antibodies.

(TIF)

S5 Fig. Effect of crowding on accessibility and epitope score.

(A) Ray, (B) docking and (C) consensus scores with (thick line) and without crowding being taken into account.

(TIF)

S6 Fig. Location and structural features of the epitope candidates E1–E9 on the S surface.

Epitope candidates are shown in red, orange and purple cartoon and licorice representation. Neighboring residues are shown in grey cartoon representation.

(TIF)

S7 Fig. Schematic illustration of the strategy used to obtain an atomistic model of the full-length S protein.

For clarity, we do not show the solvent and membrane.

(TIF)

S8 Fig. Atomistic model of the full-length membrane-embedded S protein shown in cartoon representation.

The chains are differentiated by color. Palmitoylated cysteine residues are shown in pink licorice (only one chain shown for clarity). Glycans are shown in green licorice representation. We show a section of the membrane to highlight the transmembrane domain of S.

(TIF)

S9 Fig. Spike-spike interactions and bending during MD simulation.

(A) Snapshots of the 4-spike system from above (top row) and in side-view (bottom row) at the beginning (left) and end (right) of the MD trajectory. While the transmembrane regions move relatively little, spike heads form spike-spike interactions because of significant bending at the “knee” (CC1—CC2 joint). These interactions persist on the simulation timescale. (B) Visualization of the glycans in the final configuration (blue sticks). Glycans mediate spike-spike contacts. (C and D) Maps of time-averaged spike-spike contact probability mediated by amino-acids (C) or amino-acids and glycans (D) from the MD trajectory (color bar: contact probability). Interactions are located exclusively on lateral faces of the spike head.

(TIF)

S10 Fig. Consensus score analysis of “closed” spike.

(A, B) Accessibility, (C) rigidity and (D) consensus score calculated taking into account only the chains with down RBDs.

(TIF)

S1 Movie. Atomistic molecular dynamics simulation trajectory of four S proteins embedded in a membrane.

The proteins and lipids are shown in surface representation. Glycans are represented by green van der Waals beads. Water and ions are omitted for clarity. 600 ns simulation time shown.

(MP4)

Acknowledgments

We thank Martin Beck, Beata Turoňová, and Philipp S. Schmalhorst for stimulating discussions, the Max Planck Computing and Data Facility for providing computational resources, and the Leibniz Supercomputing Centre Munich for the SUPERspike computing allocation.

Data Availability

The structure and GROMACS topology files used to simulate the system are available at https://doi.org/10.5281/zenodo.3906317. All other relevant data are within the manuscript and its Supporting information files.

Funding Statement

This work was supported by the Max Planck Society (https://www.mpg.de) (GH), the Austrian Science Fund FWF Schrödinger Fellowship J4332-B28 (https://www.fwf.ac.at) (MS), the Human Frontier Science Program RGP0026/2017 (https://www.hfsp.org) (GH), the Landes-Offensive zur Entwicklung Wissenschaftlich-Ökonomischer Exzellenz LOEWE of the State of Hesse (https://wissenschaft.hessen.de/wissenschaft/landesprogramm-loewe): DynaMem (GH) and CMMS (RC and GH), the Frankfurt Institute for Advanced Studies (https://fias.institute) (RC), and the Leibniz Supercomputing Centre Munich (https://www.lrz.de): SUPERspike (GH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF. The proximal origin of SARS-CoV-2. Nature Medicine. 2020;26(4):450–452. doi: 10.1038/s41591-020-0820-9
  • 2. Rey FA, Lok SM. Common Features of Enveloped Viruses and Implications for Immunogen Design for Next-Generation Vaccines. Cell. 2018;172(6):1319–1334. doi: 10.1016/j.cell.2018.02.054
  • 3. White JM, Delos SE, Brecher M, Schornberg K. Structures and Mechanisms of Viral Membrane Fusion Proteins: Multiple Variations on a Common Theme. Crit Rev Biochem Mol Biol. 2008;43(3):189–219. doi: 10.1080/10409230802058320
  • 4. Harrison SC. Viral membrane fusion. Virology. 2015;479-480:498–507. doi: 10.1016/j.virol.2015.03.043
  • 5. Heald-Sargent T, Gallagher T. Ready, Set, Fuse! The Coronavirus Spike Protein and Acquisition of Fusion Competence. Viruses. 2012;4(4):557–580. doi: 10.3390/v4040557
  • 6. Walls AC, Park YJ, Tortorici MA, Wall A, McGuire AT, Veesler D. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell. 2020;181(2):281–292.e6. doi: 10.1016/j.cell.2020.02.058
  • 7. Wrapp D, Wang N, Corbett KS, Goldsmith JA, Hsieh CL, Abiona O, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367(6483):1260–1263. doi: 10.1126/science.abb2507
  • 8. Shang J, Ye G, Shi K, Wan Y, Luo C, Aihara H, et al. Structural basis of receptor recognition by SARS-CoV-2. Nature. 2020;581(7807):221–224. doi: 10.1038/s41586-020-2179-y
  • 9. Millet JK, Whittaker GR. Host cell proteases: critical determinants of coronavirus tropism and pathogenesis. Virus Research. 2015;202:120–134. doi: 10.1016/j.virusres.2014.11.021
  • 10. Yuan M, Wu NC, Zhu X, Lee CCD, So RTY, Lv H, et al. A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV. Science. 2020;368(6491):630–633. doi: 10.1126/science.abb7269
  • 11. Joyce MG, Sankhala RS, Chen WH, Choe M, Bai H, Hajduczki A, et al. A cryptic site of vulnerability on the receptor binding domain of the SARS-CoV-2 spike glycoprotein. bioRxiv. 2020. doi: 10.1101/2020.03.15.992883
  • 12. Lv Z, Deng YQ, Ye Q, Cao L, Sun CY, Fan C, et al. Structural basis for neutralization of SARS-CoV-2 and SARS-CoV by a potent therapeutic antibody. Science. 2020;369(6510):1505–1509. doi: 10.1126/science.abc5881
  • 13. Shi R, Shan C, Duan X, Chen Z, Liu P, Song J, et al. A human neutralizing antibody targets the receptor binding site of SARS-CoV-2. Nature. 2020;584:120–124. doi: 10.1038/s41586-020-2381-y
  • 14. Ju B, Zhang Q, Ge J, Wang R, Sun J, Ge X, et al. Human neutralizing antibodies elicited by SARS-CoV-2 infection. Nature. 2020;584:115–119. doi: 10.1038/s41586-020-2380-z
  • 15. Hanke L, Vidakovics Perez L, Sheward D, Das H, Schulte T, Moliner-Morro A, et al. An alpaca nanobody neutralizes SARS-CoV-2 by blocking receptor interaction. Nature Communications. 2020;11:4420. doi: 10.1038/s41467-020-18174-5
  • 16. Pinto D, Park YJ, Beltramello M, Walls AC, Tortorici MA, Bianchi S, et al. Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody. Nature. 2020;583:290–295. doi: 10.1038/s41586-020-2349-y
  • 17. Burton DR, Walker LM. Rational Vaccine Design in the Time of COVID-19. Cell Host & Microbe. 2020;27(5):695–698. doi: 10.1016/j.chom.2020.04.022
  • 18. Baum A, Fulton BO, Wloga E, Copin R, Pascal KE, Russo V, et al. Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science. 2020;369(6506):1014–1018. doi: 10.1126/science.abd0831
  • 19. Watanabe Y, Berndsen ZT, Raghwani J, Seabright GE, Allen JD, Pybus OG, et al. Vulnerabilities in coronavirus glycan shields despite extensive glycosylation. Nature Communications. 2020;11(1):2688. doi: 10.1038/s41467-020-16567-0
  • 20. Yan R, Zhang Y, Li Y, Xia L, Guo Y, Zhou Q. Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2. Science. 2020;367:1444–1448. doi: 10.1126/science.abb2762
  • 21. Ke Z, Oton J, Qu K, Cortese M, Zila V, McKeane L, et al. Structures and distributions of SARS-CoV-2 spike protein on intact virions. Nature. 2020;588:498–502. doi: 10.1038/s41586-020-2665-2
  • 22. Klein S, Cortese M, Winter SL, Wachsmuth-Melm M, Neufeldt CJ, Cerikan B, et al. SARS-CoV-2 structure and replication characterized by in situ cryo-electron tomography. Nature Communications. 2020;11:5885. doi: 10.1038/s41467-020-19619-7
  • 23. Turoňová B, Sikora M, Schürmann C, Hagen WJH, Welsch S, Blanc FEC, et al. In situ structural analysis of SARS-CoV-2 spike reveals flexibility mediated by three hinges. Science. 2020;370(6513):203–208. doi: 10.1126/science.abd5223
  • 24. Wolff G, Limpens RWAL, Zevenhoven-Dobbe JC, Laugks U, Zheng S, de Jong AWM, et al. A molecular pore spans the double membrane of the coronavirus replication organelle. Science. 2020;369(6509):1395–1398. doi: 10.1126/science.abd3629
  • 25. Woo H, Park SJ, Choi YK, Park T, Tanveer M, Cao Y, et al. Developing a fully-glycosylated full-length SARS-CoV-2 spike protein model in a viral membrane. Journal of Physical Chemistry B. 2020;124(33):7128–7137. doi: 10.1021/acs.jpcb.0c04553
  • 26. Casalino L, Gaieb Z, Goldsmith JA, Hjorth CK, Dommer AC, Harbison AM, et al. Beyond Shielding: The Roles of Glycans in the SARS-CoV-2 Spike Protein. ACS Central Science. 2020;6(10):1722–1734. doi: 10.1021/acscentsci.0c01056
  • 27. Zimmerman MI, Porter JR, Ward MD, Singh S, Vithani N, Meller A, et al. SARS-CoV-2 simulations go exascale to capture spike opening and reveal cryptic pockets across the proteome. bioRxiv. 2020. doi: 10.1101/2020.06.27.175430
  • 28. Henderson R, Edwards RJ, Mansouri K, Janowska K, Stalls V, Kopp M, et al. Glycans on the SARS-CoV-2 spike control the receptor binding domain conformation. bioRxiv. 2020. doi: 10.1101/2020.06.26.173765
  • 29. Mehdipour AR, Hummer G. Dual nature of human ACE2 glycosylation in binding to SARS-CoV-2 spike. bioRxiv. 2020. doi: 10.1101/2020.07.09.193680
  • 30. Watanabe Y, Allen JD, Wrapp D, McLellan JS, Crispin M. Site-specific glycan analysis of the SARS-CoV-2 spike. Science. 2020;369(6501):330–333. doi: 10.1126/science.abb9983
  • 31. Ritchie G, Harvey DJ, Feldmann F, Stroeher U, Feldmann H, Royle L, et al. Identification of N-linked carbohydrates from severe acute respiratory syndrome (SARS) spike glycoprotein. Virology. 2010;399(2):257–269. doi: 10.1016/j.virol.2009.12.020
  • 32. Neuman BW, Buchmeier MJ. Supramolecular architecture of the coronavirus particle. In: Ziebuhr J, editor. Advances in Virus Research. vol. 96 of Coronaviruses; 2016. p. 1–27. Available from: https://www.sciencedirect.com/science/article/pii/S0065352716300446.
  • 33. Beniac DR, Andonov A, Grudeski E, Booth TF. Architecture of the SARS coronavirus prefusion spike. Nat Struct Mol Biol. 2006;13(8):751–752. doi: 10.1038/nsmb1123
  • 34. Murin CD, Wilson IA, Ward AB. Antibody responses to viral infections: a structural perspective across three different enveloped viruses. Nature Microbiology. 2019;4(5):734–747. doi: 10.1038/s41564-019-0392-y
  • 35. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Research. 1990;18(20):6097–6100. doi: 10.1093/nar/18.20.6097
  • 36. Jespersen MC, Peters B, Nielsen M, Marcatili P. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Research. 2017;45(W1):W24–W29. doi: 10.1093/nar/gkx346
  • 37. ter Meulen J, van den Brink EN, Poon LLM, Marissen WE, Leung CSW, Cox F, et al. Human monoclonal antibody combination against SARS coronavirus: synergy and coverage of escape mutants. PLOS Med. 2006;3(7):e237. doi: 10.1371/journal.pmed.0030237
  • 38. Chi X, Yan R, Zhang J, Zhang G, Zhang Y, Hao M, et al. A neutralizing human antibody binds to the N-terminal domain of the Spike protein of SARS-CoV-2. Science. 2020;369(6504):650–655. doi: 10.1126/science.abc6952
  • 39. Zost SJ, Gilchuk P, Case JB, Binshtein E, Chen RE, Nkolola JP, et al. Potently neutralizing and protective human antibodies against SARS-CoV-2. Nature. 2020;584(7821):443–449. doi: 10.1038/s41586-020-2548-6
  • 40. Liu L, Wang P, Nair MS, Yu J, Rapp M, Wang Q, et al. Potent neutralizing antibodies against multiple epitopes on SARS-CoV-2 spike. Nature. 2020;584(7821):450–456. doi: 10.1038/s41586-020-2571-7
  • 41. Ejemel M, Li Q, Hou S, Schiller ZA, Tree JA, Wallace A, et al. A cross-reactive human IgA monoclonal antibody blocks SARS-CoV-2 spike-ACE2 interaction. Nature Communications. 2020;11(1):4198. doi: 10.1038/s41467-020-18058-8
  • 42. Wu Y, Wang F, Shen C, Peng W, Li D, Zhao C, et al. A noncompeting pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2. Science. 2020;368(6496):1274–1278. doi: 10.1126/science.abc2241
  • 43. Schoof M, Faust B, Saunders RA, Sangwan S, Rezelj V, Hoppe N, et al. An ultrapotent synthetic nanobody neutralizes SARS-CoV-2 by stabilizing inactive Spike. Science. 2020;370(6523):1473–1479. doi: 10.1126/science.abe3255
  • 44. Watanabe Y, Bowden TA, Wilson IA, Crispin M. Exploitation of glycosylation in enveloped virus pathobiology. Biochimica et Biophysica Acta (BBA)—General Subjects. 2019;1863(10):1480–1497. doi: 10.1016/j.bbagen.2019.05.012
  • 45. Crispin M, Ward AB, Wilson IA. Structure and Immune Recognition of the HIV Glycan Shield. Annual Review of Biophysics. 2018;47(1):499–523. doi: 10.1146/annurev-biophys-060414-034156
  • 46. McLellan JS, Pancera M, Carrico C, Gorman J, Julien JP, Khayat R, et al. Structure of HIV-1 gp120 V1/V2 domain with broadly neutralizing antibody PG9. Nature. 2011;480(7377):336–343. doi: 10.1038/nature10696
  • 47. Shajahan A, Supekar NT, Gleinich AS, Azadi P. Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2. Glycobiology. 2020;30(12):981–988. doi: 10.1093/glycob/cwaa042
  • 48. de Taeye S, Ozorowski G, Torrents de la Peña A, Guttman M, Julien JP, van den Kerkhof TGM, et al. Immunogenicity of Stabilized HIV-1 Envelope Trimers with Reduced Exposure of Non-neutralizing Epitopes. Cell. 2015;163(7):1702–1715. doi: 10.1016/j.cell.2015.11.056
  • 49. Zhu C, Dukhovlinova E, Council O, Ping L, Faison EM, Prabhu SS, et al. Rationally designed carbohydrate-occluded epitopes elicit HIV-1 Env-specific antibodies. Nature Communications. 2019;10(1):948. doi: 10.1038/s41467-019-08876-w
  • 50. Hansen J, Baum A, Pascal KE, Russo V, Giordano S, Wloga E, et al. Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail. Science. 2020;369(6506):1010–1014. doi: 10.1126/science.abd0827
  • 51. Hakansson-McReynolds S, Jiang S, Rong L, Caffrey M. Solution structure of the severe acute respiratory syndrome-coronavirus heptad repeat 2 domain in the prefusion state. Journal of Biological Chemistry. 2006;281(17):11965–11971. doi: 10.1074/jbc.M601174200
  • 52. Tien MZ, Meyer AG, Sydykova DK, Spielman SJ, Wilke CO. Maximum allowed solvent accessibilites of residues in proteins. PLOS ONE. 2013;8(11):e80635. doi: 10.1371/journal.pone.0080635
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008790.r001

Decision Letter 0

Alexander MacKerell, Arne Elofsson

26 Nov 2020

Dear Dr. Hummer,

Thank you very much for submitting your manuscript "Computational epitope map of SARS-CoV-2 spike protein" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Alexander MacKerell

Associate Editor

PLOS Computational Biology

Arne Elofsson

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this manuscript the authors present further analysis of MD simulations recently published in Turoňová et al, Science (2020), doi: 10.1126/science.abd5223. This additional information concerns:

1) the ability of the S glycans to shield the spike protein from the immune system and

2) the prediction of epitopes that can be targeted for vaccine design and development

I find that such information is indeed important to share publicly, however I am not entirely convinced that it justifies a second publication based on the same set of experiments.

Further to this, I found aspects of the aforementioned analysis that should be re-examined. More specifically, in regard to part 1), the analysis of the accessible surface was done through rigid-body docking of the CR3022 antibody Fab region and also through a process called “ray”, where the protein was illuminated by diffuse light to identify areas of higher accessibility. The results obtained through these two methods are dramatically different; for instance, the glycan shield determines an accessibility reduction of 40% (ray) and 87% (docking). This is a very large difference, as a reduction of 40% still signifies wide accessibility, whereas a reduction of 87% dramatically precludes it. I may have misunderstood this analysis; however, the authors do not comment on this discrepancy.

In regards to part 2) the scoring function designed by the authors identifies a set of 9 epitopes that include 2 known ones. This point is highlighted as proof of the robustness of the score, yet those known epitopes are part of the glycan unshielded RBD and not so difficult to identify in general, as the RBD is the known target for the interaction with the ACE2 receptor and considering that the scoring function (rightfully) promotes unshielded regions. Notably, the score penalises these very regions in terms of flexibility, which is an aspect that was not addressed in the manuscript. A proof of the robustness of this epitope scoring function would be in my opinion to test some of the unknown predicted epitopes experimentally.

As a last point, I am afraid that the ‘trimming’ of the glycans to account for immature glycosylation is fundamentally wrong for two reasons. The first reason is that ER glycans would be large oligomannose types, such as Man9 and Man8, which are processed down to Man5 in the Golgi by alpha mannosidases; the Man5 conversion into complex N-glycans is then initiated by GnT1 also in the Golgi. Mammalian cells don’t have any paucimannose, which is more common in plants and insects. As for the second reason, the glycans’ 3D structure depends on their sequence, and shorter versions do not necessarily have the same structure, similarly to how a protein region may not retain the same conformational propensity if trimmed down.

Reviewer #2: In this work, the authors present a massive, all-atom MD simulation of a patch of membrane of the SARS-CoV-2 virus with 4 spike proteins. Much of the analysis and so forth, as presented, is good. However I have some comments for the authors to consider.

My main comment is that the authors have a really unique opportunity to provide scientists with a view of the spike proteins in a very crowded environment. The spacing of the spike proteins – while this close range may exist in some rare instances – is quite different than the average structures, which place the spike proteins further apart.

That said, this crowded configuration does indeed exist and thus this work stands apart in its ability to inform others about what happens to the dynamics, the shielding, the flexible hinges, etc. No other work does this.

But what is written somehow doesn’t capture the essence of this most intriguing aspect of the work. The authors only make a few very minor comments about this aspect. Yet, to me, as someone who also studies the spike protein – I think it is of utmost importance to analyze and I think it is quite interesting and worthy to be published.

In the introduction, the comment about the glycans playing a role and validated by experiment should be updated. The final published version of Casalino et al. provides experimental validation within the same paper.

I would have liked to see more discussion of the crowdedness of the system, and a finer analysis of what that means. How many contacts are made between the spikes here, and are all those contacts predominantly glycan mediated? The authors present a short analysis of this at a very high level, on page 4, but a map of the residues themselves (contact footprint, perhaps?) would be useful. Were these direct contacts made during system construction or do they form over time? In their now iconic image presented here as figure 1, e.g., the stalks are quite bent – I’ve always wondered – did the stalks start out that way, or did they move to that conformation over time, and are they sort of stuck like that, or is this terribly sampling limited? The authors don't provide any such information.

Otherwise I think the work is fine, and the epitope analysis is interesting.

Overall I think this is an intriguing dataset, no doubt well-constructed and informative for folks, but some of what I thought was most interesting, was missing.

I also think the authors MUST deposit the full system models (PDB minimally, preferably actual simulation input files) as part of this work. There are likely other useful choices in terms of the many variable parameters that the authors’ work will provide others, as we seek to understand things about this system.

Reviewer #3: This manuscript introduces a very interesting and general approach to design novel antibodies to fight viral infections, including SARS-CoV-2. The approach is very elegant and based on combining multi-microsecond long molecular dynamics simulations of the target proteins in their native environment with bioinformatics-based analyses, which results in antibody-binding scores. Overall, the results in the manuscript are convincing and support the message of the authors. The manuscript is generally well-written and clear. It addresses a timely problem and should be of interest to the PLOS Computational Biology readership and the community in general. Thus, I would be in favor of accepting the manuscript for publication in PLOS Computational Biology. However, I have some concerns that the authors need to address before it can be accepted.

Major Concerns:

1. Due to the plethora of on-going efforts to simulate the full-length structure of the SARS-CoV-2 S protein, including some of the recent work from these authors and the Amaro lab, it is important to have a consensus on the full-length structure. Therefore, the authors should discuss their modeling approach in the light of recently published work by the Amaro lab (ACS Central Science, 2020, 6, 10, 1722-1734).

2. Authors mentioned that during the MD simulations, S protein dynamically interacted with its neighboring copies. Does S protein form stable interactions with the neighboring copies? Authors should provide a detailed analysis, maybe using clustering-based methods, to highlight the dynamic interactions between S proteins.

3. In order to perform bioinformatics-based epitope scoring, authors selected 220 x 4 snapshots from their 2.2μs-long MD simulations. How different were these snapshots? What was the criterion to select snapshots at 10ns intervals? This information will help readers not only to reproduce the data but also help them to intelligently apply this method to other important systems as well.

4. In order to be exhaustive in their approach (which I really liked), authors performed epitope scoring under different glycan conditions. However, it seems like this analysis is based on the assumption that S protein dynamics will not be altered under different types of glycans. I think this is a strong assumption to make and authors should properly justify it.

5. Based on the starting S protein structure, 2 RBDs exist in the down conformation and 1 RBD is in the up conformation. Did the authors capture conformation-dependent epitope scores for the regions residing on RBDs? Authors should include a discussion and epitope scoring data on this as well.

Minor Concerns:

1. Authors state that “On SARS-CoV-2 virions, S proteins occasionally form dense clusters, which may enhance the avidity of the interactions with human host cells [23]”. However, according to reference 23 (Figure 2A), there is no significant tendency to cluster. This point should be addressed.

2. Based on my understanding, epitope 5-6 seems to be in good correspondence with the recent pre-print ( https://www.biorxiv.org/content/10.1101/2020.08.08.238469v2 ) on nanobody design. Authors should include a discussion of their results with this CryoEM study. Interestingly, epitope 2 seems to be in good correspondence with the allosteric nanobody reported in the above pre-print.

3. In the methods section, authors state “the initial distance between the center of mass of the stalks of neighboring S was about 15nm.” This does not guarantee that the ectodomains, specifically the RBDs, are not interacting in the initial setup, thus making the system biased. Authors should indicate the minimum distance between S proteins in their initial setup.

4. Figure 4 is difficult to interpret and is too crowded, authors should make it more reader friendly.

5. Reference 30 in Supplementary Methods doesn’t seem to be on Monte Carlo based rigid-body docking. This should be corrected.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: No: authors should submit PDBs, PSF, input files for simulation

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Rommie Amaro

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008790.r003

Decision Letter 1

Alexander MacKerell, Arne Elofsson

14 Feb 2021

Dear Dr. Hummer,

We are pleased to inform you that your manuscript 'Computational epitope map of SARS-CoV-2 spike protein' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Alexander MacKerell

Associate Editor

PLOS Computational Biology

Arne Elofsson

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for addressing my concerns.

Reviewer #2: The authors have done a very good job of responding to my critique and those of the other two reviewers. I think it can be published now!

Reviewer #3: The authors have addressed all of my concerns.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Rommie Amaro

Reviewer #3: Yes: Shashank Pant

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008790.r004

Acceptance letter

Alexander MacKerell, Arne Elofsson

3 Mar 2021

PCOMPBIOL-D-20-01841R1

Computational epitope map of SARS-CoV-2 spike protein

Dear Dr Hummer,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Andrea Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Detailed Modelling Procedures, Detailed Methods, Consensus Score Parameters.

    (PDF)

    S1 Fig. Spike domains and glycosylation.

    (A) Domains of S. (B) Glycosylation pattern of S. Sequons are indicated with the respective glycans in a schematic representation for a fully glycosylated system (“full”) and for resampled simulations containing only mannose-5 (“Man5”).

    (TIF)

    S2 Fig. Time series of various key parameters monitored during the simulation.

    (A) Total potential energy, (B) Lennard-Jones energy, (C) Coulomb energy, (D-F) temperature, pressure, and volume of the simulation box. (G-J) Root-mean-square deviation (RMSD) over the course of the simulation, calculated for Cα carbons of the S body, CC1, HR2, and TMD, with respect to a reference configuration obtained after 300 ns of equilibration. Values for four spike proteins are shown with distinct colors.

    (TIF)

    S3 Fig. Impact of the glycosylation pattern on ray (A-C) and docking (D-G) accessibility.

    (A-C) Number of ray hits without glycans (“no glycans”), with full glycans (“full”, S1(B) Fig), and with full glycans and S protein crowding (“full CR”). (D-G) Monte Carlo rigid-body docking hits without glycans (“no glycans”), with Man5 glycans (“Man5”, S1(B) Fig) and with full glycans (“full”), as well as with full glycans and S protein crowding (“full CR”).

    (TIF)

    S4 Fig. Comparison of the epitope candidates E3–E6 with previously characterized epitopes.

    Glycans are shown in green licorice representation. Left panels: Epitope candidates shown in cartoon representation with purple color intensity indicating epitope consensus scores. Residues with epitope consensus score >0.1 are shown in licorice representation. Right panels: Epitopes described in previous works shown in cartoon and licorice representation, with higher purple color intensity indicating reported binding to multiple distinct antibodies.

    (TIF)

    S5 Fig. Effect of crowding on accessibility and epitope score.

    (A) Ray, (B) docking, and (C) consensus scores with (thick line) and without crowding taken into account.

    (TIF)

    S6 Fig. Location and structural features of the epitope candidates E1–E9 on the S surface.

    Epitope candidates are shown in red, orange and purple cartoon and licorice representation. Neighboring residues are shown in grey cartoon representation.

    (TIF)

    S7 Fig. Schematic illustration of the strategy used to obtain an atomistic model of the full-length S protein.

    For clarity, we do not show the solvent and membrane.

    (TIF)

    S8 Fig. Atomistic model of the full-length membrane-embedded S protein shown in cartoon representation.

    The chains are differentiated by color. Palmitoylated cysteine residues are shown in pink licorice (only one chain shown for clarity). Glycans are shown in green licorice representation. We show a section of the membrane to highlight the transmembrane domain of S.

    (TIF)

    S9 Fig. Spike-spike interactions and bending during MD simulation.

    (A) Snapshots of the 4-spike system from above (top row) and in side view (bottom row) at the beginning (left) and end (right) of the MD trajectory. While the transmembrane regions move relatively little, spike heads form spike-spike interactions because of significant bending at the “knee” (CC1–CC2 joint). These interactions persist on the simulation timescale. (B) Visualization of the glycans in the final configuration (blue sticks). Glycans mediate spike-spike contacts. (C and D) Maps of time-averaged spike-spike contact probability mediated by amino acids (C) or amino acids and glycans (D) from the MD trajectory (color bar: contact probability). Interactions are located exclusively on lateral faces of the spike head.

    (TIF)

    S10 Fig. Consensus score analysis of “closed” spike.

    (A, B) Accessibility, (C) rigidity, and (D) consensus score calculated taking into account only the chains with down RBDs.

    (TIF)

    S1 Movie. Atomistic molecular dynamics simulation trajectory of four S proteins embedded in a membrane.

    The proteins and lipids are shown in surface representation. Glycans are represented by green van der Waals beads. Water and ions are omitted for clarity. 600 ns simulation time shown.

    (MP4)

    Attachment

    Submitted filename: rebuttal.pdf

    Data Availability Statement

    The structure and GROMACS topology files used to simulate the system are available at: https://doi.org/10.5281/zenodo.3906317. All other relevant data are within the manuscript and its Supporting information files.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS