Abstract
We observe that a residue R of the spike glycoprotein of SARS-CoV-2 that has mutated in one or more of the current variants of concern or interest, or under monitoring, rarely participates in a backbone hydrogen bond if R lies in the S1 subunit and usually participates in one if R lies in the S2 subunit. A partial explanation for this based upon free energy is explored as a potentially general principle in the mutagenesis of viral glycoproteins. This observation could help target future vaccine cargos for the evolving coronavirus as well as more generally. A related study of the Delta and Omicron variants suggests that Delta was an energetically necessary intermediary in the evolution from Wuhan-Hu-1 to Omicron.
Keywords: SARS-CoV-2 spike, mutagenic pressure, backbone free energy, vaccinology
1. Introduction
This short note isolates a specific and elementary observation about Protein Data Bank (PDB) [1] files concerning the mutated residues in the current variants of concern and of interest, plus the variants under monitoring, as per [2] 22 October 2021, of the SARS-CoV-2 spike glycoprotein S. This observation is then applied to the genesis of the Omicron variant. It has not, to our knowledge, appeared in the literature other than in our own earlier work [3] in the context of specific variants of concern, and it may be material going forward in designing mRNA or other types of vaccine cargos, if necessary, as the coronavirus continues to evolve. It is worth consideration, in this now highly studied example, as a potentially more general example of viral glycoprotein mutagenesis since we provide a partial explanation for our observation based upon general principles involving free energy.
To state this fact about mutagenic pressure in the spike, recall [4] that, as in many examples of viral glycoproteins, in particular commonly with Class I fusion mechanisms [5,6], S is composed of the two subunits S1 and S2, where S1 mediates receptor-binding extracellularly and S2 mediates fusion within an endosome. One particularity of S is a host-furin-mediated cleavage between S1 and S2 at residue numbers 685–686. There is, furthermore, a second cleavage lying adjacent to the fusion peptide, mediated by host cathepsin or serine protease, of a C-terminal segment of S2 at residue numbers 815–816. See [7] for more information on cleavage in the SARS-CoV-2 spike.
In any protein, hydrogen bonds form between backbone nitrogen atoms N-H and oxygen atoms O=C in different peptide units, and these are called backbone hydrogen bonds (or BHBs). (To be precise, a DSSP [8] hydrogen bond is accepted as a BHB provided that the distance between H and O is less than 2.7 Å and the angles ∠NHO and ∠COH each exceed 90°.) A protein residue R itself is said to participate in a BHB if either the nearby nitrogen N-H donates or the nearby oxygen O=C accepts a BHB. (Again, to be precise, if at least two monomers of the trimeric spike participate, then the residue itself participates.) On average for all proteins, roughly 70–80% of all residues participate in a BHB [9].
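The geometric part of the BHB criterion can be sketched as follows. This is a minimal illustration, not the author's software: it assumes the two angles are N-H···O and C=O···H, takes atom coordinates in Å as plain tuples, and does not reproduce the DSSP energy test that a bond must already have passed. The function names are hypothetical.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))

def angle_deg(a, b, c):
    """Angle at vertex b formed by the points a-b-c, in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    # Clamp against rounding error before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def passes_bhb_geometry(n, h, o, c, max_dist=2.7, min_angle=90.0):
    """Extra geometric filter applied to a DSSP hydrogen bond.

    n, h: donor backbone nitrogen and its hydrogen
    o, c: acceptor carbonyl oxygen and carbon
    Assumption: the two angles tested are N-H...O and C=O...H.
    """
    return (
        _norm(_sub(h, o)) < max_dist
        and angle_deg(n, h, o) > min_angle
        and angle_deg(c, o, h) > min_angle
    )
```

A bond that DSSP reports but that fails this filter, for instance because H and O are 3.0 Å apart, would not be counted as a BHB under this reading.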
Here is the main, easily confirmed empirical observation of this paper, which is quantified in Table 1 and subsequently discussed: A residue R of the SARS-CoV-2 spike glycoprotein S, which has mutated in one or more of the current variants of concern or interest, or under monitoring [2] (cf. Table 1 for these mutagenic residue numbers), rarely participates in a BHB if R lies in S1 and usually participates in a BHB if R lies in S2.
Table 1.
Mol | #Res | avg #Missing | avg #Absent | avg #Unbonded | avg BFE |
---|---|---|---|---|---|
S | 1121 | 1.39 | 4.91 | 0.35 | 2.41 |
S ∩ M | 56 | 4.71 | 5.34 | 0.61 | 2.47 |
S1 | 655 | 1.85 | 5.52 | 0.42 | 3.16 |
S1 ∩ M | 43 | 6.14 | 5.69 | 0.74 | 2.80 |
S2 | 466 | 0.73 | 4.05 | 0.25 | 1.60 |
S2 ∩ M | 13 | 0.00 | 4.15 | 0.15 | 2.14 |
A general but not entirely satisfactory explanation for this involves the free energy of structural details stabilized by BHBs. In particular, viral glycoproteins, which mediate receptor-binding and membrane fusion, are by their very nature metastable. It follows that successful viral mutation can neither increase free energy by so much as to disturb the stability of the molecule nor decrease it by so much as to interrupt near-instability, for otherwise the molecule will, respectively, either explode or fail to reconform and function correctly. The minimal way to avoid this twofold constraint is to mutate residues that do not participate in BHBs at all, and that is precisely what we find in S1 before mutation. However, we shall also discover, most interestingly, that this is not reversible.
We shall discuss S2 subsequently only after including certain salient definitions, facts, and data, and note in this introduction just that the existence of BHBs and their free energies are obviously functions of pH. This alone might account for differences between S1 and S2 since the endocytotic pathway is highly acidifying [4].
2. Materials and Methods
As is customary, we recorded mutations relative to the original Wuhan genome, called Wuhan-Hu-1, and its corresponding spike protein (UniProt [10] code P0DTC2). We considered only structure files that have a resolution of at most 3.0 Å, are neither cleaved nor bound to an antibody or receptor, and were determined via cryo-electron microscopy. The resulting 15 exemplar structures for S from the PDB, namely 6VXX, 6X29, 6X79, 6XLU, 6XM0, 6XM3, 6XM4, 6ZB5, 6ZGE, 7A4N, 7AD1, 7DDD, 7DF3, 7DWY, and 7JWY, depend upon various techniques of stabilizing S in its prefusion conformation [11,12,13,14,15,16,17,18,19,20]. The molecules are therefore not truly identical, hence the utility of taking consensus and average data across the collection of PDB files, as we shall do.
Some of the previous considerations can be calibrated by employing a new concept and quantity in structural biology, namely the so-called backbone free energy (BFE) from [21], which can be computed from a PDB file, to be called simply a structure. Roughly, the BFE of a structure stabilized by a BHB is computed from geometry [22] by comparing the planes containing the peptide units of the donor and acceptor of the BHB, and by applying the Pohl–Finkelstein quasi-Boltzmann Ansatz [23,24,25].
Let us next briefly give a more complete discussion of the method from first principles, referring the interested reader to [22] for the background and data for general proteins (as explained here), ref. [21] for application to viral glycoproteins, ref. [26] for application to coronavirus spikes, and [3] for the SARS-CoV-2 spike S in particular. One starts by choosing a suitably unbiased subset of the PDB and computing all of its attendant BHBs, comprising a collection of 1,166,165 BHBs for the unbiased subset in [22] and for our discussion throughout. For each of these BHBs, there is a rotation of space from the peptide plane of its donor to the peptide plane of its acceptor, mapping the peptide bond of the former to that of the latter. This defines an a priori distribution on the space of all rotations, which is computed once and for all. Now given another BHB, there is an associated free energy given by taking the log density of this a priori distribution at this new subject BHB, which is suitably normalized to approximate the BFE in kcal/mole. Thus, the geometry of the backbone described in a PDB file determines a BFE associated to each BHB in any protein. The fundamental fact, established in [21] for viral glycoproteins, is that residues of large BFE target locations of large conformational change in the backbone, typically including, in particular, the fusion peptide.
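In symbols, the construction just described can be summarized as follows. This is a hedged sketch only: g denotes the rotation associated with a subject BHB and ρ the a priori density on the space of rotations estimated from the reference BHBs, while the constant ρ_0, the effective temperature T, and the sign convention are assumptions standing in for the "suitable normalization", whose actual form is given in [21,22] and is not reproduced here.

```latex
\mathrm{BFE}(g) \;\approx\; -\,k_B T \,\ln\frac{\rho(g)}{\rho_0}
```

Under this assumed convention, rotations that are rare in the reference collection (small ρ) receive a large BFE, which is the usual reading of a quasi-Boltzmann Ansatz.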
There is a trichotomy of possibilities for a residue R in a specific structure: R may be modeled in the structure and participate in a BHB or not, and in this latter case, we say R is absent, but R may also simply be missing from the PDB file. (As before, these properties of residues are taken as consensus data from the three monomers.) R can be missing for a number of simple reasons: the protein may be disordered at R [4]; the experiment may be inaccurate or problematic at R; the data and its refinement may not model R within reasonable parameters; or R may be C-terminal or N-terminal to the experimentally synthesized sub-peptide of the protein S.
The average of the resolutions in our collection of structures is 2.77 Å and of the percentages of Ramachandran outliers is 0.1, thus these are all high-quality experimental structures. Note that clashscores and sidechain outliers are not particularly relevant measures of quality for our purposes. As argued in [3], it follows that the first among the possibilities for R missing is the most likely, thus one might conflate missing with disordered for high-quality structures within the PDB-range. The consensus range of our collection of structures for S comprises residue numbers 27 to 1147.
A residue that is missing or absent is said to be unbonded and is bonded otherwise. If a residue R is bonded, then it participates in a BHB, thus there is either a BHB with donor N-H or one with acceptor O=C, or both, and the BFE of the residue R is defined to be the maximum of the BFEs of these one or two BHBs, first averaged over the two or three monomers. If a residue is unbonded, then its BFE is undefined.
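The definition of the BFE of a residue can be illustrated with a short sketch. The function name, the input layout (one BFE value per monomer in which the bond occurs), and the `min_monomers` consensus parameter are assumptions made for illustration; the numbers in any usage would come from the BFE software of [22].

```python
from statistics import mean

def residue_bfe(donor_bfes, acceptor_bfes, min_monomers=2):
    """BFE of a residue from its donated and accepted BHBs.

    donor_bfes / acceptor_bfes: BFE values (kcal/mole), one per monomer
    in which the residue donates / accepts a BHB; may be empty.
    Assumption: a BHB counts only if it occurs in at least
    `min_monomers` monomers of the trimer.
    Returns None when the residue is unbonded (BFE undefined).
    """
    averages = [
        mean(values)                      # first average over monomers
        for values in (donor_bfes, acceptor_bfes)
        if len(values) >= min_monomers
    ]
    # Then take the maximum over the one or two BHBs.
    return max(averages) if averages else None
```

For example, a residue whose donated bond has BFE 1.0, 2.0, and 3.0 in the three monomers and whose accepted bond has BFE 4.0 and 5.0 in two monomers is assigned the larger of the two averages.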
Specifically, to give a quantitative sense to what follows, the range of BFE values is −2.9 to +6.85 kcal/mole with approximate 50th, 90th, and 99th percentile cutoffs given by 1.4, 4.6, and 6.6, respectively. The validated hypothesis is that if the BFE of a residue lies in the 90th percentile, i.e., is at least 4.6 kcal/mole, then within one residue of it along the backbone, the sum of the two adjacent backbone conformational angles changes by at least 180 degrees in its pre- to post-fusion reconformation. The converse does not hold.
3. Results
3.1. Wuhan-Hu-1 to Many Variants
As argued before, in order to preserve the metastability of the molecule, the BFE before and after a mutation must be more or less constant across the spike. The plot of BFE across the spike is depicted in Figure 1. Higher BFE is evidently concentrated in S1 compared to S2. The several regions of meaningful negative BFE are illustrated in the figure by the intersections of the plot with the gray horizontal line, which corresponds to nearly ideal helices. Notice that each cleavage site is surrounded by a region of high BFE, and the same is true for the two ends of HR1.
However, as depicted in Figure 2, the single mutation D614G, which quickly globally overtook Wuhan-Hu-1 as the predominant strain, alters BFE along the entire backbone, by as much as 5.10 kcal/mole at residue 134 but by only 0.14 kcal/mole at residue 614 itself. Thus, a single local change of the primary structure can engender a long-range change of BFE across the whole spike glycoprotein.
Table 1 presents findings and data about the mutating residue numbers M common to one or more of the variants under consideration here. Specifically, the table summarizes BFEs and numbers of absent, missing, and unbonded residues in each of the molecules S, S1, and S2, as well as in their respective intersections S ∩ M, S1 ∩ M, and S2 ∩ M, with the mutagenic residues M under consideration.
Several trends present themselves:
S1 is more disorganized than S2 (i.e., # missing is larger);
there are more loops in S1 than S2 (i.e., # absent is larger);
the BFE of S1 is larger than that of S2;
the same three assertions above hold for S1 ∩ M compared to S2 ∩ M; and
a greater ratio of residues in S1 are mutating in the variants under consideration than in S2.
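The last trend can be checked directly from the #Res column of Table 1. The short sketch below assumes the reading that the rows with 56, 43, and 13 residues are the intersections of S, S1, and S2 with the mutating residues M; the dictionary and function names are illustrative only.

```python
# (total residues, mutating residues), read from the #Res column of Table 1.
# Assumption: 56, 43 and 13 are the sizes of S ∩ M, S1 ∩ M and S2 ∩ M.
TABLE1_COUNTS = {"S": (1121, 56), "S1": (655, 43), "S2": (466, 13)}

def mutating_share(total, mutating):
    """Fraction of residues in a molecule that mutate in some variant."""
    return mutating / total

shares = {name: mutating_share(*counts) for name, counts in TABLE1_COUNTS.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}% of residues mutating")
```

On these counts, roughly 6.6% of the residues of S1 are mutating against roughly 2.8% of those of S2.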
Moreover, this table quantifies our main new
Basic Finding: The residues mutating in the variants under consideration are more often unbonded in S1 and bonded in S2.
Note that the missing and absent columns in the table come directly from the PDB and DSSP, with no provisos (other than those conventions in parentheses in the text). The unbonded column presents our novel insights and depends upon a cutoff of 5: a residue is counted as bonded if it is bonded, i.e., neither absent nor missing, in at least five of the 15 structures. The last BFE column depends not only on the cutoff of 5 but also on our theory of BHBs. All of the preceding trends are invariant under changing this cutoff by unity (data not presented).
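The cutoff convention can be made concrete with a small sketch, assuming the reading that a residue counts as bonded when it is bonded in at least five of the 15 structures. The state labels, function names, and the toy input in the usage note are hypothetical and are not data from this study.

```python
def is_bonded(states, cutoff=5):
    """Consensus call for one residue across the exemplar structures.

    states: one label per structure, each 'bonded', 'absent' or 'missing'.
    Assumption: the residue is bonded overall if at least `cutoff`
    structures report it as bonded.
    """
    return sum(state == "bonded" for state in states) >= cutoff

def unbonded_fraction(per_residue_states, cutoff=5):
    """Fraction of residues called unbonded under the cutoff.

    per_residue_states: mapping residue number -> list of state labels.
    """
    flags = [not is_bonded(states, cutoff) for states in per_residue_states.values()]
    return sum(flags) / len(flags)
```

With a toy input of two residues over 15 structures, one bonded in five of them and one bonded in only four, `unbonded_fraction` returns 0.5. Rerunning with `cutoff=4` or `cutoff=6` mimics the stated robustness check of changing the cutoff by unity.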
3.2. Wuhan-Hu-1 to Delta to Omicron
We shall quantify the basic finding for the three mutational steps from Wuhan-Hu-1 (W) to Delta, W to Omicron (O), and Delta to O. The geometry of the spike for Delta is derived from the PDB files 7V7O 7V7P…7V7V in the same manner as before for W from its exemplar structures and again with a cutoff of five.
Table 2 quantifies our basic finding under various scenarios, with the W to (*) and Delta to O mutations comparable, where (*) is the union of the variants considered in the previous section. The W to O transition is anomalous with its much smaller percentage of unbonded mutated residues. The explanation follows from the last column, showing the large percentages of residues that are bonded in W but not in Delta, and hence of higher mutagenic potential for their transition to O according to the basic finding.
Table 2.
Mol | W to (*) | Delta to O | W to O | W to Delta |
---|---|---|---|---|
S | 35 | 40 | 35 | 30 |
S ∩ M | 61 | 61 | 22 | 42 |
S1 | 42 | 46 | 42 | 32 |
S1 ∩ M | 74 | 67 | 27 | 43 |
S2 | 25 | 32 | 25 | 28 |
S2 ∩ M | 15 | 33 | 0 | 33 |
The supposition that Delta played an intermediary role in the passage to O is already strongly bolstered by the nearly complete dominance of Delta in South Africa before the advent of O. Meanwhile, the PDB files 7LYK 7LYL…7LYQ [28] for the earlier South African variant Beta are of a lower quality, thus their interpretation is problematic and not fully presented here; however, the resulting entry for Beta to O, analogous to those of Table 2, is 48%.
4. Discussion
We find that mutagenic pressure on S1 exceeds that on S2, as expected based on the function and location of both subunits, and that the former is more disorganized and has a lower percentage of bonded residues than the latter. These findings are consistent with the general trend that certain B-factors [29] in the receptor-binding subunit usually exceed those in the fusion subunit of a viral glycoprotein, at least in the prefusion conformation.
It is argued that the mutation of unbonded residues avoids the twofold constraint on BFE imposed by the metastability of the viral glycoprotein, thus explaining the tendency of mutating residues in S1 to be unbonded. However, among the mutating residues (19) 156–158, 452, 478, 614, 681, and 950 defining Delta, only 614 and 950 are bonded in Wuhan-Hu-1, in line with the basic finding, while only 681 is unbonded in Delta. This is fascinating and shows that there is more to our energetic argument than simply mutations avoiding BHBs. On the contrary, Wuhan-Hu-1 to Delta is not reversible, and there is thus an evolutionary dynamics of fixing BHBs for function and erasing them to enhance mutation in light of the basic finding, at least in this case of Wuhan-Hu-1 to Delta. In contrast, among the mutated residues 80, 215, 417, 484, 501, 614, and 701 defining Beta, only 417, 614, and 701 are bonded in Wuhan-Hu-1, while these plus 484 are bonded in Beta, here using any cutoff greater than unity, so bonded mutagenic residues remain bonded in this case. Backbone hydrogen bonds therefore provide an additional level of regulation of viral mutation, and this warrants further study.
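The residue counts quoted in this paragraph can be tallied with a few lines of set arithmetic. The sketch below uses only the residue numbers stated in the text; it assumes that the parenthesized residue 19 is left out of the tally (it lies outside the consensus range 27 to 1147 quoted earlier), and the variable and function names are illustrative.

```python
# Residue numbers as quoted in the text.
# Assumption: the parenthesized residue 19 is excluded from the tally.
delta_sites = {156, 157, 158, 452, 478, 614, 681, 950}
delta_bonded_in_w = {614, 950}                 # bonded in Wuhan-Hu-1
delta_bonded_in_delta = delta_sites - {681}    # only 681 is unbonded in Delta

beta_sites = {80, 215, 417, 484, 501, 614, 701}
beta_bonded_in_w = {417, 614, 701}             # bonded in Wuhan-Hu-1
beta_bonded_in_beta = beta_bonded_in_w | {484}

def unbonded_share(sites, bonded):
    """Fraction of mutating sites that are unbonded in a given structure."""
    return len(sites - bonded) / len(sites)

# Sites that are unbonded before the mutation but bonded afterwards.
delta_gained = delta_bonded_in_delta - delta_bonded_in_w
beta_gained = beta_bonded_in_beta - beta_bonded_in_w

print("Delta sites unbonded in W:    ", unbonded_share(delta_sites, delta_bonded_in_w))
print("Delta sites unbonded in Delta:", unbonded_share(delta_sites, delta_bonded_in_delta))
print("Sites gaining a BHB, Delta vs Beta:", len(delta_gained), len(beta_gained))
```

On this tally, six of the eight Delta sites are unbonded in Wuhan-Hu-1 but only one remains unbonded in Delta, whereas Beta gains a bond at a single site (484); this is the asymmetry the paragraph describes as irreversibility.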
As was already mentioned, the different pH of activation for the two subunits S1 and S2 may explain the opposite trend in the latter that mutating residues tend to be bonded, since the prefusion-stabilized spike structures may better reflect the actual geometry and consequent BHBs of S1 compared to S2. Another related possibility is that pre-cleavage S1 sits on top of S2 as a kind of cap, thereby sterically constraining the latter, thus the active geometry of S2 is displayed only post-cleavage and in the course of acidic post-fusion reconformation.
In any case, the findings on S1 suggest a strategy for anticipating residues primed for mutation therein. However, going forward, it is the residues that are unbonded for the currently mutated variants, rather than for Wuhan-Hu-1, that should be considered as likely future candidates, just as in our analysis of Delta to Omicron.
5. Conclusions
The new and potentially more general insight is that protein secondary structure may provide a regulatory network controlling mutation, at least for viral glycoproteins.
Our findings admit explanation by general principles and may therefore hold more generally, namely: being free from backbone hydrogen bonds increases the mutagenic potential within the receptor-binding subunit of a viral glycoprotein, and therefore deleting backbone hydrogen bonds within the constraints of molecular functionality can increase the mutagenic potential of a glycoprotein.
These considerations may be of utility for anticipating mutagenic pressure within the receptor-binding subunit of a viral glycoprotein based on a lack of backbone hydrogen bonds, and this may be of substance in general for vaccine design.
Acknowledgments
It is a pleasure to thank Pablo Guardado-Calvo, Minus van Baalen, Charles Swerdlow, and the referees for their critical comments.
Abbreviations
BFE | backbone free energy |
BHB | backbone hydrogen bond |
PDB | protein data bank |
W | Wuhan-Hu-1 SARS-CoV-2 strain |
O | Omicron SARS-CoV-2 variant |
Funding
This research study received no external funding.
Data Availability Statement
See [22] for online software.
Conflicts of Interest
The author declares no conflict of interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
- 2. Outbreak.info. Available online: https://outbreak.info/ (accessed on 16 October 2021).
- 3. Penner R. Antiviral Resistance against Viral Mutation: Praxis and Policy for SARS-CoV-2. Comput. Math. Biophys. 2021;9:81–89. doi: 10.1515/cmb-2020-0119.
- 4. Dimmock N.J., Easton A.J., Leppard K.N. Introduction to Modern Virology. 6th ed. Blackwell; Oxford, UK: 2007.
- 5. Harrison S.C. Viral membrane fusion. Nat. Struct. Mol. Biol. 2008;15:690–698. doi: 10.1038/nsmb.1456.
- 6. White J.M., Delos S.E., Brecher M., Schornberg K. Structures and mechanisms of viral membrane fusion proteins: Multiple variations on a common theme. Crit. Rev. Biochem. Mol. Biol. 2008;43:189–219. doi: 10.1080/10409230802058320.
- 7. Bollavaram K., Leeman T.H., Lee M.W., Kulkarni A., Upshaw S.G., Yang J., Song H., Platt M.O. Multiple sites on SARS-CoV-2 spike protein are susceptible to proteolysis by cathepsins B, K, L, S, and V. Protein Sci. 2021;30:1131–1143. doi: 10.1002/pro.4073.
- 8. Kabsch W., Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
- 9. Finkelstein A.V., Ptitsyn O. Protein Physics, a Course of Lectures. 2nd ed. Academic Press; London, UK: 2016.
- 10. The UniProt Consortium. UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49. doi: 10.1093/nar/gkaa1100.
- 11. Henderson R., Edwards R.J., Mansouri K., Janowska K., Stalls V., Gobeil S.M., Kopp M., Li D., Parks R., Hsu A.L., et al. Controlling the SARS-CoV-2 spike glycoprotein conformation. Nat. Struct. Mol. Biol. 2020;27:925–933. doi: 10.1038/s41594-020-0479-4.
- 12. Juraszek J., Rutten L., Blokland S., Bouchier P., Voorzaat R., Ritschel T., Bakkers M.J., Renault L.L., Langedijk J.P. Stabilizing the closed SARS-CoV-2 spike trimer. Nat. Commun. 2021;12:244. doi: 10.1038/s41467-020-20321-x.
- 13. McCallum M., Walls A.C., Bowen J.E., Corti D., Veesler D. Structure-guided covalent stabilization of coronavirus spike glycoprotein trimers in the closed conformation. Nat. Struct. Mol. Biol. 2020;27:942–949. doi: 10.1038/s41594-020-0483-8.
- 14. Toelzer C., Gupta K., Yadav S.K., Borucu U., Davidson A.D., Williamson M.K., Shoemark D.K., Garzoni F., Staufer O., Milligan R., et al. Free fatty acid binding pocket in the locked structure of SARS-CoV-2 spike protein. Science. 2020;370:725–730. doi: 10.1126/science.abd3255.
- 15. Walls A.C., Park Y.J., Tortorici M.A., Wall A., McGuire A.T., Veesler D. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell. 2020;181:281. doi: 10.1016/j.cell.2020.02.058.
- 16. Wrobel A.G., Benton D.J., Xu P., Roustan C., Martin S.R., Rosenthal P.B., Skehel J.J., Gamblin S.J. SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects. Nat. Struct. Mol. Biol. 2020;27:763–767. doi: 10.1038/s41594-020-0468-7.
- 17. Xu C., Wang Y., Liu C., Zhang C., Han W., Hong X., Wang Y., Hong Q., Wang S., Zhao Q., et al. Conformational dynamics of SARS-CoV-2 trimeric spike glycoprotein in complex with receptor ACE2 revealed by cryo-EM. Sci. Adv. 2021;7:eabe5575. doi: 10.1126/sciadv.abe5575.
- 18. Yan R., Zhang Y., Li Y., Ye F., Guo Y., Xia L., Zhong X., Chi X., Zhou Q. Structural basis for the different states of the spike protein of SARS-CoV-2 in complex with ACE2. Cell Res. 2021;31:717–719. doi: 10.1038/s41422-021-00490-0.
- 19. Zhang C., Wang Y., Zhu Y., Liu C., Gu C., Xu S., Wang Y., Zhou Y., Wang Y., Han W., et al. Development and structural basis of a two-MAb cocktail for treating SARS-CoV-2 infections. Nat. Commun. 2021;12:264. doi: 10.1038/s41467-020-20465-w.
- 20. Zhou T., Tsybovsky Y., Gorman J., Rapp M., Cerutti G., Chuang G.Y., Katsamba P.S., Sampson J.M., Schön A., Bimela J., et al. Cryo-EM Structures Delineate a pH-Dependent Switch that Mediates Endosomal Positioning of SARS-CoV-2 Spike Receptor-Binding Domains. bioRxiv. 2020. doi: 10.2139/ssrn.3717767.
- 21. Penner R. Backbone Free Energy Estimator Applied to Viral Glycoproteins. J. Comput. Biol. 2020;27:1495–1508. doi: 10.1089/cmb.2020.0120.
- 22. Penner R.C., Andersen E.S., Jensen J.L., Kantcheva A.K., Bublitz M., Nissen P., Rasmussen A.M., Svane K.L., Hammer B., Rezazadegan R., et al. Hydrogen bond rotations as a uniform structural tool for analyzing protein architecture. Nat. Commun. 2014;5:5803. doi: 10.1038/ncomms6803.
- 23. Finkelstein A.V., Gutin A.M., Badretdinov A.Y. Boltzmann-like statistics of protein architectures: Origins and consequences. In: Biswas B.B., Roy S., editors. Proteins: Structure Function, and Engineering. Volume 24. Springer; Berlin/Heidelberg, Germany: 1995. pp. 1–26. Subcellular Biochemistry.
- 24. Finkelstein A.V., Badretdinov A.Y., Gutin A.M. Why do protein architectures have Boltzmann-like statistics? Proteins. 1995;23:142–150. doi: 10.1002/prot.340230204.
- 25. Pohl F.M. Empirical protein energy maps. Nat. New Biol. 1971;234:277–279. doi: 10.1038/newbio234277a0.
- 26. Penner R. Conserved High Free Energy Sites in Human Coronavirus Spike Glycoprotein Backbones. J. Comput. Biol. 2020;27:1622–1630. doi: 10.1089/cmb.2020.0193.
- 27. Gobeil S.M., Janowska K., McDowell S., Mansouri K., Parks R., Manne K., Stalls V., Kopp M.F., Henderson R., Edwards R.J., et al. D614G Mutation Alters SARS-CoV-2 Spike Conformation and Enhances Protease Cleavage at the S1/S2 Junction. Cell Rep. 2021;34:108630. doi: 10.1016/j.celrep.2020.108630.
- 28. Gobeil S., Janowska K., McDowell S., Mansouri K., Parks R., Stalls V., Kopp M.F., Manne K., Saunders K.O., Edwards R.J., et al. Effect of natural mutations of SARS-CoV-2 on spike structure, conformation and antigenicity. Science. 2021;373:6555. doi: 10.1126/science.abi6226.
- 29. Carugo O. How large B-factors can be in protein crystal structures? BMC Bioinform. 2018;19:61. doi: 10.1186/s12859-018-2083-8.