Abstract
Protein structures may be used to draw functional implications at the residue level, but how sensitive are these implications to the exact structure used? Calculations of the effects of SARS-CoV-2 S-protein mutations based on experimental cryo-electron microscopy structures have been abundant during the pandemic. To understand the precision of such estimates, we studied three distinct methods to estimate stability changes for all possible mutations in 23 different S-protein structures (3.69 million ΔΔG values in total) and explored how random and systematic errors can be remedied by structure-averaged mutation group comparisons. We show that computational estimates have low precision, because method and structure heterogeneity make results for single mutations uninformative. However, structure-averaged differences in mean effects for groups of substitutions can yield significant results. Illustrating this protocol, functionally important natural mutations, despite individual variations, average to a smaller stability impact compared to other possible mutations, independent of conformational state (open, closed). In summary, we document substantial issues with precision in structure-based protein modeling and recommend sensitivity tests to quantify these effects, but also suggest partial solutions to the problem in the form of structure-averaged “ensemble” estimates for groups of residues when multiple structures are available.
Supplementary Information
The online version contains supplementary material available at 10.1007/s00249-022-01619-8.
Keywords: Cryo-electron microscopy, Structural heterogeneity, Mutations, Computer models, Spike protein
Introduction
Protein structures are commonly used as input to draw mechanistic implications at the residue level, either by inspection or more often by some form of computation. This has very much been the case during the pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The spike glycoprotein (S-protein) binds to the human receptor angiotensin-converting enzyme 2 (ACE2) during cell infection and is targeted by the immune system, which is the mode of action of many vaccines. (Fehr and Perlman 2015; Letko et al. 2020; Liu et al. 2020; Wang et al. 2020; Forni and Mantovani 2021; van Dorp et al. 2021) Antigenic drift of new S-protein mutations is an urgent challenge, (McCallum et al. 2021; van Dorp et al. 2021) with the recent omicron variant being a hallmark example. (Liu et al. 2022; Planas et al. 2022) Accordingly, S-protein structures are of high interest. (Alsulami et al. 2021; Mehra and Kepp 2022) Made possible by the recent technical breakthroughs in cryo-electron microscopy (cryo-EM) of macromolecules, (Fernandez-Leiro and Scheres 2016; Murata and Wolf 2018; Danev et al. 2019; Blundell and Chaplin 2021) an unprecedentedly rich structural biology of the S-protein was induced by healthy competition during the pandemic, with hundreds of structures published. (Mehra and Kepp 2022) While these structures are of course important in themselves, this richness of data also enables new options for testing the sensitivity of computational protein modeling to the exact choice of input structure.
The S-protein is under selection for antigenic drift and plausibly also for maintenance of overall fold stability, (Mehra and Kepp 2022) as seen in other cases of protein evolution. (Bershtein et al. 2008; Tokuriki and Tawfik 2009; Goldstein 2011; Liberles et al. 2012; Kepp 2020) The prefusion S-protein is in a conformationally variable metastable state that evades antibodies and contributes to ACE2 binding. (Berger and Schaffitzel 2020) An accurate understanding and prediction of future SARS-CoV-2 evolution (Maher et al. 2022) requires insight into the potential impact of mutations, and experimental cryo-EM structures are essential for such insight. (Mehra and Kepp 2022)
Computer models are often applied to understand the role of specific residues, typically by studying mutations. However, such models have limitations due to noise and biases in the training data and dependencies on a folded wild-type structure for extrapolating the impacts, if the mutant structure is not available. (Khan and Vihinen 2010; Christensen and Kepp 2012; Kepp 2015; Pucci et al. 2018, 2022; Caldararu et al. 2020, 2021a; Iqbal et al. 2021; Louis and Abriata 2021) Computational methods carry bias toward some mutation types. (Pucci et al. 2018; Caldararu et al. 2020) In addition to the heterogeneity within each separate cryo-EM map, which is informative of the ensemble dynamics within the settings of that study and increasingly handled with new techniques, (Scheres 2016; Zhong et al. 2021) “extra” heterogeneity arising from e.g., different chemical conditions and sample and data collection protocols can also be important but has not been studied much, and only for crystal structures obtained by X-ray diffraction. (Mehra et al. 2020; Caldararu et al. 2021a) We are not aware of any studies of the precision of cryo-EM structures in a functional context, i.e., how differences between cryo-EM structures of the supposedly same protein state affect functional deductions from computation. Understanding the precision of such protein modeling requires comparing multiple structures. The more than 200 structures published during the pandemic (Alsulami et al. 2021; Mehra and Kepp 2022) make the SARS-CoV-2 S-protein an ideal study case for exploring this problem with a sufficiently large number of structures.
Our work was partly motivated by many well-cited computational studies published during the pandemic using single methods and a few structures to deduce functional implications for single residues. (Delgado Blanco et al. 2020; Hadi-Alijanvand and Rouhani 2020; Laha et al. 2020; Othman et al. 2020; Shorthouse and Hall 2021; Xue et al. 2021; Teng et al. 2021; Kumar et al. 2022; Rochman et al. 2022) In our initial studies, we found that results varied with method and structure choice, which required us to analyze many different structures in a much more data-intensive manner, as we report below.
Here, we address this problem of precision (i.e., how much the choice of cryo-EM structure affects computationally deduced functional implications) via the structure-guided impact of residue substitutions on S-protein stability. Computational methods to estimate these effects from experimental wild-type structures have been developed and tested over several decades and their strengths and weaknesses are fairly well understood. (Sanavia et al. 2020; Casadio et al. 2022; Pucci et al. 2022) We show that even without experimental data available to validate accuracy, the precision of such computational estimates is a major challenge. However, we demonstrate that we can utilize the rich structural biology of the S-protein by explicitly considering several experimental structures as a simple ensemble to improve the estimates.
Methods
Strategy
How much does the choice of cryo-EM structure affect functional deductions at the residue level? This question of inter-structure heterogeneity should not be confused with heterogeneity within a specific cryo-EM map, which gives important information on the ensemble in a specific study’s context. Computational estimates of the impact of mutating a residue based on a wild-type structure provide a metric for answering this question, which is essentially about precision and does not require the thermodynamic stabilities of hundreds of S-protein mutants to be known. The unique richness of cryo-EM structures of the S-protein (Mehra and Kepp 2022) offers an unprecedented possibility to test this question in the resolution regime of 2.5−3.5 Å.
Stability impacts of mutating a residue are arguably the most straightforward way to study this problem, as methods have been developed for several decades for this purpose. (Louis and Abriata 2021; Casadio et al. 2022; Pucci et al. 2022) To estimate the effects of S-protein mutations, we
1) use several methods for sensitivity analysis, including some that are more or less sensitive to structural variations,
2) account for heterogeneity in the experimental S-protein structures by using 23 structures as input,
3) compare both heterogeneity between structures of the supposedly same conformational state and between different conformational states,
4) illustrate that interpretations at the residue level can be very structure-dependent, and
5) propose protocols to handle this, by comparing mutation groups rather than single mutations and sites, averaged over an “ensemble” of experimental structures.
Protein structures studied
The initial part of our work emphasized S-protein structures without antibodies or ACE2 bound in an evaluation of structural sensitivity. An example of the prefusion closed state is shown in Fig. 1a. We computed all mutations possible in the protein and illustrated results for a group of natural mutations (Fig. 1b) known to have large impact on function. (Tegally et al. 2020; Li et al. 2020; Chen et al. 2021; Yuan et al. 2021; Greaney et al. 2021; McCallum et al. 2021; Starr et al. 2021; Wang et al. 2021; Thomson et al. 2021; Dejnirattisai et al. 2021; Ferreira et al. 2021) We used six different S-protein structures all representing the S-protein closed state (Fig. 1c), selected based on resolution and residue coverage. These include 6X6P by Herrera et al. 2021, 7DF3 by Xu et al. 2021, 7CAB by Lv et al. 2020, 6X79 by McCallum et al. 2020, 7DDD by Zhang et al. 2021, and 6VXX by Walls et al. 2020.
Fig. 1.
SARS CoV-2 S-protein structures studied in this work. a Trimer colored by chain (PDB code: 7DF3). b Prominent mutation sites studied shown as red balls on monomer unit (PDB code: 7DF3). c Six structurally aligned S-protein structures investigated in this work (each structure represented in a single color, RMSD 0.5−2.9 Å)
The number of residues (N), the percentage of outliers from a Ramachandran analysis (% Outl.), and the resolutions (in Å) of these structures are given in Table 1. We note that the Ramachandran outliers partly reflect molecular mechanics force-field optimization and thus, while indicating deviations from expected realistic backbone conformations, do not inform directly on the empirical quality of the structure (in fact, very accurate structures may have some real outliers not observed upon force field optimization, which is biased toward small variation from ideal torsion angles). We also note that the ambient temperature ensemble may differ somewhat from the cryo-temperature ensemble obtained upon rapid cooling, in addition to chemical composition and cell effects (Mehra et al. 2020). We finally note that the cryo-EM structure deposited in the PDB is usually just one structure out of many that could be derived from the particle data, and it is possible to address the heterogeneity and ensemble effects of a specific experiment by other techniques. Our interest here is to quantify the effect of the choice of deposited structure on results, as it is standard to use a single structure, sometimes chosen rather arbitrarily, even if several good structures are available. However, we expect the structures to provide good information on backbone conformation and secondary and tertiary structure.
Table 1.
Structures used in the present study to compute all mutations
To understand conformation-state-specific mutation stability effects, we also studied all possible mutations in an additional 17 S-protein structures representing different states: closed, open, locked and active (Table S1). To minimize confounding effects (chemical conditions, lab protocol, protein modifications) we only compared conformational states reported by the same study (not one state from one study with another state from another study). These structures include 6VYB, (Walls et al. 2020) 7DWY, (Yan et al. 2021) 7DWZ, (Yan et al. 2021) 6XF5, (Zhou et al. 2020) 6XF6, (Zhou et al. 2020) 7A4N, (Juraszek et al. 2021) 7AD1, (Juraszek et al. 2021) 6X2C, (Henderson et al. 2020) 6X2A, (Henderson et al. 2020) 6X2B, (Henderson et al. 2020) 6ZGE, (Wrobel et al. 2020) 6ZGI, (Wrobel et al. 2020) 6ZGG, (Wrobel et al. 2020) 7KDG, (Gobeil et al. 2021) 7KDH, (Gobeil et al. 2021) 6Z97, (Huo et al. 2020) and 6ZB4 (Toelzer et al. 2020) having N = 963−1099, 0−0.5% outliers from Ramachandran analysis, and resolutions of 2.6−4.0 Å. These data were analyzed as pair comparisons of closed and open structures.
Methods to estimate stability effects of mutations
Protein mutations tend to be destabilizing more often than stabilizing, because of the usual evolutionary optimization of structures, giving a skewed fold stability (ΔΔG) distribution of arising mutations. (Tokuriki et al. 2007) Datasets accordingly also tend to be skewed toward destabilization and biased toward small stability effects, and also carry a bias in mutation type, as some mutations have been more commonly studied experimentally, e.g. alanine mutations, and these imbalances also affect method performance. (Pucci et al. 2018; Caldararu et al. 2020) The output of a method must thus be seen in the context of other methods using different approaches, and in the context of the same method applied to other structural inputs.
To study the impact of method choice on the results for individual mutations, we used three methods, SimBa-2 (or SimBa-IB), (Caldararu et al. 2021b; Bæk and Kepp 2022a) I-Mutant 3.0, (Capriotti et al. 2005, 2006) and CUPSAT, (Parthiban et al. 2006) to compute the change in fold stability (ΔΔG, in units of kcal/mol) for all possible mutations of all the 23 studied structures (69 chains). The methods represent three different design types (linear regression, machine learning, and force-field potential based) and are thus expected to provide a good indication of the maximal sensitivity toward method choice. SimBa has been tested both by us (Caldararu et al. 2021b) and in an independent benchmarking study (Pucci et al. 2022) where it performed slightly above average in terms of error and R², despite its simplicity. SimBa reduces biases by design, but was run in the nonsymmetric mode (IB) which typically has slightly higher accuracy for random mutations. (Bæk and Kepp 2022a) I-Mutant (Capriotti et al. 2005, 2006) is a robust Support Vector Machine trained on ProTherm data which is not mutation-type or stability-balanced (Bæk and Kepp 2022a; Pucci et al. 2022) but has shown good general performance in many benchmarks, (Potapov et al. 2009; Kepp 2014, 2015; Pucci et al. 2022) and has low structural sensitivity. (Caldararu et al. 2021a) CUPSAT (Parthiban et al. 2006) uses environment-specific force fields to predict protein stability and is more sensitive to structure input. (Kepp 2015; Caldararu et al. 2021a) In addition, relative solvent accessibility (RSA) was calculated using each method for all the mutated sites using all 23 structures.
Computations and group comparisons
We used a group comparison protocol to illustrate how a hypothetical study of a few mutations of interest will be sensitive to method and structure choices, and to illustrate the statistical advantages of comparing mutations in groups against each other. By comparing groups of mutations, we can determine if a group of mutations has unusual properties relative to another group by e.g., a t-test. This protocol can produce meaningful results even if output for single sites is uncertain. The grouping protocol works because random errors are reduced with larger sample size N, and systematic errors are reduced by comparison of the average effects in a t-test (i.e., the systematic errors in a method will exist in both groups and partly cancel when comparing only the mean output of the groups). To this end, we computed the stability effect of all possible N × 19 mutations (about 18,000 to 21,000 data points per chain) for each of the three chains of the six closed S-protein structures with all three methods, corresponding to approximately 3 × 19,000 ΔΔG values for each structure (N typically ~ 1000).
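The two statistical arguments in this paragraph can be illustrated with a minimal sketch. It uses only the Python standard library and synthetic ΔΔG values drawn from normal distributions; the group means, spreads, sizes, and the 0.4 kcal/mol offset are assumptions chosen for illustration, not values from this study. The pooled-variance (Student) form of the test is assumed here, and the p-value is obtained by numerical integration rather than from a statistics package.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=4000):
    """Two-tailed p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(group_a, group_b):
    """Pooled-variance two-sample t-test: (difference of means, t, p)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / df
    t = diff / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return diff, t, two_tailed_p(t, df)


# Synthetic data only: a large background group and two "selected" groups.
rng = random.Random(0)
background = [rng.gauss(-0.55, 1.35) for _ in range(19000)]
small_group = [rng.gauss(-0.15, 0.75) for _ in range(14)]
large_group = [rng.gauss(-0.15, 0.75) for _ in range(1400)]

diff, t_small, p_small = student_t_test(small_group, background)
_, t_large, p_large = student_t_test(large_group, background)

# A constant systematic error (hypothetical) shifts both groups equally,
# so the difference of means and the t statistic are unchanged.
bias = 0.4
diff_biased, t_biased, _ = student_t_test(
    [x + bias for x in small_group], [x + bias for x in background])

print(f"N=14:   diff={diff:+.2f}, t={t_small:.2f}, p={p_small:.3g}")
print(f"N=1400: t={t_large:.2f}, p={p_large:.3g}")
print(f"with offset: diff={diff_biased:+.2f}, t={t_biased:.2f}")
```

In this sketch the offset cancels exactly because it is identical for every mutation; a method-specific bias that depends on mutation type would cancel only partly, as the text notes.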
To illustrate the heterogeneity in estimates of functional impact at the individual residue level as would often feature in a study, we separated out results for a small group of mutations seen in prominent SARS-CoV-2 variants and known to impact virus function: K417T, K417N, N439K, N440K, Y449H, L452R, Y453F, S477N, T478K, E484K, E484Q, S494P, N501Y, and D614G. For example, K417T and K417N are found in gamma (P.1) and beta (B.1.351), (Yuan et al. 2021) and L452R and N439K evade some antibodies, (Li et al. 2020; McCallum et al. 2021; Starr et al. 2021; Thomson et al. 2021) as do mutations in the E484 site in e.g. alpha (B.1.1.7) and beta (B.1.351). (Tegally et al. 2020; Chen et al. 2021; Greaney et al. 2021; Wang et al. 2021; Yuan et al. 2021; Dejnirattisai et al. 2021; Ferreira et al. 2021) We also performed complete saturated mutagenesis for additional structures in partially open states to estimate if the effects were state-dependent, such that the total number of protein structures studied was 23, or 69 chains, and the number of mutation effects computed was thus 69 × N × 19, about 1.23 million ΔΔG values in total for each of the three methods (3.69 million ΔΔG values).
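The dataset size quoted above follows from simple counting, sketched below. The constants are taken from the text (23 structures, three chains each, 19 substitutions per residue, three methods, about 1.23 million values per method); the implied mean chain length is a back-calculated figure for orientation only and is not reported in the text.

```python
STRUCTURES = 23
CHAINS = STRUCTURES * 3      # trimer: three chains per structure, 69 in total
SUBSTITUTIONS = 19           # each residue can change to 19 other amino acids
METHODS = 3
REPORTED_PER_METHOD = 1.23e6  # approximate value quoted in the text


def ddg_values(residues_per_chain, chains=CHAINS):
    """Number of single-point substitutions for the given chain length."""
    return chains * residues_per_chain * SUBSTITUTIONS


per_chain_at_1000 = ddg_values(1000, chains=1)
total_all_methods = METHODS * REPORTED_PER_METHOD
implied_mean_length = REPORTED_PER_METHOD / (CHAINS * SUBSTITUTIONS)

print(f"values per chain at N = 1000: {per_chain_at_1000}")
print(f"total over three methods: {total_all_methods:.3g}")
print(f"implied mean modelled residues per chain: {implied_mean_length:.0f}")
```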
Analysis of data and statistics
The percentage of outliers from Ramachandran plots (torsion angles of the peptide backbone) indicates the realism of the experimentally obtained structure’s backbone conformations (Wlodawer et al. 2008) and was calculated using the Procheck program (Laskowski et al. 1993) (version 3.6.2) available via the PDBsum server. (Laskowski et al. 2018) The average ΔΔG, its standard deviation and average RSA values were calculated and compared. Mutation groups were compared for their stability effects using two-tailed Student's t-tests for the same mean, using 95% confidence intervals. We also studied the relationship between ΔΔG and RSA values using linear regression. Statistical analysis and plotting were performed using Matplotlib and Sklearn libraries of Python 3 and Microsoft Excel.
Results and discussion
Computing the stability effects of SARS-CoV-2 mutations
The main aim of our study was to investigate how much the choice of cryo-EM structure affects functional implications at the single residue level, illustrated by the stability effect of mutations. The rich S-protein structural biology has made such a study feasible and arguably also relevant to some biological conclusions specifically regarding S-protein mutations in the literature.
First, to quantify how structure choice affects computed estimates broadly, scatter plots of the ΔΔG values of all possible mutations in chain A of six similar prefusion S-protein structures (RMSD 0.5–2.9 Å) are shown in Figures S1, S2 and S3 for the three methods. Despite the correlations, the result for a specific mutation commonly depends on the structure used, with cases of several kcal/mol differences and sign changes. SimBa and I-Mutant3.0 show less structural sensitivity (Figures S1 and S2), with R² > 0.9 between different structures, whereas CUPSAT shows large variation (Figure S3) with R² ranging from 0.30 to 0.61, consistent with its larger emphasis on the local site geometry. (Kepp 2015; Caldararu et al. 2021a) These plots illustrate how the structural variations in the cryo-EM structures affect the precision of functional implications for methods that emphasize such local site structural variations more or less.
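A structure-to-structure comparison of this kind could be scripted roughly as follows. This is a sketch on synthetic data, not the analysis behind Figures S1–S3: the structure labels, the noise levels, and the function names are hypothetical, and the two noise settings merely mimic a method with low and a method with high sensitivity to the input structure.

```python
import random
import statistics
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equally ordered value lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def sign_changes(x, y):
    """Number of mutations predicted with opposite sign in the two structures."""
    return sum(1 for a, b in zip(x, y) if a * b < 0)


def pairwise_agreement(ddg_by_structure):
    """Map each structure pair to (R^2, sign changes, largest |difference|).

    ddg_by_structure maps a structure label to ddG values listed in the
    same mutation order for every structure.
    """
    report = {}
    for s1, s2 in combinations(sorted(ddg_by_structure), 2):
        x, y = ddg_by_structure[s1], ddg_by_structure[s2]
        largest = max(abs(a - b) for a, b in zip(x, y))
        report[(s1, s2)] = (r_squared(x, y), sign_changes(x, y), largest)
    return report


# Synthetic example: one shared signal plus structure-specific noise.
rng = random.Random(1)
base = [rng.gauss(-0.5, 1.3) for _ in range(500)]


def noisy_copies(noise_sd, n_structures=3):
    return {f"structure_{i + 1}": [v + rng.gauss(0.0, noise_sd) for v in base]
            for i in range(n_structures)}


robust = pairwise_agreement(noisy_copies(0.2))     # low structural sensitivity
sensitive = pairwise_agreement(noisy_copies(1.5))  # high structural sensitivity

for label, report in (("robust", robust), ("sensitive", sensitive)):
    for pair, (r2, flips, largest) in report.items():
        print(label, pair, f"R2={r2:.2f} flips={flips} max_diff={largest:.2f}")
```

Reporting sign changes and the largest difference alongside R² is a design choice made here because a high overall correlation can still hide individual mutations whose predicted effect reverses between structures.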
To understand how this heterogeneity affects estimates for specific groups of mutations in an envisioned study, Fig. 2 shows the mutation effects estimated using the SimBa method for the 18 chains of these six structures. Results are shown for all possible mutations (blue), for a selection of mutations observed in the wild (green), and for all possible mutations in the sites harboring these selected mutations (orange) (negative values represent destabilization). The selected mutations in this example (green) could have been studied with any of these structures as the green data in one of the panels. However, Fig. 2 shows that results depend on structure used, with the average for all possible mutations varying from −0.55 to −0.58 kcal/mol (standard deviation (SD) 1.32 to 1.38), from −0.13 to −0.55 (SD 0.95 to 1.18) for all mutations in the selected sites, and from −0.12 to −0.50 (SD 0.66 to 0.84) for the selected mutations in the example. The groups, by increasing N, reduce random errors in estimates, i.e., the central estimate interval is smaller for a larger batch of mutations than if studying one or a few mutations, as commonly done.
Fig. 2.
Violin plots of stability effects (ΔΔG, kcal/mol) calculated using the SimBa method (six structures, chains A, B, and C). Blue violins represent all possible mutations in the chain; orange represent all possible mutations at selected mutated positions; green represents selected natural mutations. The values are the average and standard deviation (in brackets) of the datasets
Figure 2 indicates structure-dependent variations even for a group comparison and a single method. Such dependencies are seen for all three methods (results for the other two methods in Figure S4 and Figure S5) and reflect differences in the cryo-EM structures that affect the model estimates. The group of all possible mutations provides a global comparison against which one can compare the mutations selected for study. Only the differences between averages are then relevant, not the direct values, even if structure-averaged, due to systematic and random errors in methods used to extract the information from the structures. We note that the three chains give similar results here because they are all in the down conformation.
The results suggest that random mutations are more likely to destabilize the S-protein, as expected since proteins are optimized for some fold stability. (Tokuriki et al. 2007) Therefore, stability effects tend to be skewed toward destabilization, and when trained on such data the methods develop a destabilization tendency that may be too small or too large. (Christensen and Kepp 2012; Pucci et al. 2018; Sanavia et al. 2020; Casadio et al. 2022) Most mutations have small, nearly neutral effects, but a few are predicted to be very destabilizing or stabilizing.
While individual mutation effects vary substantially even for the same mutation across different structures, the group comparisons for six structures reduce this noise considerably. In all 18 comparisons, the naturally occurring mutations (green) are as a group less destabilizing than both control sets (blue and orange), and the site-specific control (orange) produces less destabilization than the full control group (blue), consistent with surface residues having less impact on stability, all-else-being equal. Thus, group comparisons can make computational studies of mutations meaningful despite the noise in structure and method input. We also note that the natural mutations (green) have consistently smaller destabilization effects than all possible mutations in the same naturally evolved sites (orange) by 0.08 kcal/mol for 6X6P, 0.15 kcal/mol for 7CAB, and 0.12 kcal/mol for 6VXX, but only 0.01 kcal/mol for 7DF3, 0.05 kcal/mol for 6X79, and 0.03 kcal/mol for 7DDD.
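The per-structure differences quoted in this paragraph can be condensed into a structure-averaged value with a spread, as in the short sketch below. The six numbers are those given in the text (natural mutations vs. all mutations at the same sites, SimBa); the equal weighting of structures and the use of the sample standard deviation and standard error are choices made for this illustration.

```python
import math
import statistics

# Difference in mean ddG (kcal/mol) between natural mutations and all
# possible mutations at the same sites, per structure, as quoted in the text.
gap_by_structure = {
    "6X6P": 0.08,
    "7CAB": 0.15,
    "6VXX": 0.12,
    "7DF3": 0.01,
    "6X79": 0.05,
    "7DDD": 0.03,
}

gaps = list(gap_by_structure.values())
mean_gap = statistics.fmean(gaps)            # structure-averaged difference
sd_gap = statistics.stdev(gaps)              # spread across structures
sem_gap = sd_gap / math.sqrt(len(gaps))      # standard error of the mean

print(f"mean = {mean_gap:.3f}, SD = {sd_gap:.3f}, SEM = {sem_gap:.3f} kcal/mol")
```

The spread across structures is of the same order as the mean difference itself, which is one way to see why a conclusion drawn from any single structure would be fragile.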
The corresponding computations with I-Mutant and CUPSAT show similar results vs. the total group (Figure S4 and Figure S5). However, I-Mutant has a natural mutation group slightly more destabilizing than the same-site mutation group, whereas CUPSAT has the same behavior for both control sets as SimBa, although producing larger values. There are clear differences in the magnitude of the effects both absolutely and relative to the compared groups.
For C3 symmetrized structures, the symmetry constraints generate identical chain conformations. However, the comparison of C1 (no symmetrisation) and C3 EM maps is interesting: for the structures based on C1 EM maps, we observed clear differences in averages (e.g., 7DDD, 6ZB4 in Figure S5 and S8c) for individual chains for some methods. The cryo-EM chain structures, if not symmetrized, reflect relevant noise in the experiment or real dynamics but can give different results if one focuses on a single chain as basis for modeling or interpretation. These effects depend on the method used to map structure to function: if one puts a lot of emphasis on structural variations, such as CUPSAT, then the C1 differences between chains become larger.
The results document the importance of using multiple methods and structures and show why the direct numbers cannot be used, only the differences in averages between mutation groups. Given the precision effects, we suggest that this type of analysis is done in future studies. We do not wish to discuss specific examples in the literature based on single mutations and structures, but our results suggest that studies based on single structures and methods or emphasizing the direct numbers of the computed effects for a few mutations (as the orange data in any of the panels in Fig. 2) should be supplemented by a broader sensitivity analysis of the type shown in the combined Fig. 2. (Laha et al. 2020; Walls et al. 2020; Shorthouse and Hall 2021; Teng et al. 2021) While we did study several million ΔΔG values with this protocol, we stress that computer power is not a limiting factor for these methods, and the structures are available, so it is mainly a matter of scripting and automating the analysis.
Implied mutation effects: structure and method dependencies
To illustrate how this heterogeneity affects computed results, Fig. 3 summarizes results for 14 natural S-protein mutations observed in variants of concern, computed using the same six prefusion S-protein structures as in Fig. 2. The simple linear regression model SimBa (Fig. 3a) and the support vector machine model I-Mutant (Fig. 3b) produce results relatively less sensitive to structure input, consistent with previous findings (Caldararu et al. 2021a, b; Bæk and Kepp 2022a) and more similar tendencies of destabilization, except for S477N and N501Y where SimBa predicts high stabilization and I-Mutant a small impact. They tend to provide the same sign regardless of structure used, but magnitudes vary. However, I-Mutant predicts all mutations to be destabilizing, which the other methods do not. Since this comparison is for the prefusion S-protein, the chains are nearly identical and the methods produce similar results for all chains except a few cases for CUPSAT (Fig. 3c).
Fig. 3.
Change in stability (kcal/mol) of studied natural mutations for six prefusion structures of the SARS-CoV-2 S-protein. a SimBa. b I-Mutant 3.0. c CUPSAT
Figure 3 shows that the methods disagree on the magnitude and sometimes sign, i.e., one cannot draw conclusions for a single mutation based on a single method. The obvious solution to such errors is to only consider differences between mutations for each method separately. For CUPSAT, an example of a statistical force-field based method, the conclusions sometimes depend substantially and even qualitatively on the structure used, with e.g., S494P being very destabilizing when using 6VXX but very stabilizing when using 7CAB. This shows that results from single structures can be very imprecise for methods that emphasize local site geometry. Opposite effects to those reported could have been seen if using another structure. As far as we know, most (if not all) studies published used mainly one method and one or a few structures of the S-protein, and we suggest that the structure-averaged group differences, as illustrated in Fig. 2, will solve some of the precision issues seen in Fig. 3.
Since all structures selected here are of reasonable resolution from a cryo-EM perspective and are of similar residual coverage, no structure is clearly preferable to any other. Thus, in an ensemble of structures, differences are likely due to variations in chemical conditions and sample preparation and data collection. For this reason, we weighted all six structures equally as an “ensemble”, simply averaging out the structural heterogeneity to get a more compressed indication of the stability effects of natural mutations (in blue) in Figure S6. The standard deviations across the structures (in orange) can be taken as a measure of the “precision” of the method for the studied mutation group (Caldararu et al. 2021a). This comparison yields in general good agreement between I-Mutant and SimBa, but very different results for CUPSAT even after averaging out structural heterogeneity and considering the standard deviations.
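The equal-weight ensemble described here amounts to a per-mutation mean and standard deviation over structures. A minimal sketch follows; the mutation labels and ΔΔG values are hypothetical placeholders rather than results from this work, and the sign-disagreement flag is an addition made for this illustration.

```python
import statistics


def ensemble_summary(ddg_by_mutation):
    """Equal-weight ensemble statistics per mutation.

    ddg_by_mutation maps a mutation label to its ddG values (kcal/mol),
    one per experimental structure. Returns, for each mutation,
    (mean, standard deviation across structures, sign disagreement flag).
    """
    summary = {}
    for mutation, values in ddg_by_mutation.items():
        mean = statistics.fmean(values)
        spread = statistics.stdev(values)  # taken as the "precision" measure
        mixed_sign = min(values) < 0 < max(values)
        summary[mutation] = (mean, spread, mixed_sign)
    return summary


# Hypothetical values for six structures; not data from the study.
example = {
    "mutation_A": [-0.4, -0.5, -0.3, -0.6, -0.4, -0.5],  # consistent sign
    "mutation_B": [-2.1, 1.8, -0.2, 0.4, -1.0, 0.9],     # sign depends on structure
}

result = ensemble_summary(example)
for mutation, (mean, spread, mixed_sign) in result.items():
    print(f"{mutation}: mean={mean:+.2f} SD={spread:.2f} mixed_sign={mixed_sign}")
```

A mutation whose standard deviation exceeds the magnitude of its mean, or whose sign differs between structures, is a natural candidate for reporting only as part of a group rather than individually.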
Reducing heterogeneity by structure-averaged group comparisons
To reduce structural noise, we compared the structure-averaged results only, and additionally only considered differences in averages for groups of mutations, rather than direct values or differences in values of individual mutations. Figure 4 shows that even though local heterogeneity affects individual mutation estimates, comparison of mutation groups is statistically meaningful (i.e., has precision high enough to give statistically significant results in comparisons), as confirmed by a two-tailed t-test for the same mean (Table S2). The difference between the studied natural mutations and all possible mutations in the protein was significant at > 99% confidence for all three methods, as was the difference between all possible mutations in the natural mutation sites and all possible mutations at all sites in the S-protein. The similar comparative behavior predicted by the three methods after incorporating structural heterogeneity differs from the analysis of individual mutations in Fig. 3. In the comparison of the natural mutations vs. all possible mutations in the naturally evolving sites, results are significant for SimBa (Fig. 4a) and I-Mutant (Fig. 4b), but not for CUPSAT (Fig. 4c), and the direction of effect differs between I-Mutant and SimBa, making these results inconclusive. In other words, conclusions based on a single cryo-EM structure can sometimes be misleading, and a sensitivity test to the use of other equally reasonable cryo-EM structures as input is therefore recommended.
Fig. 4.
ΔΔG values of some important natural mutations compared to full-protein averaged mutation backgrounds, averaged over six closed S-protein structures (6X6P, 7DF3, 7CAB, 6X79, 7DDD and 6VXX). a SimBa. b I-Mutant3.0. c CUPSAT
State-specific effects
The SARS-CoV-2 S-protein undergoes a conformational change when fusing with host cells, from the closed prefusion state studied above to a partially open state interacting with ACE2, with one, two, or three of its receptor binding domains (RBD) in an upwards conformation (1-up, 2-up, 3-up). To test whether the conformational change affects the stability trends discussed above, or if the stability effects were more or less pronounced in the open states, we computed the mutation effects for all-possible mutations for 17 additional S-protein structures (for each of the three chains) in the partially open or locked states. Consistently, the overall tendency of natural mutations to produce less destabilization as a group than the group of random mutations was seen not just for the closed structures (Figure S7, as discussed above) but also for these additional structures, with an overview for all 23 structures provided in the Supporting Information Figures S8-S9.
To explore whether these conformational changes affect the estimated stability effects, we analyzed the pairs of closed and open structures published from the same studies to reduce noise from confounders (such as different protocols and conditions of structure preparation and analysis) in Figures S10–S15, with one example shown in Fig. 5, for 6ZGE (closed uncleaved, locked) vs. 6ZGI (closed and cleaved) and 6ZGG (1-up). For the mutations as a whole, the state did not substantially affect the stability effects, irrespective of using SimBa (Fig. 5a), I-Mutant (Fig. 5b), or CUPSAT (Fig. 5c), due to the group effect of large N averaging out the heterogeneity at the site level.
Fig. 5.
Comparison of the ΔΔG values of important natural mutations in 6ZGE (closed uncleaved, locked), 6ZGI (closed cleaved) and 6ZGG (cleaved 1-up) structures using three methods: a SimBa. b I-Mutant3.0. c CUPSAT
This finding was also confirmed by other pair comparisons given in the Supporting Information, such as for 6VYB (open), 7DWY (locked) and 7DWZ (active) (Figure S10), 6XF5 (prefusion closed) and 6XF6 (prefusion 1-up) (Figure S11), 7A4N (closed) and 7AD1 (1-up) structures (Figure S12), 6X2C (closed), 6X2A (1-up) and 6X2B (2-up) (Figure S13), 6VXX (closed) and 6VYB (open/1-up) (Figure S14), and 7KDG (closed) and 7KDH (1-up) structures (Figure S15). Numerical data for these plots are available in Tables S3–S9. All these pair comparisons suggest that group tendencies are not much affected by the state of the protein, although some individual sites (and thus mutations) are disordered and thus highly sensitive to conformational state and the structure used, a notable example being the D614 site (Mehra and Kepp 2022) that harbors the D614G mutation, which became fixed early in the pandemic (Yurkovetskiy et al. 2020; Mansbach et al. 2021).
Relationship between solvent accessibility and stability effects
Since mutations at more solvent-exposed sites are expected to have a milder impact on the S-protein’s fold stability than random mutations (Caldararu et al. 2021b), regardless of the protein state (open, closed), we mapped this relationship in Fig. 6. The blue circles represent the stability effects of the studied natural mutations averaged over the six S-protein structures. The larger green and orange circles represent the average of the natural mutation effects and the average of all possible mutation effects, respectively, with stability changes shown vertically and solvent exposure of the site shown horizontally.
Fig. 6.
Scatter plots of ΔΔG vs. relative solvent accessibility (RSA) for natural mutations (blue circles) (structure-averaged on 6X6P, 7DF3, 7CAB, 6X79, 7DDD and 6VXX). Green circles represent the average for the natural mutations and orange represents the average for all possible chain mutations. a SimBa. b I-Mutant-3.0. c CUPSAT
As confirmed by the aggregate results of SimBa (Fig. 6a), I-Mutant (Fig. 6b), and CUPSAT (Fig. 6c), the natural mutations as a group (green circle) tend to be more solvent-exposed and less destabilizing than expected for random mutations as a group (orange circle) in the S-protein. The correlations of average RSA with average ΔΔG are similarly decent for the different methods (R = 0.36–0.75, with the same direction). The composite results, which account for structural heterogeneity through data averaging and group comparisons, yield similar tendencies for the three methods, which increases significance; the outcome could differ if only a single method, a single structure, and a few mutations were analyzed. Thus, while the average site in the SARS-CoV-2 S-protein is only 25% solvent-exposed, the studied natural mutations are on average 40% exposed, although there is substantial variation.
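The kind of relationship summarized in Fig. 6 can be sketched as follows: a Pearson correlation coefficient between per-site RSA and structure-averaged ΔΔG, plus the group centroid that corresponds to the large circles. This is a minimal illustration in plain Python; the function names and the numerical values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_centroid(rsa, ddg):
    """Mean RSA and mean ΔΔG of a group (one large circle in such a plot)."""
    return mean(rsa), mean(ddg)


# Hypothetical per-site values: RSA in percent, ΔΔG in arbitrary units.
rsa = [5.0, 20.0, 40.0, 60.0, 80.0]
ddg = [-1.8, -1.1, -0.9, -0.3, -0.2]

r = pearson_r(rsa, ddg)
centroid = group_centroid(rsa, ddg)
```

Comparing the centroid of the natural mutations with the centroid of all possible mutations reproduces the group-level comparison, while `r` summarizes the trend across individual sites.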
Conclusions
The SARS-CoV-2 pandemic has motivated publication of hundreds of cryo-EM structures of the S-protein. (Mehra and Kepp 2022) While of value for understanding SARS-CoV-2, this unprecedented effort also enables investigation of a more fundamental topic: how precise are computed implications using cryo-EM structures? The precision of such estimates is as important as the accuracy. Precision can be studied meaningfully when many structures of the same protein state are available, whereas assessment of accuracy requires large unbiased experimental data sets. It has now become possible to study this problem at high data coverage due to the SARS-CoV-2 pandemic producing many structures even for the same states of a single protein.
We show that precision is a major challenge already before any accuracy assessment. Since results can depend strongly on the structure used, one cannot simply rely on one method and structure. To handle this, we recommend that the protein modeling community use several methods and structures of presumed similar quality to sensitivity-test conclusions. A more robust protocol involves analysis at four levels:
1) structure-averaging to reduce noise in the residue coordinates,
2) comparing several methods to understand errors and obtain consensus estimates,
3) using groups of mutation averages to reduce noise and improve significance (large N effect), and
4) comparing only differences between averages of mutation groups, to reduce systematic errors.
We can thereby utilize the structural information more broadly and take advantage of the law of large numbers to reduce noise for individual residues. The issues are probably aggravated by additional heterogeneity from protein–protein or protein–ligand interactions. Our study may have some general relevance to the computational structural biology field by quantifying how method and structure choices affect deduced structure–function relationships at the residue level of cryo-EM structures. Variations in published structures, partly due to some arbitrariness in the structure taken as representative from each cryo-EM map and partly due to real variations in protocols, reflect a real additional uncertainty that we need to deal with, in addition to the uncertainty seen within a single experiment.
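The reasoning behind levels 3 and 4 of the protocol can be illustrated with a short Python sketch: the standard error of a group mean shrinks roughly as 1/√N, and a systematic offset shared by both groups cancels in the difference of their means. The function names and all numbers below are hypothetical and serve only to demonstrate the two effects.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return stdev(values) / sqrt(len(values))


def group_difference(group_a, group_b):
    """Difference between two group means (level 4 of the protocol)."""
    return mean(group_a) - mean(group_b)


# Hypothetical ΔΔG-like values for two mutation groups.
natural = [-0.2, -0.5, 0.1, -0.4]
background = [-1.0, -1.4, -0.6, -1.2, -0.8, -1.0]

# A constant systematic offset shared by both groups (for example a
# method-wide bias) cancels in the difference of means, up to rounding.
offset = 0.7
diff_raw = group_difference(natural, background)
diff_biased = group_difference(
    [v + offset for v in natural], [v + offset for v in background]
)

# Random noise in a mean shrinks roughly as 1/sqrt(N): quadrupling the
# number of values with the same spread about halves the standard error.
small = [0.0, 1.0] * 50    # N = 100
large = [0.0, 1.0] * 200   # N = 400
ratio = standard_error(small) / standard_error(large)
```

Only offsets common to both groups cancel in this way; errors that differ between the groups, or between methods, still require the structure averaging and method comparison of levels 1 and 2.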
As a note, similar issues of precision and heterogeneity may in principle also exist for crystal structures. However, even with a statistically meaningful analysis of many available structures of supposedly the same protein state, as was possible for S-protein cryo-EM structures due to the pandemic, the distinct protocol for fitting and modeling X-ray reflections means that heterogeneity and resolution are not directly comparable. For example, crystal packing could affect the solvent-exposed sites differently from the frozen protein obtained in cryo-EM, and this could affect both precision and accuracy; i.e., even if precision were higher for exposed crystal-structure sites, their realism could be lower, although these effects will take substantial work to understand fully.
Structure-averaged group comparisons offer a partial solution, utilizing the published data broadly. In cases where several (same-state) structures are available and there is no reason to favor one over the other, several structures should be used to produce the chemical insight, and a sensitivity estimate should be provided. In the absence of several structures, the heterogeneity could be generated computationally. Molecular dynamics may, with some limitations due to sampling, force field, and realism of the chemical composition, be used to estimate conformational heterogeneity, e.g., under different conditions of temperature, ionic strength, and pH (Bottaro and Lindorff-Larsen 2018; Mehra et al. 2020), and machine-learning methods could be expected to help represent the heterogeneity of an ensemble beyond the single experiment itself, if the more problematic residues (e.g., polar disordered ones) can be described by ensembles. (Chen and Shukla 2022; Bæk and Kepp 2022b) If the protein is not very flexible and condition-dependent, a few structures might be sufficient for the analysis.
Acknowledgements
RM acknowledges IIT Bhilai for the Research Initiation Grant (RIG), project code 2005900.
Author contributions
RM: Conceptualization and computation, visualization, analysis and writing the paper. KK: Conceptualization and computation, analysis and writing the paper.
Data availability
The Supporting Information file contains additional data relevant to the study (tables and figures). The methods used are available online: CUPSAT: http://cupsat.tu-bs.de/. SimBa: https://github.com/kasperplaneta/SimBa2. I-Mutant: https://folding.biofold.org/i-mutant/.
Declarations
Conflicts of interest
The authors declare that they have no conflicts of interest related to this work.
Contributor Information
Rukmankesh Mehra, Email: rukmankesh@iitbhilai.ac.in.
Kasper P. Kepp, Email: kpj@kemi.dtu.dk
References
- Alsulami AF, Thomas SE, Jamasb AR, et al. SARS-CoV-2 3D database: understanding the coronavirus proteome and evaluating possible drug targets. Brief Bioinform. 2021;22:769–780. doi: 10.1093/bib/bbaa404.
- Bæk KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: toward simple, balanced, and interpretable models. J Comput Chem. 2022;43:504–518. doi: 10.1002/jcc.26810.
- Bæk KT, Kepp KP. Assessment of AlphaFold2 for human proteins via residue solvent exposure. J Chem Inf Model. 2022;62:3391–3400. doi: 10.1021/acs.jcim.2c00243.
- Berger I, Schaffitzel C. The SARS-CoV-2 spike protein: balancing stability and infectivity. Cell Res. 2020;30:1059–1060. doi: 10.1038/s41422-020-00430-4.
- Bershtein S, Goldin K, Tawfik DS. Intense neutral drifts yield robust and evolvable consensus proteins. J Mol Biol. 2008;379:1029–1044. doi: 10.1016/j.jmb.2008.04.024.
- Blundell TL, Chaplin AK. The resolution revolution in X-ray diffraction, Cryo-EM and other technologies. Prog Biophys Mol Biol. 2021;160:2–4. doi: 10.1016/j.pbiomolbio.2021.01.003.
- Bottaro S, Lindorff-Larsen K. Biophysical experiments and biomolecular simulations: a perfect match? Science. 2018;361:355–360. doi: 10.1126/science.aat4010.
- Caldararu O, Blundell TL, Kepp KP. A base measure of precision for protein stability predictors: structural sensitivity. BMC Bioinformatics. 2021;22:88. doi: 10.1186/s12859-021-04030-w.
- Caldararu O, Blundell TL, Kepp KP. Three simple properties explain protein stability change upon mutation. J Chem Inf Model. 2021;61:1981–1988. doi: 10.1021/acs.jcim.1c00201.
- Caldararu O, Mehra R, Blundell TL, Kepp KP. Systematic investigation of the data set dependency of protein stability predictors. J Chem Inf Model. 2020;60:4772–4784. doi: 10.1021/acs.jcim.0c00591.
- Capriotti E, Calabrese R, Casadio R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics. 2006;22:2729–2734. doi: 10.1093/bioinformatics/btl423.
- Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33:W306–W310. doi: 10.1093/nar/gki375.
- Casadio R, Savojardo C, Fariselli P, et al. Turning failures into applications: the problem of protein ΔΔG prediction. In: Data Mining Techniques for the Life Sciences. Springer; 2022. pp. 169–185.
- Chen J, Shukla D. Integration of machine learning with computational structural biology of plants. Biochem J. 2022;479:921–928. doi: 10.1042/BCJ20200942.
- Chen RE, Zhang X, Case JB, et al. Resistance of SARS-CoV-2 variants to neutralization by monoclonal and serum-derived polyclonal antibodies. Nat Med. 2021;27:717–726. doi: 10.1038/s41591-021-01294-w.
- Christensen NJ, Kepp KP. Accurate stabilities of laccase mutants predicted with a modified FoldX protocol. J Chem Inf Model. 2012;52:3028–3042. doi: 10.1021/ci300398z.
- Danev R, Yanagisawa H, Kikkawa M. Cryo-electron microscopy methodology: current aspects and future directions. Trends Biochem Sci. 2019;44:837–848. doi: 10.1016/j.tibs.2019.04.008.
- Dejnirattisai W, Zhou D, Supasa P, et al. Antibody evasion by the P.1 strain of SARS-CoV-2. Cell. 2021;184:2939–2954.e9. doi: 10.1016/j.cell.2021.03.055.
- Delgado Blanco J, Hernandez-Alias X, Cianferoni D, Serrano L. In silico mutagenesis of human ACE2 with S protein and translational efficiency explain SARS-CoV-2 infectivity in different species. PLoS Comput Biol. 2020;16:e1008450. doi: 10.1371/journal.pcbi.1008450.
- Fehr AR, Perlman S. Coronaviruses: an overview of their replication and pathogenesis. In: Coronaviruses: Methods and Protocols. New York: Springer; 2015. pp. 1–23.
- Fernandez-Leiro R, Scheres SHW. Unravelling biological macromolecules with cryo-electron microscopy. Nature. 2016;537:339–346. doi: 10.1038/nature19948.
- Ferreira I, Kemp S, Datir R, et al. SARS-CoV-2 B.1.617 mutations L452R and E484Q are not synergistic for antibody evasion. J Infect Dis. 2021;224:989–994. doi: 10.1093/infdis/jiab368.
- Forni G, Mantovani A. COVID-19 vaccines: where we stand and challenges ahead. Cell Death Differ. 2021;28:626–639. doi: 10.1038/s41418-020-00720-9.
- Gobeil SM-C, Janowska K, McDowell S, et al. D614G mutation alters SARS-CoV-2 spike conformation and enhances protease cleavage at the S1/S2 junction. Cell Rep. 2021;34:108630. doi: 10.1016/j.celrep.2020.108630.
- Goldstein RA. The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins. 2011;79:1396–1407. doi: 10.1002/prot.22964.
- Greaney AJ, Starr TN, Gilchuk P, et al. Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition. Cell Host Microbe. 2021;29:44–57. doi: 10.1016/j.chom.2020.11.007.
- Hadi-Alijanvand H, Rouhani M. Studying the effects of ACE2 mutations on the stability, dynamics, and dissociation process of SARS-CoV-2 S1/hACE2 complexes. J Proteome Res. 2020;19:4609–4623. doi: 10.1021/acs.jproteome.0c00348.
- Henderson R, Edwards RJ, Mansouri K, et al. Controlling the SARS-CoV-2 spike glycoprotein conformation. Nat Struct Mol Biol. 2020;27:925–933. doi: 10.1038/s41594-020-0479-4.
- Herrera NG, Morano NC, Celikgil A, et al. Characterization of the SARS-CoV-2 S protein: biophysical, biochemical, structural, and antigenic analysis. ACS Omega. 2021;6:85–102. doi: 10.1021/acsomega.0c03512.
- Huo J, Zhao Y, Ren J, et al. Neutralization of SARS-CoV-2 by destruction of the prefusion spike. Cell Host Microbe. 2020;28:445–454.e6. doi: 10.1016/j.chom.2020.06.010.
- Iqbal S, Li F, Akutsu T, et al. Assessing the performance of computational predictors for estimating protein stability changes upon missense mutations. Brief Bioinform. 2021:bbab184.
- Juraszek J, Rutten L, Blokland S, et al. Stabilizing the closed SARS-CoV-2 spike trimer. Nat Commun. 2021;12:1–8. doi: 10.1038/s41467-020-20321-x.
- Kepp KP. Towards a “Golden Standard” for computing globin stability: stability and structure sensitivity of myoglobin mutants. Biochim Biophys Acta - Proteins Proteomics. 2015;1854:1239–1248. doi: 10.1016/j.bbapap.2015.06.002.
- Kepp KP. Survival of the cheapest: how proteome cost minimization drives evolution. Q Rev Biophys. 2020;53:e7. doi: 10.1017/S0033583520000037.
- Kepp KP. Computing stability effects of mutations in human superoxide dismutase 1. J Phys Chem B. 2014;118:1799–1812. doi: 10.1021/jp4119138.
- Khan S, Vihinen M. Performance of protein stability predictors. Hum Mutat. 2010;31:675–684. doi: 10.1002/humu.21242.
- Kumar S, Thambiraja TS, Karuppanan K, Subramaniam G. Omicron and Delta variant of SARS-CoV-2: a comparative computational study of spike protein. J Med Virol. 2022;94:1641–1649. doi: 10.1002/jmv.27526.
- Laha S, Chakraborty J, Das S, et al. Characterizations of SARS-CoV-2 mutational profile, spike protein stability and viral transmission. Infect Genet Evol. 2020;85:104445. doi: 10.1016/j.meegid.2020.104445.
- Laskowski RA, Jabłońska J, Pravda L, et al. PDBsum: structural summaries of PDB entries. Protein Sci. 2018;27:129–134. doi: 10.1002/pro.3289.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. 1993;26:283–291. doi: 10.1107/s0021889892009944.
- Letko M, Marzi A, Munster V. Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat Microbiol. 2020;5:562–569. doi: 10.1038/s41564-020-0688-y.
- Li Q, Wu J, Nie J, et al. The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity. Cell. 2020;182:1284–1294.e9. doi: 10.1016/j.cell.2020.07.012.
- Liberles DA, Teichmann SA, Bahar I, et al. The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci. 2012;21:769–785. doi: 10.1002/pro.2071.
- Liu C, Zhou Q, Li Y, et al. Research and development on therapeutic agents and vaccines for COVID-19 and related human coronavirus diseases. ACS Cent Sci. 2020;6(3):315–331. doi: 10.1021/acscentsci.0c00272.
- Liu L, Iketani S, Guo Y, et al. Striking antibody evasion manifested by the omicron variant of SARS-CoV-2. Nature. 2022;602:676–681. doi: 10.1038/s41586-021-04388-0.
- Louis BBV, Abriata LA. Reviewing challenges of predicting protein melting temperature change upon mutation through the full analysis of a highly detailed dataset with high-resolution structures. Mol Biotechnol. 2021;63:863–884. doi: 10.1007/s12033-021-00349-0.
- Lv Z, Deng Y-Q, Ye Q, et al. Structural basis for neutralization of SARS-CoV-2 and SARS-CoV by a potent therapeutic antibody. Science. 2020;369:1505–1509. doi: 10.1126/science.abc5881.
- Maher MC, Bartha I, Weaver S, et al. Predicting the mutational drivers of future SARS-CoV-2 variants of concern. Sci Transl Med. 2022. doi: 10.1126/scitranslmed.abk3445.
- Mansbach RA, Chakraborty S, Nguyen K, et al. The SARS-CoV-2 spike variant D614G favors an open conformational state. Sci Adv. 2021;7:eabf3671. doi: 10.1126/sciadv.abf3671.
- McCallum M, Bassi J, De Marco A, et al. SARS-CoV-2 immune evasion by the B.1.427/B.1.429 variant of concern. Science. 2021;373:648–654. doi: 10.1126/science.abi7994.
- McCallum M, Walls AC, Bowen JE, et al. Structure-guided covalent stabilization of coronavirus spike glycoprotein trimers in the closed conformation. Nat Struct Mol Biol. 2020;27:942–949. doi: 10.1038/s41594-020-0483-8.
- Mehra R, Dehury B, Kepp KP. Cryo-temperature effects on membrane protein structure and dynamics. Phys Chem Chem Phys. 2020;22:5427–5438. doi: 10.1039/C9CP06723J.
- Mehra R, Kepp KP. Structure and mutations of SARS-CoV-2 spike protein: a focused overview. ACS Infect Dis. 2022;8:29–58. doi: 10.1021/acsinfecdis.1c00433.
- Murata K, Wolf M. Cryo-electron microscopy for structural analysis of dynamic biological macromolecules. Biochim Biophys Acta - Gen Subj. 2018;1862:324–334. doi: 10.1016/j.bbagen.2017.07.020.
- Othman H, Bouslama Z, Brandenburg J-T, et al. Interaction of the spike protein RBD from SARS-CoV-2 with ACE2: similarity with SARS-CoV, hot-spot analysis and effect of the receptor polymorphism. Biochem Biophys Res Commun. 2020;527:702–708. doi: 10.1016/j.bbrc.2020.05.028.
- Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res. 2006;34:W239–W242. doi: 10.1093/nar/gkl190.
- Planas D, Saunders N, Maes P, et al. Considerable escape of SARS-CoV-2 Omicron to antibody neutralization. Nature. 2022;602:671–675. doi: 10.1038/s41586-021-04389-z.
- Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel. 2009;22:553–560. doi: 10.1093/protein/gzp030.
- Pucci F, Bernaerts KV, Kwasigroch JM, Rooman M. Quantification of biases in predictions of protein stability changes upon mutations. Bioinformatics. 2018;34:3659–3665. doi: 10.1093/bioinformatics/bty348.
- Pucci F, Schwersensky M, Rooman M. Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol. 2022;72:161–168. doi: 10.1016/j.sbi.2021.11.001.
- Rochman ND, Faure G, Wolf YI, et al. Epistasis at the SARS-CoV-2 receptor-binding domain interface and the propitiously boring implications for vaccine escape. Mbio. 2022;13:e0013522. doi: 10.1128/mbio.00135-22.
- Sanavia T, Birolo G, Montanucci L, et al. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J. 2020;18:1968–1979. doi: 10.1016/j.csbj.2020.07.011.
- Scheres SHW. Processing of structurally heterogeneous cryo-EM data in RELION. Methods Enzymol. 2016;579:125–157. doi: 10.1016/bs.mie.2016.04.012.
- Shorthouse D, Hall BA. SARS-CoV-2 variants are selecting for spike protein mutations that increase protein stability. J Chem Inf Model. 2021;61:4152–4155. doi: 10.1021/acs.jcim.1c00990.
- Starr TN, Greaney AJ, Dingens AS, Bloom JD. Complete map of SARS-CoV-2 RBD mutations that escape the monoclonal antibody LY-CoV555 and its cocktail with LY-CoV016. Cell Reports Med. 2021;2:100255. doi: 10.1016/j.xcrm.2021.100255.
- Tegally H, Wilkinson E, Giovanetti M, et al. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv. 2020. doi: 10.1101/2020.12.21.20248640.
- Teng S, Sobitan A, Rhoades R, et al. Systemic effects of missense mutations on SARS-CoV-2 spike glycoprotein stability and receptor-binding affinity. Brief Bioinform. 2021;22:1239–1253. doi: 10.1093/bib/bbaa233.
- Thomson EC, Rosen LE, Shepherd JG, et al. Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity. Cell. 2021;184:1171–1187.e20. doi: 10.1016/j.cell.2021.01.037.
- Toelzer C, Gupta K, Yadav SKN, et al. Free fatty acid binding pocket in the locked structure of SARS-CoV-2 spike protein. Science. 2020;370:725–730. doi: 10.1126/science.abd3255.
- Tokuriki N, Stricher F, Schymkowitz J, et al. The stability effects of protein mutations appear to be universally distributed. J Mol Biol. 2007;369:1318–1332. doi: 10.1016/j.jmb.2007.03.069.
- Tokuriki N, Tawfik DS. Stability effects of mutations and protein evolvability. Curr Opin Struct Biol. 2009;19:596–604. doi: 10.1016/j.sbi.2009.08.003.
- van Dorp L, Houldcroft CJ, Richard D, Balloux F. COVID-19, the first pandemic in the post-genomic era. Curr Opin Virol. 2021;50:40–48. doi: 10.1016/j.coviro.2021.07.002.
- Walls AC, Park Y-J, Tortorici MA, et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 2020;181:281–292. doi: 10.1016/j.cell.2020.02.058.
- Wang P, Nair MS, Liu L, et al. Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7. Nature. 2021;593:130–135. doi: 10.1038/s41586-021-03398-2.
- Wang Q, Zhang Y, Wu L, et al. Structural and functional basis of SARS-CoV-2 entry by using human ACE2. Cell. 2020;181:894–904. doi: 10.1016/j.cell.2020.03.045.
- Wlodawer A, Minor W, Dauter Z, Jaskolski M. Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J. 2008;275:1–21. doi: 10.1111/j.1742-4658.2007.06178.x.
- Wrobel AG, Benton DJ, Xu P, et al. SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects. Nat Struct Mol Biol. 2020;27:763–767. doi: 10.1038/s41594-020-0468-7.
- Xu C, Wang Y, Liu C, et al. Conformational dynamics of SARS-CoV-2 trimeric spike glycoprotein in complex with receptor ACE2 revealed by cryo-EM. Sci Adv. 2021;7:eabe5575. doi: 10.1126/sciadv.abe5575.
- Xue T, Wu W, Guo N, et al. Single point mutations can potentially enhance infectivity of SARS-CoV-2 revealed by in silico affinity maturation and SPR assay. RSC Adv. 2021;11:14737–14745. doi: 10.1039/D1RA00426C.
- Yan R, Zhang Y, Li Y, et al. Structural basis for the different states of the spike protein of SARS-CoV-2 in complex with ACE2. Cell Res. 2021:1–3.
- Yuan M, Huang D, Lee C-CD, et al. Structural and functional ramifications of antigenic drift in recent SARS-CoV-2 variants. Science. 2021;373:818–823. doi: 10.1126/science.abh1139.
- Yurkovetskiy L, Wang X, Pascal KE, et al. Structural and functional analysis of the D614G SARS-CoV-2 spike protein variant. Cell. 2020;183:739–751.e8. doi: 10.1016/j.cell.2020.09.032.
- Zhang C, Wang Y, Zhu Y, et al. Development and structural basis of a two-MAb cocktail for treating SARS-CoV-2 infections. Nat Commun. 2021;12:1–16. doi: 10.1038/s41467-020-20465-w.
- Zhong ED, Bepler T, Berger B, Davis JH. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat Methods. 2021;18:176–185. doi: 10.1038/s41592-020-01049-4.
- Zhou T, Teng I-T, Olia AS, et al. Structure-based design with tag-based purification and in-process biotinylation enable streamlined development of SARS-CoV-2 spike molecular probes. Cell Rep. 2020;33:108322. doi: 10.1016/j.celrep.2020.108322.