SARS-CoV-2 protein structure and sequence mutations: Evolutionary analysis and effects on virus variants

Ugo Lomoio; Barbara Puccio; Giuseppe Tradigo; Pietro Hiram Guzzi; Pierangelo Veltri

doi:10.1371/journal.pone.0283400

PLoS One. 2023 Jul 20;18(7):e0283400. doi: 10.1371/journal.pone.0283400

Editor: Nagarajan Raju

PMCID: PMC10358949 PMID: 37471335

Abstract

The structure and sequence of proteins strongly influence their biological functions. New models and algorithms can help researchers understand how the evolution of sequences and structures relates to changes in function. Recent studies of SARS-CoV-2 Spike (S) protein structures have aimed to predict receptor binding and infection activity in COVID-19, which has raised scientific interest in how virus mutations relate to sequence, structure and vaccination. However, models and tools are still needed to study the links between the evolution of S protein sequence, structure and function on the one hand, and virus transmissibility and the effects of vaccination on the other. As studies on the S protein have generated a large amount of relevant information, we propose to use Protein Contact Networks (PCNs) to relate protein structures to biological properties by means of network topology. Topological properties are used to compare structural changes with sequence changes. We find that both node centrality and community extraction analysis can be used to relate protein stability and functionality to sequence mutations. Starting from this, we compare structural evolution with sequence changes and study mutations from a temporal perspective, focusing on virus variants. Finally, by applying our model to the Omicron variant, we report a timeline correlation between Omicron and the vaccination campaign.

Introduction

Comprehension of cellular processes requires studying relations between the sequence of genes and the structure of encoded proteins [1, 2] through genomic and proteomic studies. From an evolutionary point of view, changes in gene sequences (e.g. single nucleotide mutations) may imply a modification of protein structure and, thus, phenotype changes. The evolutionary process limits some phenotype modifications due to environmental constraints [1, 3].

The SARS-CoV-2 pandemic has led to the storage of a large number of genomic and proteomic datasets, enabling the molecular evolutionary analysis of proteins and thereby boosting studies of the relations between the sequence, structure and functions of the virus proteins [4, 5]. The genome of SARS-CoV-2 contains 29.9 kilobases [6] and has 14 functional open reading frames (ORFs) that encode: (i) four structural proteins (i.e., Spike, S; envelope, E; membrane, M; and nucleocapsid, N); (ii) 16 nonstructural proteins (nsp1-nsp16); and (iii) accessory proteins [5, 7, 8].

Viruses undergo many mutations during replication [9, 10]. Mutations can occur randomly due to errors in replication steps and to changes in the structure of the viral proteins [11, 12]. These mutations may confer an evolutionary advantage when they improve the virus’s ability to escape the host’s immune system [13]. Many studies have identified mutations in SARS-CoV-2 [5, 14–17] and investigated the relation between virus mutations and the ability to evade immune systems (e.g., relating the evolutionary process to the spread of the virus).

Sequences of mutations may lead to the emergence of variants. A variant is a viral genome with one or more mutations causing changes in protein structure and virus characteristics. Numerous studies have focused on the evolution of the main SARS-CoV-2 variants, highlighting an increase in transmissibility alongside a reduction in severity (e.g., the Omicron variant [17]). The World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC) are tasked with studying new evidence on variants. Sequenced samples of SARS-CoV-2 gathered from around the world are analysed by the WHO and ECDC in order to publish information about variants and health protocols. Variants and the available information are published in the literature [18–20] and in the ECDC/WHO registers available online. Variants are organised into lineages, i.e., groups of closely related viruses with a common ancestor. Although mutations occur very frequently during virus replication, only a few modifications change the virus functionalities. A group of variants with similar genetic changes may be designated by the WHO as a Variant Being Monitored (VBM), a Variant of Concern (VOC), or a Variant of Interest (VOI) due to shared attributes and characteristics that may require government action to safeguard public health [21, 22].

We focus on a subset of variants responsible for S protein structure modifications, analysing structural models gathered from the Exascalate4Cov consortium database [23]. We study the impact of sequence changes on Spike protein structure by means of the Protein Contact Network (PCN) formalism [24–28]. PCNs are graphs whose nodes represent the Cα atoms of the protein backbone, while edges connect residues whose spatial distance lies between 4 and 8 Å. Topological descriptors of PCNs, such as node centrality measures, are used to discover protein properties, even at the sub-molecular level [26, 27]. We focus on the correlation between sequence and structure evolution and identify relations between sequence changes, structure and phenotype variations. A similar model has been used to study Spike proteins of SARS-CoV-2 variants. We investigate how sequence changes generate relevant changes in the structure. We calculate centrality measures for each PCN node and their changes due to mutations. We also measure the structural differences among variants using the template modelling score (TM-score) [29], a metric for assessing the topological similarity of protein structures. Moreover, using the Louvain algorithm [30] we extract communities for each PCN considered and relate them to mutations. Finally, we build two trees, one based on the difference in network descriptors and the other on the distance in terms of TM-score, to clarify possible divergences between the timeline evolution of sequence and structure. We analysed each variant from three perspectives: the sequence, the structure, and the PCN network parameters. The obtained results allow us to put forward the following claims:

The temporal evolution of the variants shows that both sequence and structure of Spike proteins changed significantly after the beginning of the large-scale vaccination campaign.
The PCN analysis shows local changes in the protein structure of the studied variants, which can be related to protein folding and stability.
PCN node centrality measures highlight the differences between Spike proteins in: (i) Omicron₁ variant versus Wild Type, and (ii) Delta variant versus Wild Type.
Communities extracted with the Louvain algorithm group amino acids that correspond to different functional domains of Spike proteins; for example, the extracted communities can be mapped onto the structure of the SARS-CoV-2 Omicron₁ Spike variant.
The net charge of the RBD and NTD domains of the S protein, predicted from pK_a values, increases with time.

In this paper we focus on: (i) the correlation between sequence changes and structural changes in the evolving Spike proteins of the SARS-CoV-2 variants; and (ii) the correlation between vaccination campaigns and the mutations of Omicron variants. Similar works [17, 31–33] have addressed the same problem: for example, [17] focused on the impact of mutations on virus pathogenicity, while [33] focused on the impact on binding affinity. In this paper, we also analyse parameters extracted from both sequence and structure, integrating different viewpoints. Unlike works that consider different scales [32], this paper also analyses the impact of mutations at the intermediate level by using the PCN model [31] and considers a larger set of structures and parameters.

Materials and methods

We here propose the analysis of sequence mutations and structural changes using the pipeline described in Fig 1. The structures of the Spike protein of some selected virus variants are used as input to the pipeline. PDB files of the three-dimensional Spike structures are gathered from Exascalate4Cov consortium database [23]. Table 1 reports the variants used as input in the pipeline and the related mutations on Spike protein.

Table 1. SARS-CoV-2 variants, lineage classification and mutations on the Spike protein sequence.

Variant	Lineage	Mutations on Spike Protein
Alpha (α)	B.1.1.7	HV69–70Δ, Y144Δ, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H
Beta (β)	B.1.351	D80A, D215G, 241–243Δ, R246I, K417N, E484K, N501Y, D614G, A701V
Omicron₁ (o₁)	BA.1	A67V, HV69–70Δ, T95I, G142D, V143Δ, YY144–145Δ, N211Δ, L212I, ins214EPE, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K, L981F
Omicron₅ (o₅)	BA.5	Δ25–27, HV69–70Δ, G142D, V213G, G339D, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, L452R, S477N, T478K, E484A, F486V, Q498R, N501Y, Y505H, D614G, H655Y, N679K, P681H, N764K, D796Y, Q954H, N969K
Gamma (γ)	P.1	D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I
Zeta (ζ)	P.2	E484K, D614G, V1176F
Delta (δ)	B.1.617.2	EE156–157Δ, R158G, L452R, T478K, D614G, P681R, D950N
Kappa (κ)	B.1.617.1	E154K, E484Q, D614G, P681R, Q1071H
Epsilon (ϵ)	B.1.427	S13I, W152C, L452R, D614G
Eta (η)	B.1.525	Q52R, A67V, HV69–70Δ, Y144Δ, E484K, D614G, Q677H, F888L
Iota₁ (ι₁)	B.1.526	L5F, T95I, D253G, S477N, D614G
Iota₂ (ι₂)	B.1.526	L5F, T95I, D253G, E484K, D614G
Ihu	B.1.640.1	E96Q, CNDPFLGVY136–144Δ, R190S, I210T, R346S, N394S, Y449N, F490R, N501Y, D614G, P681H, T859N, D936H, K1191N

For each Spike protein we compute the corresponding PCN using the PCN-Miner tool [31]. Each node of the PCN corresponds to a single amino acid of the protein, while an edge connects two nodes whose spatial distance lies between 4 and 8 Å [26]. For each node of the PCN we evaluate a set of centrality measures: Betweenness, Degree, Eigenvector, Closeness, and Katz. Their descriptions and definitions are reported in Table 2.
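
The contact rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the PCN-Miner implementation: the function name `build_pcn`, the inclusive thresholds and the toy coordinates are assumptions made for the example, and real input would be the Cα coordinates read from a PDB file.

```python
import math


def build_pcn(ca_coords, d_min=4.0, d_max=8.0):
    """Return an adjacency dict {i: set(j)} linking residues whose
    C-alpha atoms lie between d_min and d_max Angstroms apart.
    Treating both bounds as inclusive is an assumption of this sketch."""
    n = len(ca_coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d_min <= d <= d_max:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


# Hypothetical toy coordinates (in Angstroms), not taken from any PDB file.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
pcn = build_pcn(toy_coords)
print(pcn)
```

With these toy coordinates only residues 0 and 2 are linked: neighbouring residues sit 3.8 Å apart, below the lower bound, and the last residue is too far from all the others.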

Table 2. Centrality measures definition.

Measure Description	Measure Definition
Betweenness Centrality measure: given a node i, it measures how much the node (i.e., amino acid) influences communication and serves as a bridge from one part of the graph (i.e., part of the represented protein) to another. σ_j,k indicates the number of shortest paths from node j to node k and σ_j,k(i) indicates the number of those paths that pass through i.	$C_{betweenness}(i) = \sum_{j \neq i \neq k} \frac{\sigma_{j,k}(i)}{\sigma_{j,k}}$
Degree Centrality measure: given a node i, it measures the normalised degree of i, i.e. the number of connections of the node divided by the largest degree in the network. Nodes (amino acids) with a high centrality, considered hubs in the network (i.e., the protein), have a crucial role in the network communication. The degree centrality of a node i is computed as reported on the right.	$C_{deg}(i) = \frac{\deg(i)}{\max_{j \in G} \deg(j)}$
Eigenvector Centrality measure: given a node i, the eigenvector centrality measures how well the node is connected to other central nodes. A high value means that node i is connected to many high-scoring nodes. Let A be the adjacency matrix of a graph G. The eigenvector centrality C_eig(i) of a node i is given by the formula on the right, where the sum effectively runs over the neighbours of i (the nodes j with A(i, j) = 1) and λ is a constant (in the standard definition, the largest eigenvalue of A).	$C_{eig}(i) = \frac{1}{\lambda} \sum_{j \in G} A(i, j) \, C_{eig}(j)$
Closeness Centrality measure: given a node i, it measures the distance (closeness) of i (amino acid) to all the other graph nodes and it is evaluated as reported on the right, where n is the number of nodes and d(i, j) is the shortest-path distance between i and j.	$C_{closeness}(i) = \frac{n - 1}{\sum_{j \neq i} d(i, j)}$
Katz Centrality measure: given a node i, it measures the degree of influence of i in the graph (protein). α and β (see right part) are parameters indicating, respectively, the attenuation factor and the weight attributed to the immediate neighbourhood of each node.	$C_{katz}(i) = \alpha \sum_{j \in G} A(i, j) \, C_{katz}(j) + \beta$
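
As an illustration of the definitions in Table 2, the following sketch computes three of the measures on a plain adjacency dictionary. It is not the PCN-Miner code: the function names, the default values of α and β, the fixed iteration count and the toy path graph are all assumptions chosen for the example. The Katz iteration only converges when α is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each node divided by the largest degree in the graph.
    Assumes the graph has at least one edge."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    return {i: len(nbrs) / max_deg for i, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path lengths from each node.
    Assumes a connected, unweighted graph (breadth-first search)."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[source] = (n - 1) / sum(dist.values())
    return result


def katz_centrality(adj, alpha=0.1, beta=1.0, iterations=100):
    """Fixed-point iteration of x_i = alpha * sum_j A_ij * x_j + beta.
    alpha, beta and iterations are illustrative defaults, not the paper's."""
    x = {i: 0.0 for i in adj}
    for _ in range(iterations):
        x = {i: alpha * sum(x[j] for j in adj[i]) + beta for i in adj}
    return x


# Hypothetical 4-node path graph 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
deg = degree_centrality(path)
clo = closeness_centrality(path)
katz = katz_centrality(path)
print(deg, clo, katz)
```

On the path graph the two inner nodes score higher than the end nodes on all three measures, which matches the intuition that they are better connected.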

PCN analysis

We measure the centrality values of all the PCN nodes and compare both overall changes (i.e., averaging the centrality values) and local changes (i.e., changes in the centrality values of mutated residues). We depict these values using boxplots and radar plots: boxplots summarise all variants, while radar plots represent the centrality values of mutated nodes in the Spike variants. The obtained centrality measures are then mapped onto the protein structure using PCN-Miner. A t-test [34] on the centrality distributions of the variants is used to evaluate the significance of any changes.

Community detection has been performed with the Louvain algorithm with default parameters [30] to study the relation between virus mutations and communities in PCNs. The Louvain method is a greedy, modularity-based approach designed to scale to large graphs: it first finds small communities by locally optimising modularity, then maps each community to one node of a new graph, and repeats the process until modularity no longer improves. In our scenario, a community represents a set of related amino acids. After the communities have been extracted, we map them onto functional regions (e.g., the RBD domain) and analyse the presence of mutations in such communities. The Louvain algorithm has been widely used in network analysis [35, 36] to discover communities. We apply it here to PCNs to find communities overlapping with protein structures and to relate communities to protein functionalities. For each variant, we plot its mutations inside the communities to show graphically where most mutations fall. This allows us to identify patterns in the distribution of mutations across the Louvain communities. When a high percentage of mutations belongs to the same community, we take this as an indication that the corresponding mutations share similar effects on protein functionality.
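
The quantity that the Louvain method optimises, and the per-community mutation counts used later, can be sketched as follows. This is a hedged illustration only: it computes the modularity of a given partition rather than running Louvain itself, and the function names, the toy graph and the "UM" label for residues outside every community are assumptions of the example.

```python
def modularity(adj, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.
    adj maps each node to its set of neighbours; community_of maps each
    node to a community label. Node labels must be comparable (e.g. ints)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    intra_edges = {}   # edges with both endpoints in the same community
    total_degree = {}  # sum of node degrees per community
    for u, nbrs in adj.items():
        c = community_of[u]
        total_degree[c] = total_degree.get(c, 0) + len(nbrs)
        for v in nbrs:
            if u < v and community_of[v] == c:  # count each edge once
                intra_edges[c] = intra_edges.get(c, 0) + 1
    return sum(
        intra_edges.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


def mutations_per_community(community_of, mutated_nodes):
    """Count mutated residues per community; residues that are not in
    any community are counted under the hypothetical label 'UM'."""
    counts = {}
    for node in mutated_nodes:
        label = community_of.get(node, "UM")
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical toy graph: two triangles joined by the edge 2-3.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in two_triangles}
q_split = modularity(two_triangles, split)
q_merged = modularity(two_triangles, merged)
print(q_split, q_merged)
```

Splitting the toy graph into its two triangles gives a higher modularity than keeping all nodes together, which is the kind of improvement a greedy modularity optimiser looks for at each step.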

Structural analysis

We computed TM-scores [37] between pairs of Spike proteins of different variants using the US-align (Universal Structural alignment) software [38]. The TM-score quantifies the structural similarity between proteins (see the upper part of Fig 1). The pipeline then plots sequence distance and structural distance for the Spike variants. We also evaluate the contact similarity between the PCNs of the variants. This is defined as the percentage of contacts/non-contacts shared between two PCNs and is intended as a measure of similar behaviour among subnetworks. Finally, for each variant, we computed the pK_a of each amino acid of the analysed proteins using the PROPKA3 web server [39]. The pK_a value is defined as −log₁₀ K_a, where K_a is the acid dissociation constant that measures amino acid acidity or basicity. Following the method proposed in [32], the pK_a values were used to predict the overall domain charge, referred to as EPrbd and EPntd for the RBD and NTD protein domains, respectively. The surface electrostatic potential has been calculated for the RBD and NTD of each variant with the APBS (Adaptive Poisson-Boltzmann Solver) software with default parameters [40].
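
One possible reading of the contact similarity defined above is sketched below: over all residue pairs, it counts the pairs that are in the same state (contact or non-contact) in both networks. The function name and toy networks are hypothetical, and the sketch assumes that the two PCNs are indexed on the same residues, which for variants with insertions or deletions would first require an alignment step not shown here.

```python
from itertools import combinations


def contact_similarity(adj_a, adj_b):
    """Percentage of residue pairs that are either in contact in both
    networks or in contact in neither. Both adjacency dicts must be
    defined on the same node indices and contain at least two nodes."""
    nodes = sorted(adj_a)
    if nodes != sorted(adj_b):
        raise ValueError("networks must be defined on the same residues")
    pairs = list(combinations(nodes, 2))
    agree = sum((j in adj_a[i]) == (j in adj_b[i]) for i, j in pairs)
    return 100.0 * agree / len(pairs)


# Hypothetical 3-residue networks differing in the contact 1-2.
net_a = {0: {1}, 1: {0, 2}, 2: {1}}
net_b = {0: {1}, 1: {0}, 2: set()}
similarity = contact_similarity(net_a, net_b)
print(similarity)
```

Because most residue pairs in a large protein are non-contacts in every variant, a measure of this kind tends to sit close to 100%, so small differences between variants can still be meaningful.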

Sequence analysis

We performed sequence alignment using the CLUSTALW software with default parameters [41].

Implementation

The methods described above were developed in the JupyterLab environment [42]. The code was written in Python 3, using the plotly and matplotlib libraries to draw figures [43] and the PROPKA library [39] to calculate the pK_a values of the proteins. The PCN-Miner tool [31] was used to compute the PCNs and node centralities and to run the Louvain algorithm for community extraction [30] on the Spike variants. The PyMOL library [44] was used to read, visualize and modify PDB files. The CUPSAT software [45] was used to predict changes in protein stability caused by a particular mutation of the Wild Type form. Finally, we used US-align [38] to perform a sequence-independent alignment based on the structural similarity of the variants and to compute the TM-score.

Results

We analysed the sequence and structure of the Spike proteins of fourteen selected SARS-CoV-2 variants using PDB files as input. For each PDB file we calculated a PCN. We then evaluated centrality measures (Betweenness, Degree, Eigenvector, Closeness, and Katz) for each PCN. Eigenvector centrality values have been calculated to show how protein structure varies. In particular, we note that when centrality values show little variation among variants, the global form of the protein structure does not vary among variants, while local changes occur. All the evaluated centrality measures are reported as boxplots in Fig 3. The boxplots show that the changes in Eigenvector Centrality are not significant, whether considering all nodes or only those corresponding to mutated residues. Conversely, rewiring of the structure causes more changes in Katz Centrality values than in the other measures. The t-test results reported in Tables 3 and 4 confirm these observations.

Table 3. P-values of the t-tests for the comparison of average node centrality values of mutated nodes in the Omicron₁ variant only.

EC: Eigenvector Centrality, BC: Betweenness Centrality, KC: Katz Centrality, DC: Degree Centrality, CC: Closeness Centrality.

Variants couples	p-value EC	p-value BC	p-value KC	p-value DC	p-value CC
Omicron₁ vs Wild Type	0.655	0.353	1.148e-10	0.181	0.091
Omicron₁ vs Epsilon	0.627	0.127	8.110e-04	0.820	0.648
Omicron₁ vs Zeta	0.655	0.353	1.148e-10	0.181	0.091
Omicron₁ vs Beta	0.1	0.00000077	1.442e-03	0.001	0.022
Omicron₁ vs Alpha	0.551	0.157	1.705e-01	0.410	0.763
Omicron₁ vs Delta	0.22	0.00002	2.619e-02	0.004	0.139
Omicron₁ vs Kappa	0.818	0.004	1.417e-01	0.380	0.934
Omicron₁ vs Gamma	0.674	0.011	1.053e-04	0.888	0.923
Omicron₁ vs Iota₁	0.671	0.314	5.270e-11	0.242	0.099
Omicron₁ vs Iota₂	0.695	0.310	6.346e-03	0.663	0.825
Omicron₁ vs Eta	0.862	0.166	3.758e-01	0.880	0.870
Omicron₁ vs Ihu	0.008	0.105	8.761e-10	0.014	0.088
Omicron₁ vs Omicron₅	0.397	0.226	6.490e-04	0.027	0.690

Table 4. P-Values of the t-test for the comparison of average node centrality values of all nodes in the Omicron₁ variant.

The p-values were corrected for multiple testing; values below 0.05 were considered significant. EC: Eigenvector Centrality, BC: Betweenness Centrality, KC: Katz Centrality, DC: Degree Centrality, CC: Closeness Centrality.

Variants couples	p-value EC	p-value BC	p-value KC	p-value DC	p-value CC
Omicron₁ vs Wild Type	0.725	3.293e-31	1.340e-119	1.991e-18	0.00002
Omicron₁ vs Epsilon	0.925	1.097e-02	2.378e-45	1.323e-03	0.0597
Omicron₁ vs Zeta	0.725	3.293e-31	1.340e-119	1.991e-18	0.00002
Omicron₁ vs Beta	0.192	4.850e-13	1.120e-23	2.037e-07	0.067
Omicron₁ vs Alpha	0.591	1.865e-04	1.328e-01	1.334e-11	0.555
Omicron₁ vs Delta	0.027	4.552e-12	5.673e-23	3.599e-08	0.050
Omicron₁ vs Kappa	0.609	1.067e-05	7.881e-19	1.423e-07	0.266
Omicron₁ vs Gamma	0.916	7.393e-01	3.139e-57	2.852e-04	0.773
Omicron₁ vs Iota₁	0.725	3.293e-31	1.340e-119	1.991e-18	0.00002
Omicron₁ vs Iota₂	0.930	7.954e-07	2.450e-23	6.653e-04	0.405
Omicron₁ vs Eta	0.591	1.865e-04	1.328e-01	1.334e-11	0.555
Omicron₁ vs Ihu	0.643	2.318e-04	2.132e-34	4.292e-04	0.519
Omicron₁ vs Omicron₅	0.527	3.886e-05	3.349e-20	1.644e-05	0.946

Fig 2 reports an example of how centrality values are calculated. Fig 3 shows: (a) the comparison of average eigenvector centrality values of the whole S protein, (b) the eigenvector centrality for all the variants, and (c) eigenvector centrality values of the nodes of the RBD domain of all the selected variants. Eigenvector centrality values have been calculated to show changes in structure. We further compared the centrality values of the mutation sites to highlight possible differences on the same variant sites. Our results find no evidence of a homogeneous pattern of change with respect to site: e.g., considering the mutation sites shared by variants, centrality values increase and decrease according to the time of their appearance. Radar plots showing this behaviour are reported in Supplementary materials (see Supporting information Section).

Similarly to [24], the obtained eigenvector centrality values point to protein instability. In particular, the mutations of the Omicron₁ and Delta variants cause Spike protein instability, in agreement with [14]. It is also worth noting that, for the Wild Type virus, the amino acids of the RBD domain present the highest eigenvector centrality values. The Omicron₁ variant has 15 mutations in the RBD domain, 14 of which can be related to protein instability, whereas N501Y is the only Omicron₁ mutation in the RBD that increases protein stability. In the Omicron₁ PCN, the decreased protein stability caused by mutations in the RBD leads to a decrease in the RBD eigenvector centrality values, as reported in Fig 4.

Fig 4 — Eigenvector centrality values are represented by a color-based scale from blue (lower values) to red (higher values). The decrease of eigenvector centrality of RBD domain of the Omicron₁ variants is shown by the presence of a larger number of blue colored nodes.

Changes in centrality measures have been compared using a t-test. A p-value less than 0.05 represents a significant change in average centrality values. Table 3 reports a comparison between Omicron₁ and all the other structures considering only mutated sites. Table 4 reports the same comparison for all the residues and Table 5 reports confidence intervals of average centrality measure values.
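
To make the comparison concrete, the sketch below computes a two-sample t statistic for two lists of centrality values. The text does not state which variant of the t-test was used, so the unequal-variance (Welch) form shown here is an assumption, as are the function name and the toy values. Turning the statistic into a p-value requires the CDF of the t distribution, which a statistics library can provide (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)
    for two independent samples with possibly unequal variances.
    Each sample needs at least two values and non-zero total variance."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical centrality values for the same sites in two variants.
variant_a = [1.0, 2.0, 3.0, 4.0, 5.0]
variant_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(variant_a, variant_b)
print(t_stat, dof)
```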

Table 5. Confidence intervals (CI) of average centrality values (from left to right: Eigenvector, Betweenness, Katz, Degree, and Closeness).

Variant	CI EC	CI BC	CI KC	CI DC	CI CC
Epsilon	[0.00065, 0.00384]	[-0.00025, 0.001]	[0.00669, 0.02141]	[0.00146, 0.00311]	[0.05569, 0.08355]
Zeta	[-0.00023, 0.0041]	[-6e-05, 0.00045]	[0.00272, 0.01846]	[0.00093, 0.00294]	[0.05656, 0.09342]
Beta	[0.00144, 0.00372]	[-0.00054, 0.0086]	[0.00855, 0.01676]	[0.00232, 0.00307]	[0.06988, 0.08025]
Alpha	[-3e-05, 0.00855]	[-0.0001, 0.00722]	[0.00663, 0.01195]	[0.002, 0.00244]	[0.0715, 0.08142]
Delta	[0.00044, 0.0052]	[-0.00041, 0.00745]	[0.00808, 0.01489]	[0.00244, 0.00339]	[0.066, 0.07829]
Kappa	[0.00038, 0.00629]	[5e-05, 0.00607]	[0.00366, 0.01009]	[0.00141, 0.00232]	[0.0609, 0.07708]
Gamma	[0.00138, 0.005]	[-0.00088, 0.00858]	[0.01118, 0.01762]	[0.00177, 0.0023]	[0.0668, 0.07842]
Iota1	[0.00065, 0.00306]	[-2e-05, 0.00022]	[0.0084, 0.01967]	[0.00158, 0.00289]	[0.0597, 0.07749]
Iota2	[0.00054, 0.0035]	[-4e-05, 0.00032]	[0.00736, 0.01741]	[0.00168, 0.00286]	[0.0588, 0.07832]
Eta	[0.00094, 0.00338]	[-0.00134, 0.01647]	[0.00578, 0.01854]	[0.00188, 0.00246]	[0.06805, 0.07824]
Ihu	[0.00195, 0.00552]	[-3e-05, 0.00315]	[0.01035, 0.01484]	[0.00194, 0.00251]	[0.06905, 0.07511]
Omicron1	[0.00183, 0.00367]	[0.00025, 0.0008]	[0.00631, 0.00784]	[0.00185, 0.00219]	[0.06907, 0.07517]
Omicron5	[0.00152, 0.003]	[7e-05, 0.00046]	[0.00759, 0.01001]	[0.00184, 0.00219]	[0.06805, 0.07375]

Using the proposed pipeline, we found a significant change in Katz centrality between the Omicron₁ variant and most of the other Spike variants, including the Wild Type (in Table 4, only the comparisons with Alpha and Eta are not significant for Katz centrality). Significant changes in betweenness centrality and degree centrality values have also been found, where significance is assessed by t-test analyses. Complete comparisons, code and numerical results can be found in the references reported in the Supporting information Section.

As reported in Fig 1, after the centrality analysis the node communities in each PCN were analysed using the Louvain algorithm, as illustrated in Fig 5. Figs 6 and 7 provide the community detection results. The obtained communities are similar to the functional domains of the protein. In particular, Fig 6 reports a comparison of the communities of Wild Type, Delta, and Omicron₁. Fig 7 reports a comparison between communities in the Delta and Omicron₁ variants. Table 6 summarises the communities extracted with the Louvain algorithm.

Fig 5 — Then, communities are identified and plotted by applying the Louvain community detection algorithm on the PCN. Finally, communities are mapped on the protein structure to relate communities and functional domains of the protein.

Fig 6 — Communities mapped directly on the protein structure of a) Wild Type, b) Delta, and c) Omicron₁ variant to visualize functional domains predicted by the Louvain algorithm.

Fig 7 — Visualization of the mutations and the communities on the protein structure. Mutated residues are displayed as red spheres. In Omicron₁, and Omicron₅, the mutations seem to fall inside certain communities with the same function in the Spike Protein. We found that Omicron₁ has more than ten mutations that fall inside the same community.

Table 6. Summary of communities and their mutations.

As some mutations do not belong to any community, we report them here as belonging to a virtual community referred to as UnModelled (UM).

Variant	No. of communities	Total No. of mutations	Mutations for each Community (community No: No of mutations)
Wild Type	22	0
Epsilon	22	4	UM: 1, Community 0: 1, Community 4: 1, Community 16: 1
Zeta	19	3	Community 9: 1, Community 8: 1, UM: 1
Beta	21	8	Community 4: 3, Community 15: 3, Community 11: 1, Community 7: 1
Alpha	22	7	Community 10: 1, Community 11: 1, Community 12: 2, Community 18: 2, Community 6: 1
Delta	22	6	Community 1: 3, Community 15: 2, Community 0: 1
Kappa	23	5	Community 1: 2, Community 15: 2, Community 5: 1
Gamma	19	8	Community 3: 2, Community 6: 2, Community 16: 1, Community 10: 2, Community 4: 1
Iota1	21	5	UM: 1, Community 2: 2, Community 15: 1, Community 9: 1
Iota2	21	5	UM: 1, Community 0: 2, Community 15: 1, Community 8: 1
Eta	24	6	Community 19: 2, Community 1: 1, Community 12: 2, Community 10: 1
Ihu	24	13	Community 15: 1, Community 1: 4, Community 7: 3, Community 12: 2, Community 0: 2, UM: 1
Omicron₁	20	30	Community 2: 4, Community 5: 15, Community 7: 1, Community 9: 4, Community 13: 4, Community 0: 2
Omicron₅	19	27	Community 2: 10, Community 7: 8, Community 17: 1, Community 3: 4, Community 13: 2, Community 0: 2

Protein structures were analysed by means of TM-scores, structural distance, and PCN contact similarity between Spike variants. Sequence distances between variants were used to construct a phylogenetic tree. Fig 8 reports the comparison of these measures. The figure shows that there is no direct correlation between sequence changes and structural modification. For instance, the Iota₁ and Zeta variants have dissimilar sequences but highly similar structures.

Fig 8 — Structural, Sequence and Network similarity results: a) Heatmap representing the structural similarity of variants computed by TM-scores; b) Heatmap representing the similarity between PCNs of variants; c) sequence similarity represented by a Phylogenetic tree. The figure evidences no direct correlation between sequence changes and structural modification. For instance, Iota1, and Zeta variants have dissimilar sequences but very similar structures.

For each variant, we computed the net charge of RBD and NTD domain. In Fig 9 we show the RBD and NTD net charges of all the selected variants. The values are reported in Table 7.

Table 7. Net charge of the Spike protein variants in RBD and NTD domains.

Charges are expressed in units of the elementary charge e = 1.602 × 10⁻¹⁹ C.

Variants	Domain	Folded	Unfolded
SPWT	NTD	2.21e	2.83e
SPWT	RBD	2.88e	2.92e
Epsilon	NTD	1.64e	1.91e
Epsilon	RBD	3.87e	3.92e
Beta	NTD	3.99e	5.37e
Beta	RBD	4.0e	4.11e
Alpha	NTD	1.62e	1.92e
Alpha	RBD	4.9e	4.91e
Delta	NTD	2.51e	3.37e
Delta	RBD	5.03e	5.14e
Kappa	NTD	5.01e	5.34e
Kappa	RBD	5.11e	5.15e
Gamma	NTD	2.34e	2.83e
Gamma	RBD	3.87e	3.91e
Iota	NTD	2.68e	2.92e
Iota	RBD	4.88e	4.91e
Eta	NTD	2.61e	2.92e
Eta	RBD	6.9e	6.91e
Ihu	NTD	2.69e	2.93e
Ihu	RBD	5.89e	5.92e
Omicron1	NTD	0.67e	0.92e
Omicron1	RBD	6.05e	6.15e
Omicron5	NTD	1.75e	1.83e
Omicron5	RBD	7.87e	7.91e
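
For readers unfamiliar with how a net charge follows from pK_a values, a minimal sketch based on the Henderson-Hasselbalch relation is given below. It is not the procedure of [32] or of PROPKA: the function name, the pH of 7 and the example pK_a values are assumptions used only to show the arithmetic, in which each basic group contributes a fractional positive charge and each acidic group a fractional negative charge.

```python
def net_charge(groups, ph=7.0):
    """Net charge (in units of e) of a set of titratable groups.
    groups is an iterable of (kind, pKa) pairs, kind being 'acid'
    (neutral when protonated, e.g. Asp, Glu) or 'base' (positive when
    protonated, e.g. Lys, Arg). The pH value is an assumption."""
    total = 0.0
    for kind, pka in groups:
        if kind == "base":
            total += 1.0 / (1.0 + 10 ** (ph - pka))
        elif kind == "acid":
            total -= 1.0 / (1.0 + 10 ** (pka - ph))
        else:
            raise ValueError(f"unknown group kind: {kind}")
    return total


# Hypothetical groups: two lysine-like bases and one aspartate-like acid.
example_groups = [("base", 10.5), ("base", 10.5), ("acid", 4.0)]
charge = net_charge(example_groups)
print(round(charge, 2))
```

In this picture, a substitution that replaces an acidic residue with a neutral or basic one raises the domain charge, which is consistent with the direction of the RBD values in Table 7.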

Discussion

Results in Tables 3 and 4 show that changes in centrality values could signify protein instability. Considering the Omicron₁ and Delta variants in particular (see Fig 4), the potential instability can be traced to specific mutations. For example, the Omicron₁ variant has multiple mutations in the RBD domain, most of which are associated with protein instability; N501Y is the only one that increases stability. Consequently, the Omicron₁ PCN presents decreased eigenvector centrality values in the RBD domain.

The significance of changes in centrality measures was assessed by t-test analyses, comparing average values across all nodes and across the nodes corresponding to mutations. Variations in centrality measures, including Katz centrality, between Omicron₁ and other variants (including the Wild Type) were found to be significant. Significant changes in betweenness centrality and degree centrality values were also found.

Community detection analysis using the Louvain algorithm identified communities within PCNs, which often correspond to functional domains of the Spike protein. Mutations occurring within the same community appeared to have similar effects on protein function. The resulting communities and their mapping onto protein structures are reported in Figs 5–7. The RBD mutations of Omicron₁ and Omicron₅ belong to the same community, which may imply a similar impact of the mutations on Spike protein function in the two variants. Conversely, the Delta variant presents mutations in different communities, suggesting a remarkable difference between Omicron₁ and Delta.

Protein structure analysis was performed by measuring TM-scores, structural distances, and PCN contact similarity between Spike variants. Sequence distances were mapped onto a phylogenetic tree. Fig 8 illustrates the comparison of these measures: (a) the heatmap of the structural distance; (b) the heatmap of PCN similarity; and (c) the phylogenetic tree representing distances among sequences. As reported in Fig 8, there is no direct correlation between sequence and structure similarity [28].

The net charge predicted from the pK_a values shows that the Spike protein acquires an increasingly positive charge over time, as shown in Fig 9.

These results provide insight into the centrality and community structure of Spike protein variants, suggesting potential protein instability, identifying mutation clusters, and characterising structural differences. They contribute to understanding the behaviour and potential implications of different SARS-CoV-2 variants, and they have several biological implications. Topological analysis of protein contact networks can provide insight into the stability and function of proteins. The changes in centrality measures indicate protein instability, with the Omicron₁ variant showing decreased RBD eigenvector centrality due to mutations associated with protein instability. This information may be further analysed to shed light on the effects of viral mutations on transmissibility and on changes in disease severity. It could be considered a first step towards the prediction of novel mutations, as well as support for developing therapies and vaccines that target specific regions of the virus. Moreover, the community detection analysis revealed that mutations falling in the same community often have similar effects on protein structure and hence function. This may help to predict the effects of mutations on the virus’s behaviour.

We have shown that many changes occurred after the beginning of the vaccination campaign. These changes include modifications of protein structure, as evidenced by TM-scores, structural distance, and PCN contact similarity. This may help to explain how virus mutations affect viral transmission, replication, or immune evasion. This information can be used to guide the design of drugs and vaccines that target specific regions of the virus and can help to predict the spread and evolution of new variants. We explore patterns of change along a temporal dimension and compare the cumulative distribution of vaccinations with the characteristics of each variant. Although we cannot infer any causality regarding vaccination driving the evolution, we note that, on the timeline, the vaccination campaign falls between the first variants of SARS-CoV-2 and Omicron. Considering also the clinical characteristics of Omicron in terms of vaccine escape and neutralization of the immune response, we hypothesise that the effects of the Omicron changes may be related to the structural changes revealed by the measures reported above.
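
For readers unfamiliar with the TM-score mentioned among the structural measures, the following sketch evaluates the standard scoring formula of Zhang and Skolnick [37] for one fixed superposition: each aligned residue pair at distance d contributes 1 / (1 + (d / d0)²), and the sum is divided by the target length. The function names and the 0.5 Å floor on d0 for short chains are assumptions of this example; alignment tools such as TM-align [29] additionally search for the superposition that maximises the score, which is not attempted here.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 in angstroms.

    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect superposition of all residues scores 1.0; aligning only
# half of the residues perfectly scores 0.5.
perfect = tm_score([0.0] * 100, 100)
half = tm_score([0.0] * 50, 100)
```

Because unaligned residues contribute nothing to the sum, the score penalises both large deviations and incomplete alignments.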

Finally, our results show that Omicron variants clearly differ from the others. All the considered parameters confirm that there is a remarkable change in centrality values. In particular, the difference in terms of network invariants among Alpha, Beta, Delta, and Omicron has also been studied in [28]. The results presented here can be considered a further extension of our previous work in two ways. First, we confirm that the difference is not limited to network invariants: other protein structural measures, i.e. polarity and TM-score, also confirm that Omicron is radically different from the others. Second, we note an increase in the net charge of the RBD over time [32], which may explain the increased transmissibility through a better binding affinity with the ACE2 receptor.

Finally, as we also show in Fig 10, Omicron₁ variants appeared after the start of the vaccination campaign. Even though this does not imply any causal relation [46, 47], additional studies may be required to shed light on the relation between the vaccination campaign and the evolution of the SARS-CoV-2 virus.

Study limitations

While the study provides valuable insights into the centrality and community structure of Spike protein variants, there are some limitations to consider:

Data Availability: The study is based on the availability of protein structure encoded into PDB files, which might have limitations in terms of availability, completeness, and accuracy.
Simplified Representation: The analysis considers only protein structures and network structures while other factors such as protein dynamics, ligand interactions, and post-translational modifications might provide more insights.
Lack of Experimental Validation: As the findings are based on in-silico analyses, experimental validation is needed to verify the structural changes.
Limited Sample Size: As the analysis focuses on a specific set of Spike protein variants, the inclusion of a larger set of variants could corroborate the results.

Further research addressing these limitations could contribute to a fuller understanding of the implications of Spike protein variants on protein structure and function.

Conclusion

In this work we presented a pipeline for analysing mutations of the SARS-CoV-2 genome, focusing on the impact of the mutations on the Spike protein. From a timeline perspective, we observed that the Omicron variant presents significant changes with respect to the previous ones and that Omicron appeared in parallel with the worldwide vaccination campaign. Omicron’s structure presents many changes in the RBD domain, and many mutations fall within the same structural region. As a final remark, we would argue that further studies should focus on the possible relationship between the vaccination campaign and the appearance of Omicron.

Supporting information

S1 Fig. Clustermap of communities of Spike protein.

Clustermap plot for a) Wild Type, b) Delta, and c) Omicron₁ variant. In a clustermap plot, communities are mapped in a matrix and visualized by means of heatmap.

(TIF)


S1 File

(DOCX)


S1 Graphical abstract

(PNG)


Data Availability

The repository https://github.com/UgoLomoio/SARSCoV2_variants_PCN contains data, code, and additional figures used in this work.

Funding Statement

The authors received no specific funding for this work.

References

1. Pál C, Papp B, Lercher MJ. An integrated view of protein evolution. Nature reviews genetics. 2006;7(5):337–348. doi: 10.1038/nrg1838
2. Gu S, Jiang M, Guzzi PH, Milenković T. Modeling multi-scale data via a network of networks. Bioinformatics. 2022;38(9):2544–2553. doi: 10.1093/bioinformatics/btac133
3. Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nature reviews molecular cell biology. 2007;8(12):995–1005. doi: 10.1038/nrm2281
4. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology. 2021;19(7):409–424. doi: 10.1038/s41579-021-00573-0
5. Kumar Das J, Tradigo G, Veltri P, Guzzi PH, Roy S. Data science in unveiling COVID-19 pathogenesis and diagnosis: evolutionary origin to drug repurposing. Briefings in Bioinformatics. 2021;22(2):855–872. doi: 10.1093/bib/bbaa420
6. Guzzi PH, Mercatelli D, Ceraolo C, Giorgi FM. Master regulator analysis of the SARS-CoV-2/human interactome. Journal of clinical medicine. 2020;9(4):982. doi: 10.3390/jcm9040982
7. Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H. The architecture of SARS-CoV-2 transcriptome. Cell. 2020;181(4):914–921. doi: 10.1016/j.cell.2020.04.011
8. Milano M, Guzzi PH, Tymofieva O, Xu D, Hess C, Veltri P, et al. An extensive assessment of network alignment algorithms for comparison of brain connectomes. BMC bioinformatics. 2017;18(6):31–45. doi: 10.1186/s12859-017-1635-7
9. Domingo E, Holland J. RNA virus mutations and fitness for survival. Annual review of microbiology. 1997;51:151. doi: 10.1146/annurev.micro.51.1.151
10. Lauring AS, Frydman J, Andino R. The role of mutational robustness in RNA virus evolution. Nature Reviews Microbiology. 2013;11(5):327–336. doi: 10.1038/nrmicro3003
11. Mercatelli D, Pedace E, Veltri P, Giorgi FM, Guzzi PH. Exploiting the molecular basis of age and gender differences in outcomes of SARS-CoV-2 infections. Computational and Structural Biotechnology Journal. 2021;19:4092–4100. doi: 10.1016/j.csbj.2021.07.002
12. Wu S, Tian C, Liu P, Guo D, Zheng W, Huang X, et al. Effects of SARS-CoV-2 mutations on protein structures and intraviral protein–protein interactions. Journal of medical virology. 2021;93(4):2132–2140. doi: 10.1002/jmv.26597
13. Ahmad W, Ahmad S, Basha R. Analysis of the mutation dynamics of SARS-CoV-2 genome in the samples from Georgia State of the United States. Gene. 2022;841:146774. doi: 10.1016/j.gene.2022.146774
14. Kumar S, Thambiraja TS, Karuppanan K, Subramaniam G. Omicron and Delta variant of SARS-CoV-2: a comparative computational study of spike protein. Journal of medical virology. 2022;94(4):1641–1649. doi: 10.1002/jmv.27526
15. Gallo Cantafio ME, Grillone K, Caracciolo D, Scionti F, Arbitrio M, Barbieri V, et al. From single level analysis to multi-omics integrative approaches: a powerful strategy towards the precision oncology. High-throughput. 2018;7(4):33. doi: 10.3390/ht7040033
16. Boni MF, Lemey P, Jiang X, Lam TTY, Perry BW, Castoe TA, et al. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nature microbiology. 2020;5(11):1408–1417. doi: 10.1038/s41564-020-0771-4
17. Barh D, Tiwari S, Rodrigues Gomes LG, Ramalho Pinto CH, Andrade BS, Ahmad S, et al. SARS-CoV-2 Variants Show a Gradual Declining Pathogenicity and Pro-Inflammatory Cytokine Stimulation, an Increasing Antigenic and Anti-Inflammatory Cytokine Induction, and Rising Structural Protein Instability: A Minimal Number Genome-Based Approach. Inflammation. 2023;46(1):297–312. doi: 10.1007/s10753-022-01734-w
18. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22(13):30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494
19. Klimczak LJ, Randall TA, Saini N, Li JL, Gordenin DA. Similarity between mutation spectra in hypermutated genomes of rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic. PLoS One. 2020;15(10):e0237689. doi: 10.1371/journal.pone.0237689
20. Abenavoli L, Cinaglia P, Lombardo G, Boffoli E, Scida M, Procopio AC, et al. Anxiety and gastrointestinal symptoms related to COVID-19 during Italian lockdown. Journal of Clinical Medicine. 2021;10(6):1221. doi: 10.3390/jcm10061221
21. Oude Munnink BB, Worp N, Nieuwenhuijse DF, Sikkema RS, Haagmans B, Fouchier RA, et al. The next phase of SARS-CoV-2 surveillance: real-time molecular epidemiology. Nature medicine. 2021;27(9):1518–1524. doi: 10.1038/s41591-021-01472-w
22. Hu B, Guo H, Zhou P, Shi ZL. Characteristics of SARS-CoV-2 and COVID-19. Nature Reviews Microbiology. 2021;19(3):141–154. doi: 10.1038/s41579-020-00459-7
23. Romeo I, Prandi IG, Giombini E, Gruber CEM, Pietrucci D, Borocci S, et al. The Spike Mutants Website: A Worldwide Used Resource against SARS-CoV-2. International Journal of Molecular Sciences. 2022;23(21). doi: 10.3390/ijms232113082
24. Di Paola L, Mei G, Di Venere A, Giuliani A. Disclosing allostery through protein contact networks. In: Allostery. Springer; 2021. p. 7–20.
25. Di Paola L, Hadi-Alijanvand H, Song X, Hu G, Giuliani A. The discovery of a putative allosteric site in the SARS-CoV-2 spike protein using an integrated structural/dynamic approach. Journal of proteome research. 2020;19(11):4576–4586. doi: 10.1021/acs.jproteome.0c00273
26. Di Paola L, De Ruvo M, Paci P, Santoni D, Giuliani A. Protein contact networks: an emerging paradigm in chemistry. Chemical reviews. 2013;113(3):1598–1613. doi: 10.1021/cr3002356
27. Guzzi PH, Tradigo G, Veltri P. A Novel Algorithm for Local Network Alignment Based on Network Embedding. Applied Sciences. 2022;12(11):5403. doi: 10.3390/app12115403
28. Guzzi PH, di Paola L, Puccio B, Lomoio U, Giuliani A, Veltri P. Computational analysis of the sequence-structure relation in SARS-CoV-2 spike protein using protein contact networks. Scientific Reports. 2023;13(1):2837. doi: 10.1038/s41598-023-30052-w
29. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research. 2005;33(7):2302–2309. doi: 10.1093/nar/gki524
30. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008;2008(10):P10008. doi: 10.1088/1742-5468/2008/10/P10008
31. Guzzi PH, Di Paola L, Giuliani A, Veltri P. PCN-Miner: An open-source extensible tool for the Analysis of Protein Contact Networks. Bioinformatics. 2022;38(17):4235–4237. doi: 10.1093/bioinformatics/btac450
32. Pascarella S, Ciccozzi M, Bianchi M, Benvenuto D, Cauda R, Cassone A. The value of electrostatic potentials of the spike receptor binding and N-terminal domains in addressing transmissibility and infectivity of SARS-CoV-2 variants of concern. Journal of Infection. 2022;84(5):e62–e63. doi: 10.1016/j.jinf.2022.02.023
33. Ortuso F, Mercatelli D, Guzzi PH, Giorgi FM. Structural genetics of circulating variants affecting the SARS-CoV-2 spike/human ACE2 complex. Journal of Biomolecular Structure and Dynamics. 2021; p. 1–11. doi: 10.1080/07391102.2021.1886175
34. Limentani GB, Ringo MC, Ye F, Bergquist ML, McSorley EO. Beyond the t-test: statistical equivalence testing; 2005. doi: 10.1021/ac053390m
35. Fortunato S. Community detection in graphs. Phys Rep-Rev Sec Phys Lett. 2010;486:75–174.
36. Harenberg S, Bello G, Gjeltema L, Ranshous S, Harlalka J, Seay R, et al. Community detection in large-scale networks: a survey and empirical evaluation. Wiley Interdisciplinary Reviews: Computational Statistics. 2014;6(6):426–439. doi: 10.1002/wics.1319
37. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins: Structure. 2004;57. doi: 10.1002/prot.20264
38. Zhang C, Shine M, Pyle AM, Zhang Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nature methods. 2022;19(9):1109–1115. doi: 10.1038/s41592-022-01585-1
39. Olsson MHM, Søndergaard CR, Rostkowski M, Jensen JH. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. Journal of Chemical Theory and Computation. 2011;7(2):525–537. doi: 10.1021/ct100578z
40. Laureanti J, Brandi J, Offor E, Engel D, Rallo R, Ginovska B, et al. Visualizing biomolecular electrostatics in virtual reality with UnityMol-APBS. Protein Science. 2020;29(1):237–246. doi: 10.1002/pro.3773
41. Thompson JD, Gibson TJ, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Current Protocols in Bioinformatics. 2003;00(1):2.3.1–2.3.22. doi: 10.1002/0471250953.bi0203s00
42. Kluyver T, Ragan-Kelley B, Pérez F, Granger BE, Bussonnier M, Frederic J, et al. Jupyter Notebooks-a publishing format for reproducible computational workflows. In: International Conference on Electronic Publishing. vol. 2016; 2016. p. 87–90.
43. Bisong E, Bisong E. Matplotlib and seaborn. Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners. 2019; p. 151–165. doi: 10.1007/978-1-4842-4470-8_12
44. Yuan S, Chan HS, Hu Z. Using PyMOL as a platform for computational drug design. Wiley Interdisciplinary Reviews: Computational Molecular Science. 2017;7(2):e1298.
45. Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Research. 2006;34(suppl 2):W239–W242. doi: 10.1093/nar/gkl190
46. Wang R, Chen J, Hozumi Y, Yin C, Wei GW. Emerging vaccine-breakthrough SARS-CoV-2 variants. ACS infectious diseases. 2022;8(3):546–556. doi: 10.1021/acsinfecdis.1c00557
47. McLean G, Kamil J, Lee B, Moore P, Schulz TF, Muik A, et al. The impact of evolving SARS-CoV-2 mutations and variants on COVID-19 vaccines. Mbio. 2022;13(2):e02979–21. doi: 10.1128/mbio.02979-21

PLoS One. doi: 10.1371/journal.pone.0283400.r001

Decision Letter 0

Nagarajan Raju

4 May 2023

PONE-D-23-06693

SARS-CoV-2 protein structure and sequence mutations: evolutionary analysis and effects on virus variants

PLOS ONE

Dear Dr. Guzzi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 18 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Nagarajan Raju

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

"The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. Please upload a new copy of Figure 11b as the detail is not clear. Please follow the link for more information: https://blogs.plos.org/plos/2019/06/looking-good-tips-for-creating-your-plos-figures-graphics/

Additional Editor Comments:

I suggest authors to go through comments from all the reviewers and include the responses in the revised version


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

Reviewer #4: No

Reviewer #5: Yes

Reviewer #6: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

Reviewer #3: I Don't Know

Reviewer #4: No

Reviewer #5: Yes

Reviewer #6: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

Reviewer #5: Yes

Reviewer #6: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

Reviewer #5: Yes

Reviewer #6: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript is easy to understand and the technical implementation is sound. The manuscript seems to present a pipeline. The SARS-CoV-2 protein appears to be an example to present the possibilities of the pipeline and the number of 15 plots and four tables. I miss a biological question or motivation. The claims in the introduction are vague. Network properties such as the Betweenness centrality are textbook knowledge that may not need an introduction. The manuscript should, however, introduce all quantities used in the explanation, e.g., what is the matrix A in the eigenvector centrality? The resolution of some figures, e.g., figure 2, is too low to see anything. The message of the majority of the figures remains unclear. There are some minor typos in the manuscript.

Reviewer #2: Dear authors,

Your manuscript entitled "SARS-CoV-2 protein structure and sequence mutations: evolutionary analysis and effects on virus variants" is interesting. However, I have a few suggestions that needed to be addressed-

1. The abstract needs to be re-written as it feels like a undergraduate student have written it.

2. Read and cite the following paper that has addressed the same issue. https://doi.org/10.1007/s10753-022-01734-w

3. Make a better graphical abstract that can decipher the work better for the complete study and connect each formula in theoretical way to make it more correlative.

4. Make a separate discussion heading and discuss your work against state of the art work and above provided paper.

I wish to see the above changes to re-review it critically.

Reviewer #3: The work is on the effect of mutations of the structure of Spike protein in SARS-CoV-2. The work described here is heavily dependent upon protein contact networks (PCN). The works seems okay overall though. Therefore, I am recommending it for publication.

Reviewer #4: Comments to the author

1. The abstract does not provide enough context about the research, such as the specific research question, hypothesis and findings. It only provides general information about the relationship between protein sequence, structure, and function, and the use of Protein Contact Networks (PCNs) to investigate protein structures. Authors are suggested to provide a clear and concise summary of the main findings and implications of the analysis.

2. The explanation of the Louvain community detection analysis could be more detailed to provide a better understanding of the approach.

3. While the PKa values are reported for three variants, it is unclear why only these variants were selected and how these values were calculated.

4. Authors should mention all the software and tools those are used to perform the analysis.

5. The study results are presented without proper context or background information, making it challenging for the reader to understand the subject matter and the findings. The significance of the results is not adequately discussed, and the authors do not provide any recommendations or conclusions based on their findings.

6. Although the authors mention that they performed t-test analyses to determine the significance of their results, they do not provide any details about the statistical tests performed, such as the p-values or confidence intervals.

7. It is unclear how the results of the acid dissociation constant (PKa) analysis relate to the overall findings of the study. A more detailed explanation of the implications of this analysis would be helpful.

8. While the authors identify significant changes in centrality measures between Omicron1 and other Spike variants, they do not explain the biological significance or potential implications of these changes.

9. Authors unnecessarily make many figures. Several figures could be merged into one. Also, the text in the figures should be clear.

10. There is no discussion of the limitations of the study or potential sources of bias.

Reviewer #5: In this paper, the authors propose “SARS-CoV-2 protein structure and sequence mutations: evolutionary analysis and effects on virus variants”

The strengths of the paper are that it is well structured, the description of the related work is well done and that results are extensively compared to results of the similar research.

Minor revisions:

1. Authors should draw a graphical abstract of the proposed approach

2. Authors should justify the proposed approach and compare your approach with existing algorithms.

3. Proofread the entire manuscript.

Reviewer #6: The authors shared a rather good paper. The analysis outcomes presented are in alignment with what is already known from the biology of the virus. However, some extra work is needed to improve the English language of the paper. I see some verbs explaining the methods are written in the present tense.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jörg Ackermann

Reviewer #2: Yes: Shaban Ahmad

Reviewer #3: Yes: Ishtiaque Ahammad

Reviewer #4: No

Reviewer #5: No

Reviewer #6: Yes: Rehab Ahmed

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jul 20;18(7):e0283400. doi: 10.1371/journal.pone.0283400.r002

Author response to Decision Letter 0

10 Jun 2023

Answer: We focus on the correlation between sequence and structure evolution and investigate how sequence changes can be related to phenotypic variations in structures. Moreover, the study concerns Spike proteins and the Omicron variant of SARS-CoV-2. We added such a reference in the introduction, updated the claims by adding references to figures and results, and removed some for greater concision.

Network properties such as the Betweenness centrality are textbook knowledge that may not need an introduction.

Answer: We thank the reviewer and agree that the definition of measures can be found in textbooks. However, some reviewers from other journals asked for formal introduction. We removed them from the text and included them in a table.

The manuscript should, however, introduce all quantities used in the explanation, e.g., what is the matrix A in the eigenvector centrality?

Answer: We added missing definitions (e.g., adjacency matrix A, now included in the Table for description).

The resolution of some figures, e.g., figure 2, is too low to see anything.

Answer: We produced novel figures improving quality and readability. We have revised all figures, removing those that, in line with the reviewer’s observations, lack clarity.

The message of the majority of the figures remains unclear.

There are some minor typos in the manuscript.

Answer: We thank the reviewer. We corrected the manuscript and typos.

Reviewer #2: Dear authors,

Your manuscript entitled "SARS-CoV-2 protein structure and sequence mutations: evolutionary analysis and effects on virus variants" is interesting. However, I have a few suggestions that need to be addressed:

1. The abstract needs to be rewritten, as it feels like an undergraduate student has written it.

Answer: We thank the reviewer for the observation. We have rewritten the abstract to link it more closely with the background and motivations (first paragraph) and with the model proposal (contributions) and results (applications), as described in the second paragraph.

The abstract has been thus rewritten, hopefully more fluently.

2. Read and cite the following paper that has addressed the same issue. https://doi.org/10.1007/s10753-022-01734-w

Answer: We thank the reviewer. The paper has been cited, although we believe that our model goes somewhat further than the cited proposal.

3. Make a better graphical abstract that can decipher the work better for the complete study and connect each formula in a theoretical way to make it more correlative.

Answer: We provided a graphical abstract to improve readability, and related formulas throughout the text.

The graphical abstract now has the following caption: “Graphical abstract reporting the contribution: Sequence and structure of variants are used as input, PCNs and pKa are evaluated; Louvain and centrality values are evaluated on PCNs. Sets of communities of PCNs, centrality values and pKa values for mutations are used to characterise how protein (sequence and structure) mutations are related in time with vaccinations and new variants.”

4. Make a separate discussion heading and discuss your work against state of the art work and above provided paper.

Answer: Thank you for the suggestion. In line with the submission format, we have added a Discussion Section.

I wish to see the above changes to re-review it critically.

We thank the reviewer for the helpful comments.

Reviewer #4: Comments to the author

Answer: We have restructured the abstract in line with the above suggestion. In particular, we added the specific research questions and findings. We also highlighted the research questions about the connections and relationships between sequence modifications and structural changes in the Spike protein. We focus on how the evolution of the S protein sequence, structure, and thus function relates to virus transmissibility and vaccination effects. Studies of the S protein have generated large data sets of sequences and structures.

We also enriched the discussion by focusing on how (i) node centrality and (ii) community extraction analysis can be considered to relate protein stability and functionality with sequence mutations. Starting from such metrics, we compare structural evolution to sequence changes, and study mutations from a temporal perspective focusing on virus variants. We apply our model to the Omicron variant highlighting a timeline correlation between the Omicron variant and the vaccination campaign.

Authors are suggested to provide a clear and concise summary of the main findings and implications of the analysis.

Answer: We added in the introduction a list of the main findings and results.

2. The explanation of the Louvain community detection analysis could be more detailed to provide a better understanding of the approach.

Answer: We added a more detailed explanation of the Louvain community detection analysis in the Methods section.
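The Louvain method greedily searches for a partition of the network that maximises modularity. As a hedged illustration of that objective only (not of the full Louvain procedure, and not of the paper's code), the sketch below evaluates the modularity Q of a given partition on a toy graph of two triangles joined by one edge. The function name `modularity`, the edge list, and the community labels are assumptions made for this example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes together, which is the kind of improvement Louvain looks for when it moves nodes between communities.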

3. While the pKa values are reported for three variants, it is unclear why only these variants were selected and how these values were calculated.

Answer: We thank the reviewer for this observation. In the submitted version we focused only on Variants of Concern (VOCs). We have now run the experiments on all the known variants (i.e., not only VOCs). We used the PROPKA3 web server (https://www.ddl.unimi.it/vegaol/propka_about.htm) to predict pKa values for each amino acid starting from the PDB structure. A similar approach is used in Pascarella et al., J Infect. 2022 May; 84(5): e62–e63. We have also added a detailed discussion of our results.

4. Authors should mention all the software and tools that are used to perform the analysis.

Answer: We checked the paper carefully to ensure that all the software used is mentioned.

5. The study results are presented without proper context or background information, making it challenging for the reader to understand the subject matter and the findings.

Answer: We provided more information about the context in both the abstract and the introduction.

The significance of the results is not adequately discussed, and the authors do not provide any recommendations or conclusions based on their findings.

Answer: We have taken this observation, also made by other reviewers, on board and added a Discussion Section, providing additional information and considerations on the obtained results.

Answer: We added the requested details and updated the Tables with confidence intervals accordingly, while p-values are reported in the Results Section.

Answer: We apologise for the lack of clarity. First, we would like to point out that the overall charge of the Spike protein affects its capability to bind ACE2 and, hence, transmissibility. As also shown in Pascarella et al. (J Infect. 2022 May; 84(5): e62–e63), the pKa can be used to predict the overall domain charge, so changes in pKa affect the domain charge and the virus transmissibility. We calculate the changes in the overall charge for the VOCs, showing that all the variants present a positive change in net charge. As previously noted in Math et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9158474/), a similar analysis showed that mutations of the Delta variant made it more alkaline, improving its structural robustness. Here we show that Omicron presents a larger increase in net charge. The results confirm that there is a common change in all the variants that gives the virus greater transmissibility, since Spike has become more positive while the ACE2 receptor has a negative charge.
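One common way to turn per-residue pKa values into a net charge is the Henderson-Hasselbalch relation, sketched below. This is an illustration under stated assumptions rather than the procedure used in the paper: the function name `net_charge`, the pH of 7.0, and the pKa values in the example are placeholders, not PROPKA3 output for any Spike variant.

```python
def net_charge(sites, pH=7.0):
    """Net charge of a set of titratable sites via Henderson-Hasselbalch.

    `sites` is a list of (kind, pKa) pairs, where kind is "acid"
    (e.g. Asp, Glu: neutral when protonated, -1 when deprotonated) or
    "base" (e.g. Lys, Arg: +1 when protonated, neutral when deprotonated).
    """
    total = 0.0
    for kind, pKa in sites:
        if kind == "base":
            # Fraction of the site that is protonated and carries +1.
            total += 1.0 / (1.0 + 10.0 ** (pH - pKa))
        elif kind == "acid":
            # Fraction of the site that is deprotonated and carries -1.
            total -= 1.0 / (1.0 + 10.0 ** (pKa - pH))
        else:
            raise ValueError(f"unknown site kind: {kind!r}")
    return total

# Hypothetical example: an acidic residue replaced by a basic one.
# The pKa values are illustrative placeholders only.
before = [("acid", 3.8), ("base", 10.5)]
after = [("base", 10.5), ("base", 10.5)]

print(net_charge(before))
print(net_charge(after))
```

In this toy case the substitution of an acidic site with a basic one raises the net charge by roughly two units at neutral pH, which is the direction of change the answer above describes for the variants.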

Answer: We added a Discussion Section highlighting the biological significance and implications also in terms of applications.

9. Authors unnecessarily make many figures. Several figures could be merged into one. Also, the text in the figures should be clear.

Answer: We apologise for any lack of clarity. To address this weakness, we have restructured all figures, merging some and removing others, as suggested. We also moved figures that were hard to read in paper format into the supplementary materials, available in the GitHub repository together with the code.

10. There is no discussion of the limitations of the study or potential sources of bias.

Answer: Thank you for the observation. To improve the paper, we have added a Section that follows the Discussion and reports the study limitations.

The strengths of the paper are that it is well structured, the description of the related work is well done and that results are extensively compared to results of the similar research.

Minor revisions:

1. Authors should draw a graphical abstract of the proposed approach

Answer: We provided a graphical abstract.

2. Authors should justify the proposed approach and compare your approach with existing algorithms.

Answer: We have now presented such a comparison in the introduction.

3. Proofread the entire manuscript

Answer: We have proofread the manuscript with the help of a native English speaker.

Answer: We thank the reviewer. We carefully checked the paper with the help of a native English speaker.

Attachment

Submitted filename: response.docx


PLoS One. doi: 10.1371/journal.pone.0283400.r003

Decision Letter 1

Nagarajan Raju

5 Jul 2023

SARS-CoV-2 protein structure and sequence mutations: evolutionary analysis and effects on virus variants

PONE-D-23-06693R1

Dear Dr. Guzzi,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Nagarajan Raju

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

Reviewer #3: N/A

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #5: Yes

**********

6. Review Comments to the Author

Reviewer #2: The authors have incorporated the suggestions and I am completely satisfied with the answers. This manuscript is now acceptable.

Reviewer #3: The revised manuscript contains the necessary revisions. Therefore I am recommending it for publication.

The strengths of the paper are that it is well structured, the description of the related work is well done and that results are extensively compared to results of the similar research.

All the reviewer comments have been addressed.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Shaban Ahmad

Reviewer #3: Yes: Ishtiaque Ahammad

Reviewer #5: No

**********

PLoS One. doi: 10.1371/journal.pone.0283400.r004

Acceptance letter

Nagarajan Raju

11 Jul 2023

PONE-D-23-06693R1

SARS-CoV-2 protein structure and sequence mutations: evolutionary analysis and effects on virus variants.

Dear Dr. Guzzi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Nagarajan Raju

Academic Editor

PLOS ONE

Associated Data


Supplementary Materials

S1 Fig. Clustermap of communities of Spike protein.

Clustermap plot for the a) Wild Type, b) Delta, and c) Omicron₁ variants. In a clustermap plot, communities are mapped into a matrix and visualized as a heatmap.

(TIF)


S1 File

(DOCX)


S1 Graphical abstract

(PNG)



Data Availability Statement

The repository https://github.com/UgoLomoio/SARSCoV2_variants_PCN contains data, code, and additional figures used in this work.

1. Pál C, Papp B, Lercher MJ. An integrated view of protein evolution. Nature reviews genetics. 2006;7(5):337–348. doi: 10.1038/nrg1838

2. Gu S, Jiang M, Guzzi PH, Milenković T. Modeling multi-scale data via a network of networks. Bioinformatics. 2022;38(9):2544–2553. doi: 10.1093/bioinformatics/btac133

3. Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nature reviews molecular cell biology. 2007;8(12):995–1005. doi: 10.1038/nrm2281

4. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology. 2021;19(7):409–424. doi: 10.1038/s41579-021-00573-0

5. Kumar Das J, Tradigo G, Veltri P, Guzzi PH, Roy S. Data science in unveiling COVID-19 pathogenesis and diagnosis: evolutionary origin to drug repurposing. Briefings in Bioinformatics. 2021;22(2):855–872. doi: 10.1093/bib/bbaa420

6. Guzzi PH, Mercatelli D, Ceraolo C, Giorgi FM. Master regulator analysis of the SARS-CoV-2/human interactome. Journal of clinical medicine. 2020;9(4):982. doi: 10.3390/jcm9040982

7. Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H. The architecture of SARS-CoV-2 transcriptome. Cell. 2020;181(4):914–921. doi: 10.1016/j.cell.2020.04.011

8. Milano M, Guzzi PH, Tymofieva O, Xu D, Hess C, Veltri P, et al. An extensive assessment of network alignment algorithms for comparison of brain connectomes. BMC bioinformatics. 2017;18(6):31–45. doi: 10.1186/s12859-017-1635-7

9. Domingo E, Holland J. RNA virus mutations and fitness for survival. Annual review of microbiology. 1997;51:151. doi: 10.1146/annurev.micro.51.1.151

10. Lauring AS, Frydman J, Andino R. The role of mutational robustness in RNA virus evolution. Nature Reviews Microbiology. 2013;11(5):327–336. doi: 10.1038/nrmicro3003

11. Mercatelli D, Pedace E, Veltri P, Giorgi FM, Guzzi PH. Exploiting the molecular basis of age and gender differences in outcomes of SARS-CoV-2 infections. Computational and Structural Biotechnology Journal. 2021;19:4092–4100. doi: 10.1016/j.csbj.2021.07.002

12. Wu S, Tian C, Liu P, Guo D, Zheng W, Huang X, et al. Effects of SARS-CoV-2 mutations on protein structures and intraviral protein–protein interactions. Journal of medical virology. 2021;93(4):2132–2140. doi: 10.1002/jmv.26597

13. Ahmad W, Ahmad S, Basha R. Analysis of the mutation dynamics of SARS-CoV-2 genome in the samples from Georgia State of the United States. Gene. 2022;841:146774. doi: 10.1016/j.gene.2022.146774

14. Kumar S, Thambiraja TS, Karuppanan K, Subramaniam G. Omicron and Delta variant of SARS-CoV-2: a comparative computational study of spike protein. Journal of medical virology. 2022;94(4):1641–1649. doi: 10.1002/jmv.27526

15. Gallo Cantafio ME, Grillone K, Caracciolo D, Scionti F, Arbitrio M, Barbieri V, et al. From single level analysis to multi-omics integrative approaches: a powerful strategy towards the precision oncology. High-throughput. 2018;7(4):33. doi: 10.3390/ht7040033

16. Boni MF, Lemey P, Jiang X, Lam TTY, Perry BW, Castoe TA, et al. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nature microbiology. 2020;5(11):1408–1417. doi: 10.1038/s41564-020-0771-4

17. Barh D, Tiwari S, Rodrigues Gomes LG, Ramalho Pinto CH, Andrade BS, Ahmad S, et al. SARS-CoV-2 Variants Show a Gradual Declining Pathogenicity and Pro-Inflammatory Cytokine Stimulation, an Increasing Antigenic and Anti-Inflammatory Cytokine Induction, and Rising Structural Protein Instability: A Minimal Number Genome-Based Approach. Inflammation. 2023;46(1):297–312. doi: 10.1007/s10753-022-01734-w

18. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22(13):30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494

19. Klimczak LJ, Randall TA, Saini N, Li JL, Gordenin DA. Similarity between mutation spectra in hypermutated genomes of rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic. PLoS One. 2020;15(10):e0237689. doi: 10.1371/journal.pone.0237689

20. Abenavoli L, Cinaglia P, Lombardo G, Boffoli E, Scida M, Procopio AC, et al. Anxiety and gastrointestinal symptoms related to COVID-19 during Italian lockdown. Journal of Clinical Medicine. 2021;10(6):1221. doi: 10.3390/jcm10061221

21. Oude Munnink BB, Worp N, Nieuwenhuijse DF, Sikkema RS, Haagmans B, Fouchier RA, et al. The next phase of SARS-CoV-2 surveillance: real-time molecular epidemiology. Nature medicine. 2021;27(9):1518–1524. doi: 10.1038/s41591-021-01472-w

22. Hu B, Guo H, Zhou P, Shi ZL. Characteristics of SARS-CoV-2 and COVID-19. Nature Reviews Microbiology. 2021;19(3):141–154. doi: 10.1038/s41579-020-00459-7

23. Romeo I, Prandi IG, Giombini E, Gruber CEM, Pietrucci D, Borocci S, et al. The Spike Mutants Website: A Worldwide Used Resource against SARS-CoV-2. International Journal of Molecular Sciences. 2022;23(21). doi: 10.3390/ijms232113082

24. Di Paola L, Mei G, Di Venere A, Giuliani A. Disclosing allostery through protein contact networks. In: Allostery. Springer; 2021. p. 7–20.

25. Di Paola L, Hadi-Alijanvand H, Song X, Hu G, Giuliani A. The discovery of a putative allosteric site in the SARS-CoV-2 spike protein using an integrated structural/dynamic approach. Journal of proteome research. 2020;19(11):4576–4586. doi: 10.1021/acs.jproteome.0c00273

26. Di Paola L, De Ruvo M, Paci P, Santoni D, Giuliani A. Protein contact networks: an emerging paradigm in chemistry. Chemical reviews. 2013;113(3):1598–1613. doi: 10.1021/cr3002356

27. Guzzi PH, Tradigo G, Veltri P. A Novel Algorithm for Local Network Alignment Based on Network Embedding. Applied Sciences. 2022;12(11):5403. doi: 10.3390/app12115403

28. Guzzi PH, di Paola L, Puccio B, Lomoio U, Giuliani A, Veltri P. Computational analysis of the sequence-structure relation in SARS-CoV-2 spike protein using protein contact networks. Scientific Reports. 2023;13(1):2837. doi: 10.1038/s41598-023-30052-w

29. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research. 2005;33(7):2302–2309. doi: 10.1093/nar/gki524

30. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008;2008(10):P10008. doi: 10.1088/1742-5468/2008/10/P10008

31. Guzzi PH, Di Paola L, Giuliani A, Veltri P. PCN-Miner: An open-source extensible tool for the Analysis of Protein Contact Networks. Bioinformatics. 2022;38(17):4235–4237. doi: 10.1093/bioinformatics/btac450

32. Pascarella S, Ciccozzi M, Bianchi M, Benvenuto D, Cauda R, Cassone A. The value of electrostatic potentials of the spike receptor binding and N-terminal domains in addressing transmissibility and infectivity of SARS-CoV-2 variants of concern. Journal of Infection. 2022;84(5):e62–e63. doi: 10.1016/j.jinf.2022.02.023

33. Ortuso F, Mercatelli D, Guzzi PH, Giorgi FM. Structural genetics of circulating variants affecting the SARS-CoV-2 spike/human ACE2 complex. Journal of Biomolecular Structure and Dynamics. 2021; p. 1–11. doi: 10.1080/07391102.2021.1886175

34. Limentani GB, Ringo MC, Ye F, Bergquist ML, McSorley EO. Beyond the t-test: statistical equivalence testing; 2005. doi: 10.1021/ac053390m

35. Fortunato S. Community detection in graphs. Phys Rep-Rev Sec Phys Lett. 2010;486:75–174.

36. Harenberg S, Bello G, Gjeltema L, Ranshous S, Harlalka J, Seay R, et al. Community detection in large-scale networks: a survey and empirical evaluation. Wiley Interdisciplinary Reviews: Computational Statistics. 2014;6(6):426–439. doi: 10.1002/wics.1319

37. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins: Structure. 2004;57. doi: 10.1002/prot.20264

38. Zhang C, Shine M, Pyle AM, Zhang Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nature methods. 2022;19(9):1109–1115. doi: 10.1038/s41592-022-01585-1

39. Olsson MHM, Søndergaard CR, Rostkowski M, Jensen JH. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. Journal of Chemical Theory and Computation. 2011;7(2):525–537. doi: 10.1021/ct100578z

40. Laureanti J, Brandi J, Offor E, Engel D, Rallo R, Ginovska B, et al. Visualizing biomolecular electrostatics in virtual reality with UnityMol-APBS. Protein Science. 2020;29(1):237–246. doi: 10.1002/pro.3773

41. Thompson JD, Gibson TJ, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Current Protocols in Bioinformatics. 2003;00(1):2.3.1–2.3.22. doi: 10.1002/0471250953.bi0203s00

42. Kluyver T, Ragan-Kelley B, Pérez F, Granger BE, Bussonnier M, Frederic J, et al. Jupyter Notebooks-a publishing format for reproducible computational workflows. In: International Conference on Electronic Publishing. vol. 2016; 2016. p. 87–90.

43. Bisong E, Bisong E. Matplotlib and seaborn. Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners. 2019; p. 151–165. doi: 10.1007/978-1-4842-4470-8_12

44. Yuan S, Chan HS, Hu Z. Using PyMOL as a platform for computational drug design. Wiley Interdisciplinary Reviews: Computational Molecular Science. 2017;7(2):e1298.

45. Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Research. 2006;34(suppl 2):W239–W242. doi: 10.1093/nar/gkl190

46. Wang R, Chen J, Hozumi Y, Yin C, Wei GW. Emerging vaccine-breakthrough SARS-CoV-2 variants. ACS infectious diseases. 2022;8(3):546–556. doi: 10.1021/acsinfecdis.1c00557

47. McLean G, Kamil J, Lee B, Moore P, Schulz TF, Muik A, et al. The impact of evolving SARS-CoV-2 mutations and variants on COVID-19 vaccines. Mbio. 2022;13(2):e02979–21. doi: 10.1128/mbio.02979-21

