Abstract
The spike (S) protein is a critical determinant of the infectivity and antigenicity of SARS-CoV-2. Several mutations in the S protein of SARS-CoV-2 have already been detected, and their effect in immune system evasion and enhanced transmission as a cause of increased morbidity and mortality are being investigated. From pathogenic and epidemiological perspectives, S proteins are of prime interest to researchers. This study focused on the unique variants of S proteins from six continents: Asia, Africa, Europe, Oceania, South America, and North America. In comparison to the other five continents, Africa had the highest percentage of unique S proteins (29.1%). The phylogenetic relationship implies that unique S proteins from North America are significantly different from those of the other five continents. They are most likely to spread to the other geographic locations through international travel or naturally by emerging mutations. It is suggested that restriction of international travel should be considered, and massive vaccination as an utmost measure to combat the spread of the COVID-19 pandemic. It is also further suggested that the efficacy of existing vaccines and future vaccine development must be reviewed with careful scrutiny, and if needed, further re-engineered based on requirements dictated by new emerging S protein variants.
Keywords: SARS-CoV-2, Invariant residues, Mutations, Spike protein, Continents, Vaccines
1. Introduction
The world is experiencing a health emergency due to the Coronavirus disease (COVID-19), caused by an enveloped positive-sense single-stranded virus, the severe acute respiratory syndrome coronavirus (SARS-CoV-2) [1], [2], [3], [4], [5], [6]. The spike (S) protein is a homotrimer present on the surface of the SARS-CoV-2 and recognizes the human host cell surface receptor angiotensin-converting enzyme-2 (ACE2) [7], [8], [9], [10]. The interaction between the S protein of SARS-CoV-2 and its cellular receptor ACE2 is driven by high affinity/avidity. Therefore, neutralization by antibodies does not only require specifically binding antibodies, but antibodies that have high affinity/avidity towards the S1 subunit of the S protein [11]. It is worth mentioning that this particular aspect is directly related to the variability of the S1 subunit (and its isoelectric points) as this may modulate the affinity of binding [12]. The importance of antibody avidity for protection towards SARS-CoV-2 (and other viruses) has been recently reviewed [12]. From the beginning of the second wave of COVID-19 infection, various SARS-CoV-2 variants emerged raising concern of enhanced transmission and mortality of the virus and reduced efficacy of vaccine protection [13], [14]. Some of the studies opposed the perception of SARS-CoV-2 mutations as distinctive pathogenic variants and the increased rate of transmissibility were questioned [15], [16]. However, the frequency of the mutant strains within the SARS-CoV-2 population carrying the D614G mutation in the S protein clearly plays a role in enabling the virus to spread more effectively and rapidly [17]. Epidemiologists have been constantly monitoring the evolution of SARS-CoV-2 with a particular focus on the S protein and other interacting proteins of the virus [17], [18]. The D614G mutation in the S protein discovered in early 2020 makes the virus able to spread more effectively and rapidly [19]. The D614G mutation has been found to be related with high viral loads in infected patients, and high rate of infections, but not with increased disease severity [20]. Various mutations in the S protein make the SARS-CoV-2 more complex and hence it is more difficult to characterize its severity, infectivity and efficacy of vaccines designed to target the S protein. Not all mutations are advantageous to the virus but several mutations or a set of mutations may increase the transmission potential through an increase in receptor binding or the ability to evade the host immune response by altering the surface structures recognized by antibodies [21], [22], [23].
To contain the spread of COVID-19, it is definitely of high interest to detect and identify various unique emerging variants of S proteins. Additionally, it is also worth investigating the impact of new S protein variants on viral infectivity and potential to spread rapidly as well as to ascertain the origin of the spread of the new variants concerning S protein variabilities. Accordingly, it might be possible to segregate the set of new variants with respect to individual characteristics of SARS-CoV-2, which would undoubtedly help policy makers to form various strategies to contain the spread of the virus. There are a large number of different SARS-CoV-2 S protein mutant sequences currently available in the National Center for Biotechnology Information (NCBI) virus database. In this study, all available S protein sequences from six continents Asia, Africa, Europe, North America, South America, and Oceania were analyzed for their uniqueness and variability. An inter-linkage was made among the unique S proteins available on the six continents.
2. Data acquisition and methods
S protein sequences from all six continents (Asia, Africa, Europe, Oceania, South America, and North America) were downloaded in FASTA format from the NCBI database (http://www.ncbi.nlm.nih.gov/). Further, FASTA files were processed in Matlab-2021a for extracting unique S protein sequences for each continent.
2.1. Phylogenetic analysis
To filter sequences with low quality (unknown amino acids ‘X’) and remove redundant sequences, the SeqKit tool was used, with the tools fx2tab and rmdup, respectively [24]. The filter removed all sequences that had one or more ‘X’ and all redundant sequences (100% identical). The amino acid sequences were aligned using the MegaX program with MUSCLE algorithm, and after it a phylogeny calculation was performed with the Neighbor-joining method, considering 3919 taxa sequences and 530 sites [25], [26]. The alignment was used as input in Archeopteryx 0.9914 with the multiple alignment inference option, following the parameters of maximum allowed gaps ratio 0.5, minimum allowed non-gap sequence length 50 and distance calculator Kimura correction [27]. The phylogenetic trees were analyzed and edited in the Archeopteryx 0.9914 tool.
2.2. Frequency probability of amino acids
Any protein sequence is composed of twenty different amino acids with various frequencies starting from zero. The ability of occurrence of each amino acid Ai is determined by the formula where f(A i) denotes the frequency of occurrence of the amino acid A i in a primary sequence, and l stands for the length of an S protein [28]. Hence for each S protein, a twenty-dimensional vector considering the frequency probability of twenty amino acids can be obtained. Based on this frequency probability, the dominance of amino acid density in a given protein is illuminated.
2.3. Evaluation of normalized amino acid compositions
The variability of the amino acid compositions of the unique S proteins from each continent was evaluated using the web-based tool Composition Profiler (http://www.cprofiler.org/) that automates detection of enrichment or depletion patterns of individual amino acids or groups of amino acids in query proteins [29]. In this analysis, we used sets of unique S proteins from each continent as query samples and the amino acid of the original S protein (UniProt ID: P0DTC2) as a reference sample that provides the background amino acid distribution. Composition profiler generates a bar chart composed of twenty data points (one for each amino acid), where bar heights indicate normalized enrichment or depletion of a given residue. The normalized enrichment/depletion is calculated as
where C continent is the content of given residue in the query set of S proteins on a given continent and C original is the content of the same residue in the original S protein. For comparison, we generated composition profiles of disordered proteins, where normalized composition was evaluated as (C DisProt is the content of a given amino acid in the set of intrinsically disordered proteins in the DisProt database [30]; C PDB is the content of the given residue in the dataset of fully ordered proteins, PDB-Select-25 [29]). In these analyses, the positive and negative values produced in the compositional profiler indicated enrichment or depletion of the indicated residue, respectively.
2.4. Amino acid conservation Shannon entropy
How conserved/disordered the amino acids are organized in the S protein is addressed by the information-theoretic measure known as ‘Shannon entropy’ (SE). For each S protein, Shannon entropy of amino acid conservation in the amino acid sequence of the S protein is computed using the following formula [31], [32]:
-
For a given amino acid sequence of length l, the conservation of amino acids is calculated as follows:
where ; k i represents the number of occurrences of an amino acid s i in the given sequence [33].
2.5. Isoelectric point of a protein sequence
The isoelectric point (pI), is the pH at which a molecule carries no net electrical charge or is electrically neutral in the statistical mean. We calculated the theoretical pI by using the pKa's of amino acids and summing the net charge across the protein at a given pH (default is typical intracellular pH 7.2), searching with our algorithm for the pH at which the net charge is zero [34]. The isoelectric point is a powerful tool to predict and understand interactions between proteins, proteins and membranes or to determine the presence of protein isoforms [35]. Furthermore, it is noted that the isoelectric point is one of the prime keys for understanding a variety of biochemical properties of protein sequences [35], [36]. Note that the isoelectric point of a protein sequence was computed here using the standard routine of Matlab-2021a. This parameter was deployed to characterize the unique S protein sequences, quantitatively.
2.6. Intrinsic disorder analysis
Intrinsic disorder predisposition of the S protein from the original (Wuhan) version of SARS-CoV-2 was analyzed by a set of six commonly used disorder predictors, such as PONDR® VLXT, PONDR® VL3, PONDR® VSL2B, PONDR® FIT, IUPred2 (Short) and IUPred2 (Long), which were selected for their specific features. The outputs of the evaluation of the per-residue disorder propensity by these tools are represented as real numbers between 1 (ideal prediction of disorder) and 0 (ideal prediction of order) [37], [38], [39], [40], [41]. Thresholds of ≥0.15 and ≥0.5 were used to identify flexible and disordered residues and regions.The intrinsic disorder profile of this protein was generated by DiSpi/RIDAO web-crawler that combines the outputs of PONDR® VLXT, PONDR® VL3, PONDR® VLS2B, PONDR® FIT, IUPred2 (Short) and IUPred2 (Long) on the one plot and complement them by the errors evaluated for the mean disorder profile calculated by averaging profiles of individual predictors. Analysis of intrinsic disorder predisposition of unique variants of the S protein was conducted by PONDR® VSL2B. This tool is commonly used in the analysis of disorder predisposition of proteins and systematically shows good performance in various comparative analyses, including the recently conducted Critical assessment of protein intrinsic disorder prediction (CAID) experiment, where PONDR® VLS2B was recognized as predictor #3 of the 43 evaluated methods [42].
3. Results
We first determined the set of unique S protein sequences from each continent. Further, every unique S protein from a continent was compared with other unique S proteins from five other continents, and the lists of the same are presented in Table 12, Table 13, Table 14, Table 15, Table 16, Table 17. Also, the variability of the S proteins from each continent was shown using Shannon entropy and isoelectric point.
3.1. Unique S proteins on the continents
In Table 1, the number of total sequences, unique sequences and percentages are presented. Note that, a complete list of unique S protein accessions and their names (continent-wise) are made available in Supplementary file-1. Note that, sequence accession is renamed as Ck where C stands for continent code (Asia: AS, Africa: AF, Oceania: O, Europe: U, South America: SA, and North America: NA), and k denotes the serial number.
Table 1.
Percentages of continent-wise unique spike (S) proteins.
Continent | Total S proteins (T) | Unique S proteins (U) | Percentage, continent-wise |
Percentage, worldwide |
---|---|---|---|---|
Africa | 984 | 286 | 29.065 | 1.772 |
Asia | 2314 | 432 | 18.669 | 2.676 |
Europe | 1006 | 187 | 18.588 | 1.158 |
Oceania | 9920 | 1121 | 11.300 | 6.944 |
South America | 464 | 71 | 15.302 | 0.440 |
North America | 113,072 | 14,046 | 12.422 | 87.010 |
Worldwide | 127,760 | 16,143 | 12.635 | – |
The highest percentage (29.065%) of unique S proteins was found in Africa though the total number of available sequences is significantly lower as compared with that from other continents. Almost similar amounts (in percentage) of unique S sequence variations were found in Asia and Europe. Among the total 127,760 S proteins embedded in SARS-CoV-2 genomes, only 16,143 (12%) unique S proteins were detected so far, and notably most of the unique variants (87%) were found in North America only.
For each continent, the unique S proteins were matched with other unique proteins from the rest of the five continents, and a total number of such identical pairs are presented accordingly in the matrix (Table 2 ).
Table 2.
The total continent-wise number of identical S proteins.
Continent-wise | Asia | Africa | Europe | North America | Oceania | South America |
---|---|---|---|---|---|---|
Asia | – | 25 | 27 | 169 | 17 | 17 |
Africa | 25 | – | 15 | 71 | 13 | 5 |
Europe | 27 | 15 | – | 76 | 9 | 8 |
North America | 169 | 71 | 76 | – | 49 | 31 |
Oceania | 17 | 13 | 9 | 49 | – | 5 |
South America | 17 | 5 | 8 | 31 | 5 | – |
Total continent-wise | 255 | 129 | 135 | 396 | 93 | 66 |
Unique residue S proteins | 177 | 157 | 52 | 13,650 | 1028 | 5 |
From Table 2, it was observed that, on each continent there is still a significant percentage of unique S variations available, which are not shared with any other continent. Such percentages of unique variations of S proteins in Asia, Africa, Europe, Oceania, South America, and North America were 41%, 55%, 28%, 92%, 7%, and 97% respectively. The lists of pairs of identical S proteins of SARS-CoV-2 originating from six continents are presented in Table 9, Table 10, Table 11 (see Appendix A). The lists of unique S proteins (from a particular continent), which were found to be identical with some unique S proteins from the other five continents, are presented in Table 12, Table 13, Table 14, Table 15, Table 16, Table 17 (see Appendix A).
The frequency and percentage of invariant residue positions, where no amino acid change was detected so far in the unique S proteins available on each continent, are presented in Table 3 .
Table 3.
The total number and percentage of invariant residue positions among 1273 positions in unique S proteins.
Frequency of invariant residue positions in unique S proteins from each continent | ||||||
---|---|---|---|---|---|---|
Africa | Asia | Europe | Oceania | South America | North America | |
Total freq. | 902 | 695 | 948 | 731 | 1070 | 89 |
Percentage | 70.86 | 54.60 | 74.47 | 57.42 | 84.05 | 6.99 |
The highest number of mutations (lowest number of invariant residue position, 6.99%) (Table 3) were detected in the unique S proteins from North America where 12.42% unique S protein sequences were present as mentioned in Table 1. Likewise, the lowest number (15.95%) of mutations in unique S proteins was observed in South America where 15.3% unique S sequences were found. Only 29.14% residues of 1273 in the unique S proteins were mutated, although a significantly higher number (29.065%) of unique sequences were found in Africa compared to the other five continents. The unique S proteins from Europe possessed only 25.5% mutations, whereas 45.5% mutations were detected in the unique S proteins from Asia, although the same percentage (18.5%) of unique S proteins were found (Table 1, Table 3). Further, it was observed that 11.3% of the unique S proteins from Oceania possessed 42.58% mutations.
3.2. Phylogenetic relationship among unique S protein variants
We collected 204,440 S protein sequences from NCBI and GAISED databases. Upon filtering, 191,536 redundant sequences were removed and 12,904 unique sequences (corresponding to 6.31% of the initial number of sequences) were selected for phylogenetic analysis.
The resultant phylogeny for unique amino acid sequences from the SARS-CoV-2 S protein, revealed a tree with polyphyletic groups, as well as showing sequences from different continents grouping together in the same clade (see Supplementary Fig. 1). On the other hand, after the Archaepteryx analysis five predominant sequence groups were identified between different S variants from different continents (Fig. 1).
Fig. 1.
SARS-CoV-2 S amino acid phylogeny after group clustering. After Archeopteryx analysis five groups can be identified: yellow, blue, magenta, green and black.
In this case, it can be verified that the sequences from the same continents can be found in the groups with different colors. Therefore, this phylogenetic analysis grouped together sequences from different continents. Then, after these analyses it could be possible to assume that we have at least five unique SARS-CoV-2 S variants indicating possibilities for new ways of developing more specific vaccines and drugs.
3.3. Variability through normalized amino acid composition
Additional information on the variability of the amino compositions of the unique S proteins from each continent relative to the composition of the original S protein from Wuhan was retrieved using the web-based tool Composition Profiler (http://www.cprofiler.org/). Results of this analysis are shown in Fig. 2A, which clearly shows the presence of some noticeable amino acid composition variability among unique S proteins from different continents. Since individual S proteins are different from each other and from the original S protein mostly in very limited number of residues, the range of changes in the normalized enrichment/depletion of a given residue is rather limited (compare scales of Y axis in Fig. 2A and B, where a composition profile of the intrinsically disordered proteins is shown for comparison).
Fig. 2.
Composition profiles of unique S proteins from different continents (A) in comparison with the composition profile of typical intrinsically disordered proteins (B). (For interpretation of the references of the colors in this figure, the reader is referred to the web version of this article.)
On an average, unique S proteins form Oceania were found to have the most variability in terms of normalized amino acid composition. This was followed by the unique S proteins from North America. Curiously, Fig. 2A shows that although the normalized content of individual residues in the unique S proteins from Oceania is always below that of the original S protein, S proteins from other continents might have a relative excess of some residues. For example, some unique S proteins from almost all continents can be enriched in glycine or histidine residues, whereas some European S proteins can also be relatively enriched in cysteine, isoleucine, tyrosine, phenylalanine, and lysine residues (see positive green bars in Fig. 2A). Another interesting observation is that the different sets of S proteins are typically characterized by rather noticeable variability of the normalized content of most residues. Aspartate is the noticeable exception, for which depletion is almost uniform between all unique S proteins from all continents.
3.4. Variability through intrinsic disorder analysis
Next, we looked at the correlation between frequencies of the mutations in unique S proteins from different locations and intrinsic disorder predisposition of this protein. Fig. 3A shows the distribution of the mutation frequency within the amino acid sequence of the S protein. It is seen that almost all residues have at least one mutation in different variants currently found globally. In fact, only 15 residues (Met1, Leu996, Ile997, Gly999, Leu1001, Tyr1007, Val1008, Gln1010, Ile1013, Arg1019, His1049, Gln1054, Thr1105, Asn1119, and Leu1270) of the 1273-residue long S protein showed no mutations at the time of this analysis. Curiously, nine of these never-changed residues are concentrated within a short region (residues 996-1019). Fig. 3A also shows that mutation frequencies are unevenly distributed within the amino acid sequence of the S protein and that the region (residues 675-691) surrounding the furin cleavage site (residues 680-686) seems to be characterized by high mutation frequency. In fact, although the average per-residue frequency of mutations of the entire protein is equal to 4.6, the mutation frequency of the 675-691 region is two-fold higher (9.2). Comparison of the mutation frequency profile (Fig. 3A) with the per-residue intrinsic disorder predisposition profile generated for the original (Wuhan) version of the S protein by a set of commonly used disorder predictors (Fig. 3B) indicated that there is some weak correlation between these two parameters, with regions showing more disorder typically are undergoing more frequent mutations. Again, Fig. 3B shows that the region containing the furin cleavage site is among the most disordered segments of the S protein (if not the most disordered one).
Fig. 3.
Correlation of the sequence variability of unique variants of the S protein with the intrinsic disorder predisposition of this protein. A. Mutation frequencies observed at each residue of the S protein at various locations. B. Intrinsic disorder predisposition of the original (Wuhan) version of the S protein analyzed by a set of commonly used disorder predictors. In both plots, the position of the furin cleavage site is shown as a cyan vertical bar. (For interpretation of the references of the colors in this figure legend, the reader is referred to the web version of this article.)
Fig. 4 provides further quantification of the per-residue mutability and disorder predisposition of the S protein. Here, dependencies of the mutation frequencies on the corresponding disorder score evaluated by PONDR® VSL2 are shown for six geo-locations. In Africa, the S protein has 902, 296, 60, 11, and 4 residues with 0, 1, 2, 3, and 4 mutations, which are characterized by the mean disorder scores of 0.27 ± 0.15, 0.28 ± 0.16, 0.30 ± 0.18, 0.29 ± 0.18, and 0.40 ± 0.19, respectively. In Asia, 694, 437, 111, 27, 3, and 1 residues of the S protein with 0, 1, 2, 3, 4, and 5 mutations are characterized by the mean disorder scores of 0.26 ± 0.14, 0.28 ± 0.15, 0.32 ± 0.16, 0.37 ± 0.18, 0.52 ± 0.22, and 0.19, respectively. In Europe, 948, 265, 55, and 5 residues of the S protein with 0, 1, 2, and 3 mutations have the mean disorder scores of 0.27 ± 0.15, 0.28 ± 0.15, 0.33 ± 0.18 and 0.35 ± 0.27, respectively. The S proteins in Oceania has 722, 427, 107, 13, and 4 residues with 0, 1, 2, 3, and 4 mutations, which are showing the disorder scores of 0.26 ± 0.14, 0.28 ± 0.16, 0.30 ± 0.17, 0.35 ± 0.21, and 0.27 ± 0.15, respectively. In the S proteins from South America variants, 1070, 193, and 10 residues with 0, 1, and 2 mutations have mean disorder scores of 0.27 ± 0.15, 0.36 ± 0.13, and 0.23 ± 0.10, respectively. Finally, in North America, the S protein underwent most mutations and has 23, 351, 323, 234, 167, 108, 42, 12, 6, 4, 1, 1, and 1 residues with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, and 13 mutations characterized by the mean disorder scores of 0.28 ± 0.17, 0.25 ± 0.13, 0.25 ± 0.13, 0.29 ± 0.17, 0.31 ± 0.19, 0.38 ± 0.16, 0.4 ± 0.17, 0.22 ± 0.16, 0.42 ± 0.22, 0.18, 0.25, and 0.45, respectively.
Fig. 4.
Correlation of frequency of amino acid substitutions at a given residue of the S protein and the corresponding intrinsic disorder score of these residues within the sequence of the original (Wuhan) version of the S protein. Individual plots reflect distributions within sequences of variants found at different locations: A. Africa; B. Asia; C. Europe, D. Oceania, E. South America; F. North America.
The most frequently mutated residue is Tyr248 (13 mutations) followed by Val213 and Thr108 with 12 and 10 mutations, respectively, all three from unique S protein variants found in North America. This analysis shows that here is a general trend, where residues with higher disorder levels are mutated more frequently.
3.5. Variability of unique S proteins
We quantitatively determined the variations in the unique S proteins on six continents. The variations were captured through the frequency distribution of amino acids present, Shannon entropy (amount of conservation of amino acids in a given sequence), and molecular weight and isoelectric point of a given protein sequence.
3.5.1. Variations in the frequency distribution of amino acids
The frequency of each amino acid was computed for each unique S protein available on six continents (Supplementary file-2). Maximum and minimum frequencies of amino acids present in the unique S proteins from different continents are presented in Table 4.
Table 4.
Maximum and minimum frequencies of amino acids present in the unique spike proteins from different continents.
Max and min of frequencies | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | Max | 80 | 44 | 89 | 62 | 41 | 63 | 49 | 84 | 19 | 79 | 109 | 62 | 15 | 78 | 60 | 101 | 98 | 13 | 56 | 98 |
Min | 73 | 40 | 85 | 58 | 38 | 59 | 45 | 78 | 14 | 73 | 102 | 57 | 13 | 72 | 55 | 94 | 90 | 11 | 49 | 93 | |
Asia | Max | 80 | 44 | 89 | 63 | 41 | 63 | 49 | 84 | 19 | 78 | 110 | 62 | 15 | 79 | 59 | 101 | 101 | 13 | 57 | 98 |
Min | 73 | 39 | 80 | 55 | 36 | 56 | 45 | 76 | 15 | 72 | 100 | 55 | 13 | 68 | 52 | 90 | 90 | 11 | 49 | 90 | |
Europe | Max | 80 | 43 | 89 | 63 | 41 | 63 | 49 | 84 | 19 | 79 | 110 | 62 | 15 | 79 | 59 | 101 | 98 | 13 | 57 | 99 |
Min | 75 | 38 | 84 | 59 | 39 | 59 | 46 | 79 | 16 | 74 | 102 | 58 | 13 | 74 | 54 | 96 | 90 | 11 | 50 | 93 | |
Oceania | Max | 81 | 43 | 90 | 62 | 41 | 63 | 49 | 84 | 18 | 78 | 109 | 62 | 15 | 79 | 59 | 100 | 98 | 12 | 56 | 99 |
Min | 72 | 37 | 81 | 58 | 36 | 57 | 44 | 74 | 15 | 71 | 97 | 56 | 13 | 71 | 52 | 92 | 88 | 10 | 43 | 89 | |
South America | Max | 82 | 44 | 91 | 63 | 42 | 64 | 49 | 85 | 20 | 79 | 111 | 64 | 15 | 80 | 60 | 102 | 99 | 13 | 58 | 100 |
Min | 60 | 32 | 63 | 46 | 32 | 39 | 34 | 63 | 11 | 55 | 82 | 43 | 9 | 55 | 43 | 76 | 77 | 8 | 36 | 82 | |
North America | Max | 80 | 43 | 89 | 62 | 41 | 63 | 48 | 83 | 18 | 78 | 109 | 62 | 14 | 79 | 58 | 101 | 98 | 12 | 57 | 98 |
Min | 75 | 38 | 82 | 57 | 37 | 59 | 45 | 79 | 16 | 73 | 105 | 57 | 13 | 73 | 57 | 92 | 93 | 11 | 50 | 92 |
All S protein sequences are leucine (L) and serine (S) rich. Tryptophan (W) and methionine (M) were presented with the least frequencies (Table 4). The widest variation in frequency distributions of the twenty amino acids in the unique S proteins was found in North America.
To obtain quantitative variations in the unique S proteins available on each continent, differences between maximum and minimum vectors (20 dimensions) were obtained (Table 5 ), and then Euclidean distances between the difference vectors were calculated (Table 6 ).
Table 5.
Matrix presenting the difference between maximum and minimum frequencies of amino acids present in the unique S proteins on each continent.
Difference matrix | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | 7 | 4 | 4 | 4 | 3 | 4 | 4 | 6 | 5 | 6 | 7 | 5 | 2 | 6 | 5 | 7 | 8 | 2 | 7 | 5 |
Asia | 7 | 5 | 9 | 8 | 5 | 7 | 4 | 8 | 4 | 6 | 10 | 7 | 2 | 11 | 7 | 11 | 11 | 2 | 8 | 8 |
Europe | 5 | 5 | 5 | 4 | 2 | 4 | 3 | 5 | 3 | 5 | 8 | 4 | 2 | 5 | 5 | 5 | 8 | 2 | 7 | 6 |
Oceania | 9 | 6 | 9 | 4 | 5 | 6 | 5 | 10 | 3 | 7 | 12 | 6 | 2 | 8 | 7 | 8 | 10 | 2 | 13 | 10 |
South America | 5 | 5 | 7 | 5 | 4 | 4 | 3 | 4 | 2 | 5 | 4 | 5 | 1 | 6 | 1 | 9 | 5 | 1 | 7 | 6 |
North America | 22 | 12 | 28 | 17 | 10 | 25 | 15 | 22 | 9 | 24 | 29 | 21 | 6 | 25 | 17 | 26 | 22 | 5 | 22 | 18 |
Table 6.
Pairwise Euclidean distances among the differences in vectors of each continent.
Distance matrix | Africa | Asia | Europe | Oceania | South America | North America |
---|---|---|---|---|---|---|
Africa | 0.00 | 11.70 | 4.69 | 12.77 | 8.49 | 66.80 |
Asia | 11.70 | 0.00 | 13.00 | 9.06 | 14.04 | 57.02 |
Europe | 4.69 | 13.00 | 0.00 | 13.30 | 8.49 | 68.38 |
Oceania | 12.77 | 9.06 | 13.30 | 0.00 | 16.03 | 56.84 |
South America | 8.49 | 14.04 | 8.49 | 16.03 | 0.00 | 69.02 |
North America | 66.80 | 57.02 | 68.38 | 56.84 | 69.02 | 0.00 |
Based on the distance matrix, a phylogenetic relationship was derived among the continents (Fig. 2).
Variations based on the frequency distribution of amino acids present in the S proteins make North America (which belongs to the rightmost branch of the tree) distant from the other five continents (Fig. 2). Variations among the unique S proteins from Asia and Oceania turned out to be similar, and they belong to the same level of leaves of the far left branch of the tree. Africa and Europe were found to be the closest in terms of variations based on the frequency distribution of amino acids over the unique S proteins from each continent. Variability of S proteins from South America has distant resemblance to that of Africa/Europe as estimated in the phylogeny. The frequencies of amino acid distribution in each unique S protein from each continent are presented in Fig. 6, Fig. 7 (see Appendix A). The widest variations of the frequency distribution of amino acids present in the S proteins were observed in North America as wide band was observed in Fig. 7. Individual frequency distributions of amino acids in Asia and Oceania seem very close as it was observed from the phylogeny (Fig. 5 ).
Fig. 5.
Phylogenetic relationship among the six continents based on the variability of unique spike proteins available in each continent.
3.5.2. Variability through Shannon entropy
In principle, for a random amino acid sequence, the Shannon entropy (SE) is one. Here Shannon entropy for each S protein sequence was computed using the formula stated in Section 2.2 (Supplementary file-2). It was found that the highest and lowest SEs of the S proteins from all continents were 0.9643 and 0.9594 respectively. That is, the length of the largest interval is 0.005 which is sufficiently small. Also note that the length of the smallest interval was 0.001 which occurred in the SEs of the S proteins from South America. Within this realm, the widest variation of SEs was noticed among the unique S proteins of North America. All other four intervals (considering lowest and highest) of SEs of all the unique S proteins from four continents Africa, Asia, Oceania and Europe were contained in the interval of North America and that of South America (Table 7).
Table 7.
Interval of Shannon entropy of the unique S proteins from six different continents.
SE: Continent | Interval of SEs |
---|---|
SE of S protein: Africa | (0.960825, 0.963239) |
SE of S protein: Asia | (0.961471, 0.963326) |
SE of S protein: Europe | (0.961539, 0.963254) |
SE of S protein: North America | (0.95934, 0.964314) |
SE of S protein: Oceania | (0.961525, 0.963042) |
SE of S protein: South America | (0.961589, 0.962895) |
Among all (201273) possible amino acids (20 in number) sequences of length 1273, nature had selected only a fraction to make the S proteins of SARS-CoV-2, and interestingly SEs of them were kept within a very small interval. From the SEs which were close to 1, the S protein sequences are expected to be pseudo-random. Variation of SEs for all unique S proteins from each continent is shown in Fig. 5, Fig. 6 (see Appendix A). Conservation of amino acids present over each S protein from each continent is different from one another, which is depicted by the zig-zag nature of the SE plots (Fig. 8, Fig. 9).
3.5.3. Variability through isoelectric point
For each S protein sequence from each continent the isoelectric point (pI) was computed (Supplementary file-3). Intervals (considering minimum and maximum) pIs of the unique S proteins from each continent are presented in Table 8.
Table 8.
Interval of isoelectric point of unique S proteins from six different continents.
pI: Continent | Interval of PIs |
---|---|
pI of S protein: Africa | (6.44, 7.09) |
pI of S protein: Asia | (6.21, 7.08) |
pI of S protein: Europe | (6.21, 6.99) |
pI of S protein: North America | (5.61, 7.79) |
pI of S protein: Oceania | (6.31, 7.09) |
pI of S protein: South America | (6.36, 6.99) |
It was noticed that pIs for all the unique S proteins from the six continents were distributed between 5.61 and 7.79. The largest interval of pIs was found for the unique S proteins from North America. Therefore, the widest varieties of the unique S proteins were found in North America.
The degree of non-linearity of the plots of pIs for each protein from each continent shows wide variations of the unique S proteins (Fig. 10, Fig. 11 (see Appendix A)).
4. Discussion and concluding remarks
Various mutations in the S proteins lead to the evolution of new variants of SARS-CoV-2 [43]. Naturally, our attention was captured to characterize the unique S protein variants, which were embedded in SARS-CoV-2 genomes infecting millions of people worldwide [44]. As of May 7, 2021, there were 127,760 COVID-19 patients infected with SARS-CoV-2 with 16,143 S protein variants, which are undoubtedly well-organized by means of amino acid composition and conservation as depicted by Shannon entropy and isoelectric point. Among the unique S proteins present on a continent, many of them are common on other continents as well (Table 2). On the other hand, there are still a handful of unique S protein variants residing on each continent. Considering the nature and biological implications of the new variants of SARS-CoV-2 caused by different mutations in the S proteins, the appearance of several unique S variants in SARS-CoV-2 is certainly a worrying trend [45]. There are still many unique S protein variants on all continents that may spread from person to person through close communities or by spontaneous mutations causing a condition that may become alarming.
Comparative analysis revealed the presence of some weak correlation between the per-residue mutability and intrinsic disorder predisposition of the S protein, with residues possessing higher disorder predisposition typically showing higher mutation rates as well. For example, the mean disorder score of 89 residues that were mutated 10-18 times is 0.35 ± 0.18 as compared to the mean disorder score of 0.26 ± 0.12 for 155 residues with 0 and 1 mutations. Curiously, the most disorder region of the S protein (residues 675-691), which includes the furin cleavage site (residues 680-686), was shown to be characterized by high mutation frequency, with Pro681 (which was mutated 17 times) being the second most frequently mutated residue of this protein.
We observed that the unique S proteins from North America have mutations in almost every amino acid residue position (1184 out of 1273), while unique S variants from the other continents only have mutations in 16 to 20% of residues. So, even if international travel is limited, S proteins from these five continents will likely acquire mutations at other residue positions where mutations have already been found in the specific variants from North America due to natural evolution. Based on the amino acid frequency distributions in the S protein variants from all continents, a phylogenetic relationship among the continents was presented. The phylogenetic relationship implies that the unique S proteins from North America were found to be significantly different from those of the other five continents. Therefore, the possibility of spreading the unique variants originating from North America to the other geographic locations by means of international travel is high, and numerous mutations have been detected already in the unique variants from North America. Of note, the infection/herd immunity status in South America may be summarized by the example of Manaus (the capital of Amazonas state in northern Brazil) where from June 2020 to October 2020 the SARS-CoV-2 prevalence among the population increased from ~60% to ~70%, a condition which may mirror acquisition of herd immunity [46]. By January 2021 Manaus had a huge resurgence in cases due to emergence of a new variant known as P.1, which was responsible for nearly 100% of the new cases [47]. Although the population may have then reached a high herd immunity threshold, there is still a risk of resurgence of new immunity-escape variants, which raises important questions. For example, 1. Is post-infection herd immunity not enough for protection and should it be combined with vaccinations? 2. Will the crucial viral variants (mutations) be listed by the WHO and recommended to be included in “next generation vaccines”? [48], [49]. In addition, we cannot yet exclude the possibility of detrimental mutations in the viral Spike-RBD emerging in India and the USA [48].
Let us have a brief glance at the potential consequences of the mutations in S-protein from the viewpoint of protective immunity towards SARS-CoV-2. It is known that the protective immunity towards infection and disease depends on the presence of high avidity antibodies. This is because high avidity of neutralizing antibodies, which is defined as the strength of antibody-target epitope interaction, plays an important role in antibody-mediated protection against viral infections [12]. High avidity (functional affinity) is established during affinity maturation, as the avidity of IgG is low during acute infection and reaches high values several weeks or months later [50], [51]. Importantly, incomplete avidity maturation of IgG often leads to the failure of the protection against viral infections and/or resultant diseases, as was shown for varicella zoster virus (VZV), cytomegalovirus (CMV), measles virus, Dengue virus, respiratory syncytial virus (RSV), and Simian human immunodeficiency virus (SHIDV) [52], [53], [54], [55], [56], [57], [58], [59]. Since the interaction between the SARS-CoV-2 Spike-RBD protein and ACE2 on host cells is characterized by high affinity, it is expected that the protective anti-SARS-CoV-2 antibodies should possess high affinity/avidity to be able to block this high affinity RBD-ACE2 interaction [11]. Recently, it was shown that the serological response to SARS-CoV-2 is frequently characterized by the incomplete maturation of avidity [60], [50]. It was also proposed that such incomplete avidity maturation represents an essential strategy of coronaviruses determining high probability of repeated waves of reinfections due to the short-lasting protective immunity [61], [62], [12]. Furthermore, an unexpected scenario was recently uncovered, where the natural SARS-CoV-2 infection does not lead to the establishment of a high avidity immune response and therefore does not have a good chance for the development of complete protection against SARS-CoV-2 and for establishment of herd immunity [63]. On the contrary, complete avidity maturation was achieved with two rounds of vaccination, whereas the quality of the immune response after natural infection was similar to that generated by one vaccination step and did not reach the quality of complete vaccination with two steps. Therefore, the scenario occurring in Manaus can be considered on the basis of these new findings. In fact, it is obvious now that despite the high COVID-19 prevalence reached in this city, no herd immunity could be expected retrospectively, as natural infections are insufficient for the establishment of a high avidity immune response and related development of complete protection against SARS-CoV-2 [63]. Therefore, it seems likely that the herd immunity can only be reached through at least two vaccination steps [63]. It is also expected that herd immunity might be partially or completely overrun by novel SARS-CoV-2 variants showing higher affinity of RBD-ACE2 interaction than that of the original SARS-CoV-2 strain.
Hence in the near future, we can expect to experience more new SARS-CoV-2 variants, which might cause additional. waves of COVID-19. Therefore, massive vaccination is necessary to combat COVID-19, and of course, existing vaccines must be reviewed, and if needed further re-engineered based on newly emerging S protein variants.
Altogether, data presented in this study indicate that although unique variants of the SARS-CoV-2 S protein are rather abundant, they are unevenly distributed among continents, with Africa possessing the highest percentage of unique S variants, and with the unique S proteins found in North America being noticeably different from the variants on other continents. It is likely that these unique variants can spread to continents where they have not been detected before. Furthermore, this inhomogeneity raises an important question on why the currently observed differences in the number of unique variants of the S protein (reflecting frequency of its mutagenesis) is so great. It cannot be easily explained by the differences between the continents in the number of COVID-19 patients (reported SARS-CoV-2-positive cases). In fact, according to Worldometer, as of September 10, 2021, there were 8,087,058, 72,407,564, 56,618,705, 181,742, 50,106,216, and 37,213,429 recorded COVID-19 cases in Africa, Asia, Europe, Oceania, North America, and South America, respectively. Obviously, these infection levels do not correlate with the corresponding numbers of unique S protein variants (see Table 1, Table 2). There is also no strong correlation between the reported S protein variability and levels of genomic sequencing on different continents (which serves now as a real-time molecular/genomics SARS-CoV-2 surveillance). In fact, it was reported that as of 5 July 2021, 25,284 whole-genome sequences from Africa (0.32% of all reported SARS-CoV-2-positive cases from that continent), 146,562 from Asia (0.30% coverage), 1,292,415 from Europe (2.35% coverage), 692,704 from North America (1.75% coverage), 37,913 from South America (0.12% coverage) and 20,613 from Oceania (25% coverage) had been generated [64]. Therefore, although these numbers reflecting levels of the continent-wise coverage show a heavy bias towards the regions and countries with more specialized genomics facilities, programs, and research projects, there is no strong correlation between the coverage and established S-protein variability [65], [66].
An intriguing possible mechanism of the observed differences in the rates of virus evolution is the presence of a conceivable variability of the ACE2 gene on different continents that might have an impact on the variability of the viral protein as well. In line with this idea, it was recently reported that the expression levels of ACE2 can be elevated up to 50% due to the differences in the frequency of the rs2285666 polymorphism (the TT-plus strand or AA-minus strand alternate allele) among Europeans and Asians, with this difference playing a significant role in the SARS-CoV-2 susceptibility [67], [68]. Similarly, based on comprehensive analyses of the allelic frequencies of polymorphisms in the ACE2, TMPRSS2, TMPRSS11A, cathepsin L (CTSL), and elastase (ELANE) genes in populations from the American, African, European, and Asian continents it was concluded that the non-coding sequences of these proteins related to the SARS-CoV-2 cell entry contain numerous polymorphisms with possible functional consequences [69].
The following are the supplementary data related to this article.
A complete list of unique S protein accessions and their names (continent-wise).
The frequency of each amino acid was computed for each unique S protein available in six continents and Shannon entropy for each S protein sequence computed using the formula stated in section 2.2.
Isoelectric point (pI) computed for each S protein sequence from each continent.
Phylogenetic tree of the unique amino acid sequences from SARS-CoV-2 S-protein showing polyphyletic groups and sequences from different countries grouping together in the same clade
CRediT authorship contribution statement
Sk. Sarif Hassan: Study design, Formal analysis, Investigation, Methodology, Visualization, Writing - Original draft, Writing - Review & editing, Administration;
Kenneth Lundstrom: Formal analysis, Investigation, Writing - Original draft, Writing - Review & editing;
Debmalya Barh: Formal analysis, Investigation, Methodology, Visualization, Writing - Review & editing;
Raner Jośe Santana Silva: Formal analysis, Investigation, Methodology, Visualization, Writing - Review & editing;
Bruno Silva Andrade: Formal analysis, Investigation, Methodology, Visualization, Writing - Review & editing;
Vasco Azevedo: Formal analysis, Investigation, Methodology, Visualization, Writing - Review & editing;
Pabitra Pal Choudhury: Formal analysis, Writing - Review & editing;
Giorgio Palu: Formal analysis, Writing - Review & editing;
Bruce D. Uhal: Formal analysis, Writing - Review & editing;
Ramesh Kandimalla: Formal analysis, Writing - Review & editing;
Murat Seyran: Formal analysis, Writing - Review & editing;
Amos Lal: Formal analysis, Writing - Review & editing;
Samendra P. Sherchan: Formal analysis, Writing - Review & editing;
Gajendra Kumar Azad: Formal analysis, Writing - Review & editing;
Alaa A. A. Aljabali: Formal analysis, Writing - Review & editing;
Adam M. Brufsky: Formal analysis, Writing - Review & editing;
Angel Serrano-Aroca: Formal analysis, Writing - Review & editing;
Parise Adadi: Formal analysis, Writing - Review & editing;
Tarek Mohamed Abd El-Aziz: Formal analysis, Writing - Review & editing;
Elrashdy M. Redwan: Formal analysis, Writing - Review & editing;
Kazuo Takayama: Formal analysis, Writing - Review & editing;
Nima Rezaei: Formal analysis, Writing - Review & editing;
Murtaza Tambuwala: Formal analysis, Writing - Review & editing;
Vladimir N. Uversky: Study design, Formal analysis, Methodology, Investigation, Visualization, Writing - Original draft, Writing - Review & editing.
All authors read and approved the final version of the manuscript.
Declaration of competing interest
Authors have no conflict of interest to declare.
Appendix A
Table 9.
List of pairs of identical spike proteins of SARS-CoV-2 originated from six continents.
Spike: Asia-Europe | Spike: Asia-Africa | Spike: Asia-Oceania | Spike: Asia-South America | Spike: Asia-North America |
---|---|---|---|---|
(A14, U2) | (A14, AF2) | (A15, O5) | (A31, SA1) | (A1, NA7) |
(A15, U3) | (A15, AF3) | (A77, O43) | (A67, SA4) | (A8, NA231) |
(A30, U8) | (A26, AF19) | (A95, O58) | (A148, SA13) | (A12, NA902) |
(A31, U9) | (A71, AF48) | (A109, O83) | (A180, SA19) | (A14, NA928) |
(A33, U11) | (A93, AF58) | (A128, O201) | (A191, SA22) | (A15, NA992) |
(A36, U17) | (A128, AF72) | (A138, O370) | (A200, SA25) | (A19, NA1131) |
(A43, U18) | (A138, AF76) | (A142, O373) | (A207, SA27) | (A23, NA1445) |
(A69, U23) | (A142, AF79) | (A148, O377) | (A211, SA30) | (A28, NA2065) |
(A77, U26) | (A148, AF82) | (A166, O387) | (A213, SA32) | (A30, NA3228) |
(A93, U28) | (A161, AF88) | (A206, O388) | (A219, SA33) | (A31, NA3313) |
(A95, U30) | (A164, AF92) | (A213, O390) | (A234, SA35) | (A32, NA3438) |
(A105, U34) | (A166, AF101) | (A253, O398) | (A280, SA41) | (A33, NA3477) |
(A128, U52) | (A191, AF115) | (A277, O400) | (A284, SA42) | (A34, NA3658) |
(A134, U54) | (A206, AF118) | (A284, O402) | (A335, SA61) | (A43, NA3752) |
(A135, U57) | (A213, AF120) | (A305, O504) | (A340, SA63) | (A44, NA3768) |
(A148, U63) | (A275, AF130) | (A359, O1076) | (A373, SA68) | (A58, NA3911) |
(A213, U80) | (A276, AF131) | (A404, O1104) | (A404, SA71) | (A69, NA4028) |
(A234, U84) | (A277, AF134) | (A71, NA4051) | ||
(A239, U88) | (A279, AF137) | (A76, NA4169) | ||
(A265, U94) | (A282, AF138) | (A77, NA4243) | ||
(A284, U99) | (A292, AF147) | (A78, NA4270) | ||
(A286, U100) | (A379, AF229) | (A85, NA4296) | ||
(A333, U121) | (A394, AF247) | (A89, NA4375) | ||
(A340, U124) | (A404, AF263) | (A90, NA4394) | ||
(A379, U151) | (A430, AF278) | (A91, NA4436) | ||
(A404, U181) | (A93, NA4448) | |||
(A430, U187) | (A95, NA4508) | |||
Spike: Asia-North America | Spike: Asia-North America | Spike: Asia-North America | Spike: Asia-North America | Spike: Asia-North America |
(A96, NA4537) | (A166, NA5819) | (A214, NA6445) | (A267, NA6903) | (A345, NA9597) |
(A97, NA4541) | (A170, NA5927) | (A215, NA6465) | (A273, NA6916) | (A348, NA9612) |
(A100, NA4559) | (A171, NA5977) | (A216, NA6492) | (A274, NA6936) | (A351, NA9663) |
(A101, NA4620) | (A173, NA5992) | (A217, NA6499) | (A275, NA6944) | (A354, NA9674) |
(A102, NA4637) | (A174, NA6060) | (A218, NA6510) | (A276, NA6949) | (A356, NA9724) |
(A103, NA4658) | (A175, NA6067) | (A219, NA6515) | (A277, NA6962) | (A357, NA9763) |
(A105, NA4715) | (A177, NA6071) | (A221, NA6527) | (A278, NA6969) | (A358, NA9776) |
(A109, NA4861) | (A178, NA6080) | (A222, NA6540) | (A279, NA7000) | (A359, NA9792) |
(A111, NA4897) | (A180, NA6101) | (A223, NA6550) | (A280, NA7015) | (A360, NA9834) |
(A114, NA5001) | (A181, NA6142) | (A224, NA6553) | (A282, NA7025) | (A367, NA10276) |
(A115, NA5022) | (A182, NA6148) | (A230, NA6602) | (A283, NA7056) | (A373, NA10342) |
(A121, NA5105) | (A183, NA6155) | (A233, NA6616) | (A284, NA7090) | (A375, NA10442) |
(A122, NA5137) | (A191, NA6185) | (A234, NA6622) | (A286, NA7129) | (A378, NA11135) |
(A126, NA5151) | (A193, NA6193) | (A235, NA6630) | (A291, NA7198) | (A379, NA11225) |
(A127, NA5182) | (A195, NA6244) | (A238, NA6659) | (A292, NA7227) | (A380, NA11305) |
(A128, NA5194) | (A196, NA6258) | (A239, NA6661) | (A293, NA7249) | (A381, NA11560) |
(A133, NA5471) | (A198, NA6276) | (A244, NA6683) | (A304, NA7576) | (A383, NA11874) |
(A134, NA5485) | (A199, NA6293) | (A245, NA6687) | (A322, NA8509) | (A386, NA13280) |
(A135, NA5516) | (A200, NA6299) | (A247, NA6707) | (A323, NA8519) | (A387, NA13307) |
(A138, NA5538) | (A201, NA6305) | (A249, NA6713) | (A324, NA8565) | (A388, NA13362) |
(A140, NA5574) | (A205, NA6324) | (A253, NA6751) | (A325, NA8570) | (A391, NA13404) |
(A148, NA5595) | (A206, NA6334) | (A254, NA6756) | (A333, NA9283) | (A394, NA13438) |
(A158, NA5644) | (A207, NA6373) | (A255, NA6780) | (A335, NA9324) | (A395, NA13444) |
(A159, NA5645) | (A210, NA6388) | (A257, NA6794) | (A341, NA9425) | (A396, NA13465) |
(A161, NA5666) | (A211, NA6406) | (A258, NA6810) | (A342, NA9455) | (A399, NA13554) |
(A163, NA5722) | (A212, NA6424) | (A264, NA6857) | (A343, NA9568) | (A401, NA13614) |
(A164, NA5744) | (A213, NA6429) | (A265, NA6862) | (A344, NA9592) | (A404, NA13635) |
(A405, NA13668) | ||||
(A408, NA13704) | ||||
(A413, NA13841) | ||||
(A418, NA13913) | ||||
(A419, NA13948) | ||||
(A430, NA14000) | ||||
(A431, NA14026) |
Table 10.
List of pairs of identical spike proteins of SARS-CoV-2 originated from different continents.
Spike: Africa-Europe | Spike: Africa-North America | Spike: Africa-North America | Spike: Africa-Oceania | Spike: Africa-South America | Spike: Europe-North America |
---|---|---|---|---|---|
(AF2, U2) | (AF2, NA928) | (AF121, NA6566) | (AF1, O3) | (AF82, SA13) | (U2, NA928) |
(AF3, U3) | (AF3, NA992) | (AF123, NA6628) | (AF3, O5) | (AF115, SA22) | (U3, NA992) |
(AF31, U10) | (AF8, NA1298) | (AF125, NA6816) | (AF71, O148) | (AF117, SA26) | (U4, NA1221) |
(AF58, U28) | (AF9, NA1348) | (AF128, NA6848) | (AF72, O201) | (AF120, SA32) | (U7, NA2680) |
(AF69, U45) | (AF31, NA3387) | (AF130, NA6944) | (AF76, O370) | (AF263, SA71) | (U8, NA3228) |
(AF72, U52) | (AF34, NA3583) | (AF131, NA6949) | (AF79, O373) | (U9, NA3313) | |
(AF82, U63) | (AF38, NA3797) | (AF133, NA6953) | (AF82, O377) | (U10, NA3387) | |
(AF120, U80) | (AF46, NA3986) | (AF134, NA6962) | (AF101, O387) | (U11, NA3477) | |
(AF123, U85) | (AF47, NA3988) | (AF137, NA7000) | (AF118, O388) | (U18, NA3752) | |
(AF145, U103) | (AF48, NA4051) | (AF138, NA7025) | (AF120, O390) | (U22, NA3895) | |
(AF195, U119) | (AF50, NA4061) | (AF145, NA7199) | (AF134, O400) | (U23, NA4028) | |
(AF229, U151) | (AF51, NA4117) | (AF146, NA7224) | (AF179, O751) | (U26, NA4243) | |
(AF230, U154) | (AF58, NA4448) | (AF147, NA7227) | (AF263, O1104) | (U28, NA4448) | |
(AF263, U181) | (AF64, NA4832) | (AF149, NA7286) | Spike: Oceania-South America | (U30, NA4508) | |
(AF278, U187) | (AF69, NA5149) | (AF151, NA7299) |
(O377, SA13) |
(U34, NA4715) | |
(AF71, NA5188) | (AF152, NA7300) | (O389, SA28) | (U36, NA4780) | ||
(AF72, NA5194) | (AF154, NA7375) | (O390, SA32) | (U38, NA4837) | ||
(AF73, NA5202) | (AF156, NA7453) | (O402, SA42) | (U41, NA4989) | ||
(AF76, NA5538) | (AF165, NA7553) | (O1104, SA71) | (U42, NA5083) | ||
(AF82, NA5595) | (AF168, NA7644) | (U45, NA5149) | |||
(AF83, NA5606) | (AF179, NA8514) | (U47, NA5167) | |||
(AF88, NA5666) | (AF195, NA9264) | (U52, NA5194) | |||
(AF90, NA5693) | (AF196, NA9265) | (U53, NA5282) | |||
(AF92, NA5744) | (AF223, NA10257) | (U54, NA5485) | |||
(AF99, NA5818) | (AF227, NA10943) | (U55, NA5490) | |||
(AF101, NA5819) | (AF229, NA11225) | (U57, NA5516) | |||
(AF103, NA5829) | (AF230, NA11456) | (U63, NA5595) | |||
(AF104, NA5830) | (AF231, NA11576) | (U66, NA5627) | |||
(AF105, NA5837) | (AF247, NA13438) | (U72, NA6096) | |||
(AF108, NA5874) | (AF248, NA13478) | (U76, NA6240) | |||
(AF114, NA6178) | (AF254, NA13578) | (U78, NA6399) | |||
(AF115, NA6185) | (AF263, NA13635) | (U79, NA6421) | |||
(AF118, NA6334) | (AF268, NA13798) | (U80, NA6429) | |||
(AF119, NA6390) | (AF271, NA13870) | (U82, NA6450) | |||
(AF120, NA6429) | (AF278, NA14000) | (U84, NA6622) | |||
(AF283, NA14015) | (U85, NA6628) | ||||
(U88, NA6661) | |||||
(U90, NA6704) |
Table 11.
List of pairs of identical spike proteins of SARS-CoV-2 originated from different continents.
Spike: Europe-North America | Spike: Europe-Oceania | Spike: North America-Oceania | Spike: North America-Oceania | Spike: South America-North America |
---|---|---|---|---|
(U92, NA6723) | (U3, O5) | (NA992, O5) | (NA6751, O398) | (NA3313, SA1) |
(U93, NA6775) | (U26, O43) | (NA3873, O28) | (NA6962, O400) | (NA4550, SA5) |
(U94, NA6862) | (U30, O58) | (NA4024, O36) | (NA7060, O401) | (NA4720, SA7) |
(U98, NA7057) | (U52, O201) | (NA4243, O43) | (NA7090, O402) | (NA4989, SA11) |
(U99, NA7090) | (U63, O377) | (NA4508, O58) | (NA7230, O404) | (NA5595, SA13) |
(U100, NA7129) | (U80, O390) | (NA4756, O65) | (NA7355, O415) | (NA5687, SA18) |
(U103, NA7199) | (U99, O402) | (NA4861, O83) | (NA7402, O419) | (NA6101, SA19) |
(U104, NA7312) | (U118, O1032) | (NA5011, O105) | (NA7510, O422) | (NA6146, SA20) |
(U106, NA7431) | (U181, O1104) | (NA5041, O114) | (NA7811, O625) | (NA6161, SA21) |
(U107, NA7557) | (NA5188, O148) | (NA7832, O631) | (NA6185, SA22) | |
(U111, NA7679) | Spike: Europe-South America | (NA5194, O201) | (NA7845, O633) | (NA6299, SA25) |
(U112, NA7884) |
(U9, SA1) |
(NA5200, O225) | (NA7901, O645) | (NA6373, SA27) |
(U113, NA7914) | (U41, SA11) | (NA5205, O238) | (NA8514, O751) | (NA6395, SA28) |
(U114, NA9075) | (U63, SA13) | (NA5372, O368) | (NA8646, O770) | (NA6396, SA29) |
(U116, NA9180) | (U80, SA32) | (NA5538, O370) | (NA8703, O798) | (NA6406, SA30) |
(U117, NA9189) | (U84, SA35) | (NA5579, O374) | (NA8787, O850) | (NA6418, SA31) |
(U119, NA9264) | (U99, SA42) | (NA5595, O377) | (NA8817, O886) | (NA6429, SA32) |
(U121, NA9283) | (U124, SA63) | (NA5819, O387) | (NA8824, O889) | (NA6515, SA33) |
(U122, NA9284) | (U181, SA71) | (NA6334, O388) | (NA9091, O1017) | (NA6622, SA35) |
(U123, NA9330) | (NA6395, O389) | (NA9333, O1035) | (NA6696, SA38) | |
(U126, NA9458) | (NA6429, O390) | (NA9350, O1037) | (NA7015, SA41) | |
(U131, NA10312) | (NA6577, O391) | (NA9639, O1059) | (NA7090, SA42) | |
(U137, NA10457) | (NA6578, O392) | (NA9792, O1076) | (NA7430, SA43) | |
(U141, NA10669) | (NA6620, O395) | (NA9891, O1079) | (NA7477, SA44) | |
(U144, NA10811) | (NA13635, O1104) | (NA7521, SA45) | ||
(U146, NA10987) | (NA7892, SA56) | |||
(U148, NA11013) | (NA9324, SA61) | |||
(U151, NA11225) | (NA9910, SA66) | |||
(U153, NA11367) | (NA10342, SA68) | |||
(U154, NA11456) | (NA13390, SA70) | |||
(U155, NA11466) | (NA13635, SA71) | |||
(U158, NA13110) | ||||
(U160, NA13253) | ||||
(U175, NA13414) | ||||
(U177, NA13551) | ||||
(U179, NA13626) | ||||
(U181, NA13635) | ||||
(U187, NA14000) |
Table 12.
List of spike proteins from Asia, which were found to be identical with spike proteins from other five continents.
Spike proteins (Asia) which were found to be identical with spike proteins from other five continents | ||||||||
---|---|---|---|---|---|---|---|---|
A1 | A71 | A115 | A171 | A207 | A239 | A280 | A344 | A388 |
A8 | A76 | A121 | A173 | A210 | A244 | A282 | A345 | A391 |
A12 | A77 | A122 | A174 | A211 | A245 | A283 | A348 | A394 |
A14 | A78 | A126 | A175 | A212 | A247 | A284 | A351 | A395 |
A15 | A85 | A127 | A177 | A213 | A249 | A286 | A354 | A396 |
A19 | A89 | A128 | A178 | A214 | A253 | A291 | A356 | A399 |
A23 | A90 | A133 | A180 | A215 | A254 | A292 | A357 | A401 |
A26 | A91 | A134 | A181 | A216 | A255 | A293 | A358 | A404 |
A28 | A93 | A135 | A182 | A217 | A257 | A304 | A359 | A405 |
A30 | A95 | A138 | A183 | A218 | A258 | A305 | A360 | A408 |
A31 | A96 | A140 | A191 | A219 | A264 | A322 | A367 | A413 |
A32 | A97 | A142 | A193 | A221 | A265 | A323 | A373 | A418 |
A33 | A100 | A148 | A195 | A222 | A267 | A324 | A375 | A419 |
A34 | A101 | A158 | A196 | A223 | A273 | A325 | A378 | A430 |
A36 | A102 | A159 | A198 | A224 | A274 | A333 | A379 | A431 |
A43 | A103 | A161 | A199 | A230 | A275 | A335 | A380 | |
A44 | A105 | A163 | A200 | A233 | A276 | A340 | A381 | |
A58 | A109 | A164 | A201 | A234 | A277 | A341 | A383 | |
A67 | A111 | A166 | A205 | A235 | A278 | A342 | A386 | |
A69 | A114 | A170 | A206 | A238 | A279 | A343 | A387 |
Table 13.
List of spike proteins from Africa, which were found to be identical with spike proteins from other five continents.
Spike proteins (Africa) which were found to be identical with spike proteins from other five continents | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
AF1 | AF34 | AF58 | AF79 | AF101 | AF117 | AF128 | AF145 | AF156 | AF227 | AF263 |
AF2 | AF38 | AF64 | AF82 | AF103 | AF118 | AF130 | AF146 | AF165 | AF229 | AF268 |
AF3 | AF46 | AF69 | AF83 | AF104 | AF119 | AF131 | AF147 | AF168 | AF230 | AF271 |
AF8 | AF47 | AF71 | AF88 | AF105 | AF120 | AF133 | AF149 | AF179 | AF231 | AF278 |
AF9 | AF48 | AF72 | AF90 | AF108 | AF121 | AF134 | AF151 | AF195 | AF247 | AF283 |
AF19 | AF50 | AF73 | AF92 | AF114 | AF123 | AF137 | AF152 | AF196 | AF248 | |
AF31 | AF51 | AF76 | AF99 | AF115 | AF125 | AF138 | AF154 | AF223 | AF254 |
Table 14.
List of spike proteins from Europe, which were found to be identical with spike proteins from other five continents.
Spike proteins (Europe) which were found to be identical with spike proteins from other five continents | ||||||||
---|---|---|---|---|---|---|---|---|
U2 | U18 | U41 | U63 | U85 | U103 | U117 | U137 | U158 |
U3 | U22 | U42 | U66 | U88 | U104 | U118 | U141 | U160 |
U4 | U23 | U45 | U72 | U90 | U106 | U119 | U144 | U175 |
U7 | U26 | U47 | U76 | U92 | U107 | U121 | U146 | U177 |
U8 | U28 | U52 | U78 | U93 | U111 | U122 | U148 | U179 |
U9 | U30 | U53 | U79 | U94 | U112 | U123 | U151 | U181 |
U10 | U34 | U54 | U80 | U98 | U113 | U124 | U153 | U187 |
U11 | U36 | U55 | U82 | U99 | U114 | U126 | U154 | |
U17 | U38 | U57 | U84 | U100 | U116 | U131 | U155 |
Table 15.
List of spike proteins from North America, which were found to be identical with spike proteins from other five continents.
Spike proteins (North America) which were found to be identical with spike proteins from other five continents | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
NA7 | NA3911 | NA4837 | NA5595 | NA6161 | NA6510 | NA6810 | NA7300 | NA8703 | NA9792 | NA13390 |
NA231 | NA3986 | NA4861 | NA5606 | NA6178 | NA6515 | NA6816 | NA7312 | NA8787 | NA9834 | NA13404 |
NA377 | NA3988 | NA4897 | NA5627 | NA6185 | NA6527 | NA6848 | NA7355 | NA8817 | NA9891 | NA13414 |
NA389 | NA4024 | NA4989 | NA5644 | NA6193 | NA6540 | NA6857 | NA7375 | NA8824 | NA9910 | NA13438 |
NA390 | NA4028 | NA5001 | NA5645 | NA6240 | NA6550 | NA6862 | NA7402 | NA9075 | NA10257 | NA13444 |
NA402 | NA4051 | NA5011 | NA5666 | NA6244 | NA6553 | NA6903 | NA7430 | NA9091 | NA10276 | NA13465 |
NA902 | NA4061 | NA5022 | NA5687 | NA6258 | NA6566 | NA6916 | NA7431 | NA9180 | NA10312 | NA13478 |
NA928 | NA4117 | NA5041 | NA5693 | NA6276 | NA6577 | NA6936 | NA7453 | NA9189 | NA10342 | NA13551 |
NA992 | NA4169 | NA5083 | NA5722 | NA6293 | NA6578 | NA6944 | NA7477 | NA9264 | NA10442 | NA13554 |
NA1104 | NA4243 | NA5105 | NA5744 | NA6299 | NA6602 | NA6949 | NA7510 | NA9265 | NA10457 | NA13578 |
NA1131 | NA4270 | NA5137 | NA5818 | NA6305 | NA6616 | NA6953 | NA7521 | NA9283 | NA10669 | NA13614 |
NA1221 | NA4296 | NA5149 | NA5819 | NA6324 | NA6620 | NA6962 | NA7553 | NA9284 | NA10811 | NA13626 |
NA1298 | NA4375 | NA5151 | NA5829 | NA6334 | NA6622 | NA6969 | NA7557 | NA9324 | NA10943 | NA13635 |
NA1348 | NA4394 | NA5167 | NA5830 | NA6373 | NA6628 | NA7000 | NA7576 | NA9330 | NA10987 | NA13668 |
NA1445 | NA4436 | NA5182 | NA5837 | NA6388 | NA6630 | NA7015 | NA7644 | NA9333 | NA11013 | NA13704 |
NA2065 | NA4448 | NA5188 | NA5874 | NA6390 | NA6659 | NA7025 | NA7679 | NA9350 | NA11135 | NA13798 |
NA2680 | NA4508 | NA5194 | NA5927 | NA6395 | NA6661 | NA7056 | NA7811 | NA9425 | NA11225 | NA13841 |
NA3228 | NA4537 | NA5200 | NA5977 | NA6396 | NA6683 | NA7057 | NA7832 | NA9455 | NA11305 | NA13870 |
NA3313 | NA4541 | NA5202 | NA5992 | NA6399 | NA6687 | NA7060 | NA7845 | NA9458 | NA11367 | NA13913 |
NA3387 | NA4550 | NA5205 | NA6060 | NA6406 | NA6696 | NA7090 | NA7884 | NA9568 | NA11456 | NA13948 |
NA3438 | NA4559 | NA5282 | NA6067 | NA6418 | NA6704 | NA7129 | NA7892 | NA9592 | NA11466 | NA14000 |
NA3477 | NA4620 | NA5372 | NA6071 | NA6421 | NA6707 | NA7198 | NA7901 | NA9597 | NA11560 | NA14015 |
NA3583 | NA4637 | NA5471 | NA6080 | NA6424 | NA6713 | NA7199 | NA7914 | NA9612 | NA11576 | NA14026 |
NA3658 | NA4658 | NA5485 | NA6096 | NA6429 | NA6723 | NA7224 | NA8509 | NA9639 | NA11874 | |
NA3752 | NA4715 | NA5490 | NA6101 | NA6445 | NA6751 | NA7227 | NA8514 | NA9663 | NA13110 | |
NA3768 | NA4720 | NA5516 | NA6142 | NA6450 | NA6756 | NA7230 | NA8519 | NA9674 | NA13253 | |
NA3797 | NA4756 | NA5538 | NA6146 | NA6465 | NA6775 | NA7249 | NA8565 | NA9724 | NA13280 | |
NA3873 | NA4780 | NA5574 | NA6148 | NA6492 | NA6780 | NA7286 | NA8570 | NA9763 | NA13307 | |
NA3895 | NA4832 | NA5579 | NA6155 | NA6499 | NA6794 | NA7299 | NA8646 | NA9776 | NA13362 |
Table 16.
List of spike proteins from Oceania, which were found to be identical with spike proteins from other five continents.
Spike proteins (Oceania) which were found to be identical with spike proteins from other five continents | ||||||
---|---|---|---|---|---|---|
O3 | O105 | O373 | O392 | O419 | O770 | O1037 |
O5 | O114 | O374 | O395 | O422 | O798 | O1059 |
O28 | O148 | O377 | O398 | O504 | O850 | O1076 |
O36 | O201 | O387 | O400 | O625 | O886 | O1079 |
O43 | O225 | O388 | O401 | O631 | O889 | O1104 |
O58 | O238 | O389 | O402 | O633 | O1017 | |
O65 | O368 | O390 | O404 | O645 | O1032 | |
O83 | O370 | O391 | O415 | O751 | O1035 |
Table 17.
List of spike proteins from South America, which were found to be identical with spike proteins from other five continents.
Spike proteins (Oceania) which were found to be identical with spike proteins from other five continents | ||||||
---|---|---|---|---|---|---|
SA1 | SA13 | SA22 | SA29 | SA35 | SA44 | SA66 |
SA4 | SA18 | SA25 | SA30 | SA38 | SA45 | SA68 |
SA5 | SA19 | SA26 | SA31 | SA41 | SA56 | SA70 |
SA7 | SA20 | SA27 | SA32 | SA42 | SA61 | SA71 |
SA11 | SA21 | SA28 | SA33 | SA43 | SA63 |
Fig. 6.
Frequencies of amino acids present in the unique S sequences.
Fig. 7.
Frequencies of amino acids present in the unique S sequences.
Fig. 8.
SE of unique S proteins from different continents.
Fig. 9.
SE of unique S proteins from different continents.
Fig. 10.
Isoelectric point of unique S proteins from different continents.
Fig. 11.
Isoelectric point of unique S proteins from different continents.
References
- 1.Lokman S.M., Rasheduzzaman M., Salauddin A., Barua R., Tanzina A.Y., Rumi M.H., Hossain M.I., A. Z. Sid- diki A.Mannan., M. M. Hasan Exploring the genomic and proteomic variations of SARS-CoV-2 spike glycoprotein: a computational biology approach. Infect. Genet. Evol. 2020;84 doi: 10.1016/j.meegid.2020.104389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Á. Serrano-Aroca K. Takayama A. Tun˜´on-Molina M. Seyran S. S. Hassan P. P. Choudhury V. N. Uversky K. Lund- strom P. Adadi G. Palù et al., Carbon-based nanomaterials: promising antiviral agents to combat COVID-19 in the microbial resistant era, ACS Nano doi:10.1021/acsnano.1c00629. PMID: 33826850. [DOI] [PubMed]
- 3.Hassan S., Ghosh S., Attrish D., Choudhury P.P., Aljabali A.A., Uhal B.D., Lundstrom K., Rezaei N., Uversky V.N., Seyran M., et al. Possible transmission flow of SARS-CoV-2 based on ace2 features. Molecules. 2020;25(24):5906. doi: 10.3390/molecules25245906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Mart́ı M., Tun˜´on-Molina A., Aachmann F.L., Muramoto Y., Noda T., Takayama K., Serrano-Aroca Á. Protective face mask filter capable of inactivating SARS-CoV-2, and methicillin-resistant staphylococcus aureus and staphylococcus epidermidis. Polymers. 2021;13(2):207. doi: 10.3390/polym13020207. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Hassan S.S., Attrish D., Ghosh S., Choudhury P.P., Uversky V.N., Aljabali A.A., Lundstrom K., Uhal B.D., Rezaei N., Seyran M., et al. Notable sequence homology of the orf10 protein introspects the architecture of SARS-CoV-2. Int. J. Biol. Macromol. 2021;181:801–809. doi: 10.1016/j.ijbiomac.2021.03.199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Hassan S.S., Aljabali A.A., Panda P.K., Ghosh S., Attrish D., Choudhury P.P., Seyran M., Pizzol D., Adadi P., El-Aziz T.M.Abd, et al. A unique view of SARS-CoV-2 through the lens of ORF8 protein. Comput. Biol. Med. 2021;133:1–14. doi: 10.1016/j.compbiomed.2021.104380. 104380. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Zhang L., Jackson C.B., Mou H., Ojha A., Peng H., Quinlan B.D., Rangarajan E.S., Pan A., Vanderheiden A., Suthar M.S., et al. SARS-CoV-2 spike-protein d614g mutation increases virion spike density and infectivity. Nat. Commun. 2020;11(1):1–9. doi: 10.1038/s41467-020-19808-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Guruprasad L. Human SARS-CoV-2 spike protein mutations. Proteins Struct. Funct. Bioinformatics. 2021;89(5):569–576. doi: 10.1002/prot.26042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Henderson R., Edwards R.J., Mansouri K., Janowska K., Stalls V., Gobeil S.M., Kopp M., Li D., Parks R., Hsu A.L., et al. Controlling the SARS-CoV-2 spike glycoprotein conformation. Nat. Struct. Mol. Biol. 2020;27(10):925–933. doi: 10.1038/s41594-020-0479-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Seyran M., Takayama K., Uversky V.N., Lundstrom K., Sherchan S.P., Attrish D., Rezaei N., Aljabali A.A., Ghosh S., Palù G., et al. The structural basis of accelerated host cell entry by SARS-CoV-2. FEBS J. 2021;288(17):5010–5020. doi: 10.1111/febs.15651. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Khatri I., Staal F.J., Van Dongen J.J. Blocking of the high-affinity interaction-synapse between SARS-CoV-2 spike and human ACE2 proteins likely requires multiple high-affinity antibodies: an immune perspective. Front. Immunol. 2020;11 doi: 10.3389/fimmu.2020.570018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Bauer G. The potential significance of high avidity immunoglobulin g (IgG) for protective immunity towards SARS-CoV-2. Int. J. Infect. Dis. 2021;106:61–64. doi: 10.1016/j.ijid.2021.01.061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Hodcroft E.B., Domman D.B., Snyder D.J., Oguntuyo K., Diest M.Van, Densmore K.H., Schwalm K.C., Femling J., Carroll J.L., Scott R.S., et al. Vol. 677. MedRxiv; 2021. Emergence in Late 2020 of Multiple Lineages of SARS-CoV-2 Spike Protein Variants Affecting Amino Acid Position. [Google Scholar]
- 14.Ke Z., Oton J., Qu K., Cortese M., Zila V., McKeane L., Nakane T., Zivanov J., Neufeldt C.J., Cerikan B., et al. Structures and distributions of SARS-CoV-2 spike proteins on intact virions. Nature. 2020;588(7838):498–502. doi: 10.1038/s41586-020-2665-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.MacLean O.A., Orton R.J., Singer J.B., Robertson D.L. No evidence for distinct types in the evolution of SARS-CoV-2. Virus Evol. 2020;6(1) doi: 10.1093/ve/veaa034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.van Dorp L., Acman M., Richard D., Shaw L.P., Ford C.E., Ormond L., Owen C.J., Pang J., Tan C.C., Boshier F.A., et al. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 2020;83 doi: 10.1016/j.meegid.2020.104351. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Zhang J., Cai Y., Xiao T., Lu J., Peng H., Sterling S.M., Walsh R.M., Rits-Volloch S., Zhu H., Woosley A.N., et al. Structural impact on SARS-CoV-2 spike protein by D614G substitution. Science. 2021;372(6541):525–530. doi: 10.1126/science.abf2303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Park S.E. Epidemiology, virology, and clinical features of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2; coronavirus disease-19) Clin. Exp. Pediatr. 2020;63(4):119. doi: 10.3345/cep.2020.00493. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Callaway E. The coronavirus is mutating-does it matter? Nature. 2020;585(7824):174–177. doi: 10.1038/d41586-020-02544-6. [DOI] [PubMed] [Google Scholar]
- 20.Korber B., Fischer W.M., Gnanakaran S., Yoon H., Theiler J., Abfalterer W., Hengartner N., Giorgi E.E., T. Bhat- tacharya B.Foley, et al. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell. 2020;182(4):812–827. doi: 10.1016/j.cell.2020.06.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Volz E., Hill V., McCrone J.T., Price A., Jorgensen D., Á. O’Toole J.Southgate, R. Johnson, et al. Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity. Cell. 2021;184(1):64–75. doi: 10.1016/j.cell.2020.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Williams T.C., Burgers W.A. SARS-CoV-2 evolution and vaccines: cause for concern? Lancet Respir. Med. 2021;9(4):333–335. doi: 10.1016/S2213-2600(21)00075-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Tegally H., Wilkinson E., Giovanetti M., Iranzadeh A., Fonseca V., Giandhari J., Doolabh D., Pillay S., Msomi N., et al. Detection of a SARS-CoV-2 variant of concern in South Africa. Nature. 2021;592(7854):438–443. doi: 10.1038/s41586-021-03402-9. [DOI] [PubMed] [Google Scholar]
- 24.Shen W., Le S., Li Y., Hu F. Seqkit: a cross-platform and ultrafast toolkit for fasta/q file manipulation. PloS one. 2016;11(10) doi: 10.1371/journal.pone.0163962. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Kumar S., Stecher G., Li M., Knyaz C., Tamura K. Mega x: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018;35(6):1547. doi: 10.1093/molbev/msy096. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Edgar R.C. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–1797. doi: 10.1093/nar/gkh340. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Han M.V., Zmasek C.M. Phyloxml: xml for evolutionary biology and comparative genomics. BMC Bioinformatics. 2009;10(1):1–6. doi: 10.1186/1471-2105-10-356. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Brooks D.J., Fresco J.R., Lesk A.M., Singh M. Evolution of amino acid frequencies in proteins over deep time: inferred order of introduction of amino acids into the genetic code. Mol. Biol. Evol. 2002;19(10):1645–1655. doi: 10.1093/oxfordjournals.molbev.a003988. [DOI] [PubMed] [Google Scholar]
- 29.Vacic V., Uversky V.N., Dunker A.K., Lonardi S. Composition profiler: a tool for discovery and visualization of amino acid composition differences. BMC Bioinformatics. 2007;8(1):1–7. doi: 10.1186/1471-2105-8-211. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Sickmeier M., Hamilton J.A., LeGall T., Vacic V., Cortese M.S., Tantos A., Szabo B., Tompa P., Chen J., Uversky V.N., et al. Disprot: the database of disordered proteins. Nucleic Acids Res. 2007;35(suppl 1):D786–D793. doi: 10.1093/nar/gkl893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Hassan S.S., Attrish D., Ghosh S., Choudhury P.P., Roy B. Pathogenetic perspective of missense mutations of orf3a protein of SARS-CoV-2. Virus Res. 2021;300:1–24. doi: 10.1016/j.virusres.2021.198441. 198441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Hassan S.S., Choudhury P.P., Roy B., Jana S.S. Missense mutations in SARS-CoV-2 genomes from Indian patients. Genomics. 2020;112(6):4622–4627. doi: 10.1016/j.ygeno.2020.08.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Strait B.J., Dewey T.G. The shannon information entropy of protein sequences. Biophys. J. 1996;71(1):148–155. doi: 10.1016/S0006-3495(96)79210-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Righetti P.G. Determination of the isoelectric point of proteins by capillary isoelectric focusing. J. Chromatogr. A. 2004;1037(1–2):491–499. doi: 10.1016/j.chroma.2003.11.025. [DOI] [PubMed] [Google Scholar]
- 35.Stekhoven F.S., Gorissen M., Flik G. The isoelectric point, a key to understanding a variety of biochemical problems: a minireview. Fish Physiol. Biochem. 2008;34(1):1–8. doi: 10.1007/s10695-007-9145-6. [DOI] [PubMed] [Google Scholar]
- 36.Adair G. The chemistry of the proteins and amino acids. Annu. Rev. Biochem. 1937;6(1):163–192. [Google Scholar]
- 37.Romero P., Obradovic Z., Li X., Garner E.C., Brown C.J., Dunker A.K. Sequence complexity of disordered protein. Proteins Struct. Funct. Bioinformatics. 2001;42(1):38–48. doi: 10.1002/1097-0134(20010101)42:1<38::aid-prot50>3.0.co;2-3. [DOI] [PubMed] [Google Scholar]
- 38.Peng K., Radivojac P., Vucetic S., Dunker A.K., Obradovic Z. Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics. 2006;7(1):1–17. doi: 10.1186/1471-2105-7-208. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Peng K., Vucetic S., Radivojac P., Brown C.J., Dunker A.K., Obradovic Z. Optimizing long intrinsic disorder predictors with protein evolutionary information. J. Bioinforma. Comput. Biol. 2005;3(01):35–60. doi: 10.1142/s0219720005000886. [DOI] [PubMed] [Google Scholar]
- 40.Xue B., Dunbrack R.L., Williams R.W., Dunker A.K., Uversky V.N. PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta (BBA)-Proteins Proteomics. 2010;1804(4):996–1010. doi: 10.1016/j.bbapap.2010.01.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.B. Ḿesz´aros G.Erd˝os., Z. Doszt´anyi IUPred2a: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018;46(W1):W329–W337. doi: 10.1093/nar/gky384. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Necci M., Piovesan D., Tosatto S.C. Critical assessment of protein intrinsic disorder prediction. Nat. Methods. 2021;18(5):472–481. doi: 10.1038/s41592-021-01117-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Baum A., Fulton B.O., Wloga E., Copin R., Pascal K.E., Russo V., Giordano S., Lanza K., Negron N., Ni M., et al. Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science. 2020;369(6506):1014–1018. doi: 10.1126/science.abd0831. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Liu Z., VanBlargan L.A., Bloyet L.-M., Rothlauf P.W., Chen R.E., Stumpf S., Zhao H., Errico J.M., Theel E.S., Liebeskind M.J., et al. Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. Cell Host Microbe. 2021;29(3):477–488. doi: 10.1016/j.chom.2021.01.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Dearlove B., Lewitus E., Bai H., Li Y., Reeves D.B., Joyce M.G., Scott P.T., Amare M.F., Vasan S., Michael N.L., et al. A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants. Proc. Natl. Acad. Sci. 2020;117(38):23652–23662. doi: 10.1073/pnas.2008281117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Buss L.F., Prete C.A., Abrahim C.M., Mendrone A., Salomon T., de Almeida-Neto C., R. F. Fran¸ca M.C.Belotti, M. P. Carvalho, et al. Three-quarters attack rate of SARS-CoV-2 in the brazilian amazon during a largely unmitigated epidemic. Science. 2021;371(6526):288–292. doi: 10.1126/science.abe9728. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Aschwanden C. Five reasons why covid herd immunity is probably impossible. Nature. 2021;591(7851):520–522. doi: 10.1038/d41586-021-00728-2. [DOI] [PubMed] [Google Scholar]
- 48.Gupta R.K. Will SARS-CoV-2 variants of concern affect the promise of vaccines? Nat. Rev. Immunol. 2021:1–2. doi: 10.1038/s41577-021-00556-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Redwan E.M. COVID-19 pandemic and vaccination build herd immunity. Eur. Rev. Med. Pharmacol. Sci. 2021;25(2):577–579. doi: 10.26355/eurrev_202101_24613. [DOI] [PubMed] [Google Scholar]
- 50.Bauer G. The variability of the serological response to SARS-corona virus-2: potential resolution of ambiguity through determination of avidity (functional affinity) J. Med. Virol. 2021;93(1):311–322. doi: 10.1002/jmv.26262. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Hedman K., Lappalainen M., M. S¨oderlund L.Hedman. Avidity of IgG in serodiagnosis of infectious diseases, reviews in medical. Microbiology. 1993;4(3):123–129. [Google Scholar]
- 52.Junker A.K., Tilley P. Varicella-zoster virus antibody avidity and IgG-subclass patterns in children with recurrent chickenpox. J. Med. Virol. 1994;43(2):119–124. doi: 10.1002/jmv.1890430204. [DOI] [PubMed] [Google Scholar]
- 53.Boppana S.B., Britt W.J. Antiviral antibody responses and intrauterine transmission after primary maternal cytomegalovirus infection. J. Infect. Dis. 1995;171(5):1115–1121. doi: 10.1093/infdis/171.5.1115. [DOI] [PubMed] [Google Scholar]
- 54.Lazzarotto T., Varani S., Spezzacatena P., Gabrielli L., Pradelli P., Guerra B., Landini M.P. Maternal IgG avidity and igm detected by blot as diagnostic tools to identify pregnant women at risk of transmitting cytomegalovirus. Viral Immunol. 2000;13(1):137–141. doi: 10.1089/vim.2000.13.137. [DOI] [PubMed] [Google Scholar]
- 55.Seo S., Cho Y., Park J. Serologic screening of pregnant korean women for primary human cytomegalovirus infection using IgG avidity test. Korean J. Lab. Med. 2009;29(6):557–562. doi: 10.3343/kjlm.2009.29.6.557. [DOI] [PubMed] [Google Scholar]
- 56.Kaneko M., Ohhashi M., Minematsu T., Muraoka J., Kusumoto K., Sameshima H. Maternal immunoglobulin G avidity as a diagnostic tool to identify pregnant women at risk of congenital cytomegalovirus infection. J. Infect. Chemother. 2017;23(3):173–176. doi: 10.1016/j.jiac.2016.12.001. [DOI] [PubMed] [Google Scholar]
- 57.Puschnik A., Lau L., Cromwell E.A., Balmaseda A., Zompi S., Harris E. Correlation between dengue-specific neutralizing antibodies and serum avidity in primary and secondary Dengue virus 3 natural infections in humans. PLoS Negl. Trop. Dis. 2013;7(6) doi: 10.1371/journal.pntd.0002274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Delgado M.F., Coviello S., Monsalvo A.C., Melendi G.A., Hernandez J.Z., Batalle J.P., Diaz L., Trento A., Chang H.-Y., Mitzner W., et al. Lack of antibody affinity maturation due to poor toll-like receptor stimulation leads to enhanced respiratory syncytial virus disease. Nat. Med. 2009;15(1):34–41. doi: 10.1038/nm.1894. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Lai L., D. V¨odr¨os P.A.Kozlowski, D. C. Montefiori, et al. GM-CSF DNA: an adjuvant for higher avidity IgG, rectal IgA, and increased protection against the acute phase of a SHIV-89.6P challenge by a DNA/MVA immunodeficiency virus vaccine. Virology. 2007;369(1):153–167. doi: 10.1016/j.virol.2007.07.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Bauer G., Struck F., Schreiner P., Staschik E., Soutschek E., Motz M. 2020. The Serological Response to SARS Corona Virus-2 Is Characterized by Frequent Incomplete Maturation of Functional Affinity (Avidity) [Google Scholar]
- 61.Edridge A.W., Kaczorowska J., Hoste A.C., Bakker M., Klein M., Loens K., Jebbink M.F., Matser A., Kinsella C.M., Rueda P., et al. Seasonal coronavirus protective immunity is short-lasting. Nat. Med. 2020;26(11):1691–1693. doi: 10.1038/s41591-020-1083-1. [DOI] [PubMed] [Google Scholar]
- 62.Galanti M., Shaman J. Direct observation of repeated infections with endemic coronaviruses. J. Infect. Dis. 2021;223(3):409–415. doi: 10.1093/infdis/jiaa392. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Struck F., Schreiner P., Staschik E., Wochinz-Richter K., Schulz S., Soutschek E., Motz M., Bauer G. Vaccination versus infection with SARS-CoV-2: establishment of a high avidity igg response versus incomplete avidity maturation. J. Med. Virol. 2021 doi: 10.1002/jmv.27270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.WHO-COVID-19-Dashboard COVID live update. https://covid19.who.int/table
- 65.Wu S.L., Mertens A.N., Crider Y.S., Nguyen A., Pokpongkiat N.N., Djajadi S., Seth A., Hsiang M.S., Colford J.M., Reingold A., et al. Substantial underestimation of SARS-CoV-2 infection in the United States. Nat. Commun. 2020;11(1):1–10. doi: 10.1038/s41467-020-18272-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Mohanan M., Malani A., Krishnan K., Acharya A. Prevalence of SARS-CoV-2 in Karnataka, India. Jama. 2021;325(10):1001–1003. doi: 10.1001/jama.2021.0332. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Asselta R., Paraboschi E.M., Mantovani A., Duga S. Ace2 and tmprss2 variants and expression as candidates to sex and country differences in COVID-19 severity in Italy. Aging (Albany NY) 2020;12(11):10087. doi: 10.18632/aging.103415. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Srivastava A., Bandopadhyay A., Das D., Pandey R.K., Singh V., Khanam N., Srivastava N., Singh P.P., Dubey P.K., Pathak A., et al. Genetic association of ACE2 RS285666 polymorphism with COVID-19 spatial distribution in India. Front. Genet. 2020;11:1163. doi: 10.3389/fgene.2020.564741. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.G. Vargas-Alarc´on R.Posadas-S´anchez., J. Raḿırez-Bello Variability in genes related to SARS-CoV-2 entry into host cells (ACE2, TMPRSS2, TMPRSS11A, ELANE, and CTSL) and its potential use in association studies. Life Sci. 2020;260 doi: 10.1016/j.lfs.2020.118313. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
A complete list of unique S protein accessions and their names (continent-wise).
The frequency of each amino acid was computed for each unique S protein available in six continents and Shannon entropy for each S protein sequence computed using the formula stated in section 2.2.
Isoelectric point (pI) computed for each S protein sequence from each continent.
Phylogenetic tree of the unique amino acid sequences from SARS-CoV-2 S-protein showing polyphyletic groups and sequences from different countries grouping together in the same clade