Implications derived from S-protein variants of SARS-CoV-2 from six continents

Sk Sarif Hassan; Kenneth Lundstrom; Debmalya Barh; Raner José Santana Silva; Bruno Silva Andrade; Vasco Azevedo; Pabitra Pal Choudhury; Giorgio Palu; Bruce D Uhal; Ramesh Kandimalla; Murat Seyran; Amos Lal; Samendra P Sherchan; Gajendra Kumar Azad; Alaa AA Aljabali; Adam M Brufsky; Ángel Serrano-Aroca; Parise Adadi; Tarek Mohamed Abd El-Aziz; Elrashdy M Redwan; Kazuo Takayama; Nima Rezaei; Murtaza Tambuwala; Vladimir N Uversky

doi:10.1016/j.ijbiomac.2021.09.080

2021 Sep 24;191:934–955. doi: 10.1016/j.ijbiomac.2021.09.080


Sk Sarif Hassan ^a,^⁎, Kenneth Lundstrom ^b, Debmalya Barh ^c,^d, Raner José Santana Silva ^e, Bruno Silva Andrade ^f, Vasco Azevedo ^g, Pabitra Pal Choudhury ^h, Giorgio Palu ⁱ, Bruce D Uhal ^j, Ramesh Kandimalla ^k,^l, Murat Seyran ^m, Amos Lal ⁿ, Samendra P Sherchan ^o, Gajendra Kumar Azad ^p, Alaa AA Aljabali ^q, Adam M Brufsky ^r, Ángel Serrano-Aroca ^s, Parise Adadi ^t, Tarek Mohamed Abd El-Aziz ^u,^v, Elrashdy M Redwan ^w,^x, Kazuo Takayama ^y, Nima Rezaei ^z,^aa, Murtaza Tambuwala ^ab, Vladimir N Uversky ^ac,^ad,^⁎

^aDepartment of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur 721140, West Bengal, India

^bPanTherapeutics, Rte de Lavaux 49, CH1095 Lutry, Switzerland

^cCentre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, WB, India

^dDepartment of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil

^eDepartment of Biological Sciences (DCB), Graduate Program in Genetics and Molecular Biology (PPGGBM), State University of Santa Cruz (UESC), Rodovia Ilheus-Itabuna, km 16, 45662-900 Ilheus, BA, Brazil

^fLaboratory of Bioinformatics and Computational Chemistry, Department of Biological Sciences, State University of Southwest Bahia (UESB), Jequié 45206-190, Brazil

^gLaboratório de Genética Celular e Molecular, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte CEP 31270-901, Brazil

^hApplied Statistics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700108, India

ⁱDepartment of Molecular Medicine, University of Padova, Via Gabelli 63, 35121 Padova, Italy

^jDepartment of Physiology, Michigan State University, East Lansing, MI 48824, USA

^kApplied Biology, CSIR-Indian Institute of Chemical Technology, Uppal Road, Tarnaka, Hyderabad 500007, India

^lDepartment of Biochemistry, Kakatiya Medical College, Warangal, Telangana, India

^mDoctoral Studies in Natural and Technical Sciences (SPL 44), University of Vienna, Währinger Straße, A-1090 Vienna, Austria

ⁿDivision of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA

^oDepartment of Environmental Health Sciences, Tulane University, New Orleans, LA 70112, USA

^pDepartment of Zoology, Patna University, Patna, Bihar, India

^qDepartment of Pharmaceutics and Pharmaceutical Technology, Yarmouk University, Faculty of Pharmacy, Irbid 566, Jordan

^rUniversity of Pittsburgh School of Medicine, Department of Medicine, Division of Hematology/Oncology, UPMC Hillman Cancer Center, Pittsburgh, PA, USA

^sBiomaterials and Bioengineering Lab, Centro de Investigación Traslacional San Alberto Magno, Universidad Católica de Valencia San Vicente Mártir, c/Guillem de Castro, 94, 46001 Valencia, Spain

^tDepartment of Food Science, University of Otago, Dunedin 9054, New Zealand

^uZoology Department, Faculty of Science, Minia University, El-Minia 61519, Egypt

^vDepartment of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229-3900, USA

^wFaculty of Science, Department of Biological Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia

^xTherapeutic and Protective Proteins Laboratory, Protein Research Department, Genetic Engineering and Biotechnology Research Institute, City for Scientific Research and Technology Applications, New Borg El-Arab, Alexandria 21934, Egypt

^yCenter for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto 606-8507, Japan

^zResearch Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran

^aaNetwork of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Stockholm, Sweden

^abSchool of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine BT52 1SA, Northern Ireland, UK

^acDepartment of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA

^adCenter for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russia

^⁎Corresponding authors.

PMCID: PMC8462006 PMID: 34571123

Abstract

The spike (S) protein is a critical determinant of the infectivity and antigenicity of SARS-CoV-2. Several mutations in the S protein of SARS-CoV-2 have already been detected, and their effects on immune evasion and enhanced transmission, as possible causes of increased morbidity and mortality, are being investigated. From pathogenic and epidemiological perspectives, S proteins are of prime interest to researchers. This study focused on the unique variants of S proteins from six continents: Asia, Africa, Europe, Oceania, South America, and North America. Of the six continents, Africa had the highest percentage of unique S proteins (29.1%). The phylogenetic relationship implies that unique S proteins from North America are significantly different from those of the other five continents. They are most likely to spread to other geographic locations through international travel or naturally through emerging mutations. It is suggested that restriction of international travel be considered, together with massive vaccination, as a key measure to combat the spread of the COVID-19 pandemic. It is further suggested that the efficacy of existing vaccines and future vaccine development be reviewed with careful scrutiny and that, if needed, vaccines be re-engineered based on the requirements dictated by newly emerging S protein variants.

Keywords: SARS-CoV-2, Invariant residues, Mutations, Spike protein, Continents, Vaccines

1. Introduction

The world is experiencing a health emergency due to coronavirus disease 2019 (COVID-19), caused by an enveloped, positive-sense, single-stranded RNA virus, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], [2], [3], [4], [5], [6]. The spike (S) protein is a homotrimer present on the surface of SARS-CoV-2 that recognizes the human host cell surface receptor angiotensin-converting enzyme 2 (ACE2) [7], [8], [9], [10]. The interaction between the S protein of SARS-CoV-2 and its cellular receptor ACE2 is driven by high affinity/avidity. Therefore, neutralization requires not merely antibodies that bind specifically, but antibodies that have high affinity/avidity towards the S1 subunit of the S protein [11]. This aspect is directly related to the variability of the S1 subunit (and its isoelectric points), as this may modulate the binding affinity [12]. The importance of antibody avidity for protection against SARS-CoV-2 (and other viruses) has been recently reviewed [12]. From the beginning of the second wave of COVID-19 infection, various SARS-CoV-2 variants emerged, raising concern about enhanced transmission and mortality and reduced efficacy of vaccine protection [13], [14]. Some studies disputed the perception of SARS-CoV-2 mutants as distinct pathogenic variants and questioned the increased rate of transmissibility [15], [16]. However, the frequency within the SARS-CoV-2 population of strains carrying the D614G mutation in the S protein indicates that this mutation plays a role in enabling the virus to spread more effectively and rapidly [17]. Epidemiologists have been constantly monitoring the evolution of SARS-CoV-2, with a particular focus on the S protein and other interacting proteins of the virus [17], [18]. The D614G mutation in the S protein, discovered in early 2020, enables the virus to spread more effectively and rapidly [19]. The D614G mutation has been found to be associated with high viral loads in infected patients and a high rate of infection, but not with increased disease severity [20]. Various mutations in the S protein make SARS-CoV-2 more complex, and hence it is more difficult to characterize its severity and infectivity and the efficacy of vaccines designed to target the S protein. Not all mutations are advantageous to the virus, but several mutations or a set of mutations may increase the transmission potential through an increase in receptor binding or the ability to evade the host immune response by altering the surface structures recognized by antibodies [21], [22], [23].

To contain the spread of COVID-19, it is of high interest to detect and identify the unique emerging variants of the S protein. It is also worth investigating the impact of new S protein variants on viral infectivity and the potential to spread rapidly, as well as ascertaining the origin of the spread of the new variants with respect to S protein variability. Accordingly, it might be possible to segregate the set of new variants with respect to individual characteristics of SARS-CoV-2, which would help policy makers to form strategies to contain the spread of the virus. A large number of different SARS-CoV-2 S protein mutant sequences are currently available in the National Center for Biotechnology Information (NCBI) virus database. In this study, all available S protein sequences from six continents (Asia, Africa, Europe, North America, South America, and Oceania) were analyzed for their uniqueness and variability, and an inter-linkage was made among the unique S proteins available on the six continents.

2. Data acquisition and methods

S protein sequences from all six continents (Asia, Africa, Europe, Oceania, South America, and North America) were downloaded in FASTA format from the NCBI database (http://www.ncbi.nlm.nih.gov/). The FASTA files were then processed in Matlab-2021a to extract the unique S protein sequences for each continent.
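As an illustration of the extraction step, the following minimal Python sketch reads FASTA records and keeps one representative per distinct sequence. It is not the Matlab code used in the study; the function names `read_fasta` and `unique_sequences` and the toy records are hypothetical, and "unique" is assumed here to mean an exact match of the full amino acid string.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def unique_sequences(records):
    """Map each distinct sequence to the header of its first occurrence."""
    seen = {}
    for header, seq in records:
        seen.setdefault(seq, header)
    return seen

# Toy input standing in for one continent's download (hypothetical records).
toy = StringIO(">s1\nMFVFLV\nLLPL\n>s2\nMFVFLVLLPL\n>s3\nMFVFLVLLPV\n")
uniq = unique_sequences(read_fasta(toy))
print(len(uniq), "unique sequences")
```

Keeping the first header seen for each sequence is a design choice of this sketch; any other representative would give the same count of unique sequences.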

2.1. Phylogenetic analysis

To filter out low-quality sequences (those containing unknown amino acids ‘X’) and remove redundant sequences, the SeqKit toolkit was used, with the fx2tab and rmdup commands, respectively [24]. The filter removed all sequences that had one or more ‘X’ and all redundant (100% identical) sequences. The amino acid sequences were aligned in MEGA X with the MUSCLE algorithm, after which a phylogeny was calculated with the neighbor-joining method, considering 3919 taxa and 530 sites [25], [26]. The alignment was used as input to Archaeopteryx 0.9914 with the multiple alignment inference option, with the following parameters: maximum allowed gap ratio 0.5, minimum allowed non-gap sequence length 50, and Kimura correction as the distance calculator [27]. The phylogenetic trees were analyzed and edited in Archaeopteryx 0.9914.
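The filtering and distance steps can be sketched in Python as follows. This is a simplified stand-in, not the SeqKit or tree-building code itself: the function names are hypothetical, and the distance uses the commonly quoted Kimura approximation for proteins, $d = -\ln(1 - p - 0.2p^{2})$, where $p$ is the fraction of differing aligned sites. Whether the tree-building software applies exactly this form is an assumption.

```python
import math

def drop_ambiguous(seqs):
    """Discard sequences containing one or more unknown residues 'X'."""
    return [s for s in seqs if "X" not in s.upper()]

def p_distance(a, b):
    """Fraction of differing sites among aligned columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def kimura_distance(a, b):
    """Kimura-corrected protein distance: -ln(1 - p - 0.2 * p**2)."""
    p = p_distance(a, b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        return math.inf  # sequences too divergent for the correction
    return -math.log(arg)

kept = drop_ambiguous(["ACDE", "ACXE", "ACDF"])
print(kept, kimura_distance(kept[0], kept[1]))
```

A matrix of such pairwise distances is the kind of input a neighbor-joining procedure operates on.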

2.2. Frequency probability of amino acids

Any protein sequence is composed of twenty different amino acids, each occurring with a frequency that may be zero. The probability of occurrence of each amino acid $A_{i}$ is given by $\frac{f (A_{i})}{l}$, where $f(A_{i})$ denotes the frequency (number of occurrences) of the amino acid $A_{i}$ in a primary sequence, and $l$ stands for the length of the S protein [28]. Hence, for each S protein, a twenty-dimensional vector of the frequency probabilities of the twenty amino acids can be obtained. This frequency probability reveals which amino acids dominate the composition of a given protein.
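A minimal Python sketch of this twenty-dimensional vector is given below. The function name `frequency_vector` and the ordering of the amino acids in `AMINO_ACIDS` are choices made for illustration only.

```python
from collections import Counter

# One-letter codes of the twenty standard amino acids (ordering is arbitrary).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def frequency_vector(seq):
    """Return [f(A_i) / l for each amino acid A_i], with l = len(seq)."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())
    length = len(seq)
    return [counts[aa] / length for aa in AMINO_ACIDS]

vec = frequency_vector("AARN")
print(vec[:3])  # entries for A, R and N
```

For a sequence made only of the twenty standard residues the entries sum to one; residues outside this alphabet are counted in the length but not in any entry.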

2.3. Evaluation of normalized amino acid compositions

The variability of the amino acid compositions of the unique S proteins from each continent was evaluated using the web-based tool Composition Profiler (http://www.cprofiler.org/), which automates the detection of enrichment or depletion patterns of individual amino acids or groups of amino acids in query proteins [29]. In this analysis, we used the sets of unique S proteins from each continent as query samples and the amino acid composition of the original S protein (UniProt ID: P0DTC2) as a reference sample that provides the background amino acid distribution. Composition Profiler generates a bar chart composed of twenty data points (one for each amino acid), where bar heights indicate normalized enrichment or depletion of a given residue. The normalized enrichment/depletion is calculated as

$\frac{C_{continent} - C_{original}}{C_{original}}$

where $C_{continent}$ is the content of a given residue in the query set of S proteins on a given continent and $C_{original}$ is the content of the same residue in the original S protein. For comparison, we generated composition profiles of disordered proteins, where normalized composition was evaluated as $\frac{C_{DisProt} - C_{PDB}}{C_{PDB}}$ ($C_{DisProt}$ is the content of a given amino acid in the set of intrinsically disordered proteins in the DisProt database [30]; $C_{PDB}$ is the content of the given residue in the dataset of fully ordered proteins, PDB-Select-25 [29]). In these analyses, the positive and negative values produced in the compositional profiler indicated enrichment or depletion of the indicated residue, respectively.
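The normalized enrichment/depletion can be sketched in Python as follows. This is not the Composition Profiler implementation: the function names are hypothetical, and the content of a residue in the query set is assumed here to be its fraction over all pooled query sequences.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def composition(seqs):
    """Fractional content of each amino acid over the pooled sequences."""
    counts = Counter("".join(seqs).upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def normalized_enrichment(query_seqs, reference_seq):
    """(C_query - C_reference) / C_reference for residues in the reference."""
    c_query = composition(query_seqs)
    c_ref = composition([reference_seq])
    return {
        aa: (c_query[aa] - c_ref[aa]) / c_ref[aa]
        for aa in AMINO_ACIDS
        if c_ref[aa] > 0  # ratio is undefined for residues absent from the reference
    }

# Toy example: the query is richer in A and poorer in G than the reference.
profile = normalized_enrichment(["AAAG"], "AAGG")
print(profile)
```

Positive values indicate enrichment and negative values indicate depletion relative to the reference.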

2.4. Amino acid conservation Shannon entropy

The degree to which the amino acids in the S protein are conserved or disordered is assessed by the information-theoretic measure known as ‘Shannon entropy’ (SE). For each S protein, that is, for a given amino acid sequence of length $l$, the Shannon entropy of amino acid conservation is computed using the following formula [31], [32]:

$SE = - \sum_{i = 1}^{20} p_{s_{i}} \log_{20} (p_{s_{i}})$

where $p_{s_{i}} = \frac{k_{i}}{l}$ and $k_{i}$ represents the number of occurrences of the amino acid $s_{i}$ in the given sequence [33].
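A minimal Python sketch of this calculation is shown below. The function name `shannon_entropy` is hypothetical, and amino acids that do not occur in the sequence are assumed to contribute zero to the sum.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """SE = -sum_i p_i * log_20(p_i), with p_i = k_i / l."""
    if not seq:
        raise ValueError("empty sequence")
    length = len(seq)
    counts = Counter(seq.upper())
    # Only residues that occur are summed; absent residues contribute zero.
    return -sum((k / length) * math.log(k / length, 20) for k in counts.values())

print(shannon_entropy("ARNDCQEGHILKMFPSTWYV"))  # all twenty residues equally frequent
print(shannon_entropy("AAAA"))                  # a single residue type
```

With the base-20 logarithm, SE ranges from 0 for a sequence of one repeated residue to 1 for a sequence in which all twenty amino acids are equally frequent.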

2.5. Isoelectric point of a protein sequence

The isoelectric point (pI) is the pH at which a molecule carries no net electrical charge, or is electrically neutral in the statistical mean. We calculated the theoretical pI by using the pKa values of the amino acids and summing the net charge across the protein at a given pH (the default is a typical intracellular pH of 7.2), searching with our algorithm for the pH at which the net charge is zero [34]. The isoelectric point is a powerful tool for predicting and understanding interactions between proteins and between proteins and membranes, and for determining the presence of protein isoforms [35]. Furthermore, the isoelectric point is one of the prime keys for understanding a variety of biochemical properties of protein sequences [35], [36]. Note that the isoelectric point of a protein sequence was computed here using the standard routine of Matlab-2021a. This parameter was deployed to characterize the unique S protein sequences quantitatively.
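The search for the pH of zero net charge can be sketched in Python as follows. This is a generic Henderson-Hasselbalch calculation with a bisection search, not the Matlab routine used in the study. The pKa values below are illustrative assumptions and need not match those of the Matlab routine, so the resulting pI values may differ from the ones reported here.

```python
# Illustrative side-chain and terminal pKa values (assumed for this sketch).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6

def net_charge(seq, ph):
    """Net charge of the sequence at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq.upper():
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative

def isoelectric_point(seq, low=0.0, high=14.0, tol=1e-4):
    """Bisection search for the pH at which the net charge is zero."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0

print(round(isoelectric_point("ACDKEH"), 2))
```

Bisection is applicable because the net charge decreases monotonically as the pH increases.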

2.6. Intrinsic disorder analysis

Intrinsic disorder predisposition of the S protein from the original (Wuhan) version of SARS-CoV-2 was analyzed by a set of six commonly used disorder predictors, PONDR® VLXT, PONDR® VL3, PONDR® VSL2B, PONDR® FIT, IUPred2 (Short) and IUPred2 (Long), which were selected for their specific features. The per-residue disorder propensities produced by these tools are real numbers between 1 (ideal prediction of disorder) and 0 (ideal prediction of order) [37], [38], [39], [40], [41]. Thresholds of ≥0.15 and ≥0.5 were used to identify flexible and disordered residues and regions, respectively. The intrinsic disorder profile of this protein was generated by the DiSpi/RIDAO web-crawler, which combines the outputs of PONDR® VLXT, PONDR® VL3, PONDR® VSL2B, PONDR® FIT, IUPred2 (Short) and IUPred2 (Long) in one plot and complements them with the errors evaluated for the mean disorder profile, calculated by averaging the profiles of the individual predictors. Analysis of intrinsic disorder predisposition of unique variants of the S protein was conducted with PONDR® VSL2B. This tool is commonly used in the analysis of disorder predisposition of proteins and systematically shows good performance in various comparative analyses, including the recently conducted Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment, where PONDR® VSL2B was ranked #3 of the 43 evaluated methods [42].
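The thresholding and averaging steps can be sketched in Python as follows. The sketch does not call any of the predictors; it assumes that their per-residue scores are already available as lists of numbers. The function names are hypothetical, residues scoring from 0.15 to below 0.5 are assumed to be the flexible ones, and the use of the standard error of the mean as the error of the mean profile is also an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def classify(score):
    """Label a per-residue disorder score using the 0.15 and 0.5 thresholds."""
    if score >= 0.5:
        return "disordered"
    if score >= 0.15:
        return "flexible"
    return "ordered"

def mean_profile(profiles):
    """Per-residue mean and standard error over several predictor profiles."""
    n = len(profiles)
    means, errors = [], []
    for scores in zip(*profiles):  # one tuple of scores per residue
        means.append(mean(scores))
        errors.append(stdev(scores) / sqrt(n) if n > 1 else 0.0)
    return means, errors

# Toy scores for a three-residue fragment from two hypothetical predictors.
profiles = [[0.10, 0.20, 0.60], [0.12, 0.40, 0.80]]
means, errors = mean_profile(profiles)
print([classify(m) for m in means])
```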

3. Results

We first determined the set of unique S protein sequences from each continent. Every unique S protein from a continent was then compared with the unique S proteins from the other five continents, and the resulting lists of identical sequences are presented in Tables 12–17. The variability of the S proteins from each continent was also characterized using Shannon entropy and isoelectric point.
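The cross-continent comparison can be sketched in Python as follows. The sketch assumes that each continent's unique S proteins are held in a dictionary mapping a short name to its sequence; the function name `shared_pairs` and the toy names and sequences are hypothetical.

```python
def shared_pairs(set_a, set_b):
    """Return (name_a, name_b) pairs whose sequences are identical."""
    by_sequence = {}
    for name, seq in set_b.items():
        by_sequence.setdefault(seq, []).append(name)
    return [
        (name_a, name_b)
        for name_a, seq in set_a.items()
        for name_b in by_sequence.get(seq, [])
    ]

# Toy sets standing in for the unique S proteins of two continents.
continent_1 = {"P1": "MFVF", "P2": "MFVL", "P3": "MKKK"}
continent_2 = {"Q1": "MFVL", "Q2": "MKKK", "Q3": "MAAA"}
pairs = shared_pairs(continent_1, continent_2)
print(pairs)
```

Indexing one set by sequence avoids comparing every pair of sequences directly, which matters when one continent contributes thousands of unique sequences.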

3.1. Unique S proteins on the continents

Table 1 presents the total number of sequences, the number of unique sequences, and the corresponding percentages for each continent. A complete list of unique S protein accessions and their names (continent-wise) is available in Supplementary file-1. Each sequence accession is renamed as Ck, where C stands for the continent code (Asia: AS, Africa: AF, Oceania: O, Europe: U, South America: SA, and North America: NA) and k denotes the serial number.

Table 1.

Percentages of continent-wise unique spike (S) proteins.

Continent	Total S proteins (T)	Unique S proteins (U)	Percentage, continent-wise $\frac{U}{T} \times 100$	Percentage, worldwide $\frac{U}{16{,}143} \times 100$
Africa	984	286	29.065	1.772
Asia	2314	432	18.669	2.676
Europe	1006	187	18.588	1.158
Oceania	9920	1121	11.300	6.944
South America	464	71	15.302	0.440
North America	113,072	14,046	12.422	87.010
Worldwide	127,760	16,143	12.635	–
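The two percentage columns of Table 1 follow directly from the counts. The short Python sketch below recomputes them from the T and U values listed in the table; the function name `table1_percentages` is hypothetical.

```python
# (total, unique) S protein counts per continent, copied from Table 1.
COUNTS = {
    "Africa": (984, 286),
    "Asia": (2314, 432),
    "Europe": (1006, 187),
    "Oceania": (9920, 1121),
    "South America": (464, 71),
    "North America": (113072, 14046),
}

def table1_percentages(counts):
    """Return {continent: (U/T * 100, U/U_world * 100)} and U_world."""
    world_unique = sum(unique for _, unique in counts.values())
    rows = {
        continent: (100 * unique / total, 100 * unique / world_unique)
        for continent, (total, unique) in counts.items()
    }
    return rows, world_unique

rows, world_unique = table1_percentages(COUNTS)
for continent, (pct_continent, pct_world) in rows.items():
    print(f"{continent}: {pct_continent:.3f} {pct_world:.3f}")
```

The worldwide denominator is the sum of the per-continent unique counts (16,143), so a sequence that is unique on two continents is counted once for each.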

Distance matrix	Africa	Asia	Europe	Oceania	South America	North America
Africa	0.00	11.70	4.69	12.77	8.49	66.80
Asia	11.70	0.00	13.00	9.06	14.04	57.02
Europe	4.69	13.00	0.00	13.30	8.49	68.38
Oceania	12.77	9.06	13.30	0.00	16.03	56.84
South America	8.49	14.04	8.49	16.03	0.00	69.02
North America	66.80	57.02	68.38	56.84	69.02	0.00

SE: Continent	Interval of SEs
SE of S protein: Africa	(0.960825, 0.963239)
SE of S protein: Asia	(0.961471, 0.963326)
SE of S protein: Europe	(0.961539, 0.963254)
SE of S protein: North America	(0.95934, 0.964314)
SE of S protein: Oceania	(0.961525, 0.963042)
SE of S protein: South America	(0.961589, 0.962895)

pI: Continent	Interval of pIs
pI of S protein: Africa	(6.44, 7.09)
pI of S protein: Asia	(6.21, 7.08)
pI of S protein: Europe	(6.21, 6.99)
pI of S protein: North America	(5.61, 7.79)
pI of S protein: Oceania	(6.31, 7.09)
pI of S protein: South America	(6.36, 6.99)

Spike: Asia-Europe	Spike: Asia-Africa	Spike: Asia-Oceania	Spike: Asia-South America	Spike: Asia-North America
(A14, U2)	(A14, AF2)	(A15, O5)	(A31, SA1)	(A1, NA7)
(A15, U3)	(A15, AF3)	(A77, O43)	(A67, SA4)	(A8, NA231)
(A30, U8)	(A26, AF19)	(A95, O58)	(A148, SA13)	(A12, NA902)
(A31, U9)	(A71, AF48)	(A109, O83)	(A180, SA19)	(A14, NA928)
(A33, U11)	(A93, AF58)	(A128, O201)	(A191, SA22)	(A15, NA992)
(A36, U17)	(A128, AF72)	(A138, O370)	(A200, SA25)	(A19, NA1131)
(A43, U18)	(A138, AF76)	(A142, O373)	(A207, SA27)	(A23, NA1445)
(A69, U23)	(A142, AF79)	(A148, O377)	(A211, SA30)	(A28, NA2065)
(A77, U26)	(A148, AF82)	(A166, O387)	(A213, SA32)	(A30, NA3228)
(A93, U28)	(A161, AF88)	(A206, O388)	(A219, SA33)	(A31, NA3313)
(A95, U30)	(A164, AF92)	(A213, O390)	(A234, SA35)	(A32, NA3438)
(A105, U34)	(A166, AF101)	(A253, O398)	(A280, SA41)	(A33, NA3477)
(A128, U52)	(A191, AF115)	(A277, O400)	(A284, SA42)	(A34, NA3658)
(A134, U54)	(A206, AF118)	(A284, O402)	(A335, SA61)	(A43, NA3752)
(A135, U57)	(A213, AF120)	(A305, O504)	(A340, SA63)	(A44, NA3768)
(A148, U63)	(A275, AF130)	(A359, O1076)	(A373, SA68)	(A58, NA3911)
(A213, U80)	(A276, AF131)	(A404, O1104)	(A404, SA71)	(A69, NA4028)
(A234, U84)	(A277, AF134)			(A71, NA4051)
(A239, U88)	(A279, AF137)			(A76, NA4169)
(A265, U94)	(A282, AF138)			(A77, NA4243)
(A284, U99)	(A292, AF147)			(A78, NA4270)
(A286, U100)	(A379, AF229)			(A85, NA4296)
(A333, U121)	(A394, AF247)			(A89, NA4375)
(A340, U124)	(A404, AF263)			(A90, NA4394)
(A379, U151)	(A430, AF278)			(A91, NA4436)
(A404, U181)				(A93, NA4448)
(A430, U187)				(A95, NA4508)

Spike: Asia-North America	Spike: Asia-North America	Spike: Asia-North America	Spike: Asia-North America	Spike: Asia-North America
(A96, NA4537)	(A166, NA5819)	(A214, NA6445)	(A267, NA6903)	(A345, NA9597)
(A97, NA4541)	(A170, NA5927)	(A215, NA6465)	(A273, NA6916)	(A348, NA9612)
(A100, NA4559)	(A171, NA5977)	(A216, NA6492)	(A274, NA6936)	(A351, NA9663)
(A101, NA4620)	(A173, NA5992)	(A217, NA6499)	(A275, NA6944)	(A354, NA9674)
(A102, NA4637)	(A174, NA6060)	(A218, NA6510)	(A276, NA6949)	(A356, NA9724)
(A103, NA4658)	(A175, NA6067)	(A219, NA6515)	(A277, NA6962)	(A357, NA9763)
(A105, NA4715)	(A177, NA6071)	(A221, NA6527)	(A278, NA6969)	(A358, NA9776)
(A109, NA4861)	(A178, NA6080)	(A222, NA6540)	(A279, NA7000)	(A359, NA9792)
(A111, NA4897)	(A180, NA6101)	(A223, NA6550)	(A280, NA7015)	(A360, NA9834)
(A114, NA5001)	(A181, NA6142)	(A224, NA6553)	(A282, NA7025)	(A367, NA10276)
(A115, NA5022)	(A182, NA6148)	(A230, NA6602)	(A283, NA7056)	(A373, NA10342)
(A121, NA5105)	(A183, NA6155)	(A233, NA6616)	(A284, NA7090)	(A375, NA10442)
(A122, NA5137)	(A191, NA6185)	(A234, NA6622)	(A286, NA7129)	(A378, NA11135)
(A126, NA5151)	(A193, NA6193)	(A235, NA6630)	(A291, NA7198)	(A379, NA11225)
(A127, NA5182)	(A195, NA6244)	(A238, NA6659)	(A292, NA7227)	(A380, NA11305)
(A128, NA5194)	(A196, NA6258)	(A239, NA6661)	(A293, NA7249)	(A381, NA11560)
(A133, NA5471)	(A198, NA6276)	(A244, NA6683)	(A304, NA7576)	(A383, NA11874)
(A134, NA5485)	(A199, NA6293)	(A245, NA6687)	(A322, NA8509)	(A386, NA13280)
(A135, NA5516)	(A200, NA6299)	(A247, NA6707)	(A323, NA8519)	(A387, NA13307)
(A138, NA5538)	(A201, NA6305)	(A249, NA6713)	(A324, NA8565)	(A388, NA13362)
(A140, NA5574)	(A205, NA6324)	(A253, NA6751)	(A325, NA8570)	(A391, NA13404)
(A148, NA5595)	(A206, NA6334)	(A254, NA6756)	(A333, NA9283)	(A394, NA13438)
(A158, NA5644)	(A207, NA6373)	(A255, NA6780)	(A335, NA9324)	(A395, NA13444)
(A159, NA5645)	(A210, NA6388)	(A257, NA6794)	(A341, NA9425)	(A396, NA13465)
(A161, NA5666)	(A211, NA6406)	(A258, NA6810)	(A342, NA9455)	(A399, NA13554)
(A163, NA5722)	(A212, NA6424)	(A264, NA6857)	(A343, NA9568)	(A401, NA13614)
(A164, NA5744)	(A213, NA6429)	(A265, NA6862)	(A344, NA9592)	(A404, NA13635)
				(A405, NA13668)
				(A408, NA13704)
				(A413, NA13841)
				(A418, NA13913)
				(A419, NA13948)
				(A430, NA14000)
				(A431, NA14026)

Spike: Africa-Europe	Spike: Africa-North America	Spike: Africa-North America	Spike: Africa-Oceania	Spike: Africa-South America	Spike: Europe-North America
(AF2, U2)	(AF2, NA928)	(AF121, NA6566)	(AF1, O3)	(AF82, SA13)	(U2, NA928)
(AF3, U3)	(AF3, NA992)	(AF123, NA6628)	(AF3, O5)	(AF115, SA22)	(U3, NA992)
(AF31, U10)	(AF8, NA1298)	(AF125, NA6816)	(AF71, O148)	(AF117, SA26)	(U4, NA1221)
(AF58, U28)	(AF9, NA1348)	(AF128, NA6848)	(AF72, O201)	(AF120, SA32)	(U7, NA2680)
(AF69, U45)	(AF31, NA3387)	(AF130, NA6944)	(AF76, O370)	(AF263, SA71)	(U8, NA3228)
(AF72, U52)	(AF34, NA3583)	(AF131, NA6949)	(AF79, O373)		(U9, NA3313)
(AF82, U63)	(AF38, NA3797)	(AF133, NA6953)	(AF82, O377)		(U10, NA3387)
(AF120, U80)	(AF46, NA3986)	(AF134, NA6962)	(AF101, O387)		(U11, NA3477)
(AF123, U85)	(AF47, NA3988)	(AF137, NA7000)	(AF118, O388)		(U18, NA3752)
(AF145, U103)	(AF48, NA4051)	(AF138, NA7025)	(AF120, O390)		(U22, NA3895)
(AF195, U119)	(AF50, NA4061)	(AF145, NA7199)	(AF134, O400)		(U23, NA4028)
(AF229, U151)	(AF51, NA4117)	(AF146, NA7224)	(AF179, O751)		(U26, NA4243)
(AF230, U154)	(AF58, NA4448)	(AF147, NA7227)	(AF263, O1104)		(U28, NA4448)
(AF263, U181)	(AF64, NA4832)	(AF149, NA7286)		Spike: Oceania-South America	(U30, NA4508)
(AF278, U187)	(AF69, NA5149)	(AF151, NA7299)		(O377, SA13)	(U34, NA4715)
	(AF71, NA5188)	(AF152, NA7300)		(O389, SA28)	(U36, NA4780)
	(AF72, NA5194)	(AF154, NA7375)		(O390, SA32)	(U38, NA4837)
	(AF73, NA5202)	(AF156, NA7453)		(O402, SA42)	(U41, NA4989)
	(AF76, NA5538)	(AF165, NA7553)		(O1104, SA71)	(U42, NA5083)
	(AF82, NA5595)	(AF168, NA7644)			(U45, NA5149)
	(AF83, NA5606)	(AF179, NA8514)			(U47, NA5167)
	(AF88, NA5666)	(AF195, NA9264)			(U52, NA5194)
	(AF90, NA5693)	(AF196, NA9265)			(U53, NA5282)
	(AF92, NA5744)	(AF223, NA10257)			(U54, NA5485)
	(AF99, NA5818)	(AF227, NA10943)			(U55, NA5490)
	(AF101, NA5819)	(AF229, NA11225)			(U57, NA5516)
	(AF103, NA5829)	(AF230, NA11456)			(U63, NA5595)
	(AF104, NA5830)	(AF231, NA11576)			(U66, NA5627)
	(AF105, NA5837)	(AF247, NA13438)			(U72, NA6096)
	(AF108, NA5874)	(AF248, NA13478)			(U76, NA6240)
	(AF114, NA6178)	(AF254, NA13578)			(U78, NA6399)
	(AF115, NA6185)	(AF263, NA13635)			(U79, NA6421)
	(AF118, NA6334)	(AF268, NA13798)			(U80, NA6429)
	(AF119, NA6390)	(AF271, NA13870)			(U82, NA6450)
	(AF120, NA6429)	(AF278, NA14000)			(U84, NA6622)
		(AF283, NA14015)			(U85, NA6628)
					(U88, NA6661)
					(U90, NA6704)

Spike: Europe-North America	Spike: Europe-Oceania	Spike: North America-Oceania	Spike: North America-Oceania	Spike: South America-North America
(U92, NA6723)	(U3, O5)	(NA992, O5)	(NA6751, O398)	(NA3313, SA1)
(U93, NA6775)	(U26, O43)	(NA3873, O28)	(NA6962, O400)	(NA4550, SA5)
(U94, NA6862)	(U30, O58)	(NA4024, O36)	(NA7060, O401)	(NA4720, SA7)
(U98, NA7057)	(U52, O201)	(NA4243, O43)	(NA7090, O402)	(NA4989, SA11)
(U99, NA7090)	(U63, O377)	(NA4508, O58)	(NA7230, O404)	(NA5595, SA13)
(U100, NA7129)	(U80, O390)	(NA4756, O65)	(NA7355, O415)	(NA5687, SA18)
(U103, NA7199)	(U99, O402)	(NA4861, O83)	(NA7402, O419)	(NA6101, SA19)
(U104, NA7312)	(U118, O1032)	(NA5011, O105)	(NA7510, O422)	(NA6146, SA20)
(U106, NA7431)	(U181, O1104)	(NA5041, O114)	(NA7811, O625)	(NA6161, SA21)
(U107, NA7557)		(NA5188, O148)	(NA7832, O631)	(NA6185, SA22)
(U111, NA7679)	Spike: Europe-South America	(NA5194, O201)	(NA7845, O633)	(NA6299, SA25)
(U112, NA7884)	(U9, SA1)	(NA5200, O225)	(NA7901, O645)	(NA6373, SA27)
(U113, NA7914)	(U41, SA11)	(NA5205, O238)	(NA8514, O751)	(NA6395, SA28)
(U114, NA9075)	(U63, SA13)	(NA5372, O368)	(NA8646, O770)	(NA6396, SA29)
(U116, NA9180)	(U80, SA32)	(NA5538, O370)	(NA8703, O798)	(NA6406, SA30)
(U117, NA9189)	(U84, SA35)	(NA5579, O374)	(NA8787, O850)	(NA6418, SA31)
(U119, NA9264)	(U99, SA42)	(NA5595, O377)	(NA8817, O886)	(NA6429, SA32)
(U121, NA9283)	(U124, SA63)	(NA5819, O387)	(NA8824, O889)	(NA6515, SA33)
(U122, NA9284)	(U181, SA71)	(NA6334, O388)	(NA9091, O1017)	(NA6622, SA35)
(U123, NA9330)		(NA6395, O389)	(NA9333, O1035)	(NA6696, SA38)
(U126, NA9458)		(NA6429, O390)	(NA9350, O1037)	(NA7015, SA41)
(U131, NA10312)		(NA6577, O391)	(NA9639, O1059)	(NA7090, SA42)
(U137, NA10457)		(NA6578, O392)	(NA9792, O1076)	(NA7430, SA43)
(U141, NA10669)		(NA6620, O395)	(NA9891, O1079)	(NA7477, SA44)
(U144, NA10811)			(NA13635, O1104)	(NA7521, SA45)
(U146, NA10987)				(NA7892, SA56)
(U148, NA11013)				(NA9324, SA61)
(U151, NA11225)				(NA9910, SA66)
(U153, NA11367)				(NA10342, SA68)
(U154, NA11456)				(NA13390, SA70)
(U155, NA11466)				(NA13635, SA71)
(U158, NA13110)
(U160, NA13253)
(U175, NA13414)
(U177, NA13551)
(U179, NA13626)
(U181, NA13635)
(U187, NA14000)

Spike proteins (Asia) which were found to be identical with spike proteins from other five continents
A1	A71	A115	A171	A207	A239	A280	A344	A388
A8	A76	A121	A173	A210	A244	A282	A345	A391
A12	A77	A122	A174	A211	A245	A283	A348	A394
A14	A78	A126	A175	A212	A247	A284	A351	A395
A15	A85	A127	A177	A213	A249	A286	A354	A396
A19	A89	A128	A178	A214	A253	A291	A356	A399
A23	A90	A133	A180	A215	A254	A292	A357	A401
A26	A91	A134	A181	A216	A255	A293	A358	A404
A28	A93	A135	A182	A217	A257	A304	A359	A405
A30	A95	A138	A183	A218	A258	A305	A360	A408
A31	A96	A140	A191	A219	A264	A322	A367	A413
A32	A97	A142	A193	A221	A265	A323	A373	A418
A33	A100	A148	A195	A222	A267	A324	A375	A419
A34	A101	A158	A196	A223	A273	A325	A378	A430
A36	A102	A159	A198	A224	A274	A333	A379	A431
A43	A103	A161	A199	A230	A275	A335	A380
A44	A105	A163	A200	A233	A276	A340	A381
A58	A109	A164	A201	A234	A277	A341	A383
A67	A111	A166	A205	A235	A278	A342	A386
A69	A114	A170	A206	A238	A279	A343	A387

Spike proteins (Africa) which were found to be identical with spike proteins from other five continents
AF1	AF34	AF58	AF79	AF101	AF117	AF128	AF145	AF156	AF227	AF263
AF2	AF38	AF64	AF82	AF103	AF118	AF130	AF146	AF165	AF229	AF268
AF3	AF46	AF69	AF83	AF104	AF119	AF131	AF147	AF168	AF230	AF271
AF8	AF47	AF71	AF88	AF105	AF120	AF133	AF149	AF179	AF231	AF278
AF9	AF48	AF72	AF90	AF108	AF121	AF134	AF151	AF195	AF247	AF283
AF19	AF50	AF73	AF92	AF114	AF123	AF137	AF152	AF196	AF248
AF31	AF51	AF76	AF99	AF115	AF125	AF138	AF154	AF223	AF254

Spike proteins (Europe) which were found to be identical with spike proteins from other five continents
U2	U18	U41	U63	U85	U103	U117	U137	U158
U3	U22	U42	U66	U88	U104	U118	U141	U160
U4	U23	U45	U72	U90	U106	U119	U144	U175
U7	U26	U47	U76	U92	U107	U121	U146	U177
U8	U28	U52	U78	U93	U111	U122	U148	U179
U9	U30	U53	U79	U94	U112	U123	U151	U181
U10	U34	U54	U80	U98	U113	U124	U153	U187
U11	U36	U55	U82	U99	U114	U126	U154
U17	U38	U57	U84	U100	U116	U131	U155

Spike proteins (North America) which were found to be identical with spike proteins from other five continents
NA7	NA3911	NA4837	NA5595	NA6161	NA6510	NA6810	NA7300	NA8703	NA9792	NA13390
NA231	NA3986	NA4861	NA5606	NA6178	NA6515	NA6816	NA7312	NA8787	NA9834	NA13404
NA377	NA3988	NA4897	NA5627	NA6185	NA6527	NA6848	NA7355	NA8817	NA9891	NA13414
NA389	NA4024	NA4989	NA5644	NA6193	NA6540	NA6857	NA7375	NA8824	NA9910	NA13438
NA390	NA4028	NA5001	NA5645	NA6240	NA6550	NA6862	NA7402	NA9075	NA10257	NA13444
NA402	NA4051	NA5011	NA5666	NA6244	NA6553	NA6903	NA7430	NA9091	NA10276	NA13465
NA902	NA4061	NA5022	NA5687	NA6258	NA6566	NA6916	NA7431	NA9180	NA10312	NA13478
NA928	NA4117	NA5041	NA5693	NA6276	NA6577	NA6936	NA7453	NA9189	NA10342	NA13551
NA992	NA4169	NA5083	NA5722	NA6293	NA6578	NA6944	NA7477	NA9264	NA10442	NA13554
NA1104	NA4243	NA5105	NA5744	NA6299	NA6602	NA6949	NA7510	NA9265	NA10457	NA13578
NA1131	NA4270	NA5137	NA5818	NA6305	NA6616	NA6953	NA7521	NA9283	NA10669	NA13614
NA1221	NA4296	NA5149	NA5819	NA6324	NA6620	NA6962	NA7553	NA9284	NA10811	NA13626
NA1298	NA4375	NA5151	NA5829	NA6334	NA6622	NA6969	NA7557	NA9324	NA10943	NA13635
NA1348	NA4394	NA5167	NA5830	NA6373	NA6628	NA7000	NA7576	NA9330	NA10987	NA13668
NA1445	NA4436	NA5182	NA5837	NA6388	NA6630	NA7015	NA7644	NA9333	NA11013	NA13704
NA2065	NA4448	NA5188	NA5874	NA6390	NA6659	NA7025	NA7679	NA9350	NA11135	NA13798
NA2680	NA4508	NA5194	NA5927	NA6395	NA6661	NA7056	NA7811	NA9425	NA11225	NA13841
NA3228	NA4537	NA5200	NA5977	NA6396	NA6683	NA7057	NA7832	NA9455	NA11305	NA13870
NA3313	NA4541	NA5202	NA5992	NA6399	NA6687	NA7060	NA7845	NA9458	NA11367	NA13913
NA3387	NA4550	NA5205	NA6060	NA6406	NA6696	NA7090	NA7884	NA9568	NA11456	NA13948
NA3438	NA4559	NA5282	NA6067	NA6418	NA6704	NA7129	NA7892	NA9592	NA11466	NA14000
NA3477	NA4620	NA5372	NA6071	NA6421	NA6707	NA7198	NA7901	NA9597	NA11560	NA14015
NA3583	NA4637	NA5471	NA6080	NA6424	NA6713	NA7199	NA7914	NA9612	NA11576	NA14026
NA3658	NA4658	NA5485	NA6096	NA6429	NA6723	NA7224	NA8509	NA9639	NA11874
NA3752	NA4715	NA5490	NA6101	NA6445	NA6751	NA7227	NA8514	NA9663	NA13110
NA3768	NA4720	NA5516	NA6142	NA6450	NA6756	NA7230	NA8519	NA9674	NA13253
NA3797	NA4756	NA5538	NA6146	NA6465	NA6775	NA7249	NA8565	NA9724	NA13280
NA3873	NA4780	NA5574	NA6148	NA6492	NA6780	NA7286	NA8570	NA9763	NA13307
NA3895	NA4832	NA5579	NA6155	NA6499	NA6794	NA7299	NA8646	NA9776	NA13362

Max and min of frequencies		A	R	N	D	C	Q	E	G	H	I	L	K	M	F	P	S	T	W	Y	V
Africa	Max	80	44	89	62	41	63	49	84	19	79	109	62	15	78	60	101	98	13	56	98
Africa	Min	73	40	85	58	38	59	45	78	14	73	102	57	13	72	55	94	90	11	49	93
Asia	Max	80	44	89	63	41	63	49	84	19	78	110	62	15	79	59	101	101	13	57	98
Asia	Min	73	39	80	55	36	56	45	76	15	72	100	55	13	68	52	90	90	11	49	90
Europe	Max	80	43	89	63	41	63	49	84	19	79	110	62	15	79	59	101	98	13	57	99
Europe	Min	75	38	84	59	39	59	46	79	16	74	102	58	13	74	54	96	90	11	50	93
Oceania	Max	81	43	90	62	41	63	49	84	18	78	109	62	15	79	59	100	98	12	56	99
Oceania	Min	72	37	81	58	36	57	44	74	15	71	97	56	13	71	52	92	88	10	43	89
South America	Max	82	44	91	63	42	64	49	85	20	79	111	64	15	80	60	102	99	13	58	100
South America	Min	60	32	63	46	32	39	34	63	11	55	82	43	9	55	43	76	77	8	36	82
North America	Max	80	43	89	62	41	63	48	83	18	78	109	62	14	79	58	101	98	12	57	98
North America	Min	75	38	82	57	37	59	45	79	16	73	105	57	13	73	57	92	93	11	50	92

Spike proteins (Oceania) which were found to be identical with spike proteins from other five continents
O3	O105	O373	O392	O419	O770	O1037
O5	O114	O374	O395	O422	O798	O1059
O28	O148	O377	O398	O504	O850	O1076
O36	O201	O387	O400	O625	O886	O1079
O43	O225	O388	O401	O631	O889	O1104
O58	O238	O389	O402	O633	O1017
O65	O368	O390	O404	O645	O1032
O83	O370	O391	O415	O751	O1035

Spike proteins (South America) which were found to be identical with spike proteins from other five continents
SA1	SA13	SA22	SA29	SA35	SA44	SA66
SA4	SA18	SA25	SA30	SA38	SA45	SA68
SA5	SA19	SA26	SA31	SA41	SA56	SA70
SA7	SA20	SA27	SA32	SA42	SA61	SA71
SA11	SA21	SA28	SA33	SA43	SA63
