Abstract
Since December 2019, a severe pandemic of pneumonia, COVID-19, associated with a novel coronavirus (SARS-CoV-2), has emerged in Wuhan, China, and spread throughout the world. Because RNA viruses have a high mutation rate, we wanted to determine whether this virus is also prone to mutation. For this purpose, we selected four major structural proteins (Spike protein (S), Envelope protein (E), Membrane glycoprotein (M) and Nucleocapsid phosphoprotein (N)) and the ORF8 protein of 100 SARS-CoV-2 isolates from fifteen countries in the NCBI database, and compared them with the reference sequence, Wuhan NC_045512.2, which was the first SARS-CoV-2 isolate to be sequenced. Multiple sequence alignment of the amino acid sequences revealed substitutions and a deletion in the S protein at 13 different sites in isolates from five countries (China, USA, Finland, India and Australia) compared with the reference sequence. Similarly, alignment of the N protein revealed substitutions at three different sites in isolates from China, Spain and Japan. The M protein showed a substitution in only one isolate, from the USA, whereas no mutation was observed in the E protein of any isolate. Interestingly, in ORF8 the same substitution of leucine, a nonpolar amino acid, by serine, a polar amino acid (aa84 L to S), was observed in 23 isolates from five countries (China, USA, Spain, Taiwan and India), which may affect the conformation of the peptide. Thus, we observed several mutations in isolates sequenced after the first SARS-CoV-2 isolate, NC_045512.2, which suggests that this virus might pose a threat to the whole world; further studies are therefore needed to characterize how these mutations in different proteins affect the functionality and pathogenesis of SARS-CoV-2.
Keywords: SARS-CoV-2, COVID-19, Mutation, Phosphorylation, Serine/Threonine kinases
Abbreviations: Severe Acute Respiratory Syndrome Coronavirus 2, SARS-CoV-2; coronavirus disease 2019, COVID-19; angiotensin-converting enzyme 2, ACE2; surface glycoprotein, S; envelope protein, E; membrane glycoprotein, M; nucleocapsid phosphoprotein, N; nonstructural proteins, NSPs; World Health Organization, WHO; open reading frame, ORF
Highlights
- Since December 2019, a severe pandemic of COVID-19 associated with a novel coronavirus (SARS-CoV-2) has emerged in China.
- Substitution and deletion mutations were identified in the Spike surface glycoprotein (S).
- Only substitution mutations were identified in the Nucleocapsid phosphoprotein (N) and Membrane glycoprotein (M).
1. Introduction
An outbreak that started in early December 2019 in Wuhan City, Hubei Province, China, has claimed the lives of many people there and has since created an alarming situation around the globe. The agent causing this pandemic of pneumonia is a novel betacoronavirus named “Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)”, and the disease it causes has been named COVID-19 by the World Health Organization (WHO) (Zhu et al., 2020).
Since its first appearance in Wuhan, 82,960 individuals in China have tested positive for SARS-CoV-2, and the death toll recorded in China had reached 4634 as of May 19, 2020. The pace of the outbreak merits attention: the virus has spread to more than 200 countries, causing over 4,911,700 confirmed cases and 320,454 deaths worldwide (WHO, 2020).
Host-to-host transmission of SARS-CoV-2 is extremely high. Most patients with COVID-19 present with fever, cough, dyspnea (shortness of breath) and myalgia, together with radiological evidence of ground-glass lung opacities compatible with atypical pneumonia (Huang et al., 2020; Lu et al., 2020a). However, asymptomatic and mildly symptomatic cases have also been reported. The reported mortality rate of SARS-CoV-2 infection is about 3% (Huang et al., 2020; Lu et al., 2020a; Chen et al., 2020).
The coronaviruses responsible for this havoc around the world are enveloped, single-stranded RNA viruses with a wide host spectrum that includes humans, other mammals and birds. These viruses can cause respiratory, enteric, hepatic and neurological diseases, including gastrointestinal tract infections. Genetically, they have been classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus (Chen et al., 2020; Guan et al., 2020; Weiss and Leibowitz, 2011). The first two genera, Alphacoronavirus and Betacoronavirus, chiefly infect mammals, while the latter two, Gammacoronavirus and Deltacoronavirus, are mostly associated with infections in birds (Chan et al., 2015; Li, 2016).
Before SARS-CoV-2, six coronavirus species were known to infect humans and cause disease. HCoV-NL63 and HCoV-229E belong to the genus Alphacoronavirus, while HCoV-OC43 and HCoV-HKU1 belong to the genus Betacoronavirus. These four are the most prevalent and cause common-cold symptoms in immunocompetent individuals. The other two, which are associated with fatal illness, are zoonotic in origin: Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV), both of which belong to the genus Betacoronavirus (Tang et al., 2015).
Coronaviruses first came into the global spotlight when the Severe Acute Respiratory Syndrome (SARS) outbreak broke out in Guangdong Province, China, in 2002 and 2003. Another outbreak followed in 2012 in the form of MERS, a severe respiratory disease in the Middle East, and since December 2019 the COVID-19 outbreak that began in China has taken the world by storm (Song et al., 2019; China CDC, 2020). SARS-CoV and MERS-CoV are believed to have been transmitted from bats to palm civets or dromedary camels, respectively, and finally to humans, and both are considered highly pathogenic (Cui et al., 2019; Guan et al., 2003; Drosten et al., 2014).
The genome of SARS-CoV-2 has been sequenced. It is ~29.8 kb long and contains 14 ORFs encoding 27 proteins. Two genes located at the 5′ end of the genome, ORF1ab and ORF1a, encode two long polypeptides, pp1ab and pp1a, respectively. These polypeptides are proteolytically processed into 15 nonstructural proteins (nsps): nsp1 to nsp10 and nsp12 to nsp16. The genes located at the 3′ end encode the four structural proteins, namely the Spike surface glycoprotein (S), small Envelope protein (E), Membrane protein (M) and Nucleocapsid protein (N), and eight accessory proteins: 3a, 3b, p6, 7a, 7b, 8b, 9b and ORF14 (Wu et al., 2020a).
Binding of the virus to receptors on the host cell and fusion with the cell membrane are mediated by the Spike surface glycoprotein (Li, 2016; Kandeel et al., 2018), while the N protein interacts with the viral RNA to form ribonucleoprotein complexes. The E protein, together with the M protein, contributes to virion assembly, and the E protein is also known to have ion channel activity (Kandeel et al., 2018; Wu et al., 2020b; Chan et al., 2020; Lu et al., 2020b).
The SARS-CoV-2 outbreak has alarmed people around the world because the virus is subject to strong immunological pressure in humans and may consequently acquire the ability to evade the human immune system by accumulating mutations. Over time, such mutations could alter viral infectivity, virulence and transmissibility, in terms of both the pace of spread and host selection (Lucas et al., 2001). Hence, it is essential to study carefully the pattern and frequency of the mutations the virus has accumulated so far in order to anticipate their future implications. Fortunately, complete genomes of the virus have been sequenced and shared by the worldwide scientific community, and these could be used in the current study. Taking advantage of the genomic information currently available, we analyzed SARS-CoV-2 isolates from fifteen countries (China, USA, Spain, Japan, Pakistan, India, Nepal, Australia, Italy, Turkey, Brazil, Sweden, Finland, Viet Nam and Taiwan) for mutations in four major structural proteins (Spike glycoprotein (S), Envelope protein (E), Membrane glycoprotein (M) and Nucleocapsid phosphoprotein (N)) and one accessory protein, ORF8, and compared them with the reference sequence, the Wuhan isolate NC_045512.2.
2. Materials and methods
2.1. Gene data collection
The amino acid sequences of the four structural proteins (Spike glycoprotein (S), Nucleocapsid (N), Envelope (E) and Membrane (M)) and of ORF8 from SARS-CoV-2 isolates from fifteen countries (China, USA, Spain, Japan, Pakistan, India, Nepal, Australia, Italy, Turkey, Brazil, Sweden, Finland, Viet Nam and Taiwan) were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/).
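The text does not say how the sequences were retrieved from NCBI. As a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint is used, the translated coding sequences of a nucleotide genome record can be requested as protein FASTA with `rettype=fasta_cds_aa`. The helper names `efetch_cds_protein_url` and `parse_fasta` are hypothetical, and accessions are written in NCBI's unspaced form (e.g. `MT049951` for "MT 049951").

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_cds_protein_url(accessions: list[str]) -> str:
    """Build an efetch URL returning the translated CDS products (protein
    FASTA) of one or more nucleotide records, e.g. "MT049951"."""
    params = {
        "db": "nuccore",             # isolates are nucleotide genome records
        "id": ",".join(accessions),  # batch several accessions in one request
        "rettype": "fasta_cds_aa",   # protein FASTA of each annotated CDS
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header line without '>': sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


print(efetch_cds_protein_url(["MT049951", "MT226610"]))
print(parse_fasta(">toy record\nMFVF\nLVLL\n"))  # {'toy record': 'MFVFLVLL'}
```

Fetching the URL (for example with `urllib.request.urlopen(url).read().decode()`) and passing the text to `parse_fasta` would give one record per coding sequence. NCBI asks clients without an API key to stay under three requests per second, which is one reason to batch accessions in a single call.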
2.2. Multiple sequence alignment
DNASTAR Lasergene (7.1) and the ClustalW program in MEGA (7.1) were used to perform multiple sequence alignment of the four structural (S, E, M and N) and ORF8 proteins of the 100 targeted SARS-CoV-2 isolates from the fifteen countries listed in Section 2.1. For mutational analysis, the previously reported SARS-CoV-2 isolate Wuhan NC_045512.2 was used as the reference sequence (Ceraolo and Giorgi, 2020).
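Once an alignment is available, listing mutations is a column-by-column comparison of each isolate with the reference. The following is a hedged sketch, not the authors' pipeline: it assumes the reference and one isolate have already been extracted as a gapped pair from the ClustalW alignment, and numbers positions by reference residues so that gaps in the reference do not shift the numbering. `D614G` and `Y145del` are standard short forms of this paper's "aa614 D to G" notation; the `ins` label is an ad hoc choice, and the function name `call_mutations` is hypothetical.

```python
def call_mutations(ref_aln: str, qry_aln: str) -> list[str]:
    """List query differences from the reference in one aligned pair.

    Positions are 1-based and count reference residues only, so gaps in the
    reference do not shift numbering. Labels: 'D614G' (substitution),
    'Y145del' (deletion), '145insA' (ad hoc label for an insertion after
    reference residue 145). 'X' in the query (unresolved residue) is skipped.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations: list[str] = []
    ref_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            # Reference gap: the query residue (if any) is an insertion.
            if q != "-":
                mutations.append(f"{ref_pos}ins{q}")
            continue
        ref_pos += 1
        if q == "-":
            mutations.append(f"{r}{ref_pos}del")
        elif q not in (r, "X"):
            mutations.append(f"{r}{ref_pos}{q}")
    return mutations


# Toy aligned pair (hypothetical, not from the isolates): a deletion and a substitution.
print(call_mutations("MFVYLLP", "MFV-LLS"))  # ['Y4del', 'P7S']
```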
3. Results
3.1. Mutational analyses of the SARS-CoV-2 structural proteins
In the present study, we aligned the amino acid sequences of four structural proteins (Spike surface glycoprotein (S), Membrane glycoprotein (M), Envelope protein (E) and Nucleocapsid phosphoprotein (N)) and of the ORF8 protein of the novel SARS-CoV-2.
3.2. Mutational analysis identified substitution and deletion mutations in Spike surface glycoprotein (S)
The Spike surface glycoprotein of SARS-CoV-2 is 1273 amino acids long and plays an essential role in binding of the virus to its receptor on the host cell, angiotensin-converting enzyme 2 (ACE2), and in initiating infection (Tang et al., 2015; Wu et al., 2020b). The present study identified amino acid substitutions and a deletion in some isolates from five countries compared with the reference sequence (NC_045512.2). Two isolates from China, with accession numbers MT 049951 and MT 226610, have the substitutions (aa28 Y to N) and (aa74 N to K), respectively. Similarly, amino acid substitutions were observed in ten USA isolates: MT 159716 (aa157 F to L), MT 246490 (aa614 D to G), MT 251973 (aa614 D to G), MT 251974 (aa614 D to G), MT 251976 (aa614 D to G), MT 251979 (aa614 D to G), MT 163720 (aa655 H to Y), MT 093571 (aa797 F to C) and MT 050493 (aa930 A to V), while isolate MT 246488 carries two substitutions, (aa5 L to F) and (aa476 G to S) (Fig. 1). Likewise, the isolates from Finland (MT 020781) and Australia (MT 007544) have the substitutions (aa49 H to Y) and (aa247 S to R), respectively, whereas the isolate from India (MT 012098) has a deletion (aa145 Y) as well as a substitution (aa408 R to I) (Fig. 1).
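The Spike results above can be tallied programmatically as a consistency check. This sketch simply re-encodes the substitutions and the deletion listed in this section (accessions and countries exactly as given in the text, with mutations in short notation) and recovers the counts quoted in the Abstract and Discussion: 13 distinct sites in isolates from five countries, with D614G in five USA isolates.

```python
import re
from collections import Counter, defaultdict

# (country, accession, mutation) for the Spike changes listed in this section.
S_MUTATIONS = [
    ("China", "MT 049951", "Y28N"),
    ("China", "MT 226610", "N74K"),
    ("USA", "MT 159716", "F157L"),
    ("USA", "MT 246490", "D614G"),
    ("USA", "MT 251973", "D614G"),
    ("USA", "MT 251974", "D614G"),
    ("USA", "MT 251976", "D614G"),
    ("USA", "MT 251979", "D614G"),
    ("USA", "MT 163720", "H655Y"),
    ("USA", "MT 093571", "F797C"),
    ("USA", "MT 050493", "A930V"),
    ("USA", "MT 246488", "L5F"),
    ("USA", "MT 246488", "G476S"),
    ("Finland", "MT 020781", "H49Y"),
    ("Australia", "MT 007544", "S247R"),
    ("India", "MT 012098", "Y145del"),
    ("India", "MT 012098", "R408I"),
]


def site(mutation: str) -> int:
    """Reference position from a label such as 'D614G' or 'Y145del'."""
    return int(re.search(r"\d+", mutation).group())


sites = {site(m) for _, _, m in S_MUTATIONS}
per_mutation = Counter(m for _, _, m in S_MUTATIONS)
isolates_by_country = defaultdict(set)
for country, accession, _ in S_MUTATIONS:
    isolates_by_country[country].add(accession)

print(len(sites))                   # 13
print(per_mutation.most_common(1))  # [('D614G', 5)]
print({c: len(a) for c, a in isolates_by_country.items()})
# {'China': 2, 'USA': 10, 'Finland': 1, 'Australia': 1, 'India': 1}
```

Counting distinct isolates per country (rather than mutation records) keeps MT 246488 and MT 012098, which each carry two changes, from being counted twice.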
3.3. Mutational analysis identified the substitution mutations in Nucleocapsid phosphoprotein (N)
Mutational analysis of the N protein (419 amino acid residues) of the 100 isolates from fifteen countries revealed amino acid substitutions at three different sites in six isolates (Fig. 2). Interestingly, three isolates from Spain, with accession numbers MT 233519, MT 233521 and MT 233523, share the same substitution (aa197 S to L), while two isolates, one from Japan (LC429905) and one from China (MT 123290), share a substitution at another position (aa344 P to S). In addition, one isolate from Japan (LC534419) has a substitution at (aa289 Q to H) (Fig. 2).
3.4. Mutational analysis of Envelope protein (E)
The Envelope protein of SARS-CoV-2 consists of 75 amino acid residues and contributes to virion assembly (Lucas et al., 2001; Zhu et al., 2018). Alignment of the E protein sequences of all 100 isolates with the corresponding E protein of the reference isolate (NC_045512.2) revealed no mutations, indicating that the SARS-CoV-2 E protein is relatively conserved (data not shown).
3.5. Substitution mutation was identified in Membrane glycoprotein (M) in one isolate of USA
The Membrane glycoprotein of SARS-CoV-2 consists of 222 amino acid residues and is involved in the assembly of new virus particles (Lu et al., 2020b; Hu et al., 2018). The present study revealed an amino acid substitution in only one isolate, from the USA (accession number MT 163721), at position 70 (aa70 V to I) compared with the reference counterpart (NC_045512.2) (Fig. 3).
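The E and M results can also be expressed as percent identity to the reference. The sketch below computes identity over the reference's aligned residues (so the denominator is the reference length, and a query gap counts as a mismatch). The sequences are hypothetical placeholders of the stated lengths (75 and 222 residues), not the real E and M sequences; they only illustrate that a single substitution in M gives 221/222 ≈ 99.55% identity, while an unchanged E gives 100%.

```python
def percent_identity(ref_aln: str, qry_aln: str) -> float:
    """Percent of reference residues matched identically by the query.

    Columns where the reference has a gap are skipped, so the denominator
    is the reference length; a query gap counts as a mismatch.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_columns = [(r, q) for r, q in zip(ref_aln, qry_aln) if r != "-"]
    if not ref_columns:
        raise ValueError("reference has no residues")
    identical = sum(r == q for r, q in ref_columns)
    return 100.0 * identical / len(ref_columns)


# Hypothetical placeholder sequences of the stated lengths, not real E/M sequences.
e_ref = "A" * 75
m_ref = "V" * 222
m_qry = m_ref[:69] + "I" + m_ref[70:]  # one substitution at position 70 (V to I)

print(percent_identity(e_ref, e_ref))            # 100.0
print(round(percent_identity(m_ref, m_qry), 2))  # 99.55
```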
3.6. Characterization of the SARS-CoV-2 ORF8 protein identified a substitution of leucine by serine
The ORF8 gene encodes a protein of 121 amino acid residues (Wu et al., 2020b). Interestingly, in ORF8 we found the same substitution of leucine, a nonpolar amino acid, by serine, a polar amino acid (aa84 L to S), in 15 USA isolates (MT 163717, MT 163718, MT 163719, MT 163720, MT 163721, MT 246472, MT 246473, MT 246474, MT 246488, MT 246489, MT 251975, MT 251977, MT 251978, MT 251980 and MT 251972), one of which, MT 251972, carries a second substitution (aa112 H to Q). Three Spanish isolates (MT 233519, MT 233521, MT 233523), three Chinese isolates (MT 049951, MT 226610, MT 123292) and one isolate each from Taiwan (MT 066175) and India (MT 050493) carry the same substitution (aa84 L to S) compared with the reference counterpart (NC_045512.2). This replacement of a nonpolar amino acid by a polar one may affect the conformation of the peptide (Fig. 4).
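The polarity argument for the ORF8 substitutions can be made explicit with a lookup table. Side-chain classification is not fully standardized (glycine, cysteine, tyrosine, tryptophan and histidine are placed in different groups by different textbooks), so the table below is one common scheme and an assumption of this sketch; `describe_substitution` is a hypothetical helper name.

```python
import re

# One common textbook grouping of side chains (an assumption: schemes differ
# for G, C, Y, W and H).
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "nonpolar"),
    **dict.fromkeys("STCNQY", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
}


def describe_substitution(label: str) -> str:
    """Describe the side-chain class change for a label such as 'L84S'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label}")
    ref, alt = match.group(1), match.group(3)
    before, after = SIDE_CHAIN_CLASS[ref], SIDE_CHAIN_CLASS[alt]
    if before == after:
        return f"{label}: {before}, class unchanged"
    return f"{label}: {before} -> {after}"


# The two ORF8 substitutions reported in this section.
for label in ["L84S", "H112Q"]:
    print(describe_substitution(label))
# L84S: nonpolar -> polar
# H112Q: basic -> polar
```

A change of class is only a coarse signal; whether L84S actually alters ORF8 conformation would need structural modelling or experiments.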
4. Discussion
SARS-CoV-2 is an enveloped, single-stranded RNA virus with a genome of about 29.8 kb, annotated to contain 14 ORFs encoding 27 proteins (Wu et al., 2020b). The four major structural proteins are the Spike surface glycoprotein (S), Envelope protein (E), Membrane glycoprotein (M) and Nucleocapsid phosphoprotein (N). The Spike surface glycoprotein plays an essential role in binding of the virus to angiotensin-converting enzyme 2 (ACE2) receptors on the host cell and is thus crucial in determining host tropism and transmissibility (Li, 2016; Zhu et al., 2018). The N protein is one of the most abundantly expressed proteins during the early stages of infection and interacts with viral RNA to form the ribonucleoprotein. Because it elicits a strong immune response, this protein is an attractive diagnostic target. The E protein contributes to virion assembly, while the M protein participates in the assembly of new virus particles (Wu et al., 2020b; Chan et al., 2020; Lu et al., 2020b).
RNA viruses have a high mutation rate and are therefore prone to evolving drug resistance and escaping immune surveillance (Su et al., 2016), so constant surveillance of emerging mutations is needed. Furthermore, recent studies have reported certain amino acid mutations in the structural and accessory proteins of SARS-CoV-2 (Wang et al., 2020). In the present study, the amino acid sequences of the four structural proteins (S, N, E and M) and the ORF8 protein of 100 isolates from fifteen countries were analyzed for mutations relative to the reference strain. We identified various amino acid substitutions in the spike surface glycoprotein of different isolates; one of them (aa614 D to G) occurs in five different isolates from the USA, while one Indian isolate carries both a deletion and a substitution compared with the reference sequence (NC_045512.2) (Fig. 1). Interestingly, previous studies showed that a single glycine-to-glutamic acid (G to E) substitution completely abolished respiratory droplet transmission of Eurasian avian-like H1N1 (EAH1N1) swine influenza viruses (Zeng et al., 2017), and that a glycine-to-aspartic acid (G to D) substitution significantly attenuated H5N1 virus virulence in mice (Feng et al., 2015). These findings suggest that substitutions involving glycine may play an important role in transmission and virulence; further studies are therefore required to explore how the mutations identified here influence the transmission and tissue/host tropism of SARS-CoV-2.
Analysis of the N protein from different isolates revealed three different amino acid substitutions at three locations: three Spanish isolates carry the substitution (aa197 S to L), one Japanese isolate carries (aa289 Q to H), and one Japanese and one Chinese isolate share the substitution (aa344 P to S) (Fig. 2). Likewise, the present study identified a mutation in the M protein of only one isolate, from the USA (MT 163721) (Fig. 3). However, no mutation was found in the envelope protein, which is consistent with previous studies indicating that the E protein of SARS-CoV-2 is relatively conserved (Su et al., 2016). Interestingly, in ORF8 we found the same substitution of leucine, a nonpolar amino acid, by serine, a polar amino acid (aa84 L to S), in 23 different isolates compared with the reference counterpart (NC_045512.2) (Fig. 4). Substitutions that introduce a serine, especially L to S, could theoretically create a novel phosphorylation site for the Serine/Threonine kinases of the mammalian host, which could in turn affect the conformation of the peptide. The results of the present study suggest that, like most other RNA viruses, this virus is prone to mutation and may become a greater threat to the world over time. Further studies are therefore required to characterize how these amino acid substitutions in different proteins affect the functionality and pathogenesis of SARS-CoV-2. Moreover, a continuous surveillance system is required to characterize the mutations this virus undergoes, which is important for developing more effective vaccines.
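The phosphorylation argument can be screened mechanically by flagging substitutions that gain or lose a serine or threonine. The sketch below applies this filter to the substitutions reported in this study (deletions excluded). It only flags candidate Ser/Thr acceptor residues; tyrosine is ignored because the discussion concerns Ser/Thr kinases. It says nothing about whether a kinase consensus motif or structural accessibility is present, which would require separate prediction or experiments. `ser_thr_effect` is a hypothetical helper name.

```python
import re

# Ser/Thr only: the discussion concerns host Serine/Threonine kinases.
SER_THR = {"S", "T"}

# Substitutions reported in this study (deletions excluded), by protein.
REPORTED = {
    "S": ["L5F", "Y28N", "H49Y", "N74K", "F157L", "S247R",
          "R408I", "G476S", "D614G", "H655Y", "F797C", "A930V"],
    "N": ["S197L", "Q289H", "P344S"],
    "M": ["V70I"],
    "ORF8": ["L84S", "H112Q"],
}


def ser_thr_effect(label: str) -> str:
    """Return 'gain', 'loss' or 'none' for a substitution like 'L84S'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label}")
    ref, alt = match.group(1), match.group(3)
    if alt in SER_THR and ref not in SER_THR:
        return "gain"  # candidate new phosphoacceptor residue
    if ref in SER_THR and alt not in SER_THR:
        return "loss"  # an existing candidate site is removed
    return "none"


effects = {
    f"{protein}:{label}": ser_thr_effect(label)
    for protein, labels in REPORTED.items()
    for label in labels
}
print([k for k, v in effects.items() if v == "gain"])
# ['S:G476S', 'N:P344S', 'ORF8:L84S']
print([k for k, v in effects.items() if v == "loss"])
# ['S:S247R', 'N:S197L']
```

Under this filter only three of the reported substitutions, including L84S, introduce a candidate serine, while two remove one, which is why the argument is framed around serine-introducing changes rather than all mutations.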
Declaration of competing interest
The authors declare that there is no conflict of interest.
References
- Ceraolo C., Giorgi F.M. Genomic variance of the 2019-nCoV coronavirus. J. Med. Virol. 2020;92:522–528. doi: 10.1002/jmv.25700.
- Chan J.F., Lau S.K., To K.K. Middle East respiratory syndrome coronavirus: another zoonotic betacoronavirus causing SARS-like disease. Clin. Microbiol. Rev. 2015;28:465–522. doi: 10.1128/CMR.00102-14.
- Chan J.F., Kok K.H., Zhu Z. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg. Microbes Infect. 2020;9:221–236. doi: 10.1080/22221751.2020.1719902.
- Chen N., Zhou M., Dong X. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395:507–513. doi: 10.1016/S0140-6736(20)30211-7.
- China CDC. Tracking the epidemic. 2020. http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm?from=timeline#Beijing%20Municipality%20Update
- Cui J., Li F., Shi Z.L. Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 2019;17:181–192. doi: 10.1038/s41579-018-0118-9.
- Drosten C., Kellam P., Memish Z.A. Evidence for camel-to-human transmission of MERS coronavirus. N. Engl. J. Med. 2014;371:1359–1360. doi: 10.1056/NEJMc1409847.
- Feng X., Wang Z., Shi J. Glycine at position 622 in PB1 contributes to the virulence of H5N1 avian influenza virus in mice. J. Virol. 2015;90:1872–1879. doi: 10.1128/JVI.02387-15.
- Guan Y.J., Zheng B.J., He Y.Q. Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China. Science. 2003;302:276–278. doi: 10.1126/science.1087139.
- Guan W.J., Ni Z., Hu Y. Clinical characteristics of 2019 novel coronavirus infection in China. N. Engl. J. Med. 2020;382:1708–1720. doi: 10.1056/NEJMoa2002032.
- Hu D., Zhu C., Ai L. Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats. Emerg. Microbes Infect. 2018;7:154–158. doi: 10.1038/s41426-018-0155-5.
- Huang C., Wang Y., Li X. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506. doi: 10.1016/S0140-6736(20)30183-5.
- Kandeel M., Al-Taher A., Li H. Molecular dynamics of Middle East Respiratory Syndrome Coronavirus (MERS CoV) fusion heptad repeat trimers. Comp. Biol. Chem. 2018;75:205–212. doi: 10.1016/j.compbiolchem.2018.05.020.
- Li F. Structure, function, and evolution of coronavirus Spike proteins. Annu. Rev. Virol. 2016;3:237–261. doi: 10.1146/annurev-virology-110615-042301.
- Lu R., Zhao X., Li J. Genomic characterization and epidemiology of 2019 novel coronavirus: implications of virus origins and receptor binding. Lancet. 2020;395:565–574. doi: 10.1016/S0140-6736(20)30251-8.
- Lu R., Zhao X., Li J. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020;395:565–574. doi: 10.1016/S0140-6736(20)30251-8.
- Lucas M., Karrer U., Lucas A. Viral escape mechanisms—escapology taught by viruses. Int. J. Exp. Pathol. 2001;82:269–286. doi: 10.1046/j.1365-2613.2001.00204.x.
- Song Z., Xu Y., Bao L. From SARS to MERS, thrusting coronaviruses into the spotlight. Viruses. 2019;11:E59. doi: 10.3390/v11010059.
- Su S., Wong G., Shi W. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. Trends Microbiol. 2016;24:490–502. doi: 10.1016/j.tim.2016.03.003.
- Tang Q., Song Y., Shi M. Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition. Sci. Rep. 2015;5. doi: 10.1038/srep17155.
- Wang C., Liu Z., Chen Z. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J. Med. Virol. 2020;20:1–8. doi: 10.1002/jmv.25762.
- Weiss S.R., Leibowitz J.L. Coronavirus pathogenesis. Adv. Virus Res. 2011;81:85–164. doi: 10.1016/B978-0-12-385885-6.00009-2.
- WHO. Coronavirus disease (COVID-19) dashboard. 2020. https://covid19.who.int/
- Wu F., Zhao S., Yu B. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579:265–269. doi: 10.1038/s41586-020-2008-3.
- Wu A., Peng Y., Huang B. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe. 2020;27:325–328. doi: 10.1016/j.chom.2020.02.001.
- Zeng W., Huanliang Y., Yan C. A single-amino-acid substitution at position 225 in hemagglutinin alters the transmissibility of Eurasian avian-like H1N1 swine influenza virus in guinea pigs. J. Virol. 2017;91:800–817. doi: 10.1128/JVI.00800-17.
- Zhu Z., Zhang Z., Chen W. Predicting the receptor-binding domain usage of the coronavirus based on kmer frequency on spike protein. Infect. Genet. Evol. 2018;61:183–184. doi: 10.1016/j.meegid.2018.03.028.
- Zhu N., Zhang D., Wang W. A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 2020;382:727–733. doi: 10.1056/NEJMoa2001017.