Identification of the hyper-variable genomic hotspot for the novel coronavirus SARS-CoV-2

Feng Wen; Hai Yu; Jinyue Guo; Yong Li; Kaijian Luo; Shujian Huang

doi:10.1016/j.jinf.2020.02.027

letter

J Infect. 2020 Mar 5;80(6):671–693. doi: 10.1016/j.jinf.2020.02.027

PMCID: PMC7126159 PMID: 32145215

Dear editor

A recent study in this journal examined the genomes of the novel SARS-like coronavirus (SARS-CoV-2) in China and suggested that SARS-CoV-2 had undergone genetic recombination with SARS-related CoV¹. By February 14, 2020, a total of 66,576 confirmed cases of COVID-19, the disease caused by SARS-CoV-2, had been reported in China, with 1524 deaths, according to the Chinese CDC (http://2019ncov.chinacdc.cn/2019-nCoV/). Several full genomic sequences of this virus have been released for the study of its evolutionary origin and molecular characteristics²⁻⁴. Here, we analyzed the mutations that may have arisen after the virus became epidemic among humans, as well as the mutations that may have contributed to human adaptation.

The BetaCoV sequences were downloaded from the GISAID platform⁵ on February 3, 2020. A total of 58 accessions were available, among which BetaCoV/bat/Yunnan/RaTG13/2013 is a known close relative of SARS-CoV-2. Four accessions, namely BetaCov/Italy/INM1/2020, BetaCov/Italy/INM2/2020, BetaCoV/Kanagawa/1/2020, and BetaCoV/USA/IL1/2020, were excluded because their sequences were truncated or contained multiple ambiguous nucleotides. A total of 54 accessions (Supplementary Table 1) isolated from humans were used in the following analysis. The annotated genes of SARS coronavirus (NC_004718.3)⁶ were used to define the protein products of SARS-CoV-2. The ORF1ab, S, E, M, and N genes were translated into protein sequences, and all loci without experimental evidence were excluded. First, the protein sequences of SARS-CoV-2 were compared with those of RaTG13, human SARS (NC_004718.3), bat SARS (DQ022305.2), and human MERS (NC_019843.3) by calculating the similarity within a sliding window (Fig. 1A). The window size was set to 500 for ORF1ab and S, and to 50 for proteins E, M, and N because of their short length. SARS-CoV-2 was highly similar to RaTG13 isolated from bats, showing 96% identity based on the whole nucleotide sequences and 83% based on the protein sequences, suggesting a bat zoonotic origin of SARS-CoV-2. ORF1a and the head of S appeared to have diverged from those of other beta coronaviruses.
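The sliding-window comparison can be illustrated with a minimal sketch. It assumes the two protein sequences are already aligned to the same length (gaps written as "-") and uses plain percent identity as the similarity measure; the letter does not state which measure or software was used, so the function name `sliding_identity`, the toy sequences, and the window of 5 are illustrative assumptions only (the windows in the letter are 500 and 50).

```python
def sliding_identity(ref, query, window, step=1):
    """Percent identity between two aligned sequences in each sliding window."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or window > len(ref):
        raise ValueError("window must be between 1 and the alignment length")
    # 1 where the aligned residues agree and are not gaps, else 0
    matches = [int(a == b and a != "-") for a, b in zip(ref, query)]
    results = []
    for start in range(0, len(ref) - window + 1, step):
        identical = sum(matches[start:start + window])
        results.append((start, 100.0 * identical / window))
    return results


# Toy aligned fragments (hypothetical, not real viral sequences)
ref = "MKTAYIAKQR"
query = "MKTAYLAKQK"
profile = sliding_identity(ref, query, window=5)
for start, pct in profile:
    print(f"window starting at {start}: {pct:.0f}% identity")
```

A larger window smooths the profile, which is presumably why a shorter window was chosen for the short E, M, and N proteins.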

The molecular phylogenetic tree (Fig. 1B) was built using the maximum likelihood method based on the JTT matrix-based model⁷. It indicated that the protein sequences of the SARS-CoV-2 isolates were over 99% similar. Twenty-eight viruses shared the same protein sequences and could represent the original strain circulating in humans. The other viruses differed from it by only a few mutations. This indicates that, as expected, the virus could have evolved for only a very short time after gaining efficient human-to-human transmissibility. Next, we analyzed the mutations that occurred after the virus infected humans (Fig. 1C) in order to identify mutations associated with more severe infection. Two accessions from Shenzhen (BetaCoV/Shenzhen/SZTH-001/2020 and BetaCoV/Shenzhen/SZTH-004/2020), which had 5 and 16 mutations, respectively, were excluded because of possible experimental issues. Each mutation occurred only once, so it is possible that all of these mutations arose naturally and are associated with viral survival and infection. Several mutations were clustered in peptides nsp3 and nsp4 of ORF1ab and in the head of S. These results suggest that there has probably been no hyper-variable genomic hotspot in the SARS-CoV-2 population so far.
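As a rough sketch of how mutations relative to the shared sequence might be tallied, the example below treats the most frequent protein sequence as the reference and lists substitutions in every other isolate. Taking the majority sequence as the reference is an assumption made here for illustration; the isolate names, sequences, and the function `substitutions_vs_majority` are hypothetical, and equal-length (aligned, gap-free) sequences are assumed.

```python
from collections import Counter


def substitutions_vs_majority(seqs):
    """Use the most frequent sequence as reference and list substitutions
    found in every other sequence (all sequences must have equal length)."""
    counts = Counter(seqs.values())
    reference, n_identical = counts.most_common(1)[0]
    mutations = {}
    for name, seq in seqs.items():
        if seq == reference:
            continue
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference")
        # Notation: reference residue, 1-based position, observed residue
        mutations[name] = [
            f"{r}{i}{q}"
            for i, (r, q) in enumerate(zip(reference, seq), start=1)
            if r != q
        ]
    return reference, n_identical, mutations


# Hypothetical isolates; names and sequences are invented for illustration
isolates = {
    "iso1": "MFVFLVLLPL",
    "iso2": "MFVFLVLLPL",
    "iso3": "MFVFLVLLPL",
    "iso4": "MFVFLALLPL",
    "iso5": "MFIFLVLLPL",
}
reference, n_identical, mutations = substitutions_vs_majority(isolates)
# Count in how many isolates each substitution appears
recurrence = Counter(m for muts in mutations.values() for m in muts)
print(n_identical, mutations, dict(recurrence))
```

A substitution whose count in `recurrence` exceeds one would be seen in several isolates; in the letter, every mutation was observed only once.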

We compared these results with those of Ceraolo and Giorgi⁸, who reported at least two hyper-variable genomic hotspots based on the Shannon entropy of nucleotide sequences. They used all of the sequences, whereas we merged fully identical sequences into one before calculating Shannon entropy. As shown in Fig. 1B, 28 sequences were merged into one in the present study because they had been collected within such a short time that collection time and location could not have produced any large bias. If those identical sequences had been counted individually, any mutations on these 28 sequences would have sharply increased the Shannon entropy. Protein sequences were used in order to exclude unimportant silent mutations. Finally, the earliest SARS-CoV-2 sequences were compared with RaTG13 from bats (Fig. 1D). Fisher's exact test with a post hoc test suggested that nsp1, nsp3, and nsp15 of ORF1ab and gene S had significantly more mutations than other genes, which might facilitate human adaptation and infection.
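The effect of merging identical sequences on per-site Shannon entropy can be sketched as follows. This is a minimal illustration that assumes an aligned set of equal-length sequences and entropy measured in bits; the function name `column_entropy` and the toy alignment are assumptions, not the pipeline used in the letter or by Ceraolo and Giorgi.

```python
import math
from collections import Counter


def column_entropy(seqs):
    """Shannon entropy (in bits) of each column of an alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for column in zip(*seqs):
        total = len(column)
        # H = -sum(p * log2(p)) over the residues observed in this column
        h = sum(-(n / total) * math.log2(n / total)
                for n in Counter(column).values())
        entropies.append(h)
    return entropies


# Toy alignment (hypothetical): four identical sequences and one variant
alignment = ["ACDE", "ACDE", "ACDE", "ACDE", "ACDF"]

# Convention 1: every sequence counted individually
all_h = column_entropy(alignment)

# Convention 2: fully identical sequences merged into one first
unique = list(dict.fromkeys(alignment))  # keeps first occurrence, in order
unique_h = column_entropy(unique)

print("all sequences:   ", [round(h, 3) for h in all_h])
print("unique sequences:", [round(h, 3) for h in unique_h])
```

The two conventions weight the same variable column differently, so entropy profiles computed with and without merging are not directly comparable.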

The S gene encodes the spike glycoprotein, which binds host ACE2 receptors and is required for initiation of infection⁹. Wong et al.⁹ reported that a 193-amino acid fragment was able to bind ACE2 more efficiently than its unmutated counterpart. The region in which the spike glycoprotein binds to ACE2 had 21 mutations not found in RaTG13, suggesting a role in adaptation to human hosts. Peptide nsp1 facilitates viral gene expression and evasion of the host immune response¹⁰. Peptide nsp3, a papain-like proteinase, is associated with proteolytic cleavage, viral replication, and antagonism of innate immunity. These two peptides are probably associated with the latent period after infection in humans. Peptide nsp15 acts as a uridylate-specific endoribonuclease. Collectively, these results suggest that peptides nsp1, nsp3, and nsp15 might have unclear but critical roles in this outbreak of SARS-CoV-2.

To summarize, this study confirmed the relationship of SARS-CoV-2 with other beta coronaviruses at the amino acid level. A hyper-variable genomic hotspot has been established in the SARS-CoV-2 population at the nucleotide level but not at the amino acid level, suggesting that there have been no beneficial mutations. The mutations in nsp1, nsp3, nsp15, and gene S identified in this study may be associated with the SARS-CoV-2 epidemic and are worthy of further study.

Declaration of Competing Interest

None.

Acknowledgments

Funding

This study was supported by the Key Laboratory for Preventive Research of Emerging Animal Diseases in Foshan University (KLPREAD201801-06 and KLPREAD201801-10), the Youth Innovative Talents Project of Guangdong Province (2018KQNCX280), and the National Key Research and Development Project (grant No. 2017YFD0500800).

Acknowledgement

We thank the researchers who deposited the SARS-CoV-2 sequences in GISAID. We thank LetPub for its linguistic assistance during the preparation of this manuscript.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jinf.2020.02.027.

Contributor Information

Feng Wen, Email: wenfengjlu@163.com.

Shujian Huang, Email: 617955368@qq.com.

Appendix. Supplementary materials

mmc1.docx (29.2 KB, docx)

References

1. Zhang J., Ma K., Li H., Liao M., Qi W. The continuous evolution and dissemination of 2019 novel human coronavirus. J. Infect. 2020. doi: 10.1016/j.jinf.2020.02.001.
2. Wu A., Peng Y., Huang B., Ding X., Wang X., Niu P. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe. 2020. doi: 10.1016/j.chom.2020.02.001.
3. Wu F., Zhao S., Yu B., Chen Y.M., Wang W., Song Z.G. A new coronavirus associated with human respiratory disease in China. Nature. 2020. doi: 10.1038/s41586-020-2008-3.
4. Zhou P., Yang X.L., Wang X.G., Hu B., Zhang L., Zhang W. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020. doi: 10.1038/s41586-020-2012-7.
5. Shu Y., McCauley J. GISAID: global initiative on sharing all influenza data - from vision to reality. Euro. Surveill. 2017;22(13). doi: 10.2807/1560-7917.ES.2017.22.13.30494.
6. Marra M.A. The genome sequence of the SARS-associated coronavirus. Science. 2003;300(5624):1399–1404. doi: 10.1126/science.1085953.
7. Jones D.T., Taylor W.R., Thornton J.M. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 1992;8(3):275–282. doi: 10.1093/bioinformatics/8.3.275.
8. Ceraolo C., Giorgi F.M. Genomic variance of the 2019‐nCoV coronavirus. J. Med. Virol. 2020. doi: 10.1002/jmv.25700.
9. Wong S.K., Li W., Moore M.J., Choe H., Farzan M. A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. J. Biol. Chem. 2004;279(5):3197–3201. doi: 10.1074/jbc.C300520200.
10. Lokugamage K.G., Narayanan K., Huang C., Makino S. Severe acute respiratory syndrome coronavirus protein nsp1 is a novel eukaryotic translation inhibitor that represses multiple steps of translation initiation. J. Virol. 2012;86(24):13598–13608. doi: 10.1128/JVI.01958-12.
