Acta Mathematica Scientia. 2021 Apr 19;41(3):1017–1022. doi: 10.1007/s10473-021-0323-x

Analysis of the Genomic Distance Between Bat Coronavirus RaTG13 and SARS-CoV-2 Reveals Multiple Origins of COVID-19

Shaojun Pei 1, Stephen S-T Yau 1,2
PMCID: PMC8054123  PMID: 33897081

Abstract

The severe acute respiratory syndrome COVID-19 was discovered on December 31, 2019 in China. Subsequently, COVID-19 cases were reported in many other countries. However, in some other countries, such as France and Italy, positive COVID-19 samples have been reported that predate the cases officially accepted by health authorities. Thus, it is of great importance to determine the place where SARS-CoV-2 was first transmitted to humans. To this end, we analyze SARS-CoV-2 genomes using the k-mer natural vector method and compare the similarities of global SARS-CoV-2 genomes under a new natural metric. Because it is commonly accepted that SARS-CoV-2 originated from bat coronavirus RaTG13, we only need to determine which SARS-CoV-2 genome sequence is closest to bat coronavirus RaTG13 under our natural metric. Our analysis suggests that SARS-CoV-2 most likely already existed in other countries, such as France, India, the Netherlands, England and the United States, before the outbreak in Wuhan, China.

Key words: SARS-CoV-2, multiple origins of COVID-19, mathematical genomic distance, k-mer natural vector
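
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the usual k-mer natural vector construction (for each k-mer: its count, its mean position, and a scaled second central moment of its positions); the exact normalization used here is an assumption. The abstract does not define the "new natural metric", so plain Euclidean distance is swapped in as a stand-in. The function names `kmer_natural_vector`, `euclidean` and `closest_to_reference` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product
import math


def kmer_natural_vector(seq, k):
    """Return a vector of length 3 * 4**k: counts, mean positions,
    and scaled second central moments for every k-mer over A, C, G, T.

    Assumption: the second moment is divided by (count * number of windows).
    """
    seq = seq.upper()
    total = len(seq) - k + 1  # number of k-mer windows in the sequence
    positions = defaultdict(list)
    for i in range(total):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            positions[kmer].append(i + 1)  # 1-based start position

    counts, means, moments = [], [], []
    for letters in product("ACGT", repeat=k):
        pos = positions.get("".join(letters), [])
        n = len(pos)
        mu = sum(pos) / n if n else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (n * total) if n else 0.0
        counts.append(float(n))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def euclidean(u, v):
    """Euclidean distance; a stand-in for the paper's natural metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_to_reference(reference, candidates, k=3):
    """Return the name of the candidate nearest to the reference,
    together with all distances keyed by candidate name."""
    ref_vec = kmer_natural_vector(reference, k)
    distances = {
        name: euclidean(ref_vec, kmer_natural_vector(seq, k))
        for name, seq in candidates.items()
    }
    return min(distances, key=distances.get), distances
```

In this sketch, `reference` would hold the RaTG13 genome and `candidates` would map sample identifiers to SARS-CoV-2 genome strings, both obtained separately (for example from GISAID). The choice of `k` and of the distance function would need to follow the paper's own definitions to reproduce its results.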

Acknowledgements

We thank the researchers worldwide who sequenced the complete genomes of SARS-CoV-2 and other coronaviruses and shared them through GISAID (https://www.gisaid.org/).

Footnotes

This work was supported by the Tsinghua University Spring Breeze Fund (2020Z99CFY044), a Tsinghua University start-up fund, and the Tsinghua University Education Foundation fund (042202008).


Articles from Acta Mathematica Scientia = Shu Xue Wu Li Xue Bao are provided here courtesy of Nature Publishing Group
