Abstract
After 56 days without coronavirus disease 2019 (COVID-19) cases, reemergent cases were reported in Beijing, China on June 11, 2020. Here, we report the genetic characteristics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequenced from the clinical specimens of four human cases and from two environmental samples. The nucleotide similarity among the six SARS-CoV-2 genomes ranged from 99.98% to 99.99%. Compared with the reference strain of SARS-CoV-2 (GenBank No. NC_045512), all six genome sequences shared the same substitutions at nt241 (C → T), nt3037 (C → T), nt14408 (C → T), nt23403 (A → G), nt28881 (G → A), nt28882 (G → A), and nt28883 (G → C), which are the characteristic nucleotide substitutions of L-lineage European branch I. This assignment was also supported by the maximum likelihood phylogenetic tree based on the full-length genome of SARS-CoV-2. The six genomes also share a unique nucleotide substitution, nt6026 (C → T), which is the characteristic nucleotide substitution of SARS-CoV-2 in Beijing's Xinfadi outbreak. Notably, all six genomes carry the amino acid mutation D614G, caused by the nt23403 substitution, which may enhance the virus's infectivity in humans and may have helped it become the leading strain spreading around the world today. It is necessary to continuously monitor the genetic variation of SARS-CoV-2, focusing on the influence of key mutation sites of SARS-CoV-2 on viral transmission, clinical manifestations, severity, and course of disease.
Keywords: COVID-19, SARS-CoV-2, Genomic epidemiology, L-lineage European branch I, D614G mutation
After 56 days without COVID-19 cases, reemergent cases were reported in Beijing, China on June 11, 2020. From June 11 to July 4, 334 COVID-19 cases were reported in Beijing. Of the patients affected, 47% were Xinfadi market staff, while the others were people who had had contact with the market [1].
Throat swab samples were taken from the patients, and environmental swab samples were taken from the environment in Xinfadi market, including the body surface of seafood, the channel at the gate of the stall, trash, the outer surface of fish tanks, the inner surface of refrigerators, etc. Total RNA was extracted from the supernatant using the Viral RNA Extraction Kit (Tianlong, Xi'an, China). The rRNA was removed using the TransNGS rRNA Depletion (Human/Mouse/Rat) kit (TransGen, Beijing, China), and the remaining RNA was used for library preparation with the Illumina TruSeq DNA Preparation Protocol and sequenced on the Illumina NextSeq 550 platform (Illumina, San Diego, CA, USA) with 150 bp paired-end reads. The full-length genome sequences were assembled using QIAGEN CLC Genomics Workbench (Qiagen, Hilden, Germany). Three full-length genome sequences of SARS-CoV-2 from the Xinfadi market were deposited at the National Genomics Data Center (two from patients and one from an environmental sample) and three were deposited at GISAID (two from patients and one from an environmental sample).
At present, the widely accepted molecular typing method for SARS-CoV-2 is based on the nucleotides at nt8782 and nt28144 in its full-length genome sequence. According to this method, SARS-CoV-2 is divided into S-lineage (nt8782T and nt28144C) and L-lineage (nt8782C and nt28144T). It is speculated that S-lineage may be the older type. In Wuhan City, Hubei Province, S-lineage was the main epidemic type in the early stages of the outbreak, while L-lineage is currently the most prevalent lineage in the world. Therefore, most of the imported cases in China belong to L-lineage. Due to the global pandemic of SARS-CoV-2, imported cases have been frequently reported in China since March 2020. To better identify the SARS-CoV-2 types prevalent in various countries, L-lineage was further divided, according to the full-length genome sequences of viruses detected in imported cases in China, into the L-lineage European branch, whose characteristic substitution sites are nt241(C → T), nt3037(C → T), nt14408(C → T), and nt23403(A → G), and the L-lineage global branch, which lacks this set of substitutions. The L-lineage European branch can be further divided into L-lineage European branch I (characteristic substitution sites nt28881(G → A), nt28882(G → A), and nt28883(G → C)) and L-lineage European branch II (characteristic substitution site nt25563(G → T)). L-lineage European branch II can in turn be divided into L-lineage European branch II.1 (characteristic substitution site nt1059(C → T)) and L-lineage European branch II.2 (characteristic substitution site nt2416(C → T)). Because L-lineage European branch II.1 is a prevalent branch in America, it is also called the American branch.
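The typing scheme above amounts to a small decision tree over marker positions. The following Python sketch is one possible way to express it, not the authors' software. It assumes the bases at the marker sites have already been read from a genome aligned to NC_045512 (1-based positions), and all function and variable names are hypothetical.

```python
# Marker sites taken from the typing scheme described in the text
# (1-based positions relative to NC_045512). Names are hypothetical.
EUROPEAN_MARKERS = {241: "T", 3037: "T", 14408: "T", 23403: "G"}
BRANCH_I_MARKERS = {28881: "A", 28882: "A", 28883: "C"}


def has_all(bases, markers):
    """Return True if every marker position carries the expected base."""
    return all(bases.get(pos) == base for pos, base in markers.items())


def classify(bases):
    """Assign a lineage label from a {position: base} mapping."""
    if bases.get(8782) == "T" and bases.get(28144) == "C":
        return "S-lineage"
    if not (bases.get(8782) == "C" and bases.get(28144) == "T"):
        return "untyped"  # neither S nor L pattern (assumption: report as untyped)
    if not has_all(bases, EUROPEAN_MARKERS):
        return "L-lineage global branch"
    if has_all(bases, BRANCH_I_MARKERS):
        return "L-lineage European branch I"
    if bases.get(25563) == "T":
        if bases.get(1059) == "T":
            return "L-lineage European branch II.1"
        if bases.get(2416) == "T":
            return "L-lineage European branch II.2"
        return "L-lineage European branch II"
    return "L-lineage European branch"


# Marker bases matching the substitutions reported for the Xinfadi genomes.
xinfadi_like = {
    8782: "C", 28144: "T",
    241: "T", 3037: "T", 14408: "T", 23403: "G",
    28881: "A", 28882: "A", 28883: "C",
}
print(classify(xinfadi_like))
```

Checking branch I before branch II is a design choice of this sketch; the text does not say how a genome carrying both sets of markers should be labeled.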
The nucleotide similarity among the six SARS-CoV-2 genomes ranged from 99.98% to 99.99%. Compared with the reference strain of SARS-CoV-2 (GenBank No. NC_045512), the nucleotide similarity was 99.96%–99.97%. A maximum likelihood tree based on the full-length genome sequences of SARS-CoV-2 was constructed using MEGA software (v7.0) [2] with 1,000 bootstrap replicates. According to this phylogenetic tree, all six genome sequences belong to L-lineage European branch I (Fig. 1). SARS-CoV-2 in L-lineage European branch I is found in Asia, South America, Africa, and North America, but is prevalent mainly in Europe. This further supports the view that the outbreak in Xinfadi may have been caused by the introduction of an infectious source. The virus differs from the viruses that were prevalent in Wuhan in December 2019 and in Beijing in February 2020, which belonged to S-lineage; therefore, it is suggested that the outbreak in Xinfadi did not involve the continued transmission of native virus.
Compared with the reference strain of SARS-CoV-2 (GenBank No. NC_045512), all six genome sequences shared the same substitutions at nt241(C → T), nt3037(C → T), nt14408(C → T), nt23403(A → G), nt28881(G → A), nt28882(G → A), and nt28883(G → C), among which nt14408, nt23403, and nt28881–nt28883 were nonsynonymous substitutions in the ORF 1ab gene, S gene, and N gene, respectively. All seven nucleotide substitutions are the characteristic nucleotide substitutions of L-lineage European branch I (Fig. 2). All six genome sequences also share a unique nucleotide substitution, nt6026(C → T), which is the characteristic nucleotide substitution of SARS-CoV-2 from Xinfadi market (Fig. 2).
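To make the similarity figures concrete, the sketch below computes percent identity between two aligned sequences. It is an illustrative calculation only: the source does not state how similarity was computed, and the function name, the choice to skip gap and N columns, and the 29,903 nt reference length (from the GenBank record) are assumptions of this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap or N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip columns that cannot be compared
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned sequences (not real data): one mismatch in four comparable columns.
print(percent_identity("ACGT-", "ACGAN"))

# Rough scale check: eight substitutions over a 29,903 nt reference
# (length assumed from the GenBank record for NC_045512).
REFERENCE_LENGTH = 29903
print(round(100.0 * (1 - 8 / REFERENCE_LENGTH), 2))
```

Under these assumptions, the eight substitutions shared by the six genomes correspond to roughly 99.97% identity with the reference, which is in line with the reported 99.96%–99.97% range.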
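Finding substitutions shared by every genome, as described above, is a set intersection over per-genome substitution lists. The sketch below illustrates the idea on short toy sequences. The sequences are invented for illustration and are not SARS-CoV-2 data, the function names are hypothetical, and the input is assumed to be aligned to the reference without insertions.

```python
def substitutions(reference, sample):
    """Return {(position, ref_base, sample_base)} for an aligned pair, 1-based."""
    return {
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample), start=1)
        if r != s and r in "ACGT" and s in "ACGT"
    }


def shared_substitutions(reference, samples):
    """Substitutions relative to the reference that occur in every sample."""
    sets = [substitutions(reference, s) for s in samples]
    return set.intersection(*sets) if sets else set()


# Toy data: all three samples carry C -> T at position 2,
# and two of them carry one private substitution each.
toy_reference = "ACGTACGT"
toy_samples = ["ATGTACGA", "ATGTACGT", "ATGTGCGT"]
print(shared_substitutions(toy_reference, toy_samples))
```

Applied to real alignments, the same intersection would return the substitutions common to all genomes; comparing that set against the branch markers would then separate lineage-defining sites from outbreak-specific ones such as nt6026.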
It is worth noting that there is an amino acid D614G mutation caused by nt23403 substitution in all six genomes of SARS-CoV-2 found in Xinfadi market. The mutation from 614-D to 614-G occurred first in Europe, then in North America and Oceania, and then in Asia, and now this mutation is shared by all viral genomes in L-lineage European branch I.
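The link between the nt23403 substitution and the D614G amino acid change can be checked with simple coordinate arithmetic. The sketch below assumes that the S coding sequence starts at nt21563 in NC_045512 and that reference codon 614 is GAT; both come from the GenBank annotation rather than from the text. The helper names are hypothetical, and the codon table is deliberately limited to the two codons involved.

```python
# 1-based start of the S coding sequence in NC_045512. This coordinate comes
# from the GenBank annotation, not from the text, so treat it as an assumption.
S_CDS_START = 21563

# Only the two codons needed for this example; a full analysis would use a
# complete codon table.
CODON_TO_AA = {"GAT": "D", "GGT": "G"}


def locate_codon(nt_pos, cds_start):
    """Return (codon_number, position_within_codon), both 1-based."""
    offset = nt_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


def mutate_codon(codon, pos_in_codon, new_base):
    """Replace the base at a 1-based position within a three-letter codon."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, pos_in_codon = locate_codon(23403, S_CDS_START)
ref_codon = "GAT"  # reference codon 614 of S (aspartate), assumed from the annotation
alt_codon = mutate_codon(ref_codon, pos_in_codon, "G")
label = f"{CODON_TO_AA[ref_codon]}{codon_number}{CODON_TO_AA[alt_codon]}"
print(label)
```

With these assumed coordinates, nt23403 falls on the second base of codon 614, so the A → G substitution changes GAT (aspartate) to GGT (glycine).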
In a recent study, researchers found that the D614G mutation in the SARS-CoV-2 genome enhances the virus's ability to infect human cells, helping it to become the leading strain of the virus spreading around the world today [3]. The research team also reported that pseudoviruses carrying the D614G mutation were associated with higher infectivity. Quantitative analysis showed that viral particles carrying 614-G had significantly higher infectious titers than their 614-D counterparts, with a 2.6- to 9.3-fold increase that was confirmed in a variety of cell types [3]. However, there is no evidence that the mutation leads to more severe disease [4], and laboratory results can differ considerably from the changes in actual viral transmission that scientists have monitored.
The complete genome sequence analysis of SARS-CoV-2 from Xinfadi further confirmed that the source of the epidemic was not a new "spillover" from the host or an intermediate host. The Xinfadi market became a SARS-CoV-2 transmission "hub", the most important reason being that the market environment is relatively wet and cold. One of the main characteristics of the virus is that it is sensitive to heat but not to cold, so in such an environment it can survive for a long time. Moreover, the market is enclosed and poorly ventilated, which also contributes to the spread of the virus. For example, if a person infected with SARS-CoV-2 sneezes, the droplets do not readily disperse; they may settle on the ground and contaminate other places after flushing and sweeping. These conditions help to explain the occurrence of a large number of cases in a short period of time. However, this does not mean that the market itself is the source of the virus.
The prevention and control of COVID-19 epidemics in big cities should be based on the principle of precise prevention and control. For example, areas with new confirmed cases can be classified as medium-risk areas. However, it is not necessary to lock down all the communities in medium-risk areas. Precise prevention and control involves locking down only the communities that have confirmed cases. Beijing's rapid, effective, and accurate epidemic prevention and control measures will provide experience for other places and for the future normalization of epidemic prevention.
At present, with the rapid development of genomics technology, bioinformatics, and big data management technology, the new generation of genomic sequencing technology can achieve more detailed analysis of pathogenic microorganisms in specimens. Through bioinformatics analysis and high-speed computation on large-scale servers, massive whole genome sequence data can be processed and sorted, making it possible to accurately trace the source of viruses and gain insight into their distribution. Virulence factors, pathogenic factors, and drug resistance genes that arise during pathogen evolution can also be identified, so as to understand the pathogenic and drug-resistance mechanisms of pathogens and provide a theoretical basis for the precise prevention and control of viral diseases.
The field of pathogen genomics has been developing rapidly, and various molecular technologies are being integrated into the diagnosis, prevention, and control of viral infectious diseases. The integration of genomic technology with bioinformatics and epidemiology has improved the public health surveillance, research, and control of infectious diseases, and has played an increasingly important role in the world [5]. For example, such approaches have been widely used in the traceability analysis of SARS virus in 2003, of the transmission of imported wild poliovirus in Xinjiang, China in 2011, and of Ebola virus epidemic transmission in West Africa from 2014 to 2016 [6].
With the support of the National Science and Technology Major Project, the National Institute for Viral Disease Control and Prevention has taken the lead in establishing a set of surveillance and traceability technology systems based on genome typing and clustering analysis algorithms for different viruses, including SARS-CoV-2, which can be used for early warning and prediction of viral infectious diseases. These systems were introduced in disease prevention and control centers at all levels in China, as well as in scientific research and hospital systems. This will greatly improve the prevention and control of viral infectious diseases and play an important role in the prevention and control of major infectious diseases and the construction of the biosafety system in China.
Finally, we also need to consider whether the global spread of D614G mutant SARS-CoV-2 is the result of chance or of natural selection, as the virus is circulating globally at present. Although there is no evidence from available epidemiological and clinical data that the D614G mutation of the S protein leads to increased pathogenicity or virulence of the virus, whether the transmission of the virus is enhanced still needs to be examined by systematic global assessment. Therefore, more research is needed to understand the characteristics of this virus and to help end the global outbreak as soon as possible. It is necessary to continuously monitor the genetic variation of SARS-CoV-2, focusing on the influence of key mutation sites of SARS-CoV-2 on viral transmission, clinical manifestations, severity, and course of disease.
Acknowledgements
This study was supported by the National Science and Technology Major Project of China (Project No. 2017ZX10104001, 2018ZX10711001).
Conflict of interest statement
The authors declare that there are no conflicts of interest. Given their roles as Editorial Board Members, Y. Zhang, W. Tan, D. Wang, W.J. Liu, W. Xu, and G. Wu had no involvement in the peer-review of this article and had no access to information regarding its peer-review. Full responsibility for the editorial process for this article was delegated to the editor Chuan Qin.
Author contributions
Yong Zhang: Conceptualization, Data curation, Writing – original draft, Writing – review & editing. Yang Pan: Investigation. Xiang Zhao: Data curation, Formal analysis. Weifeng Shi: Data curation, Formal analysis. Zhixiao Chen: Formal analysis. Sheng Zhang: Formal analysis. Peipei Liu: Formal analysis. Jinbo Xiao: Formal analysis. Wenjie Tan: Formal analysis. Dayan Wang: Formal analysis. William J. Liu: Formal analysis. Wenbo Xu: Conceptualization, Funding acquisition. Quanyi Wang: Conceptualization, Funding acquisition. Guizhen Wu: Conceptualization, Funding acquisition.
References
- 1. Tan W., Niu P., Zhao X., Pan Y., Zhang Y., Chen L., Zhao L., Wang Y., Wang D., Han J., Gao G.F., Huang C., Xu W., Wu G. Reemergent cases of COVID-19 — Xinfadi wholesales market, Beijing municipality, China, June 11, 2020. China CDC Weekly. 2020;2:502–504. doi: 10.46234/ccdcw2020.132.
- 2. Kumar S., Stecher G., Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 2016;33:1870–1874. doi: 10.1093/molbev/msw054.
- 3. Korber B., Fischer W.M., Gnanakaran S., Yoon H., Theiler J., Abfalterer W., Hengartner N., Giorgi E.E., Bhattacharya T., Foley B., Hastie K.M., Parker M.D., Partridge D.G., Evans C.M., Freeman T.M., Silva T.I.D., McDanal C., Perez L.G., Tang H., Moon-Walker A., Whelan S.P., LaBranche C.C., Saphire E.O., Montefiori D.C. Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell. 2020;182:812–827. doi: 10.1016/j.cell.2020.06.043.
- 4. Eaaswarkhanth M., Al Madhoun A., Al-Mulla F. Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality? Int. J. Infect. Dis. 2020;96:459–460. doi: 10.1016/j.ijid.2020.05.071.
- 5. Lu J., du Plessis L., Liu Z., Hill V., Kang M., Lin H., Sun J., Francois S., Kraemer M.U.G., Faria N.R., McCrone J.T., Peng J., Xiong Q., Yuan R., Zeng L., Zhou P., Liang C., Yi L., Liu J., Xiao J., Hu J., Liu T., Ma W., Li W., Su J., Zheng H., Peng B., Fang S., Su W., Li K., Sun R., Bai R., Tang X., Liang M., Quick J., Song T., Rambaut A., Loman N., Raghwani J., Pybus O.G., Ke C. Genomic epidemiology of SARS-CoV-2 in Guangdong Province, China. Cell. 2020;181:997–1003. doi: 10.1016/j.cell.2020.04.023.
- 6. Gire S.K., Goba A., Andersen K.G., Sealfon R.S., Park D.J., Kanneh L., Jalloh S., Momoh M., Fullah M., Dudas G., Wohl S., Moses L.M., Yozwiak N.L., Winnicki S., Matranga C.B., Malboeuf C.M., Qu J., Gladden A.D., Schaffner S.F., Yang X., Jiang P.P., Nekoui M., Colubri A., Coomber M.R., Fonnie M., Moigboi A., Gbakie M., Kamara F.K., Tucker V., Konuwa E., Saffa S., Sellu J., Jalloh A.A., Kovoma A., Koninga J., Mustapha I., Kargbo K., Foday M., Yillah M., Kanneh F., Robert W., Massally J.L., Chapman S.B., Bochicchio J., Murphy C., Nusbaum C., Young S., Birren B.W., Grant D.S., Scheiffelin J.S., Lander E.S., Happi C., Gevao S.M., Gnirke A., Rambaut A., Garry R.F., Khan S.H., Sabeti P.C. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science. 2014;345:1369–1372. doi: 10.1126/science.1259657.