letter
J Infect. 2022 Jan 15;84(5):722–746. doi: 10.1016/j.jinf.2022.01.016

Intra-host SARS-CoV-2 single-nucleotide variants emerged during the early stage of COVID-19 pandemic forecast population fixing mutations

Yi Zhang a,#, Ning Jiang a,d,#, Weiqiang Qi b,#, Tao Li b,#, Yumeng Zhang a, Haocheng Zhang a, Jing Wu a, Zhaoqin Zhu b, Jingwen Ai a, Chao Qiu a,c, Wenhong Zhang a,c,d,e
PMCID: PMC8760650  PMID: 35041922

Dear editor

We read with great interest the recent study by J Zhu et al. [1], which reported the viral load dynamics of SARS-CoV-2 in saliva from infected patients. With an error-prone polymerase and recombination events occurring during replication, SARS-CoV-2 has been spreading continuously across populations. The high diversity of the SARS-CoV-2 genome raises concerns about emerging strains with high transmissibility or the ability to escape immunity induced by infection or vaccination. Intra-host single-nucleotide variants (iSNVs) that appear during the course of infection might provide valuable information for distinguishing mixed founder viruses and assessing their potential to escape the immune response, as well as clues for drug design [2–4]. A sub-lineage of SARS-CoV-2 was first defined by the T4959 mutation, which appeared as an iSNV in two individuals in Massachusetts [5]. Additionally, studies of intra-host diversity have revealed that most iSNVs are lost and only occasionally become fixed as population-dominant mutations [6]. Collectively, these observations suggest that close monitoring of iSNVs would help predict dominant mutations and gain time for containing and addressing variants of concern.

Here, taking advantage of iSNV profiles from specimens collected from January 2020 to March 2020, the early stage of the pandemic, we investigated the use of iSNVs in predicting mutations that can become fixed at the population level. Sixty-three confirmed COVID-19 patients, comprising domestic cases and cases imported from overseas, who were admitted to Huashan Hospital or Shanghai Public Health Clinical Center from January to March 2020 were included in this study. Viral RNA was isolated from patients' sputum or nasopharyngeal swabs, and amplicon sequencing was performed on the Illumina NovaSeq platform (Illumina, USA). This study was approved by the ethical committee of Huashan Hospital.

Raw sequencing data were filtered and mapped to the reference genome (Accession number: NC_045512.1). Bowtie2 (v 2.3.3.1) was used for mapping reads, and candidate SNPs were identified using SAMtools (v 1.9). The number of mapped reads, mapping ratio, sequence coverage, and depth were generated to evaluate the quality of specimens. The iSNV sites were determined as described earlier [2,3]. Briefly, sites first had to meet the criteria of a Phred quality score (base quality and mapping quality) ≥ 20 and a depth ≥ 200x. Second, each site had to show (1) a minor allele frequency ≥ 5%, (2) at least ten reads supporting the minor allele, and (3) a strand bias of less than 10-fold for the reads carrying the minor allele and the reads carrying the major allele. Additionally, iSNVs positioned within the 20 bp upstream and downstream of the primer-targeted regions of the SARS-CoV-2 genome were removed. The annotations were made using the reference genome available at NCBI (NC_045512.1). The data on fixed SARS-CoV-2 mutation sites deposited by November 8th, 2021 were downloaded from the National Genomics Data Center [7].
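The site-level criteria above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SiteCounts` record, the `is_isnv` function and the example counts are hypothetical, the per-strand read counts are assumed to include only reads that already passed the base and mapping quality threshold of 20, the strand-bias rule is read here as a forward/reverse read ratio below 10-fold for each allele, and the primer rule is read as excluding positions within 20 bp of a primer interval.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical per-site summary of quality-filtered reads (Q >= 20)."""
    position: int
    depth: int       # total quality-passing reads covering the site
    minor_fwd: int   # minor-allele reads on the forward strand
    minor_rev: int   # minor-allele reads on the reverse strand
    major_fwd: int   # major-allele reads on the forward strand
    major_rev: int   # major-allele reads on the reverse strand


def strand_fold(fwd: int, rev: int) -> float:
    """Fold difference between strands; infinite if one strand has no reads."""
    low, high = sorted((fwd, rev))
    return float("inf") if low == 0 else high / low


def is_isnv(site, primer_regions=(), min_depth=200, min_maf=0.05,
            min_minor_reads=10, max_strand_fold=10.0, primer_flank=20):
    """Apply the depth, frequency, read-support, strand and primer filters."""
    minor_reads = site.minor_fwd + site.minor_rev
    if site.depth < min_depth:
        return False
    if minor_reads / site.depth < min_maf:
        return False
    if minor_reads < min_minor_reads:
        return False
    # Assumed reading of the strand-bias rule: each allele must be balanced.
    if strand_fold(site.minor_fwd, site.minor_rev) >= max_strand_fold:
        return False
    if strand_fold(site.major_fwd, site.major_rev) >= max_strand_fold:
        return False
    # Assumed reading of the primer rule: drop sites near any primer interval.
    for start, end in primer_regions:
        if start - primer_flank <= site.position <= end + primer_flank:
            return False
    return True


# Made-up counts for illustration only.
example = SiteCounts(position=1234, depth=1000,
                     minor_fwd=40, minor_rev=35, major_fwd=480, major_rev=445)
print(is_isnv(example))
```

The thresholds are exposed as keyword arguments so that the stated cut-offs (200x, 5%, ten reads, 10-fold, 20 bp) remain visible and can be varied.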

Overall, 836 iSNV sites were identified among specimens obtained from 61 patients. As of November 8th, 2021, 829 of the 836 iSNVs (99.16%) in our samples had repeatedly been found as fixed single-nucleotide polymorphisms (SNPs) in the sequences deposited by laboratories across the world on the National Genomics Data Center platform. Among these sites, 29 iSNVs gradually increased in frequency and eventually became consensus variants worldwide, reaching a proportion of at least 1‰ (5,202 of 5,201,737 strains), which suggests that they are advantageous within small subsets of the population (Table 1). Other variants either 'reverted' in subsequent infections or did not transmit as effectively during onward transmission. Four sites were considered lineage-specific fixed mutations, being present in more than 1% of strains. These four iSNVs, 10,029 (ORF1ab: T3255I), 11,418 (ORF1ab: V3718A), 26,149 (ORF3a: S253P) and 28,932 (N: A220V), were all non-synonymous (Fig. 1a and 1b). Surprisingly, iSNV site 10,029 accounted for approximately 20.19% of strains (1,050,005/5,201,737), especially in the Delta and Omicron variants. The iSNVs 11,418 and 28,932 accounted for approximately 2.82% (146,468/5,201,737) and 2.35% (122,070/5,201,737), respectively. The iSNV 22,992 (S: S477N) was later identified as a marker mutation of the Iota and Omicron variants (Fig. 1c).
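The proportions quoted above follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative, and rounding the 1‰ threshold up to a whole number of strains is an assumption about how the figure of 5,202 was obtained.

```python
import math

TOTAL_STRAINS = 5_201_737   # strains deposited by 2021/11/08 (from the text)
TOTAL_ISNVS = 836           # iSNV sites identified in this study


def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumed: the 1 per mille cut-off is rounded up to whole strains.
threshold = math.ceil(TOTAL_STRAINS / 1000)

print(threshold)                           # strains needed to reach 1 per mille
print(percent(829, TOTAL_ISNVS))           # iSNVs also seen as fixed SNPs
print(percent(1_050_005, TOTAL_STRAINS))   # site 10,029
print(percent(146_468, TOTAL_STRAINS))     # site 11,418
print(percent(122_070, TOTAL_STRAINS))     # site 28,932
```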

Table 1.

Prospective dominant fixed mutations indicated by iSNV.

iSNVs present before the fixed mutation emerged

| Position | Region | Substitution | Annotation | Amino acid alteration | Accumulated mutation rate by 2021/11/08 | Timeline of SNP |
|---|---|---|---|---|---|---|
| 5392 | ORF1ab | C>T | synonymous | / | 0.13% | Emerge from 2020/03, ascend from 2020/08 |
| 6027 | ORF1ab | C>A | nonsynonymous | P1921Q | 0.13% | Emerge from 2020/03, ascend from 2020/07 |
| 7767 | ORF1ab | T>C | nonsynonymous | I2501T | 0.40% | Emerge from 2020/03, ascend from 2020/08 |
| 8084 | ORF1ab | G>A | nonsynonymous | E2607K | 0.10% | Emerge from 2020/08, ascend from 2021/03 |
| 11,418 | ORF1ab | T>C | nonsynonymous | V3718A | 2.82% | Emerge from 2020/05, ascend from 2021/06 |
| 12,053 | ORF1ab | C>T | nonsynonymous | L3930F | 0.16% | Emerge from 2020/03, ascend from 2021/01 |
| 18,395 | ORF1ab | C>T | nonsynonymous | A6044V | 0.15% | Emerge from 2020/03, ascend from 2021/07 |
| 21,306 | ORF1ab | C>T | synonymous | / | 0.39% | Emerge from 2020/03, ascend from 2021/02 |
| 21,789 | S | C>T | nonsynonymous | T76I | 0.13% | Emerge from 2020/03, ascend from 2021/03 |
| 22,088 | S | C>T | nonsynonymous | L176F | 0.11% | Emerge from 2020/02, ascend from 2021/02 |
| 24,334 | S | C>T | synonymous | / | 0.35% | Emerge from 2020/03, ascend from 2020/09 |
| 25,703 | ORF3a | C>T | nonsynonymous | P104L | 0.20% | Emerge from 2020/03, ascend from 2021/02 |
| 25,710 | ORF3a | C>T | synonymous | / | 0.63% | Emerge from 2020/02, ascend from 2020/08 |
| 26,149 | ORF3a | T>C | nonsynonymous | S253P | 1.20% | Emerge from 2020/03, ascend from 2021/04 |
| 28,932 | N | C>T | nonsynonymous | A220V | 2.35% | Emerge from 2020/02, ascend from 2020/08 |
| 29,750 | 3′UTR | C>T | non-coding | / | 0.18% | Emerge from 2020/03, ascend from 2020/09 |

iSNVs present before the fixed mutations became epidemic

| Position | Region | Substitution | Annotation | Amino acid alteration | Accumulated mutation rate by 2021/11/08 | Timeline of SNP |
|---|---|---|---|---|---|---|
| 2485 | ORF1ab | C>T | synonymous | / | 0.14% | Emerge from 2020/01, ascend from 2020/11 |
| 4878 | ORF1ab | C>T | nonsynonymous | T1538I | 0.14% | Emerge from 2020/03, ascend from 2020/05 |
| 9430 | ORF1ab | C>T | synonymous | / | 0.25% | Emerge from 2020/03, ascend from 2020/12 |
| 9693 | ORF1ab | C>T | nonsynonymous | A3143V | 0.21% | Emerge from 2020/01, ascend from 2021/01 |
| 10,029 | ORF1ab | C>T | nonsynonymous | T3255I | 20.19% | Emerge from 2020/01, ascend from 2021/07 |
| 11,521 | ORF1ab | G>T | nonsynonymous | M3752I | 0.17% | Emerge from 2020/01, ascend from 2021/01 |
| 14,708 | ORF1ab | C>T | nonsynonymous | A4815V | 0.23% | Emerge from 2020/03, ascend from 2020/06 |
| 14,724 | ORF1ab | C>T | synonymous | / | 0.23% | Emerge from 2019/12, ascend from 2020/03 |
| 16,092 | ORF1ab | C>T | synonymous | / | 0.10% | Emerge from 2020/03, ascend from 2020/12 |
| 22,992 | S | G>A | nonsynonymous | S477N | 0.96% | Emerge from 2020/01, ascend from 2020/07 |
| 27,999 | ORF8 | C>T | nonsynonymous | P36S | 0.15% | Emerge from 2020/01, ascend from 2021/07 |
| 29,743 | 3′UTR | C>T | non-coding | / | 0.12% | Emerge from 2020/01, ascend from 2020/09 |
| 29,779 | 3′UTR | G>T | non-coding | / | 0.22% | Emerge from 2020/01, ascend from 2020/05 |

Fig. 1.


Dynamics of the fixed mutation rate of iSNVs 10029, 11418 and 22992 since December 2019. (a) Dynamics of the fixed mutation rate of iSNV 10029 since December 2019. The fixed mutation emerged in 2020/01 and ascended dramatically from 2021/07. (b) Dynamics of the fixed mutation rate of iSNV 11418 since December 2019. The fixed mutation emerged in 2020/05 and ascended dramatically from 2021/06. (c) Dynamics of the fixed mutation rate of iSNV 22992 since December 2019. The fixed mutation emerged in 2020/01 and ascended quickly from 2020/07. (d) The proportion of dominant fixed mutations among C>U/G>A substitutions and among the other five substitutions.

Among the 29 iSNVs, 18 (62.07%) led to non-synonymous substitutions and 8 (27.59%) led to synonymous substitutions, whereas the remaining 3 were located in non-coding regions; the high proportion of non-synonymous substitutions indicated enhanced viral replication after cross-species transmission. The proportion of iSNVs that went on to fixation was significantly higher for C>U/G>A transitions (10.75%, 23/214) than for the other substitutions (0.96%, 6/622) (P < 0.001, OR=11.14) (Fig. 1d).
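The comparison between substitution classes can be checked from the 2×2 counts given above. The Python sketch below is only an illustration: the letter does not state which statistical test was used, so the one-sided Fisher's exact test here is an assumption, and the function names are hypothetical. With these counts, the ratio of the two proportions comes out at about 11.14, which matches the reported figure, whereas the odds ratio computed from the same table is about 12.36.

```python
from math import comb

# Counts reported in the text: fixed iSNVs / all iSNVs per substitution class.
FIXED_TRANSITION, TOTAL_TRANSITION = 23, 214   # C>U / G>A
FIXED_OTHER, TOTAL_OTHER = 6, 622              # the other substitutions


def risk_ratio(a: int, n1: int, c: int, n2: int) -> float:
    """Ratio of the two fixation proportions (a/n1) / (c/n2)."""
    return (a / n1) / (c / n2)


def odds_ratio(a: int, n1: int, c: int, n2: int) -> float:
    """Cross-product odds ratio of the 2x2 table."""
    return (a * (n2 - c)) / ((n1 - a) * c)


def fisher_one_sided(a: int, n1: int, c: int, n2: int) -> float:
    """P(at least `a` fixed iSNVs fall in group 1) under the hypergeometric null."""
    total = n1 + n2          # all iSNV sites
    fixed = a + c            # all fixed iSNVs
    upper = min(fixed, n1)
    numerator = sum(comb(fixed, k) * comb(total - fixed, n1 - k)
                    for k in range(a, upper + 1))
    return numerator / comb(total, n1)


counts = (FIXED_TRANSITION, TOTAL_TRANSITION, FIXED_OTHER, TOTAL_OTHER)
print(round(risk_ratio(*counts), 2))
print(round(odds_ratio(*counts), 2))
print(fisher_one_sided(*counts) < 0.001)
```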

Based on the emergence and timeline of these fixed SNPs, we inferred that, among the 29 iSNVs, 16 occurred in our samples well before the corresponding fixed consensus SNPs emerged. The other 13 iSNVs were observed close to the emergence of the fixed mutations, but before their progression to fixation (Table 1). Apart from the lineage-specific mutations, 4 iSNVs (21,789, 22,088, 22,992, and 24,334) were located in the region encoding the spike protein and may help in immune escape, especially iSNV 22,992 (S: S477N) [8,9].

Using publicly available sequences deposited 18 months into the COVID-19 pandemic, we identified several intra-host variants that were eventually fixed and whose proportion increased in the population. Among these, substitutions in the receptor binding domain (RBD) attracted our attention, as they may affect receptor binding or neutralization by antibodies, although most iSNVs identified in this study may have been lost during transmission because of the narrow bottleneck.

Although previous studies reported that iSNVs only occasionally become fixed mutations [6], in this study we observed that a large proportion of the iSNVs in samples obtained during the early stage of the COVID-19 epidemic could later be found in several dominant lineages.

During the origin of iSNVs, interferon-induced restriction factors belonging to the APOBEC family deaminate an adenine or cytosine on the viral RNA, initiating C-U/G-A transitions, which facilitates evasion of degradation [10]. The large proportion of C-U/G-A transitions among these iSNVs may be linked to APOBEC activity driven by innate immune pressure. As most of the fixed mutations are APOBEC-related, APOBEC RNA editing may drive the adaptation of SARS-CoV-2 to the human host. In conclusion, close monitoring of iSNVs for variants conferring immune-escape ability would aid variant surveillance and help forecast immune-escape variants that may emerge in the future.

Author contribution

WZ, CQ, ZZ and JWA designed the study. WQQ and TL collected the samples and clinical data. YZ and NJ analyzed clinical and sequencing data. YZ completed figures and tables. YZ and CQ drafted the article. YZ, YMZ, JW and HZ performed RNA extraction and amplification. All the authors contributed to this article and reviewed the final version.

Availability of data and materials

All data are available from the corresponding author.

Declarations of Competing Interest

All authors report no potential conflict of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [Grant No. 82041010, 92169212 and 82161138018], the Shanghai Key Laboratory of Infectious Diseases and Biosafety Emergency Response [Grant No. 20dz2260100], and the Key Discipline Construction Plan from the Shanghai Municipal Health Commission (GWV-10.1-XK01). We acknowledge the National Genomics Data Center platform (https://bigd.big.ac.cn/ncov#progress) for SARS-CoV-2 genome surveillance data.

References

  • 1. Zhu J, Guo J, Xu Y, Chen X. Viral dynamics of SARS-CoV-2 in saliva from infected patients. J Infect. 2020;81(3):e48–e50. doi: 10.1016/j.jinf.2020.06.059. PubMed PMID: 32593658. PubMed Central PMCID: PMC7316041. Epub 2020/07/01.
  • 2. Wang Y, Wang D, Zhang L, Sun W, Zhang Z, Chen W, et al. Intra-host variation and evolutionary dynamics of SARS-CoV-2 populations in COVID-19 patients. Genome Med. 2021 Feb 22;13(1):30. doi: 10.1186/s13073-021-00847-5. PubMed PMID: 33618765. PubMed Central PMCID: PMC7898256. Epub 2021/02/24.
  • 3. Ni M, Chen C, Qian J, Xiao HX, Shi WF, Luo Y, et al. Intra-host dynamics of Ebola virus during 2014. Nat Microbiol. 2016 Sep 5;1(11):16151. doi: 10.1038/nmicrobiol.2016.151. PubMed PMID: 27595345. Epub 2016/10/27.
  • 4. Jary A, Leducq V, Malet I, Marot S, Klement-Frutos E, Teyssou E, et al. Evolution of viral quasispecies during SARS-CoV-2 infection. Clin Microbiol Infect. 2020 Nov;26(11):1560.e1–e4. doi: 10.1016/j.cmi.2020.07.032. PubMed PMID: 32717416. PubMed Central PMCID: PMC7378485. Epub 2020/07/28.
  • 5. Siddle KJ, Krasilnikova LA, Moreno GK, Schaffner SF, Vostok J, Fitzgerald NA, et al. Evidence of transmission from fully vaccinated individuals in a large outbreak of the SARS-CoV-2 Delta variant in Provincetown, Massachusetts. medRxiv. 2021 Oct 20. PubMed PMID: 34704102. PubMed Central PMCID: PMC8547534. Epub 2021/10/28.
  • 6. Lythgoe KA, Hall M, Ferretti L, de Cesare M, MacIntyre-Cockett G, Trebes A, et al. SARS-CoV-2 within-host diversity and transmission. Science. 2021 Apr 16;372(6539). doi: 10.1126/science.abg0821. PubMed PMID: 33688063. Epub 2021/03/11.
  • 7. Song S, Ma L, Zou D, Tian D, Li C, Zhu J, et al. The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR. Genomics Proteom Bioinform. 2020 Dec;18(6):749–759. doi: 10.1016/j.gpb.2020.09.001. PubMed PMID: 33704069. PubMed Central PMCID: PMC7836967. Epub 2021/03/12.
  • 8. Starr TN, Greaney AJ, Addetia A, Hannon WW, Choudhary MC, Dingens AS, et al. Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. Science. 2021 Feb 19;371(6531):850–854. doi: 10.1126/science.abf9302. PubMed PMID: 33495308. PubMed Central PMCID: PMC7963219. Epub 2021/01/27.
  • 9. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021 Jul;19(7):409–424. doi: 10.1038/s41579-021-00573-0. PubMed PMID: 34075212. PubMed Central PMCID: PMC8167834. Epub 2021/06/03.
  • 10. Di Giorgio S, Martignano F, Torcia MG, Mattiuz G, Conticello SG. Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2. Sci Adv. 2020 Jun;6(25):eabb5813. doi: 10.1126/sciadv.abb5813. PubMed PMID: 32596474. PubMed Central PMCID: PMC7299625. Epub 2020/07/01.


Articles from The Journal of Infection are provided here courtesy of Elsevier
