Abstract
How SARS-CoV-2 Omicron evolved remains obscure. T492I, an Omicron-specific mutation encountered in SARS-CoV-2 nonstructural protein 4 (NSP4), enhances viral replication and alters nonstructural protein cleavage, implying a potential to drive evolution. Through evolve-and-resequence experiments of SARS-CoV-2 wild-type (hCoV-19/USA/WA-CDC-02982585-001/2020, A) and Delta strains (B.1.617) with or without T492I, this study demonstrates that the NSP4 mutation T492I confers accelerated phenotypic adaptation and a predisposition to the emergence of SARS-CoV-2 Omicron-like variants. The T492I-driven evolution results in accelerated enhancement in viral replication, infectivity, immune evasion capacity, receptor-binding affinity and potential for cross-species transmission. Aside from elevated mutation rates and impact on deaminases, positive epistasis between T492I and adaptive mutations could mechanistically facilitate the shifts in mutation spectra and indirectly determine the Omicron-predisposing evolution. These results suggest a potentially important role of the driver mutation T492I in the evolution of SARS-CoV-2 Omicron variants. Our findings highlight the existence and importance of mutation-driven predisposition in viral evolution.
Subject terms: SARS-CoV-2, Evolutionary biology, Viral evolution
Drivers of viral evolution in SARS-CoV-2 are insufficiently understood. In this study, the authors show how a key SARS-CoV-2 mutation, NSP4 T492I, is potentially responsible for accelerating genome evolution to develop adaptive variants (e.g. Omicron).
Introduction
The SARS-CoV-2 Omicron strain is the latest variant of concern (VOC), giving rise to the fourth wave of the COVID-19 pandemic. Variants of this lineage are still predominant around the world1–3. Compared to previously emerged VOCs and variants of interest (VOIs), Omicron is heavily mutated3, resulting in considerable conformational alterations, which cause increased transmissibility and immune evasion capability compared with Alpha and Delta4,5. At the same time, mutations encountered in Omicron variants are thought to result in attenuated pathology6, elevations in the binding affinity to host cell enzymes7,8, and a tropism toward the upper airway epithelium and away from lung tissue9. The Omicron variants expanded more rapidly than any previous VOC or VOI10–12. However, the origins of the Omicron variant remain obscure. This VOC contains a number of mutations rarely seen in previous VOCs or VOIs; in particular, changes observed in the spike protein (S) distinguish this variant from previously circulating variants13. Identifying the evolutionary origin of Omicron variants has more than academic importance and may help to prevent the pandemic spread of new variants possibly emerging in the future12, for example by guiding control procedures and vaccine development.
Previous work suggests that the S protein, the nucleocapsid protein (N), the non-structural protein 4 (NSP4) and NSP6 facilitate the functional adaptation of Omicron variants14–17. The T492I mutation within NSP4 is associated with increased replication capacity and accelerated viral replication15. Elevated mutation rates can influence evolution18,19, and mutations can shape future evolution by epistasis20, modulating the effects of mutations at other sites21, such as the epistatic shifts caused by the spike N501Y substitution in the effects of receptor-binding domain (RBD) mutations11,21–23. Thus, variants bearing T492I theoretically undergo more replication cycles, introduce more mutations to the genome than the wild-type virus within a transmission cycle, and gain more opportunities to develop adaptive mutations (such as S D614G and N R203K/G204R)16,17,24 that support the transmission advantage and adaptability of SARS-CoV-2. Our previous in silico analysis of the highly transmissible Delta sub-variant 21J (with 492I) and the less transmissible variants 21A + 21I (with T492) predicted a possibly additive fitness effect of T492I with adaptive mutations. Moreover, T492I increases the cleavage efficiency of the viral main protease NSP5 by enhancing enzyme−substrate binding, resulting in an increased production of most NSPs processed by NSP5. The NSPs are known to constitute the viral replication and transcription complex, interact with host proteins during the early coronavirus replication cycle, and initiate the biogenesis of replication organelles25,26. NSP4 also contributes importantly to the assembly of DMV-spanning pores27. Consequently, the T492I mutation, as an adaptive mutation involved in replication, transcription and the capacity of immune evasion, may cause the emergence and increased selection of other adaptive mutations. The VOCs Delta and Omicron both bear the T492I mutation. However, previous research suggests that these two VOCs belong to distinct phylogroups and do not share a common ancestry28,29. Delta has a relatively limited mutational landscape compared to Omicron. The new mutations in the Delta variant emerged through multiple stages, an accelerating process in which mutations accumulated progressively30. T492I may contribute differently to the evolution of these two different VOCs.
Theoretically, mutations can be the driver of evolution, and major phenotypic changes may arise after the occurrence of rare constraint-breaking mutations31,32. Previous records of mutation-driven evolution were primarily identified in cancer development33–35, in which the driver mutation increases an individual’s susceptibility or predisposition to certain diseases, particularly cancers. The identification of these cancer-predisposing mutations is crucial for targeted cancer screening, early detection, and personalized treatment strategies36. Mutation-driven predisposition in the evolution of viruses, especially epidemic viruses, has rarely been identified or systematically investigated, although partially relevant cases have been sporadically reported37–39, e.g. the mutation-driven parallel evolution of the angiotensin-converting enzyme 2 (ACE2). Intrinsic characters of viral mutations, such as driver mutations, mutation rates and mutation bias, may influence or in large part determine the process of viral evolution38–42. Identifying the driver or predisposing mutation and understanding the evolutionary mechanism in depth should contribute fundamentally to forecasting viral evolution and developing strategies for prevention and treatment.
To evaluate the hypotheses formulated above, we performed evolve-and-resequence experiments of replicate SARS-CoV-2 populations propagated by serial passaging on Calu-3 cells, and compared the populations evolved from the wild-type strain and those from the T492I mutant (Fig. 1a). We also performed the same experiment for the VOC Delta and compared the evolved populations resulting from Delta ancestors with T492I and those without T492I. Our findings show that T492I confers an evolution toward enlarged enhancements in viral replication, transmissibility and immune evasion capability. Subsequent combined efforts of both comprehensive genomic analyses and experiments demonstrated the T492I-driven predisposition to Omicron-like variants. This agrees with population genomic and phylogenetic analysis results on historical data. The Omicron-predisposing evolution confers increased RBD-hACE2 binding affinity and cross-species infection potential. Mechanistically, our study suggests that the driver mutation T492I induces positive epistasis (synergism)43, elevated mutation rates and alteration in the expression of RNA-editing enzymes. These should collectively introduce shifted mutation spectra and Omicron-predisposing evolutionary forces, and result in an accelerated Omicron-predisposing evolution (Fig. 1b).
Fig. 1. The experimental design and primary findings of this study.
a Graphic overview of evolve-and-resequence experiments. The ancestors are the wild-type strain bearing T492I (aWT-I), the wild-type strain (aWT-T), the Delta variant (aDelta-I) and the Delta variant without T492I (aDelta-T), and the evolved populations are eWT-I, eWT-T, eDelta-I and eDelta-T, respectively. Each run has three replicates (R1, R2 and R3). b Diagram displays a proposed mechanism of the Omicron-biased evolution induced by T492I. The diagrams were created in Canva (www.canva.cn).
Results
The T492I-driven evolution results in enlarged viral replication, infectivity and immune evasion capability
To evaluate the effects of T492I on the evolution of SARS-CoV-2, we constructed four ancestor variants for our evolution experiments: the wild-type strain (aWT-T; hCoV-19/USA/WA-CDC-02982585-001/2020, GenBank accession No. MT020880, Pango Lineage A); the T492I mutant (aWT-I), which is based on the aWT-T strain; the Delta variant (aDelta-I; EVAg accession No. 009V-04187, Pango Lineage B.1.617), which bears the T492I mutation; and the reverse-mutated (I492T) Delta variant (aDelta-T). We selected Delta and WT as ancestors for the evolution experiments to evaluate the effects of T492I on the strains that already contained adaptive mutations and those that did not. Experimental evolution was carried out over an incubation course of 90 days (30 transmission events) with three independent replicates performed in parallel (R1, R2 and R3) (Fig. 1a), resulting in corresponding evolved populations (eWT-T, eWT-I, eDelta-T and eDelta-I). Thereafter, we tested and compared the replication and infectivity of ancestral populations and the populations resulting from the evolution trial in human lung epithelial cells (Calu-3, Fig. 2a–c and Supplementary Fig. 1a–d). The results show that eWT-I and eDelta-I (the 492I populations) produced higher levels of extracellular viral RNA at 24 and 36 hours post infection (hpi) than eWT-T and eDelta-T (the T492 populations), respectively (Supplementary Fig. 1b). The same holds for the ancestral populations (aWT-I vs aWT-T and aDelta-I vs aDelta-T, Supplementary Fig. 1a). By calculating the fold changes of detected viral genome copies before and after evolution, we find that the T492I mutation endowed WT strains with stronger replication capabilities after 30 passages (Fig. 2a). Further, viral infectivity was measured by plaque assay and viral subgenomic RNA assay. The results show that the evolved populations from the 492I run produced significantly higher infectious titers (Supplementary Fig. 1c, d) and E sgRNA loads (Supplementary Fig. 1e, f) compared to those of the control (the T492 run). We also find that the mutation T492I conferred increased infectivity to both the WT and Delta strains after the evolution process (Fig. 2b, c).
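The fold-change comparison described above can be illustrated with a minimal sketch. It assumes that each evolved replicate is paired with the ancestral measurement of the same replicate and that the summary is the mean ± s.e.m. of the per-replicate ratios; the exact pairing and the numbers below are hypothetical and are not taken from the study.

```python
import math
import statistics


def fold_changes(evolved, ancestor):
    """Per-replicate ratio of evolved to ancestral measurements.

    Assumption: values are paired by replicate (R1, R2, R3).
    """
    if len(evolved) != len(ancestor):
        raise ValueError("evolved and ancestor must have the same length")
    return [e / a for e, a in zip(evolved, ancestor)]


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for the s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical genome copy numbers for three replicates (illustration only).
ancestor_copies = [1.0e6, 1.2e6, 0.8e6]
evolved_copies = [4.0e6, 3.6e6, 4.0e6]

fc = fold_changes(evolved_copies, ancestor_copies)
mean_fc, sem_fc = mean_sem(fc)
print(f"fold change = {mean_fc:.2f} ± {sem_fc:.2f} (mean ± s.e.m.)")
```

The same two helpers apply unchanged to PFU titers or E sgRNA loads, since only the input measurements differ.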
Fig. 2. Evidence suggesting enhanced increases of viral replication, infectivity and immune evasion capability in 492I populations.
a–c Calu-3 cells were infected with aWT-T, aWT-I, aDelta-I or aDelta-T virus at an MOI of 0.01. Fold changes in genomic RNA levels (a), PFU titers (b), and E sgRNA loads (c) were calculated from the ratios of evolved populations to ancestor strains. d–f In Vero E6 cells, fold changes in genomic RNA levels (d), PFU titers (e), and E sgRNA loads (f) were likewise calculated from the ratios of evolved populations to ancestor strains. g–i Calu-3 cells were infected with aWT-T, aWT-I, aDelta-I or aDelta-T virus at an MOI of 0.01. At 12, 24, and 48 h after infection, total RNA extracted from the cells was evaluated via real-time qRT-PCR. Fold changes in the IFN-β (g), IFN-λ (h), and ISG56 (i) mRNA levels were calculated from the ratios of evolved populations to ancestor strains. The experiments were performed in three independent biological replicates. Pairwise comparisons were performed via a one-sided Student’s t-test. Data are presented as the mean ± s.e.m. ‘*’, ‘**’, and ‘***’ denote a P-value < 0.1, <0.05 and <0.01, respectively. ‘ns’, not significant. The exact P-values with significance levels are near the asterisks and parenthesized. Source data are provided as a Source Data file.
Furthermore, replication and infectivity tests were performed in Vero E6 cells. In this experiment, similar trends were observed for fold changes in extracellular viral RNA (Fig. 2d and Supplementary Fig. 1g, h), infectious titers (Fig. 2e and Supplementary Fig. 1i, j), and E sgRNA loads (Fig. 2f and Supplementary Fig. 1k, l) before and after evolution. In particular, there are no significant differences in PFU titers (Supplementary Fig. 1i) and E sgRNA loads (Supplementary Fig. 1k) between aWT-I and aWT-T or between aDelta-I and aDelta-T. This is consistent with our previous observations15. However, eWT-I produced significantly higher PFU titers (Supplementary Fig. 1j) and E sgRNA loads (Supplementary Fig. 1l) than eWT-T, and so did eDelta-I compared to eDelta-T. These results demonstrate an enhancement of replication and infectivity driven by T492I in the evolution of SARS-CoV-2.
A previous study from our lab suggests a potential association between the T492I mutation and the immune evasion capacity of SARS-CoV-215. We subsequently tested the production of interferon (IFN)-β, IFN-λ, and interferon stimulated gene 56 (ISG56) in the pre- and post-evolution variants. The T492I mutation shows inhibitory effects on the mRNA level of IFN-β in both the pre- and post-evolution wild-type groups (Supplementary Fig. 1m, n), but the inhibitory effect was stronger after the T492I strains had evolved over time (Fig. 2g). Although there are no significant differences in the IFN-β level between the T492 and 492I viruses in the pre-evolution Delta group (Supplementary Fig. 1m, n), the production of IFN-β is significantly higher in the 492I virus of the post-evolution Delta group (Fig. 2g), suggesting that the WT and Delta strains bearing T492I acquired enhanced immune evasion capabilities in the 90-day evolution. These results are further confirmed by examining the production of IFN-λ and ISG56 at 24 and 36 hpi in Calu-3 cells (Fig. 2h, i and Supplementary Fig. 1o-r). The enlarged alteration of the evolved 492I populations in viral replication, infectivity and immune evasion capacity demonstrates accelerated phenotypic adaptation driven by T492I.
T492I induces elevated mutation rates and predisposition to emergence of Omicron-like variants
To uncover the genomic alterations leading to the phenotypic changes driven by T492I, we extracted the RNA from the evolved viruses and analyzed it via whole-genome sequencing (WGS). To develop a better understanding of the evolutionary process, we also independently performed evolve-and-resequence experiments on the four constructed variants in a shorter experiment of 45 days (15 transmission events). Through variant detection of the sequencing data via a Bayesian statistical framework44 (requiring a frequency > 0.01), we identified 753 and 611 mutations in populations that underwent experimental evolution for 90 days (90-day runs) and those subject to experimental evolution for 45 days (45-day runs), respectively. In both the 90-day and 45-day runs, eWT-I accumulated more mutations than eWT-T did, and the same effect was observed when 45-day eDelta-I and 45-day eDelta-T were compared (Fig. 3a). The 90-day eWT-I has a median mutation rate of 0.000116 per base per transmission event (nt⁻¹ T⁻¹), which is 4.36-fold higher than that of the 90-day eWT-T (0.000027 nt⁻¹ T⁻¹), and the 90-day eDelta-I has a median mutation rate of 0.000133 nt⁻¹ T⁻¹, which is 2.63-fold higher than that of the 90-day eDelta-T (0.000051 nt⁻¹ T⁻¹). Similarly, the 45-day 492I populations (eWT-I and eDelta-I) have a 3- to 4-fold higher mutation rate than the 45-day T492 populations (eWT-T and eDelta-T, Fig. 3b). This finding demonstrates the increase in SARS-CoV-2 mutation rates conferred by T492I, which agrees with the enhancement of viral replication conferred by T492I15. Accordingly, global surveys of full-length SARS-CoV-2 genomes also reveal a greater number of mutations in the 492I variants than in the T492 variants (Fig. 3c, d) before the emergence of VOCs (from April 2020 to November 2020). For both the 90-day- and 45-day-evolved populations, the mutations we identified are mostly nonsynonymous (two-sided Binomial test, P-value < 2.2e-16, Supplementary Fig. 2a) and single nucleotide polymorphisms (SNPs, two-sided Binomial test, P-value < 2.2e-16, Supplementary Fig. 2b).
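To make the unit "per base per transmission event" concrete, the following sketch divides a mutation count by the genome length and the number of transmission events, and compares two rates as a fold change. The formula is an assumption about how such a rate is typically derived, not necessarily the exact procedure used here (see the Methods); the genome length of 29,903 nt is that of the Wuhan-Hu-1 reference, and the mutation count of 104 is hypothetical.

```python
def mutation_rate(n_mutations, genome_length=29903, n_transmissions=30):
    """Mutations per base per transmission event (assumed definition).

    genome_length defaults to the Wuhan-Hu-1 reference length (29,903 nt);
    n_transmissions defaults to the 30 passages of the 90-day runs.
    """
    if genome_length <= 0 or n_transmissions <= 0:
        raise ValueError("genome_length and n_transmissions must be positive")
    return n_mutations / (genome_length * n_transmissions)


def fold_change(rate_a, rate_b):
    """Ratio of two mutation rates."""
    if rate_b == 0:
        raise ZeroDivisionError("reference rate must be non-zero")
    return rate_a / rate_b


# Hypothetical count: 104 mutations over 30 transmission events.
example_rate = mutation_rate(104)
print(f"{example_rate:.6f} per base per transmission event")

# Ratios of the rounded medians quoted in the text.
wt_ratio = fold_change(0.000116, 0.000027)
delta_ratio = fold_change(0.000133, 0.000051)
print(f"WT: {wt_ratio:.2f}-fold, Delta: {delta_ratio:.2f}-fold")
```

The ratios of the rounded medians come out at roughly 4.3 and 2.6, close to the reported 4.36 and 2.63; the small differences are presumably due to rounding of the quoted medians.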
Fig. 3. Evidence suggesting Omicron-predisposing evolution driven by T492I.
a Comparison of mutation counts (Frequency > 0.01) between the 492I and T492 populations. b Comparison of mutation rates in different evolved populations. c Global dynamics of the median mutation counts in the strains bearing T492 and those bearing 492I. The changes in IF of historical dominant VOCs and T492I are shown at the top. d Comparison of the number of amino acid substitutions between the 492I variants (n = 201) and the T492 variants (n = 201) from April 2020 to November 2020 on the basis of global SARS-CoV-2 epidemiological data. e Distributions of nonsynonymous mutation frequencies in different evolved populations. f, g Comparison in the distribution of nonsynonymous mutation frequencies between eWT-I and eWT-T (f) and that between eDelta-I and eDelta-T (g). Raincloud, box and scatter plots display the distributions of the mutations characteristic for Omicron BA.2 (red) and other mutations (gray). The boxplots in (d, f, g) show the median (center), the lower quartile (25%) and upper quartile (75%). The whiskers denote the points within 1.5 times the interquartile range (IQR) from the lower and upper quartiles. For pairwise comparisons, one-way ANOVA tests, two-sided Wilcoxon tests and two-sided Kolmogorov-Smirnov tests were performed in (a, b), (d) and (e–g), respectively. h Heatmap of the frequencies of the T492I-driven mutations. Delta-specific and WT-specific mutations are colored in gray. Historical global IFs are on the right. ‘Fixed’, fixed to date. In (i), P and O denote the fractions of T492I-driven mutations in the VOC-specific mutations, respectively. In (j), P and O denote the fractions of VOC-specific mutations in T492I-driven and other high-frequency mutations, respectively. The overlapping portions are marked in red. The statistics for the comparisons between P and O were performed via two-sided Fisher’s exact tests. Delta-specific reverse mutations were excluded for (h–j). ‘OR’, odds ratios. ‘n’, sample size. Other legends for statistics follow Fig. 2.
The 492I populations have a greater fraction of high-frequency nonsynonymous mutations than the T492 populations, both in the 90-day and 45-day runs (Fig. 3e–g and Supplementary Fig. 2c). A biased distribution toward high-frequency mutations was also found for synonymous and noncoding region mutations (Supplementary Fig. 2d, e). The higher frequencies of the mutations observed in the 90-day 492I populations than in the 45-day 492I populations suggest an evolutionary dynamic induced by T492I (Supplementary Fig. 2f). There were 139 high-frequency nonsynonymous mutations with a frequency > 0.5 in one or more replicates of the 90-day and 45-day runs (HF_mutations.xlsx in Supplementary Data 1). Among these high-frequency mutations, 78 were identified with a significantly greater frequency in the 492I populations than in the T492 populations (please refer to the Methods description for details), suggesting that the occurrence of and selection for these mutations was driven by T492I. All of these T492I-driven mutations were present in at least one of the Omicron sublineages, most of which (95%) were still dominant around the world (April, 2024, Fig. 3f–h and Supplementary Fig. 2g). Similarly, T492I appeared to drive synonymous mutations as well as mutations in the noncoding region characteristic for the Omicron variant (Supplementary Fig. 2h, i and HF_mutations.xlsx in Supplementary Data 1). These findings suggest that T492I directs evolution toward adaptiveness and a predisposition to the Omicron variants. When the 22 Delta-to-WT reverse mutations were excluded (Supplementary Fig. 2j), 54 mutations remained. The fraction of Omicron sublineage mutations in the 54 T492I-driven mutations was generally greater than that in other high-frequency mutations (Fig. 3i), and the fraction of these 54 T492I-driven mutations in Omicron sublineage mutations was also greater than that of other high-frequency mutations (Fig. 3j). Similar fraction differences were not found for the VOCs Alpha and Delta. These data confirmed that the Omicron-predisposing evolution was driven by T492I. The fraction of T492I-driven mutations in the mutations of the Omicron BA.2 (86%) and BA.5 (85%) was greater than those of other Omicron sublineages (Fig. 3j), suggesting a predisposition to early Omicron lineages driven by T492I.
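The selection and enrichment logic in this paragraph can be sketched in two small steps: flagging a mutation as high-frequency when its frequency exceeds 0.5 in at least one replicate, and summarizing a 2×2 comparison (T492I-driven vs other mutations, VOC-specific vs not) as an odds ratio. This is only an illustration under stated assumptions; the function names and all counts are hypothetical, and the P-values in the study come from two-sided Fisher’s exact tests, which are not reimplemented here.

```python
def is_high_frequency(replicate_frequencies, threshold=0.5):
    """True if the mutation exceeds the threshold in one or more replicates."""
    return any(f > threshold for f in replicate_frequencies)


def odds_ratio(driven_voc, driven_other, rest_voc, rest_other):
    """Sample odds ratio of a 2x2 table.

    Rows: T492I-driven vs other high-frequency mutations.
    Columns: VOC-specific vs not VOC-specific.
    """
    if driven_other == 0 or rest_voc == 0:
        raise ZeroDivisionError("odds ratio is undefined for this table")
    return (driven_voc * rest_other) / (driven_other * rest_voc)


# Hypothetical per-replicate frequencies (R1, R2, R3) for two mutations.
flag_a = is_high_frequency([0.02, 0.61, 0.10])
flag_b = is_high_frequency([0.02, 0.30, 0.10])

# Hypothetical counts, chosen only to show the arithmetic.
toy_or = odds_ratio(driven_voc=8, driven_other=2, rest_voc=3, rest_other=7)
print(flag_a, flag_b, round(toy_or, 2))
```

An odds ratio above 1 in such a table indicates that VOC-specific mutations are over-represented among the T492I-driven mutations relative to the other high-frequency mutations.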
On the basis of the viral quasispecies reconstructed by TenSQR45, the predicted dominant strains in the 492I replicates harbored more dominant-to-date Omicron mutations than did the T492 replicates, both for the 90-day and 45-day experiments (Supplementary Fig. 2k, l). These results are confirmed in the combined phylogenies constructed for the different viruses (Supplementary Fig. 3a–e). The populations with 492I accumulated more mutations (2.7- to 6.9-fold), higher diversity (1.4- to 25-fold) and a higher percentage of spike mutations than those with T492 (Supplementary Fig. 3a–d, f). Meanwhile, the phylogenetic relationship between the 45-day and 90-day dominant strains of the populations with T492I suggests a potentially convergent evolution under the effect of T492I (Supplementary Fig. 3a–d).
We counted the emergences of Omicron spike mutations in the historical strains to evaluate the effect of T492I on viral evolution in the SARS-CoV-2 pandemic, since most Omicron mutations are spike mutations (e.g. 31/48 spike mutations in hCoV-19/Benin/NMIMR-712859/2022, BA.1, from the data of Nextstrain46). The spike mutations were previously utilized in the discrimination of Omicron variants47,48. We performed phylogenetic analyses on historical strains preceding the fixation of Omicron variants. The results show that Omicron spike mutations emerged more frequently in the strains bearing 492I than in those bearing T492 (Supplementary Fig. 4a–g). Before the spread of Omicron variants began (November 2021), the historical strains with 492I bore more Omicron spike mutations than those with T492 (Supplementary Fig. 4h, i). These results strengthen the evidence for the Omicron-predisposing effect of T492I and suggest a potentially important historical role of NSP4 T492I in the emergence of Omicron variants.
The mutations resulting from the Omicron-predisposing evolution confer adaptive phenotypic alterations
Omicron mutations enhance the viral replication, infectivity and immune evasion capability of SARS-CoV-249–51. The mutations that arose during the Omicron-predisposing evolution driven by T492I result in the enlarged phenotypic alteration of the evolved 492I populations. In accordance with our experimental results, in silico analyses based on published experimental data4,52,53 show that many T492I-driven spike mutations have a promoting effect on infection (Fig. 4a) and are mostly associated with increases in the levels of serum neutralization capability (Fig. 4a). The T492I-driven non-spike mutations N R203K/G204R and NSP6 ∆SGF were likewise associated with enhancement in viral replication, infectivity and immune evasion capability, according to previous records16,25,54,55.
Fig. 4. Phenotypic and protein structural alterations conferred by the T492I-driven mutations and the evidence suggesting positive epistasis associated with T492I.
a Predicted impacts of T492I-driven mutations (please refer to the Methods description for details). Two-sided binomial tests were performed to evaluate the ratio of the mutations with a Log > 0 to those with a Log < 0. b Predicted differences in the infection performance of T492I-driven and other mutations across different ACE2 orthologs. c The heatmap shows the infection performance of LVpps bearing the spike of evolved viruses in Calu-3 cell lines stably expressing ACE2 orthologs. d The binding ability of the RBD and hACE2 detected via ELISA (please refer to the Methods description for details). e The values of KD in different groups resulting from the SPR experiments. For pairwise comparisons, two-sided Wilcoxon tests and one-way ANOVA tests were performed in (c) and (d, e), respectively. f Representative molecular docking results of the hACE2/evolved RBD complex. The orange dotted line indicates the π-cation interaction formed by ARG-559 of hACE2 and PHE-138 of the eWT-I R1 RBD. hACE2, orange; RBD, blue; hydrophobic interaction, gray dotted line; hydrogen bond, blue line; π-stacking (parallel), green dotted line; π-cation interaction, orange dotted line; salt bridge, yellow dotted line. g Results of phylogenetic-based clustering analyses based on the OD and KD values from the Enzyme-Linked Immunosorbent Assay (ELISA) and SPR experiments. h–s are multi-mutant cycles illustrating the positive epistatic interaction between T492I and 5SM, between N501Y and 5SM and between T492I and N501Y + 5SM, based on the results from Supplementary Fig. 5j–o. Asterisk indicates expected double-mutant phenotype assuming additivity. (h, i), (j, k), (l, m), (n, o), (p, q) and (r, s) correspond to the examination of the following phenotypes: genomic RNA levels, PFU titers, E sgRNA loads, IFN-λ, IFN-β and ISG56, respectively. The statistics are described in detail in the Methods. For (d) and (h–s), the experiments were performed in three independent biological replicates. Data are presented as the mean ± s.e.m. ‘OR’, odds ratios. Other legends for statistics follow Fig. 2. Source data are provided as a Source Data file.
Aside from striking increases in viral replication, infectivity and immune evasion capability, the Omicron variant has an expanded host range56. Accordingly, in silico analyses also show that the T492I-driven spike mutations have the potential to induce higher cross-species infectivity than other mutations (Fig. 4b). To test this, we generated 12 stable Calu-3 cell lines with exogenous expression of various ACE2 orthologs and infected them with different lentiviral-pseudotyped particles (LVpp) bearing the RBDs of the reconstructed dominant strains to determine the host tropism of SARS-CoV-2 variants. According to the viral quasispecies reconstructed from our sequencing data, the predicted predominant strains of all replicates (R1, R2 or R3) were selected as representative strains (Supplementary Table 1). By analyzing the infection performance, we find that most ACE2 orthologs could efficiently support virus entry, except the ACE2 of ferret, bat and mouse. Interestingly, the mutation T492I conferred drastically increased cross-species infectivity to the evolved viruses (Fig. 4c), suggesting an enhanced cross-species infectivity in the evolved populations driven by T492I.
Protein structural evolution resulting from the Omicron-predisposing evolution
To further investigate the infection capacity of the evolved viruses, we tested the binding affinity of the evolved populations to human angiotensin-converting enzyme 2 (hACE2). The predicted predominant strains of all replicates were utilized as described above, and their receptor-binding domains (RBDs)57 were synthesized and subjected to an hACE2-binding enzyme-linked immunosorbent assay (ELISA). The ancestor strains, WT and Delta, and the epidemic strains, BA.1 and EG.5.1, were used as controls. The results show that eWT-I has a stronger binding ability to hACE2 than eWT-T, and so does eDelta-I compared with eDelta-T (Fig. 4d). These data imply a potential association between the T492I-induced evolution and hACE2 affinity of the spike RBD. Next, we quantitatively analyzed the hACE2 affinities of these strains through surface plasmon resonance (SPR) experiments. Consistent with the results obtained from ELISA, strains of the 492I populations had significantly stronger hACE2 affinities than their controls (Fig. 4e and SPR.jpg in Supplementary Data 1). Through molecular docking, we find that the RBD-hACE2 binding interface of the eWT-I strains formed a noncovalent π-cation interaction, thereby further improving receptor affinity (Fig. 4f, Supplementary Table 2 and Molecular_docking.jpg in Supplementary Data 1). These results provide structural evidence that the T492I mutation is conducive to strengthening the affinity between RBD and hACE2 during the evolution. Based on the OD (optical density) and KD (equilibrium dissociation constant) from ELISA and SPR experiments, phylogenetic-based clustering analyses show that the evolved populations of eWT-I and eDelta-I are clustered with EG.5.1, an Omicron sublineage predominant in 2023 (Fig. 4g). These findings suggest adaptiveness in the structure of the spike protein resulting from the Omicron-predisposing evolution driven by T492I.
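For readers less familiar with SPR, the affinity constant KD is conventionally obtained as the ratio of the dissociation rate constant to the association rate constant, so a lower KD corresponds to tighter binding. The sketch below only illustrates this relationship; the rate constants are hypothetical, and the actual fitting model used for the sensorgrams is described in the Methods, not here.

```python
def equilibrium_kd(k_off, k_on):
    """Equilibrium dissociation constant KD = k_off / k_on.

    k_off: dissociation rate constant in s^-1
    k_on:  association rate constant in M^-1 s^-1
    Returns KD in mol/L (M).
    """
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


# Hypothetical rate constants for two RBD/hACE2 pairs (illustration only).
kd_control = equilibrium_kd(k_off=1.0e-3, k_on=1.0e5)   # about 10 nM
kd_evolved = equilibrium_kd(k_off=2.0e-4, k_on=1.0e5)   # about 2 nM

# A lower KD means stronger affinity.
stronger = kd_evolved < kd_control
print(f"control: {kd_control * 1e9:.1f} nM, evolved: {kd_evolved * 1e9:.1f} nM")
```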
Positive epistasis between T492I and T492I-driven mutations
Other genomic variations, such as the S mutation N501Y and the NSP6 deletion ΔSGF, are reported to play essential roles in the increased infectivity and transmissibility of SARS-CoV-214,21. N501Y has a predominant epistatic effect on S mutations21. Given the effects of T492I on the cleavage of multiple non-structural proteins15, we infer that T492I confers a potential additive effect or even synergistic effect (positive epistasis) on other adaptive mutations in the SARS-CoV-2 genome. To test this, we constructed NSP4 T492I, NSP6 ΔSGF, and S N501Y mutants, each based on the Delta 21A strain (bearing T492), and then compared the replication and infectivity of these variants in Calu-3 cells. After normalizing the data to aDelta-T, we found that T492I and N501Y both significantly increase extracellular viral RNA (Supplementary Fig. 5a), infectious titers (Supplementary Fig. 5b) and E sgRNA loads (Supplementary Fig. 5c), suggesting the potential of T492I and N501Y to confer additive effects on adaptive mutations. Moreover, we found that the Delta strains with both T492I and N501Y mutations show a higher effect on replication and infectivity than the sum of their individual effects (Supplementary Fig. 5d-i), suggesting a positive epistasis between NSP4 T492I and S N501Y under the genomic background of Delta-T. Further, we compared the effect of T492I and N501Y on the interaction with identified T492I-driven Omicron spike mutations. Five RBD spike mutations (5SM), namely S373P, K417N, S477N, Q498R, and Y505H, were selected according to the sequencing data (please see the Methods for details). The T492I mutation is located in a nonstructural protein and cannot bind to ACE2 directly. In this context, replication, infectivity and immune evasion capacity were examined. The results show that the viruses with both T492I and 5SM exhibit significantly stronger replication, infectivity (Fig. 4h–m and Supplementary Fig. 5j–l) and immune evasion capacity (Fig. 4n–s and Supplementary Fig. 5m–o) than would be expected from the sum of their individual effects, implying a positive epistasis between T492I and 5SM. Positive epistasis was also identified between N501Y and 5SM and between T492I and N501Y + 5SM (Fig. 4h–s). T492I + 5SM occasionally exhibit stronger replication and immune evasion capacity than N501Y + 5SM (Supplementary Fig. 5j–o). Taken together, these data suggest that, aside from S N501Y, NSP4 T492I is also capable of inducing a positive epistatic effect on adaptive Omicron mutations within the limits of these in vitro models.
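The comparison against "the sum of their individual effects" can be written down as a small calculation. The sketch below assumes additivity on the linear scale of values normalized to the background strain (aDelta-T = 1.0); whether the study tests additivity on a linear or logarithmic scale is specified in the Methods, and the numbers here are hypothetical.

```python
def expected_additive(background, single_a, single_b):
    """Expected double-mutant phenotype if the two effects simply add up."""
    return background + (single_a - background) + (single_b - background)


def epistasis(background, single_a, single_b, double):
    """Deviation of the observed double mutant from the additive expectation.

    A positive value indicates positive epistasis (synergism),
    a negative value indicates negative epistasis.
    """
    return double - expected_additive(background, single_a, single_b)


# Hypothetical normalized phenotypes (background strain set to 1.0).
background = 1.0      # e.g. aDelta-T
with_t492i = 2.0      # single mutant A
with_5sm = 1.5        # single mutant B
with_both = 4.0       # observed double mutant

expected = expected_additive(background, with_t492i, with_5sm)
epsilon = epistasis(background, with_t492i, with_5sm, with_both)
print(f"expected = {expected}, observed = {with_both}, epsilon = {epsilon}")
```

In this toy case the observed double mutant exceeds the additive expectation, which corresponds to the asterisk-versus-observed comparison drawn in the multi-mutant cycles.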
T492I impacts the expression of RNA-editing enzymes and drives shifts in mutation spectra
The relative rates of different types of mutations correlate with the process of viral evolution58,59. Therefore, we evaluated the impact of T492I on mutation spectra by comparing the single base substitution (SBS) spectrum of the evolved populations in different runs, both for synonymous and nonsynonymous substitutions (Fig. 5a–l). Synonymous substitutions are expected to be neutral with respect to fitness, whereas nonsynonymous substitutions may be subject to underlying evolutionary forces. There are differences in the mutation spectrum at synonymous substitutions across different populations, and the differences are enlarged at nonsynonymous substitutions (Fig. 5a, b). Synonymous substitutions have a higher fraction of C > T and G > A, and a lower fraction of T > C, in the 492I populations than in the T492 populations (Fig. 5c–f). The RNA-editing enzyme APOBEC (apolipoprotein B mRNA editing enzyme catalytic subunit) presumably induces C > T and G > A (when deaminating cytosine in the antigenome)3,60–63. ADAR (adenosine deaminase acting on RNA) induces the A > G transition and the T > C transition in the antigenome64. Thus, higher C > T and G > A substitutions in 492I populations than in T492 populations may be a result of T492I-driven activation of APOBEC. Accordingly, there is a higher frequency of the nucleotide T near the 5’ end of the T > C synonymous substitutions (Fig. 5g) and a higher frequency of G near the 3’ end of the G > A synonymous substitutions (Fig. 5h). 5’-CC-3’ and 3’-GG-5’ in the antigenome are potential major target sequences of APOBEC enzymes65,66. The decrease in the relative rate of T > C may result from T492I-driven inactivation of ADAR. By examining the relative expression of APOBEC3A, APOBEC1, APOBEC3G and ADAR in Calu-3 cells infected by the virus with or without T492I, we find that infection with the T492I and Delta variants induced significant upregulation of APOBEC3A, APOBEC1 and APOBEC3G, while infection with WT and I492T variants resulted in remarkable induction of ADAR (Fig. 5m–q). Similar trends were observed when using Vero-E6 cells (Fig. 5r–v). Collectively, these experiments validate the capability of the NSP4 T492I to alter the mutation spectrum by regulating APOBEC and ADAR.
Fig. 5. Evidence suggesting the shifts in mutation spectra and the expression alteration of RNA-editing enzymes influenced by T492I.
a, b Principal component analysis (PCA) of mutation spectra across populations for synonymous (a) and nonsynonymous substitutions (b). c–f and i–l Fractions of different types of synonymous (c–f) and nonsynonymous substitutions (i–l) in different evolved populations. Statistical analysis was performed via one-way ANOVA tests for comparisons between the 492I and T492 populations. The substitutions with a significantly higher fraction in the 492I populations than in the T492 populations are colored red, and those with a decreased fraction are colored green. g, h Fractions of four types of nucleotides on the 5’ end of the T > C synonymous substitutions (g) and those on the 3’ end of the G > A synonymous substitutions (h) across different populations. m–v Calu-3 cells (m–q) or Vero E6 cells (r–v) were infected with the WT, T492I, Delta or Delta-I492T virus at an MOI of 0.01. At 24 h after infection, total RNA extracted from the cells was evaluated by real-time qRT-PCR. The relative changes in the APOBEC3A (m, r), APOBEC1 (n, s), APOBEC3G (o, t), ADAR (p, u) and PRORP (Control, q, v) mRNA levels were normalized to the GAPDH mRNA level. The experiments were performed in six independent biological replicates. Pairwise comparisons were performed via a two-sided Student’s t-test. Data are presented as the mean ± s.e.m. Other legends for statistics follow Fig. 2. Source data are provided as a Source Data file.
More types of nonsynonymous substitutions than synonymous substitutions differ significantly in relative rate between the T492 and 492I populations (Fig. 5a–f, i–l). The epistasis of T492I should further enlarge the differentiation in the mutation spectrum between the T492 populations and the 492I populations. The 492I populations have high C > A nonsynonymous substitutions and low T > C and G > T nonsynonymous substitutions. C > A was reported to occur preferentially in lung bacteria and T > C to occur preferentially in environmental bacteria67. G > T was previously found to be decreased in Omicron clades58,59 and generally elevated when viral replication occurs in the lower respiratory tract (LRT)68. These observations are consistent with the tropism of Omicron variants toward the upper respiratory tract (URT).
Omicron-predisposing selective force driven by T492I
We performed sliding window analyses of selection signatures by ANGSD69 and SweeD70 to evaluate the impacts of T492I on the evolutionary driving force. Positive selection signatures, peaks of the composite likelihood ratio (CLR)71,72, were identified in the evolved 492I populations (Fig. 6a, b and Supplementary Fig. 6a, b) but were absent in the evolved T492 populations, possibly due to the limited number of identified SNPs after removing monomorphic cases. For the 492I populations, mutations characteristic for Omicron sublineages biasedly emerged in genomic regions with evidence for positive selection (a high CLR with a P-value < 0.05 according to the ranking tests, Fig. 6c). The identification of selection signatures via another population genomics analysis tool (CLEAR73) reveals that, compared with the T492 populations, the 492I populations presented a greater accumulation of Omicron mutations that were positively selected (Fig. 6d). The 492I populations also had more Omicron mutations in the genomic regions with estimated positive selection strength than the T492 populations (Fig. 6e). The estimated effective population sizes in the 492I populations were smaller than those in the T492 populations (Supplementary Fig. 6c). These results collectively demonstrate the biased positive directional selection of Omicron mutations in the 492I populations, suggesting Omicron-predisposing evolutionary forces may be induced by the T492I mutation.
Fig. 6. Selective pressure analyses results.
a, b Sliding window views display the CLR peaks and the thresholds (orange dotted line), for the 90-day eWT-I (a) and eDelta-I runs (b). The points colored in red denote the positions with high-frequency mutations. (c) Comparisons of the fractions of Omicron-specific mutations (Positions, Omicron) between the positions with positive selection signatures (SS + ) and those without positive selection signatures (SS-). ‘Positions, OTHS’ denotes the positions without Omicron-specific mutations. d Comparisons of the fractions of Omicron BA.2 mutations (Positions, Omicron) within the positions with a significantly high likelihood (H) across different runs. e Comparisons of the ratio of the positions with and without an estimated positive selection strength across the positions with Omicron BA.2 mutations in eDelta-I (Omicron, eDelta-I), eDelta-T (Omicron, eDelta-T), eWT-I (Omicron, eWT-I) and eWT-T (Omicron, eWT-T) and those without Omicron mutations in eDelta-I (Other, Delta-I) and eWT-I (Other, WT-I). Pairwise comparisons in (c–e) were performed via two-sided Fisher’s exact tests. ‘OR’, odds ratios. Other legends for statistics follow Fig. 2.
The 492I populations have more regions with high nucleotide diversity (π) and a Tajima’s D deviating from zero than the T492 populations (Fig. 7a, b and Supplementary Data 1), which is consistent with the increased mutation rate observed in the 492I populations. π and Tajima’s D both correlate positively with the frequencies of mutations (Supplementary Fig. 6d, e). There is increased genetic differentiation (Fst) between the 492I populations and the T492 populations in the genomic regions with high-frequency mutations (Fig. 7c, d and Supplementary Data 1). For the 492I populations, the genomic regions with fixed Omicron mutations presented a greater π and a greater deviation from zero for Tajima’s D than did those without fixed Omicron mutations (Fig. 7e, f). The 45-day eDelta-I populations show more positions with a positive Tajima’s D and fewer positions with a negative Tajima’s D than the 90-day eDelta-I populations (two-sided Kolmogorov-Smirnov test, P-value = 0.000268, Fig. 7b), implying a transition in the degree of selective sweep from the 45-day state (intermediate) to the 90-day state. Directional selection may have similar effects to those of balancing selection (Tajima’s D > 0) if the selected mutations are present at intermediate frequencies74.
Fig. 7. Other selective pressure analyses results.
a–f show the distributions of selection signatures on the basis of sliding window analyses across runs. a, b Comparison of the distributions of the nucleotide diversity (a) and Tajima’s D (b) between the T492 and 492I runs. c, d Comparison of the genetic differentiation (Fst) between the genomic regions with high-frequency mutations (HF Mut) and those without high-frequency mutations (others), both for the 90-day (c) and 45-day runs (d). e, f Comparisons of the distributions of the nucleotide diversity (e) and Tajima’s D (f) between the genomic regions with fixed Omicron mutations (Fixed) and those without fixed Omicron mutations (OTHS). In (a–f), pairwise comparisons were performed via two-sided Kolmogorov-Smirnov tests. g Estimated selection of the proteins in different populations. ‘nd’ denotes not detectable, due to the lack of segregating sites. By two-sided Fisher’s exact test, the proteins with a significantly greater percentage of positive selection cases in the 492I populations than in the T492 populations are marked by upward red arrows, and those with a significantly greater percentage of negative selection are marked by green downward arrows. h Comparisons of the distributions of the estimated emergence times of the mutations in the spike protein (S) and other proteins (OTHS) for the 90-day and 45-day 492I populations (Delta-I and WT-I). ‘Early Mut’ and ‘Late Mut’ denote the mutations that evolved early and those that evolved late in the run, respectively. Here, pairwise comparisons were performed via two-sided Wilcoxon tests. ‘OR’, odds ratios. ‘n’, sample size. Other legends for statistics follow Fig. 2.
Furthermore, we calculated the ratios of the nonsynonymous Pi to the synonymous Pi (PiN/PiS) across SARS-CoV-2 proteins. The proteins with a PiN/PiS significantly greater than 1 and significantly less than 1 were considered positively selected and negatively selected, respectively. The results (Fig. 7g) reveal that the NSP1, NSP14 and ORF9c proteins were more positively selected in the 492I populations than in the T492 populations. These three proteins are involved in immune evasion75–77 and the variations observed here suggest viral adaptations to the host innate immune response78. The NSP15, ORF3, ORF6, ORF7b and ORF8 proteins were negatively selected in the 492I populations compared with the T492 populations. Overall, the 492I populations presented more negatively selected proteins and fewer positively selected proteins than the T492 populations (chi-square test with continuity correction, P-value = 0.001855), suggesting the overall functionality of the T492I-driven mutations. This was confirmed by a lower Tajima’s D in the 492I populations of the 90-day runs than in the 45-day runs. On the basis of the in silico reconstructed viral quasispecies referred to above, we predicted the emergence time of the mutations in the dominant strain. The results reveal that nonspike mutations generally emerged later than spike mutations did (Fig. 7h), both in the 90-day and 45-day eDelta-492I populations. This finding agrees with T492I promoting the emergence of spike mutations. The 90-day eDelta-I populations harbored more late-emerging mutations than the 90-day eWT-I populations did, and similar results were observed between the 45-day populations of eDelta-I and eWT-I (Supplementary Fig. 6f). This may be due to the late emergence of Delta-to-WT reverse mutations (Supplementary Fig. 6g).
Discussion
Our study demonstrates the capability of NSP4 T492I to predispose the evolution of SARS-CoV-2 to accelerated emergences of Omicron-like variants via 24 independently performed replicates of evolve-and-resequence experiments for wild-type and Delta strains over incubation periods of 45 and 90 days (Fig. 1). The evidence includes increased proportions of high-frequency mutations, a high ratio of Omicron/fixed mutations in T492I-driven mutations and high coverage of T492I-driven mutations in the mutations of historically early Omicron variants, such as BA.2 and BA.5. However, the evolved Omicron-like viruses in the in vitro evolutionary model are not identical to the Omicron strains prevalent in the real world. There are many Omicron-representative mutations that are not present in the evolved viruses, while the evolved viruses also possess mutations that are not present in the Omicron strains. The predisposition to an evolution toward Omicron results in increases in viral replication, transmissibility, immune evasion capability and cross-species infection potentials of the viral populations that evolved from ancestors with T492I compared with those from ancestors without T492I. Consistent with previous findings in Omicron early variants, some T492I-driven mutations impair the infectivity of SARS-CoV-2 (e.g. S375F)52, but the mutations that evolved in the 492I populations together promote the infectivity of SARS-CoV-2 (Fig. 2a–f and 4a). The T492I-driven mutations in the spike protein also enhance the RBD-hACE2 binding affinity. Population genomic and phylogenetic analyses on historical data support the T492I-induced predisposition to the emergence of Omicron mutations. These suggest that the predisposing mutation NSP4 T492I may contribute importantly to the emergence and spread of Omicron variants. Historically, many T492I-bearing viruses went extinct before the spread of Omicron (Supplementary Fig. 4). 
Other host/environmental factors may also contribute importantly to the emergence and evolution of Omicron.
Elevated mutation rates contribute to the predisposition. The 492I populations have a 2- to 5-fold higher mutation rate than the T492 populations, due to T492I-driven enhancement in viral replication. The regulation of RNA-editing enzymes also contributes to the predisposition, through its impact on mutation spectra. The potential tropism toward the URT of high-frequency mutations in the 492I populations agrees with the Omicron predisposition of T492I-driven mutations and the tropism toward the URT of the Omicron variants9,79. Reactive oxygen species (ROS) are proposed to be relevant to the G > T transversion80,81. The reduction in G > T changes possibly resulted from the impairment of the activity of ROS by T492I. However, more mechanistic work is needed to understand the changes in the mechanism by which deaminases restrict viral replication82 through hypermutation and whether the removal of deaminases from cells may render the polymorphism redundant. The positive epistasis between T492I and adaptive mutations may predominantly contribute to the predisposition to Omicron-like variants. Previous work revealed that the N501Y mutation is the predominant spike mutation that epistatically enables other affinity-enhancing mutations21. It has also been suggested that mutations in the spike protein and NSP6 determine the function of Omicron variants14. We provide experimental evidence that the NSP4 mutation T492I has a positive epistatic effect on the Omicron spike adaptive mutations with respect to infectivity, transmission and immune evasion capability, which also coincides with the increased frequency and early emergence of spike mutations driven by T492I (Fig. 7h and Supplementary Fig. 3f). These findings, together with the high overlap between the Omicron mutations and T492I-driven mutations, suggest that T492I may induce considerable epistatic shifts in the effects of mutations at other sites.
These may ultimately cause shifts in mutation spectra via positive selection toward mutation sites characteristic of the Omicron variant. In conclusion, we infer that T492I exerts positive epistatic interactions with adaptive mutations, regulates RNA-editing enzymes, and elevates mutation rates, which collectively alter the mutation spectra, induce Omicron-predisposing selection forces, and subsequently predispose the evolution of SARS-CoV-2 to Omicron-like variants (Fig. 1b).
This study mimicked an evolutionary process of SARS-CoV-2 and demonstrated the in vitro emergence of Omicron-like variants from the ancestors with T492I within 90 days. This suggests a potentially strong predisposition driven by T492I. Despite the close association between the spike protein and viral invasion and the importance of the spike protein in the viral life cycle, we infer that the processes of replication, assembly, and post-entry budding of the virus may also be important and more closely related to the NSPs. However, these do not exclude an important role of the spike protein and other proteins in the evolutionary process. Even though Omicron was historically first detected in November 202183, three months after the global identified sample frequency (IF) of T492I reached 80% (August 2021, Supplementary Fig. 4g)15, there were Omicron-like variants that emerged as early as the beginning of the SARS-CoV-2 pandemic (Supplementary Fig. 4g and Early_Omicron_Like_variants.xlsx in Supplementary Data 1). Omicron-like variants repeatedly emerged in the strains with T492I and were lost in the two years preceding the spread of Omicron variants. The loss of Omicron-like variants is possibly due to a low frequency in the population, which results in an uncompetitive relative fitness (the growth rate of a genotype relative to another genotype picked as the standard of comparison) and purging by genetic drift. It is also possible that the adaptiveness of Omicron-like variants was lower in 2020 than in late 2021 and thereafter, owing to herd immunity. After the spread of T492I from mid-2021, Omicron mutations were introduced more intensively (Supplementary Fig. 4g), which may, together with the increase in the immunized proportion of the population, have resulted in a competitive relative fitness for Omicron-like variants and the spread of Omicron variants.
Aside from the predisposition effect of T492I, the host/environmental factors (such as herd immunity) may be an important force that shapes the viral evolution, since the host/environmental factors induce selection on the mutation-driven phenotypes. For example, the increased global vaccination coverage15 may have enhanced the adaptiveness of Omicron-like or Omicron variants. The driver mutation and the host/environmental factors may jointly and interactively regulate the evolution of SARS-CoV-2. The evolution of SARS-CoV-2 may also be the sum of many parts, resulting in the gain of fitness under many varying conditions. Despite the rapid epidemic expansion of Omicron after this VOC was first detected, the evolution of the first Omicron may not have been rapid. As described above, multiple Omicron-like variants emerged and were lost in the two years after the start of the SARS-CoV-2 pandemic. Moreover, there may have been intrahost evolution for a long time before the emergence of Omicron in the community. For example, the cryptic wastewater lineages84,85 may contribute importantly to the emergence of Omicron mutations. The appearance of Omicron-like lineages in wastewater, predating the emergence of Omicron variants, supports the capability of Omicron lineages to replicate and evolve in a very specific tissue environment (e.g. the intestine)86, suggesting a potential fitness gain possibly accelerated by T492I and other adaptive mutations to resist a high level of mature immune responses. Due to the possibility of missing historical samples12, further work is still needed to investigate the evolutionary origin and relevant detailed evolutionary mechanism of the Omicron variants.
Methods
Cell culture and infection
Resources utilized in this work are presented in Resource_Table.docx in Supplementary Data 1. Human lung epithelial Calu-3 cells (HTB-55, ATCC, MD, USA) and African green monkey kidney epithelial Vero E6 cells (CRL-1586, ATCC) were maintained at 37 °C with 5% CO2 in high-glucose Dulbecco’s modified Eagle’s medium (DMEM, Gibco, CA, USA) supplemented with 10% FBS (Gibco). The wild-type (USA_WA1/2020 SARS-CoV-2 sequence, GenBank accession No. MT020880) and Delta B.1.617.2 (EVAg: 009V-04187, European Virus Archive, www.european-virus-archive.com) SARS-CoV-2 viruses were generated using a reverse genetics method as previously described15,16. The cells were infected at a multiplicity of infection (MOI) of 0.01 at the indicated time points. All SARS-CoV-2 live virus infection experiments were performed under biosafety conditions in the BSL-3 facility at the Institut für Virologie, Freie Universität Berlin, Germany in compliance with relevant institutional, national, and international guidelines.
Plaque assay
Approximately 5 × 10⁵ cells were seeded into each well of 12-well plates and cultured at 37 °C under 5% CO2 for 12 h. eWT-T, eWT-I, eDelta-T, and eDelta-I viruses were serially diluted in DMEM with 2% FBS, and 100 μL aliquots were transferred to cell monolayers. After 1 h at 37 °C and 5% CO2, the inoculum was removed, and the cells were overlaid with 2X Eagle’s minimum essential medium (EMEM; Lonza™ BioWhittaker™) containing 1.5% microcrystalline cellulose and carboxymethyl cellulose sodium (Vivapur 611p; JRS Pharma) or MEM containing 1.5% carboxymethyl cellulose sodium (Sigma Aldrich). Forty-eight hours after infection, the plates were washed with 1X PBS, fixed with 4% PBS-buffered formaldehyde, and stained with 0.75% crystal violet. Visualization of the plaques was performed using a light box.
Generation of SARS-CoV-2 mutants
The genome of the SARS-CoV-2 WT/Delta strain was cloned into a bacterial artificial chromosome-yeast artificial chromosome (BAC-YAC) using TAR cloning in yeast87,88. Briefly, viral DNA fragments were generated by RT–PCR amplification of viral RNA extracted from viral strains, using the SuperScript IV One-Step RT–PCR System according to the manufacturer’s instructions. Synthetic SARS-CoV-2 DNA fragments were cloned into either pUC57 or pUC57mini vectors, containing regions homologous to the TAR vector pCC1BAC-His3. Following plasmid isolation with the QIAGEN Midiprep kit, each cloned fragment was sequence-verified by Sanger sequencing. The NSP4 mutation T492I/I492T was then introduced into cloned virus genomes by scarless mutagenesis89, using the primers listed in Supplementary Data 1. The correct introduction of the mutation was verified by commercial nanopore sequencing (Eurofins Genomics). To recover the WT or mutant viruses, BAC DNA was isolated from E. coli using the Plasmid Midi Kit (Qiagen) and transfected into Vero E6 cells with Lipofectamine 3000 (Thermo Fisher Scientific)90.
Sequencing, identification of mutations and relevant analyses
The virus in the wells of each run was extracted. cDNA synthesis and whole-genome amplification were subsequently performed. Paired-end genomic sequencing was performed on an Illumina NextSeq 2000 instrument. NEB Ultra II RNA was used for non-directional (non-strand-specific) library preparation. RNA library construction was performed strictly according to the Product Manual (www.neb.com/en-us/-/media/nebus/files/manuals/manuale7770_e7775.pdf). The temperature/time/cycles for initial denaturation, denaturation, annealing/extension, final extension, and hold are 98 °C/30 sec/1, 98 °C/10 sec/7-15, 65 °C/75 sec/7-15, 65 °C/5 sec/1 and 4 °C/∞, respectively. Based on the total RNA input amount, the number of PCR cycles was limited to avoid over-amplification. The read length is 111 bp. The sequence read quality analyses were performed via FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc/). Quality improvement was performed via FastP91. We mapped the reads of eWT-I and eWT-T to the ancestor genome (GenBank accession No. MT020880) via the Burrows-Wheeler Aligner (BWA)92 and marked duplicated reads via GATK Mark Duplicates93. Similarly, we mapped the reads of eDelta-I and eDelta-T to the ancestor genome (EVAg: 009V-04187). The depths of coverage throughout the genome for these samples were mostly near 8000, both for the 90-day and 45-day runs (Supplementary Fig. 6h–k and Supplementary Data 1). The change in format from SAM to BAM was performed via SAMtools94. We performed mutation calling on the BAM files via freebayes44. The parameters were “-p 1 -C 1 -F 0.01 --pooled-continuous”. We counted the frequency of mutations in different runs on the basis of output VCF files and wrote Perl scripts to perform the format change and annotation, such as classifying mutation types and attributing nucleotide mutations to corresponding amino acid mutations. The Perl module BioPerl95 was used to translate nucleotides into amino acids.
Comprehensive manual checks were also performed in the annotation work. For the difference between the ancestor Delta and WT, we identified Delta-specific reverse mutations (e.g. NSP13 L77P in eDelta) and WT-specific mutations (e.g. S D614G). The premutation state of NSP13 L77P in eDelta is not available in the ancestor of eWT. The postmutation state of S D614G is already available in the ancestor of eDelta. Moreover, convergent mutations were identified, whose postmutation state was shared by eDelta and eWT, but premutation states were different between eDelta and eWT (e.g. N M203K in eDelta and N R203K in eWT, Fig. 3h).
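The frequency counting from the VCF output mentioned above can be illustrated with a short sketch. It assumes that the frequency of a mutation is the number of alternate-allele observations divided by the read depth, taken from the `AO` and `DP` INFO tags that freebayes writes; this definition is an assumption, and the study's own scripts were written in Perl and may differ. The function name and the example record are hypothetical.

```python
def allele_frequencies(vcf_line):
    """Return [(pos, ref, alt, frequency), ...] for one VCF data line.

    Frequency is taken as AO / DP (alternate observations over depth),
    which is an assumption about how the frequencies were counted.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    pos, ref, alts, info = int(fields[1]), fields[3], fields[4].split(","), fields[7]
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags without '=' are skipped).
    tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    depth = int(tags["DP"])
    if depth == 0:
        return []
    # AO holds one count per alternate allele, comma-separated.
    alt_counts = [int(x) for x in tags["AO"].split(",")]
    return [(pos, ref, alt, count / depth) for alt, count in zip(alts, alt_counts)]

# Hypothetical record: the position and counts are made up for illustration.
example = "MT020880\t1000\t.\tC\tT\t500\t.\tAO=6000;DP=8000;RO=2000\tGT\t1"
records = allele_frequencies(example)
```

With the hypothetical record above, 6000 alternate observations at a depth of 8000 give a frequency of 0.75.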
For the identification of T492I-driven mutations, we performed one-way ANOVA tests of the frequencies for the high-frequency mutations (with a > 0.5 frequency in one or more replicates) between the 492I and T492 runs. The cases with a significantly higher frequency in 492I runs than in T492 runs were considered T492I-driven mutations. We evaluated the historical IFs of the T492I-driven mutations on the basis of the global genomic epidemiology data of SARS-CoV-2 provided by Nextstrain. The Omicron mutations with a > 0.9 identified IF in April 2024 were considered fixed Omicron mutations. Using the VOC information provided by Nextstrain and the genomic sequences provided by GISAID96 (Supplementary Table 3), we built a dataset of mutations characteristic of the Alpha, Delta, and Omicron sublineages (VOC mutations). According to the dataset, we evaluated the coverage of T492I-driven mutations in VOC mutations and the coverage of VOC mutations in T492I-driven mutations. To evaluate the dominant strain in different evolved populations, we used TenSQR45 to reconstruct viral quasispecies from the sequencing data, and then curated the output on the basis of the freebayes results. Based on these, we evaluated the fractions of fixed Omicron mutations in the reconstructed dominant strains of different populations. To explore the relative trajectories each virus took and the relative diversity accumulated, the individual variant phylogenies for each resequencing experiment were combined by building a phylogenetic tree based on all the reconstructed viral sequences in the three replicates for both the 45-day and 90-day experiments. Based on the lineage information provided by Nextstrain46, we retrieved the Omicron spike mutations shared by Omicron sublineages (Supplementary Table 3) and built a list of Omicron spike mutations (Omicron_Spike_Muts.xlsx in Supplementary Data 1).
We used the SNPs in the list to evaluate the counts of Omicron spike mutations in the reconstructed strains. The construction of phylogenetic trees, time-calibration and calculation of the relative diversity were performed via IQ-TREE297, the R package treedater98 and MEGA99.
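The screening for T492I-driven mutations described above combines a high-frequency filter with a one-way ANOVA between the 492I and T492 runs. The sketch below illustrates both steps with standard-library Python only. It computes the F statistic but not the P-value, which would require an F-distribution routine from a statistics library (for example `scipy.stats.f_oneway`). All function names and replicate frequencies are hypothetical.

```python
def is_high_frequency(freqs_by_replicate, threshold=0.5):
    """True if the mutation exceeds the threshold in one or more replicates."""
    return any(f > threshold for f in freqs_by_replicate)

def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups of replicate frequencies."""
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical frequencies of one mutation in three replicates per condition.
freq_492I = [0.9, 0.8, 0.7]
freq_T492 = [0.1, 0.2, 0.3]

f_stat = anova_f(freq_492I, freq_T492)
# A candidate must pass the frequency filter and be higher in the 492I runs;
# significance would still need the P-value of f_stat.
candidate = (
    is_high_frequency(freq_492I + freq_T492)
    and sum(freq_492I) / len(freq_492I) > sum(freq_T492) / len(freq_T492)
)
```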
For the statistics of the mutation counts in the strains with T492 and those with 492I, we downloaded the protein sequences of 9836814 SARS-CoV-2 samples from GISAID. The collection dates of these samples ranged from December 2019 to March 2022. Following the pipeline we previously used to perform epidemiological analyses15,16,24, we performed alignments of proteins between these SARS-CoV-2 strains and the wild-type reference (MT020880) via MUSCLE100,101, based on the global SARS-CoV-2 genomic data. Using the alignments and information of VOC from Nextstrain, we identified mutations and calculated the IFs of mutations and VOCs (Fig. 3c). To evaluate the association between T492I and the mutation rate, we compared the number of mutations (amino acid substitutions) that emerged early in the global pandemic of SARS-CoV-2 (from April 2020 to November 2020, Fig. 3c, d) between the T492 and 492I strains. The selection of this time course is to avoid the effect from contemporary VOCs. For example, the VOC Alpha began to spread throughout the world after November 2020 (Fig. 3c). The global historical statistics show that the 492I variants had a lower number of mutations than the T492 variants in the spread of the VOC Alpha (Fig. 3c), possibly due to the exclusion of T492I in the Alpha variants.
We used the Omicron spike mutation list (Supplementary Data 1), as described above, to evaluate the percentage of Omicron spike mutations, and the IFs of T492I and Omicron-like strains (with >50% Omicron spike mutations) are calculated through subdividing the weekly average counts of all sequenced samples. Omicron-like variants emerged intensively in six time points preceding the spread of Omicron variants (Supplementary Fig. 4g). For each time point, we focused on the time span that begins nearly one month before the collection time of the first emerged Omicron-like variant and terminates after that of the last emerged one, retrieved the genomic sequences of all the strains in the time span, and built a phylogenetic tree via FastTree102. We used the option -gamma to report a Gamma20-based likelihood. From the constructed phylogenetic trees, we retrieved the outgroup strains of the Omicron-like variants repeatedly until an early strain in the time span was retrieved. This was performed via the R package ggtree. Thereafter, we rebuilt time-calibrated phylogenetic trees via IQ-TREE2103 and treedater104.
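The classification of Omicron-like strains described above can be sketched as follows. The sketch assumes that "with >50% Omicron spike mutations" means that a strain carries more than half of the mutations on the reference Omicron spike list; the text allows other readings, so this interpretation is an assumption. The function names and placeholder mutation labels are hypothetical.

```python
def omicron_spike_fraction(strain_mutations, omicron_spike_list):
    """Fraction of the reference Omicron spike list found in a strain."""
    reference = set(omicron_spike_list)
    if not reference:
        return 0.0
    return len(reference & set(strain_mutations)) / len(reference)

def is_omicron_like(strain_mutations, omicron_spike_list, cutoff=0.5):
    """True if the strain carries more than `cutoff` of the reference list."""
    return omicron_spike_fraction(strain_mutations, omicron_spike_list) > cutoff

# Placeholder labels, not real mutations from the supplementary list.
reference_list = ["mut1", "mut2", "mut3", "mut4"]
strain_a = ["mut1", "mut2", "mut3", "other"]
strain_b = ["mut1", "other"]

like_a = is_omicron_like(strain_a, reference_list)
like_b = is_omicron_like(strain_b, reference_list)
```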
Prediction of the phenotypic alterations conferred by T492I-driven mutations
Based on published data52, we evaluated the impacts of T492I-driven mutations on normalized pseudoparticle infection of CaCo-2 cells (Infection), levels of full-length spike in supernatants (S Sups) and cells (S Cells), levels of the S2 spike subunit in the supernatants (S2 Sups) and cells (S2 Cells), binding of spike to ACE2 (ACE2 interaction), automated quantification of syncytia formation in HEK293T cells expressing the indicated mutant S proteins and human ACE2 (Syncytia), and average TCID50 values obtained for neutralization of the indicated mutant S proteins by sera from five vaccinated individuals relative to those obtained for Hu-1 S (Serum) (Fig. 4a). Based on another published dataset53, we compared the infection performance of T492I-driven mutations and other mutations in 112 lentiviral-pseudotyped particles bearing the single-site RBD-mutated spike in H1299 cells expressing ACE2 orthologs (Fig. 4b).
Evolutionary analyses
Based on previous efforts105, we calculated the mutation rate (r) according to the formula shown below.
$$r = \frac{\sum f}{P \times L} \tag{1}$$
Here f is the frequency of a given mutation and ∑ f is the sum of the frequencies of all mutations in the genomic region, P is the number of transmission events, and L is the length of the genome. The median mutation rate in eWT is consistent with the mutation rate underlying the global diversity of SARS-CoV-2 (1 × 10⁻⁵–1 × 10⁻⁴ nt⁻¹ T⁻¹)106,107. On the basis of the mutations identified in our evolve-and-resequence experiments, we quantified the fractions of different types of substitutions, calculated the fractions of the four types of nucleotides near the substitution sites, and compared the T492 and 492I populations via one-way ANOVA tests. The principal component analyses (PCAs) of substitution fractions across populations were performed using the R package factoextra108. For the detection of evolutionary signatures, we used ANGSD69 to perform sliding window calculations of the genetic differentiation between the T492 and 492I runs. We also calculated the nucleotide diversity (π) and the values of Tajima’s D for all runs. The window size was 50 and the step size was 20. We piled up the BAM files of all runs and performed sliding window analyses of the composite likelihood ratio (CLR) with a grid size of 300 via SweeD70. Furthermore, we utilized the tool CLEAR73 to estimate the population size, selection strength, and likelihood (H). In the calculation by CLEAR, we assumed that the evolved populations of the 45-day runs were the midway point of those of the 90-day runs. Considering that the ancestral virus of each run was ~10⁶ cells/ml in 500 µl and that the MOI was 0.01, we used 10⁶ × 0.5 × 0.01 = 5000 as the initial viral population size and constructed the configuration files for model parameter estimation.
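The two calculations in this paragraph can be written out as a short sketch. It assumes, from the symbol definitions and the stated units (per nucleotide per transmission event), that the mutation rate is the summed mutation frequencies divided by P × L; the function names and the toy inputs are hypothetical and are not values from this study.

```python
def mutation_rate(frequencies, transmissions, genome_length):
    """Assumed form r = sum(f) / (P * L), per nucleotide per transmission event."""
    if transmissions <= 0 or genome_length <= 0:
        raise ValueError("transmissions and genome_length must be positive")
    return sum(frequencies) / (transmissions * genome_length)

def initial_population_size(titer_per_ml, volume_ml, moi):
    """Initial viral population size as titer x volume x MOI."""
    return titer_per_ml * volume_ml * moi

# Toy inputs: three mutations, 10 transmission events and a round
# genome length of 30,000 nt (all hypothetical).
rate = mutation_rate([0.5, 0.25, 0.25], transmissions=10, genome_length=30_000)

# Values stated in the text: ~10^6 per ml, 500 µl (0.5 ml), MOI of 0.01.
n0 = initial_population_size(1e6, 0.5, 0.01)
```

The second call reproduces the arithmetic in the text, 10⁶ × 0.5 × 0.01 = 5000.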
For the identification of the positions (windows) with mutations characteristic for Omicron sublineages, we counted the number of Omicron mutations in the vicinity ( < 100 bp) of the middle of a window. If one or more Omicron mutations were identified, the position was considered to have Omicron mutations. In this way, we estimated the associations between Omicron mutations and selection parameters, such as H, CLR and the selection strength. To evaluate the selection force on the SARS-CoV-2 proteins in different populations, we used the tool SNPGenie109 to infer the nonsynonymous Pi and synonymous Pi (PiN/PiS) of the proteins.
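The window-flagging rule described above can be sketched as follows: a window is considered to have Omicron mutations if at least one Omicron mutation lies within 100 bp of the window midpoint. The window size (50) and step (20) follow the values given for the sliding window analyses; the 1-based inclusive coordinates, the function names, the genome length and the mutation position are assumptions made for illustration.

```python
def sliding_windows(genome_length, size=50, step=20):
    """Yield (start, end) for 1-based inclusive windows along the genome."""
    start = 1
    while start + size - 1 <= genome_length:
        yield start, start + size - 1
        start += step

def window_has_omicron(window, omicron_positions, max_distance=100):
    """True if any Omicron mutation lies < max_distance bp from the midpoint."""
    start, end = window
    midpoint = (start + end) / 2
    return any(abs(pos - midpoint) < max_distance for pos in omicron_positions)

# Hypothetical toy genome of 400 nt with one Omicron mutation at position 300.
windows = list(sliding_windows(400))
flagged = [w for w in windows if window_has_omicron(w, [300])]
```

Each flagged window can then be paired with its selection parameters (H, CLR or selection strength) to test for an association.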
To estimate the emergence times of the mutations in the reconstructed dominant strains, we performed multiple alignments of all reconstructed strains via Clustal Omega110. Then, maximum-likelihood trees were built under the GTR model and approximate Bayes tests111 were performed via IQ-TREE112. According to the split times between the dominant strain and the other strains in the phylogenetic trees (Supplementary Data 1), we estimated the emergence times of the mutations. Specifically, the total incubation course was normalized to 1. Suppose that a dominant strain A (with mutations X1, X2 and X3) had outgroups B, C and D, ordered from nearest to farthest. If X1 was absent from B, C and D; X2 was present in B but absent from C and D; and X3 was present in B and C but absent from D, then the estimated emergence times of X1, X2 and X3 were 1/4, 2/4 and 3/4, respectively. We wrote Perl scripts to evaluate the function of T492I-driven mutations on the basis of published records and performed the statistical analyses in R. The scripts for the in silico analyses are provided (Pipline.zip in Supplementary Data 1).
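One way to read the worked example above is that a mutation shared with the k nearest outgroups, out of n outgroups in total, is assigned the value (k + 1)/(n + 1). The Python sketch below encodes that reading only. The function name is hypothetical, the handling of a mutation shared with every outgroup is an extrapolation, and the authors' own Perl scripts may differ.

```python
from fractions import Fraction


def emergence_time(present_in_outgroups):
    """Normalized emergence value for one mutation of the dominant strain.

    `present_in_outgroups` lists, from the nearest to the farthest
    outgroup, whether the mutation is present in that outgroup.
    Only the unbroken run of nearest outgroups sharing the mutation counts.
    """
    n = len(present_in_outgroups)
    shared = 0
    for present in present_in_outgroups:
        if not present:
            break
        shared += 1
    return Fraction(shared + 1, n + 1)


# The example from the text: outgroups B, C, D (nearest to farthest).
x1 = emergence_time([False, False, False])  # absent from B, C and D
x2 = emergence_time([True, False, False])   # present in B only
x3 = emergence_time([True, True, False])    # present in B and C
```

Under this reading, `x1`, `x2` and `x3` evaluate to 1/4, 2/4 and 3/4, matching the values given in the text (`Fraction` stores 2/4 in reduced form as 1/2).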
Analyses of the expression of APOBEC and ADAR enzymes
Calu-3 and Vero-E6 cells were seeded into each well of 6-well plates and cultured at 37 °C under 5% CO2. SARS-CoV-2 variants (WT, T492I, Delta and I492T) were inoculated into the cultures at an MOI of 5. After 12 h of infection, the inoculum was removed, and the cultures were washed three times with PBS. Infectious cell lysates were harvested, and total RNA was subsequently extracted via a PureLink RNA Mini Kit (Thermo Fisher Scientific Inc., CA, USA). RT-PCR was performed via a SYBR PrimeScript RT-PCR Kit (Takara, Otsu, Shiga, Japan) and qRT-PCR was performed via a TB Green Premix ExTaq II Kit (Takara) on a Bio-Rad CFX-96 system (Bio-Rad, Hercules, CA, USA). Thermal cycling was performed at 95 °C for 30 s, followed by 39 cycles of 95 °C for 5 s and 60 °C for 30 s. The primer sequences are listed in Primers.docx in Supplementary Data 1.
Viral subgenomic RNA assay and genomic RNA assay
Approximately 1 × 10⁶ cells were seeded into 6-well plates and cultured in 5% CO2 at 37 °C for 12 h. The virus was serially diluted in DMEM containing 2% FBS, and 200 μL aliquots were added to the cells. After infection, total RNA from the infectious cell lysate was extracted via an RNeasy Mini Kit (QIAGEN, Hilden, Germany). RT-PCR was performed via an iTaq Universal SYBR Green One-Step Kit (Bio-Rad) and an ABI StepOnePlus PCR system (Thermo Fisher Scientific, CA, USA) according to the manufacturer’s instructions. The viral subgenomic RNA assay was performed with primers that target the envelope protein (E) gene and ORF1ab sequences. The primers are listed in Primers.docx in Supplementary Data 1.
RNA isolation and qRT-PCR
Total SARS-CoV-2 RNA was extracted using the Analytik Jena Kit (QIAGEN, Hilden, Germany), followed by reverse transcription into cDNA with a high-capacity cDNA reverse transcription kit (Thermo Fisher Scientific). mRNA levels were quantified via an iTaq Universal SYBR Green One-Step Kit (Bio-Rad). The assay was performed on an ABI StepOnePlus PCR system (Thermo Fisher Scientific). The primers used are listed in Primers.docx in Supplementary Data 1.
Validation of the infection performance of the evolved viruses in different species
The spike-expressing plasmid of evolved viruses, the packaging plasmid, and the mNeonGreen reporter vector were cotransfected into HEK-293T cells to generate spike-bearing LVpps. The Calu-3 cell lines stably expressing ACE2 orthologs and the SARS-CoV-2 variant-bearing LVpps were developed as follows. Briefly, the cDNAs of ACE2 orthologs (human, monkey, hamster, rabbit, cat, pig, dog, horse, camel, ferret, bat and mouse) were synthesized, cloned, and inserted into the pCDH-CMV-MCS-EF1-RFP-T2A-Puro vector. The lentiviruses carrying ACE2 orthologs were produced in HEK-293T cells and were harvested to infect the Calu-3 cell lines. The stably transduced cells were enriched via puromycin selection. The spike-expressing plasmid of evolved viruses, the packaging plasmid, and the mNeonGreen reporter vector were cotransfected into HEK-293T cells to generate SARS-CoV-2 variant-bearing LVpps. The p24 concentrations of viral stocks were determined via a p24 Rapid Titer Kit (Takara). For the LVpp infection assay, the Calu-3 cell lines stably expressing ACE2 orthologs were seeded into each well of 96-well plates and cultured at 37 °C under 5% CO2. After 16 h of culture, the cells were incubated with a virus inoculum of 10 ng p24. After 2 days of infection, the number of mNeonGreen-activated cells in each well was determined and expressed as the number of green-fluorescent units per well (GFU/well). The infection performance was examined in three independent biological replicates, and the medians are displayed in Fig. 4d.
ELISA
The binding ability of the SARS-CoV-2 RBD to human ACE2 (hACE2) was measured by ELISA using the RayBio COVID-19 Spike-ACE2 Binding Assay Kit II. Different recombinant RBD proteins were added to hACE2-coated plates and incubated overnight at 4 °C with gentle shaking. The solution was subsequently discarded and 100 μL of 1× HRP-conjugated IgG antibody (1:500 dilution) was added to the plates, which were then incubated for 1 h at room temperature. Finally, 100 µL of TMB One-Step Substrate Reagent was added to the plates. The mixture was incubated at room temperature for an additional 30 min, and then 50 μL of Stop Solution was added. The absorbance was immediately read at 405 nm.
Surface plasmon resonance and protein complex structure analysis
Recombinant hACE2 protein was immobilized on a CM5 Chip and different concentrations of SARS-CoV-2 RBD proteins were injected into the hACE2-immobilized flow cell. The KD was calculated from the steady-state affinity obtained at each concentration. The flow rate was 20 mL/min, with injection for 200 s followed by dissociation for 400 s. The structures of the hACE2 protein and SARS-CoV-2 RBD proteins were predicted by AlphaFold299, and the inactive sites were removed. ZDOCK was subsequently used for rigid protein-protein docking, and RosettaDock113 was used for flexible protein-protein docking. The preliminary conformation from the global docking was optimized with Rosetta114 in two rounds to obtain the final model of the hACE2/RBD complex. The binding interface of the complex was characterized and analyzed via the interaction analysis platforms PDBsum115 and PLIP116.
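As an illustration of how a KD can be derived from steady-state SPR responses, the following Python sketch fits the standard 1:1 binding isotherm R_eq = R_max · C / (KD + C). It is not the instrument software used in the study: the function name, the search range for KD and the grid resolution are assumptions, and the input concentrations are assumed to be positive and in molar units.

```python
import math


def fit_steady_state_kd(concs, responses, kd_lo=1e-12, kd_hi=1e-3, steps=9000):
    """Least-squares fit of R_eq = R_max * C / (KD + C).

    For each candidate KD on a log-spaced grid, the best R_max has a
    closed form, so only KD needs to be searched.
    Returns (kd, r_max) with the smallest sum of squared errors.
    """
    log_lo, log_hi = math.log10(kd_lo), math.log10(kd_hi)
    best = None
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        # Fractional occupancy predicted at each concentration.
        occ = [c / (kd + c) for c in concs]
        r_max = sum(o * r for o, r in zip(occ, responses)) / sum(o * o for o in occ)
        sse = sum((r - r_max * o) ** 2 for o, r in zip(occ, responses))
        if best is None or sse < best[0]:
            best = (sse, kd, r_max)
    return best[1], best[2]


# Synthetic, noise-free example (values are made up for illustration).
demo_concs = [1.25e-9 * 2 ** i for i in range(7)]           # 1.25 nM to 80 nM
demo_resp = [100.0 * c / (2.5e-8 + c) for c in demo_concs]  # KD = 25 nM
demo_kd, demo_rmax = fit_steady_state_kd(demo_concs, demo_resp)
```

A grid search is used here instead of a gradient-based optimizer to keep the sketch free of dependencies; with real, noisy sensorgram data a dedicated nonlinear fitting routine would be preferable.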
For the phylogenetic-based clustering analyses, which were based on the OD and KD values from the ELISA and SPR experiments, we used the k-means dynamic clustering method. Specifically, we used the “elbow method” as the criterion, together with the function “fviz_clust”, the silhouette coefficients and the WSS (within-cluster sum of squared errors), to determine the optimal number of clusters. We then used the function “kmeans” with the parameters “centers=4, start=50” to calculate the distance between each object and the cluster centers, assign each object to the nearest cluster center, and recalculate the mean of each cluster. The recalculated means were used as the new cluster centers, and the process was repeated until the classification results converged. The clustering results were visualized via the R package factoextra.
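The iterative procedure described above (assign each point to the nearest center, recompute the cluster means, repeat until convergence) is Lloyd's k-means algorithm. A minimal pure-Python sketch follows. It is not the R “kmeans” function used in the study: the function names are hypothetical, the example points are made up, and the number of random restarts is an assumed analogue of the quoted “start=50” setting.

```python
import random


def _dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def _assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: _dist2(p, centers[j]))
            for p in points]


def kmeans(points, k, n_start=50, max_iter=100, seed=0):
    """Return (wss, labels, centers) of the best of `n_start` random restarts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_start):
        centers = rng.sample(points, k)
        for _ in range(max_iter):
            labels = _assign(points, centers)
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Keep the old center if a cluster ends up empty.
                new_centers.append(_mean(members) if members else centers[j])
            if new_centers == centers:
                break
            centers = new_centers
        labels = _assign(points, centers)
        wss = sum(_dist2(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[0]:
            best = (wss, labels, centers)
    return best


# Toy (OD, KD-like) values forming two obvious groups; elbow curve for k = 1..3.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
wss_by_k = {k: kmeans(toy, k)[0] for k in (1, 2, 3)}
```

In the elbow method, the within-cluster sum of squares in `wss_by_k` is plotted against k and the number of clusters is chosen where the curve stops dropping sharply; for the toy data that bend is at k = 2.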
Evaluation of the epistasis between mutations
Replication, infectivity, and immune evasion capacity were the phenotypes examined to evaluate the epistasis between the NSP and spike mutations. A combination of mutations whose fitness relative to the wild type is higher than the expected fitness (the sum of the fitness effects of the individual mutations) is considered to show positive epistasis, or synergism21,117. The analyses of deviance from additivity were performed via the R function glm and multi-way ANOVA tests. Holm’s method was applied to adjust for multiple comparisons. In the evaluation of the epistasis between T492I/N501Y and T492I-driven Omicron spike RBD mutations, we selected five adaptive mutations that had been reported to enhance viral transmissibility or infectivity118 (Omicron_Spike_Muts.xlsx in Supplementary Data 1) and that showed an increased and high frequency in both the 45-day and 90-day replicates of eDelta-I and eWT-I.
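The additive definition of epistasis and Holm's adjustment described above can be written out in a short Python sketch. It only computes point estimates and adjusted p-values: the function names are hypothetical, fitness is assumed to be measured on an additive scale, and the sketch does not reproduce the glm and ANOVA testing performed in R.

```python
def epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    All effects are taken relative to the wild type. A positive value
    indicates positive epistasis (synergism), a negative value the opposite.
    """
    expected = (f_a - f_wt) + (f_b - f_wt)
    observed = f_ab - f_wt
    return observed - expected


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted
```

For instance, with made-up fitness values of 1.0 (wild type), 1.5 and 1.2 (single mutants) and 2.0 (double mutant), `epistasis(1.0, 1.5, 1.2, 2.0)` is approximately 0.3, which would count as positive epistasis under the definition in the text.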
Quantitative and statistical analysis
Student’s t-tests, Fisher’s exact tests, Chi-square tests, Wilcoxon tests, Kolmogorov-Smirnov tests, correlation tests, ANOVA tests, and binomial tests were performed in R. We wrote Perl scripts to classify the strains into lineages and to quantify the IFs of these lineages. Heatmaps, box plots, scatter plots, and raincloud plots were generated via the R libraries “gdata”, “ggplot2”, “RColorBrewer”, “scales”, “ggsci”, “ggridges”, “pheatmap”, “cowplot”, “dplyr”, “readr” and “factoextra”.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We gratefully acknowledge the submitting and originating laboratories where the genetic sequence data were generated and shared via NCBI and the GISAID Initiative. We especially thank Dr. Yong Zhang, Computational & Evolutionary Genomics Group, Institute of Zoology, CAS, for invaluable suggestions concerning the preparation of the manuscript. This work was supported by grants from the National Natural Science Foundation of China (92369115 and 82422048 for H.W., 92469110 for X.L., and 32170661 for Z.Z.), the SGC’s Rapid Response Funding for COVID-19 (C-0002 for H.W.), the Chongqing Municipal Science and Health Joint Medical Research Project Key Project (2024ZYD009 for Z.Z.) and the Fundamental Research Funds for the Central Universities (2023CDJXY-009 for Z.Z.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author contributions
Z.Z., C.Z., M.T., and C.J. performed the in silico analyses. H.W., J.T., Z.S., J.A., R.V., C.L., B.F., Y.X., and D.K. performed the experiments. X.L., H.W., H.Z., X.Z., Q.L., J.Y., X.X.L., S.W., and X.M. performed the protein structure analysis. Z.Z. conceived the idea. Z.Z., H.W., J.T., and Q.Z. managed the project. Z.Z., Q.Z., H.W., J.T., and X.L. wrote the manuscript and coordinated the project.
Peer review
Peer review information
Nature Communications thanks John-Sebastian Eden, and the other, anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.
Data availability
The historical sequence data and relevant information used in this study are available from GISAID (Mar 31, 2022) [https://gisaid.org/], CoVdb (v3) [http://covdb.popgenetics.net/v3/] and Nextstrain (v2.61.2) [https://nextstrain.org/ncov/gisaid]. The GISAID accession numbers of the sequences accessed in this study are archived on Figshare [10.6084/m9.figshare.29566235]. The accession numbers and relevant information of the other sequences accessed in this study are available in Data 1. The original sequencing data of the evolve-and-resequence experiments are available in the NCBI SRA database under BioProject accession code PRJNA1139330. All other data supporting the results of the study are provided in this article and the supplementary files. Source data are provided with this paper.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Xiaoyuan Lin, Zhou Sha, Chunlin Zhang.
Contributor Information
Jakob Trimpert, Email: jtrimpert@vet.k-state.edu.
Haibo Wu, Email: hbwu023@cqu.edu.cn.
Quanming Zou, Email: qmzou2007@163.com.
Zhenglin Zhu, Email: zhuzl@cqu.edu.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-025-62300-0.
References
- 1. Wang, X., Lu, L. & Jiang, S. SARS-CoV-2 evolution from the BA.2.86 to JN.1 variants: unexpected consequences. Trends Immunol. 45, 81–84 (2024).
- 2. Kimura, I. et al. Virological characteristics of the SARS-CoV-2 Omicron BA.2 subvariants, including BA.4 and BA.5. Cell 185, 3992–4007.e3916 (2022).
- 3. Jung, C. et al. Omicron: What makes the latest SARS-CoV-2 variant of concern so concerning? J. Virol. 96, e0207721 (2022).
- 4. Carabelli, A. M. et al. SARS-CoV-2 variant biology: immune escape, transmission and fitness. Nat. Rev. Microbiol. 21, 162–177 (2023).
- 5. Zhou, Y., Zhi, H. & Teng, Y. The outbreak of SARS-CoV-2 Omicron lineages, immune escape, and vaccine effectivity. J. Med. Virol. 95, e28138 (2023).
- 6. Rizvi, Z. A. et al. Omicron sub-lineage BA.5 infection results in attenuated pathology in hACE2 transgenic mice. Commun. Biol. 6, 935 (2023).
- 7. Kim, S. et al. Binding of human ACE2 and RBD of Omicron enhanced by unique interaction patterns among SARS-CoV-2 variants of concern. J. Comput. Chem. 44, 594–601 (2023).
- 8. Shah, M. & Woo, H. G. Omicron: a heavily mutated SARS-CoV-2 variant exhibits stronger binding to ACE2 and potently escapes approved COVID-19 therapeutic antibodies. Front. Immunol. 12, 830527 (2021).
- 9. Liu, Y. Attenuation and degeneration of SARS-CoV-2 despite adaptive evolution. Cureus 15, e33316 (2023).
- 10. Elliott, P. et al. Rapid increase in Omicron infections in England during December 2021: REACT-1 study. Science 375, 1406–1411 (2022).
- 11. Viana, R. et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature 603, 679–686 (2022).
- 12. Mallapaty, S. Where did Omicron come from? Three key theories. Nature 602, 26–28 (2022).
- 13. Du, P., Gao, G. F. & Wang, Q. The mysterious origins of the Omicron variant of SARS-CoV-2. Innovation 3, 100206 (2022).
- 14. Chen, D. Y. et al. Spike and nsp6 are key determinants of SARS-CoV-2 Omicron BA.1 attenuation. Nature 615, 143–150 (2023).
- 15. Lin, X. et al. The NSP4 T492I mutation increases SARS-CoV-2 infectivity by altering non-structural protein cleavage. Cell Host Microbe 31, 1170–1184.e1177 (2023).
- 16. Wu, H. et al. Nucleocapsid mutations R203K/G204R increase the infectivity, fitness, and virulence of SARS-CoV-2. Cell Host Microbe 29, 1788–1801 (2021).
- 17. Plante, J. A. et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature 592, 116–121 (2020).
- 18. Xie, K. T. et al. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science 363, 81–84 (2019).
- 19. Monroe, J. G. et al. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 602, 101–105 (2022).
- 20. Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016).
- 21. Starr, T. N. et al. Shifting mutational constraints in the SARS-CoV-2 receptor-binding domain during viral evolution. Science 377, 420–424 (2022).
- 22. Zahradnik, J. et al. SARS-CoV-2 variant prediction and antiviral drug design are enabled by RBD in vitro evolution. Nat. Microbiol. 6, 1188–1198 (2021).
- 23. Bate, N. et al. In vitro evolution predicts emerging SARS-CoV-2 mutations with high affinity for ACE2 and cross-species binding. PLoS Pathog. 18, e1010733 (2022).
- 24. Zhu, Z. et al. Rapid spread of mutant alleles in worldwide SARS-CoV-2 strains revealed by genome-wide single nucleotide polymorphism and variation analysis. Genome Biol. Evol. 13, evab015 (2021).
- 25. Ricciardi, S. et al. The role of NSP6 in the biogenesis of the SARS-CoV-2 replication organelle. Nature 606, 761–768 (2022).
- 26. V’Kovski, P., Kratzel, A., Steiner, S., Stalder, H. & Thiel, V. Coronavirus biology and replication: implications for SARS-CoV-2. Nat. Rev. Microbiol. 19, 155–170 (2021).
- 27. Zimmermann, L. et al. SARS-CoV-2 nsp3 and nsp4 are minimal constituents of a pore spanning replication organelle. Nat. Commun. 14, 7894 (2023).
- 28. Bansal, K. & Kumar, S. Mutational cascade of SARS-CoV-2 leading to evolution and emergence of Omicron variant. Virus Res. 315, 198765 (2022).
- 29. Sun, Y., Lin, W., Dong, W. & Xu, J. Origin and evolutionary analysis of the SARS-CoV-2 Omicron variant. J. Biosaf. Biosecur. 4, 33–37 (2022).
- 30. Ruan, Y. et al. The runaway evolution of SARS-CoV-2 leading to the highly evolved Delta strain. Mol. Biol. Evol. 39, msac046 (2022).
- 31. Temin, H. M. Evolution of cancer genes as a mutation-driven process. Cancer Res. 48, 1697–1701 (1988).
- 32. Weiss, K. Mutation-driven evolution. Am. J. Hum. Genet. 93, 999–1000 (2013).
- 33. Jackson, S. P. & Bartek, J. The DNA-damage response in human biology and disease. Nature 461, 1071–1078 (2009).
- 34. Rodriguez-Meira, A. et al. Single-cell multi-omics identifies chronic inflammation as a driver of TP53-mutant leukemic evolution. Nat. Genet. 55, 1531–1541 (2023).
- 35. Kishtagari, A., Levine, R. L. & Viny, A. D. Driver mutations in acute myeloid leukemia. Curr. Opin. Hematol. 27, 49–57 (2020).
- 36. Filipowicz, N. et al. Comprehensive cancer-oriented biobanking resource of human samples for studies of post-zygotic genetic variation involved in cancer predisposition. PLoS One 17, e0266111 (2022).
- 37. Kousathanas, A. et al. Whole-genome sequencing reveals host factors underlying critical COVID-19. Nature 607, 97–103 (2022).
- 38. Sackman, A. M. et al. Mutation-driven parallel evolution during viral adaptation. Mol. Biol. Evol. 34, 3243–3253 (2017).
- 39. Paul, S., Minnick, M. F. & Chattopadhyay, S. Mutation-driven divergence and convergence indicate adaptive evolution of the intracellular human-restricted pathogen, Bartonella bacilliformis. PLoS Negl. Trop. Dis. 10, e0004712 (2016).
- 40. Horton, J. S. & Taylor, T. B. Mutation bias and adaptation in bacteria. Microbiology 169, (2023).
- 41. Wardell, C. P. et al. Genomic characterization of biliary tract cancers identifies driver genes and predisposing mutations. J. Hepatol. 68, 959–969 (2018).
- 42. Grossmann, P., Cristea, S. & Beerenwinkel, N. Clonal evolution driven by superdriver mutations. BMC Evol. Biol. 20, 89 (2020).
- 43. Perez-Perez, J. M., Candela, H. & Micol, J. L. Understanding synergy in genetic interactions. Trends Genet. 25, 368–376 (2009).
- 44. Garrison, E. P. & Marth, G. T. Haplotype-based variant detection from short-read sequencing. arXiv 1207.3907 (2012).
- 45. Ahn, S., Ke, Z. & Vikalo, H. Viral quasispecies reconstruction via tensor factorization with successive read removal. Bioinformatics 34, i23–i31 (2018).
- 46. Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).
- 47. Aoki, A. et al. Discrimination of SARS-CoV-2 Omicron sublineages BA.1 and BA.2 using a high-resolution melting-based assay: a pilot study. Microbiol. Spectr. 10, e0136722 (2022).
- 48. Kong, X. et al. Discrimination of SARS-CoV-2 Omicron variant and its lineages by rapid detection of immune-escape mutations in spike protein RBD using asymmetric PCR-based melting curve analysis. Virol. J. 20, 192 (2023).
- 49. Syed, A. M. et al. Omicron mutations enhance infectivity and reduce antibody neutralization of SARS-CoV-2 virus-like particles. Proc. Natl Acad. Sci. USA 119, e2200592119 (2022).
- 50. Yamasoba, D. et al. Virological characteristics of the SARS-CoV-2 Omicron BA.2 spike. Cell 185, 2103–2115.e2119 (2022).
- 51. Willett, B. J. et al. SARS-CoV-2 Omicron is an immune escape variant with an altered cell entry pathway (vol 7, pg 1161, 2022). Nat. Microbiol. 7, 1709–1709 (2022).
- 52. Pastorio, C. et al. Determinants of Spike infectivity, processing, and neutralization in SARS-CoV-2 Omicron subvariants BA.1 and BA.2. Cell Host Microbe 30, 1255–1268.e1255 (2022).
- 53. Zhang, Y. et al. Cross-species tropism and antigenic landscapes of circulating SARS-CoV-2 variants. Cell Rep. 38, 110558 (2022).
- 54. Bignon, E., Marazzi, M., Grandemange, S. & Monari, A. Autophagy and evasion of the immune system by SARS-CoV-2. Structural features of the non-structural protein 6 from wild type and Omicron viral strains interacting with a model lipid bilayer. Chem. Sci. 13, 6098–6105 (2022).
- 55. Thorne, L. G. et al. Evolution of enhanced innate immune evasion by SARS-CoV-2. Nature 602, 487–495 (2022).
- 56. Ren, W. et al. Evolution of immune evasion and host range expansion by the SARS-CoV-2 B.1.1.529 (Omicron) variant. mBio 14, e0041623 (2023).
- 57. Patil, S. et al. Receptor binding domain of SARS-CoV-2 from Wuhan strain to Omicron B.1.1.529 attributes increased affinity to variable structures of human ACE2. J. Infect. Public Health 15, 781–787 (2022).
- 58. Bloom, J. D., Beichman, A. C., Neher, R. A. & Harris, K. Evolution of the SARS-CoV-2 mutational spectrum. Mol. Biol. Evol. 40, msad085 (2023).
- 59. Ruis, C. et al. Mutational spectra distinguish SARS-CoV-2 replication niches. bioRxiv 2022.09.27.509649 (2022).
- 60. Carter, R. W. & Sanford, J. C. A new look at an old virus: patterns of mutation accumulation in the human H1N1 influenza virus since 1918. Theor. Biol. Med. Model. 9, 42 (2012).
- 61. Pathak, V. K. & Temin, H. M. Broad spectrum of in vivo forward mutations, hypermutations, and mutational hotspots in a retroviral shuttle vector after a single replication cycle: substitutions, frameshifts, and hypermutations. Proc. Natl Acad. Sci. USA 87, 6019–6023 (1990).
- 62. Vartanian, J. P., Meyerhans, A., Asjo, B. & Wain-Hobson, S. Selection, recombination, and G to A hypermutation of human immunodeficiency virus type 1 genomes. J. Virol. 65, 1779–1788 (1991).
- 63. Kim, K. et al. The roles of APOBEC-mediated RNA editing in SARS-CoV-2 mutations, replication and fitness. Sci. Rep. 12, 14972 (2022).
- 64. Di Giorgio, S., Martignano, F., Torcia, M. G., Mattiuz, G. & Conticello, S. G. Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2. Sci. Adv. 6, eabb5813 (2020).
- 65. Ratcliff, J. & Simmonds, P. Potential APOBEC-mediated RNA editing of the genomes of SARS-CoV-2 and other coronaviruses and its impact on their longer term evolution. Virology 556, 62–72 (2021).
- 66. Salter, J. D., Bennett, R. P. & Smith, H. C. The APOBEC protein family: united by structure, divergent in function. Trends Biochem. Sci. 41, 578–594 (2016).
- 67. Ruis, C. et al. Mutational spectra are associated with bacterial niche. Nat. Commun. 14, 7091 (2023).
- 68. Ruis, C. et al. A lung-specific mutational signature enables inference of viral and bacterial respiratory niche. Microb. Genom. 9, mgen001018 (2023).
- 69. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinforma. 15, 356 (2014).
- 70. Pavlidis, P., Zivkovic, D., Stamatakis, A. & Alachiotis, N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol. Biol. Evol. 30, 2224–2234 (2013).
- 71. Nielsen, R. et al. Genomic scans for selective sweeps using SNP data. Genome Res. 15, 1566–1575 (2005).
- 72. Zhu, L. & Bustamante, C. D. A composite-likelihood approach for detecting directional selection from DNA sequence data. Genetics 170, 1411–1421 (2005).
- 73. Iranmehr, A., Akbari, A., Schlotterer, C. & Bafna, V. Clear: composition of likelihoods for evolve and resequence experiments. Genetics 206, 1011–1023 (2017).
- 74. Hedrick, P. W. Balancing selection. Curr. Biol. 17, R230–R231 (2007).
- 75. Yuan, S., Balaji, S., Lomakin, I. B. & Xiong, Y. Coronavirus Nsp1: Immune response suppression and protein expression inhibition. Front. Microbiol. 12, 752214 (2021).
- 76. Zandi, M. ORF9c and ORF10 as accessory proteins of SARS-CoV-2 in immune evasion. Nat. Rev. Immunol. 22, 331–331 (2022).
- 77. Zaffagni, M. et al. SARS-CoV-2 Nsp14 mediates the effects of viral infection on the host cell transcriptome. Elife 11, e71945 (2022).
- 78. Warger, J. & Gaudieri, S. On the evolutionary trajectory of SARS-CoV-2: host immunity as a driver of adaptation in RNA viruses. Viruses 15, 70 (2022).
- 79. Hui, K. P. Y. et al. SARS-CoV-2 Omicron variant replication in human bronchus and lung ex vivo. Nature 603, 715–720 (2022).
- 80. Kockler, Z. W. & Gordenin, D. A. From RNA world to SARS-CoV-2: the edited story of RNA viral evolution. Cells 10, 1557 (2021).
- 81. Graudenzi, A., Maspero, D., Angaroni, F., Piazza, R. & Ramazzotti, D. Mutational signatures and heterogeneous host response revealed via large-scale characterization of SARS-CoV-2 genomic diversity. iScience 24, 102116 (2021).
- 82. Milewska, A. et al. APOBEC3-mediated restriction of RNA virus replication. Sci. Rep. 8, 5960 (2018).
- 83. Madhi, S. A. et al. Population immunity and Covid-19 severity with Omicron variant in South Africa. N. Engl. J. Med. 386, 1314–1326 (2022).
- 84. Karthikeyan, S. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022).
- 85. Shafer, M. M. et al. Tracing the origin of SARS-CoV-2 Omicron-like spike sequences detected in an urban sewershed: a targeted, longitudinal surveillance study of a cryptic wastewater lineage. Lancet Microbe 5, e335–e344 (2024).
- 86. Heller, L., Mota, C. R. & Greco, D. B. COVID-19 faecal-oral transmission: Are we asking the right questions? Sci. Total Environ. 729, 138919 (2020).
- 87. Noskov, V. et al. A genetic system for direct selection of gene-positive clones during recombinational cloning in yeast. Nucleic Acids Res. 30, E8 (2002).
- 88. Thi Nhu Thao, T. et al. Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform. Nature 582, 561–565 (2020).
- 89. Tischer, B. K., Smith, G. A. & Osterrieder, N. En passant mutagenesis: a two step markerless red recombination system. Methods Mol. Biol. 634, 421–430 (2010).
- 90. Trimpert, J. et al. Development of safe and highly protective live-attenuated SARS-CoV-2 vaccine candidates by genome recoding. Cell Rep. 36, 109493 (2021).
- 91. Chen, S. F. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta 2, e107 (2023).
- 92. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- 93. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
- 94. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
- 95. Stajich, J. E. et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12, 1611–1618 (2002).
- 96. Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 22, 30494 (2017).
- 97. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
- 98. Didelot, X., Siveroni, I. & Volz, E. M. Additive uncorrelated relaxed clock models for the dating of genomic epidemiology phylogenies. Mol. Biol. Evol. 38, 307–317 (2021).
- 99. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 100. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 5, 113 (2004).
- 101. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
- 102. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
- 103. Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
- 104. Volz, E. M. & Frost, S. D. W. Scalable relaxed clock phylogenetic dating. Virus Evol. 3, vex025 (2017).
- 105. Amicone, M. et al. Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution. Evol. Med. Public Health 10, 142–155 (2022).
- 106. Van Egeren, D. et al. Risk of rapid evolutionary escape from biomedical interventions targeting SARS-CoV-2 spike protein. PLoS One 16, e0250780 (2021).
- 107. van Dorp, L. et al. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 83, 104351 (2020).
- 108. Liu, T. et al. Comprehensive analyses of genome-wide methylation and RNA epigenetics identify prognostic biomarkers, regulating the tumor immune microenvironment in lung adenocarcinoma. Pathol. Res. Pract. 248, 154621 (2023).
- 109. Nelson, C. W., Moncla, L. H. & Hughes, A. L. SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data. Bioinformatics 31, 3709–3711 (2015).
- 110. Sievers, F. & Higgins, D. G. Clustal Omega. Curr. Protoc. Bioinforma. 48, 3.13.1–3.13.16 (2014).
- 111. Anisimova, M., Gil, M., Dufayard, J. F., Dessimoz, C. & Gascuel, O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst. Biol. 60, 685–699 (2011).
- 112. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
- 113.Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput.13, 3031–3048 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Das, R. & Baker, D. Macromolecular modeling with Rosetta. Annu. Rev. Biochem.77, 363–382 (2008). [DOI] [PubMed] [Google Scholar]
- 115.Laskowski, R. A., Jablonska, J., Pravda, L., Varekova, R. S. & Thornton, J. M. PDBsum: Structural summaries of PDB entries. Protein Sci.27, 129–134 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Salentin, S., Schreiber, S., Haupt, V. J., Adasme, M. F. & Schroeder, M. PLIP: fully automated protein-ligand interaction profiler. Nucleic Acids Res.43, W443–W447 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Gros, P. A., Le Nagard, H. & Tenaillon, O. The evolution of epistasis and its links with genetic robustness, complexity and drift in a phenotypic model of adaptation. Genetics182, 277–293 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Pavia, G. et al. Persistence of SARS-CoV-2 infection and viral intra- and inter-host evolution in COVID-19 hospitalized patients. J. Med. Virol.96, e29708 (2024). [DOI] [PubMed] [Google Scholar]
Associated Data
Data Availability Statement
The historical sequence data and related information used in this study are available from GISAID (Mar 31, 2022) [https://gisaid.org/], CoVdb (v3) [http://covdb.popgenetics.net/v3/] and Nextstrain (v2.61.2) [https://nextstrain.org/ncov/gisaid]. The GISAID accession numbers of the sequences accessed in this study are archived on Figshare [10.6084/m9.figshare.29566235]. The accession numbers and related information for the other sequences accessed in this study are available in Data 1. The original sequencing data from the evolve-and-resequence experiments are available in the NCBI SRA database under BioProject accession code PRJNA1139330. All other data supporting the results of the study are provided in this article and the supplementary files. Source data are provided with this paper.







