Distribution of intra‐host variations and mutations in the genomes of SARS‐CoV‐2 and their implications on detection and therapeutics

Dongyan Xiong; Xiaoxu Zhang; Junping Yu; Hongping Wei

doi:10.1002/mco2.186

. 2022 Dec 2;3(4):e186. doi: 10.1002/mco2.186

Distribution of intra‐host variations and mutations in the genomes of SARS‐CoV‐2 and their implications on detection and therapeutics

Dongyan Xiong ^1,², Xiaoxu Zhang ¹, Junping Yu ^1,^2,^✉, Hongping Wei ^1,^2,^✉

PMCID: PMC9717708 PMID: 36474856

Abstract

The ongoing circulation of SARS‐CoV‐2 variants of concern (VOCs) has caused global concerns, because VOCs could escape current vaccines, antiviral drugs, and diagnosis. Analyzing mutations and intra‐host diversities in different and widespread VOCs can provide important insights to virus adaptive evolution and validity of vaccines, antiviral drugs, and diagnosis. In this study, by analyzing 1744 high‐throughput sequencing data for intra‐host single‐nucleotide variations (iSNVs) and 3,668,205 genome sequences for mutations in different VOCs, it was found that Omicron variant is still evolving at high speed, especially having high iSNVs frequency in its S and N genes. The efficacies of antibodies or detection primers targeting these two genes are at high risks to be invalid. Instead, highly conserved regions such as NSP8 gene could be better therapeutic and detection targets. Furthermore, mutations in later VOCs could be traced to the minor alleles in the previous variant samples such as Alpha and Delta in different countries. Finally, it was found that mutations C14408T in RdRp and A18163G in NSP14 gene might be associated with the higher genetic diversity in Omicron. Our findings not only contribute to understanding the adaptive evolution of SARS‐CoV‐2 VOCs, but also provide useful information for both drugs and diagnostic kits development.

Keywords: adaptive evolution, antiviral drugs, genetic polymorphism, intra‐host diversity, PCR detection, SARS‐CoV‐2 VOCs, vaccine

The distributions of mutations and iSNVs in the genomes of SARS‐CoV‐2 VOCs were characterized. The potential correlation between mutations and iSNVs of SARS‐CoV‐2 VOCs was evaluated. Analysis results of both mutations and iSNVs provided some helpful implications to SARS‐CoV‐2 detection and therapeutics.

graphic file with name MCO2-3-e186-g003.jpg

1. INTRODUCTION

It has been over 2 years since COVID‐19 became the pandemic. As severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) rapidly and continuously evolves, more than 50,000 lineages had been classified (https://cov‐lineages.org/). ¹ Among these lineages, three variants of concern (VOCs), Alpha (B.1.1.7 and Q.x), Delta (B.1.617.2 and AY.x), and Omicron (B.1.1.529 and BA.x), were dominant in different stages of the pandemic and caused millions of infections. The current global spread VOC (Omicron variant) has a reduced binding capability to many antibodies because of the heavy mutations, such as S375F, N501Y in Spike protein. ² With the ongoing evolution of the virus, whether the current PCR detection primers, vaccines, or antiviral agents are effective has become urgent concern. Currently, many studies focus on the mutations landscape to evaluate the impact of mutations on viral infectivity, antigenicity, genomic diversity, and the effectiveness of antibodies, vaccines, or diagnosis kits against the variants. ³ , ⁴ , ⁵ , ⁶ However, the landscape of the mutations was constructed based on the previous genome data, which lag behind the constantly evolving virus. It is necessary to understand the adaptively evolutionary features of current virus so that the variable regions in its genome can be determined and the efficacy of vaccines or antiviral agents can be confidently evaluated.

RNA viruses have been evolutionarily sharp tuning the replication fidelity to balance the genetic diversity and stability. ⁷ , ⁸ The replication of RNA viruses with low fidelity is related to the rapid evolution and the adaptation to constantly changing environmental pressures. ⁹ Therefore, RNA viruses often generate a large number of minor mutations within the host. Some of these intra‐host mutations have a positive impact on the virological function. ¹⁰ After a period of time, these positive minor mutations can change into the major mutations by competing with other strains. ¹¹ , ¹² Thus, the intra‐host diversity of RNA virus is usually looked as a key marker for understanding the adaptive evolution of the viruses within the host, ¹³ , ¹⁴ and it is often characterized by intra‐host single‐nucleotide variations (iSNVs) or minor alleles. ¹⁵ However, millions of SARS‐CoV‐2 genome sequences available from the public database disguised the minor mutations spectra within the host, limiting our understanding of the intra‐host evolution of SARS‐CoV‐2. Currently, there are some large‐scale studies on the SARS‐CoV‐2 within‐host genomic diversity. ¹¹ , ¹⁵ For example, Lythgoe et al. performed large‐scale sequencing and found that most mutations generated from virus in the early stage of the pandemic were lost, and few mutations were fixed. ¹⁶ And Tonkin‐Hill et al. also found that the within‐ and between‐host diversities were under a strong purifying selection in the early variants. ¹⁷ However, as the pandemic continues and the novel variants such as Omicron appear, it is meaningful to compare the distribution of the minor mutations in different SARS‐CoV‐2 variants to provide a new insight to variants’ adaptive evolution. Up until now, there is always a certain SARS‐CoV‐2 VOC that dominates the global transmission in different time periods. Investigating the distribution of the intra‐host variation in the genomes of different VOCs would not only help us better understand the adaptive evolution of the widespread SARS‐CoV‐2 variants but also assist us to evaluate the effectiveness of diagnosis kits, vaccines, and drugs against the variants in the future.

In this study, we performed large‐scale investigations to characterize the distributions of both mutations and iSNVs in the genomes of SARS‐CoV‐2 VOCs. The S and N genes were found most variable in the genomes of Omicron variant, which may accelerate the adaptive evolution of this variant. Therefore, some vaccines and PCR primers targeting these two genes are at a high risk of being invalid or showing reduced efficiency in the future. On the contrary, NSP8 gene might be an ideal therapeutic target because of the lowest frequency of mutations and iSNVs in this gene. Our findings not only provide useful information to understand the adaptive evolution of SARS‐CoV‐2 VOCs, but also find some conserved genes, which might be better targets for novel vaccines or drugs development.

2. RESULTS

2.1. Summary of the datasets used for analysis

Public high‐throughput sequencing raw data (n = 1744) of the samples from COVID‐19 patients were collected and used for iSNVs analysis. These data included eight independent SARS‐CoV‐2 genomics sequencing datasets, two of which belonged to our previous work, ¹⁸ , ¹⁹ and other six datasets were downloaded from the public database NCBI and EBI with accession numbers: PRJNA633948, PRJNA639864, PRJNA784038, PRJNA817870, PRJEB46969, and PRJEB47264. Among the 1744 sequencing raw data, only 258 samples had the best genome‐wide sequencing quality (total sequencing coverage more than 95%, and each sequenced position had depth over 100×), 1465 samples had low sequencing quality in some regions of their genomes (total sequencing coverage less than 95% or the number of positions with sequencing depth below 100× was greater than 1000), and the rest 21 samples had low coverage and failed to get the whole genomes (total sequencing coverage smaller than 10%). The detailed data are available in the Supporting Information (Table S1), and the summary of the 258 well‐sequenced samples was listed in Table 1. The average sequencing error rate 0.001 was used to determine iSNVs.

TABLE 1.

Summary of the well‐sequenced SARS‐CoV‐2 samples for iSNV analysis

Variant	Number of samples	Source	Time
B.1/A	76	China, Australia	2020.01–2020.03
Alpha	89	Lebanon	2021.02–2021.04
Beta	20	Pakistan, Australia	2021.06–2021.07
Delta	25	Pakistan, Australia	2021.07–2021.11
Omicron	48	South Africa, France	2021.11–2022.02

Open in a new tab

Besides, another 3,668,205 high‐quality genomes (genome length >29 kbp, number of unknown bases <30, and number of degenerate bases <70) of SARS‐CoV‐2 from GISAID were used to analyze the distribution of mutations and unique mutations specific to each SARS‐CoV‐2 VOC (Table S2).

2.2. Characterization of iSNVs and common mutations in SARS‐CoV‐2 VOCs

The well‐sequenced 258 NGS raw data were selected for iSNVs screening. In summary, we calculated and sorted the frequencies of four bases at each base site of the genome. The base with the second highest frequency is seen as the potential iSNV (minor allele) at the site. Only the qualified minor alleles with the minor allele frequency (MAF) value larger than 0.02, which is about 20 times larger than the sequencing error rate, was considered as the iSNVs. The detailed identification of iSNVs screening can be seen in the method selection.

As shown in Figure 1A and Table S3, the ancestral lineage (B.1/A) of SARS‐CoV‐2 has the lowest number of iSNVs with an average of 41, while the Omicron variant has the maximum number of iSNVs with the average number of 223. Although most of the iSNVs belong to non‐synonymous (non‐syn) mutations, the results of K _a/K _s analysis revealed that most iSNVs in the coding regions of SARS‐CoV‐2 variants were under the purifying selection (K _a/K _s values <1, Table S3 sheet B). Besides, we found that the NSP3 gene had the maximum number of iSNVs in the ancestral lineage (Figure 1B), as it is the longest gene in the SARS‐CoV‐2 genome. Interestingly, the number of iSNVs in individual genes of Alpha, Beta, and Delta variants also correlate to the gene lengths, but all are higher than those of the ancestral lineage B.1/A (Figure 1B–E). However, this correlation was decreased in the Omicron variant because some genes such as S, RdRp, and N have significantly higher number of iSNVs (Figure 1F) than those of the previous VOCs. Generally, among the genes, S and NSP3 genes were the top two iSNVs‐rich genes, while NSP8 and ORF10 genes have the least iSNVs (Table S3 sheet B). More detailed information, such as iSNV number, iSNV density, and K _a/K _s values are available in Table S3 sheet B.

Distribution of iSNVs and mutations in SARS‐CoV‐2 variants. (A) Total number of iSNVs and non‐synonymous (non‐syn) mutated iSNVs with minor allele frequency (MAF) larger than 0.02 in each variant. (B–F) Correlation analysis between iSNV number and gene length in different variants of concern (VOCs) (B: B.1/A, C: Alpha, D: Beta, E: Delta, F: Omicron), the top four iSNV‐rich genes were named in the figures. (G) Total number of common mutations and non‐syn common mutations in each variant. (H–L) Correlation analysis between mutation number and gene length in different VOCs (H: B.1/A, I: Alpha, J: Beta, K: Delta, L: Omicron), the top four mutated genes were marked. In details, the average number of iSNVs in each gene of SARS‐CoV‐2 VOCs was calculated as the (total number of iSNVs in a certain gene)/(total number of the SARS‐CoV‐2 VOC genomes). And the average number of mutations in each gene of SARS‐CoV‐2 VOCs was calculated as the (total number of mutations in a certain gene)/(total number of the SARS‐CoV‐2 VOC genomes).

Furthermore, the common mutations (mutations present in over 50% isolates of a certain SARS‐CoV‐2 lineage) of each SARS‐CoV‐2 variant were identified based on the large‐scale genomes data. As shown in Figure 1 and Table S3, similar to iSNVs, the Omicron variant contains more mutations than other VOCs. Analyzing results of the mutations in different genes showed that S, N, and NSP3 genes were the three most highly variable genes in different VOCs. Further analysis indicated that the mutations in the S gene of SARS‐CoV‐2 VOCs were under the positive selection (K _a/K _s values >1, Table S3 sheet B). The other genes, such as NSP8 and ORF10 genes, are the most conserved (Table S3 sheet B). RdRp gene has fewer mutations than iSNVs in different VOCs (Table S3 sheet B). More detailed information, such as mutation number, mutation density, and K _a/K _s values, is available in Table S3 sheet B.

2.3. Evaluation of relationship between iSNVs and mutations

During analysis of the distributions of iSNVs and common mutations in different VOCs, we found that many common mutations of the SARS‐CoV‐2 variants were present as iSNVs (minor allele) in previous sequencing datasets. For example, two unique mutations C22686T (S375F) and T23075C (Y505H) were in the receptor binding domain (RBD) of Omicron variant and reported to be associated with decreased protein stability and increased risks of infectivity. ²⁰ As shown in Figure 2A and Table S4, the mutation S375F presented as the iSNV in some Alpha isolates emerged in early 2021, and the mutated minor allele frequency (MuAF) of this iSNV increased over time in some later Alpha and Delta strains. Similarly, the unique mutation Y505H identified as a minor allele in some isolates emerged at the end of 2020. Later, the MuAF of this allele significantly increased in the Delta strains emerged in November 2021 (Figure 2B). Similarly, another unique mutation A27383T (D61L) specific for Omicron sublineage BA.2 was found as a minor allele in the samples isolated from early stage in 2020, and MuAF of this allele finally increased in the Omicron variant B.1.1.529 (Figure 2C). Collectively, many mutations of SARS‐CoV‐2 VOCs, especially 41 mutations of Omicron variant, can be found as minor alleles in previous variant samples such as Alpha and Delta in different countries (Figure 2D–G and Table S4).

The relationship between iSNVs and mutations. (A–C) The common mutations could be found as minor allele in the previous samples and the relationship between their minor allele frequency and time. The boxplots were plotted with mean value of mutated minor allele frequency (MuAF) and standard deviation. The outliers are shown as scatter points in the boxplots. (D–G) Genome‐wide profiles of whether unique or common mutations were present as minor alleles in previous samples among the SARS‐CoV‐2 variants of concern (VOCs) (Alpha: D, Beta: E, Delta: F, Omicron: G). The unique mutations specific to a certain SARS‐CoV‐2 VOC were marked at the top and the common mutations were marked at the bottom. If a mutation could be found as a minor allele in previous samples, this mutation was marked with pink. Else, it was marked with green.

2.4. Potential contribution of mutations to high intra‐host diversities in SARS‐CoV‐2 VOCs

We evaluated the intra‐host diversities of the SARS‐CoV‐2 strains by calculating the root‐mean‐square‐deviation (RMSD) value at each site across the genome. Each column in Figure 3A shows the RMSD data at each position across the SARS‐CoV‐2 genome of one representative SARS‐CoV‐2 strain. Figure 3B shows the average RMSD data of all well‐sequenced SARS‐CoV‐2 strains in different lineages (one dot represents one strain). Figure 3A,B indicate that the intra‐host diversities of all SARS‐CoV‐2 VOCs are significantly higher than that of the ancestral strain or lineage (Kruskal–Wallis test, p‐values <0.0001). Omicron variants have the highest diversity among the analyzed VOCs, irrespective of the individual strain level or lineage level.

Intra‐host genetic diversity of each SARS‐CoV‐2 variant. (A) Intra‐host genetic diversity at representative SARS‐CoV‐2 strain level. The root‐mean‐square‐deviation (RMSD) is shown with mean values ± SEM. (B) Intra‐host genetic diversity at lineage level. Each scatter represents the average RMSD value of each isolate in that lineage. The number of isolates in each lineage is marked. The significant analysis is calculated by the Kruskal–Wallis test. (C) Molecular dynamics simulations of I42V mutation in NSP14 (PDB accession number: 7DIY). The reference structure of NSP14 (I42) is highlighted with white. The mutated structure of NSP14 (V42) is shown with green. The structure of NSP10 is highlighted with cyan. Amino acid residue conformations affected by the mutation are shown.

Previous studies reported that RNA‐dependent RNA polymerase (RdRp) and NSP14 are two major contributors to the intra‐host diversities of coronaviruses, ⁹ , ¹¹ because the product of NSP14 gene has 3′ to 5′ exonuclease activity that can proofread the mismatched nucleotides replicated from RdRp. Therefore, we further analyzed whether any mutations in RdRp and NSP14 genes could contribute to the increased diversities of SARS‐CoV‐2 VOCs. According to the large‐scale genomes analysis, Omicron variant has two non‐synonymous mutations, C14408T (P316L) in RdRp and A18163G (I42V) in NSP14, but the other three VOCs only have one non‐synonymous mutation C14408T (P316L) in RdRp (Figure 2 and Table S2). C14408T (P316L) is the second hotspot mutation after the mutation A23403G (D614G). To investigate whether the mutation P316L in RdRp could affect the genetic diversity of SARS‐CoV‐2 VOCs, other 14 deep‐sequenced non‐VOC samples in the early stage of the pandemic that belonged to B.1.x or B.x, but with P316L mutation in RdRp and a few mutations in other regions, were included to calculate intra‐host diversities (Table S5). As shown in Figure 3B, the RMSD values showed that the intra‐host diversities of these 14 samples were also significantly higher than that of the ancestral lineage but close to those of Alpha variants (Kruskal–Wallis test, p‐value <0.0001). One thing to be noticed is that B.1.x or B.x lineages with P316L mutation only contain a few mutations (Table S5) compared to the reference genome (MN908947.3). Besides the hotspot mutation P316L, Omicron variants have the specific mutation A18163G (I42V) in NSP14 gene (Figure 2G and Table S2). Owing to that the proofreading activity of NSP14 depends on the binding capability of NSP10, ²¹ and I42V is close to the key sites that directly interact with NSP10, ²² molecular dynamics simulation was performed to compare the changes of the binding free energy before and after the mutation. Using the CryoEM structure (PDB accession number: 7DIY), we found the binding free energy is −36.62 kcal/mol before mutation, but it changed to −31.53 kcal/mol after the mutation, which means that the binding between NSP14 and NSP10 may become weaker after the mutation, leading to weaker proofreading activity (Figure 3C).

2.5. Impact of mutations and iSNVs on RT‐PCR detection

The current gold‐standard approach for SARS‐CoV‐2 detection is RT‐PCR. One of the obvious factors to keep the effectiveness of these assays is that the oligonucleotide primers and probes can accurately match the target sequence of different variants. Mutations, especially the mutations in the probe or at 3′ end of the primers, may lead to invalid PCR assays. We further analyzed whether mutations and iSNVs in N gene, which was chosen mostly as the target gene for detection because of its high abundance lead by the complex numerous discontinuous transcription events, ²³ might affect the PCR assays widely used. As shown in Figure 4, almost all the primer probe sets matched regions had thousands of mutations, but most of these mutations were far away from the key sites that could affect PCR amplification, except for the primer probe sets from Thailand and Hong Kong CDCs. There is a deletion in the N gene of Omicron variants (Table S2) in the region of the reverse primer from Thailand CDC and thousands of genomes have mutations in the 3′ end of the reverse primer from Hong Kong CDC (Figure 4C). The widely used primer probe set from China CDC was located in the iSNVs‐rich regions of N gene (Figure 4B,C). In contrast, two primer probe sets targeting ORF1ab and E gene from China CDC and Germany CDC, respectively, are found much conserved (Figure 4C). Moreover, there is one most conserved region (from 28595 to 28702 in the genome of MN908947.3) in the N gene, which may be used for designing new primer probe sets for PCR detection of SARS‐CoV‐2.

Distribution of mutations and iSNV frequency in the N gene of SARS‐CoV‐2 variants of concern (VOCs). (A) Distribution of mutations in the N gene of SARS‐CoV‐2 VOCs. (B) Distribution of iSNV frequency in the N gene of SARS‐CoV‐2 VOCs. The most conserved region and the amplification regions of the primer probe sets targeting N gene from the Center for Disease Control and Prevention (CDC) of several areas were also highlighted. (C) Mutations (left) and iSNV frequency (right) in each position of several primer probe sets announced by some CDCs worldwide. All sequences were 5′‐aligned, and the short sequences were filled with the gray squares at their 3′ end to match the length with the longest sequence. The numbers 1–28 represent the sites in primer/probe. Collectively, in order to facilitate visualization, the number of mutations larger than a certain value was visualized by this value. And the iSNV frequency in each VOC was calculated as (total number of iSNVs in the sites)/(total number of the well‐sequenced samples of a certain VOC). The full names of the abbreviations for countries/areas are: Thailand (THA), China (CHA), United States of America (USA), Japan (JPN), Hong Kong Special Administrative Region (HK), and Germany (GER). The CHA‐ORF1ab represents the primer probe of China CDC targeting ORF1ab region and the GER‐E represents the primer probe of Germany CDC targeting E gene.

2.6. Implications of mutations and iSNVs on vaccines

The S protein of SARS‐CoV‐2 majorly interacts with the host angiotensin‐converting enzyme 2 (ACE2) receptor before entering a cell to initiate the infection, so it was mainly used as the target for vaccine development. With the circulation of the SARS‐CoV‐2 variants, most monoclonal antibodies approved by FDA such as bamlanivimab, etesevimab, casirivimab, imdevimab, tixagevimab, and cilgavimab were reported to have reduced efficacies against Omicron variants because of the heavy mutations in RBD region. ²⁰ We further evaluated the potential impact on the vaccine effectiveness based on the mutations and iSNV landscapes in S gene.

As shown in Figure 5, the N‐terminal domain (NTD) and RBD regions contained many variable sites, and Omicron variant has much higher densities of both mutations and iSNVs. Even worse, most variants, especially Omicron, have more mutations and iSNVs in the epitope‐rich regions than that of other regions (Figure 5C,D, most p‐values <0.0001 by Kruskal–Wallis test). Therefore, it is highly possible that efficacy of vaccines targeting S gene is variable to different VOCs. For example, two mRNA vaccines named BNT162b2 and Spikevax approved by FDA, another mRNA vaccine named ARCoV under phase III clinical trials, and two other subunit vaccines, NVSI‐06‐08 in clinical trials and NVX‐COV2373 for Emergency Use Authorization (EUA) by FDA (https://www.fda.gov/emergency‐preparedness‐and‐response/coronavirus‐disease‐2019‐covid‐19/covid‐19‐vaccines), ²⁴ , ²⁵ which covered the whole RBD region or the full length of S gene (Figure 5), were designed based on sequences of the ancestral strains or Beta and Kappa variants, other than current Omicron variants.

Distribution of mutations and iSNV frequency in the S gene of SARS‐CoV‐2 variants of concern (VOCs). (A) Distribution of the mutations in the S gene of SARS‐CoV‐2 VOCs. (B) Distribution of the iSNV frequency in the S gene of SARS‐CoV‐2 VOCs. (C and D) Comparison of iSNVs and mutations distribution in the epitope‐rich regions and other regions in the S gene of each VOC. The significant difference was calculated by the Kruskal–Wallis test. In details, the iSNV frequency in each VOC was calculated as (total number of iSNVs in the site)/(total number of the well‐sequenced samples of a certain VOC), and the antigen epitopes including discontinuous epitopes and linear epitopes were referred to the results from http://tools.iedb.org/ and Li et al. ⁵⁶ The vaccine sequences were aligned to the reference S gene and highlighted at the bottom of the figure.

2.7. Implications of mutations and iSNVs on antiviral drugs

Most of the antiviral drugs target the products of ORF1ab region, where few mutations in SARS‐CoV‐2 VOCs are included (Figure 1G and Table S3 sheet B). Currently, remdesivir is the only drug approved by FDA, and Chinese scientists have developed an oral remdesivir derivative VV116. ²⁶ , ²⁷ Another drug was approved for EUA by FDA named paxlovid targeting the main protease (NSP5). ²⁸ For the antiviral drug remdesivir, there are five reported mutations (A90V, P316L, F473L/S/C, V550L, and E795D/A) that could affect the affinity of remdesivir and RdRp. ²⁹ Among them, the mutation C14408T (P316L) was found capable of increasing the affinity of remdesivir and RdRp. ²⁹ The mutations and iSNVs profiles of RdRp revealed that except the common mutation P316L shared by SARS‐CoV‐2 VOCs, the key sites of RdRp to remdesivir were extremely conserved (Figure 6B,C). For another drug paxlovid, the common mutations G71S (Alpha), K90R (Beta), and P132H (Omicron) were far away from the interaction‐critical sites of the drug (Figure 6D) with NSP5. And there were few isolates with mutations and iSNVs in the interaction‐critical sites (Figure 6E,F). ²⁸ Compared with the variations in S and N genes, genes targeted by the antiviral drugs have much less mutations and iSNVs.

Distribution of mutations and iSNV frequency in the RdRp and NSP5 genes. (A) Structure of RNA‐dependent RNA polymerase (gray) interaction with the antiviral drug remdesivir (pink). The reported five mutations that can affect the remdesivir resistance of RdRp are highlighted in the structure (A), and the number of mutations and iSNV frequency of these sites are shown in (B) and (C). (D) Structure of the main protease (NSP5, gray) interacting with the antiviral drug paxlovid (green). The interaction‐critical sites with paxlovid of NSP5 are highlighted in the structure (D). The number of mutations and iSNV frequency of these sites are shown in (E) and (F). The number of mutations larger than a certain value was visualized by this value. The iSNV frequency in each variants of concern (VOC) was calculated as (total number of iSNVs in the sites)/(total number of well‐sequenced samples of a certain VOC).

3. DISCUSSION

Previous studies on other RNA viruses, such as yellow fever virus and influenza virus, revealed that the intra‐host diversity could provide useful information on virus adaptive evolution and host‐to‐host transmission during the epidemic. ¹³ , ¹⁴ In our study, we compared iSNVs and mutation features of SARS‐CoV‐2 VOCs during different stages of COIVD‐19 pandemic. Compared with the previous studies, ¹¹ , ¹⁶ , ¹⁷ , ³⁰ some interesting findings were obtained, which may deepen our understanding of the adaptive evolution of SARS‐CoV‐2 VOCs, especially Omicron variant. The distribution of iSNVs (Figure 1A–F) showed that VOCs have more iSNVs than the ancestral lineage. The selection pressure analysis showed most genes in the genomes of SARS‐CoV‐2 VOCs were under the purifying selection, which is consistent with the findings of Lythgoe et al. ¹⁶ But we found that Omicron variant is quite special with much higher iSNVs. Especially the S gene of Omicron variant was under positive selection, which indicated that S gene may be in the process of fast evolution. We further found that the mutation C14408T in RdRp gene may contribute to the increased intra‐host diversities of all the SARS‐CoV‐2 VOCs, which matched the studies of Pathak et al. and Pachetti et al., ¹¹ , ³¹ and the mutation A18163G in NSP14 gene might contribute further to the even higher diversities of Omicron variants, as A18163G in NSP14 is the unique mutation specific for Omicron variant (Table S2) and did not present in previous variants. It is possible that this mutation might affect the binding capability between NSP14 and NSP10, leading to a weaker proofreading activity (Figure 3C), because NSP14 can proofread the mismatch produced by RdRp after binding to NSP10. ²¹ , ²² However, further experimental confirmation is needed.

Overall, the iSNV results indicated that the VOCs have adapted to humans easier than the ancestral strain during spread and the higher iSNVs of Omicron variants may contribute to their much higher transmission rates. We also compared our results to the deep‐sequencing NGS data of the B.1/A lineage submitted by UK scientists in the early stage of the pandemic. ¹⁷ The RMSD results revealed that the intra‐host genetic diversities of the B.1/A variants in the early stage of the pandemic in the United Kingdom were low, which is consistent with our conclusions (Figure S1). At the same time, VOCs have obtained more and more mutations during spread (Figure 1G–L), which led to concerns about the effectiveness of current vaccines and detection using current PCR primers. For example, the N gene was often seen as the PCR target, but it was mutable because of the accumulation of C‐containing codons in its tether region. ³² , ³³ In this study, we found that some mutations in the key sites targeted by the CDC PCR assays may lead to invalid detection. And other mutations, especially the mutations in S gene, may lead to immune evasion of the virus to vaccines and antibodies generated in previous infections. As the current pandemic is still ongoing, it highlights the importance of developing effective detection primers, vaccines and drugs to combat the widespread virus. As we identified that the distributions between mutations and iSNVs are similar and some mutations could be traced to iSNVs in previous variants, it may be useful to track the distributions of iSNVs in current VOCs to avoid the variable regions in the genome during designing primer probes and vaccines. Unlike S gene, the key sites of the RdRp and NSP5 genes targeted by current antiviral drugs are quite conserved for all the VOCs, which may mean that the current antiviral drugs are still effective for the time being. Our results also revealed that NSP8 gene, which works as a key and essential cofactor for RNA replication and synthesis, ³⁴ was one of the extremely conserved genes and could be considered as a therapeutic and detection target also.

Although the overall distributions of iSNVs and mutations are similar, there are still some iSNVs‐rich genes with low mutation frequency such as RdRp gene (Figure 1). The fitness cost of mutation is one of the important factors that determines whether a minor mutation can be retained. The RNA‐dependent RNA polymerase is an essential enzyme that helps RNA synthesis and replication of SARS‐CoV‐2 genome. Therefore, proper and efficient synthesis of viral RNA is important for virus survival. ³⁵ However, the fitness costs of mutations in RdRp are very high. For example, Agostini et al. found that the mutation V533L or F476L in RdRp could reduce the susceptibility of coronavirus mouse hepatitis virus (MHV) to the antiviral drug remdesivir, while the replication ability of the mutated virus was significantly decreased and failed to compete with wildtype. ³⁶ These two mutations corresponded to V550L and F473L in RdRp of SARS‐CoV‐2, and the corresponding mutation rate was also extremely low (Figure 6A–C). Thus, accumulation of mutations in RdRp might probably damage the virus and limit the emergence of mutations. And the K _a/K _s values of iSNVs and mutations in RdRp were smaller than 1, which means the deleterious mutations in this gene might be eliminated (Table S3 sheet B). On the contrary, the Spike protein is located on the surface of SARS‐CoV‐2 and it can mediate viral entry by binding with the host cell surface protein ACE2. Thus, the Spike protein becomes the target of most neutralizing antibodies and vaccines. The mutations in Spike protein, especially in the antigen epitope regions, are favorable for the virus to escape, neutralizing antibodies or enhancing the receptor binding capability. ³⁷ Hence, in the scenario of mass vaccination of different vaccines, the iSNVs in S gene can favor the virus survival and have large probability to be accumulated, and the K _a/K _s value of this gene larger than 1 also supports this hypothesis (Table S3 sheet B). It might be the reason why the Omicron variant has the cluster mutations and iSNVs in its S gene (Figure 5). ³⁸

Our results also indicated that many mutations of SARS‐CoV‐2 VOCs that may be traced as minor alleles in previous VOCs emerged in different countries and the early iSNVs provide a mutational pool shaping the global evolution of variants, which are consistent to the previous studies by Pathak et al. ¹¹ and Li et al. ³⁰ But we also found there were some mutations that cannot be traced to previous iSNVs, which might be partially due to the limitations of the collected datasets. However, as the variants widely spread, they probably underwent multiple ways to generate the mutations, such as different host selection, environmental mutagenesis, or recombination. ³⁹ , ⁴⁰ For example, Bate et al. found that the unique mutation A23055G (Q498R) of Omicron variant combined with A23063T (N501Y) allowed the virus to bind tightly to rat ACE2, ⁴¹ and the mutation Q498R was not found as minor allele in the previous isolates (Figure 2G). It was suggested that the evolution of SARS‐CoV‐2 in the past might have gone through a highly complex process, such as zoonotic transfer from animal to human, and human‐to‐animal‐to‐human. ⁴²

In summary, we characterized the mutation and intra‐host variation landscapes of typical SARS‐CoV‐2 VOCs. Our findings not only suggested that Omicron variant is still under a high degree of adaptive evolution, but also identified two mutations, which might be associated with the increased intra‐host diversity. Further analyzing mutations and iSNVs in important genes revealed that some PCR primer probe sets need to be re‐designed to avoid invalid detection for certain variants, and vaccines targeted to S gene need to be updated so that they are efficient against new variants. However, the genes targeted by current antiviral drugs are quite conserved, which might mean that those drugs are active to current VOCs and future variants for only short time.

4. MATERIALS AND METHODS

4.1. Next‐generation deep sequencing data collection and processing

From our previous study, ¹⁸ the third‐generation sequencing platform still has false‐positive minor mutations caused by the technical errors during sequencing. Therefore, only next‐generation pair‐end deep sequencing data meet the requirements in intra‐host variation calling. Besides, each independent NGS dataset should contain at least 100 COVID‐19 patient samples. And we also ensured that different sequencing datasets came from the patients in different countries at different times. Finally, a total of eight independent NGS datasets were selected in this study. Among them, two datasets belonged to our previous work, ¹⁸ , ¹⁹ and the other six datasets were collected from the public databases NCBI and EBI. The accession numbers of the six studies are PRJNA633948, PRJNA639864, PRJNA784038, PRJNA817870, PRJEB46969, and PRJEB47264. Total 1744 sequencing data from 1744 SARS‐CoV‐2‐positive nasopharyngeal swab samples, covering different areas worldwide from 2020 to 2022, as listed in Table S1, were used to intra‐host genetic diversity analysis.

The raw data in FASTQ format were downloaded. The low‐quality reads were trimmed by the Trimmomatic software (version 0.36). ⁴³ Clean reads were then aligned to the SARS‐CoV‐2 reference genome (accession number: MN908947.3) via BWA software (version 0.7.17) with the default options. ⁴⁴ The coverage and sequencing depth were calculated by the samtools software. ⁴⁵ Only samples with high sequencing coverage (>95%) and enough depth (>100) were used for downstream analysis.

4.2. Lineage typing of SARS‐CoV‐2 isolates

After mapping the clean reads to the SARS‐CoV‐2 reference genome, the viral reads were extracted by the bam2fastq software (version 1.1.0, https://genome.sph.umich.edu/wiki/BamUtil:_bam2FastQ). The de novo assembly was performed by the megahit software (version 1.2.9) based on the viral reads. ⁴⁶ Meanwhile, the point mutations, deletions, and insertions of each sequence were identified by the SnpEff software (version 5.1). ⁴⁷ The completed or nearly full length of SARS‐CoV‐2 genomes was uploaded to the online sever https://pangolin.cog‐uk.io/ to calculate the lineages. Based on the lineage information, the taxonomy of SARS‐CoV‐2 variant could be accurately determined.

4.3. Calling of intra‐host single‐nucleotide variations

The iSNVs are defined as the allele with minor frequency co‐existing with the reference allele (major frequency) at the same position within the same sample. The identification of iSNVs was basically executed by our established pipelines and referred to Nicole's work. ⁹ , ⁴⁸ In details, the jellyfish software (version 2.3.0) was utilized to evaluate the sequencing errors of each NGS data. ⁴⁹ In order to determine the iSNVs more strictly, three requirements should be met: (a) sequencing depth greater than 100; (b) MAF should be larger than 0.02, which is significantly greater than the average sequencing errors rate (0.001); (c) the position of each iSNV was supported by the inner part of read (excluding 10 bp on each end).

4.4. Genetic diversity analysis

As mentioned in Rozen‐Gagnon's publication, the root mean square deviation (RMSD) values of the variance at each nucleotide position were used to estimate the viral genetic diversity within a deep sequencing sample. ⁵⁰ Each position was considered by the RMSD value of genetic diversity across the genome. ⁵¹ Therefore, in this study, we used the RMSD value to evaluate the intra‐host genetic polymorphisms of differential SARS‐CoV‐2 variants based on the genome‐wide minor allele profile generated from the above description.

4.5. Large‐scale genomics data analysis for the mutations

A total of 3,668,205 SARS‐CoV‐2 completed high‐quality genomes were downloaded from the GISAID database (https://www.gisaid.org/) covering all isolates that emerged from 2019 to April 2022. These genomes were selected based on the criteria: (a) genome length is longer than 29 kbp, (b) the unknown base(s) are less than 30, and (c) the degenerate base(s) are less than 70. These large‐scale genomics data were used to construct the mutations landscapes and identify unique mutations specific to SARS‐CoV‐2 VOCs according to our established pipelines. ⁵² In particular, our established pipelines performed pair‐wise alignment between each SARS‐CoV‐2 genome and the reference genome (MN908947.3). After the alignment, our pipeline can identify the mutations of each isolate against the reference genome. Combined with the lineage information, the mutation coverage of a certain SARS‐CoV‐2 lineage can be calculated by (total number of genomes identified with this mutation from this lineage)/(total number of genomes from this lineage). If a mutation presented in over 50% of the corresponding lineage, it is considered as the common mutation of this lineage. If a common mutation only occurred in a certain SARS‐CoV‐2 lineage, this mutation can be considered as the unique mutation specific to this lineage.

4.6. Molecular dynamics simulations

The molecular dynamics simulations were used to evaluate whether a single amino acid substitution on NSP14 would affect the binding capability between the two nonstructural proteins NSP10 and NSP14. The high‐resolution CryoEM result of NSP14–NSP10 complex (7DIY) from Protein Data Bank (PDB) database was used for amino acid substitution prediction. The Chimera software ⁵³ was executed to minimize the energy of the mutated protein structure to adjust the changes of the amino acid and its surrounding amino acid structure after mutation. The binding free energies for NSP14 to NSP10 were calculated by the HawkDock sever based on the Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) model. ⁵⁴

4.7. Selection pressure calculation

The selection pressures of iSNVs and mutations were calculated by the software KaKs_calculator (version: 3.0). ⁵⁵

4.8. Statistical analysis

All the statistical analyses were performed in the GraphPad Prism 7 software (GraphPad Inc., San Diego, CA, USA) or the R program language (version: 3.6.3, https://cran.r‐project.org/bin/windows/base/old/3.6.3/). Before statistical analysis, the distribution feature of each data was judged. Collectively, in this study, the independent groups were analyzed by Kruskal–Wallis test and the correlation coefficient was calculated by Pearson correlation analysis. Data were showed as mean ± SEM. Statistical differences were considered significant at *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001.

AUTHOR CONTRIBUTIONS

Dongyan Xiong designed experiments, performed all bioinformatics analysis, analyzed the experiment results, and drafted the manuscript. Xiaoxu Zhang attended the results discussion. Junping Yu designed experiments, attended the results discussion, and revised the manuscript. Hongping Wei generated the idea, designed experiments, finalized the manuscript, and provided funding. All authors have read and approved the final manuscript.

CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest.

ETHICS STATEMENT

No research activity for this manuscript requires an ethics approval.

Supporting information

Supporting Information

Click here for additional data file.^{(168.4KB, docx)}

Supporting Information

Click here for additional data file.^{(66.2KB, xlsx)}

Supporting Information

Click here for additional data file.^{(1.3MB, xlsx)}

Supporting Information

Click here for additional data file.^{(1.4MB, xlsx)}

Supporting Information

Click here for additional data file.^{(2.7MB, xlsx)}

Supporting Information

Click here for additional data file.^{(15.5KB, xlsx)}

ACKNOWLEDGMENTS

We thank the financial support from Urgent Project for COVID‐19 Control by Hubei Province (Grant No: 2020YFC0841200). Thanks also to Tao Du, Jing Xiong, and the Wuhan National Biosafety Laboratory staff for their critical supports. For all the public SARS‐CoV‐2 genome sequences and sequencing data used in the study, we thank all the people and the laboratories collecting the specimens and sharing the SARS‐COV‐2 data via the NCBI, EBI, and GISAID Initiative.

Xiong D, Zhang X, Yu J, Wei H. Distribution of intra‐host variations and mutations in the genomes of SARS‐CoV‐2 and their implications on detection and therapeutics. MedComm. 2022;3:e186. 10.1002/mco2.186

Contributor Information

Junping Yu, Email: yujp@wh.iov.cn.

Hongping Wei, Email: hpwei@wh.iov.cn.

DATA AVAILABILITY STATEMENT

The commands used in pipeline are available at the following GitHub repository: https://github.com/MisgaXiong/Sripts_for_Paper/tree/master/SARS‐CoV‐2‐mutation. The data that support the findings of this study are available in the Supporting Information of this article.

REFERENCES

1. Rambaut A, Holmes EC, O'Toole Á, et al. A dynamic nomenclature proposal for SARS‐CoV‐2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403‐1407. [DOI] [PMC free article] [PubMed] [Google Scholar]
2. Ao D, Lan T, He X, et al. SARS‐CoV‐2 Omicron variant: immune escape and vaccine development. MedComm. 2022;3(1):e126. [DOI] [PMC free article] [PubMed] [Google Scholar]
3. Hernández‐Huerta MT, Pérez‐Campos Mayoral L, Romero Díaz C, et al. Analysis of SARS‐CoV‐2 mutations in Mexico, Belize, and isolated regions of Guatemala and its implication in the diagnosis. J Med Virol. 2021;93(4):2099‐2114. [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Lesbon JCC, Poleti MD, de Mattos Oliveira EC, et al. Nucleocapsid (N) gene mutations of SARS‐CoV‐2 can affect real‐time RT‐PCR diagnostic and impact false‐negative results. Viruses. 2021;13(12):2474. [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Li Q, Wu J, Nie J, et al. The impact of mutations in SARS‐CoV‐2 Spike on viral infectivity and antigenicity. Cell. 2020;182(5):1284‐1294.e9. [DOI] [PMC free article] [PubMed] [Google Scholar]
6. van Dorp L, Acman M, Richard D, et al. Emergence of genomic diversity and recurrent mutations in SARS‐CoV‐2. Infect Genet Evol. 2020;83:104351. [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Domingo E. Mechanisms of viral emergence. Vet Res. 2010;41(6):38. [DOI] [PMC free article] [PubMed] [Google Scholar]
8. Sanjuán R. From molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses. PLoS Pathog. 2012;8(5):e1002685. [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Sexton NR, Smith EC, Blanc H, Vignuzzi M, Peersen OB, Denison MR. Homology‐based identification of a mutation in the coronavirus RNA‐dependent RNA polymerase that confers resistance to multiple mutagens. J Virol. 2016;90(16):7415‐7428. [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Ni M, Chen C, Qian J, et al. Intra‐host dynamics of Ebola virus during 2014. Nat Microbiol. 2016;1(11):16151. [DOI] [PubMed] [Google Scholar]
11. Pathak AK, Mishra GP, Uppili B, et al. Spatio‐temporal dynamics of intra‐host variability in SARS‐CoV‐2 genomes. Nucleic Acids Res. 2022;50(3):1551‐1561. [DOI] [PMC free article] [PubMed] [Google Scholar]
12. Ruan Y, Hou M, Li J, et al. One viral sequence for each host? – the neglected within‐host diversity as the main stage of SARS‐CoV‐2 evolution. bioRxiv. 2021. 10.1101/2021.06.21.449205 [DOI]
13. Chen C, Jiang D, Ni M, et al. Phylogenomic analysis unravels evolution of yellow fever virus within hosts. PLoS Negl Trop Dis. 2018;12(9):e0006738. [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Valesano AL, Fitzsimmons WJ, McCrone JT, et al. Influenza B viruses exhibit lower within‐host diversity than influenza A viruses in human hosts. J Virol. 2020;94(5):e01710‐e01719. [DOI] [PMC free article] [PubMed] [Google Scholar]
15. Wang Y, Wang D, Zhang L, et al. Intra‐host variation and evolutionary dynamics of SARS‐CoV‐2 populations in COVID‐19 patients. Genome Med. 2021;13(1):30. [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Lythgoe KA, Hall M, Ferretti L, et al. SARS‐CoV‐2 within‐host diversity and transmission. Science. 2021;372(6539):eabg0821. [DOI] [PMC free article] [PubMed] [Google Scholar]
17. Tonkin‐Hill G, Martincorena I, Amato R, et al. Patterns of within‐host genetic diversity in SARS‐CoV‐2. eLife. 2021;10:e66857. [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Yan Y, Wu K, Chen J, et al. Rapid acquisition of high‐quality SARS‐CoV‐2 genome via Amplicon‐Oxford nanopore sequencing. Virol Sin. 2021;36(5):901‐912. [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Xiong D, Muema C, Zhang X, et al. Enriched opportunistic pathogens revealed by metagenomic sequencing hint potential linkages between pharyngeal microbiota and COVID‐19. Virol Sin. 2021;36(5):924‐933. [DOI] [PMC free article] [PubMed] [Google Scholar]
20. Bhattacharya M, Sharma AR, Dhama K, Agoramoorthy G, Chakraborty C. Omicron variant (B.1.1.529) of SARS‐CoV‐2: understanding mutations in the genome, S‐glycoprotein, and antibody‐binding regions. Geroscience. 2022;44(2):619‐637. [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Moeller NH, Shi K, Demir Ö, et al. Structure and dynamics of SARS‐CoV‐2 proofreading exoribonuclease ExoN. Proc Nat Acad Sci U S A. 2022;119(9):e2106379119. [DOI] [PMC free article] [PubMed] [Google Scholar]
22. Bojkova D, Widera M, Ciesek S, Wass MN, Michaelis M. Reduced interferon antagonism but similar drug sensitivity in Omicron variant compared to Delta variant of SARS‐CoV‐2 isolates. Cell Res. 2022;32(3):319‐321. [DOI] [PMC free article] [PubMed] [Google Scholar]
23. Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H. The architecture of SARS‐CoV‐2 transcriptome. Cell. 2020;181(4):914‐921.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]
24. Zhang NN, Li XF, Deng YQ, et al. A thermostable mRNA vaccine against COVID‐19. Cell. 2020;182(5):1271‐1283.e16. [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Liang Y, Zhang J, Yuan RY, et al. Design of a mutation‐integrated trimeric RBD with broad protection against SARS‐CoV‐2. Cell Discov. 2022;8(1):17. [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Elmansi H, Ibrahim AE, Mikhail IE, Belal F. Green and sensitive spectrofluorimetric determination of Remdesivir, an FDA approved SARS‐CoV‐2 candidate antiviral; application in pharmaceutical dosage forms and spiked human plasma. Anal Methods. 2021;13(23):2596‐2602. [DOI] [PubMed] [Google Scholar]
27. Xie Y, Yin W, Zhang Y, et al. Design and development of an oral remdesivir derivative VV116 against SARS‐CoV‐2. Cell Res. 2021;31(11):1212‐1214. [DOI] [PMC free article] [PubMed] [Google Scholar]
28. Mótyán JA, Mahdi M, Hoffka G, Tőzsér J. Potential resistance of SARS‐CoV‐2 main protease (Mpro) against protease inhibitors: lessons learned from HIV‐1 protease. Int J Mol Sci. 2022;23(7):3507. [DOI] [PMC free article] [PubMed] [Google Scholar]
29. Focosi D, Maggi F, McConnell S, Casadevall A. Very low levels of remdesivir resistance in SARS‐COV‐2 genomes after 18 months of massive usage during the COVID19 pandemic: a GISAID exploratory analysis. Antiviral Res. 2022;198:105247. [DOI] [PMC free article] [PubMed] [Google Scholar]
30. Li J, Du P, Yang L, et al. Two‐step fitness selection for intra‐host variations in SARS‐CoV‐2. Cell Rep. 2022;38(2):110205. [DOI] [PMC free article] [PubMed] [Google Scholar]
31. Pachetti M, Marini B, Benedetti F, et al. Emerging SARS‐CoV‐2 mutation hot spots include a novel RNA‐dependent‐RNA polymerase variant. J Transl Med. 2020;18(1):179. [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Matyášek R, Kovařík A. Mutation patterns of human SARS‐CoV‐2 and bat RaTG13 coronavirus genomes are strongly biased towards C>U transitions, indicating rapid evolution in their hosts. Genes. 2020;11(7):761. [DOI] [PMC free article] [PubMed] [Google Scholar]
33. Troyano‐Hernáez P, Reinosa R, Holguín Á. Evolution of SARS‐CoV‐2 envelope, membrane, nucleocapsid, and spike structural proteins from the beginning of the pandemic to September 2020: a global and regional approach by Epidemiological Week. Viruses. 2021;13(2):243. [DOI] [PMC free article] [PubMed] [Google Scholar]
34. Wang Q, Wu J, Wang H, et al. Structural basis for RNA replication by the SARS‐CoV‐2 polymerase. Cell. 2020;182(2):417‐428.e13. [DOI] [PMC free article] [PubMed] [Google Scholar]
35. Tian L, Qiang T, Liang C, et al. RNA‐dependent RNA polymerase (RdRp) inhibitors: the current landscape and repurposing for the COVID‐19 pandemic. Eur J Med Chem. 2021;213:113201. [DOI] [PMC free article] [PubMed] [Google Scholar]
36. Agostini ML, Andres EL, Sims AC, et al. Coronavirus susceptibility to the antiviral remdesivir (GS‐5734) is mediated by the viral polymerase and the proofreading exoribonuclease. mBio. 2018;9(2):e00221‐18. [DOI] [PMC free article] [PubMed] [Google Scholar]
37. Yi C, Sun X, Lin Y, et al. Comprehensive mapping of binding hot spots of SARS‐CoV‐2 RBD‐specific neutralizing antibodies for tracking immune escape variants. Genome Med. 2021;13(1):164. [DOI] [PMC free article] [PubMed] [Google Scholar]
38. Martin DP, Lytras S, Lucaci AG, et al. Selection analysis identifies clusters of unusual mutational changes in omicron lineage BA.1 that likely impact spike function. Mol Biol Evol. 2022;39(4):msac061. [DOI] [PMC free article] [PubMed] [Google Scholar]
39. Hammer AS, Quaade ML, Rasmussen TB, et al. SARS‐CoV‐2 transmission between mink (Neovison vison) and humans, Denmark. Emerg Infect Dis. 2021;27(2):547‐551. [DOI] [PMC free article] [PubMed] [Google Scholar]
40. Motayo BO, Oluwasemowo OO, Olusola BA, et al. Evolution and genetic diversity of SARS‐CoV‐2 in Africa using whole genome sequences. Int J Infect Dis. 2021;103:282‐287. [DOI] [PMC free article] [PubMed] [Google Scholar]
41. Bate N, Savva CG, Moody PCE, et al. In vitro evolution predicts emerging SARS‐CoV‐2 mutations with high affinity for ACE2 and cross‐species binding. PLoS Pathog. 2022;18(7):e1010733. [DOI] [PMC free article] [PubMed] [Google Scholar]
42. Pekar JE, Magee A, Parker E, et al. The molecular epidemiology of multiple zoonotic origins of SARS‐CoV‐2. Science. 2022;377(6609):960‐966. [DOI] [PMC free article] [PubMed] [Google Scholar]
43. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114‐2120. [DOI] [PMC free article] [PubMed] [Google Scholar]
44. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754‐1760. [DOI] [PMC free article] [PubMed] [Google Scholar]
45. Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078‐2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
46. Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra‐fast single‐node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674‐1676. [DOI] [PubMed] [Google Scholar]
47. Cingolani P, Platts A, Wang LL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso‐2; iso‐3. Fly. 2012;6(2):80‐92. [DOI] [PMC free article] [PubMed] [Google Scholar]
48. Zhang X, Xiong D, Yu J, Yang H, He P, Wei H. Genetic polymorphism drives susceptibility between bacteria and bacteriophages. Front microbiol. 2021;12:627897. [DOI] [PMC free article] [PubMed] [Google Scholar]
49. Marçais G, Kingsford C. A fast, lock‐free approach for efficient parallel counting of occurrences of k‐mers. Bioinformatics. 2011;27(6):764‐770. [DOI] [PMC free article] [PubMed] [Google Scholar]
50. Rozen‐Gagnon K, Stapleford KA, Mongelli V, et al. Alphavirus mutator variants present host‐specific defects and attenuation in mammalian and insect models. PLoS Pathog. 2014;10(1):e1003877. [DOI] [PMC free article] [PubMed] [Google Scholar]
51. Li K, Venter E, Yooseph S, et al. ANDES: statistical tools for the ANalyses of DEep sequencing. BMC Res Notes. 2010;3:199. [DOI] [PMC free article] [PubMed] [Google Scholar]
52. Xiong D, Zhang X, Shi M, et al. Developing an amplification refractory mutation system‐quantitative reverse transcription‐PCR assay for rapid and sensitive screening of SARS‐CoV‐2 variants of concern. Microbiol Spectr. 2022;10(1):e0143821. [DOI] [PMC free article] [PubMed] [Google Scholar]
53. Pettersen EF, Goddard TD, Huang CC, et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 2021;30(1):70‐82. [DOI] [PMC free article] [PubMed] [Google Scholar]
54. Feng T, Chen F, Kang Y, et al. HawkRank: a new scoring function for protein‐protein docking based on weighted energy terms. J Cheminform. 2017;9(1):66. [DOI] [PMC free article] [PubMed] [Google Scholar]
55. Zhang Z. KaKs_calculator 3.0: calculating selective pressure on coding and non‐coding sequences. Genomics Proteomics Bioinformatics. 2022. 10.1016/j.gpb.2021.12.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
56. Li Y, Ma ML, Lei Q, et al. Linear epitope landscape of the SARS‐CoV‐2 Spike protein constructed from 1,051 COVID‐19 patients. Cell Rep. 2021;34(13):108915. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Click here for additional data file.^{(168.4KB, docx)}

Supporting Information

Click here for additional data file.^{(66.2KB, xlsx)}

Supporting Information

Click here for additional data file.^{(1.3MB, xlsx)}

Supporting Information

Click here for additional data file.^{(1.4MB, xlsx)}

Supporting Information

Click here for additional data file.^{(2.7MB, xlsx)}

Supporting Information

Click here for additional data file.^{(15.5KB, xlsx)}

Data Availability Statement

[mco2186-bib-0001] 1. Rambaut A, Holmes EC, O'Toole Á, et al. A dynamic nomenclature proposal for SARS‐CoV‐2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403‐1407. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0002] 2. Ao D, Lan T, He X, et al. SARS‐CoV‐2 Omicron variant: immune escape and vaccine development. MedComm. 2022;3(1):e126. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0003] 3. Hernández‐Huerta MT, Pérez‐Campos Mayoral L, Romero Díaz C, et al. Analysis of SARS‐CoV‐2 mutations in Mexico, Belize, and isolated regions of Guatemala and its implication in the diagnosis. J Med Virol. 2021;93(4):2099‐2114. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0004] 4. Lesbon JCC, Poleti MD, de Mattos Oliveira EC, et al. Nucleocapsid (N) gene mutations of SARS‐CoV‐2 can affect real‐time RT‐PCR diagnostic and impact false‐negative results. Viruses. 2021;13(12):2474. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0005] 5. Li Q, Wu J, Nie J, et al. The impact of mutations in SARS‐CoV‐2 Spike on viral infectivity and antigenicity. Cell. 2020;182(5):1284‐1294.e9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0006] 6. van Dorp L, Acman M, Richard D, et al. Emergence of genomic diversity and recurrent mutations in SARS‐CoV‐2. Infect Genet Evol. 2020;83:104351. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0007] 7. Domingo E. Mechanisms of viral emergence. Vet Res. 2010;41(6):38. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0008] 8. Sanjuán R. From molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses. PLoS Pathog. 2012;8(5):e1002685. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0009] 9. Sexton NR, Smith EC, Blanc H, Vignuzzi M, Peersen OB, Denison MR. Homology‐based identification of a mutation in the coronavirus RNA‐dependent RNA polymerase that confers resistance to multiple mutagens. J Virol. 2016;90(16):7415‐7428. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0010] 10. Ni M, Chen C, Qian J, et al. Intra‐host dynamics of Ebola virus during 2014. Nat Microbiol. 2016;1(11):16151. [DOI] [PubMed] [Google Scholar]

[mco2186-bib-0011] 11. Pathak AK, Mishra GP, Uppili B, et al. Spatio‐temporal dynamics of intra‐host variability in SARS‐CoV‐2 genomes. Nucleic Acids Res. 2022;50(3):1551‐1561. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0012] 12. Ruan Y, Hou M, Li J, et al. One viral sequence for each host? – the neglected within‐host diversity as the main stage of SARS‐CoV‐2 evolution. bioRxiv. 2021. 10.1101/2021.06.21.449205 [DOI]

[mco2186-bib-0013] 13. Chen C, Jiang D, Ni M, et al. Phylogenomic analysis unravels evolution of yellow fever virus within hosts. PLoS Negl Trop Dis. 2018;12(9):e0006738. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0014] 14. Valesano AL, Fitzsimmons WJ, McCrone JT, et al. Influenza B viruses exhibit lower within‐host diversity than influenza A viruses in human hosts. J Virol. 2020;94(5):e01710‐e01719. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0015] 15. Wang Y, Wang D, Zhang L, et al. Intra‐host variation and evolutionary dynamics of SARS‐CoV‐2 populations in COVID‐19 patients. Genome Med. 2021;13(1):30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0016] 16. Lythgoe KA, Hall M, Ferretti L, et al. SARS‐CoV‐2 within‐host diversity and transmission. Science. 2021;372(6539):eabg0821. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0017] 17. Tonkin‐Hill G, Martincorena I, Amato R, et al. Patterns of within‐host genetic diversity in SARS‐CoV‐2. eLife. 2021;10:e66857. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0018] 18. Yan Y, Wu K, Chen J, et al. Rapid acquisition of high‐quality SARS‐CoV‐2 genome via Amplicon‐Oxford nanopore sequencing. Virol Sin. 2021;36(5):901‐912. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0019] 19. Xiong D, Muema C, Zhang X, et al. Enriched opportunistic pathogens revealed by metagenomic sequencing hint potential linkages between pharyngeal microbiota and COVID‐19. Virol Sin. 2021;36(5):924‐933. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0020] 20. Bhattacharya M, Sharma AR, Dhama K, Agoramoorthy G, Chakraborty C. Omicron variant (B.1.1.529) of SARS‐CoV‐2: understanding mutations in the genome, S‐glycoprotein, and antibody‐binding regions. Geroscience. 2022;44(2):619‐637. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0021] 21. Moeller NH, Shi K, Demir Ö, et al. Structure and dynamics of SARS‐CoV‐2 proofreading exoribonuclease ExoN. Proc Nat Acad Sci U S A. 2022;119(9):e2106379119. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0022] 22. Bojkova D, Widera M, Ciesek S, Wass MN, Michaelis M. Reduced interferon antagonism but similar drug sensitivity in Omicron variant compared to Delta variant of SARS‐CoV‐2 isolates. Cell Res. 2022;32(3):319‐321. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0023] 23. Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H. The architecture of SARS‐CoV‐2 transcriptome. Cell. 2020;181(4):914‐921.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0024] 24. Zhang NN, Li XF, Deng YQ, et al. A thermostable mRNA vaccine against COVID‐19. Cell. 2020;182(5):1271‐1283.e16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0025] 25. Liang Y, Zhang J, Yuan RY, et al. Design of a mutation‐integrated trimeric RBD with broad protection against SARS‐CoV‐2. Cell Discov. 2022;8(1):17. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0026] 26. Elmansi H, Ibrahim AE, Mikhail IE, Belal F. Green and sensitive spectrofluorimetric determination of Remdesivir, an FDA approved SARS‐CoV‐2 candidate antiviral; application in pharmaceutical dosage forms and spiked human plasma. Anal Methods. 2021;13(23):2596‐2602. [DOI] [PubMed] [Google Scholar]

[mco2186-bib-0027] 27. Xie Y, Yin W, Zhang Y, et al. Design and development of an oral remdesivir derivative VV116 against SARS‐CoV‐2. Cell Res. 2021;31(11):1212‐1214. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0028] 28. Mótyán JA, Mahdi M, Hoffka G, Tőzsér J. Potential resistance of SARS‐CoV‐2 main protease (Mpro) against protease inhibitors: lessons learned from HIV‐1 protease. Int J Mol Sci. 2022;23(7):3507. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0029] 29. Focosi D, Maggi F, McConnell S, Casadevall A. Very low levels of remdesivir resistance in SARS‐COV‐2 genomes after 18 months of massive usage during the COVID19 pandemic: a GISAID exploratory analysis. Antiviral Res. 2022;198:105247. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0030] 30. Li J, Du P, Yang L, et al. Two‐step fitness selection for intra‐host variations in SARS‐CoV‐2. Cell Rep. 2022;38(2):110205. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0031] 31. Pachetti M, Marini B, Benedetti F, et al. Emerging SARS‐CoV‐2 mutation hot spots include a novel RNA‐dependent‐RNA polymerase variant. J Transl Med. 2020;18(1):179. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0032] 32. Matyášek R, Kovařík A. Mutation patterns of human SARS‐CoV‐2 and bat RaTG13 coronavirus genomes are strongly biased towards C>U transitions, indicating rapid evolution in their hosts. Genes. 2020;11(7):761. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0033] 33. Troyano‐Hernáez P, Reinosa R, Holguín Á. Evolution of SARS‐CoV‐2 envelope, membrane, nucleocapsid, and spike structural proteins from the beginning of the pandemic to September 2020: a global and regional approach by Epidemiological Week. Viruses. 2021;13(2):243. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0034] 34. Wang Q, Wu J, Wang H, et al. Structural basis for RNA replication by the SARS‐CoV‐2 polymerase. Cell. 2020;182(2):417‐428.e13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0035] 35. Tian L, Qiang T, Liang C, et al. RNA‐dependent RNA polymerase (RdRp) inhibitors: the current landscape and repurposing for the COVID‐19 pandemic. Eur J Med Chem. 2021;213:113201. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0036] 36. Agostini ML, Andres EL, Sims AC, et al. Coronavirus susceptibility to the antiviral remdesivir (GS‐5734) is mediated by the viral polymerase and the proofreading exoribonuclease. mBio. 2018;9(2):e00221‐18. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0037] 37. Yi C, Sun X, Lin Y, et al. Comprehensive mapping of binding hot spots of SARS‐CoV‐2 RBD‐specific neutralizing antibodies for tracking immune escape variants. Genome Med. 2021;13(1):164. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0038] 38. Martin DP, Lytras S, Lucaci AG, et al. Selection analysis identifies clusters of unusual mutational changes in omicron lineage BA.1 that likely impact spike function. Mol Biol Evol. 2022;39(4):msac061. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0039] 39. Hammer AS, Quaade ML, Rasmussen TB, et al. SARS‐CoV‐2 transmission between mink (Neovison vison) and humans, Denmark. Emerg Infect Dis. 2021;27(2):547‐551. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0040] 40. Motayo BO, Oluwasemowo OO, Olusola BA, et al. Evolution and genetic diversity of SARS‐CoV‐2 in Africa using whole genome sequences. Int J Infect Dis. 2021;103:282‐287. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0041] 41. Bate N, Savva CG, Moody PCE, et al. In vitro evolution predicts emerging SARS‐CoV‐2 mutations with high affinity for ACE2 and cross‐species binding. PLoS Pathog. 2022;18(7):e1010733. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0042] 42. Pekar JE, Magee A, Parker E, et al. The molecular epidemiology of multiple zoonotic origins of SARS‐CoV‐2. Science. 2022;377(6609):960‐966. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0043] 43. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114‐2120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0044] 44. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754‐1760. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0045] 45. Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078‐2079. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0046] 46. Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra‐fast single‐node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674‐1676. [DOI] [PubMed] [Google Scholar]

[mco2186-bib-0047] 47. Cingolani P, Platts A, Wang LL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso‐2; iso‐3. Fly. 2012;6(2):80‐92. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0048] 48. Zhang X, Xiong D, Yu J, Yang H, He P, Wei H. Genetic polymorphism drives susceptibility between bacteria and bacteriophages. Front microbiol. 2021;12:627897. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0049] 49. Marçais G, Kingsford C. A fast, lock‐free approach for efficient parallel counting of occurrences of k‐mers. Bioinformatics. 2011;27(6):764‐770. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0050] 50. Rozen‐Gagnon K, Stapleford KA, Mongelli V, et al. Alphavirus mutator variants present host‐specific defects and attenuation in mammalian and insect models. PLoS Pathog. 2014;10(1):e1003877. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0051] 51. Li K, Venter E, Yooseph S, et al. ANDES: statistical tools for the ANalyses of DEep sequencing. BMC Res Notes. 2010;3:199. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0052] 52. Xiong D, Zhang X, Shi M, et al. Developing an amplification refractory mutation system‐quantitative reverse transcription‐PCR assay for rapid and sensitive screening of SARS‐CoV‐2 variants of concern. Microbiol Spectr. 2022;10(1):e0143821. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0053] 53. Pettersen EF, Goddard TD, Huang CC, et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 2021;30(1):70‐82. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0054] 54. Feng T, Chen F, Kang Y, et al. HawkRank: a new scoring function for protein‐protein docking based on weighted energy terms. J Cheminform. 2017;9(1):66. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0055] 55. Zhang Z. KaKs_calculator 3.0: calculating selective pressure on coding and non‐coding sequences. Genomics Proteomics Bioinformatics. 2022. 10.1016/j.gpb.2021.12.002 [DOI] [PMC free article] [PubMed] [Google Scholar]

[mco2186-bib-0056] 56. Li Y, Ma ML, Lei Q, et al. Linear epitope landscape of the SARS‐CoV‐2 Spike protein constructed from 1,051 COVID‐19 patients. Cell Rep. 2021;34(13):108915. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Distribution of intra‐host variations and mutations in the genomes of SARS‐CoV‐2 and their implications on detection and therapeutics

Dongyan Xiong

Xiaoxu Zhang

Junping Yu

Hongping Wei

Abstract

1. INTRODUCTION

2. RESULTS

2.1. Summary of the datasets used for analysis

TABLE 1.

2.2. Characterization of iSNVs and common mutations in SARS‐CoV‐2 VOCs

FIGURE 1.

2.3. Evaluation of relationship between iSNVs and mutations

FIGURE 2.

2.4. Potential contribution of mutations to high intra‐host diversities in SARS‐CoV‐2 VOCs

FIGURE 3.

2.5. Impact of mutations and iSNVs on RT‐PCR detection

FIGURE 4.

2.6. Implications of mutations and iSNVs on vaccines

FIGURE 5.

2.7. Implications of mutations and iSNVs on antiviral drugs

FIGURE 6.

3. DISCUSSION

4. MATERIALS AND METHODS

4.1. Next‐generation deep sequencing data collection and processing

4.2. Lineage typing of SARS‐CoV‐2 isolates

4.3. Calling of intra‐host single‐nucleotide variations

4.4. Genetic diversity analysis

4.5. Large‐scale genomics data analysis for the mutations

4.6. Molecular dynamics simulations

4.7. Selection pressure calculation

4.8. Statistical analysis

AUTHOR CONTRIBUTIONS

CONFLICT OF INTEREST

ETHICS STATEMENT

Supporting information

ACKNOWLEDGMENTS

Contributor Information

DATA AVAILABILITY STATEMENT

REFERENCES

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases