Proceedings of the National Academy of Sciences of the United States of America. 2023 Jul 24;120(31):e2304667120. doi: 10.1073/pnas.2304667120

High-resolution mapping reveals the mechanism and contribution of genome insertions and deletions to RNA virus evolution

Mauricio Aguilar Rangel a,1, Patrick T Dolan a,b,1,2,3, Shuhei Taguwa a,c, Yinghong Xiao b, Raul Andino b,3, Judith Frydman a,3
PMCID: PMC10400975  PMID: 37487061

Significance

RNA viruses are masters at rapid adaptation to adverse conditions. One important and poorly understood viral adaptation strategy is the use of genome insertions and deletions (indels) as a source of variation and evolutionary novelty. This study presents a hybrid experimental-bioinformatic approach to comprehensively characterize the broad indel landscape of adapting RNA virus populations, from extremely low-frequency to high-frequency variants. Using this approach, we identify mechanisms of indel generation and define the selection forces that shape indel diversity and accumulation. Our analyses establish the kinetic and mechanistic tradeoffs between Indel generation and fitness and reveal functional principles defining the central role of Indels in virus evolution, emergence, and the regulation of viral infection.

Keywords: virus, evolution, adaptation, dengue virus, poliovirus

Abstract

RNA viruses rapidly adapt to selective conditions due to the high intrinsic mutation rates of their RNA-dependent RNA polymerases (RdRps). Insertions and deletions (indels) in viral genomes are major contributors to both deleterious mutational load and evolutionary novelty, but remain understudied. To characterize the mechanistic details of their formation and evolutionary dynamics during infection, we developed a hybrid experimental-bioinformatic approach. This approach, called MultiMatch, extracts insertions and deletions from ultradeep sequencing experiments, including those occurring at extremely low frequencies, allowing us to map their genomic distribution and quantify the rates at which they occur. Mapping indel mutations in adapting poliovirus and dengue virus populations, we determine the rates of indel generation and identify mechanistic and functional constraints shaping indel diversity. Using poliovirus RdRp variants of distinct fidelity and genome recombination rates, we demonstrate tradeoffs between fidelity and Indel generation. Additionally, we show that the need to maintain the translation frame and viral RNA structures constrains the Indel landscape and that, owing to their strong fitness effects, Indels impose a substantial deleterious load on adapting viral populations. Conversely, we uncover positively selected Indels that modulate RNA structure, generate protein variants, and produce defective interfering genomes in viral populations. Together, our analyses establish the kinetic and mechanistic tradeoffs between misincorporation, recombination, and Indel rates and reveal functional principles defining the central role of Indels in virus evolution, emergence, and the regulation of viral infection.


RNA viruses exhibit extraordinary mutational diversity due to the low fidelity of their RNA-dependent RNA polymerases (RdRps). These error-prone enzymes allow viral populations to rapidly explore a wide mutational neighborhood (1). Although much attention has been paid to the contribution of single-nucleotide variants (SNVs) to the diversity and evolution of viral populations, the role of insertions and deletions (Indels) has not been as extensively studied, and thus their contribution to viral evolution is not well understood. A major challenge is that Indels are rare and thus difficult to identify and distinguish from sequencing noise. Because the error rates associated with next-generation sequencing (NGS) preclude accurate capture of rare Indel variants, Indel rate estimates are biased toward the more common Indel variants. Importantly, while Indels are estimated to occur at a lower rate than SNVs (2), the number of possible Indels vastly exceeds the number of possible SNVs. For instance, a 10,000-nucleotide genome has 30,000 SNVs (3L, where L is the length of the genome) accessible through single mutational events. In contrast, the same genome can harbor $\sum_{n=1}^{L}(L-n+1) = \frac{L(L+1)}{2}$ possible deletions, where n is the deletion size, or ~50 million deletions accessible through single mutational events.
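The two counts above follow from simple combinatorics and can be checked numerically. The sketch below assumes, as in the text, that an SNV is one of three alternative bases at any of the L sites and that a deletion is any contiguous span of n = 1, ..., L nucleotides; insertions are not counted, and the function names are illustrative.

```python
def count_snvs(genome_length: int) -> int:
    """Single-nucleotide variants: 3 alternative bases at each of L sites."""
    return 3 * genome_length


def count_deletions(genome_length: int) -> int:
    """Contiguous deletions of every size n = 1..L.
    A deletion of size n can start at L - n + 1 positions."""
    return sum(genome_length - n + 1 for n in range(1, genome_length + 1))


L = 10_000
print(count_snvs(L))        # 30000
print(count_deletions(L))   # 50005000
print(L * (L + 1) // 2)     # closed form, also 50005000
```

Because the deletion count grows quadratically with L while the SNV count grows only linearly, the gap widens further for longer genomes.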

Indels have major phenotypic consequences (3). Indels strongly affect protein-coding sequences, altering reading frames and duplicating, removing, or inserting novel sequences. Indels in noncoding regions can disrupt regulatory and structural elements in DNA and RNA molecules. Examples of such phenotypic changes caused by Indels can be found across RNA viruses, from those infecting plants (4), where Indels can even dominate the mutational spectrum (5), to those infecting humans. In flaviviruses, such as dengue (DENV), West Nile (WNV), and Zika (ZIKV), Indels are frequent in the 3′ UTRs (6–8). Numerous, frequent deletions occur in the evolution of SARS coronaviruses (9–12), and in SARS-CoV-2 an insertion that created a novel furin cleavage site alters the structural dynamics of the Spike protein, conferring enhanced pathogenesis (13, 14). This type of genetic variation is not limited to RNA viruses. Insertion and deletion dynamics also play critical roles in DNA virus evolution, through genome expansion by duplication (15) and rapid gene deletion in response to selection (16, 17). Additionally, large genome deletions can dramatically affect the dynamics of viral infection and pathogenesis through the formation of defective interfering particles (18, 19). Therefore, compared to single-nucleotide variants, Indels have the potential to markedly change the behavior of the genome and proteome in single mutational steps.

Accurately mapping the full spectrum of Indel variants in viral populations is key to understanding the mechanisms by which they are generated and the selective pressures that shape their diversity. However, directly measuring the frequency of SNV and Indel variants occurring at or near the mutation rate of the viral RdRp, which ranges from 10⁻⁶ to 10⁻⁴ (2, 20–22), poses significant technical challenges. High-resolution sequencing strategies, which correct for sequencing errors and directly resolve the rate of mutation in sequenced populations, may overcome these limitations. One such technique, circular resequencing (CirSeq) (23, 24), uses rolling-circle amplification to generate tandem-repeat cDNA from a single template RNA molecule. Using these repeats, sequencing errors are reduced from 10⁻³ to 10⁻⁶, enabling the resolution of SNVs occurring at or below the viral SNV mutation rate. Here, we sought to harness this approach for Indel identification and quantification.
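Why three repeats lower the error rate by roughly the square of the per-repeat rate can be seen with a back-of-the-envelope model. This sketch assumes that errors in the three repeats are independent, that a wrong base is equally likely to be any of the three alternatives, and that the consensus is a simple majority vote; the function name is hypothetical, and the actual CirSeq consensus rules may differ.

```python
from math import comb


def majority_consensus_error(eps: float, copies: int = 3) -> float:
    """Probability that a strict majority of `copies` independent repeats
    carry the same wrong base, given a per-repeat error rate eps that is
    split evenly over the 3 alternative bases (illustrative model)."""
    per_alt = eps / 3                  # chance one repeat shows a specific wrong base
    need = copies // 2 + 1             # strict majority
    p_one_alt = sum(
        comb(copies, k) * per_alt**k * (1 - per_alt) ** (copies - k)
        for k in range(need, copies + 1)
    )
    # With a strict majority, two different wrong bases cannot both win,
    # so the three alternatives are mutually exclusive events.
    return 3 * p_one_alt


print(f"{majority_consensus_error(1e-3):.2e}")  # 1.00e-06
```

Under these assumptions, a per-repeat error of 10⁻³ gives a consensus error close to 10⁻⁶, consistent in order of magnitude with the reduction quoted above.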

A wide range of tools is available for Indel identification. They vary in their experimental and computational approaches and in the maximum length of variants they can detect (25–32). In general, two major strategies are used: de novo assembly followed by comparison to a reference (25, 30), or direct read-by-read comparison to a reference genome. Given enough coverage, the latter strategy could theoretically identify all Indels. The main limitation of these approaches is the error rate associated with library preparation and sequencing platforms (31). This is especially true for the most frequently observed Indels, those of length one, whose estimated sequencing error rates on Illumina platforms are 2.8 × 10⁻⁶ for insertions and 5.1 × 10⁻⁶ for deletions (33). To mitigate false positives from sequencing artifacts, in silico techniques often report a confidence metric alongside each identified Indel, calculated mostly from sequencing depth. On the experimental side, efforts have focused on improving library preparation to reduce false positives introduced during sequencing. Together, these limitations have so far precluded a full understanding of the dynamics of Indel generation and the role that Indels play during the evolution and adaptation of viral populations.

Here, we report a hybrid approach that combines accurate NGS with software improvements to enable reliable and highly accurate Indel identification. By combining the previous CirSeq pipeline with purpose-built, split-read Indel mapping software, we can confidently identify Indels occurring at frequencies below the NGS error rate. With these data, we can characterize the temporal dynamics of Indels in viral populations with high resolution. We combined our approach with cell culture models of viral adaptation to examine the role of Indels in the in vitro evolution of two positive-sense RNA viruses, poliovirus (PV) and DENV. Our approach identified the Indel repertoire in passaged viral populations, which we then used to calculate the distribution of mutational fitness effects (DMFEs) for Indels in viral genomes. Estimating the near-complete distribution of Indels of PV and DENV carrying distinct RdRp variants or replicating in different host cells allowed us to probe deeply into population diversity and to directly estimate both the rates of Indel incorporation by viral RdRps and the dynamics of Indels in viral populations. We identify how RdRp fidelity and recombination rate variants affect Indel production and identify virus- and host-specific roles for Indels with selective advantages. Our analysis paves the way to define how Indel generation allows viral genomes to explore a mutational space not accessible by SNVs and to produce viral novelty at the protein and RNA levels.

Results

An Approach to Accurately Map Indels in Viral Populations.

Because most Indels occur at very low frequencies in viral populations, their identification is hindered by the error rates of the enzymes used in library generation and by base-calling errors in traditional next-generation sequencing (NGS). The CirSeq library preparation method (23, 24) dramatically reduces the errors associated with NGS through a key step in library preparation: circularized template RNA fragments undergo rolling-circle reverse transcription to generate a single read consisting of three tandem repeats derived from the same template molecule (34) (Fig. 1A). Mistakes introduced during library preparation are culled through cross-comparison of the three repeats, yielding a consensus read that contains only mutations present in the original population of RNA templates.
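A minimal sketch of the repeat cross-comparison step follows. It assumes a read made of exactly three tandem copies of an unknown period, detects the period by self-comparison, and masks any column in which the copies disagree. The function names, the `min_period` and `min_agree` parameters, and the unanimity rule are illustrative assumptions, not the CirSeq pipeline's actual implementation, which may handle partial repeats and base qualities differently.

```python
from collections import Counter


def repeat_period(read: str, min_period: int = 10) -> int:
    """Smallest period p (min_period <= p <= len(read) // 3) that maximizes
    the fraction of positions where read[i] == read[i + p]."""
    best_p, best_score = None, -1.0
    for p in range(min_period, len(read) // 3 + 1):
        n = len(read) - p
        score = sum(read[i] == read[i + p] for i in range(n)) / n
        if score > best_score:          # strict '>' keeps the smallest best p
            best_p, best_score = p, score
    if best_p is None:
        raise ValueError("read too short for three repeats of min_period")
    return best_p


def repeat_consensus(read: str, period: int, min_agree: int = 3) -> str:
    """Column-wise consensus of the three repeats; columns where fewer than
    min_agree repeats agree are masked with 'N'."""
    repeats = [read[k * period:(k + 1) * period] for k in range(3)]
    calls = []
    for column in zip(*repeats):
        base, count = Counter(column).most_common(1)[0]
        calls.append(base if count >= min_agree else "N")
    return "".join(calls)


template = "ACGTTGCAAGCTTACGGATC"
noisy = list(template * 3)
noisy[25] = "T"                           # simulated library-prep error in repeat 2
noisy = "".join(noisy)
period = repeat_period(template * 3)      # 20
print(repeat_consensus(noisy, period))    # ACGTTNCAAGCTTACGGATC
```

Requiring all three copies to agree discards some true positions but never lets a single-copy error through; relaxing `min_agree` to 2 trades that strictness for fewer masked bases.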

Fig. 1.

MultiMatch pipeline for high-resolution insertion-deletion (Indel) detection. (A) Overview of the circular resequencing approach. MultiMatch is specifically designed to map reads generated through Circular Resequencing (20) but can be used downstream of any RNAseq pipeline. (B) Indel mapping with MultiMatch. Indels in the sequenced RNAs are mapped by comparing the relative position of tandem, noncontiguous subalignments on the sequenced read. (C) Validation of MultiMatch with synthetic data. Table showing the proportion of reads, insertions, and deletions recovered from a synthetic set of CirSeq reads.

We reasoned that CirSeq reads of viral populations should also enable accurate Indel detection, since any true Indel should be present in all three repeats derived from a single template genomic fragment (Fig. 1A and Materials and Methods). Earlier iterations of CirSeq analysis pipelines disregarded any reads encoding Indels. We therefore developed an alternative alignment strategy to detect true Indels present in all three repeats (Fig. 1A and Materials and Methods). Our approach, termed MultiMatch, addresses two main issues (Fig. 1B). First, because the CirSeq-generated consensus reads do not preserve the 5′ and 3′ ends of their parental RNA fragment, the circular read must be rearranged to orient it on the linear reference genome. Second, an ideal Indel mapping approach should not be restricted to small Indels, such as those detected by gap-permissive local alignments, but should also identify larger insertions and deletions. MultiMatch addresses both problems. First, an iterative split-read mapping approach performs multiple subalignments to assemble the consensus read in its original 5′ to 3′ orientation. Second, the start and end coordinates of the subalignments are used to determine the presence of insertions (duplications) or deletions in the consensus reads (Fig. 1B).
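The second step, inferring an Indel from the coordinates of two subalignments, can be illustrated with a generic split-read rule. This is a sketch, not MultiMatch's code: the `SubAlignment` record and `call_junction` function are hypothetical, coordinates are assumed to be 0-based and half-open, and the subalignments are assumed to be already ordered along the read (the read reorientation step is not shown).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubAlignment:
    """Hypothetical subalignment record (0-based, half-open coordinates)."""
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int


def call_junction(left: SubAlignment,
                  right: SubAlignment) -> Optional[Tuple[str, int, int]]:
    """Classify the junction between two subalignments that are adjacent on
    the read (left before right). Returns (kind, ref position, length) or None."""
    read_gap = right.read_start - left.read_end  # read bases between the blocks
    ref_gap = right.ref_start - left.ref_end     # reference bases between them (<0: overlap)
    if ref_gap > read_gap:
        # The read skips reference sequence: deletion
        return ("deletion", left.ref_end, ref_gap - read_gap)
    if ref_gap < read_gap:
        # The read carries extra sequence, e.g. reference re-used as a duplication
        return ("insertion", left.ref_end, read_gap - ref_gap)
    return None                                  # contiguous: no Indel


left = SubAlignment(0, 50, 100, 150)
print(call_junction(left, SubAlignment(50, 100, 160, 210)))  # ('deletion', 150, 10)
print(call_junction(left, SubAlignment(50, 100, 145, 195)))  # ('insertion', 150, 5)
```

Because the rule uses only block coordinates, it places no upper bound on Indel length, which is the property that distinguishes split-read calling from gap-permissive local alignment.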

To assess the performance of MultiMatch, we generated in silico CirSeq datasets in which we introduced Indels of various lengths at fixed and random positions (Materials and Methods). Of all simulated Indels, 99.99% of insertions and 66.63% of deletions could be considered candidates for mapping. This difference reflects a limitation of our minimum-subalignment-length mapping strategy, which prevents detection of deletions located within a minimum distance from either end of a read. However, given the random fragmentation of RNA in the CirSeq procedure, such variants are likely also covered by other reads in which they meet the mapping criteria and can be detected. After applying our split-read strategy to the simulated dataset, we recovered 84.2% of the insertion candidates and 93.4% of the deletion candidates, including those present in only a single read (Fig. 1C). These results indicate that MultiMatch can identify the vast majority of the Indel diversity present in a viral population. Combined with CirSeq libraries, this hybrid experimental-analytical pipeline provides a powerful tool to examine the role of Indels in viral biology.
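The recovery figures above combine multiplicatively, and the argument about multiple covering reads can be made concrete. The sketch below multiplies the candidate fraction by the recovery among candidates, then estimates the chance that a deletion is recovered in at least one of k covering reads. The independence of reads is an assumption (reasonable only if fragmentation is random), and the function names are illustrative.

```python
def overall_recovery(candidate_fraction: float, recovered_fraction: float) -> float:
    """Fraction of all simulated Indels recovered from a single read."""
    return candidate_fraction * recovered_fraction


def detected_in_any_read(per_read: float, n_reads: int) -> float:
    """Chance an Indel is recovered in at least one of n_reads covering reads,
    assuming each read independently recovers it with probability per_read."""
    return 1.0 - (1.0 - per_read) ** n_reads


ins = overall_recovery(0.9999, 0.842)    # insertions, single read
dels = overall_recovery(0.6663, 0.934)   # deletions, single read
print(round(ins, 3), round(dels, 3))     # 0.842 0.622
for k in (1, 3, 10):
    print(k, round(detected_in_any_read(dels, k), 3))
```

Under this assumption, the single-read deficit for deletions shrinks quickly as coverage grows, which is the intuition behind the text's argument.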

MultiMatch Yields a Complete Description of Indel Frequency, Distribution, and Fitness during Poliovirus Replication and Adaptation.

Having validated our approach with in silico data, we next used MultiMatch to identify the repertoire of Indels in poliovirus (PV) populations and their dynamic changes over serial passages in cell culture (Fig. 2A). We used CirSeq sequencing data for PV populations collected over serial passage experiments in HeLa cells. MultiMatch revealed a near-complete view of the Indel distribution across the PV genome for each passage (Fig. 2B).

Fig. 2.

Description of Indels on the poliovirus genome. (A) Experimental detection of Indels in PV. Populations of PV were sequenced using the CirSeq pipeline, and Indels were identified using MultiMatch. (B) Distribution of Indels on the PV genome. Manhattan plot of insertions (circles) and deletions (triangles) along the genome of PV (Type 1, Mahoney) passaged in HeLa cells. (C and D) Boxplots showing the empirical estimates of the insertion (C) and deletion (D) rates (per site) from each of seven sequenced PV populations passaged in HeLa cells. (E) Size distribution of Indels across the seven sequenced passages. (F) Stacked barplot depicting the number of variants unique to a given passage n and the number of variants shared with passage n-1. (G) Distribution of mutational fitness effects (DMFE) for Indels in the PV1 genome. The distribution of fitness effects for nonlethal Indels is shown in the Inset (yellow, beneficial; gray, neutral; maroon, detrimental; black, lethal). (H) DMFE for all SNVs in the PV genome. The distribution of nonlethal SNVs is shown in the Inset. (I) Manhattan plot of the Indel mutations in the 5′ UTR of PV. A single 5-bp deletion, 564 to 568, is found at ~0.1 to 1% in all populations sequenced. (J) Scatter plot of the opening and closing positions of the Indels in the PV1 IRES. The majority of Indels are small, falling along the diagonal. Neutral mutations accumulate to high frequencies at the junctions between IRES stem-loops (gray points). (K) Indel frequency as a function of pairing probability as determined by selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE). The line indicates the median accessibility in each bin (adjusted R² = 0.7, P = 0.0013); the shaded region represents the interquartile range. (L) A five-base deletion at positions 564 to 568 lies within a previously identified loop in the SL-V structure, directly upstream of the uORF start codon at position 588.

Our analysis identified and quantified both small and large genomic deletions, even those present at very low frequencies, enabling reliable estimation of Indel rates, which is not feasible with conventional NGS approaches. The high accuracy of CirSeq combined with MultiMatch also permitted estimation of Indel rates per nucleotide position. We find that Indels occur at roughly uniform and very low frequencies across the genome. In PV, deletions occur at a rate of ~1.5 × 10⁻⁵ (Fig. 2D), whereas the rate for insertions, specifically duplications, was about two orders of magnitude lower (~1.4 × 10⁻⁷, Fig. 2C). This agrees with reports that insertions occur less frequently than deletions (35–37). Importantly, these estimates reflect basal RdRp rates, as their computation excludes any overrepresented variants that might reflect positive selection.
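One simple way to turn per-site counts into a basal rate while excluding overrepresented variants is sketched below. The frequency definition (reads with an Indel starting at a site divided by reads spanning it), the median-based outlier rule, and the `max_fold` threshold are all illustrative assumptions; the estimator actually used is described in Materials and Methods and may differ.

```python
from statistics import median


def per_site_rates(indel_counts, coverage):
    """Per-site Indel frequency: reads with an Indel starting at each site
    divided by the reads spanning that site (uncovered sites skipped)."""
    return [c / d for c, d in zip(indel_counts, coverage) if d > 0]


def basal_rate(rates, max_fold=10.0):
    """Mean per-site rate after dropping sites above max_fold x the median,
    a crude, hypothetical filter for possibly positively selected variants."""
    m = median(rates)
    if m <= 0:
        raise ValueError("median rate is zero; the fold filter is undefined")
    kept = [r for r in rates if r <= max_fold * m]
    return sum(kept) / len(kept)


counts = [15] * 9 + [5_000]    # one site carries an overrepresented deletion
depth = [1_000_000] * 10
print(basal_rate(per_site_rates(counts, depth)))   # approximately 1.5e-05
```

Without the filter, the single overrepresented site would inflate the mean roughly 30-fold in this toy example, which is why selected variants must be excluded before interpreting the result as a polymerase rate.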

We next examined the length of genomic Indels. Viral populations from all seven passages displayed a wide range of Indel lengths, with similar distributions across passages (Fig. 2E). While most Indels are less than 10 nt long (Fig. 2E), a small fraction of deletions removed most of the viral genome, spanning up to ~6 kb. Strikingly, we find that, on average, only ~20% of the variants in a given passage are shared with the repertoire identified in the previous passage (Fig. 2F). This could reflect both the rapid loss of deleterious Indels between passages and the limited ability of these viral populations to access the large spectrum of Indel diversity available in a single round of replication.
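The passage-to-passage overlap summarized in Fig. 2F amounts to a set intersection. In this sketch each variant is keyed as a (kind, start, length) tuple, which is an assumed representation; the toy variant sets are invented for illustration and are not data from the study.

```python
def shared_fraction(current, previous):
    """Fraction of variants in `current` that were also seen in `previous`."""
    current, previous = set(current), set(previous)
    return len(current & previous) / len(current) if current else 0.0


def mean_shared(passages):
    """Average shared fraction over consecutive passage pairs (n vs n-1)."""
    pairs = list(zip(passages[1:], passages[:-1]))
    return sum(shared_fraction(cur, prev) for cur, prev in pairs) / len(pairs)


# Variants keyed as (kind, start, length); values are illustrative only
p1 = {("del", 100, 5), ("del", 200, 1), ("ins", 300, 1), ("del", 400, 3), ("del", 500, 2)}
p2 = {("del", 100, 5), ("del", 600, 1), ("del", 700, 4), ("ins", 800, 1), ("del", 900, 2)}
print(shared_fraction(p2, p1))   # 0.2
```

Keying variants by exact coordinates and length means that two Indels differing by a single position count as distinct, which matches the idea of a variant repertoire but may need adjustment for ambiguous placements in repetitive sequence.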

Having obtained frequency estimates for Indels across passages, we next calculated the distribution of mutational fitness effects (DMFE) of Indels detected in the adapting PV population using an approach modified from (38) (Materials and Methods) (Fig. 2 G and H). In these populations, the DMFE of SNVs is centered around neutrality (Fig. 2H), and neutral SNVs linger at moderate frequencies or accumulate in the population as hitchhikers on better-adapted genotypes. In striking contrast, the vast majority of the observed Indel variants were lethal (Fig. 2G). This analysis indicates that Indels are predominantly deleterious and impose a significant fitness burden on viral populations compared with SNVs. While a new assortment of Indel variants emerges at low frequencies in each passage and is rapidly removed from the population, we find a small number of positively selected Indels (two insertions and three deletions), suggesting that they may confer an adaptive phenotype (Fig. 2G).
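Once each variant has a relative fitness estimate with a confidence interval, binning it into the four DMFE classes is straightforward. The fitness estimation itself (modified from ref. 38) is not reproduced here; the interval-based rule and the `lethal_max` cutoff below are hypothetical choices for illustration.

```python
from collections import Counter


def classify_effect(ci_low: float, ci_high: float, lethal_max: float = 0.05) -> str:
    """Assign a DMFE class from the confidence interval of a relative
    fitness estimate (W = 1 is neutral). lethal_max is a hypothetical cutoff."""
    if ci_high <= lethal_max:
        return "lethal"
    if ci_low > 1.0:
        return "beneficial"
    if ci_high < 1.0:
        return "detrimental"
    return "neutral"          # interval overlaps W = 1


# Illustrative (ci_low, ci_high) pairs, not measured values
estimates = [(0.0, 0.02), (0.5, 0.7), (0.97, 1.14), (1.1, 1.5)]
print(Counter(classify_effect(lo, hi) for lo, hi in estimates))
```

Basing the call on the interval rather than the point estimate keeps noisy, low-frequency variants from being labeled beneficial or detrimental on weak evidence.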

Indel Formation Is Constrained by Viral RNA Structure and Function.

We next carried out a detailed analysis of how viral RNA structure and function shape the location and properties of Indels across the PV genome. Mapping their start and end positions showed that Indels are distributed across the entire genome, affecting both coding sequences (CDS) and untranslated regions (UTRs) (SI Appendix, Fig. S1A). However, Indel mutations are observed at lower frequency in the 5′ IRES region, a key structural element for viral replication and translation.

The overall structure of the PV RNA genome was previously examined using base-pairing probabilities estimated from selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) reactivity (39). To assess the potential role of viral RNA structures in Indel formation, we compared the observed Indel frequencies at specific sites across the genome with their SHAPE-derived base-pairing probabilities (39). We observed an inverse correlation between Indel frequency and base-pairing probability, with Indels preferentially observed in unstructured regions of the viral RNA (Fig. 2K). The finding that viral RNA structure constrains Indel generation could relate to several factors that are not mutually exclusive. For instance, Indel generation could be mechanistically disfavored in structured RNA regions. Alternatively, stable RNA structures may promote the generation of deletions in flanking unstructured regions. Additionally, since structured regions in the viral RNA tend to be functionally important, they may be highly sensitive to disruption, rendering Indels in these regions lethal.

Analysis of Indels in the highly structured PV IRES region illustrates the interplay between RNA structure and Indel frequency. We observe a higher frequency of Indels at junctions between canonical IRES stem-loop structures (Fig. 2 I and J), including a consistently high-frequency five-nt deletion spanning positions 564 to 568, between stem-loops V and VI (Fig. 2 I and J). Of note, this 564 to 568 deletion was also found at elevated frequencies in a sequenced PV1 population (40). Interestingly, this deletion lies upstream of the translation initiation site of a recently characterized noncanonical upstream ORF in PV, which initiates from position 588 within SL-VI of the PV IRES (Fig. 2L). It is tempting to speculate that this deletion modulates the relative translation initiation rates of the canonical ORF and the noncanonical upstream ORF of PV. Future analyses of this variant could uncover its potential to regulate PV translation and replication.

Indel Length and Periodicity Are Constrained by Translation Frame Maintenance.

We next examined whether Indel length is subject to any constraints. We obtained empirical per-length deletion rate estimates and modeled our data according to a power law (41) used for estimating gap probabilities in multiple sequence alignments (Fig. 3A). The estimated curves showed that Indel probability is inversely correlated with length, in agreement with previous studies (41–44). Nonetheless, while short Indels occur with higher probability, we also observed very long genomic deletions (Fig. 2E). Interestingly, our observed measurements displayed periodic increments in the deletion rates (Fig. 3 A, Inset) that were absent from the curve derived from the modeled data. We hypothesized that the triplet periodicity of the genetic code imposes a constraint on Indels in coding sequences. To test this, we separately calculated Indel rates for viral regions corresponding to CDS and UTR sequences. A Fourier transform analysis of these estimates revealed a strong trinucleotide periodicity in the CDS estimates, which was not present in the UTR segments (Fig. 3B). This analysis indicates that deletions that disrupt the translation reading frame are subject to stronger purifying selection.

Fig. 3.

Mechanistic insights into poliovirus Indel diversity. (A) Comparison of empirical and theoretical deletion rates per length. Empirical deletion rates as a function of size are compared to the theoretical probability mass function defined by a power law. Empirical rates exhibit periodic behavior (Inset). (B) Fourier transform analysis of empirical deletion rates from panel A. Line plot of power spectral density (PSD) from Fourier transformation of the empirical distribution of deletion rates over size in coding (Top) and noncoding (Bottom) regions of the PV genome, highlighting the peak in periodicity at 3 bp (gray). (C) Structure of the poliovirus RNA-dependent RNA polymerase highlighting the positions of mutations that alter fidelity and recombination rates. Inset Table: single and double mutants used. (D) Scatterplot comparison of mutation and Indel rates in the poliovirus fidelity mutants from (C). Rate estimates are shown as mean and SE (N = 7 passages).

Role of RNA Polymerase Fidelity and Processivity in Indel Generation.

Mechanistically, Indels are generated by errors in RdRp-mediated replication of viral genomes, including transcriptional slippage (45) and aberrant homologous and nonhomologous recombination (46). The role of RdRp fidelity, in terms of both nucleotide substitution rate and recombination rate, in the generation of SNVs in viral populations has been extensively examined. In contrast, much less is known about how viral RNA replication leads to Indel generation. We next examined Indel frequencies in viruses bearing a set of PV RdRp variants with increased or decreased replication fidelity with respect to nucleotide substitution and/or recombination rates (47–52), to gain insight into how these RdRp functions contribute to Indel generation. CirSeq analyses of PV populations carrying this set of mutants were used to examine the mechanism of Indel generation and its relationship with nucleotide substitution rates. In principle, the kinetics of misincorporation that generate SNVs or Indels may be similar (53). If so, the changes in Indel rates observed for these RdRp variants may be proportional to the changes in fidelity and/or recombination.

We applied the MultiMatch analysis pipeline to CirSeq datasets generated using a previously characterized panel of PV RdRp mutants with altered misincorporation fidelity and recombination rates (47–52) (Fig. 3 C, Inset). To test the role of misincorporation fidelity, we examined the high-fidelity polymerase variant G64S (47), which exhibits an approximately threefold lower misincorporation rate than WT PV, and the low-fidelity polymerase variant H273R, which exhibits a threefold higher misincorporation rate (51). Surprisingly, the Indel rates for these fidelity mutants showed the opposite of the trend observed for single-nucleotide errors: Indel rates in PV carrying the G64S RdRp were higher than in WT, while the Indel rate for H273R was lower than in WT (Fig. 3D). We next examined the role of recombination in Indel generation. PV carrying the recombination-defective D79H RdRp (52), whose single-nucleotide incorporation fidelity is essentially unchanged, showed a strong increase in Indel rate, to almost double that of WT virus, with only a slight change in mutation rate (Fig. 3D). This result indicates that defective recombination markedly increases the Indel rate.

We next used double mutants in the PV RdRp to examine the relative contributions of recombination rate and polymerase error rate to Indel generation. PV RdRps carrying either the high-fidelity G64S or the low-fidelity H273R mutation were combined with the recombination-defective D79H mutation (Fig. 3C). We hypothesized that in these mutants, Indels generated by the fidelity variants would not be purged from the population because of the reduced recombination rates. Notably, the Indel rates of the G64S/D79H and H273R/D79H double mutants mirrored those observed for the single fidelity mutants G64S and H273R (Fig. 3D). These experiments indicate that Indel generation is inversely correlated with the misincorporation rate; that is, higher-fidelity polymerases generate more Indels. Mechanistically, fidelity is enhanced by slowing RdRp elongation and enhancing proofreading. Our data suggest that this also increases the chances of template switching to produce Indels. Additionally, our analyses indicate that recombination functions primarily to purge Indels rather than to generate them. Taken together, these analyses illuminate the mechanistic constraints on Indel generation and indicate that viruses have evolved their RdRps to balance the tradeoff between fidelity and Indel generation.

Distinct Indel Landscapes during DENV Adaptation to Human and Mosquito Cells.

We next examined how the pronounced phenotypic effects and phenotypic redundancy of Indels are modulated during DENV adaptation to its distinct human and mosquito cell environments. Previous experiments indicated that arthropod-borne flaviviruses, such as DENV, acquire deletions in their 3′ UTRs that alter their interaction with host intracellular innate immune pathways (6, 54, 55). However, the diversity and dynamics of Indels during flavivirus adaptation to insect and mammalian hosts have not been comprehensively examined.

We applied MultiMatch to CirSeq sequencing data from DENV serotype 2 populations adapting to human (Huh7, human hepatoma–derived) and mosquito (C6/36, Aedes albopictus–derived) cells (Fig. 4A) (38). Our analysis of Indel-containing reads from these populations revealed Indels throughout the DENV genome (Fig. 4B). While we observed the expected high frequencies of Indels in the 3′ UTR of mosquito-passaged populations, we also observed high-frequency Indels across other regions of the viral genome in both human and mosquito populations (Fig. 4B) (8, 56, 57). Thus, these analyses uncover a surprising degree of Indel diversity during DENV adaptation to its hosts.

Fig. 4.

Host-specific dynamics of Indels in dengue virus. (A) Experimental detection of Indel mutation dynamics in dengue virus populations passaged in human (Huh7) and mosquito (C6/36) cells. (B) Manhattan plot showing Indel variant diversity across the dengue genome, with their associated relative fitness values and frequencies at passage seven, WT DENV-S2. (C and D) Boxplot comparison of estimated insertion (C) and deletion (D) rates per passage in dengue populations derived from human and mosquito cells. (E) Distribution of mutational fitness effects of Indels in one lineage of DENV (replicate A). The DMFE of Indels shows a stark bias toward lethality, with only a few hundred mutations estimated to be neutral (Inset). (F) DMFE of SNVs in the same population shows a more balanced distribution, with most mutations being viable. (G) Barplots showing the density of beneficial Indels in the UTR and CDS regions in populations derived from human or mosquito cells. (H) Barplots showing the density of lethal Indels in the UTR and CDS regions in populations derived from human or mosquito cells. (I) Kernel density plot illustrating the size distribution of Indels in dengue virus 2 populations derived from human and mosquito cells, passage one.

The depth afforded by CirSeq error correction allowed us to directly estimate Indel rates for the DENV populations (Fig. 4 C and D). In human cells, DENV deletions occur at a rate of ~1.6 × 10⁻⁵ (Fig. 4D), whereas the rate for insertions was about two orders of magnitude lower (~1.2 × 10⁻⁷, Fig. 4C). These rates are similar to those observed for PV (Fig. 2 C and D). Comparison of the rates between populations grown in human and mosquito cells revealed host-specific differences in Indel rates, indicating that the host cellular environment affects Indel rates for the same virus.

We next used the frequency dynamics of individual variants and our estimates of Indel rates to calculate the mutational fitness effects of individual Indel variants (Fig. 4 B and E). Mapping these over the viral genome revealed numerous mosquito-specific beneficial Indel variants in the 3′ UTR (Fig. 4B), as well as two high-frequency insertions in the envelope (Env) and NS3 coding sequences, which were maintained in the populations across passages in both hosts and appeared nearly neutral in their fitness effects (W): Env 1041A duplication: W_mosquito = 1.06, CI [0.97, 1.14], W_human = 0.94, CI [0.87, 1.01]; NS3 5916A duplication: W_mosquito = 1.09, CI [1.006, 1.18], W_human = 1.005, CI [0.94, 1.06].

The estimates of fitness effects were used to populate DMFEs for both Indel mutations (Fig. 4E) and SNVs (Fig. 4F). As observed for PV, Indels are overwhelmingly lethal and impose a significant mutational load on the population, in stark contrast to the DMFE for SNVs determined from the same viral populations (Fig. 4F). Interestingly, a small minority of Indels are nonlethal and show a distribution of fitness effects around neutrality similar to that of SNVs (Fig. 4 E, Inset), suggesting that they are well tolerated in the genome and have little or no deleterious or beneficial effect. Overall, the DMFE for DENV is very similar to that observed for PV, suggesting that this distribution may be a shared feature of RNA viruses.

We next compared the distribution of beneficial (Fig. 4G) and detrimental (Fig. 4E) Indels across the DENV genome in each host. Lethal Indels, which were far more frequent, were distributed in similar proportions across the 5′ UTR, 3′ UTR, and CDS in both populations (Fig. 4H), likely reflecting intrinsic constraints on core viral functions and interactions. In contrast, beneficial Indels showed a host-specific distribution across the DENV genome (Fig. 4G): they were enriched in the CDS and 3′ UTR of DENV in mosquito cells and in the CDS and 5′ UTR of DENV in human cells, but depleted from the 3′ UTR during adaptation in human cells and from the 5′ UTR during adaptation in mosquito cells. These findings reveal distinct evolutionary constraints on the same viral RNA elements in the two host environments, which shape Indel selection across the genome.

We next examined the length distribution of Indels in DENV populations. Indel lengths spanned a wide range in both human and mosquito cells, although the vast majority of Indels were a single nucleotide long in both hosts (Fig. 4I and SI Appendix, Fig. S2 A and C). Interestingly, mosquito-derived DENV populations included a class of intermediate-size Indels (~100 nt) that rose in frequency during early passages in mosquito cells and declined over successive passages (SI Appendix, Fig. S2D). We hypothesize that these Indels represent adaptive solutions that are displaced as smaller, fitter Indels emerge at later passages. We also detected large deletions, close to 10 kb in length and encompassing nearly the entire viral genome, in both human- and mosquito-adapted DENV (Fig. 4I and SI Appendix, Fig. S2 A–D). As discussed below, these likely correspond to defective viral genomes (DVGs) generated during viral replication. Of note, such DVGs have been observed previously in clinical samples of DENV (58).

Indel-Mediated Pathways of DENV Genome and Proteome Innovation.

We next focused on two specific instances that illustrate how beneficial Indels contribute to DENV2 adaptation at the RNA and protein levels. First, we examined how Indel dynamics affect structures in the 3′ UTR of the DENV genome, which undergo extensive reorganization in response to immune selection in human and mosquito culture systems (8, 57, 59, 60) (Fig. 5 A and B and SI Appendix, Fig. S3A). Analysis of the start and end positions of deleted or duplicated sequences across passages revealed a hotspot of Indel variation surrounding the SLII structure (Fig. 5B), with more beneficial Indels in mosquito populations (Fig. 3B). Many beneficial Indel variants affected SLII, indicating multiple molecular routes to the same phenotypic adaptation. One subset of small Indels disrupted SLII, most notably at nt 10,427 (i.e., UTR nt 155), destabilizing the pseudoknot structure in SLII (Fig. 5C). Another subset consisted of larger deletions of approximately 100 nt (Fig. 5C), which removed the entire SLI and a portion of SLII (Fig. 5B and SI Appendix, Fig. S3A). Both classes disrupt the 3′ SLII structure and remove the two Xrn-1 halting sites (Fig. 5C) (8, 57). Although phenotypically equivalent, these variants confer different fitness advantages: the single-nucleotide deletion appears to be fitter than the longer deletions (Fig. 5 B and D). The long SLII deletions correspond to the ~100 nt deletions that appear early during DENV adaptation to mosquito cells (SI Appendix, Fig. S2D) and decline in frequency at later passages as they are replaced by the phenotypically redundant single-nucleotide deletion in SLII (Fig. 5D). This observation exemplifies how Indel plasticity can provide rapid solutions and multiple routes of access to a beneficial phenotype.

Fig. 5.

Mechanistic analysis of dengue Indels. (A) Diagram of the DENV2 3′ untranslated region (UTR). The canonical UTR contains five stem-loop structures: SLI, SLII, dumbbell I (DBI), DBII, and the 3′ SL. Xrn-1 stopping sites, which yield sfRNA fragments, are located upstream of SLI, SLII, DBI, and DBII. (B) Opening and closing positions of Indels in the dengue virus 2 3′ UTR. Points are colored by their fitness classification. SLII is highlighted in blue. (C) Structural models of the dominant mutations in the 3′ UTR. The SLII domain is disrupted by mutation or deletion of U155, through either small or large deletions altering the 3′ UTR SLII. (D) Line plot depicting the frequency trajectories of the U155 and SLII deletions across the nine sequenced passages of human- and mosquito-derived populations. n.o. = not observed. (E) Illustration of the insertion site in the dengue virus 2 NS3 protein, where insertion of an extra adenine leads to a downstream stop codon. (F) ORF and structural model of truncated NS3. Truncation at position 468 occurs 10 amino acids downstream of the internal cleavage site in NS3 (position 458).

We next examined how beneficial Indels can shape the viral proteome. We noticed a high-frequency (~1%) duplication in the protein-coding region of the viral protease-helicase NS3 (Fig. 4B and SI Appendix, Fig. S3B), which persisted through all passages in all examined DENV populations. Closer examination revealed that this variant corresponds to a single adenine duplication in a stretch of six adenines, a canonical slippage motif (53), located in the NS3 region. The insertion is predicted to shift the translation reading frame and produce a stop codon at amino acid position 468 (Fig. 5E). Strikingly, this stop codon terminates translation only 10 residues downstream of the autoproteolytic NS3 cleavage site at position 458 (61), near the boundary between the ATPase domain and the C-terminal domain of the NS3 helicase (Fig. 5F). Translation of DENV genomes containing this insertion should therefore produce an NS3 variant almost identical to the product of autoproteolytic cleavage of NS3 in the WT polyprotein. To determine whether this translation product is indeed generated during DENV infection, we analyzed published ribosome profiling data examining viral translation of DENV (M29095.1) cultured in Huh7 cells (62). Because the 5916A insertion is produced at relatively high frequency, our analysis predicts that approximately 1% of viral mRNA may contain this variant and generate ribosome-nascent chains encoding the frameshifted sequence. Indeed, we detected ribosomes engaging viral RNA containing the 5916A duplication, and approximately 1% of the ribosomal footprints contained the 5916A duplication, consistent with the frequencies observed in the CirSeq data (SI Appendix, Fig. S3 C and D). Thus, this positively selected Indel-driven frameshift provides a mechanism to generate an NS3 protease variant that does not require autoproteolytic processing. Additionally, premature translation termination within the polyprotein may modulate the stoichiometry of viral proteins during infection.
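The frameshift logic can be illustrated with a toy example. The sketch below uses a short, made-up DNA-alphabet sequence (not the DENV2 NS3 sequence) containing a run of six adenines, duplicates one adenine, and reports the first in-frame stop codon before and after the insertion. The helper names `first_stop_codon` and `insert_base` are hypothetical.

```python
from typing import Optional

# Toy illustration (hypothetical sequence, NOT the DENV2 NS3 sequence) of how
# duplicating one adenine in a homopolymer run shifts the reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq: str) -> Optional[int]:
    """Return the 0-based codon index of the first stop in frame 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert `base` immediately before 0-based position `pos`."""
    return seq[:pos] + base + seq[pos:]


wt = "ATGGCT" + "AAAAAA" + "GATAGCCCTGGATAA"  # six-A run; ORF ends at TAA
mutant = insert_base(wt, 6, "A")              # duplicate one A within the run

print(first_stop_codon(wt))      # 8: reads through to the terminal TAA
print(first_stop_codon(mutant))  # 5: the shifted frame reaches TAG early
```

Because every position in the homopolymer run yields the same inserted sequence, the exact insertion coordinate within the run does not matter, which is one reason such runs are natural slippage hotspots.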

Generation and Dynamics of Defective Viral Genomes in Viral Populations In Vitro and In Vivo.

In all the analyzed PV and DENV populations, we detected very long deletions encompassing more than 90% of the viral genome (Figs. 2E and 6A and SI Appendix, Fig. S2 B–D). Previous studies reported similarly large deletions in the serum of patients with severe DENV infections (58). Large genome deletions have been proposed to function as interfering genomes that regulate pathogenesis during RNA virus replication (63, 64), but their generation, prevalence, and dynamics during infection are poorly understood. DVGs are proposed to attenuate replication of their corresponding parental virus in a process known as viral interference (18, 65, 66). These findings have led to speculation that DVGs modulate the course of disease, perhaps by outcompeting the full-length virus for cellular resources (19, 66). The mechanism and therapeutic potential of viral interference are therefore of great interest.

Fig. 6.

Detecting defective viral genomes in vitro and in vivo. (A) Mosquito- and human-passaged dengue virus populations contained genomes with very large deletions (near 10 kb) covering nearly the entire coding region of the virus. These likely constitute defective viral genomes (Right) that are packaged and propagate during viral passage. (B) Scatter plot showing the opening and closing positions of the large deletions in the mosquito-adapted dengue population, colored according to the estimated relative fitness of each variant. (C) Boxplots showing the frequency of DIPs in each passaged lineage of DENV-adapted populations. Each point corresponds to the total DIP load per passage. Differences were tested with the Wilcoxon test (* = P < 0.05; n.s., not significant). (D) Boxplots showing the frequency of DIPs in poliovirus populations. Each point corresponds to the total DIP load per passage. Differences were tested with the Wilcoxon test (n.s., not significant). (E) Distribution of DIP start (empty bars) and end (filled bars) positions in CirSeq libraries of poliovirus populations derived from cell cultures (HeLa cells) or in traditional RNA-Seq of viral populations from distinct tissues in infected animals (PVR-expressing mouse model). n.s., not significant.

We hypothesized that the large deletions observed with our approach represent these DVGs, which have been observed in a wide variety of viruses (63, 64). The diversity of defective genomes in viral populations has been examined using methods ranging from sequencing and transmission electron microscopy (67) to theoretical modeling using Monte Carlo simulations (68) and conventional NGS (69, 70). Because our approach allows us to examine the diversity of deletions at higher resolution than previous approaches, we examined the dynamics of these large deletions in more detail, starting with the very large deletions in the DENV genome. PCR-based studies reported two variants with similar deletion sizes in the serum of patients with severe DENV (58); however, the use of specific PCR probes limits the characterization of the diversity of these defective genomes. In our analyses, we found hundreds of DENV defective genome variants with deletions longer than 10 kb. Interestingly, the deletion start points fell into two hotspot regions at the 5′ end of the genome, within the first nucleotides encoding the capsid protein, and all deletions ended at the distal 3′ end of the 3′ UTR (Fig. 6B and SI Appendix, Fig. S4A). Importantly, because the CirSeq sequencing libraries were prepared from RNA obtained from purified dengue virions during low-MOI infection (MOI = 0.1), all these genomes with large deletions were packaged into virions (38). This indicates that these large defective genomes encode the requisite packaging signals, which are yet to be identified for DENV. Despite their clear boundary preferences, the defective genomes appear to be under strong negative selection, as nearly all of them fall into the lethal fitness category (Fig. 6B). Surprisingly, the total load of defective genomes per passage is maintained at a constant level in the viral population, regardless of host cell or passage number (Fig. 6C).
This finding suggests that the load of these genomes in each replication cycle is determined primarily by their production through errors during replication and subsequent encapsidation into virions, balanced by their loss at each passage. It will be interesting to assess the potential of these DVGs to modulate DENV infection.
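Why would a lethal class of genomes sit at a constant level? A minimal deterministic production-loss (mutation-selection balance) model, which is an illustrative assumption and not the analysis used in this study, makes the point: if full-length genomes produce a fraction `u` of new DVGs per passage and DVGs have relative fitness `w`, then for `w = 0` the DVG fraction equals `u` after every passage regardless of its starting value, and for small `u` and `w < 1` it settles near `u / (1 - w)`. The function names and parameter values are hypothetical.

```python
def next_dvg_fraction(p: float, u: float, w: float) -> float:
    """One passage of a hypothetical deterministic production-loss model.

    p: current fraction of DVGs; u: fraction of progeny of full-length genomes
    that are newly generated DVGs; w: DVG fitness relative to full-length
    genomes (w = 0 means DVGs leave no progeny of their own).
    """
    dvg_progeny = u * (1.0 - p) + w * p
    all_progeny = (1.0 - p) + w * p
    return dvg_progeny / all_progeny


def dvg_trajectory(p0: float, u: float, w: float, passages: int) -> list:
    """DVG fraction before passage 1 and after each of `passages` passages."""
    traj = [p0]
    for _ in range(passages):
        traj.append(next_dvg_fraction(traj[-1], u, w))
    return traj


lethal = dvg_trajectory(p0=0.05, u=1e-3, w=0.0, passages=9)
partial = dvg_trajectory(p0=0.05, u=1e-3, w=0.5, passages=100)
print(lethal[1:4])   # each value equals u (up to rounding)
print(partial[-1])   # approaches the equilibrium 2e-3 (about u / (1 - w))
```

In this toy model the constant load carries information about the per-passage generation rate, which is one way to read the passage-independent DIP load in Fig. 6 C and D.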

As observed for DENV, deletions spanning most of the genome were detected in PV populations (Fig. 2E). There were remarkable similarities between the characteristics of large DVGs in PV and DENV. First, the very large PV deletions are also lethal. Additionally, as observed for DENV, PV populations maintain a constant load of defective genomes with very large deletions, irrespective of their polymerase genotype (Fig. 6D).

Given that the large deletions in DENV DVGs exhibit specific start and end preferences, we next examined the boundaries of the large PV-derived defective genomes. Notably, the 5′ and 3′ boundaries of the PV deletions also map to specific hotspot regions: all long deletions from WT PV start downstream of the 5′ cloverleaf RNA replication element and the variable regions of the 5′ UTR, and all end near the 3′ end of the CDS (Fig. 6E, red).
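Boundary hotspots of this kind can be summarized by binning deletion start and end coordinates along the genome. The sketch below is a minimal illustration, assuming deletions are available as `(start, end)` coordinate pairs and using a hypothetical fixed bin width and count threshold; the toy coordinates are invented and are not data from this study.

```python
from collections import Counter


def boundary_hotspots(deletions, bin_size=50, min_count=3):
    """Count deletion start and end coordinates in fixed-width genome bins.

    deletions: iterable of (start, end) coordinates (hypothetical input format).
    Returns two dicts {bin_start: count} restricted to bins with >= min_count.
    """
    deletions = list(deletions)
    starts = Counter((s // bin_size) * bin_size for s, _ in deletions)
    ends = Counter((e // bin_size) * bin_size for _, e in deletions)

    def keep(counts):
        return {b: n for b, n in sorted(counts.items()) if n >= min_count}

    return keep(starts), keep(ends)


# Hypothetical deletion boundaries, not data from this study.
toy = [(120, 7300), (135, 7310), (140, 7290), (130, 7305), (900, 4000)]
print(boundary_hotspots(toy))  # ({100: 4}, {7300: 3})
```

Fixed bins are the simplest choice; a sliding window or kernel density would avoid splitting a hotspot across a bin edge, as happens here for the end coordinate 7290.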

One important question is whether the large defective PV genomes observed in viral populations cultured in vitro are also found in virus populations from infected animals. We therefore applied the MultiMatch pipeline to three RNA-Seq datasets of PV populations isolated from the spleen, liver, and kidney of mice infected with PV (71). In all three tissues, we identified long deletions similar to those described in CirSeq datasets from PV in cultured HeLa cells. Strikingly, the hotspots for the 5′ start and 3′ end boundaries of the deletions largely overlapped between the in vivo and in vitro virus populations (Fig. 6E), indicating that common constraints drive the formation of these defective genomes in vivo and in cell culture. Since similar boundaries are observed for large DENV deletions in cell culture and in patients, and for PV in cell culture and in infected animals, we conclude that these large-deletion DVGs are not merely a product of virus replication in vitro but rather an integral component of viral populations in natural infections, which can be captured using the CirSeq–MultiMatch pipeline.

Discussion

Assessing the Dynamics of Indels in Viral Populations in Coding and Noncoding Regions.

Viruses adapt to new host environments and challenges by exploring numerous genotypes, a process powered by their error-prone polymerases. Insertions and deletions are an important but poorly understood source of this variation. Here, we implement a powerful experimental-computational approach to examine mechanistic and functional aspects of Indel generation and dynamics. We show that Indels are produced about one order of magnitude less frequently than nucleotide substitutions; nevertheless, the Indel repertoire contributes substantially more to variation in viral populations (SI Appendix, Tables S3 and S4). Accordingly, Indels have a much greater effect than SNVs on the viability of viral offspring. Whereas SNVs are less likely to drastically alter protein structure or behavior or to disrupt an RNA structure (Figs. 2G and 4E), Indels are more likely to change viral function dramatically, because they can easily disrupt the reading frame of the CDS or destroy or change RNA structures (Figs. 2G and 4E). Thus, although Indels are more likely than SNVs to be lethal, they are also a great source of innovation.

Our comprehensive analyses of Indel dynamics in two RNA viruses uncovered shared features. The vastness and complexity of the Indel genotype space have so far precluded estimation of a fitness landscape for Indels; the precision and depth of our sequencing approach allowed us to overcome this challenge and directly compare the role of Indels in viral populations adapting to different environments. Indels accumulating in viral populations span a wide range of lengths: although smaller Indels are more frequent, very large deletions are observed at moderate frequency, likely corresponding to DVGs that are lost and replenished over passages (Fig. 2E and SI Appendix, Fig. S2). Indels that accumulate in viral populations reflect the constraints of viral RNA structure and protein coding. Of note, Indels are largely depleted from structured regions of the viral genome. This could reflect pressure to maintain specific viral RNA structures, or a role of RNA structures in guiding Indel generation at their flanking regions. In CDS regions of the genome, Indels exhibit triplet periodicity, consistent with pressure to maintain the correct reading frame (Fig. 3 A and B).
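Triplet periodicity can be detected with a Fourier transform of Indel counts ordered by length; the Methods report using the R package GeneCycles for Fourier calculations. The Python sketch below is not that package: it is a plain discrete Fourier transform of a mean-centred series, applied to invented counts in which in-frame (multiple-of-3) lengths are enriched, and it reports the period with the highest power.

```python
import cmath


def periodogram(values):
    """Power of the mean-centred series at each period n/k, k = 1..n//2.

    A plain discrete Fourier transform; not the GeneCycles implementation.
    """
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    power = {}
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(centred))
        power[n / k] = abs(coef) ** 2
    return power


# Hypothetical Indel counts for lengths 1..30 with in-frame (multiple-of-3)
# lengths enriched; not data from this study.
counts = [5 if length % 3 == 0 else 1 for length in range(1, 31)]
power = periodogram(counts)
print(max(power, key=power.get))  # 3.0
```

Real count distributions decay steeply with length, so in practice one would detrend (for example, with log counts or residuals from a length model) before looking for the period-3 peak.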

Another important aspect of Indels uncovered by our approach is their functional redundancy. Many Indels altering specific viral functions cocirculate in viral populations, as exemplified by the dynamics of 3′ UTR Indels in DENV (Fig. 5B). Indels thus provide an efficient and robust mechanism to access a vast, redundant set of evolutionary solutions that can quickly alter the phenotype of the viral population. This redundant set of variants then experiences strong purifying selection as the virus continues to adapt to the cell line, ultimately favoring a smaller subset of Indels (Fig. 3 B and D). Viruses use Indels to explore a vast array of genotypes without compromising the viability of the population, accessing mechanisms of adaptation that are otherwise inaccessible through single-nucleotide changes. For instance, we observed a high-frequency insertion (the 5916A duplication) in DENV that causes a translational frameshift and produces a stop codon in NS3. This insertion appears to mimic a proteolytic processing event for NS3, potentially altering the stoichiometry of NS3 domains in infected cells, and illustrates how Indels can offer access to mechanisms of protein regulation beyond gain or loss of function. Similarly, Indels in the viral RNA can affect virus–host interactions: we observed multiple dynamic deletions in the DENV 3′ UTR that appear beneficial for replication in mosquito cells, as well as a deletion in PV predicted to affect initiation at a uORF, which may alter the translation initiation efficiency of the PV polyprotein. Many of the Indels described above were found at high frequencies (Figs. 4B and 5D), which increases the probability that they co-occur in haplotypes with other Indels or SNVs.
Because of the read length used in CirSeq, our approach is limited to identifying single Indels, and further work with long-read sequencing data is needed to understand Indel behavior in the context of the whole viral genome. Our work highlights the potential for Indels to produce robust functional innovation in evolving viral populations and the importance of understanding the population dynamics of this often overlooked dimension of viral adaptation.

Mechanistic Contribution of RNA Polymerases to Indel Generation.

Viral RdRp enzymes are evolutionarily tuned to promote diversity while remaining within an error threshold defined not only by the fitness effects of SNVs but also by those of Indels and other structural variants. Our data suggest that, over their evolutionary history, RdRps have had to balance SNV and Indel rates, given the drastic fitness effects of Indels. Surprisingly, we find that RdRp mutations altering one rate have opposing effects on the other, indicating that these two traits are linked through antagonistic pleiotropy: high-fidelity RdRp variants with low nucleotide misincorporation rates have higher Indel rates, whereas higher error rates are associated with lower Indel rates. Mechanistically, it is notable that the high-fidelity G64S variant has been shown to have a substantially slower elongation rate (72). This could increase the frequency of polymerase stalling and dissociation, which have been implicated in slippage-mediated Indel generation (73, 74). Indeed, we identified two high-frequency insertions in the DENV CDS located in sequence contexts similar to those that other positive-strand RNA viruses use to express overlapping protein sequences through transcriptional slippage (75). Our analysis also addresses the contribution of recombination to Indel generation: viral populations with recombination-deficient RdRps accumulate more Indels. Importantly, analysis of RdRp double mutants with both altered fidelity and reduced recombination indicates that the mutation rate, rather than recombination itself, is a major driver of Indel generation. Recombination thus appears to provide a mechanism to purge lethal Indels from the population rather than to generate them. Our data open a path to understanding the mechanistic underpinnings of Indel generation.
Poliovirus and other RNA viruses replicate their genomes on intracellular membrane-bound replication compartments (76). Electron microscopy and genetic analyses indicate that RdRps form oligomeric arrays where replication occurs (77–79). Genomic lesions such as large deletions or frameshifts in the main ORF likely originate through trans-acting RdRps that switch templates within the replication complex. Mutations that disrupt intermolecular contacts between RdRps impair polymerase function, consistent with the idea that RdRp arrays facilitate Indel generation through template switching. The observation that polymerases with lower missense mutation rates, such as G64S, have slower incorporation kinetics and higher Indel rates further supports the idea that template switching is enhanced by slower elongation or transient pausing. Interestingly, the D79H mutation maps within one of the interfaces between RdRps, suggesting that it may affect the frequency of recombination, and hence Indel production rates, by disrupting polymerase oligomerization (80).

Our analysis of human- and mosquito-derived DENV populations indicates that both genetic and environmental factors influence the apparent trade-off between SNV mutation and Indel rates. We observed significant host-specific differences in insertion, deletion, and SNV mutation rates, with human-derived populations exhibiting lower SNV mutation rates and higher Indel rates than mosquito-derived DENV populations. This trade-off is consistent with the relationship we observed for the PV RdRp fidelity mutants (Fig. 3C), supporting an antagonistic pleiotropy relationship between Indel and mutation rates. Because insect cells are grown at lower temperatures, this effect could be related to enzyme kinetics, where slower extension leads to more frequent slippage on the template molecule and a higher Indel rate. Alternatively, these differences may be due to yet-to-be-defined differential modulation of replication rates by host factors.

Defective Viral Genomes in Infection Outcome and Therapeutics.

Since their discovery, defective genomes have been characterized by their ability to interfere with replication of the WT virus (81, 82) and to influence the outcome of viral infections, including by mediating viral interference, promoting persistent infections, and modulating antiviral immunity (66). It has recently been demonstrated that DVGs can be harnessed to engineer therapeutic interfering particles (TIPs), designed to interfere with or attenuate infection by WT virus as pre- or postexposure prophylactics (83–87). Indeed, preclinical studies have recently demonstrated the therapeutic and prophylactic potential of TIPs against a wide range of viruses, including SARS-CoV-2, influenza, and a range of enteroviruses (87).

DVGs are proposed to attenuate viral infection through several mechanisms, including saturation of the viral replicative machinery, sequestration of essential cellular cofactors, and induction of innate immune responses (88). Additionally, short defective viral RNAs are thought to replicate more quickly and may thereby compete with the WT virus. Short DVGs that contain viral packaging signals would also compete with WT genomes for structural proteins to be packaged into virions. Indeed, we observe very large genomic deletions in both PV and DENV populations. The resulting genomes essentially comprise the 5′ and 3′ UTRs and lack most of the coding region. In the case of DENV, these genomes were identified by sequencing virions and must therefore contain the requisite packaging signals. By applying MultiMatch to RNA-Seq data of PV recovered from tissues of infected animals, we observed that DVGs with large genomic deletions produced in cell culture or in infected animals carry similar 5′ and 3′ boundaries. Our findings for PV and DENV demonstrate that in vitro–generated Indel variants recapitulate the in vivo–derived repertoires of Indels.

Our approach furthers understanding of the mechanistic, structural, and functional landscape of Indels in RNA virus populations during infection. Although developed for viral populations obtained in vitro, it can be adapted to in vivo infections, including the analysis of tissue-specific Indels. This knowledge may inform or facilitate the design of engineered TIPs for pre- or postexposure prophylactic use, for instance, by identifying defective genomes with higher fitness and incorporating these constraints into TIP design. MultiMatch could be further extended to identify copy-back and snap-back defective genomes, providing insights into these types of defective genomes. As we demonstrate here, our approach opens the possibility of revealing functional principles that define the central role of Indels in virus evolution, emergence, and the regulation of viral infection.

Materials and Methods

Sequencing Data Sources.

No original data were generated as part of this study. Dengue virus (DENV) CirSeq datasets (human and mosquito) correspond to data published in ref. 37, and were downloaded from the NCBI repository using the accession code PRJNA669406. RNA-Seq datasets of poliovirus replicating in mice (all analyzed tissues) and CirSeq datasets of poliovirus [wild-type (WT) and mutants] were published in ref. 68, and were retrieved from the NCBI BioProject database using the accession code PRJNA383905. Ribosome profiling data of dengue virus replicating in Huh7 cells correspond to the data published in ref. 59, and were downloaded from Gene Expression Omnibus using the accession number GSE69602. See SI Appendix, Table S2 for a more detailed categorization of the used datasets.

MultiMatch Indel Mapping.

MultiMatch identifies Indels using a split-read strategy. Because CirSeq consensus sequences are not necessarily contiguous 5′-to-3′ sequences but can instead consist of rearranged segments of an original 5′-to-3′ RNA fragment, the program first performs a local alignment that requires a long seed and avoids gaps. This primary subalignment is stringent because its goal is to identify a reliable 5′-to-3′ section of the original RNA fragment around which the read can be reconstructed. The result is a read with a CIGAR string containing matches and clipped bases. The read is then rearranged: the matching part of the alignment is removed and stored, and the clipped fragments of each read are joined into new, shorter reads, which are written to a new FASTQ file. For a more detailed description, see the Supplemental Methods file; see https://github.com/marangel/MultiMatch/blob/master/README.md for a description of output files and argument options. MultiMatch uses bowtie2 as its alignment engine (89, 90). Finally, in its current version, MultiMatch cannot detect variant transcripts produced through copy-back and snap-back mechanisms.
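The read-rearrangement step can be sketched as follows. This is a hypothetical helper, not MultiMatch code: it parses a CIGAR string (operation codes as defined in the SAM specification), separates the aligned core from the soft-clipped ends, and joins the clips into a new, shorter read. The joining order (3′ clip followed by 5′ clip) is an assumption based on the CirSeq design: a consensus read is one unit of a tandem repeat, so the sequence after its 3′ end continues at its 5′ end.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_clipped_read(seq, cigar):
    """Separate the aligned core of a read from its soft-clipped ends.

    Hypothetical helper, not MultiMatch code. The clipped ends are rejoined as
    3' clip + 5' clip, assuming the consensus read is one unit of a circular
    tandem repeat, so its 3' end continues into its 5' end.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    aligned = seq[left:len(seq) - right]
    rejoined = seq[len(seq) - right:] + seq[:left]
    return aligned, rejoined


print(split_clipped_read("AAACCCCCGGGG", "3S5M4S"))  # ('CCCCC', 'GGGGAAA')
```

In a full pipeline, the rejoined reads would be realigned, and an Indel would appear as a gap between the coordinates of adjacent subalignments.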

Synthetic CirSeq Data Generation and MultiMatch Benchmarking.

To benchmark the MultiMatch package, we developed a program to generate synthetic CirSeq consensus reads. For more information on this benchmarking procedure, see the Supplemental Methods file.

Sequencing Data Processing.

Raw FASTQ files containing the CirSeq tandem-repeat reads were used to generate consensus reads with ConsensusGeneration from the CirSeq_v2 package (https://andino.ucsf.edu/CirSeq). Consensus reads were then mapped against the corresponding viral genome, and Indels were identified, using MultiMatch package v1 (https://github.com/marangel/MultiMatch).

Empirical Indel Rate Estimations.

To estimate Indel rates, we first filtered out variants that appeared to be overrepresented in the dataset. We defined an Indel variant as an Indel of length x starting at genome position i, mapped in a read of length y, with z denoting the length of the shorter read fragment flanking the Indel (Fig. 1). For each Indel variant (i.e., each set of {i, x, y, z} values), an initial Indel rate r was estimated as the variant's count divided by the mean sequencing coverage of the Indel region. Then, for all Indel variants sharing the same {x, y, z} values (an Indel class), the mean of their initial rates was taken as the class rate r′. This estimate does not account for the fact that some read lengths are more frequent than others in CirSeq libraries, which could bias estimates for variants in regions that yield RNA fragments of favored lengths. To correct for this, we weighted r′ by the conditional probability of finding an Indel in a read of length y given that the shorter flanking region (subalignment; see Fig. 1) has length z:

$$P(y_i \mid z) = \frac{P(y_i)\,P(z \mid y_i)}{\sum_{k \ge 50} P(y_k)\,P(z \mid y_k)} \qquad [1]$$

where P(y_i) was obtained from the read-length distribution of each sequencing experiment, and P(z | y_i) was calculated from the list of mapped Indels in each experiment as the total number of cases in which a flanking region of length z is observed given that its parental read has length y_i (the index k starts at 50, the smallest read length considered). Then, for each Indel variant, we applied a binomial test using r′·P(y_i | z) as the probability of success. Variants failing the test (0.05 significance threshold) were flagged as overrepresented outliers and removed, and the procedure was repeated until no further variants were removed. The remaining Indels were binned by length x, and the counts for each x were divided by the corresponding coverage to obtain an empirical Indel rate estimate. Coverage was estimated by sliding a window of size x (the Indel length) across the viral genome, computing the coverage in each window, and averaging the resulting values. The entire procedure was applied to each sequencing experiment (i.e., each viral passage) individually, allowing independent estimation of the same rates multiple times. The values reported in the figures are jackknife estimates across all passages of a given class (e.g., DENV replicating in human cells).
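The iterative outlier filter can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the dictionary layout is hypothetical, the weights P(y_i | z) are assumed to be precomputed from Eq. 1, coverage is rounded to an integer number of trials, and the binomial test is assumed to be one-sided (upper tail), since the goal is to flag overrepresented variants. The binomial tail is computed with the standard library only.

```python
import math
from collections import defaultdict


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing log-space pmf terms."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    mean = n * p
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for j in range(k, n + 1):
        term = math.exp(log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                        + j * math.log(p) + (n - j) * math.log1p(-p))
        total += term
        if j > mean and (term == 0.0 or term < total * 1e-15):
            break  # remaining terms shrink geometrically beyond the mean
    return min(total, 1.0)


def filter_overrepresented(variants, alpha=0.05):
    """Iteratively drop Indel variants whose counts exceed the class expectation.

    variants: {variant_id: (count, coverage, indel_class, weight)}, where
    indel_class is the {x, y, z} key and weight is P(y_i | z) from Eq. 1.
    Hypothetical data layout; the one-sided test is an assumption.
    """
    kept = dict(variants)
    while True:
        per_class = defaultdict(list)
        for count, coverage, cls, _ in kept.values():
            per_class[cls].append(count / coverage)  # initial rate r
        class_rate = {cls: sum(r) / len(r) for cls, r in per_class.items()}  # r'
        flagged = [vid for vid, (count, coverage, cls, weight) in kept.items()
                   if binom_upper_tail(count, round(coverage),
                                       class_rate[cls] * weight) < alpha]
        if not flagged:
            return kept
        for vid in flagged:
            del kept[vid]


# Hypothetical counts: five variants in one {x, y, z} class, one inflated.
cls = (1, 100, 30)
toy = {vid: (count, 10_000, cls, 1.0)
       for vid, count in zip("abcde", [1, 2, 1, 2, 50])}
print(sorted(filter_overrepresented(toy)))  # ['a', 'b', 'c', 'd']
```

Recomputing the class rate after each round matters: the inflated variant raises r′ enough to make its neighbors look ordinary in the first round, and the loop stops only once the remaining counts are consistent with their class rate.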

Theoretical Deletion Rate Estimations.

Theoretical deletion rates for poliovirus were estimated from a power law:

$$n = c_1 L^{-c_2} \qquad [2]$$

where n represents the number of deletions of length L; c1 and c2 were found by fitting the deletion counts and lengths via maximum likelihood methods.
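One way to carry out such a maximum-likelihood fit is sketched below, assuming (this is not stated in the text) that the count at each length is Poisson-distributed with mean c1·L^(−c2). Under that assumption the MLE of c1 for a fixed c2 has a closed form, c1 = Σn / ΣL^(−c2), so c2 can be found by maximizing the resulting concave profile log-likelihood with a golden-section search. Function names, the search interval, and the toy counts are hypothetical.

```python
import math


def fit_power_law(lengths, counts):
    """Poisson MLE for n_L = c1 * L**(-c2) (an assumed error model).

    c1 is profiled out analytically; c2 is found by golden-section search
    on the concave profile log-likelihood over [0, 10]. Lengths must be >= 1.
    """
    total = sum(counts)
    sum_nlogL = sum(n * math.log(L) for L, n in zip(lengths, counts))

    def profile_loglik(c2):
        s = sum(L ** -c2 for L in lengths)
        return -total * math.log(s) - c2 * sum_nlogL

    lo, hi = 0.0, 10.0
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = profile_loglik(x1), profile_loglik(x2)
    while hi - lo > 1e-9:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = profile_loglik(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = profile_loglik(x1)
    c2 = (lo + hi) / 2.0
    c1 = total / sum(L ** -c2 for L in lengths)
    return c1, c2


lengths = list(range(1, 51))
counts = [1000 * L ** -1.5 for L in lengths]  # idealized, noise-free counts
c1, c2 = fit_power_law(lengths, counts)
print(round(c1), round(c2, 3))  # 1000 1.5
```

Profiling out c1 reduces the problem to a one-dimensional search, which keeps the sketch dependency-free; a general-purpose optimizer over both parameters would give the same estimates.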

Nucleotide Substitution Rate Estimation.

Nucleotide substitution rates were computed as described previously (23, 38).

SHAPE Data.

SHAPE data were requested from the authors of ref. 39.

Fitness Estimation of Indel Variants.

Fitness estimates for individual Indel variants were computed following the method in ref. 37, except for the regression across individual fitness estimates for each passage interval. Instead of obtaining a fitness estimate (w) for a variant by regressing over the cumulative sum of its partial relative fitness values, w was computed as the geometric mean of those relative fitness values, with resampling used to generate a jackknife CI that better accounts for variance in the fitness estimates across individual passages. For all estimates (PV and DENV data), the first six passages were used to estimate relative fitness values, to avoid the influence of Hill–Robertson interference as individual variants reach fixation. All passages were performed at fixed MOIs (0.1, or 5 × 10⁵ to 1 × 10⁶ infectious units) to maintain a consistent effective population size. Variants with w > 1 and a CI lower bound > 1 were classified as beneficial; Indel variants with w = 0 and a CI upper bound = 0 were classified as lethal; variants with w > 0 and a CI upper bound < 1 were classified as detrimental; all other variants were regarded as neutral.
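The estimate and classification rules can be sketched as follows. This is an illustration under assumptions, not the authors' code: the jackknife uses leave-one-passage-out geometric means with the standard jackknife standard error and a normal-approximation interval (z = 1.96), and the detrimental class is taken as a CI upper bound below 1 (neutral fitness). Function names and the example fitness values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean; any zero (no surviving progeny) gives 0."""
    if any(v == 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def jackknife_ci(values, z=1.96):
    """Geometric-mean fitness w with a leave-one-out jackknife CI (normal approx.)."""
    values = list(values)
    n = len(values)
    theta = geometric_mean(values)
    loo = [geometric_mean(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return theta, max(0.0, theta - z * se), theta + z * se


def classify(w, lower, upper):
    """Fitness classes following the rules above (detrimental: upper < 1 assumed)."""
    if w > 1.0 and lower > 1.0:
        return "beneficial"
    if w == 0.0 and upper == 0.0:
        return "lethal"
    if w > 0.0 and upper < 1.0:
        return "detrimental"
    return "neutral"


# Hypothetical per-passage relative fitness values for single variants.
print(classify(*jackknife_ci([1.2, 1.3, 1.25, 1.22, 1.28])))  # beneficial
print(classify(*jackknife_ci([0.0, 0.0, 0.0])))               # lethal
```

Using the geometric mean treats per-passage fitness as multiplicative, so a single passage with zero relative fitness drives w to zero, matching the lethal class.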

RNA Structure Modeling.

RNA structures were based on the structural models in ref. 57, and 3D models were generated using RNAComposer (91).

Statistical Testing and Other Analyses.

All statistical significance tests were carried out in R (http://www.r-project.org) at a significance level of 0.05 unless otherwise noted. Fourier transforms were computed with the CRAN package GeneCycles.
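The section does not state which series the Fourier analysis was applied to. Assuming, for illustration only, a series of Indel counts indexed by Indel length, a dominant period of 3 would indicate enrichment of in-frame Indels, in line with the translation-frame constraint discussed in the paper. The sketch below computes a plain discrete-Fourier periodogram in Python; it is not the GeneCycles implementation, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def periodogram(series: Sequence[float]) -> List[Tuple[int, float]]:
    """Raw periodogram |X_k|^2 / N of the mean-centred series, for k = 1..N//2."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred))
        power.append((k, abs(coef) ** 2 / n))
    return power


def dominant_period(series: Sequence[float]) -> float:
    """Period, in series steps, of the frequency with the largest periodogram value."""
    k, _ = max(periodogram(series), key=lambda kp: kp[1])
    return len(series) / k


# Hypothetical Indel counts for lengths 1..30, enriched at in-frame lengths (3, 6, 9, ...).
counts = [20 if length % 3 == 0 else 5 for length in range(1, 31)]
print(dominant_period(counts))  # prints 3.0
```

Mean-centring removes the zero-frequency term, so the peak reflects oscillation around the average count rather than its overall level.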

Supplementary Material

Appendix 01 (PDF)

Acknowledgments

Research reported in this publication was supported by NIH grants AI127447 (J.F.), AI36178, AI40085, and AI091575 (R.A.), and 5K99AI139279 (P.T.D.), a DARPA Prophecy Award, and fellowships from the Naito Foundation (S.T.), the Uehara Memorial Foundation (S.T.), and Fundación UNAM (M.A.R.).

Author contributions

M.A.R., P.T.D., R.A., and J.F. designed research; M.A.R., P.T.D., S.T., and Y.X. performed research; M.A.R. contributed new reagents/analytic tools; M.A.R., P.T.D., R.A., and J.F. analyzed data; S.T. and Y.X. contributed sequencing data and performed the CirSeq experiments; and M.A.R., P.T.D., R.A., and J.F. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

Reviewers: S.F.E., Spanish National Research Council; and A.E.G., Leids Universitair Medisch Centrum.

Contributor Information

Patrick T. Dolan, Email: Patrick.Dolan@nih.gov.

Raul Andino, Email: Raul.Andino@ucsf.edu.

Judith Frydman, Email: jfrydman@stanford.edu.

Data, Materials, and Software Availability

MultiMatch is available for download at https://github.com/marangel/MultiMatch. Data associated with the analysis presented in this manuscript are available at the Digital Dryad Database (“High-resolution mapping reveals the mechanism and contribution of genome insertions and deletions to RNA virus evolution”, https://doi.org/10.5061/dryad.qjq2bvqm6). Reviewer download link: https://datadryad.org/stash/share/VjXKitbO8BvkHNNq5dJ_oiOR8lc5l8z5AL-Y85JMshk. Previously published data were used for this work; no original data were generated as part of this study. Dengue virus (DENV) CirSeq datasets (human and mosquito) correspond to data published in ref. 37 and were downloaded from the NCBI repository under accession code PRJNA669406. RNA-Seq datasets of poliovirus replicating in mice (all analyzed tissues) and CirSeq datasets of poliovirus [wild type (WT) and mutants] were published in ref. 77 and were retrieved from the NCBI BioProject database under accession code PRJNA383905. Ribosome profiling data of dengue virus replicating in Huh7 cells correspond to data published in ref. 59 and were downloaded from Gene Expression Omnibus under accession number GSE69602.


References

  • 1. Domingo E., Holland J. J., RNA virus mutations and fitness for survival. Annu. Rev. Microbiol. 51, 151–178 (1997).
  • 2. Sanjuán R., Nebot M. R., Chirico N., Mansky L. M., Belshaw R., Viral mutation rates. J. Virol. 84, 9733–9748 (2010).
  • 3. Wellenreuther M., Mérot C., Berdan E., Bernatchez L., Going beyond SNPs: The role of structural genomic variants in adaptive evolution and species diversification. Mol. Ecol. 28, 1203–1209 (2019).
  • 4. Tromas N., Elena S. F., The rate and spectrum of spontaneous mutations in a plant RNA virus. Genetics 185, 983–989 (2010).
  • 5. Malpica J. M., et al., The rate and character of spontaneous mutation in an RNA virus. Genetics 162, 1505–1511 (2002).
  • 6. Roby J. A., Pijlman G. P., Wilusz J., Khromykh A. A., Noncoding subgenomic flavivirus RNA: Multiple functions in West Nile virus pathogenesis and modulation of host responses. Viruses 6, 404–427 (2014).
  • 7. Muslin C., Joffret M.-L., Pelletier I., Blondel B., Delpeyroux F., Evolution and emergence of enteroviruses through intra- and inter-species recombination: Plasticity and phenotypic impact of modular genetic exchanges in the 5′ untranslated region. PLoS Pathog. 11, e1005266 (2015).
  • 8. Villordo S. M., Carballeda J. M., Filomatori C. V., Gamarnik A. V., RNA structure duplications and flavivirus host adaptation. Trends Microbiol. 24, 270–283 (2016).
  • 9. Liu Z., et al., Identification of common deletions in the spike protein of severe acute respiratory syndrome coronavirus 2. J. Virol. 94, e00790-20 (2020).
  • 10. Liu X., et al., A comprehensive evolutionary and epidemiological characterization of insertion and deletion mutations in SARS-CoV-2 genomes. Virus Evol. 7, veab104 (2021).
  • 11. Chrisman B. S., et al., Indels in SARS-CoV-2 occur at template-switching hotspots. BioData Min. 14, 20 (2021).
  • 12. Weng S., et al., Conserved pattern and potential role of recurrent deletions in SARS-CoV-2 evolution. Microbiol. Spectr. 10, e0219121 (2022).
  • 13. Wrobel A. G., et al., SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects. Nat. Struct. Mol. Biol. 27, 763–767 (2020).
  • 14. Johnson B. A., Furin cleavage site is key to SARS-CoV-2 pathogenesis. bioRxiv [Preprint] (2020) 10.1101/2020.08.26.268854 (Accessed 6 June 2021).
  • 15. Elde N. C., et al., Poxviruses deploy genomic accordions to adapt rapidly against host antiviral defenses. Cell 150, 831–841 (2012).
  • 16. Senkevich T. G., Zhivkoplias E. K., Weisberg A. S., Moss B., Inactivation of genes by frameshift mutations provides rapid adaptation of an attenuated vaccinia virus. J. Virol. 94, e01053-20 (2020).
  • 17. Coulson D., Upton C., Characterization of indels in poxvirus genomes. Virus Genes 42, 171–177 (2011).
  • 18. Huang A. S., Baltimore D., Defective viral particles and viral disease processes. Nature 226, 325–327 (1970).
  • 19. Rezelj V. V., Levi L. I., Vignuzzi M., The defective component of viral populations. Curr. Opin. Virol. 33, 74–80 (2018).
  • 20. Drake J. W., Holland J. J., Mutation rates among RNA viruses. Proc. Natl. Acad. Sci. U.S.A. 96, 13910–13913 (1999).
  • 21. Sanjuán R., Nebot M. R., Chirico N., Mansky L. M., Belshaw R., Viral mutation rates. J. Virol. 84, 9733–9748 (2010).
  • 22. Peck K. M., Lauring A. S., Complexities of viral mutation rates. J. Virol. 92, e01031-17 (2018).
  • 23. Acevedo A., Brodsky L., Andino R., Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).
  • 24. Acevedo A., Andino R., Library preparation for highly accurate population sequencing of RNA viruses. Nat. Protoc. 9, 1760–1769 (2014).
  • 25. Li S., et al., SOAPindel: Efficient identification of indels from short paired reads. Genome Res. 23, 195–200 (2013).
  • 26. Ratan A., Olson T. L., Loughran T. P. Jr., Miller W., Identification of indels in next-generation sequencing data. BMC Bioinformatics 16, 42 (2015).
  • 27. Xia L. C., et al., A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic Acids Res. 44, e126 (2016).
  • 28. Ye K., Schulz M. H., Long Q., Apweiler R., Ning Z., Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
  • 29. Zhang J., Wu Y., SVseq: An approach for detecting exact breakpoints of deletions with low-coverage sequence data. Bioinformatics 27, 3228–3234 (2011).
  • 30. Yang R., Nelson A. C., Henzler C., Thyagarajan B., Silverstein K. A. T., ScanIndel: A hybrid framework for indel detection via gapped alignment, split reads and de novo assembly. Genome Med. 7, 127 (2015).
  • 31. Albers C. A., et al., Dindel: Accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).
  • 32. Olmo-Uceda M. J., et al., DVGfinder: A metasearch tool for identifying defective viral genomes in RNA-Seq data. Viruses 14, 1114 (2022).
  • 33. Schirmer M., D’Amore R., Ijaz U. Z., Hall N., Quince C., Illumina error profiles: Resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics 17, 125 (2016).
  • 34. Whitfield Z. J., Andino R., Characterization of viral populations by using circular sequencing. J. Virol. 90, 8950–8953 (2016).
  • 35. de Jong W. W., Rydén L., Causes of more frequent deletions than insertions in mutations and protein evolution. Nature 290, 157–159 (1981).
  • 36. Pita J. S., de Miranda J. R., Schneider W. L., Roossinck M. J., Environment determines fidelity for an RNA virus replicase. J. Virol. 81, 9072–9077 (2007).
  • 37. Fan Y., et al., Patterns of insertion and deletion in mammalian genomes. Curr. Genomics 8, 370–378 (2007).
  • 38. Dolan P. T., et al., Principles of dengue virus evolvability derived from genotype-fitness maps in human and mosquito cells. eLife 10, e61921 (2021).
  • 39. Burrill C. P., et al., Global RNA structure analysis of poliovirus identifies a conserved RNA structure involved in viral replication and infectivity. J. Virol. 87, 11670–11683 (2013).
  • 40. Lulla V., et al., An upstream protein-coding region in enteroviruses modulates virus infection in gut epithelial cells. Nat. Microbiol. 4, 280–292 (2019).
  • 41. Chang M. S. S., Benner S. A., Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. J. Mol. Biol. 341, 617–631 (2004).
  • 42. Benner S. A., Cohen M. A., Gonnet G. H., Empirical and structural models for insertions and deletions in the divergent evolution of proteins. J. Mol. Biol. 229, 1065–1082 (1993).
  • 43. Cartwright R. A., Problems and solutions for estimating indel rates and length distributions. Mol. Biol. Evol. 26, 473–480 (2009).
  • 44. Qian B., Goldstein R. A., Distribution of indel lengths. Proteins 45, 102–104 (2001).
  • 45. Hagiwara-Komoda Y., et al., Truncated yet functional viral protein produced via RNA polymerase slippage implies underestimated coding capacity of RNA viruses. Sci. Rep. 6, 21411 (2016).
  • 46. Lai M. M., RNA recombination in animal and plant viruses. Microbiol. Rev. 56, 61–79 (1992).
  • 47. Pfeiffer J. K., Kirkegaard K., A single mutation in poliovirus RNA-dependent RNA polymerase confers resistance to mutagenic nucleotide analogs via increased fidelity. Proc. Natl. Acad. Sci. U.S.A. 100, 7289–7294 (2003).
  • 48. Pfeiffer J. K., Kirkegaard K., Increased fidelity reduces poliovirus fitness and virulence under selective pressure in mice. PLoS Pathog. 1, e11 (2005).
  • 49. Vignuzzi M., Stone J. K., Andino R., Ribavirin and lethal mutagenesis of poliovirus: Molecular mechanisms, resistance and biological implications. Virus Res. 107, 173–181 (2005).
  • 50. Vignuzzi M., Andino R., “Biological implications of picornavirus fidelity mutants” in The Picornaviruses (ASM Press, 2014), E. Ehrenfeld, E. Domingo, R. P. Roos, Eds., pp. 213–227.
  • 51. Korboukh V. K., et al., RNA virus population diversity, an optimum for maximal fitness and virulence. J. Biol. Chem. 289, 29531–29544 (2014).
  • 52. Xiao Y., et al., RNA recombination enhances adaptability and is required for virus spread and virulence. Cell Host Microbe 19, 493–503 (2016).
  • 53. Garcia-Diaz M., Kunkel T. A., Mechanism of a genetic glissando: Structural biology of indel mutations. Trends Biochem. Sci. 31, 206–214 (2006).
  • 54. Pijlman G. P., et al., A highly structured, nuclease-resistant, noncoding RNA produced by flaviviruses is required for pathogenicity. Cell Host Microbe 4, 579–591 (2008).
  • 55. Alvarez D. E., Ezcurra A. L. D. L., Fucito S., Gamarnik A. V., Role of RNA structures present at the 3′ UTR of dengue virus on translation, RNA synthesis, and viral replication. Virology 339, 200–212 (2005).
  • 56. Villordo S. M., Gamarnik A. V., Differential RNA sequence requirement for dengue virus replication in mosquito and mammalian cells. J. Virol. 87, 9365–9372 (2013).
  • 57. Villordo S. M., Filomatori C. V., Sánchez-Vargas I., Blair C. D., Gamarnik A. V., Dengue virus RNA structure specialization facilitates host adaptation. PLoS Pathog. 11, e1004604 (2015).
  • 58. Li D., et al., Defective interfering viral particles in acute dengue infections. PLoS One 6, e19447 (2011).
  • 59. Jones R. A., et al., Different tertiary interactions create the same important 3-D features in a distinct flavivirus xrRNA. RNA 27, 54–65 (2021), 10.1261/rna.077065.120.
  • 60. Pompon J., et al., Dengue subgenomic flaviviral RNA disrupts immunity in mosquito salivary glands to increase virus transmission. PLoS Pathog. 13, e1006535 (2017).
  • 61. Luo D., et al., Crystal structure of the NS3 protease-helicase from dengue virus. J. Virol. 82, 173–183 (2008).
  • 62. Reid D. W., et al., Dengue virus selectively annexes endoplasmic reticulum-associated translation machinery as a strategy for co-opting host cell protein synthesis. J. Virol. 92, e01766-17 (2018).
  • 63. Segredo-Otero E., Sanjuán R., The effect of genetic complementation on the fitness and diversity of viruses spreading as collective infectious units. Virus Res. 267, 41–48 (2019).
  • 64. Andreu-Moreno I., Sanjuán R., Collective viral spread mediated by virion aggregates promotes the evolution of defective interfering particles. mBio 11, e02156-19 (2020).
  • 65. Perrault J., Lane J. L., McClure M. A., “A variant VSV generates defective interfering particles with replicase-like activity in vitro” in Animal Virus Genetics (Elsevier, 1980), B. N. Fields, R. Jaenisch, F. Fox, Eds., pp. 379–390.
  • 66. Vignuzzi M., López C. B., Defective viral genomes are key drivers of the virus-host interaction. Nat. Microbiol. 4, 1075–1087 (2019).
  • 67. Giri L., Feiss M. G., Bonning B. C., Murhammer D. W., Production of baculovirus defective interfering particles during serial passage is delayed by removing transposon target sites in fp25k. J. Gen. Virol. 93, 389–399 (2012).
  • 68. Das A., et al., A detailed model and Monte Carlo simulation for predicting DIP genome length distribution in baculovirus infection of insect cells. Biotechnol. Bioeng. 118, 238–252 (2021).
  • 69. Beauclair G., et al., DI-tector: Defective interfering viral genomes’ detector for next-generation sequencing data. RNA 24, 1285–1296 (2018).
  • 70. Bosma T. J., et al., Identification and quantification of defective virus genomes in high throughput sequencing data using DVG-profiler, a novel post-sequence alignment processing algorithm. PLoS One 14, e0216944 (2019).
  • 71. Xiao Y., et al., Poliovirus intrahost evolution is required to overcome tissue-specific innate immune responses. Nat. Commun. 8, 375 (2017).
  • 72. Kempf B. J., Peersen O. B., Barton D. J., Poliovirus polymerase Leu420 facilitates RNA recombination and ribavirin resistance. J. Virol. 90, 8410–8421 (2016).
  • 73. Murat P., Guilbaud G., Sale J. E., DNA polymerase stalling at structured DNA constrains the expansion of short tandem repeats. Genome Biol. 21, 209 (2020).
  • 74. Viguera E., Canceill D., Ehrlich S. D., Replication slippage involves DNA polymerase pausing and dissociation. EMBO J. 20, 2587–2595 (2001).
  • 75. Olspert A., Chung B. Y.-W., Atkins J. F., Carr J. P., Firth A. E., Transcriptional slippage in the positive-sense RNA virus family Potyviridae. EMBO Rep. 16, 995–1004 (2015).
  • 76. Wolff G., Melia C. E., Snijder E. J., Bárcena M., Double-membrane vesicles as platforms for viral replication. Trends Microbiol. 28, 1022–1033 (2020).
  • 77. Lyle J. M., Bullitt E., Bienz K., Kirkegaard K., Visualization and functional analysis of RNA-dependent RNA polymerase lattices. Science 296, 2218–2222 (2002).
  • 78. Hobson S. D., et al., Oligomeric structures of poliovirus polymerase are important for function. EMBO J. 20, 1153–1163 (2001).
  • 79. Laurent T., et al., Architecture of the chikungunya virus replication organelle. eLife 11, e83042 (2022).
  • 80. Xiao Y., et al., RNA recombination enhances adaptability and is required for virus spread and virulence. Cell Host Microbe 22, 420 (2017).
  • 81. Henle W., Henle G., Interference of inactive virus with the propagation of virus of influenza. Science 98, 87–89 (1943).
  • 82. Von Magnus P., Incomplete forms of influenza virus. Adv. Virus Res. 2, 59–79 (1954).
  • 83. Chambers T. M., Webster R. G., Protection of chickens from lethal influenza virus infection by influenza A/chicken/Pennsylvania/1/83 virus: Characterization of the protective effect. Virology 183, 427–432 (1991).
  • 84. Doyle M., Holland J. J., Virus-induced interference in heterologously infected HeLa cells. J. Virol. 9, 22–28 (1972).
  • 85. Doyle M., Holland J. J., Prophylaxis and immunization in mice by use of virus-free defective T particles to protect against intracerebral infection by vesicular stomatitis virus. Proc. Natl. Acad. Sci. U.S.A. 70, 2105–2108 (1973).
  • 86. Madsen M. S., Snelling D. F., Heaphy S., Norris V., Antiviruses as therapeutic agents: A mathematical analysis of their potential. J. Theor. Biol. 184, 111–116 (1997).
  • 87. Shirogane Y., et al., Experimental and mathematical insights on the interactions between poliovirus and a defective interfering genome. PLoS Pathog. 17, e1009277 (2021).
  • 88. Lai M. M. C., “Genetic recombination in RNA viruses” in Genetic Diversity of RNA Viruses, Current Topics in Microbiology and Immunology, Holland J. J., Ed. (Springer, Berlin Heidelberg, 1992), pp. 21–32.
  • 89. Langmead B., Salzberg S. L., Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  • 90. Langmead B., Wilks C., Antonescu V., Charles R., Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics 35, 421–432 (2019).
  • 91. Biesiada M., Purzycka K. J., Szachniuk M., Blazewicz J., Adamiak R. W., Automated RNA 3D structure prediction with RNAComposer. Methods Mol. Biol. 1490, 199–215 (2016).


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
