Abstract
Defective viral genomes (DVGs) have been identified in many RNA viruses as a major factor influencing antiviral immune responses and viral pathogenesis. However, the generation and function of DVGs in SARS-CoV-2 infection remain poorly understood. In this study, we characterized DVG generation in SARS-CoV-2 and its relationship with the host antiviral immune response. We observed DVGs ubiquitously in RNA-seq datasets from in vitro infections and autopsy lung tissues of COVID-19 patients. Four genomic hotspots were identified for DVG recombination, and RNA secondary structures were suggested to mediate DVG formation. Functionally, bulk and single cell RNA-seq analyses indicated that SARS-CoV-2 DVGs stimulate IFN responses. We further applied our criteria to the NGS dataset from a published cohort study and observed significantly higher DVG amount and frequency in symptomatic patients than in asymptomatic patients. Finally, we observed unusually high DVG frequency in one immunosuppressed patient up to 140 days after hospital admission for COVID-19, suggesting for the first time an association between DVGs and persistent viral infection in SARS-CoV-2. Together, our findings strongly suggest a critical role of DVGs in modulating host IFN responses and symptom development, calling for further inquiry into the mechanisms of DVG generation and how DVGs modulate host responses and infection outcome during SARS-CoV-2 infection.
Keywords: defective viral genomes, SARS-CoV-2, recombination, secondary structure, type I/III IFN responses, human epithelial cells
Introduction
Respiratory tract infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) results in varying immunopathology underlying coronavirus disease 2019 (COVID-19). Its presentation ranges from asymptomatic infection to mild/moderate disease and critical illness, including respiratory failure and death. Immune responses in COVID-19 patients of various disease severities have been studied (Lega, Naviglio et al. 2020, Chiale, Greene et al. 2022, Dadras, Afsahi et al. 2022). In general, broad induction of IFN responses and antiviral genes is associated with mild/moderate COVID-19, whereas severe COVID-19 is often characterized by blunted early IFN responses and elevated proinflammatory cytokine expression in the nasopharyngeal mucosa (Kwon, Kim et al. 2020, Liu, Li et al. 2020, Gozman, Perry et al. 2021, Janssen, Grondman et al. 2021, Vanderbeke, Van Mol et al. 2021). How IFN responses are induced by SARS-CoV-2 infection, especially the early IFN stimulation seen in some patients, requires further study.
During SARS-CoV-2 infection, in addition to full-length viral genomes and single nucleotide mutations, three major types of viral RNAs critical for viral pathogenesis are generated by non-homologous recombination: subgenomic mRNAs (sgmRNAs), structural variants (SVs), and defective viral genomes (DVGs). The viral replication-transcription complex performs recombination at specific transcription regulatory sequences (TRSs) to generate a set of sgmRNAs, which are subsequently translated into viral structural proteins (van Hemert, van den Worm et al. 2008, Dufour, Mateos-Gomez et al. 2011, Sola, Almazán et al. 2015, Brant, Tian et al. 2021). SVs comprise small insertions/deletions that allow the variant genome to replicate and transmit independently. Numerous SVs have been described, including small deletions in the viral spike protein that alter the fitness and virulence of SARS-CoV-2 isolates (Davidson, Williamson et al. 2020, Li, Wu et al. 2020, Majumdar and Niyogi 2021, Wang, Lau et al. 2021). Unlike sgmRNAs and SVs, SARS-CoV-2 DVGs contain large internal deletions and have recombination positions distinct from TRSs while retaining the 5’ and 3’ genomic untranslated regions (UTRs) (Gribble, Stevens et al. 2021).
This type of DVG, also known as defective viral or interfering RNA (D-RNA), is widely generated during replication of most positive sense RNA viruses (Huang 1973, Marcus and Sekellick 1977) and influenza (Nayak, Chambers et al. 1985), and its replication relies on viral machinery provided by co-infecting homologous full-length viruses (Huang and Baltimore 1970, Brian and Spaan 1997, Wu and Brian 2010). When accumulated to a high level, DVGs can interfere with full-length viral genome production by competing with full-length viruses for essential viral elements (Roux, Simon et al. 1991, Vignuzzi and López 2019). This interference activity has been reported for influenza viruses (De and Nayak 1980) and multiple non-SARS-CoV-2 coronaviruses (CoVs), such as SARS-CoV (Raman and Brian 2005), mouse hepatitis virus (MHV) (Makino, Fujioka et al. 1985), bovine CoV (Hofmann, Sethna et al. 1990), avian infectious bronchitis virus (IBV) (Pénzes, Wroe et al. 1996), transmissible gastroenteritis virus (Méndez, Smerdou et al. 1996), and Middle East respiratory syndrome CoV (MERS-CoV) (Gribble, Stevens et al. 2021). In addition to interference activity, DVGs from influenza A virus strongly stimulate IFN responses (Kupke, Riedel et al. 2019) and are reported to promote viral persistence in vitro (De and Nayak 1980, Moscona 1991, Frensing, Heldt et al. 2013). More importantly, DVGs are commonly observed in nasal samples from patients positive for influenza, and their abundance is negatively correlated with disease severity, indicating critical roles of DVGs in host responses and clinical outcome (Vasilijevic, Zamarreño et al. 2017).
The current approach to identifying DVGs in SARS-CoV-2 infection is short-read and long-read next generation sequencing (NGS). Several algorithms, such as DI-tector (Beauclair, Mura et al. 2018), VODKA (Viral Opensource DVG Key Algorithm) (Sun, Kim et al. 2019), and ViReMa (Viral-Recombination-Mapper) (Routh and Johnson 2014), as well as the metasearch tool DVGfinder (Olmo-Uceda, Muñoz-Sánchez et al. 2022), have been developed to specifically detect reads containing the recombination sites of DVGs. Using these approaches, DVGs have been documented in SARS-CoV-2 infected Vero E6 cells (Chaturvedi, Vasen et al. 2021, Rand, Kupke et al. 2021) and in nasal samples of COVID-19 patients (Xiao, Lidsky et al. 2021). Long-read NGS, such as full length iso-seq and nanopore direct RNA-seq, further confirmed that a substantial number of the TRS-independent deletions identified by short-read NGS are from SARS-CoV-2 genomes and maintain both genomic ends (Gribble, Stevens et al. 2021, Wong, Ngan et al. 2021). Additionally, identical deletions are found in various transcripts encoding distinct sgmRNAs (Wong, Ngan et al. 2021), strongly suggesting that even deletions in sgmRNAs are likely to originate from viral genomes, since a viral genome carrying a deletion can serve as the template to generate a set of sgmRNAs with the same deletion during transcription.
Although DVGs play such an important role in viral pathogenesis, their function in SARS-CoV-2 biology is poorly understood. Recent reports show that synthetic SARS-CoV-2 DVGs (named therapeutic interfering particles, TIPs) substantially reduce viral load across different viral variants when delivered to hamsters (Chaturvedi, Vasen et al. 2021) and mice (Xiao, Lidsky et al. 2021) before or shortly after infection, demonstrating the potential of SARS-CoV-2 DVGs as a new class of antiviral intervention that interferes with genomic replication. To date, no reports have addressed the role of DVGs in IFN responses and viral persistence in SARS-CoV-2 infection. Interestingly, a COVID-19 cohort study (Wong, Ngan et al. 2021) indicates that the abundance of TRS-independent deletions (>20 nts) is significantly higher in symptomatic patients than in asymptomatic patients, suggesting a potential role of DVGs in modulating host responses and symptom development in COVID-19 patients.
As our interest lies in the generation of DVGs in relation to viral pathogenesis, rather than in sgmRNAs or the smaller deletions in SVs, we used a pipeline based on ViReMa combined with sequence filtering in RStudio to specifically identify TRS-independent DVGs with deletions larger than 100 nts. We identified DVGs with varying degrees of junction frequency, termed Jfreq, from multiple NGS datasets that are either publicly available or from our own infections. Interestingly, we found that DVG junctions consistently clustered in several genomic hotspots across different NGS datasets and that secondary structures within the viral genome are likely to guide the recombination. Functionally, we found that, at similar infection levels, samples with more DVG reads had stronger type I/III IFN responses than samples with fewer or no DVGs, indicating that SARS-CoV-2 DVGs may stimulate IFN responses. In support, analysis of single cell RNA-seq from infected primary human lung epithelial cells showed earlier expression of primary IFNs (IFNB and IFNL1) in DVG+ cells than in DVG− cells. Finally, we applied our DVG analysis to several published NGS datasets from nasal samples of COVID-19 patients. We found persistent DVG reads with unusually high frequency in one immunosuppressed patient and higher DVG abundance in symptomatic patients than in asymptomatic patients. Taken together, our analyses demonstrate critical roles of DVGs in modulating host IFN responses, viral persistence, and clinical outcome in SARS-CoV-2 infection.
Results
DVGs are ubiquitously produced during SARS-CoV-2 infection both in vitro and in patients.
To examine whether DVGs can be detected universally during SARS-CoV-2 infections, we used the ViReMa pipeline (Viral-Recombination-Mapper) combined with R filtering (Fig. S6) to specifically map the DVG recombination sites (Fig. 1A) in multiple next generation sequencing (NGS) datasets. As reported previously, ViReMa agnostically detects RNA recombination events and reports the junction positions in BED files. Reported junction positions include those of sgmRNAs, whose junctions contain the leader transcriptional-regulatory signal (TRS-L, within the first 85 nts of the leader), and those of other recombinant RNAs whose jumping positions are far away from TRS-L. We defined our targeted DVGs as TRS-L independent RNA species bearing deletions larger than 100 nts (Fig. 1A). Using these criteria, we first examined DVGs in 4 publicly available NGS datasets of in vitro infections with various cell types, MOIs, viral stocks, and sample origins (Table S1). We found that DVGs could be detected in all examined datasets, ranging from several counts to several thousand counts (Fig. 1B). As the infection level varied significantly among datasets, we normalized DVG levels by junction frequency (Jfreq), the ratio of DVG counts to virus counts. DVG counts were the total number of DVG reads obtained from ViReMa that met the above criteria, whereas virus counts were the total number of reads fully aligned to the reference viral genome. We observed two ranges of Jfreq, <0.1% and 0.1%–1%. A549-ACE2 infected cells had the highest Jfreq, whereas infections in NHBE cells varied. In addition, total RNA and polyA-enriched RNA were used for NGS of the Calu3-total RNA and Calu3-polyA samples, respectively. Both samples had very similar Jfreq, suggesting that Jfreq is robust to different library preparation methods.
Interestingly, we detected DVGs, although with low Jfreq, in the supernatants collected from infected Vero E6 cells, suggesting that certain DVG species generated within infected cells, potentially those containing packaging signals, could be packaged into virions and released into supernatants.
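The filtering criteria and the Jfreq normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' ViReMa/R pipeline: the `Junction` record, its field names, and the example values are hypothetical, a junction is treated as TRS-L dependent when its 5'-most coordinate falls within the first 85 nts, and the deletion length is approximated by the coordinate difference.

```python
from dataclasses import dataclass

TRS_L_END = 85      # TRS-L lies within the first 85 nts of the leader (from the text)
MIN_DELETION = 100  # DVG deletions must be larger than 100 nts (from the text)


@dataclass(frozen=True)
class Junction:
    """Hypothetical record for one recombination junction and its read count."""
    start: int
    stop: int
    count: int


def is_dvg(junction: Junction) -> bool:
    """Apply the two criteria: TRS-L independent and deletion > 100 nts."""
    five_prime = min(junction.start, junction.stop)
    deletion = abs(junction.stop - junction.start)  # approximation of deletion length
    return five_prime > TRS_L_END and deletion > MIN_DELETION


def jfreq_percent(junctions, virus_counts: int) -> float:
    """Jfreq (%) = DVG read counts / fully aligned viral read counts * 100."""
    if virus_counts <= 0:
        return 0.0
    dvg_counts = sum(j.count for j in junctions if is_dvg(j))
    return 100.0 * dvg_counts / virus_counts


# Made-up example: one sgmRNA-like junction, one DVG, one small deletion.
example = [
    Junction(start=65, stop=28255, count=500),   # TRS-L dependent, excluded
    Junction(start=700, stop=29000, count=30),   # counted as a DVG
    Junction(start=21000, stop=21050, count=12), # deletion too small, excluded
]
example_jfreq = jfreq_percent(example, virus_counts=20000)
```

In this made-up example only the second junction passes both filters, so Jfreq is 30 DVG reads over 20000 viral reads, or 0.15%.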
Figure 1. DVGs were ubiquitously generated in SARS-CoV-2 in vitro infections and autopsy tissues of COVID-19 patients.
(A) Schematic representation of DVG generation from the positive sense viral genome and the general principle of ViReMa identification of deletion DVGs. The V’ site represents the break point and the E’ site represents the rejoin point of the viral polymerase in the formation of DVGs. The gray dashed box marks the recombination site that distinguishes DVGs from full length viral genomes; such sites are identified by ViReMa and further filtered using the two criteria shown in the graph. (B) The DVG read counts, viral read counts, and Jfreq percentages were graphed for each of the in vitro samples, including the infected cells and supernatants. (C) The DVG read counts, viral read counts, and Jfreq percentages were graphed for autopsy lung tissues of 9 DVG+ COVID-19 patients. Each case represented one patient, and different dots represented RNA-seq from different locations of the same lung tissue. (D) The correlations between DVG read counts and viral read counts were plotted for both the in vitro and autopsy samples. ****p<0.0001 by Pearson’s correlation. (E) The percentage of −sense DVGs among total DVGs in in vitro and autopsy samples was shown.
We then examined DVGs in autopsy tissues from patients who died from COVID-19 complications (GSE150316). We analyzed lung, heart, jejunum, liver, and kidney from 19 cases, and DVGs were observed only in lung tissues, in 9 cases (Fig. 1C). Their DVG counts were close to the level observed in infected NHBE cells but much lower than in infected cell lines such as A549-ACE2, Vero E6, Calu3, and Caco2. Jfreq values from autopsy tissues were mostly less than 0.1%, comparable with the lower range of Jfreq observed in in vitro infections. Next, we sought to examine the relationship between DVG production and viral replication. Interestingly, we observed a strong positive correlation between DVG counts and virus counts for autopsy tissues, but not for in vitro infections (Fig. 1D). In addition, Jfreq was not significantly correlated with the level of virus replication. Notably, both negative sense (−sense) and positive sense (+sense) DVGs were detected in all NGS datasets. −sense DVGs dominated in most in vitro infection datasets in which total RNA was used to prepare the library (Fig. 1E). Together with the previous reports in nasal specimens of COVID-19 patients (Xiao, Lidsky et al. 2021), our analysis led us to conclude that DVGs are ubiquitously generated during SARS-CoV-2 infection in vitro and in patients.
Recombination sites of SARS-CoV-2 DVGs were clustered in certain genomic hotspots.
To characterize the positions of DVG recombination sites, we graphed the actual junction positions of all identified DVGs from in vitro infections of different cells and from DVG+ autopsy tissues. As both +sense and −sense DVGs were identified, we examined their distributions separately and first analyzed the junction positions of −sense DVGs. Interestingly, we found that their junctions were clustered in three conserved genomic hotspots, indicated as junction areas A, B, and C (green boxes in Fig. 2A and 2B). Among them, area B was observed in all infections, and area A was largely observed in infected cells but absent in the supernatants from infected Vero E6 cells. As DVGs formed in junction area A contained the largest deletions compared to B and C, it is possible that DVGs within area A lack the packaging signal and thus were less efficiently released into supernatants. To further identify the genomic hotspots for DVG break and rejoin points, we graphed their locations separately based on the junction frequency per DVG position. We identified one major hotspot for break points, corresponding to genomic positions 28200–29750 (highlighted by the grey dashed box in Fig. 2C, details in Fig. 2D). Additionally, three major rejoin hotspots were identified: 700–2500 (red box), 6500–8200 (yellow box), and 27000–29400 (green box). When comparing the distributions of −sense and +sense DVGs, we observed that the rejoin points, V, of +sense DVGs shared the same hotspots as the break points, V’, of −sense DVGs (Fig. S1A–D vs Fig. 2A). This suggests that the junction positions of −sense and +sense DVGs are correlated, likely as a result of their self-replication (Fig. S1E). Finally, we sought to examine whether common DVGs could be detected across different infections or different autopsy tissues. We only identified common DVGs among different in vitro infections within the same RNA-seq dataset (likely because the same viral stock was used for the infections, Table S2).
We did not find any common DVGs among different autopsy tissues. Taken together, our analysis of multiple NGS datasets indicated that SARS-CoV-2 DVGs are not generated randomly but are formed at specific genomic regions.
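The per-position hotspot analysis above amounts to binning break or rejoin coordinates along the genome and normalizing the binned read counts by viral read counts. The following is a minimal sketch of that idea, assuming a 300-nt bin width as in Fig. 2C; the function name, the input layout (position, read count), and the example values are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def jfreq_per_bin(positions_with_counts, virus_counts, bin_width=300):
    """Sum DVG read counts per genomic bin and express them as Jfreq (%).

    positions_with_counts: iterable of (position, read_count) pairs for
    either break points or rejoin points (hypothetical input layout).
    Bins are labeled by their starting coordinate.
    """
    if virus_counts <= 0:
        raise ValueError("virus_counts must be positive")
    bins = Counter()
    for position, count in positions_with_counts:
        bins[(position // bin_width) * bin_width] += count
    return {start: 100.0 * total / virus_counts
            for start, total in sorted(bins.items())}


# Made-up break points: most reads fall in the bin starting at 28200.
break_points = [(28250, 10), (28400, 5), (29700, 5), (1200, 2)]
binned = jfreq_per_bin(break_points, virus_counts=10000)
```

Bins with a Jfreq far above the rest would be candidate hotspots; narrowing `bin_width` to 10 gives the finer resolution used in Fig. 2D.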
Figure 2. Four genomic hotspots were identified for DVG formation.
Break point (V’) and rejoin point (E’) distributions for -sense DVGs from in vitro samples (A) and autopsy samples (B). Circle size and color intensity indicated the DVG counts. The green dashed boxes represented hotspots clustered with DVG junctions. (C) Break point (V’) and rejoin point (E’) distributions by Jfreq per position for all in vitro samples. The dashed boxes indicated hotspots with high concentrations of break or rejoin points. The width of each bar represented 300 nts. (D) Detailed positions of 4 identified hotspots clustered with DVG break and rejoin points. The color of the dashed outline around each graph indicated the corresponding hotspot with the same color in (C). The width of each bar represented 10 nts.
The RNA structure distance between SARS-CoV-2 DVG junction positions is shorter than any two random SARS-CoV-2 genomic positions.
Ziv et al. developed COMRADES (Ziv, Gabryelska et al. 2018), which can probe RNA base pairing inside cells, and applied it to detect short- and long-range interactions along the full-length SARS-CoV-2 genome (Ziv, Price et al. 2020). Interestingly, the positions of SARS-CoV-2 DVG junctions correlated well with the pairings found by COMRADES (red arches in Fig. 3A), which suggests a role of RNA secondary structures in the formation of DVGs. The paired bases bring distant nucleotides in the primary sequence close and make it possible for the breaking and rejoining actions to occur around those close pairs. To further study the relationship between DVG junctions and the identified secondary structure within the SARS-CoV-2 genome, we calculated the structural distance between DVG junction positions, which is the shortest distance between two nucleotides by traversing the backbone and base pairs (red solid path in Fig. 3B) (Clote, Ponty et al. 2012). We further extended this definition to allow competing base pairs from alternative secondary structures since many RNAs are known to populate multiple conformations in equilibrium and Ziv et al.’s data included alternative conformations of SARS-CoV-2.
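The structural distance defined above is a shortest-path length on a graph whose edges are backbone links between adjacent nucleotides plus base pairs. A minimal sketch using breadth-first search is given below. It assumes that every backbone link and every base pair contributes one step, and it allows a nucleotide to have several partners so that competing pairs from alternative conformations can be included. The function name and the single base pair in the example are hypothetical; the pair was chosen only so that the numbers match the kind of contrast described for Fig. 3B.

```python
from collections import deque


def structural_distance(length, base_pairs, i, j):
    """Shortest path between nucleotides i and j (1-based) over backbone
    links and base pairs, each counted as one step (an assumption)."""
    if not (1 <= i <= length and 1 <= j <= length):
        raise ValueError("positions must lie within the sequence")
    # A nucleotide may have several partners (alternative structures).
    partners = {}
    for a, b in base_pairs:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    distance = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return distance[u]
        neighbors = [v for v in (u - 1, u + 1) if 1 <= v <= length]
        neighbors.extend(partners.get(u, ()))
        for v in neighbors:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    raise RuntimeError("unreachable: the backbone connects all positions")


# Hypothetical 60-nt RNA with one base pair between positions 12 and 48.
with_pair = structural_distance(60, [(12, 48)], 10, 50)
without_pair = structural_distance(60, [], 10, 50)
```

With the pair present, the path 10, 11, 12, 48, 49, 50 has length 5, whereas without any secondary structure the distance falls back to the sequence distance of 40.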
Figure 3. The correlation between DVGs and secondary structures.
(A) Comparison between DVG junction positions (top, in vitro, -sense DVGs) and chimeric reads from COMRADES (bottom) along full-length SARS-CoV-2 genome (Ziv, Price et al. 2020). The red arches represented DVG positions that match COMRADES crosslinks and the blue arches represented positions that do not match crosslinks. (B) Example to compare sequence distance and structural distance. The structural distance between nucleotides 10 to 50 is only 5 (red solid path that includes a connection across a base pair), while the sequence distance is 40 (orange dashed path). (C–D) The distribution of all structural distances between any two positions in SARS-CoV-2 (C), and between SARS-CoV-2 DVG junction positions (D). The percent of distances less than 50, 100 and 200 were indicated, respectively. (E–F) As a negative control, the distribution of all sequence distances between any two positions in SARS-CoV-2 (E), and between SARS-CoV-2 DVG junction positions (F). The mean and median distances of all distributions were annotated in C–F. In (D) and (F), the blue, yellow, and red bars corresponded to three hotspots annotated in Fig. 2C, respectively, while the grey bars were out of the range of these detected hotspots. The inset in (D) distinguished structural distance’s distributions of three hotspots and the rest up to a structural distance of 100. The dashed contour in the inset represented the sum of all distributions for the same structural distance, and it was with the same shape as the major figure in (D). In both (C) and (E), the total occurrence of all distances equals the number of any two positions along SARS-CoV-2, and in (D) and (F), the total occurrence of all distances is the same as the number of DVG data points (with counts 2 or above).
We first analyzed the distribution of all structural distances between any two nucleotides in SARS-CoV-2, where 41% of the distances were under 100 (Fig. 3C), with a long tail up to 1200. The median distance of the distribution was 112. However, for the structural distances between SARS-CoV-2 DVG junction positions only (counts ≥ 2), the peak of the distribution shifted to the left, with a smaller median of 33, and the vast majority (94%) of distances were less than 100 (Fig. 3D). Therefore, the structural distances between DVG junction positions were substantially shorter than the distances between any two random positions, which indicated a strong correlation between secondary structures and DVG formation. Moreover, we observed that the larger the cutoff value for DVG counts, the greater the proportion of distances under 100 and the smaller the mean distance (Fig. S2). As a negative control, we also evaluated the sequence distance, which is the distance between nucleotides based only on their positions along the primary sequence; it is a special case of structural distance without any secondary structure. We analyzed the sequence distance between any two nucleotides in SARS-CoV-2 and between SARS-CoV-2 DVG junction positions (Fig. 3E and 3F, respectively). The sequence distances between any two nucleotides in SARS-CoV-2 followed a triangular distribution. Most of the sequence distances between DVG junctions clustered in line with the hotspots observed previously (Fig. 2C vs Fig. 3F), which is completely different from the distribution of structural distances between DVG junctions, which has its peak on the left (Fig. 3D).
SARS-CoV-2 DVGs specifically enhanced type I/III IFN responses.
To understand the dynamics of SARS-CoV-2 DVGs during infection and how they affect host responses and viral replication, we infected primary human lung epithelial (PHLE) cells from donors of different age groups with the SARS-CoV-2 Hong Kong strain (SARS-CoV-2/human/HKG/VM20001061/2020) at an MOI of 5. Mock and infected cells were harvested at different hours post infection (hpi), followed by bulk RNA-seq and ViReMa analysis. We observed DVGs as early as 48 hpi in cells from infants and younger adults, whereas in the elderly sample we did not detect DVGs until 72 hpi (Fig. 4A), suggesting that DVG generation may be delayed in the elderly, who are more likely to display severe symptoms when infected. We observed the same genomic hotspots for DVG junctions regardless of age group and time point (Fig. S3A–S3D). Strikingly, those hotspots were consistent with the ones identified in different cell lines (Fig. 2), in autopsy lung tissues (Fig. 2), and in the single cell RNA-seq analysis described below (Fig. S3E). Again, we observed that V (rejoin point of +sense DVGs) and V’ (break point of −sense DVGs) shared the same hotspots and that E (break point of +sense DVGs) and E’ (rejoin point of −sense DVGs) shared the same hotspots (Fig. S3A vs S3B), indicating that our identified recombination sites were likely from DVGs capable of replication.
Figure 4. DVGs influence type I/III interferon responses in infected PHLE cells.
PHLE cells from donors of different age groups were infected with SARS-CoV-2 at an MOI of 5. Samples were harvested at designated time points post infection. (A) Viral read counts, DVG read counts, and Jfreq were graphed for all samples, grouped by donors’ age group and time point. NA indicated that the samples were not available for RNA-seq and thus no data were collected. (B) Differential expression levels of genes related to type I interferon responses were graphed as a heatmap for all infected samples. Samples were grouped by viral infection level. DVG levels of each sample were indicated by different color codes on top of the heatmap. Four infected samples at 72 hpi with similar levels of viral counts were selected to compare their IFN responses (C) and the expression of other genes unrelated to type I/III IFN responses (D). (E) The viral and DVG read counts for the 4 selected infected samples (D198, D203, D239, and D283) were graphed.
To examine the role of DVGs in host responses, we grouped our infected samples based on their DVG counts and viral counts. Three samples (D231_I_48hr, D231_I_72hr, and D239_I_48hr) were significantly higher in both viral counts and DVG counts and were thus categorized as the High group (marked dark blue in Fig. 4B and S4A). When this group was compared with the remaining infected samples, one cluster of genes (pink cluster) was identified as upregulated in the High group. Gene Ontology (GO) enrichment analysis showed that this cluster was highly enriched in genes involved in type I IFN antiviral responses (Fig. S4B). A heatmap focusing on type I/III IFN related genes confirmed that samples in the High group had enhanced gene expression compared to the rest of the samples (Fig. 4B). To test whether the IFN stimulation is specific to DVGs, we selected 4 samples at 72 hpi with similar levels of viral replication but different levels of DVGs (Fig. 4E) and compared their type I/III IFN responses. We observed that the sample with more DVGs exhibited stronger antiviral responses than the samples with fewer DVGs (Fig. 4C), but this enhancement was not observed for genes in other pathways, such as type II responses and inflammation (Fig. 4D). Although we could not perform proper statistical analysis because of the limited sample size, these data suggest, for the first time, that SARS-CoV-2 DVGs enhance IFN production, as observed previously in other RNA viruses (Kupke, Riedel et al. 2019).
Primary IFNs were expressed earlier in DVG+ cells with moderate infection.
To understand DVG generation and the associated host responses at the single cell level, we obtained a single cell RNA-seq dataset of adult NHBE cells infected at an MOI of 0.01 (GSE166766). Consistent with the previous observations, viral counts, DVG counts, and Jfreq at 2 dpi were all significantly increased compared with those at 1 dpi, but not significantly different from those at 3 dpi (Fig. 5A). The major cell types enriched with DVGs were ciliated cells, basal cells, and SLC16A7+ cells (red in Fig. 5B; grouping of cell types was based on the markers used in the original publication). Among these three cell types, ciliated cells had the most DVG+ cells, whereas SLC16A7+ cells had the highest percentage of DVG+ cells (Fig. 5C). All DVG+ cells contained at least 1 viral count (virus positive cells), and total viral counts were significantly higher in DVG+ cells than in DVG− cells at all three time points (Fig. 5D). Only about 1% of virus positive cells at 1 dpi (n=60) were DVG+. Therefore, we focused on the DVG+ populations at 2 dpi (n=348) and 3 dpi (n=725) to analyze their host responses. Differential expression tests were then performed using three different methods in Seurat (MAST, Wilcox, and DESeq2) between the DVG+ and DVG− groups within virus positive cells. At both time points, significantly more genes were downregulated than upregulated in DVG+ cells (adj_pvalue < 0.01 and logFC > 0.25), and similar enriched pathways were observed in the GO analysis. Specifically, ribosomal cytoplasmic translation (host protein synthesis) was largely inhibited in DVG+ cells, possibly because of their higher level of viral replication (more expression of NSP1) compared with DVG− cells (2 dpi: upper panel in Fig. 6A; 3 dpi: Fig. S5A). Despite this, pathways such as transcription from RNA polymerase II promoter, TNF and NF-kB, and apoptosis were significantly enriched in the upregulated genes.
Importantly, defense response to virus and chemokine pathways were also found in the upregulated list, consistent with the results from bulk RNA-seq (2 dpi: bottom panel in Fig. 6A; 3 dpi: Fig. S5B). Next, we specifically examined the expression levels of representative genes related to type I/III IFN pathways in DVG− and DVG+ virus positive cells, including two primary IFNs (IFNB1 and IFNL1), ISGs, and chemokines selected from the differentially expressed gene list. To better control for viral load, we further categorized virus positive cells (cells with virus count ≥1) into three groups based on their viral counts: low (viral counts ≤10), moderate (10< viral counts <20000 for 1 dpi and 2 dpi; 10< viral counts <10000 for 3 dpi), and high (viral counts ≥20000 at 1 dpi and 2 dpi; viral counts ≥10000 at 3 dpi). DVGs were identified mainly in the moderate (~12%) and high (>84%) groups, and only an extremely small percentage (<0.2%) of cells with low infection generated DVGs. The two primary IFNs were predominantly expressed only in the moderate viral group, regardless of DVG presence. However, DVG+ cells expressed the two primary IFNs 1 day earlier than DVG− cells (2 dpi vs 3 dpi, moderate group in Fig. 6B), suggesting a role of DVGs in stimulating primary IFNs early. In support, ISGs showed a similar trend. As IFN related genes are zero-inflated, we compared both the expression level in cells expressing the genes of interest (gene counts >0, termed non-zero cells) and the percentage of such cells within the DVG+ and DVG− groups. Briefly, the average expression of ISGs in non-zero cells was significantly enhanced in DVG+ cells within the moderate group at 2 dpi, but this enhancement was partially lost at 3 dpi, even though a higher percentage of DVG+ cells than DVG− cells expressed IFNs and ISGs at 3 dpi (Fig. 6C and 6D). In contrast to the moderate group, the high viral group had minimal expression of all IFN related genes, further confirming that IFN pathways were suppressed in highly infected cells (Fig. 6A, 6B).
The low viral group predominantly expressed ISGs rather than the two primary IFNs at all time points (Fig. 6E), suggesting that these cells are secondary responders to the initial type I/III IFN production. Taken together, our analysis strongly suggests that DVG+ cells with moderate infection were the first responders to viral infection, quickly expressing primary IFNs and subsequently alerting neighboring cells to express ISGs.
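The viral-load grouping and the two-part comparison for zero-inflated genes described above can be sketched as follows. The thresholds are those stated in the text; everything else (function names, the label for cells without viral counts, and the toy expression values) is hypothetical, and the statistical tests applied by the authors are not reproduced here.

```python
def viral_group(viral_counts: int, dpi: int) -> str:
    """Assign a cell to a viral-load group using the cutoffs in the text."""
    if dpi not in (1, 2, 3):
        raise ValueError("cutoffs are only defined for 1, 2, and 3 dpi")
    high_cutoff = 10000 if dpi == 3 else 20000
    if viral_counts < 1:
        return "negative"  # hypothetical label for virus negative cells
    if viral_counts <= 10:
        return "low"
    if viral_counts < high_cutoff:
        return "moderate"
    return "high"


def nonzero_summary(expression):
    """Return (% of non-zero cells, mean expression among non-zero cells).

    Splitting the comparison this way avoids averaging over the many
    zero counts typical of IFN related genes in single cell data.
    """
    values = list(expression)
    nonzero = [x for x in values if x > 0]
    percent = 100.0 * len(nonzero) / len(values) if values else 0.0
    mean = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return percent, mean


# Toy expression values for one gene in DVG+ and DVG- cells (made up).
dvg_pos = nonzero_summary([0, 0, 2.0, 4.0])
dvg_neg = nonzero_summary([0, 0, 0, 1.0])
```

Comparing the two returned values separately between DVG+ and DVG− cells within one viral-load group mirrors the layout of Fig. 6C and 6D, where expression in non-zero cells and the percentage of non-zero cells are reported side by side.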
Figure 5. DVG generation in infected NHBE cells from single cell level.
(A) Violin plots of log transformed viral UMI counts, DVG UMI counts, and Jfreq for 1 dpi, 2 dpi, 3 dpi, and mock samples. (B) Bar plots of cell counts of uninfected cells, DVG− infected cells, and DVG+ cells within different cell types for mock, 1 dpi, 2 dpi, and 3 dpi samples. Infected cells were cells with viral UMI over 1 and DVG+ cells were the ones with DVG UMI over 1. All DVG+ cells had at least 1 viral UMI. (C) Bar plots of DVG+ cell counts and DVG+ percentages per cell type for mock, 1 dpi, 2 dpi, and 3 dpi samples. (D) Violin plots of log transformed viral counts for DVG+ and DVG− virus positive cells. *** p < 0.01, ** p < 0.05 by two-sided Wilcoxon Rank Sum test.
Figure 6. DVG+ cells expressed primary IFNs earlier than DVG− cells.
(A) Gene ontology analysis of genes that were downregulated (Top) and upregulated (Bottom) in DVG+ cells relative to DVG− cells at 2 dpi. Circle size represented the number of genes in each pathway. Gene ratio represented the ratio of the number of genes in that pathway to the number of genes in the entire cluster. (B) Gene expression of IFNB1 and IFNL1 (Y-axis) was plotted against viral UMI level (X-axis) within each virus counts group. Virus groups with their count criteria were indicated on top of the graph. Each dot represented an individual cell and was colored based on the presence of DVGs. (C-D) In the moderate virus group, expression levels of IFNB, IFNL1, selected ISGs, and chemokines in non-zero (gene counts > 0) cells and the percentage of non-zero cells within the DVG+ and DVG− groups were compared and graphed as violin plots at 2 dpi (C) and 3 dpi (D). *** p < 0.01, ** p < 0.05 by two-sided Wilcoxon Rank Sum test. (E) Expression levels of IFNB, IFNL1, and selected ISGs in DVG− cells of the low virus group at 2 dpi and 3 dpi were graphed as violin plots. **** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05 by Fisher’s exact test.
Symptomatic COVID-19 patients had higher amount and Jfreq of SARS-CoV-2 DVGs than asymptomatic patients.
As SARS-CoV-2 DVGs can stimulate early expression of primary IFNs, we asked whether DVG generation is associated with COVID-19 disease severity. We identified a publicly available NGS dataset (PRJNA690577) that investigated subgenomic RNAs and their protein expression in symptomatic vs asymptomatic COVID-19 patients; its authors also reported more deletions longer than 20 nts in symptomatic patients than in asymptomatic patients (Wong, Ngan et al. 2021). To better examine the level of DVGs (larger deletions) in the two patient groups, we applied our criteria to this dataset and found, on average, markedly higher DVG counts (both −sense and +sense, Fig. 7A) and consequently higher Jfreq (Fig. 7C) in symptomatic individuals than in asymptomatic individuals. Additionally, our method confirmed the original finding that read counts for genomic RNA were significantly lower in symptomatic patients than in asymptomatic patients (Fig. 7B). These data imply a potential role of DVGs in COVID-19 symptom development.
Figure 7. Symptomatic COVID-19 patients had higher amount and Jfreq of SARS-CoV-2 DVGs than asymptomatic patients.
Samples collected by various methods, including nasopharyngeal (n = 42), anterior nasal (n = 35), and oropharyngeal (n = 5), were used from NGS dataset PRJNA690577. Symptomatic samples (n = 51) were collected from patients who presented at the hospital with symptoms consistent with COVID-19, while asymptomatic samples (n = 30) were collected from patients who did not have symptoms consistent with COVID-19 and were found through contact tracing and workforce screening. DVG read counts (A), viral read counts (B), and Jfreq percentages (C) were calculated and graphed for all symptomatic and asymptomatic samples. **** p < 0.0001, *** p < 0.001 by two-sided Mann-Whitney test.
High DVG Jfreq was observed in one COVID-19 persistent patient.
SARS-CoV-2 can develop persistent infections in immunosuppressed patients (Caccuri, Messali et al. 2022, Quaranta, Fusaro et al. 2022), and DVGs have been reported to facilitate viral persistence (Sun and López 2017). To examine whether DVGs are associated with persistent SARS-CoV-2 infection in patients, we identified one NGS dataset in which nasal samples were taken at 9 time points from one immunosuppressed patient who was infected with SARS-CoV-2 and remained positive for virus up to 140 days after the first hospital admission (PRJEB47786). We detected DVGs at all 9 time points, but the amount of DVGs was not always correlated with total virus counts (Fig. 8A and 8B). More interestingly, the Jfreq of DVGs in samples from this patient was at least 10 times higher than the values we observed in in vitro infections and autopsy tissues (Fig. 8C vs Fig. 1B, 4A, and 5A), with the highest Jfreq reaching nearly 20% at 56 days after initial admission to hospital. We noticed that the method used in this dataset was tiled-PCR using ARTIC V3 followed by Illumina sequencing, which differs from all the bulk and single cell RNA-seq datasets we examined previously. To test whether the high Jfreq was due to the different approach, and potentially to the use of nasal samples, we found another NGS dataset with nasal samples of normal COVID-19 patients using tiled-PCR (ARTIC V1 and V3) followed by Illumina sequencing (PRJNA707211). We found that the Jfreq of each patient sample was below 1%, within the range observed in the previous in vitro and autopsy NGS datasets (Fig. 8D vs Fig. 1B, 4A, and 5A). This strongly suggests that the high Jfreq of DVGs in this patient was not due to the amplification and sequencing methods, but rather may be associated with the suppressed status of the patient’s immune system and persistent viral infection.
Figure 8. High DVG Jfreq was observed in one SARS-CoV-2 persistent patient.
Nasal samples were collected from one immunosuppressed COVID-19 patient with persistent viral infection at 9 different time points. DVGs were identified from the NGS dataset (ERP132087/PRJEB47786) of the nasal samples from this patient. DVG read counts (A), viral read counts (B), and Jfreq percentages (C) were calculated and graphed for samples at each time point. (D) Samples in another NGS dataset (PRJNA707211) utilizing the same amplification and sequencing methods showed a much smaller Jfreq than the SARS-CoV-2 persistent patient, comparable to Jfreq levels found in SARS-CoV-2-infected in vitro and autopsy samples.
Discussion
It has been well documented that DVGs are universally generated across single-stranded RNA viruses both in vitro and in vivo, including Respiratory Syncytial Virus (RSV), measles, influenza, Ebola, Dengue, CoVs, and many more. For SARS-CoV-2, DVGs result from non-homologous recombination and were previously observed in infected Vero cells (Chaturvedi, Vasen et al. 2021) and nasal samples of COVID-19 patients (Xiao, Lidsky et al. 2021). In Vero cells, SARS-CoV-2 is reported to be more than 10 times more recombinogenic than other CoVs, such as MERS-CoV (Gribble, Stevens et al. 2021), and junctions of SARS-CoV-2 DVGs are most commonly flanked by U-rich RNA sequences, suggesting a novel mechanism by which viral polymerases generate DVGs. Interestingly, recombination is also proposed to be critical for coronavirus diversity and the emergence of SARS-CoV-2 and other zoonotic CoVs. To further understand the recombination positions of SARS-CoV-2 DVGs, we expanded DVG analyses to 4 more cell lines commonly used for SARS-CoV-2 studies, primary human lung epithelial cells (NHBE), and autopsy tissues from patients who died of complications of COVID-19, further confirming that DVGs are ubiquitously produced during SARS-CoV-2 infections. Importantly, we identified specific genomic hotspots for DVG recombination sites that are not only consistent between in vitro and patient samples, but also shared between +sense and −sense DVGs. These results imply two points: 1) DVG recombination is not random in SARS-CoV-2 and certain mechanisms are utilized to regulate DVG production; and 2) our identified +sense DVGs and −sense DVGs are correlated with each other, likely because each is replicated from the other. One limitation of our analyses using short-read NGS is that short reads are <400 bp long and thus junction reads are unlikely to cover the entire DVG sequence. Despite this, the replication capability of the identified DVGs strongly suggests that the 5’ UTR and 3’ UTR are retained in them, as the two UTRs are essential for genome replication. More analysis of long-read sequencing data is needed to confirm the full sequences of DVGs.
Based on the secondary structures identified by COMRADES crosslinking in the +sense viral genome (Ziv, Gabryelska et al. 2018), we calculated the structural distance between the two recombination sites of each −sense DVG and, surprisingly, found that DVG break and rejoin points are associated with a short structural distance (Fig. 3C, D), as mediated by RNA base pairing. The relatively short structural distance, as compared to the sequence length, indicates that DVGs form when the viral polymerase falls off the template during replication and then rejoins the viral template at a position that is close in space but can be quite distant in sequence. This strongly suggests that recombination by the viral polymerase complex can be guided by the secondary structures within viral genomes. As the structures formed within the −sense strand are expected to differ from those in the +sense strand (because folding stability is strand-direction dependent and G-U pairs map to A-C mismatches in the complementary strand), we postulate that DVG generation is initiated as −sense by the viral polymerase complex using +sense viral genomes as template, and that −sense DVGs are then used as templates to replicate +sense DVGs (Fig. S1E). More investigations of the secondary structures in both strands of viral genomes and their role in viral recombination are needed to further test this hypothesis.
The effect of DVG presence on host response and viral replication was additionally explored. Samples with moderate and high amounts of DVGs exhibited stronger antiviral responses than samples with low amounts of DVGs. In the scRNA-seq analysis, IFN pathways were suppressed in highly infected cells, and primary IFNs were stimulated earlier in moderately infected cells with DVGs than in those without DVGs. These data suggest that DVG generation early in infection can enhance the antiviral response more quickly, which is critical for mounting an adequate and timely immune response. The mechanisms by which DVGs enhance IFN responses are unknown. DVGs from RSV and influenza can function as primary triggers that directly stimulate type I IFN production through RIG-I like receptors (Sun and López 2017). It was previously reported that SARS-CoV-2 RNAs can be recognized by MDA5 (Thorne, Reuschl et al. 2021, Znaidia, Demeret et al. 2022), and we showed that the expression of MDA5 (IFIH1) was elevated in DVG+ cells at 2 dpi (Fig. 6C). Therefore, it is possible that SARS-CoV-2 DVGs stimulate type I/III IFNs through MDA5. Alternatively, if DVGs do not directly stimulate IFN production, their large deletions may reduce the expression of virus-encoded IFN antagonists, resulting in earlier and higher IFN expression in DVG+ cells. Indeed, IFN antagonists are encoded in NSP1, NSP3, NSP5, NSP12, NSP13, NSP14, NSP15, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, ORF9b, N, and M (Lei, Dong et al. 2020, Xia, Cao et al. 2020, Han, Zhuang et al. 2021, Wong, Cheung et al. 2022, Znaidia, Demeret et al. 2022), and most of them fall within the deleted regions defined by our conserved genomic hotspots for DVG recombination sites (Fig. 2A and 2B). In either case, the higher IFN expression in DVG+ samples/cells suggests a critical role of DVGs in modulating host responses and subsequent disease severity of COVID-19.
To further explore the role of DVGs in COVID-19 severity, we took advantage of one published NGS dataset that investigated sgmRNA levels in patients with differing clinical severity (Wong, Ngan et al. 2021). The authors observed a reduction of viral sgmRNAs and of viral deletions larger than 20 nts, but an increased viral genomic RNA level, in nasal samples from asymptomatic patients. As deletions with a cutoff of 20 nts may not represent viral genomes that are defective, we applied our criteria to this dataset and found that the abundance and Jfreq of DVGs containing deletions larger than 100 nts were similarly reduced in asymptomatic patients compared to symptomatic patients. A significant difference in DVG production between patients with and without symptoms leads us to posit that the quantity and Jfreq of DVGs contribute to the heterogeneity of both disease outcomes and presentation of symptoms in infected individuals, potentially through modulating host immune responses. As sgmRNAs and DVGs were both reduced in the asymptomatic group in this cohort study, we wondered whether sgmRNA production is always positively correlated with DVG generation. To examine this, we quantified TRS-dependent junction reads (recombination sites <85) from the ViReMa output in infected PHLE cells from different age groups as an estimate of sgmRNAs (dataset used in Fig. 4). Interestingly, we did not observe any positive correlation. Specifically, D198, which had the lowest DVG amount among all samples at 72 hpi, had more sgmRNA counts (n=385) than D239 (n=32), which again supports the notion that DVGs, rather than sgmRNAs, specifically stimulate IFN responses. Why do symptomatic patients generate more DVGs? It is possible that the IFN response induced by DVGs leads to subsequent expression of cytokines, such as IL6, which is known to be an important mediator of immune-induced fever, as shown in blood monocytes for SARS-CoV-2 infection (Junqueira, Crespo et al. 2021). However, a rapid and controlled immune response will lead to milder symptoms, whereas a prolonged and uncontrolled immune response will lead to severe symptoms and even death (Janssen, Grondman et al. 2021). Future studies with higher symptom scoring resolution, such as mild/moderate, severe, and death, could elucidate the potential associations of DVG abundance and/or frequency with viral load, IFN responses, and COVID-19 disease severity.
Analysis of DVG presence in longitudinal clinical samples describes the kinetics of the DVG population across the entire course of infection. In one NGS dataset, we were surprised to find one immunosuppressed patient generating DVGs consistently at every collected time point over a period of 140 days, with the Jfreq of these samples at least 10-fold higher than in all previously analyzed datasets (>1%). Comparison with a dataset generated by a similar method indicated that the increased Jfreq was not due to the amplification and sequencing methods, but rather to a biological difference arising from either a compromised immune status or a prolonged viral infection. These data additionally imply that prolonged DVG presence/production may be associated with a prolonged viral infection and a longer length of illness. Indeed, DVGs have been shown to promote viral persistence for various viruses, such as influenza A (De and Nayak 1980), dengue (Juárez-Martínez, Vega-Almeida et al. 2013), Japanese encephalitis virus (Park, Choi et al. 2013), mumps (Andzhaparidze, Bogomolova et al. 1983), rabies (Kawai, Matsumoto et al. 1975), Sendai (Roux and Waldvogel 1981), and measles (Baczko, Liebert et al. 1986); additionally, worse disease outcome was found to be associated with prolonged DVG detection in RSV (Felt, Sun et al. 2021). More longitudinal studies are needed to elucidate the relationship between DVGs and prolonged viral infection, especially in immunosuppressed COVID-19 patients.
Determining how DVGs are generated (recombination) and how they function during SARS-CoV-2 infection could help reduce viral recombination events, which contribute greatly to newly emerging CoVs, and reveal another avenue for mitigating disease severity in infected individuals. We showed here that the recombination sites of SARS-CoV-2 DVGs are clustered in several genomic regions, which are likely determined by the RNA secondary structures formed between them. Furthermore, our studies provide evidence that DVGs play vital roles in IFN stimulation, prolonged viral replication, and symptom development during SARS-CoV-2 infection, calling for more investigations to determine the mechanism of DVG generation and the impact of DVGs on SARS-CoV-2 pathogenesis.
Materials and Methods
Virus and cell preparation
The following reagents were deposited by the Centers for Disease Control and Prevention and obtained through BEI Resources, NIAID, NIH: SARS-Related Coronavirus 2, Isolate USA-WA1/2020, NR-52281. SARS-CoV-2 was propagated and titered using African green monkey kidney epithelial Vero E6 cells (American Type Culture Collection, CRL-1586) in Eagle’s Minimum Essential Medium (Lonza, 12–125Q) supplemented with 2% fetal bovine serum (FBS) (Atlanta Biologicals), 2 mM L-glutamine (Lonza, BE17–605E), and 1% penicillin (100 U/ml) and streptomycin (100 μg/ml). Viral stocks were stored at −80°C. All work involving infectious SARS-CoV-2 was performed in the Biosafety Level 3 (BSL-3) core facility of the University of Rochester, under institutional biosafety committee (IBC) oversight.
PHLE culture on air-liquid interface and SARS-CoV-2 infection
Primary human lung epithelial (PHLE) cells were cultured on an air-liquid interface as previously described (Wang, Bhattacharya et al. 2020, Anderson, Chirkova et al. 2021). Briefly, lung tissues were digested with a protease cocktail and cells were then cultured on a collagen-coated transwell plate (Corning, 3470) until each well reached a transepithelial electrical resistance (TEER) measurement of >300 ohms. Cells were then placed on an air-liquid interface (ALI) by removing media from the apical layer of the transwell chamber and continuing to feed cells on the basolateral layer as they differentiated. Cells were differentiated for 4–5 weeks at ALI before use in experiments. The apical layer of the differentiated primary lung cells was inoculated with SARS-CoV-2 (BEI, NR-52281, hCoV-19/USA-WA1/2020) at an MOI of 5 (titered in VeroE6 cells) in phosphate-buffered saline containing calcium and magnesium (PBS++; Gibco, 14040–133) and incubated at 37°C for 1.5 hours. The infectious solution was then removed and the apical layer was washed with PBS++. Cells were then incubated for 24, 48, or 72 hours.
SARS-CoV-2 inactivation and sample preparation
Cells that were harvested at 24 and 72 hours post infection were lysed with SDS lysis buffer (50mM Tris pH8.0, 10mM EDTA, 1% SDS) and collected with a wide-bore pipette tip. Cells that were harvested at 48 hours were first washed by dispensing and aspirating 37°C HEPES buffered saline solution (Lonza, CC-5022), and then trypsinized with 0.025% Trypsin/EDTA (Lonza, CC-5012) for 10 min at 37°C. Dissociated cells were aspirated using a wide-bore pipette tip and transferred to a tube containing ice-cold Trypsin Neutralization Solution (Lonza, CC5002); this was repeated to maximize cell collection. Cells were then pelleted by centrifugation, resuspended in chilled HEPES, and pelleted by centrifugation once more before being resuspended in SDS lysis buffer. All samples were physically lysed with QIAshredder homogenizers (Qiagen, 79656) and stored at −80°C. Homogenized SDS lysates were diluted 1:1 with RNA lysis buffer (Agilent) and RNA was extracted using the Absolutely RNA Microprep Kit (Agilent) according to the manufacturer’s protocol, including on-column DNase treatment.
Bulk RNA-sequencing of infected PHLE cells
RNA concentration was determined with the NanoDrop 1000 spectrophotometer (NanoDrop, Wilmington, DE) and RNA quality was assessed with the Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA). 1 ng of total RNA was pre-amplified with the SMARTer Ultra Low Input kit v4 (Clontech, Mountain View, CA) per the manufacturer’s recommendations. The quantity and quality of the resulting cDNA were determined using the Qubit Fluorometer (Life Technologies, Carlsbad, CA) and the Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA). 150 pg of cDNA was used to generate Illumina-compatible sequencing libraries with the NexteraXT library preparation kit (Illumina, San Diego, CA) per the manufacturer’s protocols. The amplified libraries were hybridized to the Illumina flow cell and sequenced using the NovaSeq6000 sequencer (Illumina, San Diego, CA). Single-end reads of 100 nt were generated for each sample.
Bulk RNA-seq data processing and DVG identification
The datasets used for bulk RNA-Seq analyses in Fig. 1 and Fig. 2 were publicly available. Their detailed information is listed in Table S1. The RNA-seq data used in Fig. 4 were from our own infections following the protocol described above. For each sample, we first used Bowtie2 (v. 2.2.9, (Langmead and Salzberg 2012)) to align the reads to the GRCh38 human reference genome. The unmapped reads were then passed to ViReMa (Viral-Recombination-Mapper v. 0.21) to identify recombination junction sites and their corresponding read counts using the SARS-CoV-2 reference genome (GenBank ID MT020881.1). A custom filtering script was developed in R to identify junction reads that met our criteria (R v4.1.0 and RStudio v1.4.17, script in Fig. S6). We required the positions of both sites (break and rejoin) of junction reads to be larger than 85, as TRS-L is reported to be located within the first 85 nts of the SARS-CoV-2 genome. Additionally, we required deletions longer than 100 nts to ensure that the truncated viral RNAs are deficient in replication. We included all such deletions that were supported by one or more reads as identified by ViReMa. The number of viral reads in each bulk RNA-Seq sample was quantified using the RSubread Bioconductor package. The junction frequency (Jfreq) was calculated for each sample as the ratio of DVG junction read counts to viral read counts, expressed as a percentage.
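The filtering criteria above can be illustrated with a minimal Python sketch. It is not the authors' R script (Fig. S6); the `Junction` record, the function names, and the approximation of deletion length as the absolute difference between the break and rejoin positions are assumptions made for illustration, and Jfreq is taken here to be DVG junction reads as a percentage of viral reads.

```python
from dataclasses import dataclass

TRS_L_END = 85      # both junction sites must lie beyond the first 85 nts
MIN_DELETION = 100  # deletions must be longer than 100 nts
MIN_READS = 1       # keep junctions supported by one or more reads


@dataclass(frozen=True)
class Junction:
    """Hypothetical record for one junction reported by a recombination mapper."""
    break_pos: int
    rejoin_pos: int
    reads: int


def is_dvg_junction(j: Junction) -> bool:
    """Apply the three criteria described in the text to one junction."""
    # Assumption: deletion length is approximated by the positional difference.
    deletion = abs(j.rejoin_pos - j.break_pos)
    return (
        j.break_pos > TRS_L_END
        and j.rejoin_pos > TRS_L_END
        and deletion > MIN_DELETION
        and j.reads >= MIN_READS
    )


def jfreq_percent(junctions, viral_reads: int) -> float:
    """DVG junction reads as a percentage of viral reads (assumed definition)."""
    if viral_reads <= 0:
        return 0.0
    dvg_reads = sum(j.reads for j in junctions if is_dvg_junction(j))
    return 100.0 * dvg_reads / viral_reads


# Toy input: the first junction starts inside the first 85 nts (sgmRNA-like),
# the second deletes only 50 nts, and the third passes all criteria.
toy_junctions = [
    Junction(70, 28000, 10),
    Junction(1000, 1050, 5),
    Junction(2000, 27000, 4),
]
toy_jfreq = jfreq_percent(toy_junctions, viral_reads=400)
```

In this toy input only the third junction is counted, so 4 of 400 viral reads contribute to Jfreq.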
For host transcriptome analysis, raw fastq files were mapped to the human transcriptome (cDNA; Ensembl release 86) using Kallisto (Bray, Pimentel et al. 2016) with 60 bootstraps per sample. Annotation and summarization of transcripts to genes was carried out in R, using the TxImport package (Soneson, Love et al. 2015). Differentially expressed genes (≥twofold and ≤ 1% false discovery rate) were identified by linear modeling and Bayesian statistics using the VOOM function in the Limma package (Ritchie, Phipson et al. 2015). Gene Ontology (GO) was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Dennis, Sherman et al. 2003).
DVG identification from scRNA-seq dataset
We used the publicly available dataset from Ravindra et al. 2021 accessed through the NCBI database (GSE166766). This study consisted of single cell RNA-Seq (scRNA-Seq) data from human bronchial epithelial cells (NHBEs) infected with SARS-CoV-2 that were harvested 1 day post infection (dpi), 2 dpi, and 3 dpi. We first used Cell Ranger (Zheng, Terry et al. 2017) to construct gene expression matrices for each sample. To identify the number of viral transcripts, the SARS-CoV-2 reference sequence was concatenated to the end of the human genome reference as one additional gene. The gene expression matrices were then loaded into the Seurat package in R (Satija, Farrell et al. 2015), and principal component analysis and cell clustering were performed. Cell clusters were then annotated based on the gene markers used in the original publication of this dataset. To identify DVGs, we first used UMI-tools (Smith, Heger et al. 2017) to associate the cell barcodes and UMIs with each corresponding read name. Similar to the bulk RNA-Seq analysis, we used Bowtie2 (Langmead and Salzberg 2012), ViReMa, and a custom R filtering script for DVG identification (details in Fig. S6). We then used the filtered ViReMa output to re-quantify DVG counts based on the UMIs associated with each cell barcode, which we consider the DVG count per cell. We also calculated Jfreq for each cell as DVG UMI/viral UMI per cell barcode. The number of DVG UMIs and the Jfreq of each cell barcode were then added to the gene expression matrix created by Cell Ranger. The Jfreq values were multiplied by 10^3 so that they would not be left out during the cell clustering and type identification steps. Cells with more than one viral UMI (virus positive cells) were grouped as DVG+ or DVG− based on the presence or absence of DVG UMIs, respectively.
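The per-cell quantification described above can be sketched as follows. This is a hedged Python illustration rather than the authors' pipeline: the function names, the input layout (pairs of cell barcode and UMI for reads that passed the DVG filter), and the threshold of at least one viral UMI for calling a cell infected are assumptions.

```python
from collections import defaultdict


def dvg_umis_per_cell(junction_reads):
    """Count distinct UMIs per cell barcode among DVG junction reads.

    junction_reads: iterable of (cell_barcode, umi) pairs, one per read that
    passed the DVG filter (hypothetical input layout).
    """
    umis = defaultdict(set)
    for barcode, umi in junction_reads:
        umis[barcode].add(umi)  # reads sharing a UMI are counted once
    return {barcode: len(found) for barcode, found in umis.items()}


def classify_cells(viral_umis, dvg_umis, min_viral_umi=1):
    """Label each cell and compute its Jfreq as DVG UMI / viral UMI.

    min_viral_umi is an assumed threshold for calling a cell virus positive.
    """
    result = {}
    for barcode, viral in viral_umis.items():
        if viral < min_viral_umi:
            result[barcode] = ("uninfected", 0.0)
            continue
        dvg = dvg_umis.get(barcode, 0)
        label = "DVG+" if dvg > 0 else "DVG-"
        result[barcode] = (label, dvg / viral)
    return result


# Toy data: barcode AAAC has two reads sharing UMI U1 plus one read with U2.
toy_reads = [("AAAC", "U1"), ("AAAC", "U1"), ("AAAC", "U2"), ("TTTG", "U9")]
toy_viral = {"AAAC": 40, "TTTG": 10, "GGGC": 5, "CCCA": 0}
toy_dvg = dvg_umis_per_cell(toy_reads)
toy_cells = classify_cells(toy_viral, toy_dvg)
```

Collapsing reads by UMI before counting avoids counting PCR duplicates of the same DVG molecule more than once within a cell.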
Differentially expressed genes between DVG+ and DVG− in scRNA-seq analysis
The list of differentially expressed genes between the DVG+ group and DVG− group was generated with the Seurat function FindMarkers, after normalizing and scaling the data with the Seurat function SCTransform. Three different tests were used to create three differential gene expression (DGE) lists for both 2 dpi and 3 dpi: MAST, DESeq2, and the Wilcoxon rank sum test (default), using the criteria of percentage of cells in which the gene was detected (pct) > 0.1, adj_pval < 0.01, and log fold change > 0.25. The final DGE list consisted of the genes that were found by at least two of the three methods. To identify the pathways enriched in the DGE list, we first divided the list into genes upregulated and genes downregulated in the DVG+ group. GO analysis was performed for the upregulated and downregulated genes separately with the DAVID functional annotation clustering tool and graphed in R using the code in Fig. S6. We then specifically focused on interferon responses in the DVG+ and DVG− groups. Virus positive cells were further categorized into low, medium, and high groups based on their amount of viral UMI, and the expression of selected IFN related genes was compared and graphed between DVG+ and DVG− cells within each viral group in R (code in Fig. S6).
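The two-of-three consensus rule can be sketched in a few lines of Python. This is an illustration only: the dictionary layout, the key names `pct`, `padj`, and `logfc`, the gene labels, and the numbers are all made up, and applying the fold-change cutoff to the absolute value is an assumption based on the later split into upregulated and downregulated genes.

```python
from collections import Counter


def passes(rec, min_pct=0.1, max_padj=0.01, min_logfc=0.25):
    """Thresholds from the text; the absolute fold change is an assumption."""
    return (
        rec["pct"] > min_pct
        and rec["padj"] < max_padj
        and abs(rec["logfc"]) > min_logfc
    )


def consensus_genes(results_by_method, min_methods=2):
    """Return genes that pass the thresholds in at least min_methods tests."""
    votes = Counter()
    for table in results_by_method.values():
        # Each method votes at most once per gene.
        votes.update({gene for gene, rec in table.items() if passes(rec)})
    return {gene for gene, n in votes.items() if n >= min_methods}


# Hypothetical per-method tables with placeholder gene names and values.
hit = {"pct": 0.3, "padj": 0.001, "logfc": 0.9}
toy_results = {
    "wilcox": {"GENE_A": hit, "GENE_B": hit,
               "GENE_C": {"pct": 0.9, "padj": 0.5, "logfc": 0.01}},
    "MAST": {"GENE_A": hit,
             "GENE_D": {"pct": 0.2, "padj": 0.002, "logfc": -0.6}},
    "DESeq2": {"GENE_B": hit,
               "GENE_D": {"pct": 0.05, "padj": 0.002, "logfc": -0.6}},
}
toy_consensus = consensus_genes(toy_results)
```

Requiring agreement between methods trades some sensitivity for robustness, since a gene flagged by only one test is dropped.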
DVG identification from the tiled-PCR deep sequencing
The protocol for identifying DVGs in three publicly available datasets that utilize PCR tiling with ARTIC LoCost (V1 or V3) primer sets (https://artic.network) followed the bulk sequencing data processing used for DVG identification. The first dataset was used to study DVG generation during longitudinal COVID-19 persistence in one immunosuppressed patient (ENA: ERP132087, NCBI SRA: PRJEB47786) and the second served as the control cohort, containing 16 regular COVID-19 patients whose libraries were prepared in the same way (PRJNA707211). The third was used to study DVGs in a cohort of both asymptomatic and symptomatic COVID-19 patients (NCBI SRA: PRJNA690577). This method of amplification produces overlapping 400 bp amplicons that are then used to construct the respective sequencing libraries for data processing and subsequent analysis. For the longitudinal study, the ARTIC V3 amplicons were sequenced as paired-end 300 bp reads on an Illumina MiSeq. The ARTIC V3 amplicons of the symptomatic cohort study were PCR amplified for five cycles and sequenced identically.
Secondary structure analysis of DVG junction positions
Our definition of structural distance follows (Clote, Ponty et al. 2012). For a given primary sequence and a corresponding secondary structure, we first convert them to a graph where each nucleotide i is a node. We add an edge (i, i+1) between any two adjacent nucleotides i and i+1 (gray bonds in Fig. 3B), and an edge (i, j) between any paired bases i and j (black bonds in Fig. 3B) as reported by Ziv et al. from their COMRADES mapping (Ziv, Gabryelska et al. 2018). This graph can model alternative base pairs. For example, if nucleotide i has possible pairs with nucleotides j, k, and l, then node i will connect five edges (i, i−1), (i, i+1), (i, j), (i, k), and (i, l). Based on the connected graph, the structural distance between two nucleotides is formalized as the number of edges in the shortest path between them (red solid path in Fig. 3B), which can be solved by classical graph algorithms (Cormen, Leiserson et al. 2022).
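The graph construction and shortest-path definition above can be sketched in Python. Breadth-first search is used here as one classical choice for unweighted graphs; the text does not state which algorithm was actually used, and the function names and the toy base pair are illustrative assumptions.

```python
from collections import deque


def build_graph(n, base_pairs):
    """Nodes are nucleotides 1..n; edges are backbone bonds and base pairs."""
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(1, n):  # backbone edges (i, i+1)
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for i, j in base_pairs:  # pairing edges; alternative pairs are allowed
        adj[i].add(j)
        adj[j].add(i)
    return adj


def structural_distance(adj, src, dst):
    """Number of edges on a shortest path between two nucleotides (BFS)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable; cannot happen when the backbone is intact


# Toy example: 20 nucleotides with a single base pair between 2 and 19.
toy_graph = build_graph(20, [(2, 19)])
toy_far = structural_distance(toy_graph, 1, 20)
toy_near = structural_distance(toy_graph, 1, 10)
```

In the toy graph, positions 1 and 20 are 19 nucleotides apart in sequence but only three edges apart structurally (1 to 2, 2 to 19, 19 to 20), which is the kind of shortcut the structural distance is meant to capture.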
The chimeric reads detected by COMRADES from (Ziv, Price et al. 2020) consist of only left- and right-side sequences without base-pairing information. For short-range interactions, they extracted a (continuous) subsequence between the 5’ end of the left side and the 3’ end of the right side and used RNAfold (Lorenz, Bernhart et al. 2011) to predict structures for that subsequence. For long-range interactions, they utilized RNAduplex (Lorenz, Bernhart et al. 2011) to predict interactions between the two (distant) segments, which does not model any intra-segmental base pairs for either segment. Note that alternative base pairs exist in the data. Therefore, we built the graph based on the predicted base pairs in Ziv et al.’s data and calculated the structural distance between any two positions using the method described above. Additionally, we chose a cutoff value of 50 for the number of chimeric reads, which leads to a balanced precision and sensitivity evaluated on the known structure (Ziv, Gabryelska et al. 2018).
Statistical analysis
Pearson’s correlation was performed to identify the association between virus and DVG counts and virus and Jfreq in the bulk RNA-Seq datasets. For the scRNA-Seq dataset, unpaired two-sided Wilcoxon rank sum tests were performed to identify the differences in viral load, DVG counts, and Jfreq among mock, 1 dpi, 2 dpi, and 3 dpi samples. We first log transformed viral UMI counts and expression level of selected IFN related genes and then compared between DVG− and DVG+ cells for each time point using unpaired two-sided Wilcoxon rank sum tests.
Supplementary Material
Importance.
Defective viral genomes (DVGs) are ubiquitously generated in many RNA viruses, including SARS-CoV-2. Their interference with full-length viruses and their IFN stimulation give them potential for novel antiviral therapies and vaccine development. SARS-CoV-2 DVGs are generated through the recombination of two discontinuous genomic fragments by the viral polymerase complex, and recombination is also one of the major mechanisms for the emergence of new coronaviruses. Focusing on the generation and function of SARS-CoV-2 DVGs, these studies identify new hotspots for non-homologous recombination and strongly suggest that the secondary structures within viral genomes mediate the recombination. Furthermore, these studies provide the first evidence for IFN stimulation activity of de novo DVGs during natural SARS-CoV-2 infection. These findings lay the foundation for further mechanistic studies of SARS-CoV-2 recombination and provide evidence to harness the immunostimulatory potential of DVGs in the development of vaccines and antivirals for SARS-CoV-2.
Acknowledgments
We would like to thank the lab group of Dr. Andrew Routh from UTMB for assistance in setting up ViReMa for our analysis of DVG generation. We would like to thank Dr. Xing Qiu from University of Rochester for statistical advice. Publicly available datasets provided by the following lab groups listed are especially recognized: Chandam Deshpande, Landthaler, Lipsky, and Wilen. The authors want to acknowledge the contributions of Gloria S. Pryhuber, M.D., University of Rochester Center for Advanced Research Technologies, the University of Rochester Genomics Research Center (GRC), the Biosafety Level 3 program, the University of Rochester Biosafety Level 3 (BSL3) core facility, and the University of Rochester’s Institutional Biosafety Committee (IBC). We thank Sara Ali, University of Rochester, for help in discussion and correlations between −sense and +sense DVGs.
Funding
This work was supported by the University of Rochester’s Institutional Program Unifying Population and Laboratory Based Sciences Award from the Burroughs Wellcome Fund, Request ID 1014095; National Center for Advancing Translational Sciences, TL1-TR002000; NIH-NHLBI Human Tissue Core (Dr. Gloria Pryhuber, Principal Investigator, U01 HL122700) for the Lung Molecular Atlas Program; University of Rochester Technology Development Fund, OP346177; University of Rochester School of Medicine and Dentistry Scientific Advisory Committee Incubator Award; University of Rochester HSCCI OP211341; and NIH grant R35GM145283 to D.H.M.
Footnotes
Competing interest
The authors declare that no competing interests exist.
Data availability
Source data for the publicly available NGS datasets described in this manuscript are listed in Supplementary Table S1. All NGS datasets were retrieved with the following NCBI and ENA accession numbers: GSE147507 (Daamen, Bachali et al. 2021), GSE148729 (Wyler, Mösbauer et al. 2021), BioProject PRJNA628043 (Ogando, Dalebout et al. 2020), GSE166766 (Ravindra, Alfajaro et al. 2021), GSE150316 (Desai, Neyaz et al. 2020), BioProject PRJNA707211 (Jaworski, Langsjoen et al. 2021), BioProject PRJNA690577 (Wong, Ngan et al. 2021), and ERP132087-BioProject PRJEB47786 (Weigang, Fuchs et al. 2021). The dataset used in Fig. 4 is available upon request and the raw data of all infected samples are under submission to GEO.
References
- Anderson C. S., Chirkova T., Slaunwhite C. G., Qiu X., Walsh E. E., Anderson L. J. and Mariani T. J. (2021). “CX3CR1 Engagement by Respiratory Syncytial Virus Leads to Induction of Nucleolin and Dysregulation of Cilium-Related Genes.” Journal of Virology 95(11): e00095–00021.
- Andzhaparidze O. G., Bogomolova N. N., Boriskin Yu S. and Drynov I. D. (1983). “Chronic non-cytopathic infection of human continuous cell lines with mumps virus.” Acta Virol 27(4): 318–328.
- Baczko K., Liebert U. G., Billeter M., Cattaneo R., Budka H. and ter Meulen V. (1986). “Expression of defective measles virus genes in brain tissues of patients with subacute sclerosing panencephalitis.” J Virol 59(2): 472–478.
- Beauclair G., Mura M., Combredet C., Tangy F., Jouvenet N. and Komarova A. V. (2018). “DI-tector: defective interfering viral genomes’ detector for next-generation sequencing data.” RNA 24(10): 1285–1296.
- Brant A. C., Tian W., Majerciak V., Yang W. and Zheng Z. M. (2021). “SARS-CoV-2: from its discovery to genome structure, transcription, and replication.” Cell Biosci 11(1): 136.
- Bray N. L., Pimentel H., Melsted P. and Pachter L. (2016). “Near-optimal probabilistic RNA-seq quantification.” Nat Biotechnol 34(5): 525–527.
- Brian D. A. and Spaan W. J. M. (1997). “Recombination and Coronavirus Defective Interfering RNAs.” Semin Virol 8(2): 101–111.
- Caccuri F., Messali S., Bortolotti D., Di Silvestre D., De Palma A., Cattaneo C., Bertelli A., Zani A., Milanesi M., Giovanetti M., Campisi G., Gentili V., Bugatti A., Filippini F., Scaltriti E., Pongolini S., Tucci A., Fiorentini S., d’Ursi P., Ciccozzi M., Mauri P., Rizzo R. and Caruso A. (2022). “Competition for dominance within replicating quasispecies during prolonged SARS-CoV-2 infection in an immunocompromised host.” Virus Evol 8(1): veac042.
- Chaturvedi S., Vasen G., Pablo M., Chen X., Beutler N., Kumar A., Tanner E., Illouz S., Rahgoshay D., Burnett J., Holguin L., Chen P. Y., Ndjamen B., Ott M., Rodick R., Rogers T., Smith D. M. and Weinberger L. S. (2021). “Identification of a therapeutic interfering particle-A single-dose SARS-CoV-2 antiviral intervention with a high barrier to resistance.” Cell 184(25): 6022–6036.e6018.
- Chiale C., Greene T. T. and Zuniga E. I. (2022). “Interferon induction, evasion, and paradoxical roles during SARS-CoV-2 infection.” Immunol Rev 309: 12–24.
- Clote P., Ponty Y. and Steyaert J. M. (2012). “Expected distance between terminal nucleotides of RNA secondary structures.” J Math Biol 65(3): 581–599.
- Cormen T. H., Leiserson C. E., Rivest R. L. and Stein C. (2022). Introduction to algorithms. Cambridge, Massachusetts, The MIT Press.
- Daamen A. R., Bachali P., Owen K. A., Kingsmore K. M., Hubbard E. L., Labonte A. C., Robl R., Shrotri S., Grammer A. C. and Lipsky P. E. (2021). “Comprehensive transcriptomic analysis of COVID-19 blood, lung, and airway.” Sci Rep 11(1): 7052.
- Dadras O., Afsahi A. M., Pashaei Z., Mojdeganlou H., Karimi A., Habibi P., Barzegary A., Fakhfouri A., Mirzapour P., Janfaza N., Dehghani S., Afroughi F., Dashti M., Khodaei S., Mehraeen E., Voltarelli F., Sabatier J. M. and SeyedAlinaghi S. (2022). “The relationship between COVID-19 viral load and disease severity: A systematic review.” Immun Inflamm Dis 10(3): e580.
- Davidson A. D., Williamson M. K., Lewis S., Shoemark D., Carroll M. W., Heesom K. J., Zambon M., Ellis J., Lewis P. A., Hiscox J. A. and Matthews D. A. (2020). “Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein.” Genome Med 12(1): 68.
- De B. K. and Nayak D. P. (1980). “Defective interfering influenza viruses and host cells: establishment and maintenance of persistent influenza virus infection in MDBK and HeLa cells.” J Virol 36(3): 847–859.
- Dennis G. Jr., Sherman B. T., Hosack D. A., Yang J., Gao W., Lane H. C. and Lempicki R. A. (2003). “DAVID: Database for Annotation, Visualization, and Integrated Discovery.” Genome Biol 4(5): P3.
- Desai N., Neyaz A., Szabolcs A., Shih A. R., Chen J. H., Thapar V., Nieman L. T., Solovyov A., Mehta A., Lieb D. J., Kulkarni A. S., Jaicks C., Xu K. H., Raabe M. J., Pinto C. J., Juric D., Chebib I., Colvin R. B., Kim A. Y., Monroe R., Warren S. E., Danaher P., Reeves J. W., Gong J., Rueckert E. H., Greenbaum B. D., Hacohen N., Lagana S. M., Rivera M. N., Sholl L. M., Stone J. R., Ting D. T. and Deshpande V. (2020). “Temporal and spatial heterogeneity of host response to SARS-CoV-2 pulmonary infection.” Nat Commun 11(1): 6319.
- Dufour D., Mateos-Gomez P. A., Enjuanes L., Gallego J. and Sola I. (2011). “Structure and functional relevance of a transcription-regulating sequence involved in coronavirus discontinuous RNA synthesis.” J Virol 85(10): 4963–4973.
- Felt S. A., Sun Y., Jozwik A., Paras A., Habibi M. S., Nickle D., Anderson L., Achouri E., Feemster K. A., Cárdenas A. M., Turi K. N., Chang M., Hartert T. V., Sengupta S., Chiu C. and López C. B. (2021). “Detection of respiratory syncytial virus defective genomes in nasal secretions is associated with distinct clinical outcomes.” Nat Microbiol 6(5): 672–681.
- Frensing T., Heldt F. S., Pflugmacher A., Behrendt I., Jordan I., Flockerzi D., Genzel Y. and Reichl U. (2013). “Continuous influenza virus production in cell culture shows a periodic accumulation of defective interfering particles.” PLoS One 8(9): e72288.
- Gozman L., Perry K., Nikogosov D., Klabukov I., Shevlyakov A. and Baranova A. (2021). “A Role of Variance in Interferon Genes to Disease Severity in COVID-19 Patients.” Front Genet 12: 709388.
- Gribble J., Stevens L. J., Agostini M. L., Anderson-Daniels J., Chappell J. D., Lu X., Pruijssers A. J., Routh A. L. and Denison M. R. (2021). “The coronavirus proofreading exoribonuclease mediates extensive viral recombination.” PLoS Pathog 17(1): e1009226.
- Han L., Zhuang M. W., Deng J., Zheng Y., Zhang J., Nan M. L., Zhang X. J., Gao C. and Wang P. H. (2021). “SARS-CoV-2 ORF9b antagonizes type I and III interferons by targeting multiple components of the RIGI/MDA-5-MAVS, TLR3-TRIF, and cGAS-STING signaling pathways.” J Med Virol 93(9): 5376–5389.
- Hofmann M. A., Sethna P. B. and Brian D. A. (1990). “Bovine coronavirus mRNA replication continues throughout persistent infection in cell culture.” J Virol 64(9): 4108–4114.
- Huang A. S. (1973). “Defective interfering viruses.” Annu Rev Microbiol 27: 101–117.
- Huang A. S. and Baltimore D. (1970). “Defective viral particles and viral disease processes.” Nature 226(5243): 325–327.
- Janssen N. A. F., Grondman I., de Nooijer A. H., Boahen C. K., Koeken V., Matzaraki V., Kumar V., He X., Kox M., Koenen H., Smeets R. L., Joosten I., Brüggemann R. J. M., Kouijzer I. J. E., van der Hoeven H. G., Schouten J. A., Frenzel T., Reijers M. H. E., Hoefsloot W., Dofferhoff A. S. M., van Apeldoorn M. J., Blaauw M. J. T., Veerman K., Maas C., Schoneveld A. H., Hoefer I. E., Derde L. P. G., van Deuren M., van der Meer J. W. M., van Crevel R., Giamarellos-Bourboulis E. J., Joosten L. A. B., van den Heuvel M. M., Hoogerwerf J., de Mast Q., Pickkers P., Netea M. G. and van de Veerdonk F. L. (2021). “Dysregulated Innate and Adaptive Immune Responses Discriminate Disease Severity in COVID-19.” J Infect Dis 223(8): 1322–1333.
- Jaworski E., Langsjoen R. M., Mitchell B., Judy B., Newman P., Plante J. A., Plante K. S., Miller A. L., Zhou Y., Swetnam D., Sotcheff S., Morris V., Saada N., Machado R. R., McConnell A., Widen S. G., Thompson J., Dong J., Ren P., Pyles R. B., Ksiazek T. G., Menachery V. D., Weaver S. C. and Routh A. L. (2021). “Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants.” Elife 10: e68479.
- Juárez-Martínez A. B., Vega-Almeida T. O., Salas-Benito M., García-Espitia M., De Nova-Ocampo M., Del Ángel R. M. and Salas-Benito J. S. (2013). “Detection and sequencing of defective viral genomes in C6/36 cells persistently infected with dengue virus 2.” Arch Virol 158(3): 583–599.
- Junqueira C., Crespo Â., Ranjbar S., Lewandrowski M., Ingber J., de Lacerda L. B., Parry B., Ravid S., Clark S., Ho F., Vora S. M., Leger V., Beakes C., Margolin J., Russell N., Kays K., Gehrke L., Adhikari U. D., Henderson L., Janssen E., Kwon D., Sander C., Abraham J., Filbin M., Goldberg M. B., Wu H., Mehta G., Bell S., Goldfeld A. E. and Lieberman J. (2021). “SARS-CoV-2 infects blood monocytes to activate NLRP3 and AIM2 inflammasomes, pyroptosis and cytokine release.” Res Sq.
- Kawai A., Matsumoto S. and Tanabe K. (1975). “Characterization of rabies viruses recovered from persistently infected BHK cells.” Virology 67(2): 520–533.
- Kupke S. Y., Riedel D., Frensing T., Zmora P. and Reichl U. (2019). “A Novel Type of Influenza A Virus-Derived Defective Interfering Particle with Nucleotide Substitutions in Its Genome.” J Virol 93(4): e01786–01718.
- Kwon J. S., Kim J. Y., Kim M. C., Park S. Y., Kim B. N., Bae S., Cha H. H., Jung J., Kim M. J., Lee M. J., Choi S. H., Chung J. W., Shin E. C. and Kim S. H. (2020). “Factors of Severity in Patients with COVID-19: Cytokine/Chemokine Concentrations, Viral Load, and Antibody Responses.” Am J Trop Med Hyg 103(6): 2412–2418.
- Langmead B. and Salzberg S. L. (2012). “Fast gapped-read alignment with Bowtie 2.” Nat Methods 9(4): 357–359.
- Lega S., Naviglio S., Volpi S. and Tommasini A. (2020). “Recent Insight into SARS-CoV2 Immunopathology and Rationale for Potential Treatment and Preventive Strategies in COVID-19.” Vaccines (Basel) 8(2): 224.
- Lei X., Dong X., Ma R., Wang W., Xiao X., Tian Z., Wang C., Wang Y., Li L., Ren L., Guo F., Zhao Z., Zhou Z., Xiang Z. and Wang J. (2020). “Activation and evasion of type I interferon responses by SARS-CoV-2.” Nat Commun 11(1): 3810.
- Li Q., Wu J., Nie J., Zhang L., Hao H., Liu S., Zhao C., Zhang Q., Liu H., Nie L., Qin H., Wang M., Lu Q., Li X., Sun Q., Liu J., Zhang L., Li X., Huang W. and Wang Y. (2020). “The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity.” Cell 182(5): 1284–1294.e1289.
- Liu J., Li S., Liu J., Liang B., Wang X., Wang H., Li W., Tong Q., Yi J., Zhao L., Xiong L., Guo C., Tian J., Luo J., Yao J., Pang R., Shen H., Peng C., Liu T., Zhang Q., Wu J., Xu L., Lu S., Wang B., Weng Z., Han C., Zhu H., Zhou R., Zhou H., Chen X., Ye P., Zhu B., Wang L., Zhou W., He S., He Y., Jie S., Wei P., Zhang J., Lu Y., Wang W., Zhang L., Li L., Zhou F., Wang J., Dittmer U., Lu M., Hu Y., Yang D. and Zheng X. (2020). “Longitudinal characteristics of lymphocyte responses and cytokine profiles in the peripheral blood of SARS-CoV-2 infected patients.” EBioMedicine 55: 102763.
- Lorenz R., Bernhart S. H., Höner Zu Siederdissen C., Tafer H., Flamm C., Stadler P. F. and Hofacker I. L. (2011). “ViennaRNA Package 2.0.” Algorithms Mol Biol 6: 26.
- Majumdar P. and Niyogi S. (2021). “SARS-CoV-2 mutations: the biological trackway towards viral fitness.” Epidemiol Infect 149: e110.
- Makino S., Fujioka N. and Fujiwara K. (1985). “Structure of the intracellular defective viral RNAs of defective interfering particles of mouse hepatitis virus.” J Virol 54(2): 329–336.
- Marcus P. I. and Sekellick M. J. (1977). “Defective interfering particles with covalently linked [+/−]RNA induce interferon.” Nature 266(5605): 815–819.
- Méndez A., Smerdou C., Izeta A., Gebauer F. and Enjuanes L. (1996). “Molecular characterization of transmissible gastroenteritis coronavirus defective interfering genomes: packaging and heterogeneity.” Virology 217(2): 495–507.
- Moscona A. (1991). “Defective interfering particles of human parainfluenza virus type 3 are associated with persistent infection in cell culture.” Virology 183(2): 821–824.
- Nayak D. P., Chambers T. M. and Akkina R. K. (1985). “Defective-interfering (DI) RNAs of influenza viruses: origin, structure, expression, and interference.” Curr Top Microbiol Immunol 114: 103–151.
- Ogando N. S., Dalebout T. J., Zevenhoven-Dobbe J. C., Limpens R., van der Meer Y., Caly L., Druce J., de Vries J. J. C., Kikkert M., Bárcena M., Sidorov I. and Snijder E. J. (2020). “SARS-coronavirus-2 replication in Vero E6 cells: replication kinetics, rapid adaptation and cytopathology.” J Gen Virol 101(9): 925–940.
- Olmo-Uceda M. J., Muñoz-Sánchez J. C., Lasso-Giraldo W., Arnau V., Díaz-Villanueva W. and Elena S. F. (2022). “DVGfinder: A Metasearch Tool for Identifying Defective Viral Genomes in RNA-Seq Data.” Viruses 14(5).
- Park S. Y., Choi E. and Jeong Y. S. (2013). “Integrative effect of defective interfering RNA accumulation and helper virus attenuation is responsible for the persistent infection of Japanese encephalitis virus in BHK-21 cells.” J Med Virol 85(11): 1990–2000.
- Pénzes Z., Wroe C., Brown T. D., Britton P. and Cavanagh D. (1996). “Replication and packaging of coronavirus infectious bronchitis virus defective RNAs lacking a long open reading frame.” J Virol 70(12): 8660–8668.
- Quaranta E. G., Fusaro A., Giussani E., D’Amico V., Varotto M., Pagliari M., Giordani M. T., Zoppelletto M., Merola F., Antico A., Stefanelli P., Terregino C. and Monne I. (2022). “SARS-CoV-2 intra-host evolution during prolonged infection in an immunocompromised patient.” Int J Infect Dis 122: 444–448.
- Raman S. and Brian D. A. (2005). “Stem-loop IV in the 5’ untranslated region is a cis-acting element in bovine coronavirus defective interfering RNA replication.” J Virol 79(19): 12434–12446.
- Rand U., Kupke S. Y., Shkarlet H., Hein M. D., Hirsch T., Marichal-Gallardo P., Cicin-Sain L., Reichl U. and Bruder D. (2021). “Antiviral Activity of Influenza A Virus Defective Interfering Particles against SARS-CoV-2 Replication In Vitro through Stimulation of Innate Immunity.” Cells 10(7): 1756.
- Ravindra N. G., Alfajaro M. M., Gasque V., Huston N. C., Wan H., Szigeti-Buck K., Yasumoto Y., Greaney A. M., Habet V., Chow R. D., Chen J. S., Wei J., Filler R. B., Wang B., Wang G., Niklason L. E., Montgomery R. R., Eisenbarth S. C., Chen S., Williams A., Iwasaki A., Horvath T. L., Foxman E. F., Pierce R. W., Pyle A. M., van Dijk D. and Wilen C. B. (2021). “Single-cell longitudinal analysis of SARS-CoV-2 infection in human airway epithelium identifies target cells, alterations in gene expression, and cell state changes.” PLoS Biol 19(3): e3001143.
- Ritchie M. E., Phipson B., Wu D., Hu Y., Law C. W., Shi W. and Smyth G. K. (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Res 43(7): e47.
- Routh A. and Johnson J. E. (2014). “Discovery of functional genomic motifs in viruses with ViReMa-a Virus Recombination Mapper-for analysis of next-generation sequencing data.” Nucleic Acids Res 42(2): e11.
- Roux L., Simon A. E. and Holland J. J. (1991). “Effects of defective interfering viruses on virus replication and pathogenesis in vitro and in vivo.” Adv Virus Res 40: 181–211.
- Roux L. and Waldvogel F. A. (1981). “Establishment of Sendai virus persistent infection: biochemical analysis of the early phase of a standard plus defective interfering virus infection of BHK cells.” Virology 112(2): 400–410.
- Satija R., Farrell J. A., Gennert D., Schier A. F. and Regev A. (2015). “Spatial reconstruction of single-cell gene expression data.” Nat Biotechnol 33(5): 495–502.
- Smith T., Heger A. and Sudbery I. (2017). “UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy.” Genome Res 27(3): 491–499.
- Sola I., Almazán F., Zúñiga S. and Enjuanes L. (2015). “Continuous and Discontinuous RNA Synthesis in Coronaviruses.” Annu Rev Virol 2(1): 265–288.
- Soneson C., Love M. I. and Robinson M. D. (2015). “Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.” F1000Res 4: 1521.
- Sun Y., Kim E. J., Felt S. A., Taylor L. J., Agarwal D., Grant G. R. and López C. B. (2019). “A specific sequence in the genome of respiratory syncytial virus regulates the generation of copy-back defective viral genomes.” PLoS Pathog 15(4): e1007707.
- Sun Y. and López C. B. (2017). “The innate immune response to RSV: Advances in our understanding of critical viral and host factors.” Vaccine 35(3): 481–488.
- Thorne L. G., Reuschl A. K., Zuliani-Alvarez L., Whelan M. V. X., Turner J., Noursadeghi M., Jolly C. and Towers G. J. (2021). “SARS-CoV-2 sensing by RIG-I and MDA5 links epithelial infection to macrophage inflammation.” EMBO J 40(15): e107826.
- van Hemert M. J., van den Worm S. H., Knoops K., Mommaas A. M., Gorbalenya A. E. and Snijder E. J. (2008). “SARS-coronavirus replication/transcription complexes are membrane-protected and need a host factor for activity in vitro.” PLoS Pathog 4(5): e1000054.
- Vanderbeke L., Van Mol P., Van Herck Y., De Smet F., Humblet-Baron S., Martinod K., Antoranz A., Arijs I., Boeckx B., Bosisio F. M., Casaer M., Dauwe D., De Wever W., Dooms C., Dreesen E., Emmaneel A., Filtjens J., Gouwy M., Gunst J., Hermans G., Jansen S., Lagrou K., Liston A., Lorent N., Meersseman P., Mercier T., Neyts J., Odent J., Panovska D., Penttila P. A., Pollet E., Proost P., Qian J., Quintelier K., Raes J., Rex S., Saeys Y., Sprooten J., Tejpar S., Testelmans D., Thevissen K., Van Buyten T., Vandenhaute J., Van Gassen S., Velásquez Pereira L. C., Vos R., Weynand B., Wilmer A., Yserbyt J., Garg A. D., Matthys P., Wouters C., Lambrechts D., Wauters E. and Wauters J. (2021). “Monocyte-driven atypical cytokine storm and aberrant neutrophil activation as key mediators of COVID-19 disease severity.” Nat Commun 12(1): 4117.
- Vasilijevic J., Zamarreño N., Oliveros J. C., Rodriguez-Frandsen A., Gómez G., Rodriguez G., Pérez-Ruiz M., Rey S., Barba I., Pozo F., Casas I., Nieto A. and Falcón A. (2017). “Reduced accumulation of defective viral genomes contributes to severe outcome in influenza virus infected patients.” PLoS Pathog 13(10): e1006650.
- Vignuzzi M. and López C. B. (2019). “Defective viral genomes are key drivers of the virus-host interaction.” Nat Microbiol 4(7): 1075–1087.
- Wang P., Lau S. Y., Deng S., Chen P., Mok B. W., Zhang A. J., Lee A. C., Chan K. H., Tam R. C., Xu H., Zhou R., Song W., Liu L., To K. K., Chan J. F., Chen Z., Yuen K. Y. and Chen H. (2021). “Characterization of an attenuated SARS-CoV-2 variant with a deletion at the S1/S2 junction of the spike protein.” Nat Commun 12(1): 2790.
- Wang Q., Bhattacharya S., Mereness J. A., Anderson C., Lillis J. A., Misra R. S., Romas S., Huyck H., Howell A. and Bandyopadhyay G. (2020). “A novel in vitro model of primary human pediatric lung epithelial cells.” Pediatr Res 87(3): 511–517.
- Weigang S., Fuchs J., Zimmer G., Schnepf D., Kern L., Beer J., Luxenburger H., Ankerhold J., Falcone V., Kemming J., Hofmann M., Thimme R., Neumann-Haefelin C., Ulferts S., Grosse R., Hornuss D., Tanriver Y., Rieg S., Wagner D., Huzly D., Schwemmle M., Panning M. and Kochs G. (2021). “Within-host evolution of SARS-CoV-2 in an immunosuppressed COVID-19 patient as a source of immune escape variants.” Nat Commun 12(1): 6405.
- Wong C. H., Ngan C. Y., Goldfeder R. L., Idol J., Kuhlberg C., Maurya R., Kelly K., Omerza G., Renzette N., De Abreu F., Li L., Browne F. A., Liu E. T. and Wei C. L. (2021). “Reduced subgenomic RNA expression is a molecular indicator of asymptomatic SARS-CoV-2 infection.” Commun Med (Lond) 1: 33.
- Wong H. T., Cheung V. and Salamango D. J. (2022). “Decoupling SARS-CoV-2 ORF6 localization and interferon antagonism.” J Cell Sci 135(6).
- Wu H. Y. and Brian D. A. (2010). “Subgenomic messenger RNA amplification in coronaviruses.” Proc Natl Acad Sci U S A 107(27): 12257–12262.
- Wyler E., Mösbauer K., Franke V., Diag A., Gottula L. T., Arsiè R., Klironomos F., Koppstein D., Hönzke K., Ayoub S., Buccitelli C., Hoffmann K., Richter A., Legnini I., Ivanov A., Mari T., Del Giudice S., Papies J., Praktiknjo S., Meyer T. F., Müller M. A., Niemeyer D., Hocke A., Selbach M., Akalin A., Rajewsky N., Drosten C. and Landthaler M. (2021). “Transcriptomic profiling of SARS-CoV-2 infected human cell lines identifies HSP90 as target for COVID-19 therapy.” iScience 24(3): 102151.
- Xia H., Cao Z., Xie X., Zhang X., Chen J. Y., Wang H., Menachery V. D., Rajsbaum R. and Shi P. Y. (2020). “Evasion of Type I Interferon by SARS-CoV-2.” Cell Rep 33(1): 108234.
- Xiao Y., Lidsky P. V., Shirogane Y., Aviner R., Wu C. T., Li W., Zheng W., Talbot D., Catching A., Doitsh G., Su W., Gekko C. E., Nayak A., Ernst J. D., Brodsky L., Brodsky E., Rousseau E., Capponi S., Bianco S., Nakamura R., Jackson P. K., Frydman J. and Andino R. (2021). “A defective viral genome strategy elicits broad protective immunity against respiratory viruses.” Cell 184(25): 6037–6051.e6014.
- Zheng G. X., Terry J. M., Belgrader P., Ryvkin P., Bent Z. W., Wilson R., Ziraldo S. B., Wheeler T. D., McDermott G. P., Zhu J., Gregory M. T., Shuga J., Montesclaros L., Underwood J. G., Masquelier D. A., Nishimura S. Y., Schnall-Levin M., Wyatt P. W., Hindson C. M., Bharadwaj R., Wong A., Ness K. D., Beppu L. W., Deeg H. J., McFarland C., Loeb K. R., Valente W. J., Ericson N. G., Stevens E. A., Radich J. P., Mikkelsen T. S., Hindson B. J. and Bielas J. H. (2017). “Massively parallel digital transcriptional profiling of single cells.” Nat Commun 8: 14049.
- Ziv O., Gabryelska M. M., Lun A. T. L., Gebert L. F. R., Sheu-Gruttadauria J., Meredith L. W., Liu Z. Y., Kwok C. K., Qin C. F., MacRae I. J., Goodfellow I., Marioni J. C., Kudla G. and Miska E. A. (2018). “COMRADES determines in vivo RNA structures and interactions.” Nat Methods 15(10): 785–788.
- Ziv O., Price J., Shalamova L., Kamenova T., Goodfellow I., Weber F. and Miska E. A. (2020). “The Short-and Long-Range RNA-RNA Interactome of SARS-CoV-2.” Mol Cell 80(6): 1067–1077.e1065.
- Znaidia M., Demeret C., van der Werf S. and Komarova A. V. (2022). “Characterization of SARS-CoV-2 Evasion: Interferon Pathway and Therapeutic Options.” Viruses 14(6).