Abstract
The SARS-CoV-2 Omicron variants are notorious for their transmissibility, but little is known about their subgenomic RNA (sgRNA) expression. This study applied RNA-seq to delineate the quantitative and qualitative profiles of canonical sgRNAs in respiratory samples from 118 patients infected with Omicron BA.2 and compared them with samples from 338 patients infected with non-variant of concern (non-VOC)-D614G. A characteristic profile, defined by the relative abundance of 9 canonical sgRNAs, was reproduced by both BA.2 and non-VOC-D614G regardless of host gender, age and presence of pneumonia. Remarkably, this profile was lost in samples with low viral load, suggesting that the sgRNA pattern could indicate the viral activity of an individual patient at a specific time point. A characteristic qualitative profile of canonical sgRNAs was also reproduced by both BA.2 and non-VOC-D614G. The presence of a full set of canonical sgRNAs correlated well with crude viral load (AUC = 0.91, 95% CI 0.88–0.94), and sgRNA ORF7b was identified as the best surrogate marker, allowing feasible routine application in characterizing the infection status of individual patients. The further potential of sgRNAs as targets for vaccine and antiviral development is worth pursuing.
Keywords: Coronavirus, SARS-CoV-2, Omicron BA.2, Non-VOC-D614G, RNA-seq, Subgenomic RNA (sgRNA)
Highlights
• SARS-CoV-2 sgRNAs exhibit a characteristic profile regardless of variant.
• A full set of sgRNAs in a respiratory sample predicts viral load well.
• ORF7b sgRNA is the best surrogate marker for the SARS-CoV-2 sgRNA profile.
• sgRNA is a potential indicator of infectivity and response to treatment.
1. Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel coronavirus that emerged in late 2019 and caused a global pandemic of coronavirus disease 2019 (COVID-19) (Lu et al., 2020; Huang et al., 2020). Since its emergence, SARS-CoV-2 has undergone numerous mutations, giving rise to different variants of concern (VOCs) (Viana et al., 2022; Carabelli et al., 2023; Telenti et al., 2022). Among these, the Omicron group, which emerged in November 2021 and carries a high number of mutations in the spike protein, has garnered significant attention (Viana et al., 2022). Although several studies have shown a lower risk of hospitalization, a shorter duration of viral shedding and potentially a reduced risk of severe disease among individuals infected with Omicron variants compared with previous VOCs such as Alpha (B.1.1.7) and Delta (B.1.617.2) (Menni et al., 2022; Suzuki et al., 2022; Hu et al., 2022), the increased transmissibility and enhanced immune escape of Omicron variants highlight the need for continued vigilance and adherence to public health measures.
Mutations in the receptor-binding domain (RBD) of the Omicron spike protein have been shown to enhance evasion of some neutralizing antibodies generated by previous infection or vaccination, which likely contributes to their rapid spread (Qu et al., 2023; Cao et al., 2022; Mannar et al., 2022). In addition to the structural proteins and the genomic RNA of SARS-CoV-2, another important component is the subgenomic RNAs (sgRNAs), a subset of viral RNA molecules that are transcribed from the viral genome and play a crucial role in viral replication and pathogenesis (Sola et al., 2015). SARS-CoV-2 uses a distinctive mechanism known as discontinuous transcription to generate sgRNAs, each of which contains a common 5′-leader sequence of approximately 70 nucleotides fused to a different segment from the 3′ end of the viral genome. These fusions are thought to arise from pausing of negative-sense RNA synthesis at transcription regulatory sequences (TRSs), which are located at the 3′ end of the leader sequence (TRS-L) and preceding each viral gene (body TRS, TRS-B) (Kim et al., 2020; Chen et al., 2022). These sgRNAs are essential for viral replication and assembly and encode various viral proteins, including the four conserved structural proteins, namely spike (S), envelope (E), membrane (M) and nucleocapsid (N), and six accessory proteins (ORF3a, 6, 7a, 7b, 8 and 10). The production and regulation of sgRNAs are therefore central to the life cycle of SARS-CoV-2 and its ability to cause severe disease.
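To make the leader-body fusion concrete, the toy sketch below joins a leader ending in a TRS core to the genome segment downstream of each body TRS. It is an illustration only: the sequences are synthetic, the helper names `find_trs_b` and `fuse_leader_body` are hypothetical, and the example uses the commonly cited SARS-CoV-2 TRS core motif ACGAAC as the sequence shared by TRS-L and TRS-B.

```python
from __future__ import annotations

# Toy model of leader-body fusion; all sequences here are synthetic.
# Assumption: "ACGAAC" is the TRS core shared by TRS-L and every TRS-B.
TRS_CORE = "ACGAAC"


def find_trs_b(genome_3prime: str, core: str = TRS_CORE) -> list[int]:
    """Return 0-based start positions of every TRS core copy in the body region."""
    return [i for i in range(len(genome_3prime) - len(core) + 1)
            if genome_3prime[i:i + len(core)] == core]


def fuse_leader_body(leader: str, genome_3prime: str, trs_b_start: int,
                     core: str = TRS_CORE) -> str:
    """Join the leader (ending in TRS-L) to the body downstream of one TRS-B.

    The shared core is kept once, mimicking how a template switch from TRS-B
    to TRS-L leaves a single copy of the core at the leader-body junction.
    """
    if not leader.endswith(core):
        raise ValueError("leader must end with the TRS core (TRS-L)")
    if genome_3prime[trs_b_start:trs_b_start + len(core)] != core:
        raise ValueError("no TRS core at trs_b_start (TRS-B)")
    return leader + genome_3prime[trs_b_start + len(core):]


if __name__ == "__main__":
    leader = "GGTTT" + TRS_CORE                          # synthetic leader ending in TRS-L
    body = "CCCC" + TRS_CORE + "ATGAAA" + TRS_CORE + "TT"  # two synthetic TRS-B sites
    for site in find_trs_b(body):                        # one sgRNA per TRS-B
        print(site, fuse_leader_body(leader, body, site))
```

Because every product keeps the same leader and runs to the 3′ end, the sgRNAs form a nested, 3′-coterminal set: the further downstream the TRS-B, the shorter the sgRNA.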
Studies of SARS-CoV-2 sgRNA profiles suggest that sgRNA expression levels can vary with disease severity, the site of infection and the viral variant. For example, previous studies, including our own, have identified distinct patterns of sgRNA expression in patients with severe COVID-19 compared with those with mild or moderate disease, indicating that sgRNAs may serve as biomarkers for disease progression and prognosis in COVID-19 patients (Kim et al., 2020; Chen et al., 2022). Higher levels of sgRNA expression were observed in the respiratory tract than in the gastrointestinal tract (Chen et al., 2022), and in B.1.1.7 (Alpha, 20I/501Y.V1) than in B.1.177 (Parker et al., 2022). While there is no dispute that sgRNAs are essential elements of active viral replication (Dimcheff et al., 2021), their clinical application remains unresolved. Some studies based on longitudinal clinical samples have shown a shorter duration of positivity for sgRNA than for genomic RNA and suggested that sgRNA can be a useful proxy for viral infectivity and for monitoring response to antiviral treatment (Santos Bravo et al., 2021; Alonso-Navarro et al., 2023). However, it has also been shown that sgRNA provided no additional value beyond the copy-number threshold determined for genomic RNA (Dimcheff et al., 2021). Furthermore, based on the observations that sgRNA was highly correlated with genomic RNA and that both shared a similar decay pattern, the value of sgRNA as a viability marker has been questioned (Verma et al., 2021). Given this lack of consensus, we conducted the current study on a set of well-characterized longitudinal clinical samples collected from patients infected with different variants, to provide more solid data on this topic.
In this work, we quantitatively and qualitatively delineated the profiles of sgRNAs detected in respiratory specimens from patients infected with two highly transmissible variants, Omicron BA.2 and non-VOC-D614G, and explored potential clinical applications of sgRNA testing.
2. Material and methods
2.1. Ethics approval and patient recruitment
This study included two independent cohorts: (A) 118 Hong Kong COVID-19 patients infected with Omicron BA.2 between January 23, 2022 and April 22, 2022; and (B) a previously published dataset of 338 samples harbouring viruses with the S gene amino acid substitution D614G but not belonging to any variant of concern (VOC), designated “non-VOC-D614G” in this report. These samples were collected from patients infected between March 14, 2020 and February 9, 2021 (Chen et al., 2022). The key demographic and clinical characteristics of the patients are shown in Table 1. Informed consent was obtained from all patients, and the study was approved by the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee (Ref. 2020.076).
Table 1.
Patient characteristics included in this study.
| Characteristic | All samples: BA.2 (N = 118) | All samples: Non-VOC-D614G (N = 338) | P (all samples) | Early samples (≤ 5 days after illness onset): BA.2 (N = 86) | Early samples: Non-VOC-D614G (N = 162) | P (early samples) |
|---|---|---|---|---|---|---|
| Gender | | | 0.748 | | | 0.140 |
| Female | 63 | 187 | | 42 | 96 | |
| Male | 55 | 151 | | 44 | 66 | |
| Age, mean (range) in years | 38 (0–85) | 45 (1–88) | 0.081 | 35 (0–85) | 45 (1–85) | 0.014 |
| Age group, years | | | < 0.001 | | | < 0.001 |
| ≤ 5 | 31 | 20 | | 26 | 9 | |
| 6–74 | 70 | 296 | | 48 | 143 | |
| 75–88 | 17 | 22 | | 12 | 10 | |
| Ct value, mean (range) | 20 (11–33) | 24 (12–33) | < 0.001 | 20 (11–32) | 21 (12–32) | 0.007 |
| Ct group | | | < 0.001 | | | 0.104 |
| 10–15 | 13 | 11 | | 12 | 10 | |
| 16–30 | 102 | 295 | | 71 | 148 | |
| 31–33 | 3 | 32 | | 3 | 4 | |
| Sampling after onset, mean (range) in days | 4 (0–26) | 6 (0–30) | < 0.001 | 3 (0–5) | 2 (0–5) | 0.002 |
| Sampling period | | | < 0.001 | | | |
| Early (≤ 5 days) | 86 | 162 | | 86 | 162 | |
| Late (6–30 days) | 29 | 150 | | – | – | |
| Pneumonia | | | < 0.001 | | | < 0.001 |
| No | 87 | 152 | | 64 | 79 | |
| Yes | 31 | 186 | | 22 | 83 | |
BA.2: Omicron BA.2 variant.
Non-VOC-D614G: viruses with the D614G substitution in the S protein that do not belong to any known VOC.
2.2. SARS-CoV-2 complete genome sequencing
The methods employed in the current study were the same as those used to generate the non-VOC-D614G data in our previous study (Chen et al., 2022). Briefly, respiratory samples from individual patients were subjected to whole viral genome sequencing using probe-hybridization RNA-seq as previously published. Total RNA extracted using the QIAamp Viral RNA Mini Kit (Qiagen, Germany) was converted to double-stranded cDNA with the KAPA HyperPrep Kit (Roche, USA), and dual-indexed Illumina RNA-seq libraries were prepared with the Swift 2S Turbo DNA Library Kit (Swift Biosciences, USA). The libraries were then hybridized with SARS-CoV-2 capture probes provided by IDT (Integrated DNA Technologies, USA) to enrich the viral content. The IDT xGen SARS-CoV-2 Hyb Panel consists of 498 overlapping 120-bp probes that span the entire viral genome and are designed to be insensitive to known and novel mutations arising in diverse strains. During library preparation, we added an excess of probes to ensure complete hybridization of all viral fragments in the sample. Libraries were sequenced with 150-bp paired-end reads on an Illumina NextSeq (Illumina, USA) at the Core Utilities of Cancer Genomics and Pathobiology (CUCGP), Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, and on an Illumina NovaSeq (Illumina, USA) at Novogene (HK) Co., Ltd., Hong Kong, China.
2.3. Bioinformatics data analysis
Raw Illumina reads were subjected to quality control to remove adapters and low-quality sequences using Trimmomatic v0.39 (Bolger et al., 2014), and reads derived from the human genome (GRCh38) were filtered out using HISAT2 v2.2.0 (Kim et al., 2015). Reference-guided alignment against the SARS-CoV-2 reference genome (NC_045512) was performed with STAR v2.7.9a (Dobin et al., 2013) to profile the SARS-CoV-2 transcriptome, using specific parameters as described before (Kim et al., 2020; Chen et al., 2022). Consensus sequences were called using bcftools v1.9 (Li, 2011) and manually checked for ambiguous sites. SARS-CoV-2 genomes with ≥ 80% completeness and a mean coverage of ≥ 30× were retained for template switch junction annotation, and only template switch junctions supported by ≥ 2 reads were retained.
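The retention criteria above can be expressed as simple filters. The sketch below is a minimal illustration, not the authors' pipeline: completeness is assumed to mean the fraction of consensus positions that are not `N`, and `SampleQC`, `completeness`, `passes_genome_qc` and `filter_junctions` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_COMPLETENESS = 0.80      # fraction of the genome with a called base
MIN_MEAN_COVERAGE = 30.0     # mean read depth (x)
MIN_JUNCTION_READS = 2       # reads supporting a template switch junction


@dataclass
class SampleQC:
    sample_id: str
    consensus: str           # consensus genome; "N" marks uncalled positions (assumption)
    mean_coverage: float


def completeness(consensus: str) -> float:
    """Assumed definition: share of consensus positions that are not 'N'."""
    if not consensus:
        return 0.0
    return 1 - consensus.upper().count("N") / len(consensus)


def passes_genome_qc(sample: SampleQC) -> bool:
    """Keep genomes meeting both the completeness and mean-coverage thresholds."""
    return (completeness(sample.consensus) >= MIN_COMPLETENESS
            and sample.mean_coverage >= MIN_MEAN_COVERAGE)


def filter_junctions(counts: dict[tuple[int, int], int],
                     min_reads: int = MIN_JUNCTION_READS) -> dict[tuple[int, int], int]:
    """Keep (5' position, 3' position) junctions supported by >= min_reads reads."""
    return {junction: n for junction, n in counts.items() if n >= min_reads}
```

For example, `filter_junctions({(66, 21551): 5, (70, 23000): 1})` keeps only the canonical S junction.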
2.4. Template switch junction annotation
The template switch junction (SJ) reads were divided into four types (I–IV) based on the deletion positions at the 5′ terminus (the first position) and the 3′ terminus (the end position), as previously described (Chen et al., 2022). Type I was defined when the 5′ terminus was located in the leader transcription regulatory sequence (TRS-L) region at genomic positions 55–85 and the 3′ terminus at the body TRS (TRS-B) near the first AUG of an open reading frame (ORF), representing discontinuous transcription of the major (canonical) sgRNAs (S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8 and N). Type II comprises noncanonical fusions between TRS-L and an unexpected 3′-terminus site within the body of a known gene, encoding partial (IIa) or frameshifted (IIb) proteins. Types III and IV are TRS-L independent and represent inter- and intra-gene junctions, in which the 5′- and 3′-terminus positions are located within the bodies of different genes or of the same gene, respectively.
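The four-way classification can be sketched as a rule-based function over the 5′ and 3′ positions of each junction. This is a simplified sketch under stated assumptions: gene coordinates are taken from the NC_045512 CDS annotation, Type I is recognized by matching the canonical junction sites listed in Table 2 within a hypothetical tolerance of 5 nt, and the split of Type II into IIa and IIb (which would require scanning for the first downstream AUG and its reading frame) is not implemented. The function and constant names are hypothetical.

```python
from __future__ import annotations

# Assumed NC_045512 CDS coordinates (1-based, inclusive); ORF1ab spans ORF1a/ORF1b.
GENES = {
    "ORF1ab": (266, 21555), "S": (21563, 25384), "ORF3a": (25393, 26220),
    "E": (26245, 26472), "M": (26523, 27191), "ORF6": (27202, 27387),
    "ORF7a": (27394, 27759), "ORF7b": (27756, 27887), "ORF8": (27894, 28259),
    "N": (28274, 29533), "ORF10": (29558, 29674),
}
# Canonical (5', 3') junction sites copied from Table 2.
CANONICAL_SITES = {
    "S": (66, 21551), "ORF3a": (67, 25381), "E": (70, 26236), "M": (65, 26467),
    "ORF6": (70, 27040), "ORF7a": (67, 27384), "ORF7b": (72, 27761),
    "ORF8": (66, 27883), "N": (65, 28254),
}
LEADER_WINDOW = (55, 85)   # TRS-L region for the 5' terminus
TOLERANCE = 5              # hypothetical slack (nt) around canonical 3' sites


def gene_at(pos: int) -> str | None:
    """Return the first annotated gene containing pos, or None if intergenic."""
    for name, (start, end) in GENES.items():
        if start <= pos <= end:
            return name
    return None


def classify_junction(five: int, three: int) -> str:
    """Assign a junction to Type I, II, III or IV (or 'unclassified')."""
    if LEADER_WINDOW[0] <= five <= LEADER_WINDOW[1]:      # TRS-L dependent
        if any(abs(three - site3) <= TOLERANCE for _, site3 in CANONICAL_SITES.values()):
            return "I"                                     # canonical leader-body junction
        # Noncanonical leader fusion into a gene body (IIa/IIb not resolved here).
        return "II" if gene_at(three) else "unclassified"
    g5, g3 = gene_at(five), gene_at(three)                 # TRS-L independent
    if g5 is None or g3 is None:
        return "unclassified"
    return "III" if g5 != g3 else "IV"                     # inter- vs intra-gene
```

For instance, `classify_junction(66, 21551)` returns `"I"`, while a junction joining positions in S and N returns `"III"`.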
2.5. Phylogenetic analysis
To assess the phylogenetic relationships of the SARS-CoV-2 variants included in this study, we retrieved 1092 high-quality SARS-CoV-2 complete genomes deposited in the GISAID database (accessed on July 29, 2022) representing the major VOC lineages (Alpha, Beta, Gamma, Delta and Omicron). The concatenated nucleotide sequences of twelve genes (ORF1a, ORF1b, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N and ORF10) were aligned using MAFFT v7.402 (Katoh and Toh, 2010), and a maximum likelihood (ML) tree was constructed from the alignment using RAxML MPI v8.2.12 (Stamatakis, 2006) with optimized parameters.
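Preparing the concatenated twelve-gene sequences for alignment amounts to slicing each gene by its coordinates and joining the pieces in a fixed order. A minimal sketch, assuming 1-based inclusive coordinates supplied from the reference annotation; the helper names `concatenate_genes` and `to_fasta` are hypothetical, and the resulting FASTA would then be the input to an aligner such as MAFFT.

```python
from __future__ import annotations


def concatenate_genes(genome: str, coords: list[tuple[int, int]]) -> str:
    """Concatenate 1-based, inclusive [start, end] gene segments in the given order."""
    parts = []
    for start, end in coords:
        if not 1 <= start <= end <= len(genome):
            raise ValueError(f"invalid coordinates {start}-{end}")
        parts.append(genome[start - 1:end])   # convert to 0-based, end-exclusive slice
    return "".join(parts)


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} as FASTA with fixed-width sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Using the same coordinate list for every genome keeps the concatenated genes in the same order across sequences, which the downstream alignment assumes.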
2.6. RT-PCR quantitation for genomic N gene and canonical subgenomic ORF7b gene
Crude viral load was measured using quantitative real-time reverse transcription-polymerase chain reaction (RT-PCR) targeting the genomic N gene as described before (gN forward primer: 5′- CGA ACT TCT CCT GCT AGA ATG G -3′, gN reverse primer: 5′- TAC CAG ACA TTT TGC TCT CAA GCT -3′, gN probe: 5′- 6-FAM-TT GCT GCT G-ZEN-C TTG ACA GAT T-IABkFQ -3′) (Lui et al., 2020). The canonical sgRNA ORF7b of SARS-CoV-2 Omicron BA.2 in clinical samples was quantified using an in-house RT-PCR assay (sg7b forward primer: 5′- ACA AAC CAA CCA ACT TTC GA -3′, sg7b reverse primer: 5′- AAC AAG GAA TAK CAG AAA GGC -3′, sg7b probe: 5′- 6-FAM-TC TTG TAG A-ZEN-T CTG TTC TCT AAA CGA-IABkFQ -3′). Each RT-PCR reaction contained 5 μL of TaqMan Fast Virus 1-Step Master Mix (Thermo Fisher, Foster City, CA) and 0.4 μmol/L primers and probes in a final volume of 25 μL. Cycling conditions were 25 °C for 2 min, 50 °C for 15 min and 95 °C for 2 min, followed by 45 cycles of 95 °C for 3 s and 58 °C for 30 s, on the StepOne Plus™ Real-Time PCR System (Thermo Fisher, CA). A sample was considered negative if the Ct value exceeded 39.9 cycles.
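The sg7b reverse primer contains the IUPAC ambiguity code K (G or T), so it is in effect a mixture of two sequences. The sketch below expands degenerate primers into their concrete sequences and applies the stated Ct cut-off for calling a sample negative; the function names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
NEGATIVE_CT_CUTOFF = 39.9   # Ct above this value -> negative


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer (spaces ignored)."""
    seq = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


def call_sample(ct: float | None) -> str:
    """Call a reaction; None means no amplification within the run."""
    if ct is None or ct > NEGATIVE_CT_CUTOFF:
        return "negative"
    return "positive"


if __name__ == "__main__":
    for variant in expand_degenerate("AAC AAG GAA TAK CAG AAA GGC"):
        print(variant)
```

Expanding the reverse primer this way makes it easy to check each concrete variant against BA.2 and earlier sequences when reviewing primer coverage.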
2.7. Statistical analysis
Bray–Curtis distances were computed from the relative abundance of canonical sgRNAs, and differences in composition between samples (beta diversity) were tested by permutational multivariate analysis of variance (PERMANOVA) with 9999 permutations using the adonis2 function in the vegan R package. In the effect size analysis, the associations of other variables, including sampling day after illness onset, age, gender, pneumonia and sgORF7b presence, were tested while controlling for viral load (high vs middle vs low), variant (BA.2 vs D614G) and sgRNA pattern (sg9 vs others) by adding these three variables to the model formula. Logistic regression and receiver operating characteristic (ROC) curves with calculation of the area under the ROC curve (AUC) were used to evaluate canonical sgRNAs as potential biomarkers of viral load. Each canonical sgRNA was defined as present or absent in each sample based on its read number, and ROC curves against Ct value were built using the roc function in the pROC R package. Viral loads and read abundance were compared between groups using the Mann–Whitney U test, Kruskal–Wallis rank sum test or linear regression analysis as appropriate, with two-tailed P ≤ 0.05 considered statistically significant. Regression analysis was applied to evaluate the association between viral load and sampling day after illness onset. Sequencing read coverage, subgenomic profiles and template switch junctions were visualized using in-house scripts in R v4.2.1 and ggplot2 v3.3.6.
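For readers less familiar with vegan, the quantities behind the beta-diversity test can be sketched directly. The example below computes a Bray–Curtis dissimilarity matrix from per-sample relative abundances (rows are samples, columns are canonical sgRNAs) and runs a one-way PERMANOVA returning pseudo-F, R² and a permutation P value. It is a simplified stand-in for adonis2 that assumes a single grouping factor; it does not reproduce the multi-term, covariate-adjusted models described above, and the function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def bray_curtis_matrix(x) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of x."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / denom if denom > 0 else 0.0
    return d


def pseudo_f(d: np.ndarray, labels) -> tuple[float, float]:
    """One-way PERMANOVA pseudo-F and R^2 from squared distances."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    f = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f, ss_between / ss_total


def permanova(d: np.ndarray, labels, n_perm: int = 9999, seed: int = 0):
    """Return (pseudo-F, R^2, permutation P) by shuffling group labels."""
    rng = np.random.default_rng(seed)
    f_obs, r2 = pseudo_f(d, labels)
    labels = np.asarray(labels)
    hits = sum(pseudo_f(d, rng.permutation(labels))[0] >= f_obs for _ in range(n_perm))
    return f_obs, r2, (hits + 1) / (n_perm + 1)
```

The R² returned here is the share of total dispersion explained by the grouping, the same kind of effect-size quantity plotted for each variable in the effect-size analysis.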
3. Results
3.1. Study subjects and genome dataset
In this study, complete viral genomes were obtained from the respiratory samples of 118 patients infected with BA.2. These sequences were analyzed together with the 338 viral genomes obtained from patients infected with non-VOC-D614G in our previous study (Supplementary Table S1) (Chen et al., 2022). There was no statistically significant difference in mean age or gender between the two cohorts, whereas samples from patients infected with BA.2 were collected earlier after illness onset (mean 4 vs 6 days, P < 0.001), had higher crude viral loads (mean Ct 20 vs 24, P < 0.001) and had a lower proportion of pneumonia (26% vs 55%, P < 0.001) (Table 1). As expected, viral sequences from the two cohorts were phylogenetically distinct, and the non-VOC-D614G viruses did not belong to any known VOC (Fig. 1).
Fig. 1.
Phylogenetic tree showing the classification and relationships of the SARS-CoV-2 variants analyzed in this study. A maximum likelihood (ML) tree was constructed from the concatenated nucleotide sequences of twelve genes (ORF1a, ORF1b, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N and ORF10) aligned using MAFFT v7.402. Non-VOC-D614G, viruses with the S gene amino acid substitution D614G but not belonging to any variant of concern (VOC).
3.2. Quantitative profile of SARS-CoV-2 subgenomic RNAs
3.2.1. Relative proportion of canonical subgenomic RNAs
We first examined the quantitative profile of sgRNAs based on the relative proportion and relative abundance of each sgRNA detected. According to the classification of SARS-CoV-2 sgRNAs that we reported previously (Chen et al., 2022), over half (58.1%) of the junction-spanning reads of BA.2 variants were grouped as Type I, representing canonical sgRNAs for the nine proteins (S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8 and N) (Fig. 2A). The relative proportions of the 9 canonical sgRNAs were similar between BA.2 and non-VOC-D614G (Fig. 2B). Quantitatively, sgRNA N was the most frequently detected (relative proportion of 27.14% and 22.74% in BA.2 and non-VOC-D614G, respectively), followed by S, ORF7a, ORF3a, M, ORF8, E, ORF6 and ORF7b (Table 2). ORF3b, encoded by a short 22-codon gene (MMPTIFFAGILIVTTIVYLTIV, nucleotides 25814–25882), has been reported to act as an interferon antagonist, suppressing the type I interferon response through inhibition of IRF3 (Konno et al., 2020). However, no template switch junction reads potentially encoding the canonical ORF3b were detected in the surveyed samples. Similarly, no template switch junction sequence with the expected structure and length for ORF10 was found in any surveyed sample. In both BA.2 and non-VOC-D614G samples, the majority of junction-spanning reads of noncanonical transcripts (Types II, III and IV) were TRS-L independent (intra- and inter-gene), and 88.3% (536,070/607,072) of the events were singletons observed in only one sample, suggesting complex but random noncanonical sgRNA transcription that may give rise to frameshifted and fused proteins.
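Counting singletons requires deciding what an "event" is. A minimal sketch, assuming an event is a distinct pair of 5′ and 3′ junction positions and that a singleton is an event seen in exactly one sample; the function name `singleton_summary` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def singleton_summary(observations: Iterable[tuple[str, tuple[int, int]]]) -> tuple[int, int]:
    """Return (number of singleton events, number of distinct events).

    observations: (sample_id, (five_prime, three_prime)) pairs, one per
    junction detected in a sample.
    """
    samples_per_event: dict[tuple[int, int], set[str]] = defaultdict(set)
    for sample_id, junction in observations:
        samples_per_event[junction].add(sample_id)
    n_events = len(samples_per_event)
    n_singletons = sum(1 for samples in samples_per_event.values() if len(samples) == 1)
    return n_singletons, n_events


if __name__ == "__main__":
    # The proportion reported in the text, formatted to one decimal place.
    print(f"{536070 / 607072:.1%}")
```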
Fig. 2.
SARS-CoV-2 sgRNAs in BA.2 and non-VOC-D614G variants detected in Hong Kong. Reads observed in 10 or more samples are shown, with curves in gradient colours indicating differential relative abundance. Non-VOC-D614G, viruses with the S gene amino acid substitution D614G but not belonging to any variant of concern (VOC).
Table 2.
Canonical subgenomic RNAs (sgRNAs) between SARS-CoV-2 Omicron BA.2 and non-VOC-D614G.
| sgRNA name | Template switch junction, 5′ | Template switch junction, 3′ | Junction length (nt) | Start codon offset (nt) | Detection rate (%) (BA.2, non-VOC-D614G) | Relative abundance (mean %) (BA.2, non-VOC-D614G) |
|---|---|---|---|---|---|---|
| S | 66 | 21551 | 21486 | 11 | 89.8, 86.1 | 6.67, 7.42 |
| ORF3a | 67 | 25381 | 25315 | 11 | 89.8, 84.6 | 6.57, 5.43 |
| E | 70 | 26236 | 26167 | 8 | 78.8, 80.0 | 1.59, 3.02 |
| M | 65 | 26467 | 26403 | 55 | 89.0, 87.3 | 2.00, 5.20 |
| ORF6 | 70 | 27040 | 26971 | 161 | 82.2, 71.6 | 0.97, 1.59 |
| ORF7a | 67 | 27384 | 27318 | 9 | 89.8, 87.9 | 5.31, 7.41 |
| ORF7b | 72 | 27761 | 27690 | −6 | 56.8, 45.3 | 0.05, 0.15 |
| ORF8 | 66 | 27883 | 27818 | 10 | 89.8, 82.8 | 2.57, 3.96 |
| N | 65 | 28254 | 28190 | 19 | 99.2, 95.6 | 26.12, 21.71 |
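The two derived columns of Table 2 are not defined explicitly. Assuming that junction length counts the deleted nucleotides inclusive of both termini, and that the start codon offset is the number of nucleotides between the 3′ terminus and the ORF's AUG, the sketch below recomputes both columns from the junction positions and the NC_045512 CDS start positions; under these assumptions it reproduces every row. A negative offset, as for ORF7b, means the junction lies inside the ORF, downstream of its annotated AUG.

```python
# Assumed NC_045512 CDS start (AUG) positions, 1-based.
ORF_START = {
    "S": 21563, "ORF3a": 25393, "E": 26245, "M": 26523, "ORF6": 27202,
    "ORF7a": 27394, "ORF7b": 27756, "ORF8": 27894, "N": 28274,
}
# (5' terminus, 3' terminus) of each canonical junction, copied from Table 2.
JUNCTIONS = {
    "S": (66, 21551), "ORF3a": (67, 25381), "E": (70, 26236), "M": (65, 26467),
    "ORF6": (70, 27040), "ORF7a": (67, 27384), "ORF7b": (72, 27761),
    "ORF8": (66, 27883), "N": (65, 28254),
}


def junction_length(five: int, three: int) -> int:
    """Deleted nucleotides, counting both termini (assumed definition)."""
    return three - five + 1


def start_codon_offset(three: int, aug: int) -> int:
    """Nucleotides between the 3' terminus and the AUG (assumed definition)."""
    return aug - three - 1


if __name__ == "__main__":
    for name, (five, three) in JUNCTIONS.items():
        print(name, junction_length(five, three), start_codon_offset(three, ORF_START[name]))
```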
3.2.2. Relative abundance of canonical subgenomic RNAs and viral load
Fig. 3 shows the relative abundance of sgRNAs according to crude viral load measured from the genomic N gene. Taking all samples together, sgRNA N was the most abundant (mean relative abundance, 22.85%), whereas ORF7b was the least expressed (0.12%) (Fig. 3A). This pattern was reproduced in both BA.2 and non-VOC-D614G samples, although BA.2 showed a slightly higher expression of N (26.12% vs 21.71%, P = 0.002) and a lower, though not significantly different, level of ORF7b (0.05% vs 0.15%, P = 0.555) compared with non-VOC-D614G (Fig. 3B, Table 2). Further subgroup analysis according to viral load showed that the abundance patterns were still similar between BA.2 and non-VOC-D614G among samples with high (Ct: 10–15) and moderate (Ct: 16–30) viral load, whereas more variation was observed in samples with low viral load (Ct: 31–33) (Fig. 3C and D). One possible reason is that samples with low viral load mainly comprised fragments of non-functional sequences and were therefore more variable in their relative abundance.
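The relative abundance used in Fig. 3 is a per-sample ratio of junction reads. The sketch below computes it for the nine canonical sgRNAs, averages profiles across samples and assigns the Ct bands used in the text. Treating 15 and 30 as inclusive upper bounds for non-integer Ct values is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean

CANONICAL = ("S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N")


def relative_abundance(sg_reads: dict[str, int], total_junction_reads: int) -> dict[str, float]:
    """Percentage of all junction (splice) reads assigned to each canonical sgRNA."""
    if total_junction_reads <= 0:
        raise ValueError("total_junction_reads must be positive")
    return {g: 100.0 * sg_reads.get(g, 0) / total_junction_reads for g in CANONICAL}


def viral_load_group(ct: float) -> str:
    """Ct bands from the text: high 10-15, moderate 16-30, low 31-33 (boundaries assumed)."""
    if ct <= 15:
        return "high"
    if ct <= 30:
        return "moderate"
    return "low"


def mean_profile(profiles: list[dict[str, float]]) -> dict[str, float]:
    """Mean relative abundance of each canonical sgRNA across samples."""
    return {g: mean(p[g] for p in profiles) for g in CANONICAL}
```

Grouping samples by `viral_load_group` before calling `mean_profile` gives the per-band profiles compared in the subgroup analysis.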
Fig. 3.
Relative abundance in proportion (%) of canonical sgRNAs of BA.2 and non-VOC-D614G variants by crude viral load. A Overall expression levels of canonical sgRNAs inferred from the RNA-seq data. B Differential expression of canonical sgRNAs between BA.2 and non-VOC-D614G variants. C Overall expression levels of canonical sgRNAs by crude viral load. D Differential expression of canonical sgRNAs between BA.2 and non-VOC-D614G variants by crude viral load. Crude viral load was measured by real-time PCR targeting the genomic N gene. sgRNA proportion was calculated as the number of splice reads of each canonical sgRNA divided by the total number of splice reads. D614G∗, non-VOC-D614G, viruses with the S gene amino acid substitution D614G but not belonging to any variant of concern (VOC).
3.2.3. Relative abundance of canonical sgRNAs and host properties
This analysis was restricted to the 248 early samples (86 BA.2 and 162 non-VOC-D614G) collected within 5 days of illness onset, to account for natural changes across the course of illness. As shown in Fig. 4, a characteristic pattern with high abundance of sgRNA N, moderate abundance of S, ORF3a, M and ORF7a, and low abundance of E, ORF6 and ORF7b was observed regardless of age, gender and presence of pneumonia. Moreover, this characteristic pattern was reproduced in both BA.2 and non-VOC-D614G samples. This observation suggests that the relative abundance of canonical sgRNAs is an intrinsic property of the virus, required to maintain its life cycle, and is not affected by host or variant properties.
Fig. 4.
Relative abundance in proportion (%) of canonical sgRNAs by sampling day after illness onset (A), patients' age (B), gender (C) and pneumonia (D). D614G∗, non-VOC-D614G, viruses with the S gene amino acid substitution D614G, but not belonging to any variants of concern (VOC).
3.3. Qualitative profile of canonical subgenomic RNAs
3.3.1. A full set of sgRNAs
We then took another approach to analyzing the pattern of canonical sgRNAs by examining positivity, i.e. whether each particular sgRNA was detectable or not. Overall, 46.5% (212/456) of samples had a full set of the 9 canonical sgRNAs detectable (designated as pattern “sg9”). The presence of sg9 showed a significant positive association with crude viral load (mean gN Ct value: 19.0, range: 11.3–27.7) when compared with samples containing 8 or fewer detectable canonical sgRNAs (“sg0”–“sg8”) (26.3, 15.9–32.5) (P < 0.001) (Fig. 5A). This correlation with viral load was reproduced for both BA.2 and non-VOC-D614G samples (Fig. 5B). Samples collected later in the course of infection (6–30 days after illness onset) were less likely to contain a full set of detectable sgRNAs (sg9 pattern), suggesting that these samples mainly harboured non-replicating, fragmented genomes (Fig. 5C and D). There was no significant difference in the proportion of “sg9” between BA.2 and non-VOC-D614G (61.6% vs 64.2%, P = 0.782) among early samples collected within five days after illness onset.
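The qualitative pattern labels can be derived by counting the detectable canonical sgRNAs in each sample. In this sketch the detection threshold of 2 junction reads is an assumption borrowed from the junction filter in the bioinformatics methods, not a stated criterion, and the function names are hypothetical.

```python
from __future__ import annotations

CANONICAL = ("S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N")
MIN_READS = 2  # assumed detection threshold (junction reads per sgRNA)


def detected(sg_reads: dict[str, int], min_reads: int = MIN_READS) -> set[str]:
    """Canonical sgRNAs with at least min_reads supporting junction reads."""
    return {g for g in CANONICAL if sg_reads.get(g, 0) >= min_reads}


def sg_pattern(sg_reads: dict[str, int], min_reads: int = MIN_READS) -> str:
    """Label a sample 'sg0'..'sg9' by its number of detectable canonical sgRNAs."""
    return f"sg{len(detected(sg_reads, min_reads))}"
```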
Fig. 5.
Canonical sgRNA positivity patterns by crude viral load. A Detection of canonical sgRNA patterns positively associated with crude viral load measured by the genomic N gene. B Canonical sgRNA positivity patterns between BA.2 and non-VOC-D614G. C Detection of canonical sgRNA patterns by sampling day after illness onset. D Detection of canonical sgRNA patterns between BA.2 and non-VOC-D614G by sampling day after illness onset. “sg0”–“sg9” indicate the number of detectable canonical sgRNAs. For example, “sg9” indicates a full-set of the nine canonical sgRNAs, while “sg0” means no detectable canonical sgRNA. D614G∗, non-VOC-D614G, viruses with the S gene amino acid substitution D614G, but not belonging to any variants of concern (VOC).
3.3.2. Surrogate marker of a full set of sgRNAs
Since the presence of a full set of sgRNAs (sg9) appeared to be a potential marker for clinical application, we further analyzed which individual sgRNA could serve as a reliable surrogate marker for the full-set pattern. Based on detection rate, sgRNA ORF7b was the most specific, as it was undetectable in the majority (96.7%) of samples without a full set of sgRNAs (sg0–8), followed by ORF6 (48.0%), E (38.1%), ORF8 (28.7%), ORF3a (26.2%), S (24.1%), M (23.0%), ORF7a (21.7%) and N (6.6%). This pattern of positivity was reproduced for both BA.2 and non-VOC-D614G and was retained across age, gender and presence of pneumonia (Fig. 6A, B and C). For both BA.2 and non-VOC-D614G, the detection rate of sgRNA ORF7b dropped most steeply from high to low viral load (Fig. 6D) and from early to late samples (Fig. 6E). These findings further support sgORF7b as the best indicator of the changing pattern among all canonical sgRNAs.
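The ranking above can be reproduced from per-sample read counts by restricting to samples that lack the full set and computing, for each sgRNA, how often it is undetected. A minimal sketch with a hypothetical function name and the same assumed 2-read detection threshold:

```python
from __future__ import annotations

CANONICAL = ("S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N")
MIN_READS = 2  # assumed detection threshold (junction reads per sgRNA)


def undetected_rate_among_incomplete(samples: list[dict[str, int]],
                                     min_reads: int = MIN_READS) -> dict[str, float]:
    """Among samples lacking the full set (not sg9), % in which each sgRNA is undetected."""
    incomplete = [s for s in samples
                  if sum(s.get(g, 0) >= min_reads for g in CANONICAL) < len(CANONICAL)]
    if not incomplete:
        return {}
    return {g: 100.0 * sum(s.get(g, 0) < min_reads for s in incomplete) / len(incomplete)
            for g in CANONICAL}
```

Sorting the returned dictionary by value in descending order gives the specificity ranking; the sgRNA at the top is the one whose detection most strongly implies a full set.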
Fig. 6.
Detection rate (%) of individual sgRNA by patients' age (A), gender (B), pneumonia (C), crude viral load by genomic N gene (D), and sampling day after illness onset (E). D614G∗, non-VOC-D614G, viruses with the S gene amino acid substitution D614G, but not belonging to any variants of concern (VOC).
We computed Bray–Curtis distances based on the relative abundance of canonical sgRNAs and compared composition between samples using permutational multivariate analysis of variance (PERMANOVA) (Fig. 7A). As expected, viral load, Omicron variant and the sg9 sgRNA pattern were the independent variables with the largest effects on overall sgRNA abundance, whereas the effect of sampling day after illness onset was limited (Fig. 7B). Canonical sgRNA abundance was not significantly affected by age, gender, pneumonia or sgORF7b when viral load, variant and sg9 pattern were included in the model formula. It is worth noting that 97% (233/241) of samples without a full set of sgRNAs (sg9) had no detectable sgORF7b (P < 0.001); this close association with the sg9 pattern likely explains why sgORF7b, which had a significant effect (P < 0.001) on canonical sgRNA abundance in the univariate PERMANOVA test, was no longer significant after adjustment.
Fig. 7.
Clustering of canonical sgRNAs of SARS-CoV-2. A Principal coordinate analysis based on Bray-Curtis distance metrics inferred from the relative abundance of canonical sgRNAs of SARS-CoV-2. Beta diversity among groups was evaluated using permutational multivariate analysis of variance (PERMANOVA) with 9999 permutations. D614G∗, non-VOC-D614G, viruses with the S gene amino acid substitution D614G, but not belonging to any variants of concern (VOC). B Effect size (R² value) of variables on the composition of canonical sgRNAs in the surveyed samples. Viral load, variant, and sgRNA pattern sg9 were adjusted for covariates (sampling day from illness onset, age, gender, pneumonia and sgORF7b presence) in adonis2. ∗, P < 0.05; ∗∗, P < 0.01; ∗∗∗, P < 0.001; ∗∗∗∗, P < 0.0001; ns, not significant.
We further performed receiver operating characteristic (ROC) analysis to identify the best surrogate sgRNA for predicting viral load. The presence of a full set of the 9 canonical sgRNAs (“sg9” pattern) correlated well with crude viral load (AUC = 0.91, 95% CI 0.88–0.94); sgRNA ORF7b (0.90, 0.87–0.93) was the best surrogate for this pattern, followed closely by sgRNA ORF7a (0.89, 0.84–0.93) (Fig. 8A), and these correlations were reproduced in both BA.2 and non-VOC-D614G (Fig. 8B). To validate this performance, we developed an sgRNA ORF7b-specific quantitative RT-PCR and observed a positive correlation between sgORF7b levels and crude viral load measured by gN (R² = 0.77, P < 0.001) (Fig. 8C), whereas sgORF7b levels decreased with sampling day after illness onset (R² = 0.10, P = 0.010).
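With a binary marker (present or absent) and gN Ct as the continuous variable, one reading of the ROC set-up is that the AUC is the probability that a marker-positive sample has a lower Ct (higher viral load) than a marker-negative sample, which equals the Mann–Whitney U statistic divided by the product of the group sizes. The sketch computes this directly with ties counted as one half; it is not the pROC implementation, the function name is hypothetical, and pROC's choice of direction should be checked against its documentation.

```python
from __future__ import annotations


def auc_presence_vs_ct(ct_detected: list[float], ct_undetected: list[float]) -> float:
    """AUC = P(Ct of a marker-positive sample < Ct of a marker-negative sample).

    Lower Ct means higher viral load; ties contribute 1/2. This equals the
    Mann-Whitney U statistic divided by len(ct_detected) * len(ct_undetected).
    """
    if not ct_detected or not ct_undetected:
        raise ValueError("both groups need at least one sample")
    score = 0.0
    for a in ct_detected:
        for b in ct_undetected:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(ct_detected) * len(ct_undetected))
```

An AUC of 0.5 means presence of the marker carries no information about Ct, whereas 1.0 means every marker-positive sample has a lower Ct than every marker-negative one.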
Fig. 8.
Detection of individual sgRNA related to crude viral load of the genomic N gene. A Receiver operating characteristic (ROC) curve with calculation of the area under the ROC curve (AUC) of each sgRNA. B AUC of sgORF7b and sg9 positivity in relation to crude viral load between BA.2 and non-VOC-D614G. C Correlations of sgORF7b measured by real-time PCR with crude viral load measured from the genomic N gene, and sampling day after illness onset. D614G∗, non-VOC-D614G, viruses with the S gene amino acid substitution D614G, but not belonging to any variants of concern (VOC).
4. Discussion
While the rapid global spread of the SARS-CoV-2 Omicron variants is remarkable, little is known about their subgenomic RNA transcription pattern compared with previously circulating strains. In this study, we examined a cohort infected with Omicron BA.2 and compared it with a cohort infected with an earlier version of the pandemic virus, non-VOC-D614G, which was the most prominent variant spreading rapidly across the globe during the early phase of the pandemic (Chen et al., 2021). Overall, our results are in line with previous reports that the quantitative profile of sgRNAs, as depicted by their relative abundance, is a unique and consistent characteristic of SARS-CoV-2 (Kim et al., 2020; Chen et al., 2022; Alexandersen et al., 2020). Furthermore, we demonstrated that the characteristic profile of sgRNAs was reproduced in two phylogenetically distinct groups, BA.2 and non-VOC-D614G, regardless of host properties. This suggests that the characteristic composition and structure of canonical sgRNAs are indispensable for maintaining the viral life cycle. Of note, the characteristic sgRNA profile was lost in samples with low viral load, probably because these samples mainly harboured fragmented rather than full replicative genomes. Therefore, one potential application is to monitor changes in the quantitative pattern (relative abundance) of canonical sgRNAs to infer viral activity and to predict the infectivity, clinical progress or response to therapy of individual patients.
We found that qualitative assessment, using a sensitive method to determine the presence or absence of sgRNAs, has good clinical potential. For both BA.2 and non-VOC-D614G, detection of all nine canonical sgRNAs predicted a high crude viral load in the sample. Performing multiple tests to cover all sgRNAs is clearly not a practical approach in clinical settings. In this regard, we found that a single sgRNA, sgORF7b, could serve as a good surrogate marker. Our observations suggest that sgRNA viral load based on sgORF7b could be a more specific and informative marker of infection status than crude viral load measured from the genomic N gene or other genomic targets commonly used in current diagnostic assays. The main limitation of this study is the lack of virus isolation to investigate the viability of viruses in the samples. Further investigations based on in vitro and in vivo models to explore additional clinical applications of sgRNA measurement are worth pursuing (Alexandersen et al., 2020; Wölfel et al., 2020; van Kampen et al., 2021).
In conclusion, our study used a probe-hybridization RNA-seq approach to delineate and compare the sgRNAs expressed by the SARS-CoV-2 Omicron BA.2 variant and by non-VOC-D614G viruses. The sgRNA profile has consistent features that could be applied to improve the clinical value of viral testing. In addition to being markers of infection status, sgRNAs may be potential targets for vaccine development and antiviral intervention. Targeting sgRNA synthesis or stability could disrupt viral replication and reduce viral load, potentially leading to better clinical outcomes (Wong et al., 2021).
5. Conclusions
SARS-CoV-2 encodes an essential set of sgRNAs that exhibit a characteristic profile, which is retained regardless of variant and host properties. Delineating such profiles, or detecting surrogate markers such as sgORF7b in clinical samples, should carry good clinical value. Further studies in this direction are worthwhile.
Data availability
All short reads assembled to the reference genome have been deposited to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) (Bioproject ID: PRJNA1010395). The consensus complete genome sequences have been deposited to the Global Initiative on Sharing All Influenza Data (GISAID) (gisaid_accession: from EPI_ISL_18147051 to EPI_ISL_18147168).
Ethics statement
This study was approved by The Joint Chinese University of Hong Kong – New Territories East Cluster Clinical Research Ethics Committee (Ref. No.: 2020.076). Informed consent was obtained from all patients.
Author contributions
Zigui Chen: conceptualization, formal analysis, writing-original draft. Rita Way Yin Ng: data curation, writing-review & editing. Grace Lui: data curation, investigation, writing-review & editing. Lowell Ling: data curation, investigation, writing-review & editing. Agnes SY Leung: data curation, investigation, writing-review & editing. Chit Chow: methodology, writing-review & editing. Siaw Shi Boon: methodology, writing-review & editing. Wendy CS Ho: project administration, writing-review & editing. Maggie Haitian Wang: validation, writing-review & editing. Renee Wan Yi Chan: validation, writing-review & editing. Albert Martin Li: data curation, investigation, writing-review & editing. David Shu Cheong Hui: data curation, investigation, writing-review & editing. Paul Kay Sheung Chan: conceptualization, funding acquisition, supervision, writing-original draft.
Conflicts of interest
PKSC received honoraria from Merck Sharp and Dohme, GlaxoSmithKline, Moderna and Pfizer for serving as a speaker, advisor or consultant. MHW is a shareholder of Beth Bioinformatics Co. Ltd.
Acknowledgements
The authors would like to thank the anonymous participants who provided samples for this study. We also thank the Core Utilities of Cancer Genomics and Pathobiology (CUCGP) at Department of Anatomical and Cellular Pathology of the Chinese University of Hong Kong for the service of RNA-seq library preparation and next-generation sequencing.
The study was supported by the Food and Health Bureau, Hong Kong SAR Government (reference no. COVID19F06).
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.virs.2024.01.010.
Appendix A. Supplementary data
References
- Alexandersen S., Chamings A., Bhatta T.R. SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication. Nat. Commun. 2020;11:6059. doi: 10.1038/s41467-020-19883-7.
- Alonso-Navarro R., Cuesta G., Santos M., Cardozo C., Rico V., Garcia-Pouton N., et al. Qualitative subgenomic RNA to monitor the response to remdesivir in hospitalized patients with coronavirus disease 2019: impact on the length of hospital stay and mortality. Clin. Infect. Dis. 2023;76:32–38. doi: 10.1093/cid/ciac760.
- Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- Cao Y., Wang J., Jian F., Xiao T., Song W., Yisimayi A., et al. Omicron escapes the majority of existing SARS-CoV-2 neutralizing antibodies. Nature. 2022;602:657–663. doi: 10.1038/s41586-021-04385-3.
- Carabelli A.M., Peacock T.P., Thorne L.G., Harvey W.T., Hughes J., COVID-19 Genomics UK Consortium, et al. SARS-CoV-2 variant biology: immune escape, transmission and fitness. Nat. Rev. Microbiol. 2023;21:162–177. doi: 10.1038/s41579-022-00841-7.
- Chen Z., Chong K.C., Wong M.C.S., Boon S.S., Huang J., Wang M.H., et al. A global analysis of replacement of genetic variants of SARS-CoV-2 in association with containment capacity and changes in disease severity. Clin. Microbiol. Infect. 2021;27:750–757. doi: 10.1016/j.cmi.2021.01.018.
- Chen Z., Ng R.W.Y., Lui G., Ling L., Chow C., Yeung A.C.M., et al. Profiling of SARS-CoV-2 subgenomic RNAs in clinical specimens. Microbiol. Spectr. 2022;10. doi: 10.1128/spectrum.00182-22.
- Dimcheff D.E., Valesano A.L., Rumfelt K.E., Fitzsimmons W.J., Blair C., Mirabelli C., et al. Severe acute respiratory syndrome coronavirus 2 total and subgenomic RNA viral load in hospitalized patients. J. Infect. Dis. 2021;28:1287–1293. doi: 10.1093/infdis/jiab215.
- Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- Hu J., Peng P., Cao X., Wu K., Chen J., Wang K., et al. Increased immune escape of the new SARS-CoV-2 variant of concern Omicron. Cell. Mol. Immunol. 2022;19:293–295. doi: 10.1038/s41423-021-00836-z.
- Huang C., Wang Y., Li X., Ren L., Zhao J., Hu Y., et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506. doi: 10.1016/S0140-6736(20)30183-5.
- Katoh K., Toh H. Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics. 2010;26:1899–1900. doi: 10.1093/bioinformatics/btq224.
- Kim D., Langmead B., Salzberg S.L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.
- Kim D., Lee J.Y., Yang J.S., Kim J.W., Kim V.N., Chang H. The architecture of SARS-CoV-2 transcriptome. Cell. 2020;181:914–921.e10. doi: 10.1016/j.cell.2020.04.011.
- Konno Y., Kimura I., Uriu K., Fukushi M., Irie T., Koyanagi Y., et al. SARS-CoV-2 ORF3b is a potent interferon antagonist whose activity is increased by a naturally occurring elongation variant. Cell Rep. 2020;32:108185. doi: 10.1016/j.celrep.2020.108185.
- Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
- Lu R., Zhao X., Li J., Niu P., Yang B., Wu H., et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020;395:565–574. doi: 10.1016/S0140-6736(20)30251-8.
- Lui G., Ling L., Lai C.K., Tso E.Y., Fung K.S., Chan V., et al. Viral dynamics of SARS-CoV-2 across a spectrum of disease severity in COVID-19. J. Infect. 2020;81:318–356. doi: 10.1016/j.jinf.2020.04.014.
- Mannar D., Saville J.W., Zhu X., Srivastava S.S., Berezuk A.M., Tuttle K.S., et al. SARS-CoV-2 Omicron variant: antibody evasion and cryo-EM structure of spike protein-ACE2 complex. Science. 2022;375:760–764. doi: 10.1126/science.abn7760.
- Menni C., Valdes A.M., Polidori L., Antonelli M., Penamakuri S., Nogal A., et al. Symptom prevalence, duration, and risk of hospital admission in individuals infected with SARS-CoV-2 during periods of omicron and delta variant dominance: a prospective observational study from the ZOE COVID Study. Lancet. 2022;399:1618–1624. doi: 10.1016/S0140-6736(22)00327-0.
- Parker M.D., Stewart H., Shehata O.M., Lindsey B.B., Shah D.R., Hsu S., et al. Altered subgenomic RNA abundance provides unique insight into SARS-CoV-2 B.1.1.7/Alpha variant infections. Commun. Biol. 2022;5:666. doi: 10.1038/s42003-022-03565-9.
- Qu P., Evans J.P., Faraone J.N., Zheng Y.M., Carlin C., Anghelina M., et al. Enhanced neutralization resistance of SARS-CoV-2 Omicron subvariants BQ.1, BQ.1.1, BA.4.6, BF.7, and BA.2.75.2. Cell Host Microbe. 2023;31:9–17.e3. doi: 10.1016/j.chom.2022.11.012.
- Santos Bravo M., Nicolás D., Berengua C., Fernandez M., Hurtado J.C., Tortajada M., et al. Severe acute respiratory syndrome coronavirus 2 normalized viral loads and subgenomic RNA detection as tools for improving clinical decision making and work reincorporation. J. Infect. Dis. 2021;224:1325–1332. doi: 10.1093/infdis/jiab394.
- Sola I., Almazán F., Zúñiga S., Enjuanes L. Continuous and discontinuous RNA synthesis in coronaviruses. Annu. Rev. Virol. 2015;2:265–288. doi: 10.1146/annurev-virology-100114-055218.
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
- Suzuki R., Yamasoba D., Kimura I., Wang L., Kishimoto M., Ito J., et al. Attenuated fusogenicity and pathogenicity of SARS-CoV-2 Omicron variant. Nature. 2022;603:700–705. doi: 10.1038/s41586-022-04462-1.
- Telenti A., Hodcroft E.B., Robertson D.L. The evolution and biology of SARS-CoV-2 variants. Cold Spring Harb. Perspect. Med. 2022;12:a041390. doi: 10.1101/cshperspect.a041390.
- van Kampen J.J.A., van de Vijver D.A.M.C., Fraaij P.L.A., Haagmans B.L., Lamers M.M., Okba N., et al. Duration and key determinants of infectious virus shedding in hospitalized patients with coronavirus disease-2019 (COVID-19). Nat. Commun. 2021;12:267. doi: 10.1038/s41467-020-20568-4.
- Verma R., Kim E., Martínez-Colón G.J., Jagannathan P., Rustagi A., Parsonnet J., et al. SARS-CoV-2 subgenomic RNA kinetics in longitudinal clinical samples. Open Forum Infect. Dis. 2021;8:ofab310. doi: 10.1093/ofid/ofab310.
- Viana R., Moyo S., Amoako D.G., Tegally H., Scheepers C., Althaus C.L., et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature. 2022;603:679–686. doi: 10.1038/s41586-022-04411-y.
- Wölfel R., Corman V.M., Guggemos W., Seilmaier M., Zange S., Müller M.A., et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581:465–469. doi: 10.1038/s41586-020-2196-x.
- Wong C.H., Ngan C.Y., Goldfeder R.L., Idol J., Kuhlberg C., Maurya R., et al. Reduced subgenomic RNA expression is a molecular indicator of asymptomatic SARS-CoV-2 infection. Commun. Med. 2021;1:33. doi: 10.1038/s43856-021-00034-y.