Journal of Virology. 2018 Dec 10;93(1):e01342-18. doi: 10.1128/JVI.01342-18

Going the Distance: Optimizing RNA-Seq Strategies for Transcriptomic Analysis of Complex Viral Genomes

Daniel P Depledge, Ian Mohr, Angus C Wilson
Editor: Felicia Goodrum
PMCID: PMC6288342  PMID: 30305358


KEYWORDS: PacBio, RNA sequencing, MinION, nanopore sequencing, next-generation sequencing, single-cell RNA-seq, transcriptome, virus

ABSTRACT

Transcriptome profiling has become routine in studies of many biological processes. However, the favored approaches such as short-read Illumina RNA sequencing are giving way to long-read sequencing platforms better suited to interrogating the complex transcriptomes typical of many RNA and DNA viruses. Here, we provide a guide—tailored to molecular virologists—to the ins and outs of viral transcriptome sequencing and discuss the strengths and weaknesses of the major RNA sequencing technologies as tools to analyze the abundance and diversity of the viral transcripts made during infection.

INTRODUCTION

When embarking on any experimental study, it is vital to frame the question(s) being asked carefully and to understand exactly what information different methodologies and approaches provide. This is especially relevant when profiling viral transcriptomes by next-generation sequencing (NGS). Careful planning pays dividends, and four key decisions should be made at the outset. The first is whether the primary focus of the study is on viral transcripts, host transcripts, or both. The second is the choice of viral strain and host model (in vivo, ex vivo, or in vitro), both of which can have a large impact on data output and on the resulting biological observations and interpretations. The third is whether the goal is to document the diversity of RNAs present (define the transcript isoform landscape) or to quantify the relative abundance of specific transcripts (perform gene expression profiling), the latter often serving as a surrogate for the more difficult task of profiling protein products. Qualitative measures of diversity might take the form of mapping transcription start site (TSS) usage or detecting posttranscriptional processing events such as alternative splicing or alternative polyadenylation. The final decision is whether to incorporate multiple infection or reactivation time points (e.g., to profile early or late stages of infection) and, if so, which time points are likely to be optimal for a given experimental system. The activity of the virus itself can confound the interpretation of transcriptome data by modifying transcriptional and posttranscriptional processes (1). Examples include acute infection by orthomyxoviruses, poxviruses, coronaviruses, picornaviruses, and many herpesviruses, which either degrade or transcriptionally suppress host and viral mRNAs (2–10). One consequence is that many of the RNAs used as templates for sequencing reactions are no longer translation competent, undermining the utility of RNA sequencing (RNA-Seq) for predicting the proteome.
Elegant solutions abound but remain underutilized, mainly because of cost and technical complexity. These include sequencing only mRNAs loaded onto ribosomes (11, 12) or using specific adaptors to generate sequencing libraries limited to full-length polyadenylated RNAs (13). Lastly, it is important to keep in mind that the technical and analytical requirements for transcriptome studies differ from those of genomic studies that might characterize new viruses, detect low-frequency variants, or trace the origins and consequences of sequence diversity within populations (14).

In this Gem, we highlight the major challenges that can arise when studying viral transcriptomes with the leading RNA-Seq technologies. The limitations are most evident for viruses with large, gene-dense double-stranded DNA genomes, such as herpesviruses, poxviruses, and adenoviruses, in which transcription often takes place on opposing DNA strands, protein-coding open reading frames (ORFs) are organized as mono- or polycistronic transcription units, and further complexity arises through the use of alternative 5′ and 3′ ends or internal splicing. In general, the methods used to generate and interpret RNA-Seq data were not designed for genomes of such complexity and are better suited to analyzing host transcripts. Heterogeneity within infections, resulting either from the asynchronous onset of viral gene expression (15) or from a mix of infected and uninfected cells, can make the analysis even more difficult. Depending on the virus and the conditions of the infection, the relative proportions of viral and host transcripts can vary significantly; this influences the depth of sequencing required to achieve a robust signal for viral RNAs, a parameter that needs to be considered from the outset. Readers should be aware that while we have endeavored to cite the most pertinent peer-reviewed studies, the cutting-edge nature of several of the methodologies presented necessitates the citation of preprints. We anticipate that all of these will be published in time and encourage readers to seek out the final peer-reviewed versions as they become available.
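Because the viral share of a library determines how many reads are informative about the virus, a back-of-the-envelope calculation can help set sequencing depth before samples are run. The sketch below is illustrative only: it assumes the viral fraction of reads is known or can be estimated (for example, from a small pilot library), and the function name and the example fractions are hypothetical. It also ignores losses from duplicate removal and ambiguous mapping, so real requirements will be higher.

```python
import math


def reads_required(target_viral_reads: int, viral_fraction: float) -> int:
    """Total reads needed so that, on average, `target_viral_reads` are viral.

    viral_fraction: expected proportion of reads derived from the virus,
    e.g., estimated from a pilot run (hypothetical input); must be in (0, 1].
    """
    if not 0 < viral_fraction <= 1:
        raise ValueError("viral_fraction must be in (0, 1]")
    # Expected viral reads = total reads x viral fraction, so invert and round up.
    return math.ceil(target_viral_reads / viral_fraction)


# Hypothetical scenarios: the same viral read target at two viral loads.
high_viral_share = reads_required(1_000_000, 0.25)   # ~25% of reads are viral
low_viral_share = reads_required(1_000_000, 0.005)   # ~0.5% of reads are viral
```

Under these assumptions, the low-abundance scenario needs 50 times the total depth (200 million versus 4 million reads) to yield the same number of viral reads.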

OPTIMIZING SHORT-READ SEQUENCING APPROACHES FOR VIRAL TRANSCRIPTOMES

The majority of short-read RNA-Seq studies document host responses to different perturbations or compare different cell types and tissues, each approach requiring careful consideration of the experimental design to avoid batch effects and other confounding influences that are discussed in detail elsewhere (16, 17). Standard short-read RNA-Seq pipelines generate tens of millions of paired-end reads that are then aligned to the host genome and/or transcriptome. There are numerous variations of this general approach, designed to answer more specific questions such as the mapping of sites of transcript initiation (cap analysis gene expression sequencing [CAGE-Seq] [18]) or the placement of modified bases (N6-methyladenosine sequencing [m6A-Seq] [19, 20]). While these variants are not dealt with in detail here (for a full review, see reference 21), many of the principles discussed still apply.

Standard RNA-Seq is comparatively simple in terms of sample preparation and data analysis. Depending on the needs of the experiment, polyadenylated RNA (ostensibly, mRNA) is isolated from total RNA and used to construct sequencing libraries (Fig. 1A). Alternatively, the highly abundant ribosomal RNAs are removed and the remaining RNA is used for the library (Fig. 1B). The choice between these strategies is dictated by whether nonpolyadenylated host and/or viral RNAs are of interest in the study. With either option, the retained RNA is fragmented and used as a template to synthesize first- and second-strand cDNA, followed by end repair, ligation of Illumina-compatible adaptors, and indexing of individual samples to enable multiplexed sequencing on Illumina NextSeq, HiSeq, or NovaSeq platforms. The resulting sequence reads are aligned (22) to well-annotated genomes and processed to generate expression counts, which are then normalized to give a measure of the relative abundance of each transcript; for instance, transcripts per million (TPM) specifies the relative frequency of a given transcript in a population. Generating expression counts requires genome annotations that specify the limits of individual transcribed mRNAs, the coding sequence (CDS) within each, and known splicing patterns. This is crucial to ensure that reads are correctly assigned to a given transcription unit (23, 24), a step that becomes significantly more complicated where transcription units overlap. Annotations are available for the genomes of humans and the major model organisms but, more often than not, are missing or grossly oversimplified for viruses. As a result, alternative transcript structures can easily go undetected, and overlapping transcription units, which cannot be readily distinguished, can seriously confound expression level estimates and impair subsequent interpretation of the underlying biology.
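The TPM normalization mentioned above can be written in two steps: divide each transcript's read count by its length in kilobases, then rescale so that all values sum to one million. The sketch below assumes that each read has already been assigned to exactly one transcript (precisely the step that overlapping viral transcription units make unreliable) and uses annotated transcript lengths; quantification tools typically use an effective length that accounts for fragment size. Transcript names and numbers are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million (TPM).

    counts:     reads assigned to each transcript (name -> count)
    lengths_bp: transcript length in base pairs (name -> length)
    """
    # Step 1: normalize for transcript length (reads per kilobase, RPK).
    rpk = {tx: counts[tx] / (lengths_bp[tx] / 1_000) for tx in counts}
    # Step 2: normalize for sequencing depth so that all values sum to 1e6.
    per_million = sum(rpk.values()) / 1_000_000
    if per_million == 0:
        raise ValueError("no reads were assigned to any transcript")
    return {tx: value / per_million for tx, value in rpk.items()}


# Hypothetical transcripts: equal read counts, but txB is twice as long as txA.
example = tpm({"txA": 100, "txB": 100}, {"txA": 1_000, "txB": 2_000})
```

Because txB is twice as long, the same number of reads implies half as many molecules, so txA receives twice the TPM of txB.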
A partial solution is to increase the sequence read length, both by reducing the temperature and duration of the RNA fragmentation step and by increasing the number of cycles in the sequencing reaction (e.g., 200 to 300 cycles instead of the usual 75). Longer sequence reads can significantly improve the detection of splice site usage and the discrimination of transcripts originating from overlapping genes (25). Where viral transcripts are less abundant than those of the host, for instance, in clinical samples or at a low multiplicity of infection, targeted enrichment of viral nucleic acids offers an invaluable tool that can be integrated into standard RNA-Seq workflows (26). Prepared sequencing libraries are hybridized to a set of short (40- to 150-mer) overlapping biotinylated RNA or DNA probes complementary to the viral genome, and the captured fragments are isolated using streptavidin-coated magnetic beads (Fig. 1A). An important drawback is the requirement for extensive PCR amplification to generate sufficient material for sequencing; for this reason, it is crucial to accurately deduplicate mapped sequence reads following alignment against a reference genome/transcriptome. Improvements in short-read RNA-Seq combined with targeted enrichment can lead to exciting biological findings, such as the recent discovery of a varicella-zoster virus (VZV) latency-associated transcript that had eluded conventional approaches to transcript identification (27).
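Deduplication after enrichment is commonly based on alignment position: reads whose 5′ ends map to the same coordinate on the same strand are treated as PCR copies of one original fragment, and a single representative is kept. The sketch below is a simplified, single-end version of that idea, similar in spirit to tools such as Picard MarkDuplicates or samtools markdup but not a reimplementation of either; the `Alignment` record and the reference name are hypothetical. One caveat matters for viral data: without UMIs, independent fragments from a highly expressed transcript can share a start position by chance, so position-only deduplication may undercount abundant viral RNAs.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    read_id: str
    ref: str
    five_prime: int  # genomic coordinate of the read's 5' end
    strand: str      # "+" or "-"
    mapq: int        # mapping quality


def deduplicate(alignments):
    """Keep one alignment per (reference, 5' position, strand), preferring the highest MAPQ."""
    best = {}
    for aln in alignments:
        key = (aln.ref, aln.five_prime, aln.strand)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())


# Hypothetical single-end alignments to a viral reference named "virus".
alignments = [
    Alignment("r1", "virus", 1200, "+", 40),
    Alignment("r2", "virus", 1200, "+", 60),  # same 5' end and strand as r1: duplicate
    Alignment("r3", "virus", 1200, "-", 60),  # opposite strand: kept separately
]
kept = deduplicate(alignments)
```

For paired-end data, the mate's position would normally be added to the key, which greatly reduces false duplicate calls.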

FIG 1.


Comparison of major RNA sequencing methodologies. Viral transcriptome profiling by NGS can be performed using either short-read (Illumina) or long-read (PacBio and Nanopore) platforms. RNA-Seq of polyadenylated RNAs (A) or of total RNA after rRNA depletion (B) enables profiling at high resolution but requires a well-curated reference genome for analysis. (C) Single-cell RNA sequencing (scRNA-Seq) enables the profiling of gene expression within tens to thousands of individual cells, but sequencing is limited to the extreme 3′ end of each RNA and is less effective if viral transcript levels are low relative to host RNAs. Long-read sequencing can be achieved using either cDNA or RNA. (D) cDNA sequencing on the PacBio platform enables full-length sequencing from the 5′ cap to the 3′ RNA cleavage site but includes amplification and size selection steps that can bias outputs. (E) Nanopore arrays can be used to directly sequence individual polyadenylated RNAs from the 3′ poly(A) tail toward the 5′ cap and can potentially map RNA modifications and estimate poly(A) tail lengths. High error rates require dedicated correction and analysis pipelines. (F) The choice of sequencing platform is dictated by the required depth of sequencing, as platforms differ markedly in the numbers and lengths of reads generated. A schematic transcriptome plot shows how different methodologies may affect data interpretation. Here, polyadenylated transcripts (blue bars) are consistently represented regardless of protocol choice, while nonpolyadenylated transcripts (gray bars) are underrepresented or absent when using protocols that incorporate poly(A) selection or a cDNA priming step using poly(T) adaptors. The general advantages (+) and disadvantages (−) of each major sequencing protocol/platform are indicated. Recoding refers to the inclusion of steps involving reverse transcription or amplification by a thermostable DNA polymerase.

SINGLE-CELL RNA SEQUENCING

It has long been understood that infection by genetically identical bacteria or viruses can give rise to different outcomes, presumably reflecting differences in the responses of individual host cell genomes. The underlying assumption in all "bulk" RNA sequencing is that cellular populations are nearly homogeneous, or at least dominated by a specific cell type, so that imputed changes in host expression can be meaningfully interpreted. The reality remains, however, that many experimental systems contain heterogeneous cell populations, and the ability to dissect host and viral transcriptomes in each of these offers a powerful new approach for understanding the biology of virus-host interactions. For instance, with a mixed population of neurons, single-cell RNA sequencing (scRNA-Seq) (Fig. 1C) theoretically enables users not only to classify discrete neuronal subpopulations (28) but also to examine whether viral infections (as measured by transcription) are restricted to specific neuronal subtypes and to identify unique markers of these subtypes while also exploring how the host transcriptome reacts to the presence of the virus.

The simultaneous interrogation of host and microbial transcriptomes within a single cell is fast becoming a reality, albeit tempered by some critical limitations (29). scRNA-Seq protocols vary in throughput and sensitivity (reviewed in detail in reference 30). In general, the approach requires sorting dissociated cells into individual wells on a plate or chip or into individual oil droplets, where they are mixed with barcoded primers that prime the conversion of RNA into cDNA. Each primer sequence incorporates a cell-specific barcode so that all subsequent reads from that one cell can be analyzed together (31). This approach frequently incorporates unique molecular index (UMI) sequences, which allow duplicate sequence reads arising during PCR amplification to be excluded from later analyses (32). Following lysis and first-strand synthesis, the samples are pooled and the final sequencing libraries are constructed. Paired-end sequencing yields one sequence read containing the cellular barcode (and UMI) and another containing a short span of sequence mapping to the 3′ end of a given mRNA. Sequence alignment is often performed using STAR (22) or Kallisto (33), either as part of commercial (e.g., 10x Genomics Cell Ranger) or custom pipelines. Subsequent analyses that identify and stratify host cell types by the expression of one or more markers (e.g., beta-tubulin III for neuronal lineage cells) and profile differential expression are readily performed using any of an ever-growing list of tools, including Seurat (34), Monocle (35), and MAST (36), many of which are accompanied by excellent tutorials.
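The role of cell barcodes and UMIs can be illustrated with a small counting step: reads sharing the same cell barcode, UMI, and gene are collapsed into one molecule, and distinct UMIs are then counted per cell and gene. This is a minimal sketch assuming reads have already been parsed into (cell barcode, UMI, gene) tuples. Production pipelines typically also tolerate sequencing errors in barcodes and UMIs, whereas this sketch uses exact matches only. The barcodes, UMIs, and the viral gene name "vGene1" are hypothetical; "TUBB3" is the gene symbol for beta-tubulin III.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs (molecules) per (cell barcode, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples; gene is None for
    reads that could not be assigned to any annotated gene.
    """
    molecules = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        if gene is None:
            # Unassigned reads are dropped; poorly annotated viral reads can be lost here.
            continue
        molecules[(cell_barcode, gene)].add(umi)
    # Each distinct UMI is treated as one original RNA molecule.
    return {key: len(umis) for key, umis in molecules.items()}


# Hypothetical reads from two cells.
reads = [
    ("AAAC", "TTGCA", "TUBB3"),
    ("AAAC", "TTGCA", "TUBB3"),   # PCR duplicate: same cell, UMI, and gene
    ("AAAC", "GGATC", "TUBB3"),
    ("AAAC", "CCTAG", "vGene1"),
    ("CCGT", "TTGCA", "vGene1"),
    ("CCGT", "ACGTA", None),      # read not assigned to any gene
]
counts = umi_counts(reads)
```

The same UMI sequence appearing in two different cells is counted separately, because molecules are keyed on the cell barcode as well as the UMI.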

However, as with bulk RNA sequencing approaches, current scRNA-Seq protocols are optimized for analyzing host mRNAs. The use of 3′ sequencing remains problematic for many viruses because of incomplete genome annotations, polycistronic gene arrays, and nonpolyadenylated RNAs, all of which can lead to valid sequence reads being erroneously removed or misassigned. Moreover, mapping and subsequent analyses of sequence reads generally require the host and viral reference genomes to be merged to ensure that viral reads are retained and assigned to the correct host cell. Thus, there is a need to generate alternative viral genome annotations in which polycistronic gene arrays are collapsed into transcription units. Studies of herpesvirus latency or other low-abundance viral infections remain challenging, because viral mRNA abundance may often be below the level of detection in single cells and because viral markers of latency are not necessarily polyadenylated, as is the case for the stable intron derived from the herpes simplex virus (HSV) latency-associated transcript (37). Given the pace at which this field is progressing, many of these problems will likely be overcome, and soon (30, 38); however, caution must be exercised in experimental design and interpretation, with an awareness that off-the-shelf bioinformatics solutions are rarely suited to examining host-virus interactions.
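One way to build such a collapsed annotation is to group genes on the same strand whose 3′ ends coincide (or lie within a small tolerance) and to treat each group as a single transcription unit spanning all of its members. The sketch below assumes genes are supplied as simple intervals with accurately mapped 3′ ends. The `Gene` record, the gene names, the coordinates, and the `tolerance` parameter are all hypothetical, and writing the result out in the annotation format a given pipeline expects is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    strand: str  # "+" or "-"
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive

    @property
    def three_prime(self) -> int:
        # The 3' end is the right-hand coordinate on "+" and the left-hand one on "-".
        return self.end if self.strand == "+" else self.start


def collapse_by_three_prime_end(genes, tolerance=0):
    """Group same-strand genes whose 3' ends lie within `tolerance` bp of each other.

    Clustering is single-linkage: each gene is compared with the previous one in
    3'-end order. Returns (strand, start, end, member_names) per transcription unit.
    """
    clusters = []
    for strand in ("+", "-"):
        ordered = sorted((g for g in genes if g.strand == strand), key=lambda g: g.three_prime)
        current = []
        for gene in ordered:
            if current and gene.three_prime - current[-1].three_prime > tolerance:
                clusters.append(current)
                current = []
            current.append(gene)
        if current:
            clusters.append(current)
    return [
        (c[0].strand, min(g.start for g in c), max(g.end for g in c),
         tuple(sorted(g.name for g in c)))
        for c in clusters
    ]


# Hypothetical annotation: geneA and geneB share a 3' end on the "+" strand.
genes = [
    Gene("geneA", "+", 100, 500),
    Gene("geneB", "+", 300, 500),
    Gene("geneC", "-", 600, 900),
]
units = collapse_by_three_prime_end(genes)
```

A nonzero tolerance merges genes whose annotated 3′ ends differ slightly, at the cost of possibly merging genuinely distinct units.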

RISKS AND REWARDS OF LONG-READ SEQUENCING

The concept of sequencing full-length RNAs (originally as expressed sequence tags [ESTs]) can be traced back to the early 1980s (39), and the technique has continued to evolve in step with advances in sequencing technologies (40). The current iteration of long-read RNA-Seq enables the sequencing of polyadenylated mRNAs from the 3′ poly(A) tail toward the 5′ cap. Although it produces far fewer reads than Illumina sequencing, the generation of long sequence reads obviates the need to computationally stitch together sequence fragments to reconstruct the original transcripts. Long-read RNA-Seq has been used to catalog transcript variation through alternative splicing and to identify novel transcripts or transcript isoforms (41, 42). More importantly, when combined with short-read RNA-Seq and/or variant approaches such as CAGE-Seq, it enables viral transcriptomes to be mapped in fine detail and at very high resolution (43–45).

Currently, there are two different options for long-read sequencing. Single-molecule real-time (SMRT) sequencing of cDNA on the Pacific Biosciences (PacBio) platform (Fig. 1D) is the most popular approach but faces stiff competition from nanopore array sequencing (Oxford Nanopore Technologies MinION platform) of either cDNA or, most excitingly, the RNA itself (Fig. 1E) (13). While SMRT sequencing is well established, the relative complexity of library construction and the physical size of the PacBio sequencer require most users to work with a core facility to generate the data. In contrast, the nanopore MinION has a very small footprint, can be run locally while attached to a standard laptop or desktop computer, and offers simple, rapid library construction protocols. Sequencing RNA without conversion to cDNA is termed direct (dRNA-Seq) or native (nRNA-Seq) RNA sequencing (46).

Both approaches are currently highly constrained by the amount of starting material required (generally >500 ng of polyadenylated RNA) and produce comparatively few reads (less than one million reads per run), which limits the depth of sequencing. Likewise, both suffer from high error rates (47), although these are lower for SMRT sequencing, which also benefits from the dual capture of the 5′ cap and 3′ poly(A) tail, enabling accurate mapping of transcription start and RNA cleavage sites. These advantages must be offset against a more involved protocol, which includes reverse transcription and PCR steps and the need to size-select fragments prior to sequencing. In contrast, direct sequencing of polyadenylated RNA on the nanopore array (MinION) platform combines a simple library preparation protocol (<2 h) with an overnight sequencing run. Here, a sequencing adapter is ligated to the poly(A) tail, enabling first-strand synthesis to produce a stable RNA:cDNA hybrid. A motor protein is then attached to the polyadenylated RNA strand, which is unwound from the cDNA and guided through protein nanopores embedded in a membrane. As each nucleotide is drawn through the pore, it disrupts the current, enabling the sequence to be read. This is the least biased approach to RNA sequencing, as each read is derived from a single polyadenylated RNA molecule without any amplification step.
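Once long reads are aligned, transcription start and RNA cleavage sites are often inferred by clustering the 5′ or 3′ ends of many reads on one strand and reporting well-supported clusters. The sketch below shows one simple heuristic for this, not a published pipeline: ends closer than `window` bp are chained into a cluster, clusters with fewer than `min_reads` reads are discarded, and the most frequent coordinate is reported as the peak. The function name, both thresholds, and the coordinates are hypothetical.

```python
from collections import Counter


def cluster_read_ends(positions, window=10, min_reads=3):
    """Cluster read-end coordinates from one strand into candidate sites.

    positions: genomic coordinates of read 5' ends (candidate TSSs) or of
               read 3' ends (candidate cleavage sites)
    window:    maximum gap in bp between neighboring ends within one cluster
    min_reads: clusters supported by fewer reads are discarded
    Returns (cluster_start, cluster_end, peak_position, read_count) tuples.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > window:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    sites = []
    for cluster in clusters:
        if len(cluster) >= min_reads:
            # Report the most frequent coordinate as the site's peak.
            peak = Counter(cluster).most_common(1)[0][0]
            sites.append((cluster[0], cluster[-1], peak, len(cluster)))
    return sites


# Hypothetical 5' end coordinates of reads aligned to the "+" strand.
sites = cluster_read_ends([100, 101, 101, 102, 250, 500, 503, 503, 505])
```

Here the isolated end at position 250 falls below the support threshold and is not reported; in practice, such thresholds need tuning against sequencing depth.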

Although a single nanopore MinION run can generate more than twice as many reads as SMRT sequencing, its error rate is currently notably higher (47). Error correction, itself a complex proposition, is therefore required before the extreme 5′ and 3′ ends of transcripts and the sites of splicing can be mapped accurately. While aligning reads to a reference genome is relatively simple following the development of Minimap2 (48), custom pipelines are often required to identify transcription start and RNA cleavage sites (44). Visual inspection of the data is crucial for identifying novel genes or splice variants, and users should be particularly aware of sequencing artefacts (signal loss/interruption) on nanopore platforms that can masquerade as excised introns. As base-calling and error-correction techniques improve, it seems likely, given its overall ease of use and affordability, that nanopore sequencing will become the favored long-read approach for transcriptomic studies. Enhancements such as the ability to estimate poly(A) tail lengths and to identify specific RNA modifications (such as N6-methyladenosine) are fast becoming a reality (49). Likewise, the ability to design custom adapters targeting RNA populations of interest will broaden sequencing capabilities beyond polyadenylated mRNAs, as evidenced by the recent direct genome sequencing of RNA viruses such as influenza A virus (50).
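Spliced aligners such as Minimap2 record putative introns as `N` operations in the CIGAR field of SAM output, so gaps caused by artefacts and genuine introns look alike at this stage. A common safeguard, sketched below under stated assumptions, is to extract junctions from the CIGAR strings and keep only those supported by several reads and flanked by canonical splice motifs (GT...AG, which appears as CT...AC when the intron lies on the minus strand). This is an illustrative filter, not the authors' method: the threshold is hypothetical, the toy genome and alignments are invented, and a strict motif filter would also discard any genuine noncanonical viral introns, so rejected junctions still merit visual inspection.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Canonical GT...AG introns, as seen on the "+" strand and (reverse complemented) on "-".
CANONICAL = {("GT", "AG"), ("CT", "AC")}


def introns_from_cigar(pos, cigar):
    """Return (start, end) of each 'N' gap, 1-based inclusive, given the SAM POS."""
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((pos, pos + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            pos += length
    return introns


def supported_junctions(alignments, genome, min_reads=2):
    """Keep junctions seen in >= min_reads alignments that have canonical motifs.

    alignments: iterable of (pos, cigar) pairs; genome: reference sequence string.
    """
    counts = Counter(j for pos, cigar in alignments for j in introns_from_cigar(pos, cigar))
    kept = {}
    for (start, end), n in counts.items():
        donor = genome[start - 1:start + 1]   # first two intron bases
        acceptor = genome[end - 2:end]        # last two intron bases
        if n >= min_reads and (donor, acceptor) in CANONICAL:
            kept[(start, end)] = n
    return kept


# Toy reference with a single GT...AG intron at positions 6-15.
genome = "AAAAA" + "GT" + "CCCCCC" + "AG" + "TTTTT"
alignments = [(1, "5M10N5M"), (1, "5M10N5M"), (1, "2M3N10M")]  # last gap lacks splice motifs
junctions = supported_junctions(alignments, genome)
```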

As a final point, it should be kept in mind that, due to the comparatively small number of reads generated during each run, long-read sequencing is less useful when viral RNA yields are low. Likewise, the requirement for micrograms of total RNA as the input material will limit the infection models that are compatible with current long-read RNA-Seq methodologies. Whether this will be improved by the application of targeted enrichment approaches remains to be seen, although in the case of nanopore sequencing, this problem might be circumvented by use of higher capacity platforms such as GridION and PromethION. These offer far greater numbers of sequence reads per run, although at the time of writing, neither is compatible with direct RNA sequencing.

ROLE OF THE BIOINFORMATICIAN

Nowadays, many research labs have either direct (integrated into the research group) or indirect (via a collaboration or core facility) access to bioinformaticians who can turn raw sequencing data into lists of regulated genes supported by P values corrected for multiple testing. To ensure success, it is crucial to involve these individuals in the planning stages so that subsequent analyses can be tailored to the viral genome of choice and to avoid the pitfalls that come with applying "one size (does not) fit all" approaches. Planning discussions should address reproducibility (biological replicates), batch effects, and the availability and quality of gene annotations for the organism(s) of interest. Establishing and optimizing analytical pipelines using test data sets before generating the final experimental data sets can also help to identify and preempt critical issues that might otherwise necessitate experimental redesign and resequencing, an expensive proposition in both time and money. It is also critical that bioinformaticians move away from standard RNA-Seq analysis pathways when dealing with viruses and, with guidance, become aware of the biological characteristics and genome structure of the virus being studied. For instance, the existence of polycistronic arrays in herpesviruses limits scRNA-Seq gene expression analyses, because all transcripts generated across a polycistronic unit share the same 3′ end. This can significantly impact the alignment of viral reads to a transcriptome, because many scRNA-Seq software packages will by default discard reads that map identically to the 3′ ends of multiple transcripts. It is therefore necessary to represent polycistronic genes that share the same 3′ end as a single transcription unit, although this diminishes the yield of biologically relevant information.
Naturally, it is also critical that the 3′ ends of these transcription units are accurately mapped before embarking on scRNA-Seq projects; otherwise, meaningful biological data will likely be discarded. This is a frequent problem, as viral annotations often specify only the boundaries of the coding sequences (ORFs) rather than of the transcript as a whole. The reads obtained during scRNA-Seq are typically limited to the 3′ untranslated region (UTR) and so may not be assigned to a recognized gene. Another potential confounder is that many viral genomes contain duplicated regions, which can cause short or long sequence reads to be automatically discarded (no single mapping location) or their distribution to be distorted (sequence reads not allocated correctly between duplicated units), either of which can influence the resulting TPM values.
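Rather than discarding reads that align equally well to several copies of a duplicated region, one simple option is to split each such read among its candidate locations in proportion to the reads that map uniquely to each copy, with an even split when no unique evidence exists. The sketch below performs a single redistribution step of this kind; quantifiers such as Kallisto use an iterative expectation-maximization procedure in the same spirit. The feature names are hypothetical. Note the limitation: if two copies are identical along their entire length, no read can map uniquely and the split is necessarily even, so the data cannot tell the copies apart.

```python
from collections import defaultdict


def allocate_multimappers(read_hits):
    """Distribute reads across duplicated features in proportion to unique-read evidence.

    read_hits: one collection per read, naming every feature (e.g., each copy of a
    duplicated gene) to which that read aligns equally well.
    """
    # Pass 1: count reads that map to exactly one feature.
    unique = defaultdict(float)
    for hits in read_hits:
        if len(hits) == 1:
            unique[next(iter(hits))] += 1

    # Pass 2: split each ambiguous read using the unique counts as weights,
    # falling back to an even split when none of its features has unique reads.
    totals = defaultdict(float, unique)
    for hits in read_hits:
        if len(hits) > 1:
            weight_sum = sum(unique.get(f, 0.0) for f in hits)
            for feature in hits:
                if weight_sum:
                    share = unique.get(feature, 0.0) / weight_sum
                else:
                    share = 1 / len(hits)
                totals[feature] += share
    return dict(totals)


# Hypothetical reads: 3 unique to copyA, 1 unique to copyB, 4 shared by both.
reads = [{"copyA"}] * 3 + [{"copyB"}] + [{"copyA", "copyB"}] * 4
allocation = allocate_multimappers(reads)
```

In this example, the four shared reads are divided 3:1, giving copyA 6 reads and copyB 2 reads instead of 3 and 1 if the shared reads were discarded.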

Another important consideration is that while viral genomes vary widely in size, they are orders of magnitude smaller than the genomes of their hosts. This enables the generation of linear or circular genome-wide coverage plots, using tools such as the R package Gviz (51) or Circos (52), that provide an easy-to-assimilate overview of transcription patterns across the viral genome. It is reasonably straightforward, and crucial, to examine aligned sequence data by loading full data sets into visualization tools such as IGV, the UCSC genome browser, or Tablet (53–55) or by using R packages such as Gviz (51) and ggbio (56). With these graphical outputs, the read data can be quickly inspected against current gene annotations, which may reveal areas of the genome that were not previously known to be transcribed or that show evidence of alternative transcript structures. This can provide the first clue to novel coding units or noncoding RNAs that were missed by previous annotators focused on identifying sizeable single-exon open reading frames.
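The numbers behind a genome-wide coverage plot are easy to compute for a small viral genome: per-base depth can be derived from aligned blocks with a difference array, then averaged in fixed windows for plotting. The sketch below assumes linear coordinates (a circular genome would need wraparound handling) and aligned blocks supplied as 1-based inclusive intervals, with a spliced read contributing one block per exon. The function names and toy data are hypothetical; running it separately per strand would show transcription from opposing strands, and the output could be passed to Gviz, Circos, or any plotting library.

```python
from itertools import accumulate


def per_base_coverage(blocks, genome_length):
    """Depth at each genome position from aligned blocks.

    blocks: (start, end) pairs, 1-based inclusive, one per aligned segment.
    """
    diff = [0] * (genome_length + 2)
    for start, end in blocks:
        diff[start] += 1     # coverage rises at the block start
        diff[end + 1] -= 1   # and falls just after the block end
    running = list(accumulate(diff))
    return running[1:genome_length + 1]  # drop the unused index 0


def binned_mean(coverage, bin_size):
    """Average depth in consecutive windows (the last window may be shorter)."""
    return [
        sum(coverage[i:i + bin_size]) / len(coverage[i:i + bin_size])
        for i in range(0, len(coverage), bin_size)
    ]


# Hypothetical 10-bp "genome" with two overlapping aligned blocks on one strand.
coverage = per_base_coverage([(1, 4), (3, 6)], genome_length=10)
profile = binned_mean(coverage, bin_size=5)
```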

WHERE WE ARE AND WHERE WE ARE GOING

The ability to directly sequence full-length RNAs within individual infected cells while retaining spatial and temporal information seems like science fiction. However, the speed at which nanopore and single-cell transcriptome sequencing technologies are developing seems certain to make this a reality, and soon (57). Applying these methodologies to understanding virus-host interactions remains a formidable challenge, but the increasing integration of computational biologists into experimental biology labs raises the prospects of many exciting breakthroughs for virology using NGS methodologies. The use of NGS approaches to follow sequence variation within viral populations is advancing at a breathtaking pace, and it seems inevitable that studies of viral gene expression will follow a similar trajectory. The ability to enrich for viral transcripts and perform full-length sequencing of RNA without cDNA conversion or PCR amplification will be a game changer, especially if this can be performed at the single-cell level.

The decision to use short-read or long-read sequencing approaches remains complex and is often influenced by whether the study addresses changes to the host transcriptome or focuses on the virus, and by whether profiling the polyadenylated RNA fraction is sufficient for the experimental goals. Having an adequate quantity and quality of starting material is also a major factor, as the power of long-read sequencing is nullified if RNA yields are low or there is significant RNA degradation. While long-read sequencing presents fresh challenges in terms of read alignment, experimental validation, and, ultimately, determining the biological significance of rarer transcript isoforms, we believe it to be advantageous over short-read sequencing for analyzing complex viral transcriptomes when sufficient material is available. We further anticipate that the rapid pace of development in the long-read sequencing field will soon yield new approaches for working with smaller amounts of input RNA and/or enriching for viral transcripts, although these will require careful evaluation and optimization. The relatively low cost of long-read sequencing makes integrating short- and long-read methodologies an affordable option that maximizes the benefits of both, especially while error correction remains key to the analysis of long-read sequencing data.

Just as NGS technologies are revolutionizing other fields, including molecular epidemiology, pathogen surveillance, and cancer biology, it is incumbent on the wider virology research community to embrace the most recent viral genome annotation/reannotation projects (58–61) and, most importantly, to incorporate the findings of new transcriptional profiling work into ongoing studies.

ACKNOWLEDGMENTS

We thank Cristina Venturini, Werner J. D. Ouwendijk, and the two anonymous referees for providing valuable feedback on the manuscript.

This work was supported in part by grants from the NIH (AI073898, GM05692, and AI130618).

REFERENCES

  • 1.Walsh D, Mohr I. 2011. Viral subversion of the host protein synthesis machinery. Nat Rev Microbiol 9:860–875. doi: 10.1038/nrmicro2655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Abernathy E, Gilbertson S, Alla R, Glaunsinger B. 2015. Viral nucleases induce an mRNA degradation-transcription feedback loop in mammalian cells. Cell Host Microbe 18:243–253. doi: 10.1016/j.chom.2015.06.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Glaunsinger B, Ganem D. 2004. Lytic KSHV infection inhibits host gene expression by accelerating global mRNA turnover. Mol Cell 13:713–723. doi: 10.1016/S1097-2765(04)00091-7. [DOI] [PubMed] [Google Scholar]
  • 4.Bercovich-Kinori A, Tai J, Gelbart IA, Shitrit A, Ben-Moshe S, Drori Y, Itzkovitz S, Mandelboim M, Stern-Ginossar N. 2016. A systematic view on influenza induced host shutoff. Elife 5:e18311. doi: 10.7554/eLife.18311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Rowe M, Glaunsinger B, van Leeuwen D, Zuo J, Sweetman D, Ganem D, Middeldorp J, Wiertz EJHJ, Ressing ME. 2007. Host shutoff during productive Epstein-Barr virus infection is mediated by BGLF5 and may contribute to immune evasion. Proc Natl Acad Sci U S A 104:3366–3371. doi: 10.1073/pnas.0611128104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Sato H, Callanan LD, Pesnicak L, Krogmann T, Cohen JI. 2002. Varicella-Zoster virus (VZV) ORF17 protein induces RNA cleavage and is critical for replication of VZV at 37°C but not 33°C. J Virol 76:11012–11023. doi: 10.1128/JVI.76.21.11012-11023.2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lin HW, Chang YY, Wong ML, Lin JW, Chang TJ. 2004. Functional analysis of virion host shutoff protein of pseudorabies virus. Virology 324:412–418. doi: 10.1016/j.virol.2004.04.015. [DOI] [PubMed] [Google Scholar]
  • 8.Rice AP, Roberts BE. 1983. Vaccinia virus induces cellular mRNA degradation. J Virol 47:529–539. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Tanaka T, Kamitani W, DeDiego ML, Enjuanes L, Matsuura Y. 2012. Severe acute respiratory syndrome coronavirus nsp1 facilitates efficient propagation in cells through a specific translational shutoff of host mRNA. J Virol 86:11128–11137. doi: 10.1128/JVI.01700-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Ullmer W, Semler BL. 2016. Diverse strategies used by picornaviruses to escape host RNA decay pathways. Viruses 8:335. doi: 10.3390/v8120335. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Rutkowski AJ, Erhard F, L’Hernault A, Bonfert T, Schilhabel M, Crump C, Rosenstiel P, Efstathiou S, Zimmer R, Friedel CC, Dölken L. 2015. Widespread disruption of host transcription termination in HSV-1 infection. Nat Commun 6:7126. doi: 10.1038/ncomms8126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Stern-Ginossar N, Weisburd B, Michalski A, Le VT, Hein MY, Huang S-XX, Ma M, Shen B, Qian S-BB, Hengel H, Mann M, Ingolia NT, Weissman JS. 2012. Decoding human cytomegalovirus. Science 338:1088–1093. doi: 10.1126/science.1227919. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, Pantic N, Admassu T, James P, Warland A, Jordan M, Ciccone J, Serra S, Keenan J, Martin S, McNeill L, Wallace EJ, Jayasinghe L, Wright C, Blasco J, Young S, Brocklebank D, Juul S, Clarke J, Heron AJ, Turner DJ. 2018. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods 15:201–206. doi: 10.1038/nmeth.4577.
  • 14. Renner DW, Szpara ML. 2017. The impacts of genome-wide analyses on our understanding of human herpesvirus diversity and evolution. J Virol 92:e00908-17. doi: 10.1128/JVI.00908-1.
  • 15. Ralph M, Bednarchik M, Tomer E, Rafael D, Zargarian S, Gerlic M, Kobiler O. 2017. Promoting simultaneous onset of viral gene expression among cells infected with herpes simplex virus-1. Front Microbiol 8:2152. doi: 10.3389/fmicb.2017.02152.
  • 16. Chatterjee A, Ahn A, Rodger EJ, Stockwell PA, Eccles MR. 2018. A guide for designing and analyzing RNA-Seq data, p 35–80. In Raghavachari N, Garcia-Reyero N (ed), Gene expression analysis: methods and protocols. Springer New York, New York, NY.
  • 17. Passow CN, Kono TJY, Stahl BA, Jaggard JB, Keene AC, McGaugh SE. 2018. RNAlater and flash freezing storage methods nonrandomly influence observed gene expression in RNAseq experiments. bioRxiv doi: 10.1101/379834.
  • 18. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T, Fukuda S, Sasaki D, Podhajska A, Harbers M, Kawai J, Carninci P, Hayashizaki Y. 2003. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100:15776–15781. doi: 10.1073/pnas.2136655100.
  • 19. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, Rechavi G. 2012. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485:201–206. doi: 10.1038/nature11112.
  • 20. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR. 2012. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149:1635–1646. doi: 10.1016/j.cell.2012.05.003.
  • 21. Reuter JA, Spacek DV, Snyder MP. 2015. High-throughput sequencing technologies. Mol Cell 58:586–597. doi: 10.1016/j.molcel.2015.05.004.
  • 22. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21. doi: 10.1093/bioinformatics/bts635.
  • 23. Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323. doi: 10.1186/1471-2105-12-323.
  • 24. Liao Y, Smyth GK, Shi W. 2014. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30:923–930. doi: 10.1093/bioinformatics/btt656.
  • 25. Chhangawala S, Rudy G, Mason CE, Rosenfeld JA. 2015. The impact of read length on quantification of differentially expressed genes and splice junction detection. Genome Biol 16:131. doi: 10.1186/s13059-015-0697-y.
  • 26. Cheng S, Caviness K, Buehler J, Smithey M, Nikolich-Žugich J, Goodrum F. 2017. Transcriptome-wide characterization of human cytomegalovirus in natural infection and experimental latency. Proc Natl Acad Sci U S A 114:E10586–E10595. doi: 10.1073/pnas.1710522114.
  • 27. Depledge DP, Ouwendijk WJD, Sadaoka T, Braspenning SE, Mori Y, Cohrs RJ, Verjans GMGM, Breuer J. 2018. A spliced latency-associated VZV transcript maps antisense to the viral transactivator gene 61. Nat Commun 9:1167. doi: 10.1038/s41467-018-03569-2.
  • 28. Lake BB, Ai R, Kaeser GE, Salathia NS, Yung YC, Liu R, Wildberg A, Gao D, Fung HL, Chen S, Vijayaraghavan R, Wong J, Chen A, Sheng X, Kaper F, Shen R, Ronaghi M, Fan JB, Wang W, Chun J, Zhang K. 2016. Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science 352:1586–1590. doi: 10.1126/science.aaf1204.
  • 29. Avital G, Avraham R, Fan A, Hashimshony T, Hung DT, Yanai I. 2017. scDual-Seq: mapping the gene regulatory program of Salmonella infection by host and pathogen single-cell RNA-sequencing. Genome Biol 18:200. doi: 10.1186/s13059-017-1340-x.
  • 30. Picelli S. 2017. Single-cell RNA-sequencing: the future of genome biology is now. RNA Biol 14:637–650. doi: 10.1080/15476286.2016.1201618.
  • 31. Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, Gregory MT, Shuga J, Montesclaros L, Underwood JG, Masquelier DA, Nishimura SY, Schnall-Levin M, Wyatt PW, Hindson CM, Bharadwaj R, Wong A, Ness KD, Beppu LW, Deeg HJ, McFarland C, Loeb KR, Valente WJ, Ericson NG, Stevens EA, Radich JP, Mikkelsen TS, Hindson BJ, Bielas JH. 2017. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049. doi: 10.1038/ncomms14049.
  • 32. Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lönnerberg P, Linnarsson S. 2014. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods 11:163. doi: 10.1038/nmeth.2772.
  • 33. Bray NL, Pimentel H, Melsted P, Pachter L. 2016. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34:525–527. doi: 10.1038/nbt.3519.
  • 34. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. 2015. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33:495–502. doi: 10.1038/nbt.3192.
  • 35. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL. 2014. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32:381–386. doi: 10.1038/nbt.2859.
  • 36. Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, Slichter CK, Miller HW, McElrath MJ, Prlic M, Linsley PS, Gottardo R. 2015. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol 16:278. doi: 10.1186/s13059-015-0844-5.
  • 37. Phelan D, Barrozo ER, Bloom DC. 2017. HSV1 latent transcription and non-coding RNA: a critical retrospective. J Neuroimmunol 308:65–101. doi: 10.1016/j.jneuroim.2017.03.002.
  • 38. Hayashi T, Ozaki H, Sasagawa Y, Umeda M, Danno H, Nikaido I. 2018. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat Commun 9:619. doi: 10.1038/s41467-018-02866-0.
  • 39. Putney SD, Herlihy WC, Schimmel P. 1983. A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing. Nature 302:718–721. doi: 10.1038/302718a0.
  • 40. Adams M, Kelley J, Gocayne J, Dubnick M, Polymeropoulos M, Xiao H, Merril C, Wu A, Olde B, Moreno R, et al. 1991. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252:1651–1656. doi: 10.1126/science.2047873.
  • 41. Balázs Z, Tombácz D, Szűcs A, Csabai Z, Megyeri K, Petrov AN, Snyder M, Boldogkői Z. 2017. Long-read sequencing of human cytomegalovirus transcriptome reveals RNA isoforms carrying distinct coding potentials. Sci Rep 7:15989. doi: 10.1038/s41598-017-16262-z.
  • 42. Tombácz D, Csabai Z, Szűcs A, Balázs Z, Moldován N, Sharon D, Snyder M, Boldogkői Z. 2017. Long-read isoform sequencing reveals a hidden complexity of the transcriptional landscape of herpes simplex virus type 1. Front Microbiol 8:1079. doi: 10.3389/fmicb.2017.01079.
  • 43. O'Grady T, Wang X, Höner zu Bentrup K, Baddoo M, Concha M, Flemington EK. 2016. Global transcript structure resolution of high gene density genomes through multi-platform data integration. Nucleic Acids Res 44:e145. doi: 10.1093/nar/gkw629.
  • 44. Depledge DP, Kalanghad Puthankalam S, Sadaoka T, Beady D, Mori Y, Placantonakis D, Mohr I, Wilson A. 2018. Native RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen. bioRxiv doi: 10.1101/373522.
  • 45. Moldován N, Tombácz D, Szűcs A, Csabai Z, Snyder M, Boldogkői Z. 2017. Multi-platform sequencing approach reveals a novel transcriptome profile in pseudorabies virus. Front Microbiol 8:2708. doi: 10.3389/fmicb.2017.02708.
  • 46. Hussain S. 2018. Native RNA-sequencing throws its hat into the transcriptomics ring. Trends Biochem Sci 43:225–227. doi: 10.1016/j.tibs.2018.02.007.
  • 47. Weirather JL, de Cesare M, Wang Y, Piazza P, Sebastiano V, Wang X-J, Buck D, Au KF. 2017. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 6:100. doi: 10.12688/f1000research.10571.2.
  • 48. Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. doi: 10.1093/bioinformatics/bty191.
  • 49. Stoiber MH, Quick J, Egan R, Lee JE, Celniker SE, Neely R, Loman N, Pennacchio L, Brown JB. 2017. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. bioRxiv doi: 10.1101/094672.
  • 50. Keller MW, Rambo-Martin BL, Wilson MM, Ridenour CA, Shepard SS, Stark TJ, Neuhaus EB, Dugan VG, Wentworth DE, Barnes JR. 2018. Direct RNA sequencing of the coding complete influenza A virus genome. Sci Rep 8:14408. doi: 10.1038/s41598-018-32615-8.
  • 51. Hahne F, Ivanek R. 2016. Visualizing genomic data using Gviz and Bioconductor. Methods Mol Biol 1418:335–351. doi: 10.1007/978-1-4939-3578-9_16.
  • 52. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645. doi: 10.1101/gr.092759.109.
  • 53. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. 2011. Integrative genomics viewer. Nat Biotechnol 29:24–26. doi: 10.1038/nbt.1754.
  • 54. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The Human Genome Browser at UCSC. Genome Res 12:996–1006. doi: 10.1101/gr.229102.
  • 55. Milne I, Bayer M, Stephen G, Cardle L, Marshall D. 2016. Tablet: visualizing next-generation sequence assemblies and mappings. Methods Mol Biol 1374:253–268. doi: 10.1007/978-1-4939-3167-5_14.
  • 56. Yin T, Cook D, Lawrence M. 2012. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol 13:R77. doi: 10.1186/gb-2012-13-8-r77.
  • 57. Volden R, Palmer T, Byrne A, Cole C, Schmitz RJ, Green RE, Vollmers C. 2018. Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proc Natl Acad Sci U S A 115:9726–9731. doi: 10.1073/pnas.1806447115.
  • 58. Arias C, Weisburd B, Stern-Ginossar N, Mercier A, Madrid AS, Bellare P, Holdorf M, Weissman JS, Ganem D. 2014. KSHV 2.0: a comprehensive annotation of the Kaposi’s sarcoma-associated herpesvirus genome using next-generation sequencing reveals novel genomic and functional features. PLoS Pathog 10:e1003847. doi: 10.1371/journal.ppat.1003847.
  • 59. Irigoyen N, Firth AE, Jones JD, Chung BYW, Siddell SG, Brierley I. 2016. High-resolution analysis of coronavirus gene expression by RNA sequencing and ribosome profiling. PLoS Pathog 12:e1005473. doi: 10.1371/journal.ppat.1005473.
  • 60. Stutika C, Gogol-Döring A, Botschen L, Mietzsch M, Weger S, Feldkamp M, Chen W, Heilbronn R. 2016. A comprehensive RNA sequencing analysis of the adeno-associated virus (AAV) type 2 transcriptome reveals novel AAV transcripts, splice variants, and derived proteins. J Virol 90:1278–1289. doi: 10.1128/JVI.02750-15.
  • 61. Bruce GA, Barcy S, DiMaio T, Gan E, Garrigues HJ, Lagunoff M, Rose TM. 2017. Quantitative analysis of the KSHV transcriptome following primary infection of blood and lymphatic endothelial cells. Pathogens 6:e11. doi: 10.3390/pathogens6010011.

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
