Author manuscript; available in PMC: 2021 Jun 1.
Published in final edited form as: Curr Protoc Microbiol. 2020 Jun;57(1):e99. doi: 10.1002/cpmc.99

Using Direct RNA Nanopore Sequencing to deconvolute Viral Transcriptomes

Daniel P Depledge 1, Angus C Wilson 2
PMCID: PMC7187905  NIHMSID: NIHMS1576823  PMID: 32255550

Abstract

The genomes of DNA viruses encode deceptively complex transcriptomes evolved to maximise coding potential within the confines of a relatively small genome. Defining the full range of viral RNAs produced during an infection is key to understanding the viral replication cycle and its interactions with the host cell. Traditional short-read (Illumina) sequencing approaches are problematic in this setting, due to the difficulty of assigning short reads to individual RNAs in regions of transcript overlap and to the biases introduced by the required recoding and amplification steps. Additionally, different methodologies may be required to analyse the 5’ and 3’ ends of RNAs, which increases both cost and effort. The advent of long-read nanopore sequencing simplifies this approach by providing a single assay that captures and sequences full length RNAs, either in cDNA or native RNA form. The latter is particularly appealing as it reduces known recoding biases whilst allowing more advanced analyses such as estimation of poly(A) tail length and the detection of RNA modifications including N6-methyladenosine. Using herpes simplex virus (HSV)-infected primary fibroblasts as template we provide a step-by-step guide to the production of direct RNA sequencing libraries suitable for sequencing using Oxford Nanopore Technologies platforms and provide a simple computational approach to deriving a high-quality annotation of the HSV transcriptome from the resulting sequencing data.

Keywords: Herpesvirus, DNA virus, nanopore, RNA sequencing, annotation

INTRODUCTION

Recent studies using high throughput sequencing (HTS) platforms have revealed an unanticipated complexity to the transcriptomes of many well-studied viruses and have fostered a growing appreciation for sequence variation between virus strains and isolates (Akhtar et al., 2019; Boldogkői, Moldován, Balázs, Snyder, & Tombácz, 2019; Depledge et al., 2014; Loiseau et al., 2020; Prusty & Whisnant, 2020). Compared to the genome of the host cell, viral genomes typically exhibit a higher gene density (the number of genes per nucleotide of genome), presumably reflecting the need for a wide variety of regulatory functions balanced against limits on the overall size of the viral genetic material, which must be replicated efficiently and ultimately packaged into a proteinaceous capsid for delivery into new host cells (Mahmoudabadi & Phillips, 2018). The crowding of genetic units as well as frequent use of overlapping transcripts, alternative transcriptional start and stop sites and, in the case of many nuclear-replicating viruses, the deployment of complex splicing patterns, means that the actual transcriptional landscape can be extremely difficult to predict from the genomic sequence alone.

The remarkable clustering of overlapping transcripts can be illustrated by the E3 transcription unit of adenovirus (Fig. 1A). Here, all of the transcripts initiate from a common transcription start site (TSS) but undergo different combinations of alternative RNA splicing and terminate at one of three possible cleavage and polyadenylation sites (CPAS). As a consequence, at least eight distinct mRNAs are produced from this one transcription unit, each of which encodes a unique protein product. The first and second exons are included within all of the transcripts but the coding potential of this section differs depending on the structure of the transcript. Alternatively, some viral loci may encode a series of overlapping transcripts that initiate from distinct promoters but share a single CPAS. This is a recurring arrangement in herpes simplex virus type 1 (HSV-1) and is most clearly demonstrated by the UL24–26 gene locus, wherein multiple mRNAs encoding different protein products share an extensive 3’ region (Fig. 1B). One additional mechanism to generate transcript diversity is read-through transcription, as illustrated by the RL2/UL1 region of HSV-1 (Fig. 1C). Here transcripts initiating at the RL2 TSS can either terminate downstream of the canonical spliced open reading frame (ORF) for the viral E3 ligase ICP0 or continue downstream into the UL1, UL2 and UL3 gene cluster and terminate at a CPAS used by the canonical UL1 and UL2 transcripts. Splicing between the second intron of RL2 and a cryptic splice acceptor site within UL1 results in a fusion transcript that encodes a readily detectable chimeric protein consisting of the amino terminus of ICP0 and the carboxy terminus of glycoprotein L (Depledge et al., 2019).

Figure 1. Illustrating the complexity of transcriptional outputs from gene loci from two unrelated DNA viruses.

Alignment of dRNA-Seq data against viral genomes can be used to produce coverage plots (beige, blue-gray) integrated with pileup analyses of sequence read termini (5’ end – red, 3’ end – black). Peak-calling of the pileup data yields putative transcription start sites (TSS, red vertical boxes) and cleavage and polyadenylation sites (CPAS, black boxes) that are filtered to remove artefacts occurring at splice junctions (yellow boxes). Full length mRNAs identified by dRNA-seq are depicted in green with untranslated regions (UTRs) shown as narrow horizontal boxes and coding sequences (CDS) shown as broader boxes. Spliced introns are shown as narrow lines. (A) Example of shared TSS. The adenovirus strain 5 (Ad5) E3 locus specifies at least eight distinct polyadenylated RNAs, each encoding a unique protein product (Price, Hayer, Depledge, et al., 2019). They share a common TSS but differ in their splicing patterns and CPAS. Non-E3 transcripts that overlap in this region are shown in gray. (B) Example of shared CPAS. HSV-1 produces transcripts that differ in their TSS and splicing patterns but that make use of a shared CPAS. This is exemplified in the gene cluster encoding transcripts for UL24, UL25, and UL26. (C) Example of readthrough-derived fusion transcripts. HSV-1 also produces fusion transcripts that are generated following readthrough transcription and splicing. This is exemplified using transcripts spanning the RL2 (ICP0) and clustered UL1/UL2/UL3 loci (Depledge et al., 2019).

Molecular profiling becomes more complicated still for viruses with double-stranded DNA genomes such as herpesviruses, poxviruses, parvoviruses, papillomaviruses, and adenoviruses, because transcription can occur on both strands across the same region of the genome. Temporal analyses of viral gene expression using hybridization to oligonucleotide or cDNA probes (microarrays) or by primer-directed amplification (reverse-transcription-qPCR) can be significantly compromised unless the probes and primer sets are carefully designed to distinguish between these overlapping but functionally distinct transcripts.

Recognizing the potential for expanded coding capacity and complexity, a number of laboratories, including our own, have begun to revisit reference genome annotations with the goal of better understanding the transcriptional landscape during viral infections (Arias et al., 2014; Concha et al., 2012; Depledge et al., 2019; Finkel et al., 2020; Greninger et al., 2018; O’Grady et al., 2019, 2016; Price, Hayer, Depledge, Wilson, & Weitzman, 2019; Tombácz et al., 2018; Whisnant et al., 2019). These efforts have often combined a number of different profiling techniques to preferentially map landmarks such as 5’ and 3’ ends, splice junctions or ribosome occupancy and, along with comparative genomics, have led to the discovery of tens or hundreds of new viral proteins, numerous functionally important non-coding RNAs, and new layers of gene regulation such as translational initiation at alternative start codons (reviewed in Prusty & Whisnant, 2020). For transcript structure determination, comparatively inexpensive short read sequencing methods such as the Illumina platform provide superb accuracy and thus remain the protocol of choice for cellular genes but are less useful for the determination of viral transcriptomes (reviewed in Depledge, Mohr, & Wilson, 2018).

When applied to viral transcriptomes, the chief limitation of Illumina sequencing is that the reads are relatively short and, when mapped to the reference genome, often cannot distinguish between overlapping RNAs. As a consequence, the reads may be incorrectly assigned to what is effectively a composite transcript. Likewise, reads corresponding to different sections of an RNA are not necessarily linked. This means that information on splice site usage is uncoupled from information about the 5’ and 3’ ends of the transcript (Steijger et al., 2013). Furthermore, short read sequencing involves copying the RNA into cDNA, which can result in the misidentification of 3′ ends through internal priming (Jan, Friedman, Ruby, & Bartel, 2011), or the reporting of spurious antisense and splicing events produced by template switching (Houseley & Tollervey, 2010). Lastly, recoding into cDNA can preclude the detection of certain base modifications (Helm & Motorin, 2017).

Fortunately, direct RNA sequencing (dRNA-seq) using nanopore arrays offers an exciting, affordable alternative whereby individual polyadenylated RNAs are sequenced without the recoding and amplification biases inherent to other methodologies (Schirmer et al., 2015). In brief, mRNAs are captured using the poly(A) tail and ligated to an adaptor that primes a reverse transcription step to generate a stabilizing cDNA strand (Fig. 2). A second adaptor is added to covalently link to a motor protein (helicase) that unwinds the RNA-cDNA hybrid and selectively draws the RNA through nanopores embedded in a flow cell membrane. A continuous flow of ions through the pores is disturbed by the passage of the RNA and the resulting changes in current are recorded and used by a proprietary neural network algorithm to predict the corresponding bases.

Figure 2. Workflow for direct RNA sequencing of virus-infected cells.

(A) Infected cells are harvested by lysis in the strong denaturant TRIzol and the RNA separated from DNA, proteins and other cellular components by organic extraction and recovered by isopropanol precipitation. (B) The poly(A) fraction of RNA is enriched using oligo dT-coupled magnetic beads. (C) The nanopore RTA adapter is ligated to polyadenylated RNAs and is followed by reverse transcription that produces a complementary cDNA strand that stabilizes the RNA but is itself not sequenced. (D) A further ligation step adds a motor protein (helicase) to the RTA adapter and this (E) allows unwinding of the RNA:cDNA complex at the pore proteins that form the channels embedded in the membrane of the flow cell within the nanopore array sequencer. Traversal of the RNA through a pore disrupts the flow of current and these signal changes allow base calling of the RNA.

An important limitation to nanopore sequencing is the high error rate (Garalde et al., 2018). With respect to viral transcriptome analysis, these errors can confound the identification of open reading frames and splice site prediction. Improvements to both the nanopore chemistry and the supporting computational tools have raised accuracy to around 93 – 95%, which is still below the 98–99% accuracy of the newest PacBio long-read sequencing and the ~99.9% accuracy achievable with Illumina sequencing (Rang, Kloosterman, & de Ridder, 2018). To address this, we and others have developed computational error-correction methods that use high quality Illumina data from the same RNA preparations as a reference against which nanopore data is compared (Depledge et al., 2019; Tang et al., 2018). While these enhance the accuracy significantly, the added cost of sequencing Illumina libraries and the computational time required for error-correction reduces the viability of these methods in the longer term.

The aim of this article is to provide researchers with the tools required to infect cells, harvest RNA, isolate the poly(A) fraction, prepare and sequence a dRNA-Seq library, and subsequently apply a variety of informatics analyses that can be used to deconvolute transcript structures and perform reliable gene expression analyses. The specific methods outlined herein will allow investigators to rapidly interrogate dsDNA viral transcriptomes at a previously unattainable resolution by combining simple molecular biology techniques with an ultra-portable sequencing unit. The approach is amenable to any experimental system from which sufficient quantities of high-quality RNA can be isolated, and its utility has been demonstrated for applications ranging from high resolution reannotation projects and RNA isoform expression analysis to transcript discovery and the detection of RNA modifications at nucleotide resolution (Depledge et al., 2019; Price, Hayer, Depledge, et al., 2019; Price, Hayer, McIntyre, et al., 2019). As a framework for the protocols we focus on the model human pathogen, herpes simplex virus type 1 (HSV-1), a double-stranded DNA virus that causes a number of diseases ranging from painful lesions to potentially fatal viral encephalitis (Whitley & Roizman, 2001). HSV-1 replicates in a variety of mammalian cell types including primary dermal fibroblasts, which are described here for illustrative purposes. These basic protocols can be easily adapted to infections of other primary and transformed cells or to other DNA viruses.

CAUTION: Human herpes simplex virus (HSV) is a Biosafety Level 2 (BSL-2) pathogen. Follow all institutional guidelines and local regulations for the use and handling of pathogenic viruses.

NOTE: All solutions and equipment coming into contact with living cells must be sterile. Aseptic technique should be used accordingly.

NOTE: All culture incubations should be performed in a 37°C, 5% CO2 humidified incubator unless otherwise specified.

BASIC PROTOCOL 1

PRODUCTIVE INFECTION OF PRIMARY FIBROBLASTS WITH HERPES SIMPLEX VIRUS

Here, the aim is to infect cultures of normal human dermal fibroblasts (NHDFs) with herpes simplex virus type 1 (HSV-1) at a high multiplicity of infection (MOI) and allow infection to proceed long enough for viral mRNAs to become prevalent within the total mRNA population. NHDF cultures are widely used because they can be easily established from patient biopsy material, are readily maintained and require no purification of the cells prior to culture. For guidance on the preparation of infectious virus stocks see (Blaho, Morton, & Yedowitz, 2006).

Materials

Cells of interest (normal human dermal fibroblasts (NHDFs), Lonza, CC-2509)

Herpes Simplex Virus type 1 (stock of known infectious titre)

DMEM/5% FBS (see recipe)

Phosphate buffered saline (PBS)

Sterile 10-cm tissue culture grade dishes

Sterile disposable pipettes

Hand-held battery-operated pipetting device (e.g., Pipet-Aid; Drummond Scientific)

Hemocytometer or automated cell counter

37°C, 5% CO2 incubator

Aspirator

Low-speed centrifuge

37°C water bath

  • 1

    Resuspend 3 × 10^6 NHDFs in 10 ml DMEM/5% FBS and pipette across the surface of a 10-cm tissue culture dish. Incubate overnight until the cell monolayer covers 80–100% of the plate surface.

10-cm dishes are recommended as these typically yield 50–100 μg of total RNA – an optimal amount for one to two dRNA-Seq library preparations.

  • 2

    Remove media from 10-cm dish. Infect cells at the desired multiplicity of infection (MOI) by adding viral stock to 3 ml DMEM/2.5% FBS and then adding to dish. Incubate for 90 mins with gentle rocking by hand every 30 mins.

A MOI of 3 is chosen so that nearly every cell is infected with at least one replication-competent virion. Note that the reduced FBS content improves infectivity.

  • 3

    Replace infection media with 10 ml DMEM/2.5% FBS and incubate for the desired period.

The longer the infection, the greater the proportion of viral RNA present. This must be counterbalanced against the impact on cell viability (remaining attached to the dish) and the experimental goals, which may require harvesting at early timepoints. For HSV-1 at a MOI of 3, we incubate infected cells for up to 20 hours prior to harvesting RNA.
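The inoculum volume for step 2 follows from the MOI, the number of cells on the dish, and the titre of the stock. The sketch below is illustrative only: the function names and the stock titre of 1e8 PFU/ml are assumptions, and the fraction of infected cells is estimated with a simple Poisson model that ignores cell division after plating and any loss of infectivity.

```shell
# Volume of virus stock (ml) needed to reach a target MOI.
# Arguments: MOI, number of cells on the dish, stock titre in PFU/ml.
inoculum_ml() {
    awk -v moi="$1" -v cells="$2" -v titre="$3" \
        'BEGIN { printf "%.3f\n", moi * cells / titre }'
}

# Poisson estimate of the fraction of cells that receive at least one
# infectious particle at a given MOI: 1 - exp(-MOI).
fraction_infected() {
    awk -v moi="$1" 'BEGIN { printf "%.3f\n", 1 - exp(-moi) }'
}

# Example: MOI 3 on 3 x 10^6 cells with an assumed stock of 1e8 PFU/ml.
inoculum_ml 3 3000000 1e8    # 0.090 (ml of stock, diluted into the 3 ml inoculum)
fraction_infected 3          # 0.950
```

Under these assumptions roughly 95% of cells receive at least one infectious particle at a MOI of 3. The cell number should be taken from a count made at the time of infection where possible.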

SUPPORT PROTOCOL 1

CELL PASSAGE AND PLATING OF PRIMARY FIBROBLASTS

Primary cells such as NHDFs are not life extended and can only be propagated and expanded for a finite number of generations before they undergo senescence. Typically, upon receipt from the vendor as frozen vials, we amplify the NHDFs through six successive passages (p6) which we then aliquot and cryopreserve as working stocks that can be stored long-term in liquid nitrogen. Once re-thawed, we will propagate a stock for a maximum of twenty passages (p20) and then discard. Cultures should be monitored for cell density and general health by 10x phase-contrast microscopy and be refed with fresh, warmed complete DMEM media every 2–3 days.

Thawing

  • 1.

    Place the cryogenic vial in a 37°C water bath. Do not let the cell suspension thaw completely. A small ice pellet should remain.

  • 2.

    Using a sterile 5 ml pipette, transfer the cell suspension from the cryogenic vial into a 50 mL tube containing 5–10 ml of warm (37°C) DMEM/5% FBS.

  • 4.

    Rinse the cryogenic vial with 1–1.5 ml of the suspension to recover any remaining cells.

  • 5.

    Centrifuge (200 × g) the NHDF suspension for 5 min at room temperature.

  • 6.

    Remove the supernatant and resuspend NHDFs in 10 ml of warm (37°C) DMEM/5% FBS.

  • 7.

    Determine cell number per unit volume using a hemocytometer or automated cell counter.

Cell viability is expected to be greater than 80% and can be monitored by trypan blue exclusion.

  • 8.

    Centrifuge (200 × g) the NHDF suspension for 5 min at room temperature.

  • 9.

    Remove the supernatant and resuspend NHDFs at the desired concentration in DMEM/5% FBS.

  • 10.

    Seed NHDFs at 1 × 10^6 cells per 10-cm dish directly into the culture medium. Use 10 ml of total medium volume per 10-cm dish.

Passage (subculture)

  1. Remove the culture medium.

  2. Rinse NHDFs in 10-cm dish with 5 ml PBS. Aspirate with sterile pipette to remove.

  3. Add 2 ml of trypsin/EDTA into the 10-cm dish.

  4. Incubate for 5 min at 37°C until all NHDFs are completely detached from the dish (verify cell detachment using a phase-contrast microscope).

  5. Neutralize the trypsin by adding 5 ml of complete DMEM medium, using it to rinse any remaining cells from the dish.

  6. Vigorously pipette the NHDF suspension up and down at least five times to ensure suspension homogeneity.

  7. Transfer the NHDF suspension into a 15 ml tube.

  8. Centrifuge (200 × g) the NHDF suspension for 5 min at room temperature.

  9. Remove the supernatant and resuspend NHDFs in complete DMEM medium.

  10. Count cells using an automated cell counter or a hemocytometer. Use trypan blue staining to estimate cell viability. Cell viability is expected to be greater than 95%.

  11. Seed NHDFs at 1 × 10^6 – 3 × 10^6 cells per 10-cm dish directly into the culture medium. Use 10 ml of total medium volume per 10-cm dish.
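The volume of suspension to seed per dish follows directly from the cell count. A minimal sketch, assuming the count has already been converted to viable cells per ml; the function name and the example concentration are hypothetical.

```shell
# Volume of cell suspension (ml) that contains the desired number of cells.
# Arguments: target cell number, measured concentration in cells/ml.
seeding_volume_ml() {
    awk -v target="$1" -v conc="$2" 'BEGIN { printf "%.2f\n", target / conc }'
}

# Example: seeding 1 x 10^6 cells from a suspension counted at 4 x 10^5 cells/ml.
seeding_volume_ml 1000000 400000    # 2.50
```

The remainder of the 10 ml per dish is made up with warm culture medium.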

Cryopreservation

  1. Fill a freezing container with 100% isopropyl alcohol and store it at 4°C until cooled.

  2. Follow steps 1 through 11 of the Passage (subculture) procedure above.

  3. Remove the supernatant and resuspend NHDFs at the desired concentration in cryopreservation medium. Put the tube on ice.

  4. Aliquot in cryogenic vials on ice.

  5. Put the cryogenic vials in the freezing container.

  6. Store the container overnight at −80°C. In these conditions, cell temperature should drop by about 1°C per minute.

  7. Store cryogenic vials in liquid nitrogen for long-term storage.

BASIC PROTOCOL 2

PREPARATION AND SEQUENCING OF dRNA-SEQ LIBRARIES FROM VIRUS-INFECTED CELLS

The protocol described below uses HSV-1 infected NHDFs as an exemplar but can be broadly applied to any virus-infected cell from which high-quality total RNA can be extracted.

Materials

  • Cells of interest

  • Virus

  • DMEM/5% FBS (see recipe)

  • PBS

  • 10-cm dishes

  • TRIzol (Life Technologies, cat. no. 15596–026)

  • Chloroform (Sigma Aldrich, cat. no. 496189)

  • Ethanol (200 proof molecular grade)

  • Isopropanol

  • Nuclease-free H2O

  • Glycogen blue [optional]

  • Desktop centrifuge prechilled to 4°C

  • Heat block set to 60°C

  • 37°C, 5% CO2 incubator

Harvesting total RNA
  • 1

    Prepare infected cell cultures according to Basic Protocol 1.

  • 2

    Remove media and gently wash the cells with 5 ml room-temperature PBS by holding the dish at an angle and slowly pipetting onto the side of the dish before laying the dish flat. Remove the PBS by aspiration.

  • 3

    Add 8 ml TRIzol, and incubate for one minute. Use a disposable plastic pipette to gently draw and release the TRIzol in the dish in a manner that allows all cells to be lysed.

TRIzol is a monophasic solution of phenol, guanidine isothiocyanate, and other proprietary components (Chomczynski, 1993). Because it is such an effective RNase denaturant, TRIzol-containing cell lysates can be stored at room temperature for several hours without impacting RNA quality. If necessary, samples can be transferred to a −80°C freezer for long-term storage (days/weeks), but upon thawing ensure the contents reach room temperature before proceeding with the extraction protocol.

  • 4

    Transfer the TRIzol to a 50 ml falcon tube and proceed to RNA extraction or store at −80°C.

Extracting total RNA
  • 1

    Add 1.6 ml chloroform to TRIzol and vortex vigorously for 15 seconds. Divide samples into eight 1.5 ml microcentrifuge tubes (1.2 ml per tube) and incubate for 3 mins at RT before centrifuging at 12,000 x g for 15 mins at 4°C.

If starting with frozen TRIzol, allow to come to room temperature and then let sit for five minutes before adding chloroform.

  • 2

    Carefully remove the (upper) aqueous phase from each microfuge tube (avoid interface) into a single 15 ml falcon tube. Add 4 ml 100% Isopropanol, shake gently, and then split into eight microfuge tubes (approx. 1 ml per tube). Incubate at RT for 10 mins and then centrifuge at 12,000 x g for 10 minutes at 4°C.

The aqueous phase sits on top of the interface. Pipette carefully to avoid capturing any interface. To improve visibility of the RNA pellet, 1 μl of glycogen blue can be added to the isopropanol-aqueous phase mix without impacting downstream sequencing.

  • 3

    Carefully remove the supernatant, leaving only the pellet behind. Add 1 ml of 75% ethanol and vortex briefly to dislodge pellet. Centrifuge at 7,500 x g for 5 minutes at 4°C.

  • 4

    Remove the supernatant and air-dry the RNA pellet for 5–10 minutes (do not allow to dry completely). Resuspend each RNA pellet by adding 20 μl of RNase-free water to each microfuge tube and incubating on a heat block for 10 mins at 55°C. Combine eluted RNA samples into a single tube and determine the concentration (Qubit Fluorometer or similar) and the quality (Agilent Bioanalyzer or Tapestation). Store eluted RNA at −80°C until needed.
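The reagent volumes in the extraction steps are given for 8 ml of TRIzol (1.6 ml chloroform and 4 ml isopropanol). If a different TRIzol volume is used, for example for a smaller dish, the same proportions can be applied. The helper below is a sketch under that assumption; the function name is hypothetical and the proportions are simply those of the steps above.

```shell
# Scale chloroform and isopropanol to the TRIzol volume (ml), keeping the
# proportions used above: 1.6 ml chloroform and 4 ml isopropanol per 8 ml TRIzol.
trizol_reagents() {
    awk -v trizol="$1" 'BEGIN {
        printf "chloroform_ml=%.1f isopropanol_ml=%.1f\n", trizol * 1.6 / 8, trizol * 4 / 8
    }'
}

trizol_reagents 8    # chloroform_ml=1.6 isopropanol_ml=4.0
trizol_reagents 4    # chloroform_ml=0.8 isopropanol_ml=2.0
```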

Isolating the poly(A) fraction ~ 30 mins

This step allows the user to purify poly(A) RNA from the total RNA pool for use as the input material for the standard direct RNA sequencing library preparation protocol.

Materials
  • 1.5 ml microfuge tubes (sterile and RNase-free)

  • 0.2 ml thin-walled PCR tubes (sterile and RNase-free)

  • Nuclease-free water

  • Dynabeads Poly(A) purification kit (Thermo Fisher Scientific, cat. No. 61006)

  • Neodymium magnet and tube rack (e.g., DynaMag-2, Thermo Fisher Scientific)

  • 1

    Adjust the volume of input total RNA (25 – 75 μg) to 100 μl using 10 mM Tris-HCl (included in the purification kit) in a 0.2 ml tube.

  • 2

    Incubate in a preheated thermocycler (heated lid set at 105°C) at 65°C for 2 min to disrupt secondary structure, then place on ice for 1 min.

  • 3

    Vortex Dynabeads until homogenous and then transfer required amount (Table 1) to a 1.5 ml tube and place in a magnetic rack until beads have pelleted.

  • 4

    Discard supernatant, remove tube from magnet and add 100 μl binding buffer, mix by pipetting and then place in a magnetic rack until beads have pelleted.

  • 5

    Discard supernatant, remove tube from magnet and add 100 μl binding buffer, mix by pipetting. The Dynabeads are now primed.

  • 6

    Add total RNA (100 μl) to the primed Dynabeads suspension (100 μl), mix thoroughly and incubate on a rotator for 5 mins at RT.

  • 7

    Place tube in magnetic rack until solution is clear, then remove the supernatant (which now contains the non-poly(A) fraction) and either discard it or transfer it to a fresh tube.

  • 8

    Remove tube from magnet and wash with 200 μl Washing Buffer B by gentle pipetting.

  • 9

    Place back in magnetic rack and remove supernatant when clear.

  • 10

    Repeat steps 8 and 9.

  • 11

    Ensure all wash buffer is removed and then elute poly(A) RNA by adding 10 μl of 10 mM Tris-HCl. Incubate in a preheated thermocycler (heated lid set at 105°C) at 80°C for 2 mins.

Table 1: Volumes of Dynabeads required for poly(A) selection

Total RNA input    Volume of Dynabeads
75 μg              200 μL
50 μg              166 μL
25 μg              133 μL
≤ 10 μg            100 μL

Note that depending on equipment available it may be necessary to perform the incubation step in a 0.2 ml tube and then transfer solution to a 1.5 ml tube prior to placing in the magnetic rack.

  • 12

    Immediately place tube in a magnetic rack and transfer supernatant to a fresh 0.2 ml tube. The purified poly(A) RNA is now ready for library preparation.

  • 13

    Quantify poly(A) RNA using a Qubit HS RNA assay (or similar) and check isolation efficiency using an RNA picochip (Agilent Bioanalyzer) or RNA chip (Agilent Tapestation).

Comparing traces for total RNA and poly(A) RNA should demonstrate that the 18S and 28S ribosomal peaks are absent in the poly(A) purified fraction. Typically, the poly(A) yield should be around 1 – 3% of total input RNA although this may be impacted by cell type and experimental conditions.
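The bead volumes in Table 1 and the expected 1 – 3% yield can be combined into a quick planning check before starting. This is a sketch only: the function names are hypothetical, and rounding intermediate inputs up to the next row of Table 1 is an assumption, since the table lists only four input amounts.

```shell
# Dynabeads volume (µl) for a given total RNA input (µg), following Table 1.
# Inputs that fall between tabulated amounts use the next row up (assumption).
dynabeads_ul() {
    awk -v ug="$1" 'BEGIN {
        if (ug <= 10)      v = 100
        else if (ug <= 25) v = 133
        else if (ug <= 50) v = 166
        else if (ug <= 75) v = 200
        else { print "input exceeds Table 1"; exit 1 }
        print v
    }'
}

# Expected poly(A) RNA recovery (µg) at the 1-3% yield quoted above.
expected_polyA_ug() {
    awk -v ug="$1" 'BEGIN { printf "%.2f-%.2f\n", ug * 0.01, ug * 0.03 }'
}

dynabeads_ul 50         # 166
expected_polyA_ug 50    # 0.50-1.50
```

A measured yield well outside the expected range is a prompt to inspect the Bioanalyzer or Tapestation trace for residual ribosomal RNA or for degradation.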

Preparing & sequencing direct RNA-Seq libraries

Sequencing libraries are prepared according to the standard SQK-RNA002 protocol developed by Oxford Nanopore Technologies. This can be readily accessed via the community section of the Nanopore website (https://nanoporetech.com/community). While no specific modifications to the protocol are required, we regularly reduce the run time to 24 hrs and subsequently wash and store the flow cells so that they may be used for additional runs. Note that washing and storage protocols are also available through the Nanopore website.

BASIC PROTOCOL 3

PROCESSING, ALIGNMENT, AND ANALYSIS OF dRNA-SEQ DATASETS

This section guides the user through the basics of processing dRNA-Seq datasets, aligning them to a genome of choice, and parsing files into formats that are compatible with visualization. The general workflow is shown schematically in Fig. 3.

Figure 3. Computational workflow for rapid alignment and analysis of viral dRNA-Seq data.

Following the acquisition of raw sequence data produced by MinKNOW during a nanopore run, Guppy is used to rebasecall data, trim the adapter sequence, reverse the orientation of the sequence read (from 3’->5’ to 5’->3’), and replace uracil bases with thymine bases. Processed sequence reads are aligned against a chosen reference genome sequence using minimap2 and (optionally) Illumina data are used in conjunction with FLAIR to perform splice-site correction. At this stage, aligned datasets may be visualized using standard tools such as IGV or Gviz before subsequent processing to produce transcript abundance counts (using custom scripts) and/or to identify TSS and CPAS (using HOMER).

Processing of sequencing data

The following protocols detail the computational steps required to (i) align dRNA-Seq datasets against a reference genome and visualize, (ii) generate proximal mappings of TSS and CPAS, and (iii) align dRNA-Seq datasets against a reference transcriptome and generate expression counts. Note that this requires working within a UNIX environment and the prior installation of the latest versions of the following software modules.

  • Guppy

  • minimap2

  • SAMtools

  • BEDtools

  • HOMER

Basecalling and data aggregation

Basecalling converts the raw signal into individual bases. While this is performed by Guppy during sequencing, users may prefer to rerun basecalling using a later version of Guppy with additional parameters that trim the adapter sequence, reverse sequence reads (from 3’->5’ to 5’->3’), and replace uracil (U) bases with thymine (T). Note that where multiple datasets are used in a study, basecalling should be performed using the same parameters and the same version of Guppy.

  1. guppy_basecaller -i fast5_pass -s basecalledRun --flowcell FLO-MIN106 --kit SQK-RNA002 -r --trim_strategy rna --reverse_sequence true --u_substitution true -x auto

  2. cat basecalledRun/*fastq > basecalled.fastq
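Before alignment it can be worth confirming that the aggregated FASTQ file is non-empty and that the uracil substitution took effect. The check below is a minimal sketch that assumes the standard four-line FASTQ record layout; the function name is hypothetical.

```shell
# Summarise a FASTQ file: total reads, and reads whose sequence still contains U.
# Sequence lines are every fourth line, starting at line 2.
fastq_summary() {
    awk 'NR % 4 == 2 { n++; if ($0 ~ /[Uu]/) u++ }
         END { printf "reads=%d reads_with_U=%d\n", n, u }' "$1"
}

# Run on the aggregated file only if it is present in the working directory.
if [ -f basecalled.fastq ]; then
    fastq_summary basecalled.fastq
fi
```

With --u_substitution true the second number is expected to be zero; a non-zero value suggests that some files were basecalled with different parameters.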

Alignment and Visualization of Sequencing Data

Here we align our FASTQ sequence datasets against a reference genome (e.g. HSV-1 strain 17, GenBank accession NC_001806.2) using a long read local aligner, minimap2 (Li, 2018), and subsequently parse data to generate BEDGRAPH, BED6, and BED12 files that can be used to examine coverage and individual read mappings using Gviz (Hahne & Ivanek, 2016) and IGV (Thorvaldsdóttir, Robinson, & Mesirov, 2013).

Align basecalled fastq data against a reference genome using optimized minimap2 flags for spliced dRNA-Seq alignment and convert to a sorted BAM file that contains only sequence reads aligned against the viral genome.

  • 1

    minimap2 -ax splice -k14 -uf --secondary=no reference.fasta basecalled.fastq > out.sam

    samtools view -F4 -b -o out.bam out.sam

    samtools sort -o out.sorted.bam out.bam

    samtools index out.sorted.bam

Parse aligned data using SAMtools and BEDtools. Retain only reads with alignment flag 0 (primary mapping, forward strand of genome) and use these to generate a bedgraph coverage file. Note that both the out.sorted.forward.bam and out.forward.bedgraph files can be visualized using IGV.

  • 2

    samtools view -b -F2324 out.sorted.bam > out.sorted.forward.bam

    samtools index out.sorted.forward.bam

    samtools view -b out.sorted.forward.bam | genomeCoverageBed -ibam stdin -bg -split -g reference.fasta > out.forward.bedgraph

    bamToBed -bed12 -i out.sorted.forward.bam > out.sorted.forward.12.bed
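The value passed to -F in the step above is the sum of individual SAM flag bits, and excluding all of them leaves only reads with flag 0. The sketch below rebuilds that value and decodes example flags; the bit values are those defined in the SAM specification, while the variable and function names are illustrative.

```shell
# SAM flag bits relevant to this filter (values from the SAM specification).
UNMAPPED=4
REVERSE=16
SECONDARY=256
SUPPLEMENTARY=2048

# Excluding all four bits with -F leaves only reads whose flag is 0
# (primary alignments on the forward strand).
forward_mask=$(( UNMAPPED + REVERSE + SECONDARY + SUPPLEMENTARY ))
echo "forward strand filter: samtools view -F $forward_mask"

# List which of these bits are set in a flag value (column 2 of a SAM record).
decode_flag() {
    flag=$1
    set_bits=""
    if [ $(( flag & UNMAPPED )) -ne 0 ]; then set_bits="$set_bits unmapped"; fi
    if [ $(( flag & REVERSE )) -ne 0 ]; then set_bits="$set_bits reverse"; fi
    if [ $(( flag & SECONDARY )) -ne 0 ]; then set_bits="$set_bits secondary"; fi
    if [ $(( flag & SUPPLEMENTARY )) -ne 0 ]; then set_bits="$set_bits supplementary"; fi
    if [ -z "$set_bits" ]; then set_bits=" none"; fi
    echo "${set_bits# }"
}

decode_flag 0       # none
decode_flag 16      # reverse
decode_flag 2064    # reverse supplementary
```

A read with flag 2064 is a supplementary alignment on the reverse strand, which is why strand-specific filters generally need to exclude bit 2048 as well as select or exclude bit 16.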

Parse aligned data using SAMtools and BEDtools. Retain only reads with alignment flag 16 (primary mapping, reverse strand of genome) and use these to generate a bedgraph coverage file as above.

  • 3

    samtools view -b -f16 -F2308 out.sorted.bam > out.sorted.reverse.bam

    samtools index out.sorted.reverse.bam

    samtools view -b out.sorted.reverse.bam | genomeCoverageBed -ibam stdin -bg -split -g reference.fasta > out.reverse.bedgraph

    bamToBed -bed12 -i out.sorted.reverse.bam > out.sorted.reverse.12.bed

Aligned sequence data in sorted BAM and/or BED12 formats can now be visually inspected using the Integrative Genomics Viewer (IGV). Note that BED12 files require significantly less memory to view. Advanced users may prefer to visualize data using Gviz (Hahne & Ivanek, 2016), an R package that allows great flexibility in producing figures. A sample script and dataset for this can be found at: https://github.com/dandepledge/CurrentProtocols.

Proximal mapping of TSS and CPAS

This final step aims to identify transcription start sites (TSS) as well as cleavage and polyadenylation sites (CPAS). The first step in peak calling is to extract the genome positions at which the 5’ (TSS) and 3’ (CPAS) ends of each sequence read align on a stranded basis, and then count the number of occurrences of each position. The following code will produce a BED-like 6-column file containing this output which will be used as input for peak-calling. The commands read a BED file of all aligned reads, here named out.sorted.bed, which can be generated from the sorted BAM file with bamToBed (e.g. bamToBed -i out.sorted.bam > out.sorted.bed).

  • 1

    awk '$6 ~ /^\+$/ {print $0}' out.sorted.bed | cut -f2 | sort -n | uniq -c | sed 's/^ *//' | sed 's/ /\t/g' | awk '{print "name",$2,$2+1,"+",$1,"+"}' | sed 's/ /\t/g' > TSS.forward.faux.bed

  • 2

    awk '$6 ~ /^-$/ {print $0}' out.sorted.bed | cut -f3 | sort -n | uniq -c | sed 's/^ *//' | sed 's/ /\t/g' | awk '{print "name",$2-1,$2,"-",$1,"-"}' | sed 's/ /\t/g' > TSS.reverse.faux.bed

  • 3

    awk '$6 ~ /^\+$/ {print $0}' out.sorted.bed | cut -f3 | sort -n | uniq -c | sed 's/^ *//' | sed 's/ /\t/g' | awk '{print "name",$2,$2+1,"+",$1,"+"}' | sed 's/ /\t/g' > CPAS.forward.faux.bed

  • 4

    awk '$6 ~ /^-$/ {print $0}' out.sorted.bed | cut -f2 | sort -n | uniq -c | sed 's/^ *//' | sed 's/ /\t/g' | awk '{print "name",$2-1,$2,"-",$1,"-"}' | sed 's/ /\t/g' > CPAS.reverse.faux.bed
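
For readers who prefer to follow the logic outside of a shell pipeline, the four commands above can be sketched in Python as shown below. This is an illustrative equivalent under our own assumptions (function names, in-memory input, invented example reads), not a replacement for the commands themselves. The key point is that the 5' end of a forward-strand read is the BED start column, whereas the 5' end of a reverse-strand read is the BED end column.

```python
from collections import Counter


def count_read_ends(bed_lines):
    """Count 5' (TSS) and 3' (CPAS) end positions per strand from BED records."""
    counts = {(kind, strand): Counter() for kind in ("TSS", "CPAS") for strand in "+-"}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        start, end, strand = int(fields[1]), int(fields[2]), fields[5]
        if strand == "+":
            counts[("TSS", "+")][start] += 1   # 5' end is the BED start
            counts[("CPAS", "+")][end] += 1    # 3' end is the BED end
        elif strand == "-":
            counts[("TSS", "-")][end] += 1     # 5' end is the BED end
            counts[("CPAS", "-")][start] += 1  # 3' end is the BED start
    return counts


def faux_bed(counter, strand, name="name"):
    """Format counts as 6-column lines: name, start, end, strand, count, strand."""
    rows = []
    for pos in sorted(counter):
        lo, hi = (pos, pos + 1) if strand == "+" else (pos - 1, pos)
        rows.append("\t".join([name, str(lo), str(hi), strand, str(counter[pos]), strand]))
    return rows


# Invented example reads (BED6): three forward-strand, one reverse-strand
reads = [
    "HSV1\t100\t900\tr1\t60\t+",
    "HSV1\t100\t880\tr2\t60\t+",
    "HSV1\t105\t900\tr3\t60\t+",
    "HSV1\t300\t1500\tr4\t60\t-",
]
counts = count_read_ends(reads)
tss_forward = faux_bed(counts[("TSS", "+")], "+")
tss_reverse = faux_bed(counts[("TSS", "-")], "-")
print("\n".join(tss_forward + tss_reverse))
```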

The next step uses HOMER (Heinz et al., 2010) to call TSS and CPAS peaks. First, we generate TAG directories which simply contain a parsed version of the data output from above. Then we use HOMER’s findPeaks module to call peaks against background. Note that -localSize and -size parameters are optimized for alphaherpesvirus gene structures and require tuning for other viral species.

  • 5

    makeTagDirectory forTSS/ TSS.forward.faux.bed -format bed -force5th

  • 6

    makeTagDirectory revTSS/ TSS.reverse.faux.bed -format bed -force5th

  • 7

    makeTagDirectory forCPAS/ CPAS.forward.faux.bed -format bed -force5th

  • 8

    makeTagDirectory revCPAS/ CPAS.reverse.faux.bed -format bed -force5th

  • 9

    findPeaks forTSS/ -o auto -style tss -localSize 100 -size 15

  • 10

    findPeaks revTSS/ -o auto -style tss -localSize 100 -size 15

  • 11

    findPeaks forCPAS/ -o auto -style tss -localSize 500 -size 50

  • 12

    findPeaks revCPAS/ -o auto -style tss -localSize 500 -size 50

Finally, we parse the HOMER output to extract all identified peaks into a standard 6-column BED file that can be viewed within IGV.

  • 13

    grep -v \# forTSS/tss.txt | awk '{ print $2"\t"$3"\t"$4"\t"$1"\t"$6"\t"$5 }' > .15-100.for.TSS.bed

  • 14

    grep -v \# revTSS/tss.txt | awk '{ print $2"\t"$3"\t"$4"\t"$1"\t"$6"\t"$5 }' > .15-100.rev.TSS.bed

  • 15

    grep -v \# forCPAS/tss.txt | awk '{ print $2"\t"$3"\t"$4"\t"$1"\t"$6"\t"$5 }' > .50-500.for.CPAS.bed

  • 16

    grep -v \# revCPAS/tss.txt | awk '{ print $2"\t"$3"\t"$4"\t"$1"\t"$6"\t"$5 }' > .50-500.rev.CPAS.bed
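
The column reordering performed by the awk commands above can be sketched in Python as follows. The sketch assumes the column order implied by those commands (peak identifier, reference name, start, end, strand, score); the example peak line is invented and the function name is ours.

```python
def homer_peaks_to_bed6(lines):
    """Reorder peak-file columns into BED6: chrom, start, end, name, score, strand."""
    bed = []
    for line in lines:
        # Mirror `grep -v \#`: drop any line containing '#', and skip blank lines
        if "#" in line or not line.strip():
            continue
        peak_id, chrom, start, end, strand, score = line.rstrip("\n").split("\t")[:6]
        bed.append("\t".join([chrom, start, end, peak_id, score, strand]))
    return bed


# Invented example of a peak file with two comment lines and one peak
example_peaks = [
    "# HOMER Peaks",
    "#PeakID\tchr\tstart\tend\tstrand\tNormalized Tag Count",
    "HSV1-1\tHSV1\t93\t108\t+\t512.0",
]
bed6 = homer_peaks_to_bed6(example_peaks)
print("\n".join(bed6))
```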

Optionally, it is also possible to parse the output to extract all identified peaks into a Gviz-compatible (Hahne & Ivanek, 2016) peak plotting file (Fig. 1).

  • 17

    cut -f2,3,4,6 forTSS/tss.txt | grep -v \# | awk '{if (NR!=1) {print}}' > .15-100.for.TSS.R.txt

  • 18

    cut -f2,3,4,6 revTSS/tss.txt | grep -v \# | awk '{if (NR!=1) {print}}' > .15-100.rev.TSS.R.txt

  • 19

    cut -f2,3,4,6 forCPAS/tss.txt | grep -v \# | awk '{if (NR!=1) {print}}' > .50-500.for.CPAS.R.txt

  • 20

    cut -f2,3,4,6 revCPAS/tss.txt | grep -v \# | awk '{if (NR!=1) {print}}' > .50-500.rev.CPAS.R.txt

Detection and correction of splice junctions

The comparatively high error rate in nanopore basecalling impacts on the precise mapping of splice donor and splice acceptor sites. We initially overcame this using an effective but time-consuming error correction protocol (Depledge et al., 2019) but have since switched to a rapid alternative that uses the ‘align’ and ‘correct’ modules within FLAIR (Tang et al., 2018). It is important to note however that splice site correction by either method relies on the availability of a stranded, poly(A) selected Illumina RNA-Seq dataset, ideally generated from the same source RNA as the nanopore run. There is thus an imperative to identify new computational approaches for splice-site correction that are not reliant on the generation of supporting datasets through other technologies.

Here we outline the use of FLAIR modules ‘align’ and ‘correct’ to enable precise mapping of splice donor and splice acceptor sites within each individual dRNA-Seq read. Note that we do not recommend utilizing the FLAIR ‘collapse’ module as this is not currently suitable for transcriptomes with many overlapping RNAs. As a first step we convert previously aligned sequence reads back into fastq format in a strand-specific manner. This significantly improves the performance of FLAIR in our hands.

  • 1

    bamToFastq -i out.sorted.forward.bam -fq forwardONLY.fq

  • 2

    bamToFastq -i out.sorted.reverse.bam -fq reverseONLY.fq

The next step requires aligning these sequence reads against the reference genome using the FLAIR ‘align’ module. The -n parameter is used to specify that the sequence read data are dRNA-Seq.

  • 3

    python flair.py align -g reference.fasta -r forwardONLY.fq -n -o for.flair.aligned -v1.3

  • 4

    python flair.py align -g reference.fasta -r reverseONLY.fq -n -o rev.flair.aligned -v1.3

FLAIR correct requires the availability of a junction database generated from an Illumina dataset. This is generated as follows and requires a sorted BAM file generated from prior alignment of the Illumina dataset against the reference genome.

  • 5

    python junctions_from_sam.py -s illumina.mapping.sorted.bam -n junctionDB

While not required, we nonetheless recommend filtering the junctionDB.bed file to remove low frequency junctions and separating it into strand-specific files.

  • 6

    awk ' $5 >= 100 ' junctionDB.bed > junctionDB.filtered.bed

  • 7

    awk ' $4 < 2 ' junctionDB.filtered.bed > junctionDB.filtered.reverse.bed

  • 8

    awk ' $4 != 1 ' junctionDB.filtered.bed > junctionDB.filtered.forward.bed
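
Before splitting the junction file by strand it can be helpful to check how many junctions survive the read-support threshold and which values actually occur in column 4. The Python sketch below applies the same threshold on column 5 and then groups records by the raw column 4 value without interpreting it. The example junctions are invented, and the encoding of column 4 should be confirmed against the output of the FLAIR version in use.

```python
def filter_junctions(lines, min_count=100):
    """Keep junctions whose column 5 (read support) is at least `min_count`."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if float(fields[4]) >= min_count:
            kept.append(fields)
    return kept


def group_by_column4(records):
    """Group junction records by the raw value in column 4."""
    groups = {}
    for fields in records:
        groups.setdefault(fields[3], []).append(fields)
    return groups


# Invented junction records: chrom, start, end, column 4 value, read support
junctions = [
    "HSV1\t1200\t1650\t1\t2500",
    "HSV1\t3000\t3400\t2\t40",
    "HSV1\t5000\t5300\t2\t180",
]
kept = filter_junctions(junctions)
groups = group_by_column4(kept)
for value, records in sorted(groups.items()):
    print(value, len(records))
```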

For the final step, the junctionDB files are supplied to the FLAIR ‘correct’ module.

  • 9

    python flair.py correct -n -q for.flair.aligned.bed -c reference.txt -g reference.fasta -j junctionDB.filtered.forward.bed -o corrected_for

  • 10

    python flair.py correct -n -q rev.flair.aligned.bed -c reference.txt -g reference.fasta -j junctionDB.filtered.reverse.bed -o corrected_rev

The output BED files (BED12 format) can subsequently be visualized using IGV and/or Gviz and serve as a reference for splice junctions in annotation projects.
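
To illustrate the general idea behind splice-site correction, the following Python sketch moves a junction observed in a nanopore read to the closest junction supported by short-read data, provided that both splice sites lie within a fixed window. This is a simplified illustration and not FLAIR's implementation; the 10 nt window, the function name and the example coordinates are assumptions.

```python
def correct_junction(read_junction, known_junctions, window=10):
    """Snap a (donor, acceptor) pair to the closest known junction within `window` nt.

    Returns the read junction unchanged when no known junction is close enough.
    """
    donor, acceptor = read_junction
    best, best_dist = None, None
    for known_donor, known_acceptor in known_junctions:
        d_donor = abs(donor - known_donor)
        d_acceptor = abs(acceptor - known_acceptor)
        if d_donor <= window and d_acceptor <= window:
            dist = d_donor + d_acceptor
            if best_dist is None or dist < best_dist:
                best, best_dist = (known_donor, known_acceptor), dist
    return best if best is not None else read_junction


# Invented short-read-supported junctions and two nanopore-derived junctions
known = [(1200, 1650), (5000, 5300)]
corrected = correct_junction((1203, 1648), known)
unchanged = correct_junction((3000, 3400), known)
print(corrected, unchanged)
```

A junction that cannot be matched is left as observed, so such reads remain visible for manual inspection rather than being silently altered.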

Generating transcript counts

A common goal of gene expression studies is to determine the relative abundance of different RNAs within a population. This is typically achieved by aligning sequence reads against a transcriptome (rather than genome) database. Such a database should comprise all mature RNA sequences known to be coded for in the genome of interest. Note however that poor quality and/or incomplete transcriptome annotations can significantly skew the resulting data – principally due to sequence reads originating from an unannotated RNA being incorrectly aligned against an annotated RNA.

Generating transcript counts requires first aligning the sequence read data against the transcriptome using minimap2. Note that the flags used differ from alignments against a reference genome (i.e. these flags are optimal for aligning against a transcriptome).

  • 1

    minimap2 -t 8 -ax map-ont -p 0.99 transcriptome.fasta basecalled.fastq > transcriptome-aligned.sam

As exemplified in Figure 1, the overlapping nature of many viral transcripts makes filtering of reads difficult. In the case of HSV-1 where many distinct RNAs share the same CPAS, we introduce a filtering step that retains only alignments that map within 50 nucleotides of the defined TSS for a given transcript (Fig. 4).

Figure 4. An HSV-1 specific strategy for generating transcript counts following transcriptome alignment.

Sequence reads (red, green) are shown aligned against three overlapping RNAs (blue). Abundance counts are generated by retaining only reads (green) that map within 50 nucleotides of the TSS (dark blue region) of a given RNA. All remaining reads (red) can be discarded.

  • 2

    samtools view -h transcriptome-aligned.sam | awk ' ( $4 < 51 || $1 ~ /SQ/ ) ' | awk ' ( $6 !~ /H/ || $1 ~ /SQ/ ) ' | samtools view -F4 -b - > transcriptome-aligned-filtered.bam

As a final step, extracting and collapsing column 3 from the filtered bam file produces a count file showing the abundance of all RNAs to which individual sequence reads are aligned. Note however that this filtering step requires careful assessment as the filtering style is not optimal for all viruses.
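
The filtering and counting logic described above can be sketched in Python as follows. The sketch works on SAM-formatted text lines, applies the same three conditions as the command in step 2 (mapped, alignment starting within the first 50 nucleotides of the transcript, no hard clipping) and then tallies column 3. The transcript names and example alignments are invented.

```python
from collections import Counter


def count_transcripts(sam_lines, max_start=50):
    """Tally reference names (SAM column 3) for alignments passing the TSS filter."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4:
            continue  # unmapped read
        if pos > max_start:
            continue  # alignment starts too far from the transcript 5' end
        if "H" in cigar:
            continue  # hard-clipped alignment
        counts[rname] += 1
    return counts


# Invented SAM records (11 mandatory columns; sequence and quality omitted as '*')
sam = [
    "@SQ\tSN:RNA_A\tLN:2000",
    "r1\t0\tRNA_A\t12\t60\t900M\t*\t0\t0\t*\t*",
    "r2\t0\tRNA_A\t48\t60\t850M\t*\t0\t0\t*\t*",
    "r3\t0\tRNA_A\t400\t60\t500M\t*\t0\t0\t*\t*",
    "r4\t0\tRNA_B\t5\t60\t100H800M\t*\t0\t0\t*\t*",
    "r5\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r6\t0\tRNA_B\t30\t60\t700M\t*\t0\t0\t*\t*",
]
transcript_counts = count_transcripts(sam)
for name, n in sorted(transcript_counts.items()):
    print(name, n)
```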

REAGENTS AND SOLUTIONS

Unless otherwise noted, sterile and RNase-free stock solutions of many reagents can be purchased from various suppliers.

DMEM/5% FBS: 500 ml Dulbecco’s modified Eagle’s medium (DMEM), 25 ml (5% v/v) of fetal bovine serum (FBS), 5 ml 10,000 U/ml penicillin/10 mg/ml streptomycin, 5 ml 1 M HEPES; store in the dark at 4°C for up to 4 weeks.

DMEM/2.5% FBS: 500 ml Dulbecco’s modified Eagle’s medium (DMEM), 12.5 ml (2.5% v/v) of fetal bovine serum (FBS), 5 ml 10,000 U/ml penicillin/10 mg/ml streptomycin, 5 ml 1 M HEPES; store in the dark at 4°C for up to 4 weeks.

Cryopreservation medium: Add 10% v/v of dimethyl sulfoxide (DMSO) to fetal bovine serum, keep on ice or store at 4°C, and use within the same day.

Commentary

Background information

As discussed in the introduction, nanopore array sequencing requires the samples of nucleic acid (DNA or RNA) to be actively drawn through microscopic pores embedded in a membrane, resulting in fluctuations in current that can be measured and used for base calling. Several sequencing instruments developed by Oxford Nanopore Technologies (ONT) are available, including the ground-breaking MinION™ portable sequencer. This mobile phone-sized instrument is powered through a USB connection to a laptop computer and accommodates a single consumable flow cell that contains the sensor array. In our experience, standard flow cells can be reused up to three times, with each cycle of use producing decreasing numbers of sequence reads.

A cost-effective strategy is to perform the most crucial runs on new flow cells, with subsequent runs dedicated to less-critical studies such as protocol development and/or low throughput runs. Independent comparisons of poly(A) RNA from a human B lymphocyte cell line found that individual read lengths ranged from 85 nucleotides to 21,000 nucleotides, with a median of 771 nucleotides (Workman et al., 2019). Users can also purchase single-use Flongle™ (flow cell dongle) adapters for use within the MinION sequencer. This is a good option if you suspect that a limited number of reads will be sufficient for the analysis. This may be the case when the RNAs of interest are very abundant or when specific RNAs are being targeted, either through prior selection (i.e. enrichment for specific polyadenylated RNAs; Tan, Maurer-Stroh, Wan, Sessions, & de Sessions, 2019) or, if specific non-adenylated RNAs are to be captured, using a custom 3’ adapter (Keller et al., 2018). High throughput options that are better suited for the analysis of cellular transcriptomes or for the detection of less abundant transcripts exist in the form of the GridION (up to five flow cells) and PromethION (up to 48 flow cells) sequencers, which allow multiple samples to be sequenced in parallel but at a higher initial set up cost (Jain, Olsen, Paten, & Akeson, 2016).

Critical parameters

When is dRNA-seq the right approach?

At the outset of any transcriptomics project it is extremely important to consider the experimental goals, the nature of the samples available, and the available budget. Direct RNA-seq (dRNA-seq) is best suited to studies in which large quantities of high-quality RNA can be recovered, such as from cell culture infections rather than from primary tissues. In such cases, dRNA-seq is arguably the method of choice for comprehensive characterization of the most abundant transcripts, which in robust viral infections are often those of the virus. dRNA-seq may be a less suitable option for examining changes in cellular transcriptomes because fewer reads are obtained for any given transcript isoform when compared with short-read sequencing. If profiling of the host transcriptome is important, this might be achievable by sequencing multiple samples in parallel and/or using one of the high capacity sequencers, but this adds significantly to the cost (Soneson et al., 2019; Workman et al., 2019).

While generally limited to the profiling of the poly(A) fraction of RNA, individual non-adenylated RNAs (including certain viral RNA genomes) may be targeted using modified RTA adapters (Keller et al., 2018). At the time of writing, no established target enrichment strategies are available (i.e. for specific capture and sequencing of viral RNAs within a larger RNA population), although this is a technology that a number of groups are actively developing (Tan et al., 2019). Readers should also be aware that artificial A-tailing can be an effective strategy for capturing nascent or otherwise non-adenylated RNAs (Drexler, Choquet, & Churchman, 2019). Note however that these strategies often require ribosomal RNA depletion prior to A-tailing.

RNA quantity and quality

The extraordinary potential of dRNA-Seq is limited by both the amount and quality of the input RNA. The standard ONT dRNA-Seq protocol, while optimised for ~500 ng of input poly(A) RNA, is capable of producing high quality libraries using lower input amounts (> 10 ng) albeit with a corresponding reduction in the number of reads that can be obtained.

dRNA-Seq by nanopore array proceeds in a 3’ to 5’ direction, starting from the adapter ligated to the end of the poly(A) tail. High quality RNA is required to maximise the possibility that each sequence read represents a full-length mRNA. Users should thus ensure degradation is kept at an absolute minimum throughout the protocol by operating in a dust-free environment, by wearing disposable gloves and by using only sterile certified RNase-free disposable pipette tips and tubes, to avoid contamination with flakes of skin, body fluids, microbes and other sources of RNase. While treatment with an RNase-free DNase is not strictly required, this option should be considered for samples which contain high levels of DNA, possibly as the result of carryover of the white interphase layer or red organic layer during TRIzol extraction.

Loading flow cells

The introduction of bubbles or other contaminants can significantly impact the function of flow cells. Users should take extra care during flow cell priming to ensure no bubbles are introduced. Comprehensive written guides and videos are available via the Nanopore website.

Troubleshooting

Low yield (total number of reads)

Multiple factors can influence the total number of reads obtained from a given sample. Starting with insufficient input RNA is a major factor, but losses and degradation can also occur during poly(A) selection, library preparation and the sequencing process itself. For example, introduction of bubbles into the flow cell or the overloading of library material can result in damaged or blocked pores that reduce the rate of sequencing.

Low yield (virus reads)

The proportion of viral RNAs present in a sample will strongly influence the quality of the data. For this reason, the experimental design should carefully consider the number of viral reads required for successful profiling and additional experiments may be necessary to confirm that a sufficient proportion of viral RNA is present in the samples. For instance, the relative abundance of specific viral and cellular RNAs can be determined by reverse-transcription qPCR (RT-qPCR) and compared to dilutions of known standards to obtain copy number estimates (Derveaux, Vandesompele, & Hellemans, 2010).
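
As a worked example of how copy number estimates are obtained from a dilution series of known standards, the Python sketch below fits a standard curve of the form Ct = slope × log10(copies) + intercept and inverts it for a sample. All values are invented for illustration, and the function names are ours.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)


# Invented ten-fold dilution series of a standard and the Ct measured for each
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_ct = [30.0, 26.7, 23.4, 20.1]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

sample_ct = 25.05  # invented Ct for a viral RNA in the sample
estimated_copies = copies_from_ct(sample_ct, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(estimated_copies))
```

Running the same calculation for a viral and a cellular target gives a rough indication of their relative abundance before committing a flow cell to the sample.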

Degraded RNA

Degraded RNA can indicate improper technique or RNase contamination of one or more buffers. The quality and integrity of total RNA in the samples can be assessed using RNA quantification kits on a Tapestation, Bioanalyzer (both Agilent Technologies), or a Qubit fluorometer (Invitrogen). Where possible, users should routinely clean equipment such as pipettors or centrifuge buckets to minimize potential exposure of samples to RNases, use certified RNase-free consumables and ensure that fresh stock solutions are used.

Understanding Results

A typical MinION run should yield between one and two million sequence reads over the course of the first 24 hours. Around 90% of these reads should pass the internal quality control filters during basecalling using the Guppy data processing toolkit and should range in length from about 200 bases to 10,000 bases.
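
To compare a run against the figures quoted above, the number and length distribution of basecalled reads can be summarised directly from a FASTQ file. The Python sketch below assumes the common four-lines-per-record FASTQ layout and uses three invented reads held in memory; for real data the lines would be read from the basecalled FASTQ file.

```python
import statistics


def read_lengths(fastq_lines):
    """Return the length of every sequence in a four-line-per-record FASTQ."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    return [len(lines[i + 1]) for i in range(0, len(lines) - 3, 4)]


def summarise(lengths):
    """Summarise read count and length distribution."""
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


# Invented FASTQ records (sequence lengths 5, 8 and 12)
fastq = [
    "@read1", "ACGTA", "+", "!!!!!",
    "@read2", "ACGTACGT", "+", "!!!!!!!!",
    "@read3", "ACGTACGTACGT", "+", "!!!!!!!!!!!!",
]
summary = summarise(read_lengths(fastq))
print(summary)
```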

Time Considerations

The culturing and collection of virus-infected cells typically takes three to four days but is dependent on the length of infection required and whether additional experimental factors (e.g. the use of siRNAs) are to be incorporated into the experiment. Once RNA has been collected in TRIzol, the remainder of the sequencing protocol can be completed in less than one day. RNA extraction and isolation of the poly(A) fraction can be achieved in under three hours, while preparing the dRNA-Seq libraries usually takes another 2 to 3 hours. The sequencing reaction can be run for up to 72 hours but in our hands usually tails off after 24 hours. Data processing and visualization can be achieved within a few hours but this is dependent on access to high performance computing.

Advanced Applications

Aside from allowing for the comprehensive annotation of viral RNA isoforms, dRNA-seq also offers exciting opportunities for the analysis of variation in poly(A) tail length and base modifications, including but not limited to the addition of N6-methyladenosine (m6A). Users may opt to generate per-read estimates of poly(A) tail lengths using tools such as Nanopolish (https://github.com/jts/nanopolish) or tailfindr (Krause et al., 2019) that operate on fast5 files generated during the sequencing run. Recent studies have established new methods to identify modified bases through comparative analysis of dRNA-Seq datasets such as the comparison of control (m6A present) and methyltransferase knockdown (m6A-depleted) datasets (Leger et al., 2019; Liu et al., 2019; Price, Hayer, McIntyre, et al., 2019). These examples of post-transcriptional processing are likely to confer more nuanced levels of gene regulation but until recently have been very difficult to address experimentally and for most viruses remain largely unexplored.
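
Per-read poly(A) estimates are most useful once summarised per transcript. The Python sketch below computes the median tail length for each reference sequence from a tab-separated table of per-read estimates. The column names (contig, polya_length, qc_tag) and the use of "PASS" to mark reliable estimates are assumptions modelled on the output of the Nanopolish polya module and should be checked against the tool and version in use; the example rows are invented.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_tail_by_contig(tsv_text):
    """Median poly(A) length per contig, using only rows whose qc_tag is PASS."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    tails = defaultdict(list)
    for row in rows:
        if row["qc_tag"] == "PASS":
            tails[row["contig"]].append(float(row["polya_length"]))
    return {contig: statistics.median(values) for contig, values in tails.items()}


# Invented per-read estimates (only the columns used here are shown)
example = (
    "readname\tcontig\tpolya_length\tqc_tag\n"
    "r1\tRNA_A\t80.0\tPASS\n"
    "r2\tRNA_A\t100.0\tPASS\n"
    "r3\tRNA_A\t300.0\tSUFFCLIP\n"
    "r4\tRNA_B\t55.0\tPASS\n"
)
medians = median_tail_by_contig(example)
print(medians)
```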

Acknowledgements

We extend special thanks to Ian Mohr (New York University School of Medicine) for support of DPD in part through National Institutes of Health (NIH) grants R01-AI073898 and R01-GM056927. This work was supported through NIH grants R21-AI130618, R21-AI147163 and R03-AI151358.

Footnotes

Conflict of Interest

The authors declare no conflicts of interest.

Internet Resources with Annotations

https://nanoporetech.com/community

Extensive collection of step-by-step protocols, software, and guidance on troubleshooting.

https://github.com/dandepledge/CurrentProtocols

Data files and an R script that allows visualization of processed dRNA-Seq datasets using Gviz.

Literature cited

  1. Akhtar LN, Bowen CD, Renner DW, Pandey U, Della Fera AN, Kimberlin DW, … Szpara ML (2019). Genotypic and Phenotypic Diversity of Herpes Simplex Virus 2 within the Infected Neonatal Population. mSphere, 4(1). 10.1128/msphere.00590-18
  2. Arias C, Weisburd B, Stern-Ginossar N, Mercier A, Madrid AS, Bellare P, … Ganem D (2014). KSHV 2.0: A Comprehensive Annotation of the Kaposi’s Sarcoma-Associated Herpesvirus Genome Using Next-Generation Sequencing Reveals Novel Genomic and Functional Features. PLoS Pathogens, 10(1). 10.1371/journal.ppat.1003847
  3. Blaho JA, Morton ER, & Yedowitz JC (2006). Herpes Simplex Virus: Propagation, Quantification, and Storage. In Current Protocols in Microbiology (pp. 14E.1.1–14E.1.23). 10.1002/9780471729259.mc14e01s00
  4. Boldogkői Z, Moldován N, Balázs Z, Snyder M, & Tombácz D (2019). Long-Read Sequencing – A Powerful Tool in Viral Transcriptome Research. Trends in Microbiology. 10.1016/j.tim.2019.01.010
  5. Chomczynski P (1993). A reagent for the single-step simultaneous isolation of RNA, DNA and proteins from cell and tissue samples. BioTechniques, 15(3), 532–537.
  6. Concha M, Wang X, Cao S, Baddoo M, Fewell C, Lin Z, … Flemington EK (2012). Identification of New Viral Genes and Transcript Isoforms during Epstein-Barr Virus Reactivation using RNA-Seq. Journal of Virology, 86(3), 1458–1467. 10.1128/jvi.06537-11
  7. Depledge DP, Kundu S, Jensen NJ, Gray ER, Jones M, Steinberg S, … Breuer J (2014). Deep sequencing of viral genomes provides insight into the evolution and pathogenesis of varicella zoster virus and its vaccine in humans. Molecular Biology and Evolution, 31(2), 397–409. 10.1093/molbev/mst210
  8. Depledge DP, Mohr I, & Wilson AC (2018). Going the Distance: Optimizing RNA-Seq Strategies for Transcriptomic Analysis of Complex Viral Genomes. Journal of Virology, 93(1). 10.1128/jvi.01342-18
  9. Depledge DP, Srinivas KP, Sadaoka T, Bready D, Mori Y, Placantonakis DG, … Wilson AC (2019). Direct RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen. Nature Communications, 10(1). 10.1038/s41467-019-08734-9
  10. Derveaux S, Vandesompele J, & Hellemans J (2010). How to do successful gene expression analysis using real-time PCR. Methods, 50(4), 227–230. 10.1016/j.ymeth.2009.11.001
  11. Drexler HL, Choquet K, & Churchman LS (2019). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell. 10.1016/j.molcel.2019.11.017
  12. Finkel Y, Schmiedel D, Tai-Schmiedel J, Nachshon A, Winkler R, Dobesova M, … Stern-Ginossar N (2020). Comprehensive annotations of human herpesvirus 6A and 6B genomes reveal novel and conserved genomic features. eLife, 9. 10.7554/eLife.50960
  13. Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, … Turner DJ (2018). Highly parallel direct RNA sequencing on an array of nanopores. Nature Methods, 15(3), 201–206. 10.1038/nmeth.4577
  14. Greninger AL, Knudsen GM, Roychoudhury P, Hanson DJ, Sedlak RH, Xie H, … Jerome KR (2018). Comparative genomic, transcriptomic, and proteomic reannotation of human herpesvirus 6. BMC Genomics, 19(1). 10.1186/s12864-018-4604-2
  15. Hahne F, & Ivanek R (2016). Visualizing genomic data using Gviz and bioconductor. In Methods in Molecular Biology (Vol. 1418, pp. 335–351). 10.1007/978-1-4939-3578-9_16
  16. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, … Glass CK (2010). Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Molecular Cell, 38(4), 576–589. 10.1016/j.molcel.2010.05.004
  17. Helm M, & Motorin Y (2017). Detecting RNA modifications in the epitranscriptome: Predict and validate. Nature Reviews Genetics, 18(5), 275–291. 10.1038/nrg.2016.169
  18. Houseley J, & Tollervey D (2010). Apparent non-canonical trans-splicing is generated by reverse transcriptase in vitro. PLoS ONE, 5(8). 10.1371/journal.pone.0012271
  19. Jain M, Olsen HE, Paten B, & Akeson M (2016). The Oxford Nanopore MinION: Delivery of nanopore sequencing to the genomics community. Genome Biology, 17(1). 10.1186/s13059-016-1103-0
  20. Jan CH, Friedman RC, Ruby JG, & Bartel DP (2011). Formation, regulation and evolution of Caenorhabditis elegans 3’UTRs. Nature, 469(7328), 97–103. 10.1038/nature09616
  21. Keller MW, Rambo-Martin BL, Wilson MM, Ridenour CA, Shepard SS, Stark TJ, … Barnes JR (2018). Direct RNA Sequencing of the Coding Complete Influenza A Virus Genome. Scientific Reports, 8(1). 10.1038/s41598-018-32615-8
  22. Krause M, Niazi AM, Labun K, Torres Cleuren YN, Müller FS, & Valen E (2019). TailFindR: Alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing. RNA, 25(10), 1229–1241. 10.1261/rna.071332.119
  23. Leger A, Amaral PP, Pandolfini L, Capitanchik C, Capraro F, Barbieri I, … Kouzarides T (2019). RNA modifications detection by comparative Nanopore direct RNA sequencing. bioRxiv, 843136. 10.1101/843136
  24. Li H (2018). Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094–3100. 10.1093/bioinformatics/bty191
  25. Liu H, Begik O, Lucas MC, Ramirez JM, Mason CE, Wiener D, … Novoa EM (2019). Accurate detection of m6A RNA modifications in native RNA sequences. Nature Communications, 10(1). 10.1038/s41467-019-11713-9
  26. Loiseau V, Herniou EA, Moreau Y, Lévêque N, Meignin C, Daeffler L, … Gilbert C (2020). Wide spectrum and high frequency of genomic structural variation, including transposable elements, in large double-stranded DNA viruses. Virus Evolution, 6(1). 10.1093/ve/vez060
  27. Mahmoudabadi G, & Phillips R (2018). A comprehensive and quantitative exploration of thousands of viral genomes. eLife, 7. 10.7554/eLife.31955
  28. O’Grady T, Feswick A, Hoffman BA, Wang Y, Medina EM, Kara M, … Tibbetts SA (2019). Genome-wide Transcript Structure Resolution Reveals Abundant Alternate Isoform Usage from Murine Gammaherpesvirus 68. Cell Reports, 27(13), 3988–4002.e5. 10.1016/j.celrep.2019.05.086
  29. O’Grady T, Wang X, Höner Zu Bentrup K, Baddoo M, Concha M, & Flemington EK (2016). Global transcript structure resolution of high gene density genomes through multi-platform data integration. Nucleic Acids Research, 44(18). 10.1093/nar/gkw629
  30. Price AM, Hayer KE, Depledge DP, Wilson AC, & Weitzman MD (2019). Novel splicing and open reading frames revealed by long-read direct RNA sequencing of adenovirus transcripts. bioRxiv, 2019.12.13.876037. 10.1101/2019.12.13.876037
  31. Price AM, Hayer KE, McIntyre ABR, Gokhale NS, Della Fera AN, Mason CE, … Weitzman MD (2019). Direct RNA sequencing reveals m6A modifications on adenovirus RNA are necessary for efficient splicing. bioRxiv, 865485. 10.1101/865485
  32. Prusty BK, & Whisnant AW (2020). Revisiting the genomes of herpesviruses. eLife, 9. 10.7554/eLife.54037
  33. Rang FJ, Kloosterman WP, & de Ridder J (2018). From squiggle to basepair: Computational approaches for improving nanopore sequencing read accuracy. Genome Biology. 10.1186/s13059-018-1462-9
  34. Schirmer M, Ijaz UZ, D’Amore R, Hall N, Sloan WT, & Quince C (2015). Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Research, 43(6). 10.1093/nar/gku1341
  35. Soneson C, Yao Y, Bratus-Neuenschwander A, Patrignani A, Robinson MD, & Hussain S (2019). A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes. Nature Communications, 10(1). 10.1038/s41467-019-11272-z
  36. Steijger T, Abril JF, Engström PG, Kokocinski F, Akerman M, Alioto T, … Zhang MQ (2013). Assessment of transcript reconstruction methods for RNA-seq. Nature Methods, 10(12), 1177–1184. 10.1038/nmeth.2714
  37. Tan CCS, Maurer-Stroh S, Wan Y, Sessions OM, & de Sessions PF (2019). A novel method for the capture-based purification of whole viral native RNA genomes. AMB Express, 9(1). 10.1186/s13568-019-0772-y
  38. Tang AD, Soulette CM, van Baren MJ, Hart K, Hrabeta-Robinson E, Wu CJ, & Brooks AN (2018). Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. bioRxiv, 410183. 10.1101/410183
  39. Thorvaldsdóttir H, Robinson JT, & Mesirov JP (2013). Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Briefings in Bioinformatics, 14(2), 178–192. 10.1093/bib/bbs017
  40. Tombácz D, Prazsák I, Szucs A, Dénes B, Snyder M, & Boldogkoi Z (2018). Dynamic transcriptome profiling dataset of vaccinia virus obtained from long-read sequencing techniques. GigaScience, 7(12). 10.1093/gigascience/giy139
  41. Whisnant AW, Jürges CS, Hennig T, Wyler E, Prusty B, Rutkowski AJ, … Dölken L (2019). Integrative functional genomics decodes herpes simplex virus 1. bioRxiv, 603654. 10.1101/603654
  42. Whitley RJ, & Roizman B (2001). Herpes simplex virus infections. Lancet, 357(9267), 1513–1518. 10.1016/S0140-6736(00)04638-9
  43. Workman RE, Tang AD, Tang PS, Jain M, Tyson JR, Razaghi R, … Timp W (2019). Nanopore native RNA sequencing of a human poly(A) transcriptome. Nature Methods, 16(12), 1297–1305. 10.1038/s41592-019-0617-2

Key References

  1. Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, Pantic N, Admassu T, James P, Warland A, Jordan M, Ciccone J, Serra S, Keenan J, Martin S, McNeill L, Wallace EJ, Jayasinghe L, Wright C, … Turner DJ (2018). Highly parallel direct RNA sequencing on an array of nanopores. Nature Methods, 15(3), 201–206. 10.1038/nmeth.4577
     First application of dRNA-seq using nanopore arrays. Provides useful discussion of the strengths and limitations of the technique and highlights the potential for detection of modified bases.
  2. Depledge DP, Mohr I, & Wilson AC (2018). Going the Distance: Optimizing RNA-Seq Strategies for Transcriptomic Analysis of Complex Viral Genomes. Journal of Virology, 93(1). 10.1128/jvi.01342-18
     A short review discussing the utility of the major RNA-seq methodologies to the analysis of viral transcriptomes.

RESOURCES