Abstract
Nanopore long-read sequencing enables real-time monitoring and controlling of individual nanopores. This allows us to enrich or deplete specific sequences in DNA sequencing in a process called “adaptive sampling.” So far, adaptive sampling (AS) has not been applicable to the direct sequencing of RNA. Here, we show that AS is feasible and useful for direct RNA sequencing (DRS), which has its specific technical and biological challenges. Using a well-controlled in vitro transcript-based model system, we identify essential characteristics and parameter settings for AS in DRS, such as the superior performance of depletion over enrichment. In this system, the efficiency of depletion is close to the theoretical maximum. Additionally, we demonstrate that AS efficiently depletes specific transcripts in transcriptome-wide sequencing applications. Specifically, we applied our AS approach to poly(A)-enriched RNA samples from human-induced pluripotent stem cell–derived cardiomyocytes and mouse whole heart tissue and show efficient 2.5- to 2.8-fold depletion of highly abundant mitochondrial-encoded transcripts. Finally, we characterize depletion and enrichment performance for complex transcriptome subsets, that is, at the level of the entire Chromosome 11, proving the general applicability of direct RNA AS. Our analyses provide evidence that AS is especially useful to enable the detection of lowly expressed transcripts and to reduce the sequencing of highly abundant, interfering transcripts.
Keywords: direct RNA-seq, enrichment, depletion, mitochondria, heart
INTRODUCTION
Adaptive sampling (AS) describes an add-on technique of the single molecule sequencing technology introduced by Oxford Nanopore Technologies (ONT) (Oxford Nanopore Technologies 2020) to select specific reads during sequencing. The combination of real-time basecalling and voltage reversal at specific pores allows the rejection of molecules that are not of interest. The general approach has been conceptualized by ONT, but the underlying “Read Until” scripts were implemented by Loose et al. (2016). These initial scripts used “squiggle” data and a dynamic time warping algorithm (Loose et al. 2016). GPU basecalling nowadays enables AS based on short “chunks” of basecalled nucleotide sequences (Payne et al. 2021). By this approach, individual read sampling strategies can be implemented based on either a black list (depletion mode) or white list (enrichment mode) provided as a simple FASTA or BED file (Fig. 1A). Briefly, RNA molecules are captured for sequencing and the produced data sent as small chunks in real time for basecalling with Guppy and subsequent alignment to a provided reference file (FASTA/BED) (Fig. 1A, left panel). If the size of the chunk is not sufficient, more data are collected (Fig. 1A, middle panel). As soon as the “Read Until” script identifies or excludes a reference match, the respective RNA strand is either accepted (completely sequenced) or rejected (voltage reversal results in ejection from pore) (Fig. 1A, right panel).
FIGURE 1.
Setup of direct RNA-seq adaptive sampling (DRAS) using an in vitro transcript (IVT) model system. (A) Schematic representation of the AS principle. RNA strands ligated to adapters with motor protein are captured and sequencing is initiated. Small data chunks are sent for basecalling and alignment to a provided FASTA or BED file (left panel). If these data chunks are too small (<200 nt) for a reliable classification or map to multiple sites, more data chunks are collected until enough data for a decision are available (middle panel). In the enrichment case (vice versa for the depletion case), a reference match results in acceptance of the read, while no reference match results in rejection of the read (right panel). (B) Schematic representation of the classification of reads in AS. During the sequencing run, two files are generated that can be used to identify accepted and rejected reads, “adaptive sampling” and “sequencing summary.” The “adaptive sampling” file provides information on the decision that was made on individual sequencing chunks. Here, “stop_receiving” corresponds to accepted reads, whereas “unblock” represents rejected reads. Very short chunks cannot be classified and are categorized as “no_decision.” The “sequencing summary” file provides information on the end reason for every read, which is classified as: “signal_positive” (read passed pore completely), “data_service_unblock_mux_change” (read was rejected during adaptive sampling), “signal_negative” (current delta of 80 pA was observed), and “unblock_mux_change” (strand blocked pore and was rejected). (C–G) Equimolar mixtures of IVT1 (1869 nt) and IVT2 (1452 nt) were sequenced on Flongle flow cells in normal sequencing (NS) mode (C), AS with enrichment of IVT2 (D,E), or AS with depletion of IVT1 (F,G). The obtained reads were aligned to the reference sequences and split into rejected (D,F) and accepted reads (E,G) based on the sequencing summary.
In parallel, several other approaches have been implemented and published to enable Nanopore DNA adaptive sampling. For example, UNCALLED uses raw signal (Kovaka et al. 2021), whereas RUBRIC also uses sequencing chunks to filter out unwanted reads (Edwards et al. 2019). Furthermore, BOSS-RUNS allows dynamic decisions during sequencing in real time (Weilguny et al. 2023).
Currently, directed RNA-sequencing is mainly possible by RNA CaptureSeq (Mercer et al. 2011a), which relies on tiling arrays to enrich for specific cDNA species and can also be adapted to PacBio long-read sequencing (Lagarde et al. 2017). To the best of our knowledge and despite its relevance in long-read direct RNA-sequencing (DRS) experiments, the feasibility of AS, and more specifically the Read Until scripts implemented in the MinKNOW GUI, to enrich or deplete specific transcripts for DRS has not been shown yet. This is especially of interest as DRS experiments are usually limited by read number, also compared to short-read sequencing approaches. So far, only RISER uses the “Read Until” API to classify DRS reads in real time as “coding” or “noncoding” (Sneddon et al. 2022). Apart from the technical differences (lower sequencing speed [70 vs. 240/400 nt per second at the time of writing] and lower throughput [faster loss of active pores]), several fundamental differences between DRS and sequencing of, e.g., genomic DNA need to be considered. First, RNA, or more specifically mRNA molecules, are usually much shorter than fragments in long-read DNA sequencing. Consequently, the time for computational analysis of the provided sequencing chunks is critical and needs to be short to prevent the sequencing of full molecules before a decision is made. Furthermore, in highly complex mammalian transcriptomes, the abundance of RNA species may vary over several orders of magnitude, whereas usually two alleles of each gene are present for DNA sequencing.
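The timing argument can be made concrete with a short calculation. The following is a minimal sketch, not part of the original analysis: it uses the translocation speeds quoted above (70 nt/s for RNA, 400 nt/s for DNA), a 200 nt decision chunk, and a 1452 nt RNA molecule (the length of IVT2). The function names and the 20,000 nt DNA fragment are hypothetical, and the latency of basecalling and alignment is ignored.

```python
def seconds_to_translocate(length_nt: float, speed_nt_per_s: float) -> float:
    """Time a strand of the given length needs to pass the pore at a constant speed."""
    if speed_nt_per_s <= 0:
        raise ValueError("speed must be positive")
    return length_nt / speed_nt_per_s


def fraction_read_before_decision(molecule_nt: float, decision_nt: float) -> float:
    """Fraction of a molecule already sequenced once `decision_nt` bases are available."""
    if molecule_nt <= 0:
        raise ValueError("molecule length must be positive")
    return min(1.0, decision_nt / molecule_nt)


if __name__ == "__main__":
    decision_nt = 200     # chunk size assumed sufficient for a decision
    rna_nt = 1452         # length of IVT2
    dna_nt = 20_000       # hypothetical long DNA fragment, for contrast only

    print("RNA: s until 200 nt are read:", seconds_to_translocate(decision_nt, 70))
    print("DNA: s until 200 nt are read:", seconds_to_translocate(decision_nt, 400))
    print("RNA fraction read at decision:", fraction_read_before_decision(rna_nt, decision_nt))
    print("DNA fraction read at decision:", fraction_read_before_decision(dna_nt, decision_nt))
```

Under these assumptions, a much larger share of a short RNA molecule has already passed the pore by the time a decision is possible, which is why the computational delay matters more for DRS than for DNA sequencing.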
Taking advantage of a simple model system composed of two IVTs, we derive general parameters and inherent characteristics of DRAS. We show that depletion outcompetes enrichment in terms of efficiency and specificity.
We then applied DRAS to two examples on the mammalian transcriptome and demonstrate the feasibility of the method in these real-life approaches. First, we used AS to specifically deplete mitochondrial-derived transcripts (mt-RNA) in DRS runs of mouse heart tissue and human-induced pluripotent stem cell–derived cardiomyocytes (hiPSC-CM). In libraries from cardiac samples prepared according to standard protocols, up to 30%–40% of all reads originate from mt-RNAs (Mercer et al. 2011b; Yang et al. 2014). These reads are mainly composed of mt-mRNAs and mt-rRNAs that are polyadenylated as well (Slomovic et al. 2005). We show that, despite their short length, DRAS depletes mt-RNAs efficiently and improves the detection of transcripts derived from nuclear chromosomes, both for mouse and human samples.
In the second use case, we demonstrate the depletion and enrichment of transcripts originating from a single nuclear chromosome (Chromosome 11). Despite the higher efficiency and specificity of DRAS depletion, we also demonstrate the useful capacity to enrich specific, mainly lowly expressed transcripts by DRAS. Importantly, DRAS does not lead to a faster deterioration of active pores, as occasionally observed for sequencing of DNA (Oxford Nanopore Technologies 2022).
In summary, we present here a simple method that uses the Read Until scripts implemented in the MinKNOW GUI to deplete unwanted transcripts from biological and clinical samples with limited availability, without prior purification that may be associated with sample loss. Additionally, we provide evidence for capturing transcripts of special interest by AS.
RESULTS
Evaluation of adaptive sampling for direct RNA-seq
Despite the well-supported application of AS to Nanopore sequencing of DNA, to our knowledge it has not been established for direct sequencing of RNA. Both the technical and the inherent biological differences (see Introduction) do not allow a simple transfer of the knowledge derived from AS in DNA sequencing. Consequently, we first investigated the general usability of AS depletion and enrichment for DRS. In depletion mode, voltage reversal results in the rejection of reads that map to the provided blacklist FASTA/BED file, whereas all other reads are sequenced completely. On the other hand, in the enrichment mode, reads that do not map to the provided whitelist FASTA/BED file are rejected. Reads that map to this whitelist are kept and sequenced completely (Fig. 1A).
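The decision rule of the two modes can be summarized in a few lines. This is an illustrative sketch only and not the Read Until implementation in MinKNOW: the function name, its parameters, and the treatment of the 200 nt value from Figure 1A as a fixed cutoff are assumptions made for the example.

```python
MIN_CHUNK_NT = 200  # assumption: chunks below this length are treated as not classifiable


def as_decision(mode: str, chunk_length_nt: int, maps_to_reference: bool,
                multi_mapping: bool = False) -> str:
    """Return the decision label for one chunk (hypothetical helper).

    mode: "depletion" (reference is a blacklist) or "enrichment" (reference is a whitelist).
    """
    if mode not in ("depletion", "enrichment"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Too little or ambiguous information: no decision yet, more data would be collected.
    if chunk_length_nt < MIN_CHUNK_NT or multi_mapping:
        return "no_decision"
    if mode == "depletion":
        # Reads matching the blacklist are ejected, everything else is sequenced fully.
        return "unblock" if maps_to_reference else "stop_receiving"
    # Enrichment: reads matching the whitelist are kept, everything else is ejected.
    return "stop_receiving" if maps_to_reference else "unblock"


if __name__ == "__main__":
    print(as_decision("depletion", 350, maps_to_reference=True))    # rejected
    print(as_decision("enrichment", 350, maps_to_reference=True))   # accepted
    print(as_decision("enrichment", 120, maps_to_reference=True))   # too short
```

The two modes are mirror images of each other, so the same reference match leads to opposite outcomes depending on whether the file is used as a blacklist or a whitelist.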
The outcome of AS (Fig. 1A) can be assessed by two different provided data types that are produced by the MinKNOW/Read Until APIs (Fig. 1B): the “adaptive sampling” file provides decision information on every sequenced chunk (small read segment). Chunks that are classified as “stop_receiving” are accepted and the read is sequenced completely, whereas “unblock” results in rejection of the read. Chunks that are too short for classification are categorized as “no_decision” (Fig. 1B, left panel). Besides this, the AS outcome can be derived from the “sequencing summary” that provides the end reason for every read (Fig. 1B, right panel). Reads that are rejected in the AS process are labeled as “data_service_unblock_mux_change,” whereas reads that are completely sequenced without intervention are classified as “signal_positive.” These may be either “no_decision” or accepted (“stop_receiving”) reads. Consequently, both labels, “unblock” and “data_service_unblock_mux_change,” can be used to identify rejected reads.
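As a sketch of how the two files could be used to identify rejected reads, the example below collects read IDs from each source and compares the two sets. The column names (`read_id`, `decision`, `end_reason`) and the file formats (comma-separated for the adaptive sampling file, tab-separated for the sequencing summary) are assumptions about the output layout and should be checked against the files of the actual run; the helper names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]


def load_rows(path: str, delimiter: str) -> List[Row]:
    """Read a delimited text file with a header line into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def rejected_by_decision(as_rows: Iterable[Row]) -> Set[str]:
    """Read IDs with at least one 'unblock' decision in the adaptive sampling file."""
    return {row["read_id"] for row in as_rows if row["decision"] == "unblock"}


def rejected_by_end_reason(summary_rows: Iterable[Row]) -> Set[str]:
    """Read IDs whose end reason in the sequencing summary marks an AS rejection."""
    label = "data_service_unblock_mux_change"
    return {row["read_id"] for row in summary_rows if row["end_reason"] == label}


def compare_sources(as_rows: Iterable[Row],
                    summary_rows: Iterable[Row]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (rejected in both, only in adaptive sampling file, only in summary)."""
    by_decision = rejected_by_decision(as_rows)
    by_reason = rejected_by_end_reason(summary_rows)
    return by_decision & by_reason, by_decision - by_reason, by_reason - by_decision


if __name__ == "__main__":
    # Tiny in-memory example with made-up read IDs.
    as_rows = [{"read_id": "r1", "decision": "unblock"},
               {"read_id": "r2", "decision": "stop_receiving"},
               {"read_id": "r3", "decision": "no_decision"}]
    summary_rows = [{"read_id": "r1", "end_reason": "data_service_unblock_mux_change"},
                    {"read_id": "r2", "end_reason": "signal_positive"},
                    {"read_id": "r3", "end_reason": "signal_positive"}]
    print(compare_sources(as_rows, summary_rows))
```

Reporting the reads found in only one of the two sets makes any disagreement between the files explicit before one of them is chosen for downstream analysis.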
We determined the general properties and characteristics of DRAS using a simple model system of two defined IVTs (Fig. 1C–G; Supplemental Fig. 1).
Equimolar amounts of IVT1 (1869 nt) and IVT2 (1452 nt), which were derived from human ribosomal RNAs, were sequenced on Flongle flow cells either in NS mode (Fig. 1C), AS mode with enrichment of IVT2 (Fig. 1D,E) or depletion of IVT1 (Fig. 1F,G), and analyzed by the end reason provided in the sequencing summary. As expected, the NS setup generates mostly “signal_positive” reads (Supplemental Fig. 1A). For the enrichment of IVT2, 60.4% of reads correspond to reads rejected by “data_service_unblock_mux_change” (Supplemental Fig. 1B), whereas in depletion mode, 39.6% of reads were rejected (Supplemental Fig. 1C). The majority of these reads are shorter than 500 nt; however, especially in depletion mode, longer reads are also detected (Supplemental Fig. 1D).
The alignment of reads to the IVT1 and IVT2 reference sequences revealed the expected peaks of the full-length transcripts in NS mode (Fig. 1C). The aligned data of the AS approaches were split into accepted and rejected reads according to the sequencing summary. Here, full-length IVT1 was not detectable either upon enrichment of IVT2 (Fig. 1D,E) or depletion of IVT1 (Fig. 1F,G). The majority of the IVT1 reads correspond to the reads rejected by AS (Fig. 1D,F). However, we noticed also a substantial fraction of IVT2 reads below 500 nt that are wrongly rejected in enrichment mode (Fig. 1D). This coincides with a larger fraction of erroneously accepted IVT1 reads in enrichment (Fig. 1E) compared to depletion (Fig. 1G). Based on the number of sequenced bases per transcript, we calculated an enrichment factor (Table 1; Supplemental Information) of 1.35 for the enrichment mode and 1.75 for the depletion mode, revealing a higher efficiency of depletion in the context of direct RNA-seq experiments. The enrichment factor of 1.75 is close to the theoretically possible enrichment factor of 1.86, as calculated according to Martin et al. (2022) (Supplemental Information).
TABLE 1.
The number of bases mapping to IVT2 as well as the enrichment factor were calculated as described in the Supplemental Information
Furthermore, for the presented IVT model the error rates (Table 1) revealed a higher specificity and efficiency of the depletion mode, with no wrongly rejected reads and 32% wrongly accepted reads, compared to 42% wrongly accepted and 28% wrongly rejected reads in the enrichment mode.
Depletion of mitochondrial transcripts from mouse heart
Based on the observation of more efficient depletion in DRAS, we next aimed to deplete mitochondrial-encoded mRNAs and rRNAs (mt-RNAs) as a first use case. mt-RNAs that are highly expressed in heart tissue (Mercer et al. 2011b; Yang et al. 2014) can disturb the detection of cytosolic transcripts due to the limited sequencing capacity available in DRS experiments. We recently introduced a method for the experimental depletion of mitochondrial transcripts by RNase H cleavage, mt-clipping (Naarmann-de Vries et al. 2022). This method has the advantage that sequencing of mt-RNAs is completely prevented (Naarmann-de Vries et al. 2022). However, one caveat of this approach is the potential loss of material during clipping and subsequent purification. This is in contrast to the DRAS approach where short stretches of every captured transcript are sequenced, but quickly rejected. On the other hand, the prepurification of the sample is unnecessary. Thus, while one may be hesitant to apply mt-clipping to samples with limited availability, DRAS may represent an alternative strategy here.
Herein, we applied DRAS to RNA derived from mouse heart tissue (Fig. 2; Supplemental Fig. 2) as well as from hiPSC-CM (Supplemental Figs. 3–5). The depletion of mt-RNAs was conducted on MinION flow cells, from which 50% of pores were used for NS and 50% for AS (Fig. 2A, upper panel). The specificity and efficiency of AS were analyzed based on the decision derived from the AS summary (Figs. 1B, 2A, lower panel). About 90% of the reads mapping to the mitochondrial chromosome (MT) were rejected (“unblock”), whereas for most other chromosomes only minor off-target effects were observed (Fig. 2B). One notable exception is Chromosome 1, which encodes a high number of mitochondrial pseudogenes in mouse (Bensasson et al. 2001). These results are in line with the “end reason” obtained from the sequencing summary (Supplemental Fig. 2E). As for the IVT model, AS rejected reads have a median read length of only 320–330 nt (Supplemental Fig. 2A,B). The enrichment factor per sequenced bases revealed a strong depletion of mitochondrial reads and a corresponding enrichment of the nuclear chromosome–derived reads (Fig. 2C). Compared to NS, the number of sequenced bases mapping to the MT was drastically reduced in AS from 32.9% to 12.2% (Fig. 2D), and the sequenced reads were reduced as well (Supplemental Fig. 2F). As expected, the rejected mitochondrial reads are significantly shorter (Fig. 2E,F; a t-test comparing mt-RNA read lengths in AS vs. NS yielded a P-value of P < 2.225074 × 10^−308 for all reads and t-value statistics of Mean = −146.32 and SD = 88.77). Importantly, no mt-RNAs were wrongly accepted (Fig. 2B,F). In total, 99% of rejected reads mapped to the MT and Chromosome 1 (Supplemental Table 1). To exclude a potential detrimental effect on sequencing output due to faster deterioration of pores, we analyzed the pore health separately for NS and AS and found no differences between sequencing conditions (Supplemental Fig. 2C,D).
Strikingly, in AS mode we identified several transcripts that were not detected in NS, such as the chemokine Ccl7 (Fig. 2G; Supplemental Table 2), as well as genes with significantly more or longer reads (Supplemental Table 3), such as Rbm6 (Fig. 2H).
FIGURE 2.
Depletion of mt-RNAs in direct RNA-seq of mouse heart tissue samples. (A) Schematic representation of the experimental setup. (Upper panel) Libraries were sequenced on a MinION flow cell, with 50% of channels in NS mode and 50% of channels in AS mode. (Lower panel) Reads were split prior to further analysis according to the AS decision. (B) Stack plot of reads per chromosome split according to the AS decision. (C) The enrichment factor (per sequenced bases) for every chromosome was calculated for two biological replicates from reads split into “normal sequencing” and “total reads from adaptive sampling.” (D) Percent of sequenced bases mapping to the mitochondrial and nuclear chromosomes in AS and NS. (E) Length of reads mapping to the individual mitochondrial transcripts in AS and NS. (F) Normalized coverage (autoscale group to [0–18,974]) of reads mapped to the mitochondrial chromosome (chrM) in the individual samples as indicated. (G,H) Normalized coverage (autoscale group to [0–10]) of Ccl7 (G) and Rbm6 (H) in NS and AS as indicated.
We additionally applied the depletion of mitochondrial transcripts by DRAS to human iPSC-CM that were also used in our previous RNase H-based mt-clipping protocol (Naarmann-de Vries et al. 2022). General characteristics such as read length and pore health were highly comparable to the mouse experiment (Supplemental Fig. 3A–D). In this case, about 75% of the mitochondrial reads were targeted by DRAS. Off-target effects for mitochondrial pseudogenes were not observed in the human samples (Supplemental Fig. 3E,F), and 98% of rejected reads mapped to the MT (Supplemental Table 4). In hiPSC-CM the number of sequenced bases mapping to the MT is reduced 2.8-fold from 7.1% to 2.5% by AS (Supplemental Fig. 4C,D). The depletion of mt-RNAs (Supplemental Fig. 4A,B,E) is accompanied by an increased number of reads mapping to the nuclear chromosomes (Supplemental Fig. 4A,C,D). Similar to the mouse samples, almost no full-length transcripts of mt-RNAs were detected in the AS reads (a t-test comparing mt-RNA read lengths in AS vs. NS yielded a P-value of P < 2.225074 × 10^−308 for all reads and t-value statistics of Mean = −48.27 and SD = 33.50) (Supplemental Fig. 4B,E). Among others, PLD2 was only detectable in AS (Supplemental Fig. 4F) and EHBP1L was detected at significantly higher levels (Supplemental Fig. 4G; Supplemental Tables 5, 6).
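The fold depletion of mitochondrial bases follows directly from the percentages reported for NS and AS. The snippet below is a simple worked calculation using those reported values (32.9% to 12.2% in mouse heart, 7.1% to 2.5% in hiPSC-CM); the function name is hypothetical and this ratio is not the enrichment factor defined in the Supplemental Information.

```python
def fold_depletion(percent_ns: float, percent_as: float) -> float:
    """Ratio of the share of sequenced bases in NS to the share in AS."""
    if percent_as <= 0:
        raise ValueError("percent_as must be positive")
    return percent_ns / percent_as


if __name__ == "__main__":
    # Share of sequenced bases mapping to the mitochondrial chromosome (MT).
    mouse = fold_depletion(32.9, 12.2)
    hipsc_cm = fold_depletion(7.1, 2.5)
    print(f"mouse heart: {mouse:.1f}-fold, hiPSC-CM: {hipsc_cm:.1f}-fold")
```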
To compare the mt-RNA DRAS depletion with the experimental depletion by mt-clipping (Naarmann-de Vries et al. 2022), we analyzed the detection of transcripts derived from genes implicated in the pathogenesis of dilated cardiomyopathy (DCM) (Jordan et al. 2021). Similar to mt-clipping, the obtained number of reads per gene is increased for most, but not all DCM related genes in AS over NS in two replicates (Supplemental Fig. 5A), as illustrated for MYH6 (Supplemental Fig. 5B; Carniel et al. 2005) and PLN (Supplemental Fig. 5C; van der Zwaag et al. 2012).
In summary, depletion of highly abundant mitochondrial transcripts (e.g., in cardiac samples) improves the detection of cytoplasmic mRNAs without the requirement of additional experimental steps that may be associated with material loss. Consequently, it can increase the output derived from samples with limited availability.
Enrichment and depletion of all transcripts derived from a nuclear chromosome
Mitochondrial-derived transcripts are very different from nuclear-encoded genes due to the absence of intronic sequences and their short length. To generalize the DRAS approach, we targeted human Chromosome 11 as a test case. We chose Chromosome 11 as the expected read length is here relatively close to the mean for cardiomyocyte-expressed genes (see Supplemental Information). Reads derived from Chromosome 11 were either depleted or enriched using the genomic sequence of Chromosome 11 as reference (Figs. 3, 4; Supplemental Figs. 6–11). In all experiments, we used 50% of the available pores for AS, whereas the other 50% were run in NS mode (Fig. 2A). In line with the IVT model and depletion of mitochondrial-encoded transcripts, AS had no negative effect on pore health (Supplemental Fig. 7).
FIGURE 3.
Depletion and enrichment on chromosome scale from hiPSC-CM. Fifty percent of pores on a MinION flow cell were used for NS and 50% for AS. Reads mapping to Chromosome 11 were subjected to AS in depletion, enrichment or reverse enrichment (depletion of all chromosomes except for Chromosome 11) mode. (A–C) Stack plot of reads per chromosome split according to the AS decision for depletion (A), enrichment (B), and reverse enrichment (C) of Chromosome 11. (D) Enrichment factor per sequenced bases for every chromosome calculated for the three AS modes from reads split into “normal sequencing” and “total reads from adaptive sampling.”
FIGURE 4.
Effect of depletion and enrichment on Chromosome 11. (A) Loess curve for depletion of Chromosome 11. The log2 gene expression based on read counts from the NS run was plotted against the observed enrichment in sequenced bases in AS versus NS. The dashed red line (=1) indicates the threshold for depletion/enrichment. (B) Loess curve for the enrichment of Chromosome 11 as in A. (C) Normalized coverage (autoscale group to [0–38]) of CKAP5 in NS and AS with enrichment or depletion of Chromosome 11, as indicated.
The efficiency and specificity of targeting were analyzed based on the decision derived from the AS file for all reads. For the depletion case, 60.3% of Chromosome 11 reads were targeted (unblock) (Fig. 3A; Supplemental Fig. 10A), with some aberrantly terminated reads on the other chromosomes. On the other hand, the enrichment of Chromosome 11 was less efficient (<50% accepted) and displayed off-target effects, both for rejected reads on Chromosome 11 and accepted reads on the other chromosomes (Fig. 3B; Supplemental Fig. 10B). Based on these observations, we decided to analyze the efficiency of a “reverse enrichment” (deplete reads from all chromosomes except for Chromosome 11). Compared to the other chromosomes, fewer Chromosome 11 reads are rejected (Fig. 3C; Supplemental Fig. 10C). However, the overall targeting efficiency of this approach is lower compared to direct depletion or enrichment (Fig. 3A,B). Thus, we focus on the direct targeting approaches and present analysis for the reverse enrichment case in Supplemental Figures 6–11.
Enrichment factors were calculated for all approaches in comparison to the corresponding NS data from the same flow cell. Both the enrichment and the depletion of Chromosome 11 reads were efficient, and the effect sizes were highly comparable (Fig. 3D, −0.49 vs. 0.48, Supplemental Table 7). On the other hand, the reverse enrichment displayed a lower efficiency compared to the direct enrichment (Fig. 3D), as expected.
In AS, all RNA molecules are sequenced to some extent, and based on these data reads are either accepted or rejected. Consequently, we expect only minor differences in terms of read counts (Supplemental Fig. 8A,B), but substantial differences in the number of sequenced bases. To further elucidate the effect of AS on individual genes expressed at different levels, we plotted the log2 expression derived from the NS run against the log2 fold change of sequenced bases for AS versus NS (Fig. 4A,B). Strikingly, the depletion (Fig. 4A) or enrichment (Fig. 4B) by AS was detectable for the majority of genes across all expression levels.
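The per-gene quantities behind such a plot can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for Figure 4: the function names, the pseudocount, the gene labels, and the omission of any normalization between the AS and NS halves of the flow cell are all choices made for the example.

```python
import math
from typing import Dict


def log2_expression(read_counts_ns: Dict[str, int],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 of NS read counts per gene (pseudocount avoids log of zero)."""
    return {gene: math.log2(count + pseudocount)
            for gene, count in read_counts_ns.items()}


def log2_fold_change_bases(bases_as: Dict[str, int], bases_ns: Dict[str, int],
                           pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of sequenced bases per gene, AS over NS.

    Genes missing from one condition are counted as zero bases there.
    """
    genes = set(bases_as) | set(bases_ns)
    return {gene: math.log2((bases_as.get(gene, 0) + pseudocount) /
                            (bases_ns.get(gene, 0) + pseudocount))
            for gene in genes}


if __name__ == "__main__":
    # Made-up numbers for two hypothetical genes.
    reads_ns = {"GENE_A": 120, "GENE_B": 3}
    bases_ns = {"GENE_A": 150_000, "GENE_B": 4_000}
    bases_as = {"GENE_A": 60_000, "GENE_B": 9_000}
    expression = log2_expression(reads_ns)
    fold_change = log2_fold_change_bases(bases_as, bases_ns)
    for gene in sorted(expression):
        print(gene, round(expression[gene], 2), round(fold_change[gene], 2))
```

Negative values indicate fewer sequenced bases in AS than in NS (depletion) and positive values indicate more (enrichment).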
The depletion of Chromosome 11 has only minor effects on all other chromosomes (Supplemental Fig. 9A,B), whereas the enrichment of Chromosome 11 results in many significantly shorter reads on the other chromosomes, as expected (Supplemental Fig. 9C,D; Supplemental Tables 8–10).
Focusing on direct depletion and enrichment, we analyzed possible reasons for the observed off-target effects. This may be a result of not recognizing exon–exon boundaries, as we used the genomic sequence as reference here. As a consequence, transcripts with short 3′ exons would be expected to be affected more than transcripts with long 3′ exons. We assigned reads to transcripts with Bambu (Chen et al. 2023) and analyzed the length of the last exon for reads classified as no_decision, stop_receiving and unblock. Importantly, the length of the last exon was not significantly shorter for stop_receiving in depletion (Supplemental Fig. 10D) or unblock in enrichment mode (Supplemental Fig. 10E). Consequently, the genomic sequence is a suitable reference for DRAS. Based on the mitochondrial depletion and IVT use cases, a transcriptomic reference is expected to work as well; however, this may prohibit the identification of novel isoforms.
In the case of aberrantly accepted/rejected reads on the nontargeted chromosomes, we hypothesized that one major cause could be paralogous genes located on nontargeted chromosomes. Thus, we derived paralog genes from BioMart (EnsEMBL 102, GRCh38.p13) and compared the log2 fold change of sequenced bases with genes that do not have paralogs on Chromosome 11 (Supplemental Fig. 11). Importantly, the eCDF showed no difference between genes with and without paralogs on Chromosome 11 (Supplemental Fig. 11).
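An empirical cumulative distribution function (eCDF) comparison of two gene groups can be sketched with the standard library alone. The example below is not the original analysis: the function names and input values are hypothetical, and the maximum vertical distance between the two eCDFs is added here only as one possible numerical summary of how much two curves differ.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Return F with F(x) = fraction of values that are <= x."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_ecdf_distance(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Largest absolute difference between the two eCDFs over all observed values."""
    f_a, f_b = ecdf(group_a), ecdf(group_b)
    return max(abs(f_a(x) - f_b(x)) for x in set(group_a) | set(group_b))


if __name__ == "__main__":
    # Made-up log2 fold changes for genes with and without paralogs on the target.
    with_paralog = [-0.2, 0.0, 0.1, 0.3]
    without_paralog = [-0.3, -0.1, 0.1, 0.2, 0.4]
    print(max_ecdf_distance(with_paralog, without_paralog))
```

A distance close to zero corresponds to nearly overlapping curves, that is, no visible shift between the two groups.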
Despite the observed off-target effects that are most likely caused by the high complexity of the AS task, DRAS is a useful method to select for specific transcripts, genes or chromosomes (Supplemental Table 7), as displayed for CKAP5 (Fig. 4C) and GATD1 (Supplemental Fig. 10G).
We conclude that the depletion of unwanted transcripts can significantly improve the outcome of DRS. On the other hand, the value of enrichment strongly depends on the biological question, the abundance of the target and the choice between direct and reverse enrichment.
DISCUSSION
In this work, we show for the first time that the AS option provided by ONT is suitable for DRS as well and may be beneficial in solving biological problems.
Taking advantage of a simple model system composed of two defined IVTs, we determine the essential parameters of DRAS. We show that decisions are made based on sequencing “chunks” that are shorter than for sequencing of DNA, mainly due to the slower translocation speed of RNA compared to DNA, rendering it useful also for the targeting of relatively short transcripts (see below). Furthermore, we provide evidence that the depletion of specific transcripts has a higher efficiency and specificity than the enrichment. Both efficiency and specificity of depletion will also depend on the complexity of the AS task and the extent of similarity of the sequences in the mixture.
As illustrated in Figure 1B, AS may be analyzed either using the “adaptive sampling” or the “sequencing summary” file. Although in theory the reads classified as “unblock” or “data_service_unblock_mux_change” should be identical, we observed minor differences in classification between the two files. Therefore, we recommend using only one of the two files consistently throughout the analysis.
We present two use cases in mammalian transcriptome-wide experimental setups. First, we established the depletion of mt-RNAs from heart samples (mouse heart tissue and hiPSC-CM). Mt-RNAs are highly enriched in cardiac cells, as a high number of mitochondria is required to meet the high oxygen demand of these cells. Because these transcripts can make up to 30%–40% of the polyadenylated RNA fraction (Mercer et al. 2011b; Yang et al. 2014), and may not be of general interest, they block valuable sequencing capacity for cytosolic mRNAs. We show here that, despite the short nature of the mt-RNAs, AS for depletion of mt-RNAs is beneficial for the detection of cytosolic mRNAs, as exemplified for a group of genes implicated in DCM. We recently introduced mt-clipping [based on RNase H cleavage of the poly(A) tail of mitochondrial transcripts] to prevent sequencing of mt-RNAs in DRS (Naarmann-de Vries et al. 2022). Both methods have their pros and cons and can be considered as alternative approaches depending on the sample type. The RNase H-clipped transcripts are not sequenced at all, whereas in the case of AS, short reads are always produced. On the other hand, AS requires no additional experimental steps and has no risk of sample loss during purification. As no increase of pore loss is observed (Supplemental Figs. 2, 3, 7), mt-RNA depletion by AS can be safely applied to all sample types.
Although the depletion mode applied here has a higher efficiency in direct RNA-seq, the enrichment of specific transcripts is also of general interest. In the second use case, we investigated the performance of DRAS in a more general setting. We show that both depletion and enrichment can be applied at a chromosome-wide scale; however, off-target effects should be considered, especially in the enrichment scenario. Furthermore, we tested a “reverse enrichment” strategy. However, in contrast to the simple IVT system, this strategy is not advisable in the context of a complex mammalian transcriptome. Nevertheless, DRAS can improve the detection of lowly abundant and difficult-to-capture transcripts.
In this work, we established direct RNA AS using the Read Until script implemented in MinKNOW, as we think that this is the most accessible solution for most Nanopore users. Unfortunately, some parameters, such as the response time, are hard-coded and we could not test adjustments here.
An alternative approach for directed sequencing of RNA is RNA CaptureSeq (Mercer et al. 2011a) that has also been transferred to PacBio long-read sequencing (Lagarde et al. 2017). As there are fundamental differences between the two approaches, the usage of RNA CaptureSeq and DRAS will be complementary. RNA CaptureSeq enables a very high enrichment of sequences of interest, which enables the reliable detection of novel transcripts. This is achieved by the capture of cDNA sequences of interest using a custom tiling array (Mercer et al. 2011a). This array needs to be designed and synthesized for every study. Furthermore, the method is quite laborious and includes many experimental steps. On the other hand, enrichment by DRAS can be limited by the abundance of the targeted species. It requires no additional experimental steps and only a FASTA or BED file for targeting. Furthermore, the direct sequencing of RNA, which is currently only possible on the Nanopore system, allows the study of (isoform-specific) RNA modifications (Liu et al. 2019; Hendra et al. 2022; Piechotta et al. 2022; Mateos et al. 2023). Thus, DRAS provides a simple and cost-effective option to enrich and deplete specific RNA sequences.
Although the targeted sequences are usually shorter compared to DNA sequencing, depletion or enrichment is efficient here as well, due to the slower sequencing speed. In summary, AS is a feasible way to improve the outcome of DRS experiments and may, for example, improve the detection of novel transcript isoforms.
MATERIALS AND METHODS
Isolation of total RNA from hiPSC-CM and mouse samples
Total RNA from hiPSC-CM (generated as described in Naarmann-de Vries et al. 2022) was isolated using TRIzol (Thermo Fisher Scientific) as described previously (Naarmann-de Vries et al. 2022). Mouse heart tissue sections from Sham-operated mice were homogenized in 750 µL Qiazol (Qiagen) using the TissueLyser (Qiagen) (4–5 times, 1 min, 30 Hz). Afterward, RNA was isolated with the miRNeasy Mini Kit (Qiagen), according to the manufacturer's protocol.
Isolation of poly(A)+ RNA
For isolation of poly(A)+ RNA from hiPSC-CM total RNA, 30 µg total RNA was digested with 1 µL DNase I (New England Biolabs) in a total volume of 100 µL for 10 min at 37°C followed by inactivation of the enzyme (10 min, 65°C). A total of 50 µL Dynabeads Oligo(dT)25 beads were washed once with binding buffer (10 mM Tris pH 7.5, 1 M lithium chloride, 6.5 mM EDTA) and resuspended in 110 µL binding buffer. Beads and DNase I-treated RNA were combined and incubated for 5 min at room temperature. Beads were collected on a magnet and washed two times with 200 µL washing buffer (5 mM Tris pH 7.5, 150 mM lithium chloride, 1 mM EDTA). RNA was eluted in 100 µL water by heating to 70°C for 2 min. Beads were resuspended in 100 µL binding buffer and recombined with the eluted RNA for a second purification round as described above. Final poly(A)+ RNA was eluted in a volume of 10 µL. Poly(A)+ RNA from mouse total RNA was isolated with the Oligotex mRNA Purification Kit (Qiagen), according to the manufacturer's protocol. Concentration and purity of the isolated poly(A)+ RNA were analyzed on a Nanodrop (Thermo Fisher Scientific) and Fragment Analyzer (Agilent), respectively.
Generation of in vitro transcripts
Generation of the IVTs has been described previously (Naarmann-de Vries et al. 2021). IVT1 represents the complete sequence of the human 18S rRNA, whereas IVT2 represents a 3′ fragment of the human 28S rRNA. Both IVTs were generated with the MEGAscript T7 Transcription Kit (Thermo Fisher Scientific), according to the manufacturer's protocol, and purified with Zymo RNA Clean & Concentrator Kits (Zymo Research).
Generation of direct RNA-seq libraries
Libraries were generated with 500 ng poly(A)+ RNA per reaction for MinION flow cells and 200 ng of an equimolar IVT mixture for Flongle flow cells using the Direct RNA Sequencing Kit (SQK-RNA002, Oxford Nanopore Technologies), including the reverse transcription step. Concentration of libraries was determined with the Qubit dsDNA HS Kit (Thermo Fisher Scientific). The complete libraries were loaded on a MinION R9.4.1 flow cell (FLO-MIN106D) or Flongle flow cell and sequenced for 24–48 h, as outlined below.
Setup of GPU basecalling and adaptive sampling
The RNA sequencing data were acquired using MinKNOW v21.10.4 (IVT analysis and depletion of mitochondrial transcripts) or v22.10.10 (Chromosome 11) installed on an HP zBook Create G7 Notebook running Ubuntu 20.04.2 LTS, with an Intel Core i7-10850H CPU at 2.70 GHz (one CPU socket, six cores, 12 threads). Live basecalling of the FAST5 files was performed with Guppy 6.0.1 (IVT analysis and depletion of mitochondrial transcripts) or 6.3.9 (Chromosome 11) on a built-in NVIDIA GPU (GeForce RTX 2070 Mobile, CUDA version 11.4). Sequencing runs were started via the MinKNOW GUI, with the SQK-RNA002 kit and the respective flow cell selected. Adaptive sampling was activated in the MinKNOW GUI in either depletion or enrichment mode and required a FASTA file providing the reference sequences to be enriched (whitelist) or depleted (blacklist), respectively. In the case of the “reverse enrichment” of Chromosome 11, a preindexed .mmi file was provided. For mt-RNA and Chromosome 11 depletion or enrichment experiments, AS was restricted to 50% of the channels of the MinION flow cell. Guppy was run in high accuracy mode with a Q score threshold of 7.
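Restricting AS to half of the channels yields an internal control on the same flow cell. The following Python sketch illustrates one way such a split could be evaluated; it is not the authors' implementation. It assumes that AS was applied to channels 1–256 of the 512 MinION channels (the text does not state which channels were used), and the function names and the fold-depletion metric (target read fraction in control channels divided by that in AS channels) are hypothetical.

```python
from collections import Counter

# ASSUMPTION: adaptive sampling on channels 1-256 of a 512-channel MinION
# flow cell. The Methods only state "50% of channels"; adjust as needed.
AS_CHANNELS = range(1, 257)


def count_reads(reads):
    """Count reads per (condition, is_target).

    `reads` is an iterable of (channel, is_target) tuples, where `is_target`
    is True if the read maps to a sequence on the AS reference list.
    """
    counts = Counter()
    for channel, is_target in reads:
        condition = "AS" if channel in AS_CHANNELS else "control"
        counts[(condition, is_target)] += 1
    return counts


def target_fraction(counts, condition):
    """Fraction of reads in one condition that map to the target set."""
    target = counts[(condition, True)]
    total = target + counts[(condition, False)]
    if total == 0:
        raise ValueError(f"no reads observed for condition {condition!r}")
    return target / total


def fold_depletion(counts):
    """Hypothetical metric: control target fraction / AS target fraction."""
    as_fraction = target_fraction(counts, "AS")
    if as_fraction == 0:
        return float("inf")
    return target_fraction(counts, "control") / as_fraction


# Toy data: 20% target reads in AS channels, 50% in control channels.
toy_reads = (
    [(1, True)] * 20 + [(1, False)] * 80
    + [(400, True)] * 50 + [(400, False)] * 50
)
print(round(fold_depletion(count_reads(toy_reads)), 2))  # prints 2.5
```

For enrichment experiments, the reciprocal of this ratio would express the fold enrichment under the same assumptions.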
Data processing
Subsequent alignment of the FASTQ reads was performed with minimap2 (2.22-r1101) using the following options for Nanopore direct RNA sequencing: -ax splice -uf -k14 --secondary=no. Secondary and supplementary alignments were subsequently removed by SAM flag filtering: samtools view -S -b -F 2304 (2304 = 256 + 2048, the flags for secondary and supplementary alignments). Matching of read to gene identifiers, as well as estimation of read lengths for each read and of gene counts, was performed with bedtools (v2.29.2) with the following parameter options: -loj -s |cut -f9-23 |cut -f1-2,4-15. For the analysis of the IVT model, the following command was used: minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam. BAM file analysis and allocation of IVT sequencing data were performed using samtools (1.10.2-3). Read IDs of rejected reads were identified in the sequencing summary file, which is generated automatically after sequencing, by filtering for Data_Service_Unblock_Mux_Change. Stack-plots indicating the proportions of sequenced reads based on decision and end_reason have been provided for each chromosome.
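The identification of rejected reads from the sequencing summary can be sketched in Python as follows. This is a minimal illustration rather than the workflow deposited by the authors: it assumes a tab-separated summary file with columns named read_id and end_reason, and it compares the end reason case-insensitively because the spelling in the file may differ from the one given above. The function name is hypothetical.

```python
import csv
import io

# End reason written for reads ejected by adaptive sampling
# (compared in lower case; see assumption in the lead-in).
UNBLOCK_REASON = "data_service_unblock_mux_change"


def rejected_read_ids(summary_handle):
    """Return the set of read IDs whose end_reason marks an AS rejection.

    ASSUMPTION: the file is tab-separated and has `read_id` and
    `end_reason` columns.
    """
    reader = csv.DictReader(summary_handle, delimiter="\t")
    return {
        row["read_id"]
        for row in reader
        if row["end_reason"].strip().lower() == UNBLOCK_REASON
    }


# Toy summary with one fully sequenced read and two rejected reads.
example = io.StringIO(
    "read_id\tchannel\tend_reason\n"
    "r1\t12\tsignal_positive\n"
    "r2\t12\tdata_service_unblock_mux_change\n"
    "r3\t300\tData_Service_Unblock_Mux_Change\n"
)
print(sorted(rejected_read_ids(example)))  # prints ['r2', 'r3']
```

For a real run, the function would be called on an opened summary file, for example with open(path, newline="") as handle, where path points to the summary file of that run.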
Visualization of data
Data analysis and visualization of read length histograms, heatmaps, stack-plots, empirical cumulative distribution function (eCDF) plots, and MA- and scatter-plots were all performed in R. Scatter plots were smoothed with LOESS to reveal target enrichment trends as a function of expression. The capabilities of base R were extended by several packages, including the tidyverse (1.3.2) collection (dplyr-v1.1.1.9, tidyr-v1.3.0, ggplot2-v3.4.2). Heatmaps were generated using pheatmap (1.0.12).
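For readers unfamiliar with eCDF plots, the underlying quantity is simply the fraction of observations at or below a given value. The following Python sketch illustrates this for read lengths; the analysis itself was performed in R, so the function name and the toy values here are purely illustrative.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F with F(x) = fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("eCDF is undefined for an empty sample")

    def cumulative_fraction(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cumulative_fraction


# Toy read lengths in nucleotides (illustrative values only).
read_lengths = [250, 400, 400, 900, 1800]
F = ecdf(read_lengths)
print(F(400))   # prints 0.6
print(F(1800))  # prints 1.0
```

Plotting F for accepted and rejected reads on the same axes shows directly how strongly rejection shifts the read length distribution toward short reads.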
Coverage tracks were visualized with Integrative Genomics Viewer (v2.12.2) and schematic figures were drawn with Inkscape, except for Figure 1A, which was generated with biorender.com.
DATA DEPOSITION
Workflow and sequencing data are provided at Zenodo: https://doi.org/10.5281/zenodo.7701823 and https://doi.org/10.5281/zenodo.7688592.
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.
ACKNOWLEDGMENTS
We thank Harald Wilhelmi for excellent support in setting up the computational framework, and Jessica Eschenbach for excellent technical assistance. We are grateful to Matthias Dewenter and Johannes Backs for providing the mouse heart tissue samples. We thank Thiago Britto-Borges for generation of Figure 1A. I.S.N.d.V. and C.D. acknowledge support by the Klaus Tschira Stiftung gGmbH (00.219.2013). E.G. and C.D. acknowledge support by the Informatics for Life Consortium funded by the Klaus Tschira Foundation. C.D. acknowledges additional support by the DZHK (German Centre for Cardiovascular Research) Partner Site Heidelberg/Mannheim and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—project number 439669440 TRR319 RMaP TP B01.
Author contributions: I.S.N.d.V. and C.L.A.G. performed all experiments. E.G. and C.L.A.G. analyzed direct RNA-seq data. I.S.N.d.V., E.G., and C.L.A.G. compiled figures. I.S.N.d.V. wrote the manuscript. I.S.N.d.V. and C.D. designed and supervised the project. C.D. provided funding. All authors critically read the manuscript, revised it and approved the final version.
Footnotes
Article is online at http://www.rnajournal.org/cgi/doi/10.1261/rna.079727.123.
REFERENCES
- Bensasson D, Zhang D, Hartl DL, Hewitt GM. 2001. Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends Ecol Evol 16: 314–321. doi:10.1016/S0169-5347(01)02151-6
- Carniel E, Taylor MR, Sinagra G, Di Lenarda A, Ku L, Fain PR, Boucek MM, Cavanaugh J, Miocic S, Slavov D, et al. 2005. α-Myosin heavy chain: a sarcomeric gene associated with dilated and hypertrophic phenotypes of cardiomyopathy. Circulation 112: 54–59. doi:10.1161/CIRCULATIONAHA.104.507699
- Chen Y, Sim A, Wan YK, Yeo K, Lee JJX, Ling MH, Love MI, Göke J. 2023. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat Methods 20: 1187–1191. doi:10.1038/s41592-023-01908-w
- Edwards HS, Krishnakumar R, Sinha A, Bird SW, Patel KD, Bartsch MS. 2019. Real-time selective sequencing with RUBRIC: Read Until with basecall and reference-informed criteria. Sci Rep 9: 11475. doi:10.1038/s41598-019-47857-3
- Hendra C, Pratanwanich PN, Wan YK, Goh WSS, Thiery A, Göke J. 2022. m6Anet identifies N6-methyladenosine from individual direct RNA sequencing reads. Nat Methods 19: 1530–1531. doi:10.1038/s41592-022-01666-1
- Jordan E, Peterson L, Ai T, Asatryan B, Bronicki L, Brown E, Celeghin R, Edwards M, Fan J, Ingles J, et al. 2021. Evidence-based assessment of genes in dilated cardiomyopathy. Circulation 144: 7–19. doi:10.1161/CIRCULATIONAHA.120.053033
- Kovaka S, Fan Y, Ni B, Timp W, Schatz MC. 2021. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat Biotechnol 39: 431–441. doi:10.1038/s41587-020-0731-9
- Lagarde J, Uszczynska-Ratajczak B, Carbonell S, Perez-Lluch S, Abad A, Davis C, Gingeras TR, Frankish A, Harrow J, Guigo R, et al. 2017. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing. Nat Genet 49: 1731–1740. doi:10.1038/ng.3988
- Liu H, Begik O, Lucas MC, Ramirez JM, Mason CE, Wiener D, Schwartz S, Mattick JS, Smith MA, Novoa EM. 2019. Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun 10: 4079. doi:10.1038/s41467-019-11713-9
- Loose M, Malla S, Stout M. 2016. Real-time selective sequencing using nanopore technology. Nat Methods 13: 751–754. doi:10.1038/nmeth.3930
- Martin S, Heavens D, Lan Y, Horsfield S, Clark MD, Leggett RM. 2022. Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples. Genome Biol 23: 11. doi:10.1186/s13059-021-02582-x
- Mateos PA, Sethi AJ, Ravindran A, Guarnacci M, Srivastava A, Xu J, Woodward K, Yuen ZWS, Mahmud S, Kanchi M, et al. 2023. Simultaneous identification of m6A and m5C reveals coordinated RNA modification at single-molecule resolution. bioRxiv doi:10.1101/2022.03.14.484124
- Mercer TR, Gerhardt DJ, Dinger ME, Crawford J, Trapnell C, Jeddeloh JA, Mattick JS, Rinn JL. 2011a. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 30: 99–104. doi:10.1038/nbt.2024
- Mercer TR, Neph S, Dinger ME, Crawford J, Smith MA, Shearwood AM, Haugen E, Bracken CP, Rackham O, Stamatoyannopoulos JA, et al. 2011b. The human mitochondrial transcriptome. Cell 146: 645–658. doi:10.1016/j.cell.2011.06.051
- Naarmann-de Vries IS, Zorbas C, Lemsara A, Bencun M, Schudy S, Meder B, Eschenbach J, Lafontaine DLJ, Dieterich C. 2021. Deep assessment of human disease-associated ribosomal RNA modifications using Nanopore direct RNA sequencing. bioRxiv doi:10.1101/2021.11.10.467884
- Naarmann-de Vries IS, Eschenbach J, Dieterich C. 2022. Improved nanopore direct RNA sequencing of cardiac myocyte samples by selective mt-RNA depletion. J Mol Cell Cardiol 163: 175–186. doi:10.1016/j.yjmcc.2021.10.010
- Oxford Nanopore Technologies. 2020. Adaptive sampling: Release of Read Until api. https://community.nanoporetech.com/posts/adaptive-sampling-release.
- Oxford Nanopore Technologies. 2022. Adaptive sampling, best practice guidance. https://community.nanoporetech.com/posts/adaptive-sampling-best-pr.
- Payne A, Holmes N, Clarke T, Munro R, Debebe BJ, Loose M. 2021. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat Biotechnol 39: 442–450. doi:10.1038/s41587-020-00746-x
- Piechotta M, Naarmann-de Vries IS, Wang Q, Altmuller J, Dieterich C. 2022. RNA modification mapping with JACUSA2. Genome Biol 23: 115. doi:10.1186/s13059-022-02676-0
- Slomovic S, Laufer D, Geiger D, Schuster G. 2005. Polyadenylation and degradation of human mitochondrial RNA: the prokaryotic past leaves its mark. Mol Cell Biol 25: 6427–6435. doi:10.1128/MCB.25.15.6427-6435.2005
- Sneddon A, Ravindran A, Hein N, Shirokikh N, Eyras E. 2022. Real-time biochemical-free targeted sequencing of RNA species with RISER. bioRxiv doi:10.1101/2022.11.29.518281
- van der Zwaag PA, van Rijsingen IA, Asimaki A, Jongbloed JD, van Veldhuisen DJ, Wiesfeld AC, Cox MG, van Lochem LT, de Boer RA, Hofstra RM, et al. 2012. Phospholamban R14del mutation in patients diagnosed with dilated cardiomyopathy or arrhythmogenic right ventricular cardiomyopathy: evidence supporting the concept of arrhythmogenic cardiomyopathy. Eur J Heart Fail 14: 1199–1207. doi:10.1093/eurjhf/hfs119
- Weilguny L, De Maio N, Munro R, Manser C, Birney E, Loose M, Goldman N. 2023. Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design. Nat Biotechnol 41: 1018–1025. doi:10.1038/s41587-022-01580-z
- Yang KC, Yamada KA, Patel AY, Topkara VK, George I, Cheema FH, Ewald GA, Mann DL, Nerbonne JM. 2014. Deep RNA sequencing reveals dynamic regulation of myocardial noncoding RNAs in failing human heart and remodeling with mechanical circulatory support. Circulation 129: 1009–1021. doi:10.1161/CIRCULATIONAHA.113.003863