Abstract
A unique feature of Oxford Nanopore Technologies sequencers, adaptive sampling, allows precise DNA molecule selection from sequencing libraries. Here, we present enhancements to our tool, readfish, enabling all features for the industrial scale PromethION sequencer, including standard and “barcode-aware” adaptive sampling. We demonstrate effective coverage enrichment and assessment of multiple human genomes for copy number and structural variation on a single PromethION flow cell.
Adaptive sampling (Loose et al. 2016) allows for the optimization of sequencing efficiency, reducing sequencing capacity wasted on DNA fragments that do not provide utility when answering a given biological question. The application of adaptive sampling to nanopore sequencing can address long-standing challenges inherent to other sequencing methods. It can lower sequencing costs and save time, alongside improving the depth and quality of sequencing data for targeted genomic regions. These issues are particularly present for large genomes, where a large amount of sequencing is required to produce sufficient data (Payne et al. 2021). By enhancing the relevance of the data produced, adaptive sampling can help with personalized medicine and diagnostics (Miller et al. 2021; Miyatake et al. 2022; Chen et al. 2024) and even with genomic environmental surveillance (Urban et al. 2023).
We previously developed readfish (Payne et al. 2021), which uses real-time base-calling to analyze molecules during translocation, determining whether they should be sequenced or, instead, ejected from the pore to be replaced with a new molecule. Readfish has been instrumental in the development of adaptive sampling, providing researchers with unparalleled adaptability to address a wide range of biological questions (Patel et al. 2022; Stevanovski et al. 2022; Weilguny et al. 2023). Oxford Nanopore Technologies (ONT) also provide an adaptive sampling implementation built into MinKNOW, their software for controlling sequencing. This implementation is fundamentally the same as that of readfish; however, it is deeply embedded in MinKNOW, making it hard to customize and inflexible in certain scenarios. For example, it does not offer the ability to alter genomic targets during sequencing.
With the release of the PromethION, nanopore sequencing has grown in throughput, highlighting the requirement for additional features and improvements in readfish. Combining adaptive sampling with PromethION scale sequencing is extremely beneficial; if the throughput of sequence production is increased, the effect of enrichment is compounded. Increased coverage over target regions allows for the determination of structural variants (SVs) and single-nucleotide polymorphisms (SNPs) with greater confidence (Beyter et al. 2021). However, the data generation rate enabled by PromethION flow cells is too high for the original implementation of readfish. Specifically, the time taken to process all the alignments using mappy (Li 2018, 2021) (minimap2 Python bindings) led to significant bottlenecks in the analysis pipeline.
To this end, we have addressed issues with previous versions of readfish, refactoring the source code for better maintainability, efficiency, extensibility, and stability. We demonstrate the ability of readfish to keep up with PromethION scale sequencing, using a custom multithreaded implementation of mappy in Rust. Readfish has a number of features such as “barcode-awareness” (Munro et al. 2023) and compatibility with the latest Dorado versions, all of which are now available for PromethION scale experiments. As a demonstration of these improvements, we multiplexed and sequenced three different target panels across distinct human cell lines, confirming known SVs. We proceed to use the alignment of sequenced and rejected reads for precise copy number variation (CNV) assessment on PromethION.
Results
Barcode demultiplexing
Using base-calling in adaptive sampling decision-making allows existing sequence-based tools, such as barcode demultiplexers, to be incorporated into the readfish workflow. We previously adapted readfish to be compatible with built-in Guppy or Dorado demultiplexing (ONT) and incorporated barcode classifications into the data readfish can use to make a decision about sequencing or rejecting a read (Munro et al. 2023).
This “barcode-awareness” is now functional on the PromethION, and can be leveraged for greater impact on this platform, due to the larger amount of data generated per barcode. Reads can be rejected based solely on barcode classification (Supplemental Fig. 1A), or using independent target sets provided for their corresponding barcodes (Supplemental Fig. 1B). This differs from the adaptive sampling built into MinKNOW, which can also demultiplex barcoded reads for use in decision-making but allows only one panel to be used for all barcodes. Currently, barcode-based demultiplexing when used in built-in adaptive sampling does not have the same flexibility as readfish, and can only be used to balance barcodes if they are present at uneven ratios in the sequencing library. All barcode demultiplexing testing was performed on a PromethION P24, with 2 NVIDIA Quadro GV100s, 2 Intel(R) Xeon(R) Platinum 8168 CPUs (96 cores total), and 384 GB RAM.
Barcodes can be accurately identified with very little signal data, as shown in Figure 1B. At 5000 samples, the default number of samples taken per read chunk on the PromethION (1 sec at 5 kHz), barcode classification has an F1 score of ≥0.9 when compared with the classifications produced from the full-length reads. Figure 1A and B indicate that it would be possible to take less signal data and still obtain an accurate alignment and barcode classification.
Figure 1.
Comparison of Alignment and Barcode classification F1 as a result of signal length. In total, 10,000 reads were truncated into 500 sample increments up to 5000 samples, with a 1000 sample increment after that, to a maximum of 10,000. Reads were base-called and demultiplexed using the super accuracy (sup), high accuracy (hac), and fast (fast) Dorado models. (A) F1 scores for alignment, where “truth” alignments were defined as the start position of the mapping being within 100 base pairs of the full-length “sup” model read alignment. (B) F1 scores for barcoding classifications, where “truth” barcode classifications were defined as the barcode classification as assigned by the “sup” model for the full-length read.
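The scoring described in this caption can be sketched as follows. This is a minimal illustration, not the analysis code used for Figure 1: the `Mapping` class and function names are hypothetical, and several choices are assumptions that the text does not specify (the truncated read must map to the same contig as the truth alignment, the 100 bp window is inclusive, an unmapped truncated read counts as a false negative, and a mapping outside the window counts as a false positive).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    """Hypothetical minimal record of an alignment: contig name and start."""
    contig: str
    start: int


def alignment_matches(truncated: Mapping, truth: Mapping, tolerance: int = 100) -> bool:
    # Assumption: same contig and start within `tolerance` bp (inclusive).
    return truncated.contig == truth.contig and abs(truncated.start - truth.start) <= tolerance


def f1_score(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 when nothing was counted.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def score_alignments(pairs: Iterable[Tuple[Optional[Mapping], Mapping]]) -> float:
    """pairs: (mapping of truncated read or None, full-length 'sup' truth mapping)."""
    tp = fp = fn = 0
    for truncated, truth in pairs:
        if truncated is None:
            fn += 1  # assumption: unmapped truncated read is a false negative
        elif alignment_matches(truncated, truth):
            tp += 1
        else:
            fp += 1  # assumption: mapping outside the window is a false positive
    return f1_score(tp, fp, fn)


# Toy data, not taken from the paper.
example_pairs = [
    (Mapping("chr15", 1050), Mapping("chr15", 1000)),  # within 100 bp
    (Mapping("chr15", 5000), Mapping("chr15", 1000)),  # too far away
    (None, Mapping("chr17", 10)),                      # truncated read unmapped
    (Mapping("chr17", 10), Mapping("chr17", 10)),      # exact match
]
example_f1 = score_alignments(example_pairs)
```

With the toy pairs above, two reads are scored as true positives, one as a false positive, and one as a false negative.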
PromethION and GridION experimental comparison
To test the performance difference when applying adaptive sampling on a PromethION compared to GridION, we used three previously described cell lines: GM12878, from the Utah/CEPH pedigree; NB4, a cell line carrying a fusion between PML and RARA representing an acute promyelocytic leukemia (APL); and 22Rv1, a prostate cancer-derived cell line containing significant chromosomal abnormalities (Mozziconacci et al. 2002; Liu et al. 2010; Jain et al. 2018). For each sample, we chose a specific gene panel. GM12878 was targeted using a panel defined by the gene list in the commercially available TruSight 170 Tumor panel (Na et al. 2019). As the NB4 cell line contains an APL fusion, we selected the TruSight RNA Fusion Panel (Siegfried et al. 2019). For the more complex 22Rv1 prostate cancer cell line, we used the previously described COSMIC panel (Tate et al. 2019; Payne et al. 2021). Samples were barcoded and sequenced on a single flow cell, and run for 72 h (see Methods and Table 1), on both PromethION and GridION. Alignment on GridION was performed by mappy, and on PromethION alignment was performed by Guppy, and not mappy-rs. These alignments are the same as those that would be used by MinKNOW's inbuilt adaptive sampling. At the time the experiment was run, adaptive sampling had only just become feasible on the PromethION. Therefore, we had implemented the most straightforward method to keep up with PromethION data production, which was inbuilt base-caller alignments. While using built-in alignments is a feasible alternative to alignment by readfish, using the base-caller for alignment imposes restrictions, as it ties base-calling to Dorado or Guppy. Neither of these tools allows for alterations to the reference sequence during an experiment, limiting the ability to run true adaptive experiments.
Table 1.
Sample performance
Barcode | Yield Seq. (Gb) | Yield Unb. (Gb) | N50 Seq. (bases) | N50 Unb. (bases) | Sample | Panel | Gene number | Off target Med. Cov. | Target Med. Cov. | Fold enrichment |
---|---|---|---|---|---|---|---|---|---|---|
01 | 0.334 | 3.47 | 8149 | 555 | GM12878 | TruSight 170 tumor panel | 170 | 1 | 12 | 12 |
02 | 1.24 | 4.84 | 7191 | 552 | NB4 | TruSight RNA fusion panel | 508 | 1 | 15 | 15 |
03 | 1.25 | 3.84 | 6858 | 556 | 22Rv1 | COSMIC | 717 | 1 | 12 | 12 |
Unclassified | 0.170 | 3.66 | 923 | 792 | * | * | * | 0 | 1 | * |
Total | 2.99 | 15.80 | 5780 | 614 | | | | 1 | 12 | |
05 | 1.28 | 14.76 | 7163 | 917 | GM12878 | TruSight 170 tumor panel | 170 | 4 | 35 | 9 |
06 | 4.36 | 23.39 | 7349 | 919 | NB4 | TruSight RNA fusion panel | 508 | 8 | 52 | 7 |
07 | 3.01 | 13.63 | 6999 | 923 | 22Rv1 | COSMIC | 717 | 4 | 27 | 7 |
Unclassified | 0.703 | 15.50 | 543 | 989 | * | * | * | 4 | 5 | * |
Total | 9.35 | 67.29 | 5514 | 937 | | | | 4 | 31 | |
Run metric performance per barcode and over the entire flow cell. Metrics are derived from real-time monitoring with minoTour (Munro et al. 2022), and from analysis of the final data set. Barcodes 01–03 were run on a single GridION flow cell; barcodes 05–07 were run on a single PromethION flow cell. (Seq.) Sequenced; (Unb.) unblocked; (Med. Cov.) median coverage. (*) Not applicable.
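As a worked check of the fold enrichment column, the sketch below divides target median coverage by off-target median coverage for each barcode in Table 1. The function names are hypothetical. Rounding half up is an assumption made here so that 52/8 = 6.5 is reported as 7; the published values may instead have been computed from unrounded coverage.

```python
import math


def fold_enrichment(target_median_cov: float, off_target_median_cov: float) -> float:
    """Ratio of target to off-target median coverage."""
    if off_target_median_cov <= 0:
        raise ValueError("off-target median coverage must be positive")
    return target_median_cov / off_target_median_cov


def round_half_up(value: float) -> int:
    # Python's built-in round() rounds 6.5 to 6, so round half up explicitly.
    return math.floor(value + 0.5)


# (target median coverage, off-target median coverage) per barcode, from Table 1.
table1_coverage = {
    "01": (12, 1),
    "02": (15, 1),
    "03": (12, 1),
    "05": (35, 4),
    "06": (52, 8),
    "07": (27, 4),
}

enrichment = {
    barcode: round_half_up(fold_enrichment(target, off_target))
    for barcode, (target, off_target) in table1_coverage.items()
}
```

Under these assumptions, the computed values agree with the fold enrichment reported for each barcode.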
On GridION, in a single experiment using a flow cell with 1330 pores, 18.79 Gb of data were generated, with a total of 15 Gb successfully demultiplexed into barcoded data (Table 1). Inspection of individual targets PML and RARA demonstrates the ability to specifically target unique regions on each barcoded sample (Fig. 2A,B). Current best practice for single nucleotide variant calling requires higher minimal depth than we achieve when looking at three samples on a MinION flow cell. However, long-range SVs can be determined, and so we used cuteSV (Jiang et al. 2020) to analyze these three samples. As expected, multiple reads supporting the detection of a fusion between PML and RARA were detected in the NB4 cell line (barcode 02, 06), as visualized using Genome Ribbon (Nattestad et al. 2021), and shown in Figure 3B. A full comparison of structural variation across this region for all samples can be seen in Supplemental Figure 3.
Figure 2.
Target and barcode-specific gene coverage. Illustration of coverage over each barcoded sample for the target genes PML and RARA. Blue illustrates coverage from accepted reads; red illustrates coverage from rejected reads. Barcodes 01 and 05: samples prepared from NA12878 cells; Barcodes 02 and 06: NB4; and Barcodes 03 and 07: 22Rv1. (A,B) Data generated on a single MinION flow cell. The targeted regions are illustrated below the coverage plots. (C,D) Data generated on a single PromethION flow cell.
Figure 3.
Visualizing structural variation. (A) Using Samplot (Belyeu et al. 2021), reads from the PromethION run linking PML (Chromosome 15) to RARA (Chromosome 17) in a known fusion were visualized. Only the NB4 sample carries this fusion (indicated by the dashed lines). (B,C) Using Ribbon (Nattestad et al. 2021), we can visualize individual reads that span the fusion from (B) the GridION NB4 sample and (C) the PromethION NB4 sample. SVs, in this case, were identified using cuteSV (Jiang et al. 2020).
For PromethION testing, a single experiment using a flow cell with 6960 pores generated a total of 78.2 Gb of data, of which 61 Gb could be demultiplexed into barcoded data. See Table 1 for a complete breakdown of experimental statistics. Again, long-range SVs can be determined, and as expected, multiple reads supporting the detection of a fusion between PML and RARA were detected in the NB4 cell line (barcode 02, 06), visualized using Genome Ribbon (Nattestad et al. 2021) and Samplot (Fig. 3A,C; Belyeu et al. 2021). To achieve this coverage across three samples without adaptive sampling would require 300–400 Gb of untargeted read data. Coverage compared with GridION is greatly improved (Fig. 2C,D).
Finally, we turned to a natural application for adaptive sampling, which considers the mappings of rejected reads. Various approaches have been developed using binning of short reads to detect CNV by applying a variety of statistical approaches (Zhang et al. 2019). These methods also work with nanopore sequencing (Magi et al. 2019), but the resolution of detection will be dependent on the total number of reads generated during a sequencing run. Adaptive sampling increases read count as a consequence of rejecting molecules once they are confidently mapped to an off-target region. As expected, CNV plots generated in this manner for NB4 (barcode 06) and 22Rv1 (barcode 07) (Fig. 4A,C) both closely recapitulate CNV plots generated by Bionano optical mapping (Fig. 4B,D). A full comparison across all samples can be seen in Supplemental Figure 4.
Figure 4.
Matched ONT and Bionano CNV visualization. Nanopore sequence data from PromethION compared with Bionano optically mapped reads, all mapped against GRCh38 (hg38). Blue points show where binned data indicates greater than expected copy number, red points where binned data indicates lower than expected copy number. (A) NB4 PromethION sample compared with (B) showing Bionano optical mapping data for this cell line. (C) 22Rv1 PromethION sample compared with (D) showing Bionano optical mapping data for this cell line.
Alignment throughput increase
Single-threaded mappy is unable to keep up with the data generation rate of the PromethION. This can be clearly seen in Figure 5D, with alignment times lagging upwards of 14 sec within 2 min of sequencing. Simply implementing a multithreaded version of mappy through Python bindings resulted in unnecessary memory utilization, as the reference library cannot easily be shared across threads. To address this, we wrote and integrated mappy-rs (https://github.com/Adoni5/mappy-rs) into readfish. mappy-rs exploits the ability of Rust to better integrate with the underlying minimap2 code, facilitating multithreaded alignment against a single instance of the reference in memory. By implementing a separate aligner, we can also update the reference during a run for increased customizability.
Figure 5.
Total absolute alignment and base-calling times in readfish for different aligners on PromethION and GridION. Note that the base-called reads are streamed into the aligner for mappy and mappy-rs, so the two processes are occurring in parallel. Each scatter point represents a batch of accumulated signal chunks being analyzed by readfish, and the color represents the mean base-called length of reads in the batch. The “break_read_chunk_ms” value for each device is indicated on the x- and y-axes of each plot as a dashed line, representing the amount of time covered by each chunk of signal data recorded in a batch. Ideally, the total batch time should fall below this line. The marginal axes for each facet display kernel density estimation plots of the distribution of times for their axis. The PromethION/mappy combination has a different axis scale than the other facets, as batch processing times quickly lagged. When using the Dorado alignments, it was not possible to deconvolute the alignment time from the base-calling time, as alignments are returned to readfish alongside the base-called signal; therefore, points will fall on x = y.
Signal data chunks are collected from the PromethION once every second by default. Therefore, if the combined time of base-calling, analysis, and decision-making exceeds one second, the analysis will rapidly fall behind and negatively impact the ability to enrich target molecules. Using mappy-rs, alignment can be significantly sped up (Supplemental Fig. 2), preventing alignment from becoming a bottleneck that would cause an increasing buildup of signal and a consequent time lag.
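The timing constraint above can be illustrated with a toy model. This is a deliberate simplification and not a description of readfish internals: it assumes that each batch takes a fixed processing time, that a new chunk arrives every `chunk_interval` seconds, and that the backlog simply grows by the difference (in practice, a lagging analysis also accumulates more signal per batch). The function name and parameters are hypothetical.

```python
from typing import Iterable, List


def simulate_lag(batch_times: Iterable[float], chunk_interval: float = 1.0) -> List[float]:
    """Return how far behind real time the analysis is after each batch (seconds)."""
    lag = 0.0
    history = []
    for batch_time in batch_times:
        # Falling behind adds to the backlog; finishing early recovers it,
        # but the analysis can never be ahead of the incoming signal.
        lag = max(0.0, lag + batch_time - chunk_interval)
        history.append(lag)
    return history


# Toy numbers: batches that take 1.5 sec against a 1 sec chunk interval.
slow = simulate_lag([1.5] * 4)
# Toy numbers: batches that take 0.75 sec never build up a backlog.
fast = simulate_lag([0.75] * 4)
```

In this model, a batch time only 0.5 sec over the chunk interval adds 0.5 sec of lag per batch, which is why even a modest alignment slowdown compounds quickly.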
Figure 5 displays the time taken for the base-calling and alignment steps within the readfish decision-making loop when using mappy, mappy-rs, and Dorado for alignment at the GridION and PromethION scale. Figure 5A, C, and E show that the choice of aligner makes little difference when sequencing with a MinION flow cell on a GridION MK1, with all three aligners performing similarly. Figure 5F confirms that the performance of mappy-rs is sufficient to keep ahead of signal collection batch times on the PromethION. Performance is also largely comparable with that of the built-in alignments returned by Dorado (Fig. 5B), the same alignments that would be used by ONT's implementation of adaptive sampling. Although the times of several batches exceed the 1 sec threshold for Dorado and mappy-rs alignments in Figure 5B and F, the distribution plots in the margins show that this is actually quite a rare occurrence, with the distribution centered comfortably under 1 sec. When comparing the peak of the alignment time distributions between Dorado and mappy-rs alignments in Figure 5B and F, we can see that mappy-rs is slightly slower. It is, however, more consistent, never exceeding 1.3 sec, a threshold that is occasionally passed when using Dorado alignments, for reasons we cannot currently explain. The coloring of the batches indicates the mean base-called read length in a batch. The cache for signal chunks from each channel is cumulative: if readfish does not make a decision on a read in a batch, the next chunk of signal is appended to the signal already held. For all combinations except mappy on PromethION (Fig. 5D), the mean read length in a batch is almost always <1000 bases (2.5 sec of sequencing at 400 bases per second), which indicates that there is no buildup of sequence.
Discussion
Extending readfish to process the volume of data a PromethION generates allows more sophisticated selection experiments that are better able to exploit adaptive sampling's potential. By coupling high-throughput PromethION sequencing with readfish's fully customizable “barcode-aware” adaptive sampling, targeted data generation can be fine-tuned based on individual samples, maximizing the effect of enrichment per flow cell used. Here, we demonstrate that individual samples can be targeted with unique panels of genes, selected based on knowledge of the sample, enabling a user to ask and answer specific questions. On a single MinION flow cell, three human genomes can be analyzed in real time, with coverage sufficient to detect SV and CNV. Furthermore, it is possible to target three human genomes on a single flow cell on PromethION devices to a sufficient depth for further SNP analysis. We anticipate that further optimization of the underlying software, both within proprietary MinKNOW control software and firmware as well as within tools such as readfish, will further enhance yield and throughput, enabling effective targeted sequencing of 6–12 samples on a single PromethION flow cell. We demonstrate that by using the novel mappy-rs, it is possible for readfish to perform alignments at sufficient speed to keep up with PromethION, with the latest chemistry. The release of the P2i and the P2 solo will increase the amount of PromethION scale sequencing that is occurring outside larger sequencing centers. While the adaptive sampling runs for this publication were performed on the provided P24 or P48 towers, for running adaptive sampling on both positions on a P2, we would recommend using at the minimum an NVIDIA 3080, preferably a 4090, a 24-core CPU, and 64 GB of RAM.
Ignoring the compute requirements for base-calling, MinKNOW, and controlling sequencing, we can see in Supplemental Figure 5 that the memory requirements for running readfish are roughly the size of the reference after transformation into a minimap2 index, plus up to 1–2 GB extra for storing signal and base-called sequence. In terms of CPU threads, a standard CPU would suffice for GridION; however, for PromethION, we would recommend at least an 8-core CPU with at least 4 threads dedicated to mappy-rs alignment.
Readfish at industrial scale, alongside sample multiplexing, could have numerous potential applications. In healthcare, it can be used for rapid profiling of central nervous system tumors (Vermeulen et al. 2023), and with the addition of multiplexed samples to the high-throughput PromethION, multiple samples can now be concurrently analyzed. We note that this method depends on mean read lengths being sufficient to enable enrichment and so depends on sample extraction methods. The flexibility offered by readfish allows for the updating of targets during an ongoing sequencing run, meaning that in conjunction with real-time analysis, potential genetic mutations and variations linked to diseases can be added to target regions in real time if relevant. In the field of genomics, adaptive sampling can be applied to aid de novo assemblies, targeting reads to difficult-to-assemble regions, increasing the likelihood of longer nanopore reads spanning these regions and resolving them.
Methods
Mappy-rs development
minimap2 (Li 2018, 2021) is written in C, and provides a structured application programming interface (API) that allows for other programming languages to create bindings to the compiled C code, and execute it via a Foreign Function interface. This is how mappy is designed, using Cython in order to provide access to the C code alignment functions within the Python runtime.
However, in order to perform alignment at sufficient throughput to keep up with the output of a PromethION sequencer in real time, we decided to design a multithreaded aligner with multiple copies of the Aligner sharing references to a single minimap2 index. Rust was chosen over C++ or C due to familiarity with the language, support for creating Python bindings, and the excellent support for parallelism and concurrency that is built into Rust. Bindings to the minimap2 C library were generated using bindgen, allowing Rust code to call the underlying minimap2 C code. Custom Rust structs to represent the Aligner and Alignments were then created, and custom functions to perform the alignment were written, relying on the underlying C code to perform alignment calculations. Python bindings to the Rust code were generated using PyO3, creating mappy-rs (https://github.com/Adoni5/mappy-rs). mappy-rs receives batches of sequence and places them into a queue. A thread pool, where each thread has access to its own Aligner instance, pulls a sequence out from the queue, aligns it, creates an Alignment instance to store the results, and then pushes that to a results queue. The results are yielded from the results queue back to the Python runtime in the order they were sent to the aligner.
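The queue and thread pool pattern described above can be sketched in plain Python. This is only an illustration of the pattern and not the mappy-rs implementation or its API: `ToyAligner`, `map_batch`, and the exact-match "alignment" are hypothetical stand-ins. Each worker thread holds its own aligner object, all aligners reference one shared index, and results come back in submission order. Because of the CPython global interpreter lock, pure Python threads would not give the parallel speedup that the Rust implementation provides.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

_worker_state = threading.local()


class ToyAligner:
    """Stand-in for a per-thread aligner that references a shared, read-only index."""

    def __init__(self, index: Dict[str, str]):
        self.index = index  # a reference to the shared index, not a copy

    def map(self, seq: str) -> List[Tuple[str, int]]:
        # Toy "alignment": first exact-match position in each contig.
        return [(name, contig.find(seq)) for name, contig in self.index.items() if seq in contig]


def _init_worker(index: Dict[str, str]) -> None:
    # Runs once in each worker thread, giving it its own aligner instance.
    _worker_state.aligner = ToyAligner(index)


def _align(seq: str) -> List[Tuple[str, int]]:
    return _worker_state.aligner.map(seq)


def map_batch(index: Dict[str, str], seqs: List[str], n_threads: int = 4) -> List[List[Tuple[str, int]]]:
    with ThreadPoolExecutor(max_workers=n_threads, initializer=_init_worker, initargs=(index,)) as pool:
        # Executor.map returns results in the order the sequences were submitted.
        return list(pool.map(_align, seqs))


# Toy index and reads, not real data.
toy_index = {"chr1": "ACGTACGTTTGA", "chr2": "GGGCCCAAATTT"}
toy_results = map_batch(toy_index, ["TTGA", "CCCA", "NNNN"])
```

Sharing one index across all workers is the design point being illustrated: memory use scales with the size of the index rather than with the number of threads.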
Readfish and alignment timings
Six sequencing runs were simulated on the PromethION P48 beta tower, with 4 Tesla V100-PCIE-16GB GPUs, 12 32GiB DIMM DDR4 Synchronous 2666 MHz sticks of RAM (384GB total), and 2 Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz (96 cores total). Runs were simulated using Icarust 0.0.7, commit cf27f12071f7c9b515883d5e3bf645aad4609831, using the config_dnar10_5Khz_human_barcoded.toml, a copy of which has been included in the accompanying notebook repository. Simulated runs were run for 2 h apiece using the command:
cargo run -r -- -s Profile_tomls/config_dnar10_5khz_human_barcoded.toml -c config_grid.ini -v -p
The number of channels and the break_read_chunks value were changed to 512 and 0.8 for the GridION simulation and to 3000 and 1.0 for the PromethION simulation.
Dorado base-call server v7.3.9 was used for base-calling in all readfish runs and also provided the alignments when readfish was run with Dorado alignments. Alignment was performed using the Hg38.p14 reference, with only complete, primary chromosomes present in the reference. Multithreaded alignments for both Dorado and mappy-rs used 16 threads.
Profiling timings were generated using a custom fork of readfish v2024.2.0, commit cd20ff16c5f3a5f54124515fe58aabde3dc8df3a, https://github.com/LooseLab/readfish/tree/cd20ff16c5f3a5f54124515fe58aabde3dc8df3a. Custom analysis was performed in Jupyter notebooks; the code is available in the attached repository.
Truncated read generation and analysis
Analysis and base-calling were performed on the same PromethION P48 tower as above. Reads were truncated using https://pypi.org/project/pod5/ from ONT and a custom Python script, and truncated reads were written into valid POD5 files for each truncation length. Truncation was done in 0.1 sec increments (500 samples) up until one second's worth of data, after which a 0.2 sec increment (1000 samples) was used. In total, 10,000 reads were taken from a human clinical run; the library was prepared using the SQK-NBD114-24 sequencing kit and was sequenced at 5 kHz on an R10.4.1 flow cell.
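The truncation schedule can be written out as a short sketch. The function names are hypothetical and the signal is a plain list rather than a POD5 record; reading and writing POD5 files is not shown. The step sizes, the 5000 sample switch point, the 10,000 sample maximum, and the 5 kHz sample rate are taken from the text and the Figure 1 legend.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 5000  # 5 kHz sequencing, as stated in the text


def truncation_lengths(
    fine_step: int = 500,
    fine_until: int = 5000,
    coarse_step: int = 1000,
    maximum: int = 10_000,
) -> List[int]:
    """Signal lengths (in samples) at which each read is truncated."""
    fine = list(range(fine_step, fine_until + 1, fine_step))
    coarse = list(range(fine_until + coarse_step, maximum + 1, coarse_step))
    return fine + coarse


def truncate(signal: Sequence[int], n_samples: int) -> Sequence[int]:
    """Keep only the first n_samples of a read's raw signal."""
    return signal[:n_samples]


lengths = truncation_lengths()
durations_sec = [n / SAMPLE_RATE_HZ for n in lengths]

# Toy signal standing in for one read's raw samples.
toy_signal = list(range(12_000))
toy_truncated = [truncate(toy_signal, n) for n in lengths]
```

This yields 15 truncation lengths per read: ten from 500 to 5000 samples (0.1 to 1 sec) and five from 6000 to 10,000 samples (1.2 to 2 sec).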
Each set of truncated reads and the original full-length reads were then base-called with Dorado base-call server 7.3.9, using the Fast, High Accuracy and Super base-calling models at v4.3. For barcode demultiplexing of data, we used Dorado v7.3.9 demultiplexing and tested no other approach.
Alignment was performed by Dorado base-call server 7.3.9, which internally uses minimap2 v2.24. Reads were aligned against the GRCh38.p14 reference with all but the primary contigs removed.
Running readfish barcoding
Running adaptive sampling requires the ONT Read Until API (version 3.0.0, https://github.com/nanoporetech/read_until_api/tree/release-3.0) and the ONT PyGuppy Client library (version 5.0.13, https://pypi.org/project/ont-pyguppy-client-lib/5.0.13/). Readfish (https://github.com/LooseLab/readfish; commit 9e8794a) was run using a GridION MK1 (MinKNOW v4.3.2; Guppy v5.0.13; minimap2 v2.22); the MinKNOW configuration scripts were configured to serve data in 0.8 sec chunks. For PromethION, we ran using a modified version of readfish (https://github.com/LooseLab/readfish; commit c9f5169) on an early release of MinKNOW core 5.1 using a PromethION 24 device (MinKNOW v5.1; Guppy v6.0.6) with the ONT PyGuppy Client library (version 6.0.6, https://pypi.org/project/ont-pyguppy-client-lib/5.0.13/). MinKNOW configuration scripts were left serving data in 1 sec chunks for PromethION.
The readfish script carrying out the selective sequencing was readfish barcode-targets. This script runs the core Read Until process as specified in the experiment's TOML file. With a single reference genome, the script can select specific target regions on each barcode by using Guppy to base call and demultiplex the raw signal in real time. The resultant read is then aligned to the reference using minimap2 and is determined to be on or off target depending on its barcode assignment and mapping start. For PromethION, we used the mapping returned by minimap2 from within Guppy to make decisions.
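The per-read decision described above can be sketched as follows. This is not readfish code: the function names, the action strings, and the interval format are hypothetical, and the handling of unmapped reads and of barcodes without a target set is an assumption (in readfish these behaviors are set in the experiment's TOML file). The sketch only captures the stated rule that a read is on or off target depending on its barcode assignment and mapping start.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 0-based half-open


def is_on_target(targets: List[Interval], contig: str, start: int) -> bool:
    """True when the mapping start falls inside any target interval."""
    return any(c == contig and s <= start < e for c, s, e in targets)


def decide(
    barcode: str,
    mapping: Optional[Tuple[str, int]],
    targets_by_barcode: Dict[str, List[Interval]],
) -> str:
    """Return a hypothetical action: 'sequence', 'unblock', or 'proceed'."""
    targets = targets_by_barcode.get(barcode)
    if targets is None:
        return "unblock"  # assumption: reject barcodes that have no target set
    if mapping is None:
        return "proceed"  # assumption: wait for more signal when unmapped
    contig, start = mapping
    return "sequence" if is_on_target(targets, contig, start) else "unblock"


# Toy targets with made-up coordinates, one independent panel per barcode.
toy_targets = {
    "barcode01": [("chrA", 100, 200)],
    "barcode02": [("chrB", 1000, 5000)],
}
decisions = [
    decide("barcode01", ("chrA", 150), toy_targets),
    decide("barcode01", ("chrB", 1500), toy_targets),
    decide("barcode02", ("chrB", 1500), toy_targets),
    decide("barcode02", None, toy_targets),
    decide("barcode09", ("chrA", 150), toy_targets),
]
```

The second and third calls show the barcode-specific behavior: the same mapping is rejected for one barcode and accepted for another.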
Library preparation, sequencing, and analysis
Barcoded LSK-110 (ONT) sequencing libraries were prepared from either GM12878 cells (Coriell), NB4 cells (gift from M. Hubank), or 22Rv1 cells (ATCC) as described in Jain et al. (2018). For test experiments, bacterial DNA was extracted using a genomic tip (QIAGEN). Extracted DNA was sheared to ∼12 kb using g-Tube (Covaris). Sequencing used either FLO-MIN106 R9.4.1 flow cells for GridION or FLO-PRO002/FLO-PRO114 R9.4.1/R10.4.1 flow cells for PromethION as appropriate. Flow cells were run with flushing and reloading as previously described in Payne et al. (2021).
To investigate SVs across the data set, we ran cuteSV (https://github.com/tjiangHIT/cuteSV) on each barcoded sample using standard options but varying the -s (minimum support) value, which sets the minimum number of reads required to support an SV. No SVs in known fusion genes were reported in NA12878 or 22Rv1 (-s 2), whereas known fusions including PML and RARA were readily detected in NB4 (-s 5) (Jiang et al. 2020). SVs were visualized using Ribbon (Nattestad et al. 2021).
To visualize changes in copy number, reads were mapped to hg38 and filtered to retain uniquely mapping reads with mapping scores >20. Then, the first primary mapping for any read was determined and mappings were binned into windows along the genome such that on average each bin contains 100 reads. Runs were monitored in real time using minoTour (https://github.com/LooseLab/minotourapp; commit: 1f9c678), providing coverage statistics, mappings, and estimates of CNV in real time (Munro et al. 2022). During real-time analysis, reads were mapped to the CHM13 telomere-to-telomere assembly (Nurk et al. 2022). Postrun copy number plots were generated using Matplotlib with data mapped to hg38 to compare with the output of the Bionano copy number pipeline (see notebooks https://github.com/LooseLab/barcode_paper_nb).
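The binning step can be sketched as below. This is a simplified illustration and not the notebook code: the function names and the tuple layout are hypothetical, a single contig stands in for the genome, and "uniquely mapping" is approximated here by keeping the first primary mapping per read with a mapping score >20. The bin width is chosen so that the average bin holds about 100 reads.

```python
from typing import Iterable, List, Tuple

# Hypothetical mapping record: (read_id, start, mapping_score, is_primary)
MappingRecord = Tuple[str, int, int, bool]


def first_primary_starts(mappings: Iterable[MappingRecord], min_score: int = 20) -> List[int]:
    """Start of the first primary mapping per read with a score above min_score."""
    seen = set()
    starts = []
    for read_id, start, score, is_primary in mappings:
        if not is_primary or score <= min_score or read_id in seen:
            continue
        seen.add(read_id)
        starts.append(start)
    return starts


def bin_width_for(genome_length: int, n_reads: int, reads_per_bin: int = 100) -> int:
    """Bin width giving roughly reads_per_bin reads per bin on average."""
    n_bins = max(1, n_reads // reads_per_bin)
    return -(-genome_length // n_bins)  # ceiling division


def bin_counts(starts: Iterable[int], genome_length: int, bin_width: int) -> List[int]:
    """Count mapping starts in consecutive windows of bin_width bases."""
    counts = [0] * (-(-genome_length // bin_width))
    for start in starts:
        counts[start // bin_width] += 1
    return counts


# Toy data: 400 evenly spaced reads on a 1000 base contig, plus records
# that should be filtered out (duplicate read, low score, non-primary).
toy_length = 1000
toy_mappings = [(f"r{i}", i * toy_length // 400, 60, True) for i in range(400)]
toy_mappings += [("r0", 999, 60, True), ("low", 10, 5, True), ("sec", 10, 60, False)]

toy_starts = first_primary_starts(toy_mappings)
toy_width = bin_width_for(toy_length, len(toy_starts))
toy_counts = bin_counts(toy_starts, toy_length, toy_width)
```

With evenly distributed toy reads, every bin holds 100 reads; in real data, bins that depart from the expected count are the ones that indicate a possible copy number change.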
To visualize coverage over specific targets, reads were divided into those actively sequenced and those unblocked using the unblocked read IDs file generated by readfish. Reads were mapped to hg38, coverage depth calculated using mosdepth v0.3.1 (Pedersen and Quinlan 2018) and visualized using Matplotlib (v3.4.3).
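Dividing reads into sequenced and unblocked sets can be sketched as follows. The function names are hypothetical, and the file layout (one read ID per line) is an assumption about the unblocked read IDs file rather than a documented format. Mapping and coverage calculation with mosdepth are not shown.

```python
import io
from typing import Iterable, List, Set, TextIO, Tuple


def load_unblocked_ids(handle: TextIO) -> Set[str]:
    """Read one read ID per line (assumed layout), skipping blank lines."""
    return {line.strip() for line in handle if line.strip()}


def split_reads(read_ids: Iterable[str], unblocked: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition read IDs into (sequenced, unblocked), preserving input order."""
    sequenced, rejected = [], []
    for read_id in read_ids:
        (rejected if read_id in unblocked else sequenced).append(read_id)
    return sequenced, rejected


# Toy example using an in-memory file instead of a real readfish output file.
toy_unblocked = load_unblocked_ids(io.StringIO("read_b\nread_d\n\n"))
toy_sequenced, toy_rejected = split_reads(
    ["read_a", "read_b", "read_c", "read_d"], toy_unblocked
)
```

The two resulting ID lists could then be used to subset the alignments before computing coverage for accepted and rejected reads separately.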
Bionano methods
DNA extraction and labeling for Bionano
DNA was prepared from frozen cell pellets of 1.5 million cells using the Bionano Prep SP Blood and Cell Culture DNA Isolation Kit (Bionano Genomics; 80042) according to the manufacturer's instructions. DNA was homogenized and quantified using Qubit dsDNA BR Kit (Thermo Fisher Scientific; Q32853) on a Qubit 4 Fluorometer (Thermo Fisher Scientific; Q33238). In total, 750 ng of gDNA was then labeled with Direct Label Enzyme 1 (DLE-1) and DNA backbone stain using the Bionano Prep Direct Label and Stain (DLS) kit (Bionano Genomics; 80005) according to the manufacturer's instructions. Labeled DNA was quantified using the Qubit dsDNA HS Kit (Thermo Fisher Scientific; Q32851) on a Qubit 4 Fluorometer. Labeled DNA was loaded onto a Bionano Saphyr G2.3 chip (Bionano Genomics; 20366) and run on a Gen 2 Bionano Saphyr System (Bionano Genomics; 60325) until 1.320 Tbp of data had been collected for each of NB4 and 22Rv1. This data had respective mapping rates to hg38 reference sequence of 89% and 79%, equating to 382× and 337× coverage, respectively.
Data analysis
Postrun data filtering and analysis were carried out using Bionano Access 1.5.2. For each sample, the data set was filtered and subsampled to produce 320 Gbp of data with 150 kb minimum length and at least nine labels per molecule. Filtered data were processed to produce annotated de novo assemblies using the default parameters, but with masking using the hg38 DLE-1 SV Mask BED file. SV and CNV coordinates were then visualized using Bionano Access. All described analysis was performed on dedicated Bionano compute with the following versions installed: Bionano Access 1.5.2, Bionano Tools 1.5.3, Bionano Solve Solve3.5.1_01142020, RefAligner 10330.10436rel, HybridScaffold 12162019, SVMerge 12162019, VariantAnnotation 12162019, and Compute on Demand 1.5.1.
Data access
Sequence data and Bionano maps generated for the GridION and PromethION experimental comparison have been submitted to the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena/browser/home) under accession number PRJEB82322. Data and scripts used in this manuscript are available at GitHub (https://github.com/LooseLab/barcode_paper_nb) and as Supplemental Code.
Acknowledgments
The authors thank Mike Hubank and Nigel Mongan for gifts of cells and useful discussions. We also thank Stu Reid, Graham Hall, Chris Wright, and teams at ONT for useful conversations. This work was supported by Biotechnology and Biological Sciences Research Council (BBSRC) iCASE studentship awards to R.M. and A.P. In addition, we acknowledge funding from the BBSRC (BB/N017099/1) and Wellcome Trust (grant number 204843/Z/16/Z). We would like to acknowledge Deepseq Nottingham for the creation and sequencing of all DNA libraries.
Author contributions: R.M., A.P., and M.L. conceptualized the study; R.M. and A.P. performed data analysis; R.M. and M.L. wrote the manuscript; and N.H., C.M., and I.C. performed sequencing and DNA extraction for experiments.
Footnotes
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279329.124.
Freely available online through the Genome Research Open Access option.
Competing interest statement
M.L. was a member of the MinION access program and has received free flow cells and sequencing reagents in the past. M.L. has received reimbursement for travel, accommodation, and conference fees to speak at events organized by Oxford Nanopore Technologies. Early access to MinKNOW software updates from Oxford Nanopore Technologies enabled this work. R.M. is currently on a BBSRC iCASE PhD programme, which is in part funded by Nanopore.
References
- Belyeu JR, Chowdhury M, Brown J, Pedersen BS, Cormier MJ, Quinlan AR, Layer RM. 2021. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol 22: 161. doi:10.1186/s13059-021-02380-5
- Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, Atlason BA, Kristmundsdottir S, Mehringer S, Hardarson MT, et al. 2021. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet 53: 779–786. doi:10.1038/s41588-021-00865-4
- Chen Z, Gustavsson EK, Macpherson H, Anderson C, Clarkson C, Rocca C, Self E, Alvarez Jerez P, Scardamaglia A, Pellerin D, et al. 2024. Adaptive long-read sequencing reveals GGC repeat expansion in ZFHX3 associated with spinocerebellar ataxia type 4. Mov Disord 39: 486–497. doi:10.1002/mds.29704
- Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, Tyson JR, Beggs AD, Dilthey AT, Fiddes IT, et al. 2018. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36: 338–345. doi:10.1038/nbt.4060
- Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. 2020. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 21: 189. doi:10.1186/s13059-020-02107-y
- Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094–3100. doi:10.1093/bioinformatics/bty191
- Li H. 2021. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37: 4572–4574. doi:10.1093/bioinformatics/btab705
- Liu T, Xu F, Du X, Lai D, Liu T, Zhao Y, Huang Q, Jiang L, Huang W, Cheng W, et al. 2010. Establishment and characterization of multi-drug resistant, prostate carcinoma-initiating stem-like cells from human prostate cancer cell lines 22RV1. Mol Cell Biochem 340: 265–273. doi:10.1007/s11010-010-0426-5
- Loose M, Malla S, Stout M. 2016. Real-time selective sequencing using nanopore technology. Nat Methods 13: 751–754. doi:10.1038/nmeth.3930
- Magi A, Bolognini D, Bartalucci N, Mingrino A, Semeraro R, Giovannini L, Bonifacio S, Parrini D, Pelo E, Mannelli F, et al. 2019. Nano-GLADIATOR: real-time detection of copy number alterations from nanopore sequencing data. Bioinformatics 35: 4213–4221. doi:10.1093/bioinformatics/btz241
- Miller DE, Sulovari A, Wang T, Loucks H, Hoekzema K, Munson KM, Lewis AP, Fuerte EPA, Paschal CR, Walsh T, et al. 2021. Targeted long-read sequencing identifies missing disease-causing variation. Am J Hum Genet 108: 1436–1449. doi:10.1016/j.ajhg.2021.06.006
- Miyatake S, Koshimizu E, Fujita A, Doi H, Okubo M, Wada T, Hamanaka K, Ueda N, Kishida H, Minase G, et al. 2022. Rapid and comprehensive diagnostic method for repeat expansion diseases using nanopore sequencing. npj Genomic Med 7: 62. doi:10.1038/s41525-022-00331-y
- Mozziconacci M-J, Rosenauer A, Restouin A, Fanelli M, Shao W, Fernandez F, Toiron Y, Viscardi J, Gambacorti-Passerini C, Miller WH, et al. 2002. Molecular cytogenetics of the acute promyelocytic leukemia-derived cell line NB4 and of four all-trans retinoic acid-resistant subclones. Genes Chromosomes Cancer 35: 261–270. doi:10.1002/gcc.10117
- Munro R, Santos R, Payne A, Forey T, Osei S, Holmes N, Loose M. 2022. Minotour, real-time monitoring and analysis for nanopore sequencers. Bioinformatics 38: 1133–1135. doi:10.1093/bioinformatics/btab780
- Munro R, Holmes N, Moore C, Carlile M, Payne A, Tyson JR, Williams T, Alder C, Snell LB, et al. 2023. A framework for real-time monitoring, analysis and adaptive sampling of viral amplicon nanopore sequencing. Front Genet 14: 1138582. doi:10.3389/fgene.2023.1138582
- Na K, Kim H-S, Shim HS, Chang JH, Kang S-G, Kim SH. 2019. Targeted next-generation sequencing panel (TruSight tumor 170) in diffuse glioma: a single institutional experience of 135 cases. J Neurooncol 142: 445–454. doi:10.1007/s11060-019-03114-1
- Nattestad M, Aboukhalil R, Chin C-S, Schatz MC. 2021. Ribbon: intuitive visualization for complex genomic variation. Bioinformatics 37: 413–415. doi:10.1093/bioinformatics/btaa680
- Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, et al. 2022. The complete sequence of a human genome. Science 376: 44–53. doi:10.1126/science.abj6987
- Patel A, Dogan H, Payne A, Krause E, Sievers P, Schoebe N, Schrimpf D, Blume C, Stichel D, Holmes N, et al. 2022. Rapid-CNS2: rapid comprehensive adaptive nanopore-sequencing of CNS tumors, a proof-of-concept study. Acta Neuropathol 143: 609–612. doi:10.1007/s00401-022-02415-6
- Payne A, Holmes N, Clarke T, Munro R, Debebe BJ, Loose M. 2021. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat Biotechnol 39: 442–450. doi:10.1038/s41587-020-00746-x
- Pedersen BS, Quinlan AR. 2018. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34: 867–868. doi:10.1093/bioinformatics/btx699
- Siegfried A, Rousseau A, Maurage C, Pericart S, Nicaise Y, Escudie F, Grand D, Delrieu A, Gomez-Brouchet A, Le Guellec S, et al. 2019. EWSR1-PATZ1 gene fusion may define a new glioneuronal tumor entity. Brain Pathol 29: 53–62. doi:10.1111/bpa.12619
- Stevanovski I, Chintalaphani SR, Gamaarachchi H, Ferguson JM, Pineda SS, Scriba CK, Tchan M, Fung V, Ng K, Cortese A, et al. 2022. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Sci Adv 8: eabm5386. doi:10.1126/sciadv.abm5386
- Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, et al. 2019. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res 47: D941–D947. doi:10.1093/nar/gky1015
- Urban L, Miller AK, Eason D, Vercoe D, Shaffer M, Wilkinson SP, Jeunen G-J, Gemmell NJ, Digby A. 2023. Non-invasive real-time genomic monitoring of the critically endangered kākāpō. eLife 12: RP84553. doi:10.7554/eLife.84553
- Vermeulen C, Pagès-Gallego M, Kester L, Kranendonk MEG, Wesseling P, Verburg N, de Witt Hamer P, Kooi EJ, Dankmeijer L, van der Lugt J, et al. 2023. Ultra-fast deep-learned CNS tumour classification during surgery. Nature 622: 842–849. doi:10.1038/s41586-023-06615-2
- Weilguny L, De Maio N, Munro R, Manser C, Birney E, Loose M, Goldman N. 2023. Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design. Nat Biotechnol 41: 1018–1025. doi:10.1038/s41587-022-01580-z
- Zhang L, Bai W, Yuan N, Du Z. 2019. Comprehensively benchmarking applications for detecting copy number variation. PLoS Comput Biol 15: e1007069. doi:10.1371/journal.pcbi.1007069