2026 Jan 19;10(4):e02335. doi: 10.1002/smtd.202502335

Machine Learning‐Driven Nanopore Sensing for Quantitative, Label‐Free miRNA Detection

Caroline Koch 1,2, Seshagiri Sakthimani 1, Victoria Maria Noakes 1, Miruna Cretu 1, David Newman 3, Richard Gutierrez 3, Mark Bruce 3, Julia Gorelik 4, Nadia Guerra 2, Joshua B Edel 1, Aleksandar P Ivanov 1
PMCID: PMC12929938  PMID: 41549838

ABSTRACT

Nanopore sensors offer exceptional sensitivity for detecting single molecules, making them ideal for early disease diagnostics. In this study, we present a multiplexed nanopore‐based assay that combines DNA‐barcoded probes with advanced computational analysis to detect microRNAs (miRNAs) with high specificity and accuracy. Each probe selectively binds its target biomarker and induces a characteristic delay in the ionic current signal upon translocation through the nanopore. We evaluated three analytical strategies for classifying delayed versus non‐delayed events: (1) moving standard deviation (MSD), (2) spectral entropy (SE), and (3) a convolutional neural network (CNN). While MSD and SE rely on manually defined thresholds and exhibit limited sensitivity, the CNN model, trained on image representations of raw current traces, achieved near‐perfect classification performance across all metrics (accuracy = 0.99, precision = 0.99, recall = 0.99). Grad‐CAM visualization confirmed that the CNN model focused on relevant signal regions, enhancing interpretability and generalizability. All methods produced sigmoidal concentration‐response curves consistent with expected binding kinetics, and nanopore‐derived delay metrics closely matched RT‐qPCR validation data. All three methods were capable of distinguishing between signal classes; however, the CNN model demonstrated superior sensitivity and robustness. This work highlights the importance of data interpretation in nanopore sensing and presents a comparative framework for binary event classification. The findings pave the way for the development of machine learning‐driven nanopore diagnostics capable of detecting diverse biomarker types at the single‐molecule level.

Keywords: biomarker, data‐analysis, machine‐learning, miRNAs, nanopores


Nanopore sensors combined with DNA‐barcoded probes enable detection of multiple biomarkers in parallel at the single‐molecule level through characteristic delays in ionic current signals. Three analytical strategies are compared: moving standard deviation, spectral entropy, and a convolutional neural network (CNN). The CNN achieves superior sensitivity and robustness, offering a scalable, automated framework for multiplexed nanopore diagnostics.

1. Introduction

Nanopores are single‐molecule electrical sensors capable of detecting and characterizing individual molecules in real time [1]. This ability to resolve molecular features at the single‐molecule level is particularly powerful, as conventional ensemble‐based techniques often obscure such detail through ensemble averaging [2]. Due to their exceptional sensitivity, nanopores have found applications across diverse fields such as environmental monitoring [3], catalysis [4] and isomer [5] studies, DNA‐based storage approaches [6, 7], DNA and RNA sequencing [8] and, most relevant to this study, the detection of biomarkers for distinguishing between healthy and diseased states [9, 10].

The operational principle involves applying an electrical bias across a nanoscale pore to generate a steady ionic current. When charged analytes translocate through the pore, the ion flow is temporarily disrupted, causing a characteristic change in current. These disruptions encode information about the analyte's identity, structure, and physicochemical properties [1].

Due to the complexity of raw ionic current signals in nanopore experiments, a range of computational methods has been developed to extract meaningful information encoded in the data [11]. For biomarker detection, pattern recognition is essential for identifying translocation signatures. Two main challenges must be addressed: specificity and signal interpretation. Since unmodified nanopores lack intrinsic molecular specificity, strategies commonly used include DNA probes [12] or nanopore surface functionalisation [13] to selectively detect target biomarkers. Signal interpretation typically involves analysis of features such as dwell time and current amplitude [14], although more complex approaches also examine subpeak structures [15, 16], and pattern recognition using machine learning (ML) [17].

In previous work, we developed a nanopore‐based biomarker detection strategy using DNA‐barcoded probes and nanopore sequencing [18]. This method relies on sequencing a probe's barcode region and identifying a “delay” in the ionic current signal upon target binding. Events were classified as either delayed or non‐delayed, corresponding to the presence or absence of a biomarker. By quantifying the fraction of delayed events, standard curves were generated, allowing for multiplexed detection of microRNAs (miRNAs), proteins, and small molecules. In that study, signal delays were detected using a moving standard deviation (MSD) method; however, this approach relied on manually defined thresholds, limiting both sensitivity and scalability.

This study evaluates three distinct data analysis strategies for interpreting nanopore signals: (1) MSD, (2) spectral entropy (SE), and (3) a convolutional neural network (CNN). While MSD and SE are traditional, rule‐based thresholding techniques, CNNs offer a modern, machine‐learning‐based approach that automates the classification of time‐series data. This manuscript explores both paradigms, classical threshold‐based methods and advanced deep learning, reflecting the broader shift toward data‐driven methodologies in nanopore diagnostics. The performance of all three approaches was benchmarked against reverse transcription quantitative PCR (RT‐qPCR). This work introduces a new analytical framework for nanopore signal interpretation and supports the development of ML‐driven, high‐throughput tools for single‐molecule biomarker detection.

2. Results and Discussion

2.1. MSD Classification

To demonstrate the biomarker detection concept, previously developed DNA‐barcoded probes [18] were employed to detect hsa‐miR‐27b‐5p, serving as a proof of concept for evaluating different data analysis strategies (Figure 1). These DNA‐barcoded probes generate two distinct signal types in nanopore measurements: (1) non‐delayed events, indicating the absence of the target miRNA, and (2) delayed events, indicating its presence due to probe‐target binding. As an initial analysis, the MSD of the ionic current time series was used to classify events into these two categories. The underlying hypothesis was that miRNA binding to the DNA‐barcoded probe would prolong the occupancy of the nanopore by the probe‐target complex, leading to reduced signal variability, observable as a decrease in the MSD of the ionic current. In all experiments, the DNA‐barcoded probe was incubated with hsa‐miR‐27b‐5p at various concentrations, as described in the Experimental section. The resulting data were used to optimize the MSD method for detecting current delays indicative of target miRNA presence. In brief, the MSD method divides the signal into discrete bins; if the current remains below a predefined threshold across a sufficient number of bins, the event is classified as delayed. The following analyses describe the calibration of these thresholds (Figure 2).

FIGURE 1.

Workflow for biomarker detection using a nanopore assay. (A) Schematic overview of the nanopore‐based detection process. (i) Liquid biopsies collected from patients contain (ii) circulating biomarkers such as miRNAs and proteins. (iii) These biomarkers are hybridized with DNA‐barcoded probes prior to nanopore sequencing. (iv) Ionic current traces exhibit no delay in the absence of the target biomarker, but show a characteristic delay when binding occurs. (v) Computational analysis of these signals enables quantitative biomarker detection and disease classification. (B) Structure of the DNA‐barcoded probe, comprising three functional regions: (i) an adapter to regulate translocation speed, (ii) a unique 35‐base barcode sequence, and (iii) a target‐binding domain, which may be a complementary miRNA sequence or an aptamer for protein recognition. (C) Signal classification in the absence of the biomarker. (i) The DNA‐barcoded probe translocates through the nanopore without delay. To analyze these events, three computational methods were evaluated: (ii) MSD, (iii) SE, and (iv) the CNN model. MSD and SE are rule‐based approaches relying on predefined thresholds, whereas the CNN provides automated, data‐driven classification of time‐series signals. Events that do not fall below the threshold or are classified as class 1 are considered non‐delayed. (D) Signal classification in the presence of the biomarker. (i) Probe‐target binding induces a measurable delay during translocation. (ii) MSD and (iii) SE detect a drop below the threshold, while (iv) the CNN classifies the event as class 0. These events are identified as delayed, indicating successful biomarker detection.

FIGURE 2.

MSD method for miRNA detection using DNA‐barcoded probes. (A) Delay localization and reproducibility. (i) Representative examples of three randomly selected delayed events. (ii) Overlay of two delayed signals showing consistent delay onset at a fractional position of 0.7. (B) MSD signal profiles. (i) Example of a non‐delayed event, where the MSD remains above the threshold. (ii) Example of a delayed event, exhibiting a transient drop in MSD below the defined threshold of 0.003. (C) Threshold optimization. Detection thresholds ranging from 0.001 to 0.005 were evaluated across miRNA concentrations of 0, 1, and 100 nM. (D) Dynamic range analysis. The difference in delay frequency between 0 and 100 nM samples identified an optimal threshold window of 0.0025–0.004. (E) Bin number effect. Increasing the number of bins from 5 to 100 (with fixed parameters: threshold = 0.003, fractional position = 0.7, minimum delay >1 bin) revealed an inverse correlation between bin number and delay detection. (F) Minimum delay length effect. Varying the minimum number of bins required to classify a delay (0–100 bins), while keeping other parameters constant (threshold = 0.003, fractional position = 0.7, total bins = 100), showed that sensitivity increased with delay length, particularly in miRNA‐containing samples. (G) Performance metrics of the MSD classifier. Accuracy = 0.72, precision = 0.94, recall = 0.47, F1 score = 0.63, Matthews Correlation Coefficient (MCC) = 0.51, and Receiver Operating Characteristics (ROC) Area under the Curve (AUC) = 0.72, indicating high precision but limited sensitivity due to false negatives.

The fractional position of the delay, which is the delay's location within the signal expressed as a value between 0 and 1, was first assessed by plotting three representative delayed events to visualize the delay pattern (Figure 2A,i). Dynamic time warping (DTW) was used to quantify signal similarity and locate the delay region (Figure 2A,ii). Delays consistently occurred in the final quarter of the signal, with DTW confirming a substantial similarity between events (Euclidean distance = 6.85), which validated a fractional position threshold of 0.7 for delay detection.

Representative MSD traces for non‐delayed (Figure 2B,i) and delayed (Figure 2B,ii) events showed that only delayed events exhibited a transient drop below the threshold of 0.003 (Supporting Information S1: Data 1). To optimize this threshold, datasets from 0, 1, and 100 nM hsa‐miR‐27b‐5p were analyzed across a range of thresholds (0.001–0.005). Higher thresholds increased the proportion of delayed events across all concentrations (Figure 2C). The dynamic range, defined as the difference in delay percentage between 100 and 0 nM, was plotted to identify the range providing the greatest separation between control and miRNA‐containing samples (Figure 2D). The plateau region, corresponding to thresholds between 0.0025 and 0.004, was selected as optimal. A threshold of 0.003 was chosen to maximize discrimination while maintaining a low false‐positive rate in controls.

Next, the influence of the number of bins and the minimum delay length (“larger than x bins”) was assessed. Varying bin number from 5 to 100 (threshold = 0.003, fractional position = 0.7, minimum delay >1 bin) revealed an inverse correlation between bin number and delay percentage (Figure 2E). A bin number of 75 was selected, as it produced consistently low delay percentages under control conditions. The minimum delay length was then varied (threshold = 0.003, fractional position = 0.7, 100 bins) (Figure 2F). This parameter had little effect on the control data (0 nM) but markedly influenced delay detection in the presence of miRNA. A minimum delay length of >10 bins was selected to balance sensitivity and specificity.

To validate the method, an independent dataset of 2000 manually curated events (delayed and non‐delayed) was analyzed. The MSD classifier achieved an accuracy of 0.72, precision of 0.94, recall of 0.47, F1 score of 0.63, MCC of 0.51, and ROC AUC of 0.72 (Figure 2G). Further inspection revealed a subset of atypical events exhibiting high‐frequency noise within the delayed segment (Supporting Information S1: Data 2 and 3). These “noisy delays” elevated MSD values, leading to misclassification as non‐delayed and contributing to a high false‐negative rate.
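As a consistency check on the metrics quoted above, the following minimal sketch recomputes them from confusion‐matrix counts. The counts are hypothetical: they assume the 2000 curated events were split evenly between delayed and non‐delayed classes, which is not stated in the text, and the function name `binary_metrics` is illustrative only.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts (ASSUMED 1000 delayed / 1000 non-delayed events),
# chosen so that precision = 0.94 and recall = 0.47
msd_like = binary_metrics(tp=470, fp=30, fn=530, tn=970)
for name, value in msd_like.items():
    print(f"{name}: {value:.2f}")
```

Under this assumed class balance, the counts reproduce the reported accuracy, precision, recall, F1 score, and MCC after rounding to two decimals, and they illustrate how high precision can coexist with low recall when false negatives dominate.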

2.2. SE Classification

SE builds on the information entropy that Shannon introduced in communication theory in 1948 and has since been applied to pattern recognition [19]. It can detect potential anomalies or irregularities in time series data by quantifying the level of disorder or unevenness in the frequency distribution and, therefore, was hypothesized to be able to differentiate between non‐delayed and delayed events. To test this, the same dataset used for MSD calibration was used to optimize SE. This dataset comprised events recorded from DNA‐barcoded probes incubated with increasing concentrations of hsa‐miR‐27b‐5p, as described in the Experimental section. The SE method was optimized to detect signal delays indicative of miRNA binding (Figure 3).
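To make the quantity concrete, the sketch below computes a normalized spectral entropy for a one‐dimensional signal. It is an illustration under stated assumptions, not the authors' implementation: the periodogram estimate, the removal of the mean, and the normalization by the logarithm of the number of frequency bins are all choices made here, and `spectral_entropy` is a hypothetical helper name.

```python
import numpy as np


def spectral_entropy(signal, normalize=True):
    """Shannon entropy of the normalized power spectrum of a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the DC offset (assumption)
    power = np.abs(np.fft.rfft(x)) ** 2   # one-sided periodogram
    total = power.sum()
    if total == 0:
        return 0.0                        # perfectly flat signal: no spectral spread
    p = power / total                     # treat the spectrum as a probability distribution
    nz = p[p > 0]                         # skip empty bins; 0 * log(0) is taken as 0
    h = -np.sum(nz * np.log2(nz))
    if normalize:
        h /= np.log2(p.size)              # scale to the range [0, 1]
    return float(h)


rng = np.random.default_rng(0)
t = np.arange(1024)
noise = rng.normal(size=1024)             # broadband signal: power spread over all bins
tone = np.sin(2 * np.pi * 32 * t / 1024)  # narrowband signal: power in a single bin

print("noise:", spectral_entropy(noise))
print("tone: ", spectral_entropy(tone))
```

A broadband trace spreads its power over many frequency bins and scores close to 1, whereas a trace dominated by a single component scores close to 0; the hypothesis in the text is that delayed events shift toward lower values.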

FIGURE 3.

Characterization of the SE method for miRNA detection using DNA‐barcoded probes. (A) Distribution of SE values for events recorded in the absence (control) and presence of increasing concentrations of hsa‐miR‐27b‐5p (1 and 100 nM). The classification threshold for delayed events was set at SE = 0.495, corresponding to the 25th percentile of the control distribution. (B) Representative examples of (i) a non‐delayed event and (ii) a delayed event, along with their corresponding SE traces. Delayed events exhibit a distinct drop below the SE threshold, while non‐delayed events remain above it. (C) Dynamic range analysis (delay at 100 nM minus delay at 0 nM) identified an optimal SE threshold window between 0.495 and 0.505. (D) Percentage of delayed events detected across varying SE thresholds for datasets containing 0, 1, and 100 nM miRNA, highlighting the sensitivity of threshold selection. (E) Performance metrics of the SE classifier: accuracy = 0.68, precision = 0.86, recall = 0.43, F1 score = 0.57, MCC = 0.41, and ROC AUC = 0.68. These results indicate moderate classification performance, with high precision but limited sensitivity due to a tendency to under‐detect true delayed events.

To optimize the SE method's parameters, the first step was to determine an appropriate threshold for identifying delayed events (Figure 3A). SE values were plotted for three datasets corresponding to 0, 1, and 100 nM concentrations of hsa‐miR‐27b‐5p. As anticipated, SE values decreased with increasing miRNA concentration. The 25th percentile of the control (0 nM) distribution, corresponding to an SE value of 0.495, was selected as the classification threshold. Events with SE values below this threshold were classified as delayed, while those above were considered non‐delayed. Representative electrical current traces and their corresponding SE profiles for a non‐delayed (Figure 3B,i) and a delayed event (Figure 3B,ii) demonstrated that delayed signals exhibited a pronounced drop in SE (Supporting Information S1: Data 4). This observation aligns with theoretical expectations and supports SE as an indicator of signal delay. Consistent with earlier findings (Figure 2A,ii), miRNA‐induced delays were localized to the region preceding the second C3 spacer (fractional position >0.7). Consequently, SE values from this region were used for classification. The selected threshold (SE = 0.495) was further validated by analysing the dynamic range of delay detection, defined as the difference in delay frequency between 0 and 100 nM samples. This analysis identified an optimal threshold window between 0.495 and 0.505 (Figure 3C). Varying the SE threshold across the three datasets revealed that the dynamic range increased up to the selected threshold and declined thereafter (Figure 3D), underscoring the importance of precise threshold selection. Inappropriate thresholds risk capturing barcode‐related variability rather than true delay signatures.

To evaluate the classifier's performance, the same dataset of 2000 events previously analyzed with the MSD method was re‐evaluated using the SE approach (Figure 3E). The SE classifier achieved an accuracy of 0.68, precision of 0.86, recall of 0.43, F1 score of 0.57, MCC of 0.41, and ROC AUC of 0.68.

Despite its utility, the SE method has several limitations. It is sensitive to spectral resolution effects, as the number of frequency components (N) used in SE computation strongly influences SE magnitude, complicating cross‐dataset comparisons. This effect is particularly pronounced when signal patterns repeat or when sampling resolution is high, leading to length‐dependent biases, where shorter signals tend to be classified as non‐delayed, while longer ones appear delayed (Supporting Information S1: Data 5). A further limitation shared by both the MSD and SE methods is their dependence on manually defined thresholds. Threshold choice substantially affects the estimated proportion of delayed events (Figure 3D), reducing reproducibility and introducing user bias. To overcome this limitation, we developed an alternative, threshold‐free classification strategy based on a CNN inspired by the LeNet‐5 architecture (Figure 4).

FIGURE 4.

Characterization of the CNN model for miRNA detection using DNA‐barcoded probes. (A) Architecture of the CNN model, inspired by the LeNet‐5 design, comprising two convolutional layers (16 and 32 filters), ReLU activations, max pooling, and a fully connected classification layer. (B) Radar plots displaying normalized performance metrics (min–max scaled per metric) for CNN models trained using six different input image resolutions (28 × 28, 56 × 56, 70 × 70, 84 × 84, 112 × 112, and 140 × 140 pixels), highlighting relative performance across resolutions. The 56 × 56 pixel format yielded optimal performance and was selected for subsequent analyses. (C) Representative examples of (i) non‐delayed and (ii) delayed events, with corresponding Grad‐CAM visualization. Heatmaps indicate regions of model attention, with delayed events showing strong activation in plateau regions associated with probe–target binding. (D) Performance metrics of the CNN classifier: accuracy = 0.99, precision = 0.99, recall = 0.99, F1 score = 0.99, MCC = 0.98, and ROC AUC = 1.00, demonstrating near‐perfect classification and superior performance relative to threshold‐based methods.

2.3. CNN Classification

The CNN model was designed to capture the unique characteristics of signals generated by DNA‐barcoded probe translocations, both with and without miRNA binding. The final architecture (Figure 4A) comprised two convolutional layers with 16 and 32 filters, respectively, each followed by ReLU activation and 2 × 2 max pooling for spatial down‐sampling. These layers enabled the extraction of local signal features such as amplitude shifts and contour shapes. The resulting feature maps were flattened and passed through a fully connected layer with 64 units before reaching the final classification layer, which outputs one of two classes: delay (class 0) or non‐delay (class 1). To optimize input resolution, CNN models were trained using six image sizes (28 × 28, 56 × 56, 70 × 70, 84 × 84, 112 × 112, and 140 × 140 pixels), and performance was evaluated using min–max normalized metrics (Figure 4B). The best performance was achieved at 56 × 56 pixels, which was therefore selected for subsequent analyses. This architecture was chosen for its simplicity, fast training time, and strong performance on low‐resolution input images (Supporting Information S1: Data 6).

Gradient‐weighted Class Activation Mapping (Grad‐CAM) [20] was used to generate saliency maps to identify which regions of the input contributed most to the classification outcome. Grad‐CAM was selected over Grad‐CAM++ [21] because it produced a smoother, more coherent envelope of the signal, especially for the non‐delay representations (Supporting Information S1: Data 7). Original current traces were first corrected for signal deviation (Supporting Information S1: Data 8) and converted into 56 × 56 pixel grayscale images (Figure 4C,i,ii). These images were normalized using the mean and standard deviation from model training, then subjected to Grad‐CAM analysis to generate heatmaps highlighting areas of model focus (blue = low focus, red = high focus; Figure 4C,i,ii, Supporting Information S1: Data 9). For non‐delayed events, the model concentrated on sharp transitions and local fluctuations. For delayed events, attention was focused on extended flat regions in the center of the trace, consistent with unzipping‐induced delays. These patterns indicate that the CNN learned to distinguish between delayed and non‐delayed signals based on relevant biophysical features, rather than extraneous differences such as barcode sequence. Grad‐CAM overlays confirmed that the model's decisions were grounded in meaningful signal regions, supporting its interpretability and generalizability.

To benchmark performance, the same dataset of 2000 events previously analyzed with the MSD and SE methods was re‐evaluated using the CNN (Figure 4D). The model achieved an accuracy of 0.99, precision of 0.99, recall of 0.99, F1 score of 0.99, MCC of 0.98, and ROC AUC of 1.00.
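To make the layer dimensions of the architecture described above concrete, the sketch below traces the feature‐map size and parameter count through two convolution plus 2 × 2 max‐pooling stages, a 64‐unit fully connected layer, and a two‐class output. The 3 × 3 kernel with padding 1 and stride 1 is an assumption (the kernel size is not given in the text), as are the helper names; the single input channel follows from the grayscale images.

```python
def conv_pool_size(size, kernel=3, padding=1, pool=2):
    """Spatial size after one stride-1 convolution and one max-pooling stage."""
    conv_out = size + 2 * padding - kernel + 1
    return conv_out // pool


def lenet_like_summary(input_size, filters=(16, 32), hidden=64, classes=2,
                       kernel=3, padding=1):
    """Feature-map size, flattened length, and trainable parameter count."""
    size, channels, params = input_size, 1, 0   # one channel: grayscale input
    for n_filters in filters:
        # each filter has channels * kernel * kernel weights plus one bias
        params += n_filters * (channels * kernel * kernel) + n_filters
        size = conv_pool_size(size, kernel, padding)
        channels = n_filters
    flattened = channels * size * size
    params += flattened * hidden + hidden       # fully connected layer
    params += hidden * classes + classes        # classification layer
    return {"feature_map": size, "flattened": flattened, "parameters": params}


# The six input resolutions compared in Figure 4B
for resolution in (28, 56, 70, 84, 112, 140):
    print(resolution, lenet_like_summary(resolution))
```

Under these assumptions the fully connected layer dominates the parameter count and grows with the square of the input resolution, which is one reason a small input format can be attractive for a lightweight model.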

Finally, the performance of all three analytical approaches, MSD, SE, and CNN, was compared and benchmarked against RT‐qPCR, the current gold standard for miRNA detection (Figure 5).

FIGURE 5.

Comparative performance of the MSD, SE, and CNN methods for miRNA detection using DNA‐barcoded nanopore probes. (A–C) Confusion matrices illustrating classification outcomes for delay detection using (A) the MSD method, (B) the SE method, and (C) the CNN model. Both MSD and SE exhibit high specificity but reduced sensitivity, whereas the CNN achieves a balanced, accurate classification. (D) Radar plots summarizing key performance metrics (accuracy, precision, recall, F1 score, MCC, and ROC AUC) for all three approaches. (E) Concentration‐response curves showing the percentage of delayed events as a function of hsa‐miR‐27b‐5p concentration (0–100 nM), fitted using the Hill equation. The MSD method yielded n_h = 1.43, K_e = 1.30 nM, V_max = 26.88%, R² = 0.996; the SE method yielded n_h = 1.29, K_e = 1.25 nM, V_max = 37.33%, R² = 0.998; and the CNN model yielded n_h = 2.25, K_e = 1.19 nM, V_max = 36.59%, R² = 0.965. Data are presented as mean ± SEM (n = 3, n_total events = 1,642,070). All experiments were performed in sequencing buffer. (F) RT‐qPCR analysis showing the standard curve and the corresponding miRNA concentrations used for nanopore measurements, with decreasing cycle threshold (Ct) values at higher miRNA concentrations.

The confusion matrix showed that the MSD method tends to miss true delays, resulting in low sensitivity but high specificity and precision (Figure 5A). Similarly, the SE method performs well at identifying non‐delays but frequently fails to detect true delays, as indicated by its moderate MCC of 0.41 (Figure 5B). These findings suggest that both MSD and SE are prone to false negatives, underscoring a key limitation in sensitivity. In contrast, the CNN model exhibited a markedly improved balance between precision and recall, achieving near‐perfect classification accuracy (Figure 5C). This strong performance indicates that the deep learning approach effectively captures complex signal patterns directly from raw traces, enabling more consistent and scalable delay detection than threshold‐based methods. To quantitatively compare model performance, standard metrics such as accuracy, precision, recall, F1 score, MCC, and ROC AUC were evaluated (Figure 5D, Supporting Information S1: Data 10). The CNN model achieved values close to 1.0 across all parameters, demonstrating both exceptional predictive power and balanced classification. By contrast, the MSD and SE methods displayed lower and more variable performance, particularly in recall, confirming their tendency to underdetect true delays. Collectively, these results highlight the superior performance of the CNN model for this classification task. Comparison of the concentration‐percentage delay relationships further emphasized the differences among methods (Figure 5E). All three approaches yielded sigmoidal response curves, with percentage delay increasing in a concentration‐dependent manner, consistent with expected binding kinetics. The CNN and SE methods demonstrated greater sensitivity at lower miRNA concentrations and earlier saturation than MSD, indicating improved capacity to detect subtle delay signals. 
The close overlap of the fitted curves and the low standard errors across replicates indicate high reproducibility and robustness of the nanopore‐based detection strategy. Finally, RT‐qPCR validation confirmed the concentration‐dependent detection of hsa‐miR‐27b‐5p (Figure 5F). A standard curve generated from a series of target concentrations (0–50 nM) showed the expected decrease in cycle threshold (Ct) values with increasing miRNA levels. The strong correspondence between nanopore‐derived delay metrics and RT‐qPCR results supports the accuracy and quantitative reliability of the nanopore assay for label‐free miRNA detection.
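The Hill fits reported in the Figure 5 caption can be explored with the short sketch below, which evaluates the Hill equation for the three published parameter sets. It assumes the standard form with a zero baseline, response = V_max · c^n_h / (K_e^n_h + c^n_h); whether the original fit included a baseline offset is not stated, and the function name `hill` is illustrative.

```python
def hill(conc, v_max, k_e, n_h):
    """Hill equation: predicted response (% delayed events) at concentration conc."""
    if conc <= 0:
        return 0.0                      # zero baseline (assumption)
    return v_max * conc ** n_h / (k_e ** n_h + conc ** n_h)


# (V_max in %, K_e in nM, n_h) as listed in the Figure 5 caption
fits = {
    "MSD": (26.88, 1.30, 1.43),
    "SE": (37.33, 1.25, 1.29),
    "CNN": (36.59, 1.19, 2.25),
}

for method, (v_max, k_e, n_h) in fits.items():
    curve = [round(hill(c, v_max, k_e, n_h), 2) for c in (0, 1, 10, 100)]
    print(method, curve)
```

By construction, the response at c = K_e is exactly half of V_max, so K_e can be read as the concentration giving a half‐maximal fraction of delayed events, while a larger n_h produces a steeper transition around that concentration.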

3. Conclusion

This study presents a comparison of three approaches for classifying delayed and non‐delayed events in nanopore signals generated using DNA‐barcoded probes. In nanopore‐based detection, a delay in ionic current indicates target binding. Delayed events were identified using the MSD and SE methods, both of which relied on manual thresholding. While effective, both require fine‐tuning for each new analyte, limiting their scalability and introducing user bias.

To address these limitations, we developed a lightweight CNN model, inspired by the LeNet‐5 architecture, for robust binary classification of nanopore events. The model was trained on 4500 image representations of raw current traces, with 2000 manually curated for blind testing. Pre‐processing steps removed extreme outliers and preserved signal dynamics within a 200–700 pA range. Various resolutions were evaluated, with the 56 × 56 format achieving the best performance across all metrics, including precision, recall, F1 score, AUC‐ROC, and MCC. Grad‐CAM visualization confirmed that the CNN focused on plateau regions associated with delays, avoiding barcode‐related signal regions. A comparative analysis showed that the CNN consistently outperformed MSD, particularly in terms of sensitivity, validating its superior detection capability. However, the CNN model's generalization is currently limited to patterns learned from specific probe configurations, such as delayed events preceding the C3‐peak. Future improvements could involve hybrid architectures combining CNNs with time‐series models or transformer‐based encoders, and real‐time implementation for live signal classification. Moreover, the CNN model is inherently compatible with real‐time data analysis and could be integrated into online nanopore classification pipelines. Given the growing interest in point‐of‐care applications, future work will explore lightweight, deployment‐ready CNN architectures to enable rapid, real‐time decision making during nanopore sequencing.

In conclusion, this study demonstrates that nanopore‐based detection, combined with advanced computational analysis, enables robust, quantitative miRNA profiling. Among the three evaluated models, the CNN approach achieved the highest accuracy and sensitivity, outperforming MSD and SE in classifying true delayed events and capturing concentration‐dependent signal changes. The strong agreement between nanopore‐derived metrics and RT‐qPCR validation further supports the method's quantitative reliability across a wide dynamic range.

4. Experimental Section

4.1. Nanopore Experiments

All nanopore measurements were conducted using the MinION Mk1B device (Oxford Nanopore Technologies, UK) equipped with R10.4.1 flow cells. The device was connected to a computer with a dedicated NVIDIA graphics card. Prior to each experiment, membrane integrity was assessed using MinKNOW software (version 24.11.10 or later, ONT, UK).

4.2. DNA‐Barcoded Probe and miRNA Sequence

The DNA‐barcoded probe was designed following the protocol described by Koch et al. [18]. Both the probe and its target miRNA (hsa‐miR‐27b‐5p) were synthesized by Integrated DNA Technologies (IDT) (Table 1).

TABLE 1.

DNA‐barcoded probe and hsa‐miR‐27b‐5p sequences. 5Phos: phosphorylation at the 5′ end; iSpC3: internal C3 spacer.

Oligonucleotide | Sequence (5′‐3′)
DNA‐barcoded probe | 5Phos/CCTAGTTCCGCTGGGATCGCTACGCCTTCGGCTCGTAATCATAGTCGAGT/iSpC3//iSpC3/GTTCACCAATCAGCTAAGCTCT
hsa‐miR‐27b‐5p | AGAGCUUAGCUGAUUGGUGAAC

4.3. Sample Preparation

The DNA‐barcoded probes (0.225 µL, 20 µM) were incubated with a ligation c‐strand (0.6 µL, 25 µM; 5'‐CCCAGCGGAACTAGGA‐3') at room temperature (RT) for 1 h. Subsequently, the mixture was combined with 7.5 µL ligation adapter from the ligation sequencing kit SQK‐LSK114 (ONT, UK) and 8.325 µL TA‐ligase (M0367S, New England Biolabs, USA), centrifuged for 1 min at 3000 rpm, and incubated for 20 min at RT on a HulaMixer.

Purification was performed using the solid‐phase reversible immobilization (SPRI) method with Ampure XP beads (A63880, Beckman Coulter, USA). Beads (23.3 µL) were added, followed by two wash steps with 15 µL short fragment buffer (ONT, UK). The beads were then resuspended in 20 µL elution buffer (ONT, UK), releasing the ligated DNA‐barcoded probes into solution. Magnetic separation was used to remove the beads, yielding a clear solution of purified, adapter‐ligated probes.

4.4. Loading the Flow Cell

Prior to each run, flow cells were flushed according to the manufacturer's instructions (ONT, UK). The sequencing mix was prepared in DNA LoBind tubes (Eppendorf, Germany) by combining 37.5 µL sequencing buffer, 12 µL purified DNA‐barcoded probes, and 25.5 µL library solution (ONT, UK). For miRNA experiments, the target miRNA was added to the probe mix at concentrations ranging from 0 to 100 nM and incubated for 30 min at RT before loading. All experiments were performed in triplicate.

4.5. Nanopore Sequencing and Basecalling

Data acquisition was performed using MinKNOW software (version 24.11.10 or later; ONT, UK). A sampling rate of 5 kHz and a temperature of 37°C were used across all nanopore sequencing runs. Basecalling was performed using the super high‐accuracy algorithm provided by ONT.

4.6. Barcode Alignment

Basecalled events were aligned to the reference barcode sequence (GGGATCGCTACGCCTTCGGCTCGTAATCATAGTCGAGT) using a local alignment algorithm with scoring parameters: match +5, mismatch ‐4, and gap ‐8. Events were further filtered based on the following criteria:

  1. Minimum alignment threshold: at least 15 bases matched to the reference.

  2. Mismatch threshold: no more than 3 mismatches allowed.

  3. Initial sequence accuracy: alignment must begin with the barcode's initial “GGG” motif.

  4. Initial base mismatch filter: a maximum of 3 mismatches permitted within the first 10 bases.

Filtered events were subsequently analysed using the classification algorithms to identify signal delays.
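The alignment and filtering steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a Smith-Waterman local alignment with a linear gap penalty, counts a gap within the first 10 reference positions as a mismatch, and interprets criterion 3 as "the alignment starts at reference position 0 and its first three columns are matches". The function names `local_align` and `passes_filters` are hypothetical.

```python
REFERENCE = "GGGATCGCTACGCCTTCGGCTCGTAATCATAGTCGAGT"
MATCH, MISMATCH, GAP = 5, -4, -8  # scoring parameters stated in the text


def local_align(read, ref=REFERENCE):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, ref_start, pairs); pairs lists the aligned
    (ref_base, read_base) columns, with '-' marking a gap.
    """
    rows, cols = len(ref) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if ref[i - 1] == read[j - 1] else MISMATCH
            score[i][j] = max(0, score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP, score[i][j - 1] + GAP)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best-scoring cell until the score reaches zero.
    i, j = best_pos
    pairs = []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = MATCH if ref[i - 1] == read[j - 1] else MISMATCH
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((ref[i - 1], read[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            pairs.append((ref[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", read[j - 1]))
            j -= 1
    pairs.reverse()
    return best, i, pairs  # i is the 0-based reference start position


def passes_filters(read):
    """Apply the four filter criteria (interpretation is an assumption)."""
    _, ref_start, pairs = local_align(read)
    matches = sum(1 for r, q in pairs if r == q)
    mismatches = sum(1 for r, q in pairs if r != q and "-" not in (r, q))
    first10 = [p for p in pairs if p[0] != "-"][:10]
    early_mismatches = sum(1 for r, q in first10 if r != q)
    starts_with_ggg = ref_start == 0 and [r == q for r, q in pairs[:3]] == [True] * 3
    return (matches >= 15 and mismatches <= 3
            and starts_with_ggg and early_mismatches <= 3)
```

A read containing the complete barcode passes all four criteria, whereas a read whose alignment begins downstream of the "GGG" motif is rejected by criterion 3.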

4.7. Data Analysis

Three methods for detecting the delay were tested: (1) the MSD of the signal, (2) SE, and (3) a CNN model.

4.7.1. MSD

The first approach for detecting a delay in the electrical current signal was based on determining the MSD of the signal. The MSD is defined as:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left|x_i - \mu\right|^{2}}$$

where x is a vector containing N scalar observations x_i and µ is the mean of x. Each standard deviation was calculated over a sliding window of k neighboring elements of x. The window length k (in bins) and the other parameters outlined in Table 2 were used to classify an event as delayed or non‐delayed.

TABLE 2.

MSD thresholds. Standard MSD thresholds were used in all experiments to determine whether an event was delayed or non‐delayed.

Parameter Threshold
1 σ <0.003
2 Total number of bins 75
3 k (bins) 10
4 Delay fractional position 0.7

The parameters indicate (1) a drop of the MSD below 0.003, (2) the total number of bins used for each event, (3) a length condition on the delay pattern, and (4) the fractional position of the delay within the MSD profile.
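As an illustration of the MSD calculation, the following NumPy sketch computes the moving standard deviation over windows of k bins and reports where it first drops below the threshold. It assumes the event has already been reduced to 75 bins and scaled so that the 0.003 threshold is meaningful. Because the text does not specify how the four parameters in Table 2 are combined into a decision, the sketch only returns the fractional position of the first low‐MSD window. The function names are hypothetical.

```python
import numpy as np


def moving_std(x, k):
    """Sample standard deviation (N-1 denominator) of every length-k window."""
    x = np.asarray(x, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    return windows.std(axis=1, ddof=1)


def first_low_msd(x, k=10, threshold=0.003):
    """Fractional position (0 to 1) of the first window whose MSD is below
    the threshold, or None if the MSD never drops that low."""
    msd = moving_std(x, k)
    below = np.flatnonzero(msd < threshold)
    if below.size == 0:
        return None
    if len(msd) == 1:
        return 0.0
    return float(below[0] / (len(msd) - 1))


# Synthetic 75-bin event: a fluctuating segment followed by a flat plateau.
event = np.concatenate([np.tile([0.0, 0.1], 20), np.full(35, 0.2)])
position = first_low_msd(event)
```

The returned position could then be compared with the delay fractional position in Table 2; the direction of that comparison is not stated in the text and is left open here.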

4.7.2. SE

The second approach for detecting a delay in the electrical current signal was based on determining the SE, which quantifies the power distribution within a signal by encoding its spectral attributes. Using the power spectrum of a signal, SE is defined as:

$$SE_N = -\frac{\sum_{m=1}^{N} P(m)\,\log_2 P(m)}{\log_2 N}$$

with

$$P(m) = \frac{S(m)}{\sum_{j=1}^{N} S(j)}, \qquad \sum_{m=1}^{N} P(m) = 1$$

where N is the number of frequency components, S(m) = |X(m)|^2, and X(m) is the discrete Fourier transform of the signal x(n). The denominator, log2(N), serves as a normalization factor that constrains the SE to the range [0, 1] and enables comparison between signals of varying lengths. It is the maximum theoretical value of the unnormalized entropy, reached when all N frequencies contribute equally to the power spectrum. P(m) quantifies the fractional contribution of the mth frequency to the spectrum, essentially representing the probability that the signal contains that specific frequency. In the time domain, a flat signal corresponds to a power spectrum with few distinct frequency components, resulting in low SE values. Conversely, SE increases for signals with significant fluctuations in amplitude over time.

To compute SE in MATLAB, the pentropy function was used. This function operates on a vector x sampled at a predefined rate. It first calculates the spectrogram of the input time series and subsequently computes the Shannon entropy. Additionally, it returns instantaneous SEs, providing SE as a function of time. If a time‐frequency spectrogram is known, denoted as S(t, f), the probability distribution of frequencies in the power spectrum can be described according to:

$$P(m) = \frac{\sum_{t} S(t, m)}{\sum_{f}\sum_{t} S(t, f)}$$

To compute instantaneous SE, the probability distribution at time t becomes:

$$P(t, m) = \frac{S(t, m)}{\sum_{f} S(t, f)}$$

Therefore, the normalized SE at time t reads as:

$$SE_N(t) = -\frac{\sum_{m=1}^{N} P(t, m)\,\log_2 P(t, m)}{\log_2 N}$$

Instantaneous SEs were used throughout this manuscript. A threshold of 0.495 was applied: events with SE values dropping below this threshold were classified as delayed.
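For readers working outside MATLAB, the instantaneous SE can be approximated with a short NumPy sketch. This is not a reimplementation of pentropy: the frame length, hop size, and Hann window are assumptions chosen for illustration, and the function name is hypothetical. Each frame yields the power spectrum S(t, m), the distribution P(t, m), and the normalized entropy defined above.

```python
import numpy as np


def instantaneous_spectral_entropy(x, frame=256, hop=128):
    """Normalized spectral entropy of each Hann-windowed frame of x."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame)
    n_freq = frame // 2 + 1                      # N, number of frequency bins
    entropies = []
    for start in range(0, len(x) - frame + 1, hop):
        segment = x[start:start + frame] * window
        power = np.abs(np.fft.rfft(segment)) ** 2    # S(t, m)
        total = power.sum()
        if total == 0:                               # all-zero frame
            entropies.append(0.0)
            continue
        p = power / total                            # P(t, m)
        p = p[p > 0]                                 # treat 0 * log2(0) as 0
        entropies.append(float(-(p * np.log2(p)).sum() / np.log2(n_freq)))
    return np.array(entropies)


# Broadband noise gives values near 1; a single tone gives values near 0.
rng = np.random.default_rng(0)
se_noise = instantaneous_spectral_entropy(rng.normal(size=2048))
se_tone = instantaneous_spectral_entropy(np.sin(2 * np.pi * 32 / 256 * np.arange(2048)))
```

Under this sketch an event would be flagged as delayed when `(se < threshold).any()` is true. The 0.495 threshold was chosen for pentropy output on the authors' data and may not transfer to other spectrogram settings.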

4.7.3. Convolutional Neural Network

The data for training, testing, and validation of the CNN model were selected from sequencing runs using various DNA‐barcoded probes with varying miRNA concentrations (0, 1, 10, and 100 nM) (Supporting Information S1: Data 11 and 12). These reads were first aligned to their barcode sequences and then classified as delayed or non‐delayed events using the MSD method. The raw current traces were plotted in Python at a resolution of 240 × 240 px, saved into a single directory, and randomly shuffled to eliminate biases. All plotted events were then subjected to manual inspection, with delayed and non‐delayed events classified based on visual assessment of the current traces. This manual filtering step was independently performed and reviewed by three authors, and only events with consensus classification were retained in the final datasets. A total of 4500 images were curated, representing both classes (delayed vs. non‐delayed), and were distributed across the training, testing, and validation datasets, as shown in Table 3.

TABLE 3.

Distribution of delayed and non‐delayed events for training, testing, and validation of the CNN model.

Class Training Testing Validation
Delay 1000 1000 250
Non‐Delay 1000 1000 250

The classifier architecture was inspired by the LeNet‐5 convolutional neural network [22], originally designed for handwritten digit recognition. The model was implemented using PyTorch (v2.1.2). Different image resolutions were evaluated to determine the optimal input dimension. Both the test and blind datasets (see Section 4.8) were used to assess model performance. All statistical metrics, including accuracy, precision, recall, F1 score, MCC, and AUC‐ROC, were calculated using scikit‐learn (v1.5.2).

4.8. Data to Compare Models

An additional 2000 images (1000 delayed, 1000 non‐delayed) were randomly selected without using the MSD method, to test the model independently of the labelling pipeline (Supporting Information S1: Data 11). For this purpose, the events were plotted using the same procedure described above and subsequently filtered exclusively by manual inspection, following the three‐author consensus process. This dataset was used to evaluate the viability of MSD and the generalization performance of the model and was labelled as “Blind data.”

4.9. Statistical Metrics

Recall measures the model's ability to correctly identify events that were truly delayed, i.e., true positives (TP). It is defined as:

$$\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}$$

where FN refers to false negatives, or delayed events that were incorrectly classified as non‐delayed.

Precision measures the correctness of the model's positive predictions and is defined as:

$$\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}$$

A high precision score indicated that the models produced few false positives (FP), i.e., non‐delayed events incorrectly classified as delayed, and were conservative in assigning the “delay” label.

F1 Score is the harmonic mean of precision and recall, providing a single score that balances sensitivity and correctness. It is calculated according to:

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

A high F1 score indicated that the model accurately captured true delay events (TP) while also minimizing the false positives (FP).

MCC considers all elements of the confusion matrix (TP, TN, FP, and FN) and is defined as:

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

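MCC is considered one of the most reliable metrics for evaluating binary classification because it yields a high score only when the classifier performs well on both the positive and the negative class. This holds for balanced classes, as used here, and is particularly valuable when the classes are imbalanced.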

AUC‐ROC assesses the model's discriminatory power independent of a specific decision threshold. This metric evaluates the true positive rate (TPR) against the false positive rate (FPR) across all thresholds and is defined as:

$$\text{AUC} = \int_{0}^{1} \text{TPR}(\text{FPR})\, d\text{FPR}$$

An AUC of 1 indicates perfect separation between classes, while 0.5 indicates random guessing.

Accuracy is a common metric and is defined as the ratio of correctly classified instances to the total number of predictions:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
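The count‐based metrics in this section can be computed directly from the four confusion‐matrix entries. The sketch below uses only the Python standard library and the standard form of the MCC denominator. The function name and the example counts are hypothetical and are not results from this study, which used scikit‐learn. AUC‐ROC is omitted because it requires classifier scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Recall, precision, F1, MCC and accuracy from confusion-matrix counts.

    Raises ZeroDivisionError for degenerate inputs, e.g. no positive predictions.
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```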

4.10. Statistical Tests

For the radar plot shown in Figure 4B, performance metrics were normalized using min‐max scaling. For each metric, the minimum observed value across classifiers was mapped to 0 and the maximum observed value to 1, with all intermediate values linearly scaled within this range. This approach preserves relative differences between classifiers while enabling direct visual comparison across metrics with different units and dynamic ranges.
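The min‐max scaling described here amounts to a one‐line transformation per metric. The sketch below is a minimal illustration with hypothetical accuracy values for three classifiers. Mapping a metric on which all classifiers tie to 0 is an assumption, since the text does not cover that case.

```python
def min_max_scale(values):
    """Linearly map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical values carry no contrast, so map them all to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical accuracy values for three classifiers (MSD, SE, CNN order).
scaled_accuracy = min_max_scale([0.80, 0.90, 0.99])
```

After scaling, each axis of the radar plot spans the same 0 to 1 range, so the plot shows the ranking and relative spacing of the classifiers rather than their absolute scores.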

For the data presented in Figure 5E, measurements were normalized to the 0 nM control prior to model fitting. Specifically, the % delay measured at each miRNA concentration was baseline‐corrected by subtracting the % delay obtained for the corresponding 0 nM condition, such that all concentration‐response curves were expressed relative to the control baseline.

Dose‐response relationships were subsequently fitted using nonlinear least‐squares regression to a Hill equation, and half‐maximal effective concentrations (EC50, reported as Kd) were extracted from the fitted models. Curve fitting was performed without explicit weighting. Error bars shown in the figures reflect experimental variability across independent sequencing runs. Comparisons of fitted parameters between classifiers are descriptive in nature, and no formal statistical hypothesis testing was performed on EC50 values.
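A baseline correction followed by an unweighted Hill fit could look like the sketch below, which uses SciPy's `curve_fit`. The three‐parameter form of the Hill equation (maximum response, half‐maximal concentration, Hill coefficient), the starting values, and the parameter bounds are assumptions, because the text does not state which parameterization was used. The function names are hypothetical and the data in the usage example are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(c, top, kd, n):
    """Hill equation rising from 0 to `top`, half-maximal at c = kd."""
    c = np.asarray(c, dtype=float)
    return top * c**n / (kd**n + c**n)


def fit_hill(conc_nM, pct_delay):
    """Baseline-correct to the 0 nM control, then fit without weighting."""
    conc = np.asarray(conc_nM, dtype=float)
    y = np.asarray(pct_delay, dtype=float)
    y = y - y[conc == 0].mean()                    # subtract the 0 nM % delay
    p0 = [y.max(), np.median(conc[conc > 0]), 1.0]  # assumed starting values
    bounds = ([0.0, 1e-6, 0.1], [np.inf, np.inf, 10.0])
    popt, _ = curve_fit(hill, conc, y, p0=p0, bounds=bounds)
    return dict(zip(("top", "kd", "n"), popt))


# Synthetic, noise-free example: 4 % baseline plus a Hill response.
conc = [0, 0.1, 0.3, 1, 3, 10, 30, 100]
pct_delay = 4.0 + hill(conc, top=60.0, kd=5.0, n=1.2)
fitted = fit_hill(conc, pct_delay)
```

Omitting the `sigma` argument of `curve_fit` gives every point equal weight, which corresponds to fitting without explicit weighting.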

For the concentration‐response curves in Figure 5E, outliers were identified prior to fitting using the median absolute deviation (MAD) criterion. Measurements deviating from the median by more than three times the MAD were excluded from downstream analysis. Mean values were calculated from the remaining replicates, and variability is reported as the standard error of the mean (SEM), as indicated in the figure legend.
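The outlier rule and the summary statistics can be written with the Python standard library alone. The sketch below assumes the unscaled MAD (no normal‐consistency factor) and keeps all replicates when the MAD is zero. Both choices are assumptions, as are the function names and the example values.

```python
from math import sqrt
from statistics import mean, median, stdev


def mad_filter(values, n_mads=3.0):
    """Drop values lying more than n_mads * MAD away from the median."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: with zero MAD the criterion is undefined, so keep everything.
        return values
    return [v for v in values if abs(v - med) <= n_mads * mad]


def mean_and_sem(values):
    """Mean and standard error of the mean (needs at least two values)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical % delay replicates; 50.0 is excluded as an outlier.
kept = mad_filter([10.0, 11.0, 12.0, 50.0])
avg, sem = mean_and_sem(kept)
```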

All statistical analyses and curve fitting were performed using Python.

4.11. RT‐qPCR

Input RNA was prepared using 1–10 ng of total RNA per 15 µL RT reaction. The components of the RT kit (TaqMan MicroRNA Reverse Transcription Kit, 4366596, Thermo Fisher Scientific, USA) were thawed and kept on ice. The RT primer (TaqMan MicroRNA Assay, 002174, Thermo Fisher Scientific, USA) was thawed on ice, vortexed briefly, and then centrifuged to collect the contents at the bottom of the tube. The RT reaction mix was prepared according to the manufacturer's instructions, centrifuged, and placed on ice. Subsequently, 7 µL of RT reaction mix was combined with 5 µL of total RNA in a reaction tube (MicroAmp Fast 8‐Tube Strip, 0.1 mL, 4358293, Thermo Fisher Scientific, USA). The tube contents were mixed and centrifuged. Finally, 3 µL of 5× RT primer (TaqMan MicroRNA Assay, 002174, Thermo Fisher Scientific, USA) was added to the reaction tube, which was then centrifuged and placed on ice before reverse transcription. To perform the RT reaction, the reaction tubes were placed in a PCR machine (StepOnePlus™, Thermo Fisher Scientific, USA) and incubated using standard cycling conditions, a reaction volume of 15 µL, and the manufacturer's settings.

To prepare the PCR reaction mix, the PCR Master Mix (TaqMan Fast Advanced Master Mix, no UNG, A44359, Thermo Fisher Scientific, USA) was thawed and mixed thoroughly. The PCR reaction mix was then prepared according to the manufacturer's instructions (1 µL TaqMan Small RNA Assay (20×), 10 µL PCR Master Mix, and 4 µL nuclease‐free water (NFW)). A 15 µL aliquot of PCR reaction mix was transferred into each well of an optical reaction plate (MicroAmp Fast 96‐Well Reaction Plate, 4346907, Applied Biosystems, USA), and 5 µL of cDNA template or NFW (control) was added to each well. The plate was sealed with optical adhesive film and centrifuged briefly to bring the contents to the bottom of the wells. RT‐qPCR was run according to the manufacturer's instructions. Experiments were analyzed using StepOnePlus software (Thermo Fisher Scientific, USA).

Funding

A.P.I. and J.B.E. acknowledge support from the Biotechnology and Biological Sciences Research Council (BBSRC) [grant BB/R022429/1], the Engineering and Physical Sciences Research Council (EPSRC) [grant EP/V049070/1], and the Analytical Chemistry Trust Fund [grant 600322/05]. This project has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program [grant agreements No. 724300 and 875525]. N.G. is the recipient of the CRI/Esther M. Baird Technology Impact Award [grant CRI5416]. C.K., A.P.I., and J.B.E. also acknowledge support from CRI5416. C.K. further acknowledges an EPSRC Doctoral Prize Fellowship and the Seedcorn Award by the Rosetrees Trust and Sepsis Research FEAT. S.S. is supported by a studentship funded by Oxford Nanopore Technologies and the Institute of Chemical Biology at Imperial College London. V.M.N. received funding from the British Heart Foundation (BHF), grant number FS/4yPhD/F/24/34213. V.M.N. is supported by a British Heart Foundation (BHF) and National Heart & Lung Institute (NHLI) studentship. V.M.N. and J.G. acknowledge support by the Cellular Mechanosensing and Functional Microscopy Centre of Excellence at Imperial College London.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Supporting File: smtd70480‐sup‐0001‐SuppMat.pdf.

SMTD-10-e02335-s001.pdf (11.7MB, pdf)

Acknowledgements

All flow cells were provided by Oxford Nanopore Technologies.

Contributor Information

Joshua B. Edel, Email: joshua.edel@imperial.ac.uk.

Aleksandar P. Ivanov, Email: alex.ivanov@imperial.ac.uk.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • 1. Xue L., Yamazaki H., Ren R., Wanunu M., Ivanov A. P., and Edel J. B., “Solid‐state Nanopore Sensors,” Nature Reviews Materials 5 (2020): 931–951, 10.1038/s41578-020-0229-6.
  • 2. Miles B. N., Ivanov A. P., Wilson K. A., Dogan F., Japrung D., and Edel J. B., “Single‐Molecule Sensing With Solid‐State Nanopores: Novel Materials, Methods, and Applications,” Chemical Society Reviews 42 (2012): 15–28.
  • 3. Lim F. S., González‐Cabrera J., Keilwagen J., Kleespies R. G., Jehle J. A., and Wennmann J. T., “Advancing Pathogen Surveillance by Nanopore Sequencing and Genotype Characterization of Acheta Domesticus Densovirus in Mass‐Reared House Crickets,” Scientific Reports 14 (2024): 8525.
  • 4. Robles‐Martín A., Amigot‐Sánchez R., Fernandez‐Lopez L., et al., “Sub‐Micro‐ And Nano‐Sized Polyethylene Terephthalate Deconstruction With Engineered Protein Nanopores,” Nature Catalysis 6 (2023): 1174–1185.
  • 5. Wang J., Li M., Zhang C., et al., “Identification of Isomerically Diverse Ginsenosides Using Engineered Aerolysin Nanopore via Non‐Translocation Blockade Sensing,” Angewandte Chemie 137 (2025): 202506741.
  • 6. Imburgia C., Organick L., Zhang K., et al., “Random Access and Semantic Search in DNA Data Storage Enabled by Cas9 and Machine‐guided Design,” Nature Communications 16 (2025): 6388, 10.1038/s41467-025-61264-5.
  • 7. Lopez R., Chen Y.‐J., Dumas Ang S., et al., “DNA Assembly for Nanopore Data Storage Readout,” Nature Communications 10 (2019): 2933, 10.1038/s41467-019-10978-4.
  • 8. Ying Y.‐L., Hu Z.‐L., Zhang S., et al., “Nanopore‐based Technologies Beyond DNA Sequencing,” Nature Nanotechnology 17 (2022): 1136–1142, 10.1038/s41565-022-01193-2.
  • 9. Cai S., Sze J. Y. Y., Ivanov A. P., and Edel J. B., “Small Molecule Electro‐optical Binding Assay Using Nanopores,” Nature Communications 10 (2019): 1797, 10.1038/s41467-019-09476-4.
  • 10. Cai S., Pataillot‐Meakin T., Ladame S., et al., “Single‐molecule Amplification‐free Multiplexed Detection of Circulating microRNA Cancer Biomarkers From Serum,” Nature Communications 12 (2021): 3515.
  • 11. Zhao X., Zhang Y., and Qing G., “Nanopore Toward Genuine Single‐Molecule Sensing: Molecular Ping‐Pong Technology,” Nano Letters 25 (2025): 3692–3706.
  • 12. Sze J. Y. Y., Ivanov A. P., Cass A. E. G., and Edel J. B., “Single Molecule Multiplexed Nanopore Protein Screening in Human Serum Using Aptamer Modified DNA Carriers,” Nature Communications 8 (2017): 1552, 10.1038/s41467-017-01584-3.
  • 13. Zhang X., Galenkamp N. S., Van Der Heide N. J., Moreno J., Maglia G., and Kjems J., “Specific Detection of Proteins by a Nanobody‐Functionalized Nanopore Sensor,” ACS Nano 17 (2023): 9167–9177, 10.1021/acsnano.2c12733.
  • 14. Raveendran M., Lee A. J., Sharma R., Wälti C., and Actis P., “Rational Design of DNA Nanostructures for Single Molecule Biosensing,” Nature Communications 11 (2020): 4384, 10.1038/s41467-020-18132-1.
  • 15. Bell N. A. W. and Keyser U. F., “Digitally Encoded DNA Nanostructures for Multiplexed, Single‐molecule Protein Sensing With Nanopores,” Nature Nanotechnology 11 (2016): 645–651, 10.1038/nnano.2016.50.
  • 16. Ren R., Cai S., Fang X., et al., “Multiplexed Detection of Viral Antigen and RNA Using Nanopore Sensing and Encoded Molecular Probes,” Nature Communications 14 (2023): 7362, 10.1038/s41467-023-43004-9.
  • 17. Taniguchi M., Minami S., Ono C., et al., “Combining Machine Learning and Nanopore Construction Creates an Artificial Intelligence Nanopore for Coronavirus Detection,” Nature Communications 12 (2021): 3726, 10.1038/s41467-021-24001-2.
  • 18. Koch C., Reilly‐O'Donnell B., Gutierrez R., et al., “Nanopore Sequencing of DNA‐Barcoded Probes for Highly Multiplexed Detection of MicroRNA, Proteins and Small Biomarkers,” Nature Nanotechnology 18 (2023): 1483–1491, 10.1038/s41565-023-01479-z.
  • 19. Shannon C. E., “A Mathematical Theory of Communication,” Bell System Technical Journal 27 (1948): 379–423, 10.1002/j.1538-7305.1948.tb01338.x.
  • 20. Selvaraju R. R., Cogswell M., Das A., Vedantam R., Parikh D., and Batra D., “Grad‐CAM: Visual Explanations From Deep Networks via Gradient‐Based Localization,” paper presented at IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22–29, 2017, 10.1109/ICCV.2017.74.
  • 21. Chattopadhay A., Sarkar A., Howlader P., and Balasubramanian V. N., “Grad‐CAM++: Generalized Gradient‐Based Visual Explanations for Deep Convolutional Networks,” paper presented at IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, March 12–15, 2018, 10.1109/WACV.2018.00097.
  • 22. LeCun Y., Bottou L., Bengio Y., and Haffner P., “Gradient‐Based Learning Applied to Document Recognition,” Proceedings of the IEEE 86 (1998): 2278–2324, 10.1109/5.726791.

Articles from Small Methods are provided here courtesy of Wiley
