Stoichiometry-preserving and stochasticity-aware identification of m6A from direct RNA sequencing

Fangyuan Wang; Menglu Chen; Jinyi Li; Yang Yu; Zhenxing Guo; Meng Zou

doi:10.1093/bib/bbag132

2026 Apr 1;27(2):bbag132.

PMCID: PMC13043019 PMID: 41921196

Abstract

N6-methyladenosine (m6A) is the most prevalent internal modification in mRNA and plays a critical role in post-transcriptional regulation. Despite the development of various detection methods, accurate and quantitative detection of m6A modifications at single-molecule and single-nucleotide resolution remains challenging. Many existing approaches struggle with limited resolution, inaccurate quantification, or dependence on sequence motifs. Here, we present m6Astorm, a novel computational framework for stoichiometry-preserving and stochasticity-aware identification of m6A. m6Astorm encodes signal features (signal intensity and maximum instantaneous amplitudes derived from the raw signal) and sequence context via a hybrid architecture built from convolutional neural networks and bidirectional long short-term memory networks. Trained with quantitative labels from GLORI, m6Astorm achieves motif-independent detection of m6A modifications at single-molecule resolution through a dual-objective optimization: (i) minimizing a binary cross-entropy loss for methylation state classification at the molecule level, regularized by a confidence-aware penalty term that suppresses low-certainty predictions; and (ii) minimizing the stoichiometry bias for accurate quantification at the nucleotide level. m6Astorm resolves co-methylation events at single-molecule resolution, revealing coordination in m6A regulatory patterning across transcriptomes. Systematic evaluation across HeLa and mouse embryonic stem cell datasets demonstrates robust cross-sample generalizability, evidenced by high detection power (Recall), a low false positive rate, accurate stoichiometry estimates, and high area under the receiver operating characteristic curve and area under the precision–recall curve in transcriptome-wide modification profiling.

Keywords: m6A modification, m6A stoichiometric, stoichiometry-preserving and stochasticity-aware, dual-objective optimization, CNN–BiLSTM

Introduction

RNA modifications, also known as epitranscriptomic modifications, are chemical alterations that occur on RNA molecules. To date, over 170 distinct types of modifications have been identified across various RNA species [1]. Among them, N6-methyladenosine (m6A) is the most prevalent and extensively studied internal mRNA modification, typically enriched within specific DRACH motifs (D = A/G/U, R = A/G, and H = A/C/U) [2]. m6A modification can dynamically regulate biological processes, such as RNA stability [3, 4], splicing [5], and translation [6], playing a crucial role in the regulation of gene expression. Growing evidence reveals that the dysregulation of m6A is closely associated with a variety of cancers and other diseases [7, 8], highlighting the importance of accurate detection and characterization of m6A modification for understanding its functional roles in pathophysiological processes.
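As a minimal sketch of how the DRACH definition above translates into a sequence scan, the following snippet locates the central adenosine of every DRACH 5-mer. It is an illustration only and is not part of m6Astorm, which is motif-independent; the function name `find_drach_sites` is hypothetical.

```python
import re

# DRACH in RNA letters: D = A/G/U, R = A/G, H = A/C/U; the central A is the candidate m6A.
# A lookahead is used so that overlapping motif instances are all reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(seq: str) -> list[int]:
    """Return 0-based positions of the central adenosine of every DRACH 5-mer."""
    rna = seq.upper().replace("T", "U")  # also accept sequences written with T
    return [m.start() + 2 for m in DRACH.finditer(rna)]


print(find_drach_sites("AAACAGGACT"))  # [2, 7]
```

The two hits correspond to the 5-mers AAACA and GGACU; motifs written with T, such as GGACT, are handled by the T-to-U conversion.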

A variety of sequencing methods have been developed for m6A profiling, which can be broadly classified into two main categories. The first category relies on antibody enrichment or chemical derivatization to identify modified sites. For instance, immunoprecipitation-based techniques, such as MeRIP-seq [9], miCLIP [10], and m6ACE-seq [11], leverage the binding specificity of antibodies against methylated nucleotides to localize m6A modifications. However, these non-stoichiometric approaches treat m6A modifications as discrete present/absent events, leading to a loss of quantitative resolution, obscuring biologically meaningful variation, and masking graded regulatory dynamics [12, 13]. More recently, GLORI [2] employs a selective deamination reaction to enable absolute quantification of m6A at single-nucleotide resolution. In this study, we integrate the quantitative information from GLORI to enhance the computational accuracy and resolution of m6A identification. However, next-generation sequencing (NGS)-based approaches generally require reverse transcription and amplification steps, which limit their single-molecule resolution and introduce systematic biases, especially in complex biological samples, thereby constraining their applicability in high-resolution dynamic RNA modification analyses.

In contrast, Oxford Nanopore’s direct RNA sequencing (dRNA-seq) bypasses reverse transcription and amplification, directly detecting native RNA molecules and capturing current signal changes caused by RNA modifications [14]. This technology enables unbiased, single-molecule detection of RNA modifications, which has spurred the development of computational tools to identify m6A. Early methods like EpiNano [15] and ELIGOS [16] employ comparative strategies between wild-type and modification-depleted samples and infer modification sites based on base-calling errors. Although foundational, these methods suffer from limited accuracy and low resolution. Recent supervised learning approaches, such as m6Anet [17], mAFiA [18], and CHEUI [19], demonstrate significant improvements. m6Anet uses a multiple-instance learning framework trained on biological data, but its low read-level threshold (0.0333) makes the inference less reliable in real applications. mAFiA is trained on synthetic oligonucleotides with predefined methylation states for adenosine within six known motifs. Although the training data attempt to mimic in vivo-derived sequences, the restricted set of motifs constrains the overall sequence diversity, and the method thus still suffers from high false positives. CHEUI employs a two-stage neural network to detect both m6A and m5C. Its thresholds reduce false positives but lead to low sensitivity.

To address the limitations of current RNA modification detection methods, we propose m6Astorm, a deep learning framework that enables stoichiometry-preserving and stochasticity-aware identification of m6A modifications at single-molecule and single-nucleotide resolution. m6Astorm is trained on matched dRNA-seq and GLORI data from HEK293T cells, using a dual-objective strategy that jointly optimizes the binary cross-entropy loss for classifying methylation status at the single-molecule level, regularized by a confidence-aware penalty term that suppresses low-certainty predictions, and the stoichiometry bias for accurate quantification at the single-nucleotide level. m6Astorm can therefore achieve motif-independent detection of m6A modifications at single-molecule resolution and accurate m6A stoichiometry at single-nucleotide resolution. Benchmarking on HEK293T shows that m6Astorm achieves high accuracy in identifying m6A at the single-molecule level, together with a low false positive rate (FPR), high Recall, and accurate stoichiometry at the single-nucleotide level. Furthermore, m6Astorm demonstrates robustness in cross-cell-line generalization, validated through orthogonal benchmarking in HeLa and mouse embryonic stem cells (mESC) with high predictive concordance.

Results

m6Astorm identifies m6A modification through a dual-objective strategy

m6Astorm, a stoichiometry-preserving and stochasticity-aware computational model, enables identification of m6A modification at single-molecule and single-base resolution through a dual-objective strategy (Fig. 1a). Raw nanopore signals are preprocessed using a Guppy-minimap2 [20]-Nanopolish [21] pipeline for feature extraction, yielding signal intensity features (mean, standard deviation, length, maximum, minimum, and median) and maximum instantaneous amplitudes (MIAs) derived from intrinsic mode functions (IMFs), which are generated by empirical mode decomposition (EMD) of the raw signal. These signal features and sequence context are encoded using a hybrid CNN–BiLSTM neural architecture, followed by a fully connected layer that generates a modification probability for each molecule at each candidate site. To achieve high-precision training, m6Astorm jointly optimizes two complementary objectives: (i) minimizing a binary cross-entropy loss for classifying methylation states (methylated/unmethylated) at the single-molecule level, regularized by a distribution penalty that constrains intermediate prediction probabilities; and (ii) minimizing the mean squared error between predicted stoichiometries and the GLORI-derived ground truth for accurate stoichiometry at the single-nucleotide level. m6Astorm is implemented in Python and available from GitHub (https://github.com/HUSTzoulab/m6Astorm).
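The dual-objective strategy can be illustrated with a small, framework-free sketch for a single site. Several choices here are assumptions made for illustration and are not taken from the m6Astorm implementation: the penalty is written as p(1 - p), which is largest at p = 0.5 and vanishes at 0 and 1; the predicted stoichiometry is taken as the mean read probability; and `penalty_weight` and `mse_weight` are hypothetical weighting parameters.

```python
import math


def dual_objective_loss(read_probs, read_labels, site_stoich,
                        penalty_weight=0.1, mse_weight=1.0):
    """Loss for ONE site: read-level BCE + confidence penalty + site-level squared error.

    read_probs  : predicted modification probability of each read
    read_labels : 0/1 read-level labels
    site_stoich : reference (e.g. GLORI) stoichiometry of the site
    """
    eps = 1e-7
    n = len(read_probs)
    clipped = [min(max(p, eps), 1 - eps) for p in read_probs]

    # (i) binary cross-entropy over reads
    bce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(clipped, read_labels)) / n
    # Assumed penalty form: p * (1 - p) discourages probabilities near 0.5
    penalty = sum(p * (1 - p) for p in read_probs) / n

    # (ii) squared error between predicted and reference stoichiometry
    # (assumption: predicted stoichiometry = mean read probability)
    pred_stoich = sum(read_probs) / n
    mse = (pred_stoich - site_stoich) ** 2

    return bce + penalty_weight * penalty + mse_weight * mse


confident = dual_objective_loss([0.99, 0.98, 0.02, 0.01], [1, 1, 0, 0], 0.5)
ambiguous = dual_objective_loss([0.60, 0.55, 0.45, 0.40], [1, 1, 0, 0], 0.5)
print(confident < ambiguous)  # True
```

Both toy predictions imply the same site stoichiometry (0.5), so the difference in loss comes entirely from the read-level terms, which is the behaviour the confidence-aware regularization is meant to encourage.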

Figure 1. Structure and input features of the m6Astorm model. (a) Raw nanopore signals are processed via the Guppy-minimap2-Nanopolish pipeline to extract features, including signal intensity (mean, standard deviation, length, maximum, minimum, and median) and MIA features derived from EMD of the raw signal. These signal features and sequence context are jointly encoded by a hybrid convolutional neural network–bidirectional long short-term memory (CNN–BiLSTM) model to generate the methylation probability for each read. A dual-loss strategy is adopted: (i) read-level classification to identify modification status, with a penalty to minimize prediction uncertainty, and (ii) site-level stoichiometric regression for quantitative accuracy. (b) The distribution of MIAs discriminates between m6A-modified and unmodified reads at GGACT loci. Boxplots show the median and interquartile range (IQR) per group, with outliers hidden. The y-axis shows feature values after RobustScaler normalization.
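The six signal-intensity statistics and the robust normalization mentioned above can be sketched with the Python standard library. This is an illustrative approximation: the helper names `event_features` and `robust_scale` are hypothetical, the input is assumed to be the raw current samples assigned to one position of one read, and the scaling mimics a median/IQR (RobustScaler-style) transform rather than reproducing the exact preprocessing of m6Astorm.

```python
import statistics


def event_features(samples):
    """Six summary statistics of the current samples assigned to one position."""
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "length": len(samples),
        "max": max(samples),
        "min": min(samples),
        "median": statistics.median(samples),
    }


def robust_scale(values):
    """Center by the median and scale by the interquartile range (IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero IQR
    med = statistics.median(values)
    return [(v - med) / iqr for v in values]


print(event_features([80.0, 82.0, 81.0, 85.0])["median"])  # 81.5
print(robust_scale([1, 2, 3, 4, 5]))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Median/IQR scaling is less sensitive to outlying current values than mean/standard-deviation scaling, which is one plausible reason for preferring it on nanopore signal features.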

To validate the discriminatory power of MIAs, we assess their distributional differences between modified and unmodified reads. Statistical analysis reveals significant differences (Mann–Whitney U test; Fig. 1b), confirming their efficacy in classifying methylation states. This discriminative capacity contributes critically to m6Astorm’s accurate identification and stoichiometric quantification of m6A modifications in diverse biological samples.

m6Astorm enables accurate identification and quantification of m6A modifications

m6Astorm is first evaluated on the HEK293T cell line. To prevent data leakage, genomic regions that meet the minimum coverage requirement are partitioned into independent training (80%) and testing (20%) sets. For benchmarking, we compare m6Astorm against m6Anet, mAFiA, CHEUI, TandemMod [22], and m6Aiso [23] on an independent HEK293T test set (5255 sites). GLORI data are used as the ground truth: only sites whose GLORI stoichiometry reaches the positive threshold are treated as positives (GLORI-positive), and all other sites are treated as negatives (GLORI-negative). Performance is assessed using four metrics: (i) the read-level area under the receiver operating characteristic curve (AUROC) and area under the precision–recall curve (AUPR); (ii) the site-level Pearson correlation coefficient (PCC) between predicted stoichiometries and the GLORI-derived ground truth; (iii) the Recall of detecting methylation among GLORI-positive sites; and (iv) the FPR among GLORI-negative sites.
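The three site-level metrics can be written down compactly. The sketch below is illustrative and rests on stated assumptions: site stoichiometry is taken as the fraction of reads whose probability reaches `read_cutoff`, and `read_cutoff` and `site_cutoff` are placeholder values chosen for the example, not the thresholds used in this study.

```python
import math


def site_stoichiometry(read_probs, read_cutoff=0.5):
    """Fraction of reads called modified at one site (read_cutoff is an assumed value)."""
    return sum(p >= read_cutoff for p in read_probs) / len(read_probs)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def site_metrics(pred, truth, site_cutoff=0.1):
    """PCC, Recall and FPR from per-site predicted and reference stoichiometries.

    Assumes at least one reference-positive and one reference-negative site.
    """
    pos = [p for p, t in zip(pred, truth) if t >= site_cutoff]  # reference positives
    neg = [p for p, t in zip(pred, truth) if t < site_cutoff]   # reference negatives
    recall = sum(p >= site_cutoff for p in pos) / len(pos)
    fpr = sum(p >= site_cutoff for p in neg) / len(neg)
    return {"PCC": pearson(pred, truth), "Recall": recall, "FPR": fpr}


pred = [0.80, 0.05, 0.30, 0.00, 0.20]
truth = [0.90, 0.40, 0.25, 0.00, 0.05]
print(site_metrics(pred, truth))
```

In this toy example two of the three reference-positive sites are recovered (Recall = 2/3) and one of the two reference-negative sites is called positive (FPR = 1/2).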

To quantify m6Astorm’s site-level performance, we compare three key evaluation metrics (PCC, Recall, and FPR) across methods. m6Astorm achieves a PCC of 0.830 (Recall 91%, FPR 3.41%), compared with 0.801 (Recall 67%, FPR 37.59%) for m6Anet, 0.738 (Recall 54%, FPR 48.62%) for CHEUI, and 0.846 (Recall 59%, FPR 61.59%) for mAFiA (Fig. 2b). Additionally, TandemMod achieves a PCC of 0.733 with a Recall of 92% and an FPR of 55.68%, while m6Aiso shows a PCC of 0.569 with a Recall of 21% and an FPR of 11.09%. m6Astorm thus outperforms m6Anet, CHEUI, and m6Aiso on all three evaluation metrics. Although the PCC of m6Astorm is slightly lower than that of mAFiA, the advantage of mAFiA is conditional on its motif-dependent training process. mAFiA trains separate models for the top six enriched motifs (AGACT, GAACT, GGACA, GGACC, GGACT, and TGACT) and only predicts methylation for the subset of sites containing these motifs. When the analysis of m6Astorm is restricted to the same subset, its PCC rises from 0.830 to 0.860, surpassing that of mAFiA (Fig. 2a). In addition, the markedly higher Recall and substantially lower FPR of m6Astorm relative to mAFiA further underscore its robustness in identifying high-confidence modification sites. Collectively, these results show that m6Astorm offers the best overall balance of stoichiometric accuracy, Recall, and false positive control among the compared methods, with only TandemMod matching its Recall (92% versus 91%) at a far higher FPR.

Figure 2. Performance evaluation of m6Astorm on the HEK293T test set. (a) Correlation between the stoichiometry obtained by m6Astorm and that from GLORI. The left panel shows the overall PCC (PCC = 0.830, Recall = 91%); the right panel restricts the PCC analysis to the top six m6A motifs (AGACT, GAACT, GGACA, GGACC, GGACT, and TGACT) (PCC = 0.860, Recall = 66%). (b) Performance benchmarking of m6Astorm against baseline methods (m6Anet, CHEUI, mAFiA, TandemMod, and m6Aiso) at the site level using an independent HEK293T dataset, with evaluation metrics Recall (detection sensitivity), PCC (stoichiometric accuracy), and 1-FPR. (c) Molecule-level classification performance for m6Astorm, m6Anet, CHEUI, TandemMod, and m6Aiso: ROC and PR curves for read-level methylation state predictions in HEK293T test data. m6Astorm achieves the highest area under the receiver operating characteristic curve (AUROC) and area under the precision–recall curve (AUPR). (d) Distribution of predicted read-level methylation probabilities for all GLORI-positive sites in the HEK293T test set. m6Astorm and TandemMod show a clear bimodal distribution with modes near 0 and 1, indicating confident separation of modified and unmodified reads. In contrast, m6Anet, CHEUI, and m6Aiso exhibit broader unimodal distributions. (e) Case studies of predicted probability distributions at two representative sites. The upper panel illustrates Chr1: 206732855, a high-confidence site, where m6Astorm predictions form a sharply separated bimodal pattern. The lower panel shows Chr11: 44244629, a site with a moderate modification level, revealing a shift toward intermediate probabilities across methods.

To quantify m6Astorm’s single-molecule classification performance, we conduct receiver operating characteristic (ROC) and precision–recall (PR) curve analyses, computing the corresponding area under the curve (AUC) metrics. Comparative assessment reveals m6Astorm’s superior predictive capability, achieving an AUROC of 0.992 and an AUPR of 0.992 (Fig. 2c). This significantly surpasses the performance of existing tools (m6Anet: AUROC 0.937, AUPR 0.970; CHEUI: AUROC 0.491, AUPR 0.513; TandemMod: AUROC 0.855, AUPR 0.812; m6Aiso: AUROC 0.845, AUPR 0.788), demonstrating enhanced robustness in discriminating methylation states at the individual read level. Notably, mAFiA is excluded from this evaluation as its model architecture fundamentally lacks single-molecule resolution outputs, precluding direct comparison.
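For readers less familiar with the read-level metric, AUROC can be computed directly from its rank interpretation: the probability that a randomly chosen modified read receives a higher score than a randomly chosen unmodified read, with ties counted as one half. The sketch below is a plain quadratic-time illustration of that definition (the function name `auroc` is ours), not the evaluation code used in this study.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0  (perfect ranking)
print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
print(auroc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5  (uninformative scores)
```

Under this definition a value of 0.5 corresponds to chance-level ranking, which puts AUROC values close to 0.5 in the comparison above into context.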

To assess the accuracy of the sites reported by m6Astorm, we analyze the overlap among high-confidence m6A sites (coverage cutoff of 20 and stoichiometry cutoff of 0.5) identified by different tools. Venn diagram analysis reveals substantial concordance between m6Astorm, m6Anet, and GLORI, indicating robust cross-validation of shared modified loci (Supplementary Fig. 1a). Furthermore, we characterize the transcriptome-wide distribution of these sites using metagene profiling. Consistent with GLORI annotations, m6Astorm predictions show a characteristic enrichment at the 3′ UTR, aligning with previously reported distribution patterns (Supplementary Fig. 1b).

To systematically evaluate the contribution of each major component in m6Astorm, we performed a series of ablation experiments on HEK293T data comparing multiple architectural variants (Supplementary Table 1). Removing EMD features reduced recall by approximately 4.3%, indicating their important role in detection sensitivity. Replacing the bidirectional LSTM with a unidirectional version decreased both PCC and recall, highlighting the importance of bidirectional context. Simplified architectures (pure CNN or pure BiLSTM) led to further performance loss, confirming the complementary functions of local feature extraction and long-range dependency modeling. Omitting the penalty loss lowered AUROC and recall, supporting its contribution to model robustness. A CNN–Transformer variant did not outperform the selected architecture, suggesting that added complexity does not guarantee improved performance in this context. These results collectively validate the integrated design of m6Astorm for accurate m6A detection in direct RNA nanopore data.

To systematically evaluate the potential impact of false-positive signals in non-DRACH regions of the GLORI dataset, we retrained m6Astorm exclusively on DRACH motif-containing sites from the HEK293T cell line. The retrained model achieved strong predictive performance on a held-out test set (PCC = 0.841, Recall = 93%, and FPR = 12.98%), comparable with that of the original model, and continued to outperform established methods (CHEUI: PCC = 0.75, Recall = 54%, FPR = 44.34%; mAFiA: PCC = 0.846, Recall = 67%, FPR = 60.61%; m6Anet: PCC = 0.801, Recall = 77%, FPR = 37.59%) (Supplementary Table 2). These results indicate that m6Astorm’s detection capability for non-DRACH sites is not driven by false-positive training signals and demonstrate the robustness of its predictive framework.

To further evaluate whether m6Astorm’s predictions capture biologically meaningful signals rather than GLORI-specific technical artifacts, we performed a cross-technology comparison with an independent miCLIP dataset. Remarkably, m6Astorm recovered approximately 90% of miCLIP-supported sites in the HEK293T cell line, indicating strong consistency with orthogonal experimental evidence despite fundamental methodological differences (Supplementary Fig. 2b).

The regularization strategy enables a desirable bimodal distribution of methylated probabilities

To further evaluate the effectiveness of the proposed regularization penalty in suppressing ambiguous predictions, we visualize the distribution of read-level prediction probabilities across all modified sites identified by GLORI and compare the results with those from m6Anet and CHEUI (Fig. 2d). m6Astorm yields a pronounced bimodal distribution, with the majority of reads having predicted probabilities close to either 0 or 1. This markedly lowers the number of predictions near 0.5, the region of greatest uncertainty, implying that the model makes more confident and discriminative judgments about individual read modifications. In contrast, m6Anet and CHEUI predictions are predominantly concentrated at values <0.4, with m6Anet showing a more pronounced peak near 0. Because these reads all come from GLORI-positive sites, this concentration at low probabilities indicates that both methods have limited power to distinguish modified signals from background noise. Similar distribution patterns are also observed at individual sites, such as Chr1: 206732855 and Chr11: 44244629, further highlighting the efficacy of the regularization strategy in enhancing the model’s discriminative capability and its sensitivity to read-level heterogeneity (Fig. 2e).

m6Astorm generalizes to new cell lines and species

To assess the generalization capability of m6Astorm, we apply the trained model directly to independent datasets from the human HeLa cell line and from mouse embryonic stem cells (mESC). For evaluation, only sites with a sequencing coverage of at least 20 reads are considered. In total, the HeLa and mESC datasets contain 10 366 and 24 551 GLORI-positive m6A sites, respectively. Predicted positive sites are defined as those whose inferred stoichiometry reaches the positive threshold. To estimate the FPR, 10 000 GLORI-negative sites are randomly sampled from each dataset and used as the negative set during evaluation.

In the HeLa cell line, m6Astorm still demonstrates the highest Recall of 87%, compared with m6Anet (64%), CHEUI (59%), and mAFiA (56%), while also attaining the lowest FPR of 3.03%, substantially outperforming m6Anet (34.69%), CHEUI (55.46%), and mAFiA (60.61%). The PCC from m6Astorm (0.782) is lower than mAFiA (0.809), but still higher than m6Anet (0.728) and CHEUI (0.662). Notably, m6Astorm predicts methylation across all input sites, thereby offering a broader transcriptome coverage than mAFiA. When restricted to the subset of sites enriched within the six motifs analyzed by mAFiA, the PCC of m6Astorm improves from 0.782 to 0.800, reaching a level comparable with that of mAFiA. Taken together, the results on HeLa cell line further validate that m6Astorm sustains both higher predictive accuracies and remarkably lower FPR compared with other approaches (Fig. 3a).

Figure 3. Evaluation of model generalization across cell lines. (a) Comparison of m6Astorm, m6Anet, CHEUI, and mAFiA at the site level across key metrics (Recall, PCC, and 1-FPR) in the HeLa cell line. (b) Comparison of m6Astorm, m6Anet, CHEUI, and mAFiA at the site level across key metrics (Recall, PCC, and 1-FPR) in the mESC cell line. (c) Comparison of m6Astorm, m6Anet, and CHEUI at the molecule level on AUROC and AUPR in the HeLa cell line. (d) Comparison of m6Astorm, m6Anet, and CHEUI at the molecule level on AUROC and AUPR in the mESC cell line.

For the mESC cell line, m6Astorm again shows the highest detection power (Recall for m6Astorm, 90%; m6Anet, 82%; CHEUI, 74%; mAFiA, 68%). A similar trend is observed for FPR across the four methods. For PCC, mAFiA again achieves the highest value, followed by m6Astorm, m6Anet, and CHEUI. Consistently, the PCC of m6Astorm improves from 0.765 to 0.786 when its analysis is limited to the subset of sites carrying the six motifs evaluated by mAFiA. Together, these results further reinforce the robustness and generalizability of m6Astorm across different species and cellular contexts (Fig. 3b).

Similar to the evaluation in HEK293T, we assess the single-molecule accuracy of m6Astorm in the HeLa and mESC cell lines, with read level labels generated using the same strategy based on GLORI annotations. In HeLa, m6Astorm achieves an AUROC of 0.959 and an AUPR of 0.997, outperforming m6Anet (AUROC = 0.893, AUPR = 0.905) and CHEUI (AUROC = 0.486, AUPR = 0.496) (Fig. 3c). In mESC, m6Astorm reaches an AUROC of 0.941 and an AUPR of 0.995, again surpassing m6Anet (AUROC = 0.905, AUPR = 0.910) and CHEUI (AUROC = 0.556, AUPR = 0.535) (Fig. 3d), highlighting its enhanced discriminative power at single-molecule resolution. mAFiA is excluded from this evaluation for the same reason as that in HEK293T.

Notably, k-mer analyses of highly modified sites (high stoichiometry as predicted by m6Astorm) reveal that HEK293T, HeLa, and mESC share consistent sequence motifs (Fig. 4d). This suggests that m6Astorm reliably captures conserved methylation patterns and further demonstrates its strong generalization capability across diverse cell types and species.
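The k-mer summary described above amounts to counting the 5-mer context of each highly modified site and normalizing by the number of sites. A minimal sketch, assuming the 5-mer contexts (modified base in the centre) have already been extracted; the function name `kmer_fractions` and the toy input are hypothetical:

```python
from collections import Counter


def kmer_fractions(contexts):
    """Fraction of sites carrying each 5-mer context, most frequent first."""
    counts = Counter(c.upper() for c in contexts)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.most_common()}


print(kmer_fractions(["GGACT", "GGACT", "GAACT", "ggact"]))
# {'GGACT': 0.75, 'GAACT': 0.25}
```

Computing these fractions separately for each cell line and comparing the resulting profiles is one straightforward way to check whether the enriched contexts are shared.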

Figure 4. Model interpretability and biological discoveries. (a) The left panel shows the distribution of maximum stoichiometric differences in m6A level among different transcript isoforms in HEK293T. The x-axis represents the maximum difference in modification level across all identified isoforms of an individual site, and the y-axis represents the number of sites. The right panel illustrates a significant stoichiometric difference at the Chr19: 58314912 site between two transcript variants. (b) Distribution of NSD in HeLa (left, 1018 site pairs) and mESC (right, 858 site pairs). Random Distribution refers to values drawn from a normal distribution (mean = 0, standard deviation equal to that of the real NSD) with the same sample size as the real NSD. (c) The left panel shows two distinct clusters of PCA-based low-dimensional embeddings derived from the BiLSTM layer’s output features, separating positive and negative reads (PC1, 86.9% variance; PC2, 6.3% variance). The right panel highlights embeddings specific to site Chr20: 31549308 (PC1, 87.7% variance; PC2, 8.1% variance). (d) k-mer analysis of high-modification sites reveals consistent motif enrichment features across the HEK293T, HeLa, and mESC cell lines. Bars show the fraction of high-modification sites that match each context 5-mer (positions around the putative modified base). (e) Co-modification of m6A within a single transcript variant. The upper panel illustrates two m6A modification sites (positions 648 and 704) on transcript ENST00000592623 of the RPS15 gene in the HEK293T cell line. Among the 308 reads covering both positions, 24 reads are modified at position 648, 21 reads at position 704, and 10 reads are simultaneously modified at both sites. This results in an expected co-occurrence of 1.11% and an observed co-occurrence of 3.25%. The lower panel shows two m6A sites (positions 2351 and 2522) on transcript ENST00000300738 of the RRM1 gene in the HeLa cell line. Among the 243 reads covering both positions, 17 are modified at position 2351, 42 at position 2522, and 7 are co-modified at both sites, resulting in an expected co-occurrence of 1.21% and an observed co-occurrence of 2.88%.

m6Astorm enables the quantification of isoform-specific variations in m6A stoichiometry

One of the major advantages of m6Astorm lies in its ability to distinguish m6A modification among the transcript isoforms of each gene. In contrast, current m6A detection methods, such as GLORI, typically provide only a weighted average of modification levels at a given genomic locus, rendering them incapable of resolving potential isoform-specific modification differences. To investigate this, we analyze m6A sites annotated by GLORI in HEK293T cells and find that, among 2704 gene loci examined, the difference in m6A stoichiometry between isoforms reaches up to 0.5 (Fig. 4a, left), with 30.6% of sites showing a difference exceeding 0.1, indicating that isoform-specific modification patterns are widespread. For instance, at locus chr19: 58314912 in AC020915.6 in HEK293T cells, two different transcript isoforms exhibit notable differences in m6A stoichiometry (Fig. 4a, right).
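Because every read carries both an isoform assignment and a modification call, isoform-specific stoichiometry reduces to a grouped average. The sketch below illustrates the maximum between-isoform difference per site; the input layout (one `(site, isoform, is_modified)` tuple per read), the function name, and the `min_reads` parameter are assumptions made for this example rather than the m6Astorm data format.

```python
from collections import defaultdict


def max_isoform_difference(read_calls, min_reads=1):
    """Return {site: largest stoichiometry difference between its isoforms}.

    read_calls: iterable of (site, isoform, is_modified), one entry per read.
    Sites covered by fewer than two isoforms are omitted.
    """
    tallies = defaultdict(lambda: [0, 0])  # (site, isoform) -> [modified, total]
    for site, isoform, is_modified in read_calls:
        tallies[(site, isoform)][0] += int(is_modified)
        tallies[(site, isoform)][1] += 1

    per_site = defaultdict(list)
    for (site, _isoform), (modified, total) in tallies.items():
        if total >= min_reads:
            per_site[site].append(modified / total)

    return {s: max(v) - min(v) for s, v in per_site.items() if len(v) >= 2}


reads = ([("site1", "isoA", True)] * 3 + [("site1", "isoA", False)]
         + [("site1", "isoB", True)] + [("site1", "isoB", False)] * 3
         + [("site2", "isoA", True)])
print(max_isoform_difference(reads))  # {'site1': 0.5}
```

In the toy input, isoform A is modified in 3 of 4 reads and isoform B in 1 of 4, giving a difference of 0.5; a locus-level average would report 0.5 for the site and hide this contrast.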

m6A modifications tend to co-occur on the same mRNA molecule

To investigate m6A co-modification patterns, we employ the normalized standard deviation (NSD) metric [24], where NSD > 0 indicates higher-than-expected co-occurrence (suggesting cooperative modification), while NSD < 0 implies spatial separation of modifications across transcripts.

To quantify this tendency, we analyze the NSD distribution of pairs of high-confidence m6A sites, after filtering on read coverage and on the expected number of co-occurring reads. The overall NSD is significantly skewed toward positive values (Mann–Whitney U test) relative to a random background distribution centered at zero (Fig. 4b; HeLa on the left, mESC on the right; HEK293T shown in Supplementary Fig. 1c). This finding suggests a notable tendency for m6A modifications to co-occur on the same transcript.

In HEK293T cells, RPS15 stands out as one of the genes with both high NSD and significant expression levels. Taking the two modification sites at positions 648 and 704 on its transcript ENST00000592623 as an example, the observed co-modification frequency (3.25%) is significantly higher than the expected frequency (1.11%), suggesting a cooperative enrichment of m6A modifications on this transcript (Fig. 4e, top). Previous studies have shown that RPS15 can promote the translation of core components in the MAPK signaling pathway by interacting with the m6A reader protein IGF2BP1, thereby driving the development of esophageal squamous cell carcinoma [25]. This highlights the critical role of RPS15 in m6A-mediated post-transcriptional regulation.

A similar trend is observed in HeLa cells. For the gene RRM1, its transcript ENST00000300738 harbors two m6A sites at positions 2351 and 2522, with an observed co-modification frequency of 2.88%, significantly exceeding the expected 1.21% (Fig. 4e, bottom). RRM1 encodes the nuclear m6A reader HNRNPA2B1, which recognizes and binds m6A-modified RNA through its RRM1 domain. Notably, previous in vitro studies have demonstrated that RRM1 is the only domain within HNRNPA2B1 showing a binding preference for m6A-containing RNA, and is essential for regulating pre-mRNA splicing and pre-miRNA processing [26]. This molecular mechanism potentially provides a structural and functional basis for the transcript-level co-modification patterns observed in HeLa cells.
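The expected and observed co-modification frequencies quoted for RPS15 and RRM1 can be reproduced from the read counts given in the Fig. 4e legend, assuming that "expected" means the product of the two per-site modification frequencies (independence). One interpretive assumption is needed and is inferred only from the arithmetic: for RRM1 the per-site counts (17 and 42) reproduce 1.21% when they include the co-modified reads, whereas for RPS15 the reported 1.11% is reproduced when 24 and 21 are read as reads modified at one site only, so the 10 co-modified reads are added back.

```python
def co_occurrence(n_reads, n_site1, n_site2, n_both):
    """Expected (under independence) and observed fraction of co-modified reads.

    n_site1 and n_site2 count every read modified at that site,
    including reads modified at both sites.
    """
    expected = (n_site1 / n_reads) * (n_site2 / n_reads)
    observed = n_both / n_reads
    return expected, observed


# RRM1, ENST00000300738 (HeLa): 243 reads, 17 and 42 modified, 7 co-modified
exp_rrm1, obs_rrm1 = co_occurrence(243, 17, 42, 7)

# RPS15, ENST00000592623 (HEK293T): 308 reads; 24 and 21 are treated here as
# single-site-only counts (an assumption), so the 10 co-modified reads are added
exp_rps15, obs_rps15 = co_occurrence(308, 24 + 10, 21 + 10, 10)

print(f"{exp_rrm1:.2%} {obs_rrm1:.2%} {exp_rps15:.2%} {obs_rps15:.2%}")
# 1.21% 2.88% 1.11% 3.25%
```

In both examples the observed frequency is roughly two to three times the expectation under independence, which is the pattern that a positive NSD is intended to capture.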

In comparison with the recently reported RMPore database [27], our co-modification analysis showed that 57% (674 of 1175) of overlapping site pairs exhibited consistent correlation directions despite differences in methodological frameworks (Supplementary Fig. 2f). This partial concordance suggests that co-modification patterns may be influenced by analytical choices, including site selection criteria and scoring strategies. These observations highlight the need for systematic benchmarking to establish robust analytical standards and enable meaningful cross-study comparisons in single-molecule RNA modification analysis.

Feature space analysis validates m6Astorm’s single-molecule discriminative power

To evaluate the ability of m6Astorm to predict m6A modifications at the single-molecule level, we perform a feature space analysis. High-confidence modified (high-stoichiometry) and unmodified (stoichiometry < 0.1) sites are chosen from the GLORI dataset. The feature representations of reads covering these sites are extracted from the output of the BiLSTM layer of m6Astorm, and principal component analysis (PCA) is applied to reduce their dimensionality to 2D. The left panel of Fig. 4c reveals two well-separated clusters corresponding to modified and unmodified reads, respectively, indicating that the combination of input features and deep learning architecture in m6Astorm can effectively distinguish between the two states.

As an example, we also examine individual sites by extracting the feature representations of all associated reads and applying PCA. As in the pooled analysis, the reads cluster into two distinct groups corresponding to the modified and unmodified states (Fig. 4c, right panel). These findings further demonstrate that m6Astorm learns highly discriminative features from single-molecule input, supporting its modeling of m6A modification status.

Stoichiometry-aware m6A analysis reveals quantitative regulatory effects

Stoichiometric quantification treats m6A as a continuous quantitative trait. This enables direct association between modification levels and functional outcomes such as gene expression and translation efficiency, improves sensitivity to condition-dependent stoichiometric shifts, and supports probabilistic modeling of single-molecule and co-modification patterns beyond binary classification. Using m6Astorm, we observe a consistent negative correlation between m6A stoichiometry and gene expression across 8996 genes (Supplementary Fig. 2a), along with a similar relationship with translation efficiency (Supplementary Fig. 2e), consistent with previous reports that m6A negatively regulates mRNA translation efficiency [2, 6]. In addition, stoichiometry-aware analysis enables the identification of coordinated co-modification patterns that are inaccessible to binary approaches (Fig. 4c), supporting a dose-dependent regulatory role of m6A.

Discussion

Despite the development of various detection methods, accurate and quantitative detection of m6A modifications at single-molecule and single-nucleotide resolution remains a major challenge. In this study, we use the single-nucleotide methylation stoichiometries provided by GLORI as training labels to enable accurate quantification of m6A modifications. In addition, our model optimizes the identification of m6A modification at the single-molecule level, regularized by a distribution penalty that constrains intermediate prediction probabilities, thereby improving the accuracy and robustness of prediction.

While m6Astorm shows excellent accuracy and detection strength, several limitations remain. Firstly, the model does not distinguish between different motif types, but instead treats all candidate sites uniformly. While this simplification reduces model complexity, it may limit the ability to detect non-canonical or context-specific modification events. Secondly, the model does not directly utilize raw electrical signals from nanopore sequencing. Instead, it relies on EMD to extract IMFs, using the MIAs and related features as inputs. While this approach captures signal perturbations associated with RNA modifications, it may overlook subtle yet biologically relevant patterns embedded in the raw signal. Additionally, our current framework focuses exclusively on m6A modifications and is not yet extended to other types of RNA modifications. Although the architecture is, in principle, adaptable, additional training or transfer learning based on appropriate labeled datasets will be required.

To assess cross-chemistry generalizability, the RNA002-trained model was applied without retraining to an independent RNA004 dataset from the HEK293T cell line. Among 21 356 GLORI-positive sites, 19 907 were also predicted as positive, corresponding to a recall of 93%. Predicted modification ratios showed a Pearson correlation coefficient (PCC) of 0.742 with GLORI measurements, the FPR was 5.45%, and the model achieved an AUROC of 0.942 at the read level. We acknowledge that the RNA002-trained model is not optimized for RNA004 chemistry, and retraining on RNA004 data is expected to further improve performance. Building on the current results, our next step will be to incorporate a model retrained on RNA004 sequencing data.

It is well established that different m6A profiling technologies exhibit limited overlap due to their distinct technical principles [28]. Although m6Astorm was trained on GLORI-derived stoichiometric labels, which may introduce platform-specific biases, we applied stringent filtering criteria, including consistency across replicates and a minimum coverage of 20, to enhance label reliability. Notably, high-confidence GLORI sites showed substantial overlap with those identified by eTAM-seq [29, 30] (Supplementary Fig. 2d), an orthogonal stoichiometric method, indicating the presence of technology-robust m6A signals. Nevertheless, reliance on a single profiling technology remains a limitation. Future efforts should integrate labels from multiple orthogonal approaches, such as GLORI, eTAM-seq, and antibody-based methods, to construct consensus training sets that better isolate core biological signals from platform-specific noise and improve model generalizability across experimental systems.

In future work, we aim to integrate raw signal representations and motif-specific features to enhance the model’s sensitivity to complex modification patterns. We also plan to expand the framework to support unified detection of multiple RNA modifications, facilitating the construction of a more comprehensive and quantitative transcriptome-wide modification map.

Materials and methods

Data preprocessing for m6Astorm

To train and evaluate m6Astorm, we collect direct RNA sequencing (dRNA-seq) datasets and matched GLORI modification profiles from three cell lines: human HEK293T, HeLa, and mESC. The raw fast5 data are processed with Guppy Basecalling Software v6.3.8+d9e0f64 (Oxford Nanopore Technologies) for basecalling, followed by transcriptome alignment using minimap2 v2.22-r1101 with species-specific references—GRCh38 (Ensembl release 91) for human cell lines and GRCm38 (Ensembl release 91) for mouse mESC. The aligned reads and raw fast5 are subsequently analyzed using nanopolish v0.14.0 to extract event data and raw current signals for downstream feature extraction.

Feature extraction for m6Astorm

For candidate m6A site $i$, the current mean, standard deviation, length, maximum, minimum, and median are obtained from the event data, and the RobustScaler method is used to normalize these features by

$$x' = \frac{x - \mathrm{Median}(x)}{\mathrm{IQR}(x)} \tag{1}$$

where $\mathrm{Median}(\cdot)$ is the median function for reducing the impact of outliers and IQR is the interquartile range (i.e. the difference between the 75th and 25th percentiles) for enhancing robustness against noise and extreme values. Let

$$\mathbf{f}_i = \left(\mu_i,\ \sigma_i,\ l_i,\ x_i^{\max},\ x_i^{\min},\ x_i^{\mathrm{med}}\right) \tag{2}$$

be the sub-feature vector of site $i$, with $\mu_i$, $\sigma_i$, $l_i$, $x_i^{\max}$, $x_i^{\min}$, and $x_i^{\mathrm{med}}$ denoting the current mean, standard deviation, length, maximum, minimum, and median, respectively.
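The robust scaling step can be illustrated with a minimal NumPy sketch. The function name `robust_scale` and the example current values are hypothetical; the computation mirrors the median/IQR normalization described above, which is also the default behavior of scikit-learn's `RobustScaler` (median centering, 25th to 75th percentile range).

```python
import numpy as np

def robust_scale(values):
    """Scale a 1-D feature as (x - median) / IQR, with IQR = Q75 - Q25."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    q25, q75 = np.percentile(x, [25, 75])
    iqr = q75 - q25
    if iqr == 0:
        # Constant feature: only center it to avoid division by zero.
        return x - median
    return (x - median) / iqr

# Hypothetical event-level current means (pA) containing one outlier.
current_mean = [100.0, 102.0, 104.0, 106.0, 300.0]
scaled = robust_scale(current_mean)
print(scaled)
```

Because the median and IQR are insensitive to the extreme value, the four typical observations stay within one unit of zero while the outlier remains clearly separated.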

Additionally, the positions within ±4 nt of the candidate site are incorporated (9-mer sequence), and the sub-feature vectors of these flanking positions follow analogously.

Nanopore current signals are inherently nonlinear and non-stationary due to sequence-dependent pore interactions and transient disruptions introduced by RNA modifications. To characterize such signals, we employ EMD, a fully data-driven time–frequency analysis method originally proposed for non-stationary signal analysis [31, 32]. Unlike wavelet-based approaches that rely on predefined basis functions and scale parameters, EMD adaptively decomposes signals according to their intrinsic oscillatory modes, making it particularly suitable for nanopore current fluctuations.

Specifically, the raw current signal corresponding to each 9-mer sequence is decomposed into a set of IMFs. For each IMF $c_k(t)$, the instantaneous amplitude $a_k(t)$ is computed using the Hilbert transform:

$$a_k(t) = \sqrt{c_k(t)^2 + \left(\mathcal{H}[c_k](t)\right)^2} \tag{3}$$

where $\mathcal{H}[\cdot]$ denotes the Hilbert transform operator. The instantaneous amplitude reflects the local energy and fluctuation intensity of the signal, which is closely associated with transient current modulations induced by nucleotide composition and chemical modifications.

To obtain a compact and interpretable representation, we extract the MIA from each IMF as a representative feature. These MIAs capture the most prominent signal perturbations within each 9-mer window and emphasize localized high-intensity events that are likely caused by m6A-induced disruptions. Compared with global statistical descriptors, MIA-based features preserve sensitivity to subtle, localized current variations while avoiding the need for manually selected basis functions, thereby enhancing both the discriminative power and interpretability of the feature engineering process [33]. The resulting feature vector is denoted as Inline graphic .
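As a rough illustration of the MIA computation, the sketch below takes an already extracted IMF, forms its analytic signal with an FFT-based Hilbert transform, and returns the maximum instantaneous amplitude. The EMD step itself is not shown; the synthetic sinusoid standing in for an IMF and the function names are assumptions made for this example only, not the authors' implementation.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + i*H[x], built by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    if n % 2 == 0:
        weights[0] = weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[0] = 1.0
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def max_instantaneous_amplitude(imf):
    """MIA of one IMF: the maximum of |analytic signal| over time."""
    return float(np.abs(analytic_signal(imf)).max())

# Synthetic stand-in for one IMF: a 50-cycle sinusoid with amplitude 2.5.
t = np.arange(1000) / 1000.0
toy_imf = 2.5 * np.sin(2 * np.pi * 50 * t)
mia = max_instantaneous_amplitude(toy_imf)
print(round(mia, 6))
```

In practice one MIA would be computed per IMF of each 9-mer signal window, giving a short vector of amplitudes per window.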

(4)

with Inline graphic being the one hot encoding for 9-mer sequence and

To improve model interpretability, we performed a global feature importance analysis using SHAP (SHapley Additive exPlanations). Model features were grouped into sequence context features, signal intensity-related statistical features, and EMD-based features (IMF/MIAs). Aggregated SHAP values indicate that signal statistical features contribute most to model predictions (58.1%), followed by sequence context features (35.1%), while IMF/MIA features contribute 6.8% (Supplementary Fig. 2c).
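The group-level aggregation can be sketched as follows: per-feature mean absolute SHAP values are summed within each group and normalized by the total. The SHAP matrix, the feature-to-group assignment, and the function name are toy assumptions for illustration; they are not the values or the layout used in the study.

```python
import numpy as np

def grouped_importance(shap_values, groups):
    """Return each feature group's share of the total mean |SHAP|."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    totals = {name: mean_abs[idx].sum() for name, idx in groups.items()}
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}

# Toy SHAP matrix: 3 samples x 4 features (synthetic values).
toy_shap = np.array([[0.2, -0.1, 0.05, 0.0],
                     [-0.4, 0.3, -0.05, 0.1],
                     [0.6, -0.2, 0.05, -0.2]])
# Hypothetical assignment of feature columns to groups.
toy_groups = {"signal_stats": [0, 1], "sequence": [2], "emd_mia": [3]}
shares = grouped_importance(toy_shap, toy_groups)
print(shares)
```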

m6Astorm for stoichiometry-preserving and stochasticity-aware identification of m6A

m6Astorm is trained and evaluated for RNA modification detection using dRNA-seq and matched GLORI data from the HEK293T cell line by formulating the task as a multi-objective optimization problem. Let $y_{ij}$ be the modification status of read $j$ at site $i$; then

$$y_{ij} = \begin{cases} 1, & \text{if read } j \text{ is methylated at site } i, \\ 0, & \text{otherwise.} \end{cases}$$

Therefore, the probability of methylation is

$$p_{ij} = P\left(y_{ij} = 1 \mid \mathbf{x}_{ij}\right) = F\left(\mathbf{x}_{ij}\right),$$

where $\mathbf{x}_{ij}$ is the input feature representation of read $j$ at site $i$ and $F$ is parameterized by a deep neural network consisting of: two 1D convolutional layers (input channel size 11, output channels 256 → 128, kernel size 3, same padding), each followed by Batch Normalization, ReLU activation, and Dropout; a two-layer bidirectional LSTM (BiLSTM) with hidden size 128 and dropout, from which the output at the final time step is extracted; and a two-layer fully connected classifier (256 → 32 → 1) with ReLU and Dropout, followed by a Sigmoid activation to generate the final prediction probability.
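The architecture described above can be sketched in PyTorch as follows. This is an illustrative reading of the text, not the released implementation: the class name, the dropout rate of 0.3, and the input layout of (batch, 11 channels, 9 positions) are assumptions, since the source does not state them here.

```python
import torch
from torch import nn

class M6AClassifierSketch(nn.Module):
    """CNN + BiLSTM + MLP read-level classifier (illustrative sketch)."""

    def __init__(self, in_channels=11, dropout=0.3):  # dropout rate is assumed
        super().__init__()

        def conv_block(c_in, c_out):
            # kernel_size=3 with padding=1 keeps the sequence length ("same").
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.Dropout(dropout),
            )

        self.conv = nn.Sequential(conv_block(in_channels, 256),
                                  conv_block(256, 128))
        self.bilstm = nn.LSTM(input_size=128, hidden_size=128, num_layers=2,
                              batch_first=True, bidirectional=True,
                              dropout=dropout)
        self.classifier = nn.Sequential(
            nn.Linear(256, 32), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.conv(x)              # (batch, 128, positions)
        h = h.transpose(1, 2)         # (batch, positions, 128)
        out, _ = self.bilstm(h)       # (batch, positions, 256)
        last = out[:, -1, :]          # output at the final time step
        return self.classifier(last).squeeze(-1)

torch.manual_seed(0)
model = M6AClassifierSketch().eval()
with torch.no_grad():
    probs = model(torch.randn(4, 11, 9))  # 4 reads, 11 channels, 9-mer window
print(probs.shape)
```

The BiLSTM concatenates forward and backward hidden states, which is why the classifier input width is 2 × 128 = 256.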

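To achieve stochasticity-aware m6A detection, supervised learning is adopted to classify methylated and unmethylated reads at the read level. Specifically, the methylated and unmethylated reads are extracted from high-confidence GLORI sites (stoichiometry Inline graphic ) and sites not detected by GLORI, respectively. The binary cross-entropy loss function is

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log p_n + (1 - y_n) \log (1 - p_n) \right] \tag{5}$$

where $N$ is the total number of reads with true labels, and $y_n$ and $p_n$ are the true label and the predicted methylation probability of the $n$-th labeled read, respectively.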
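For reference, the read-level binary cross-entropy can be computed as in the following NumPy sketch. The function name and the clipping constant `eps` are choices made for this example; the toy labels and probabilities are synthetic.

```python
import numpy as np

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over labeled reads."""
    y = np.asarray(labels, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# One methylated and one unmethylated read (synthetic).
ambiguous = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
print(ambiguous, confident)
```

Confident, correct predictions give a lower loss than ambiguous ones, which is the behavior the read-level objective rewards.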

Note that the methylation status of each read should be 1 or 0, reflecting the biological reality that each read at a candidate site tends to be either modified or unmodified rather than uncertain. A penalty loss is therefore also incorporated to avoid read-level ambiguity:

(6)

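To preserve stoichiometry in m6A detection, we minimize the mean squared error (MSE) between the predicted modification stoichiometry and the GLORI-measured stoichiometry at the site level. The predicted stoichiometry at position $i$ is calculated as

$$\hat{s}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \mathbb{1}\left(p_{ij} > \tau\right) \tag{7}$$

where $N_i$ is the total number of reads at position $i$, $p_{ij}$ is the predicted methylation probability of read $j$ at position $i$, and $\tau$ is the read-level decision threshold.

Since the indicator function is non-differentiable, we use a differentiable approximation via a sigmoid function:

$$\mathbb{1}\left(p_{ij} > \tau\right) \approx \sigma\left(k\,(p_{ij} - \tau)\right) = \frac{1}{1 + e^{-k\,(p_{ij} - \tau)}} \tag{8}$$

where the hyperparameter $k$ controls the sharpness of the approximation. The differentiable soft stoichiometry estimate is then:

$$\tilde{s}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \sigma\left(k\,(p_{ij} - \tau)\right) \tag{9}$$

Therefore, the MSE loss is

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{M} \sum_{i=1}^{M} \left(\tilde{s}_i - s_i\right)^2 \tag{10}$$

where $M$ is the total number of sites, and $s_i$ is the GLORI-measured stoichiometry for site $i$.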
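The relationship between the hard (indicator-based) and soft (sigmoid-based) stoichiometry estimates, and the resulting site-level MSE, can be illustrated with the NumPy sketch below. The threshold `tau = 0.5`, the sharpness `k = 50`, the function names, and the toy probabilities are all assumptions for illustration. NumPy only evaluates the forward computation; training would rely on an automatic-differentiation framework.

```python
import numpy as np

def hard_stoichiometry(probs, tau=0.5):
    """Fraction of reads whose probability exceeds the threshold."""
    return float(np.mean(np.asarray(probs, dtype=float) > tau))

def soft_stoichiometry(probs, tau=0.5, k=50.0):
    """Sigmoid relaxation of the indicator; larger k is sharper."""
    p = np.asarray(probs, dtype=float)
    return float(np.mean(1.0 / (1.0 + np.exp(-k * (p - tau)))))

def stoichiometry_mse(site_probs, glori_levels, tau=0.5, k=50.0):
    """MSE between soft stoichiometry and reference levels across sites."""
    predicted = np.array([soft_stoichiometry(p, tau, k) for p in site_probs])
    target = np.asarray(glori_levels, dtype=float)
    return float(np.mean((predicted - target) ** 2))

# Toy site with four reads: two confidently modified, two confidently not.
reads_at_site = [0.95, 0.90, 0.10, 0.05]
hard = hard_stoichiometry(reads_at_site)
soft = soft_stoichiometry(reads_at_site)
loss = stoichiometry_mse([reads_at_site], [0.5])
print(hard, soft, loss)
```

When read-level probabilities are close to 0 or 1, the soft estimate closely tracks the hard estimate, so the site-level and read-level objectives reinforce each other.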

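Finally, the individual losses are combined into a single objective by introducing balancing parameters,

$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{BCE}} + \lambda_2 \mathcal{L}_{\mathrm{penalty}} + \lambda_3 \mathcal{L}_{\mathrm{MSE}} \tag{11}$$

where $\mathcal{L}_{\mathrm{BCE}}$, $\mathcal{L}_{\mathrm{penalty}}$, and $\mathcal{L}_{\mathrm{MSE}}$ are the losses in Equations (5), (6), and (10), and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the balancing parameters.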

Evaluation metric for benchmarking

The ground truth used for benchmarking is derived from the GLORI method, which reports positively methylated sites with stoichiometry Inline graphic . Sites not detected by GLORI are treated as negatives. Evaluations are conducted across HEK293T, HeLa, and mESC datasets, including only sites with read coverage. We apply four complementary evaluation metrics:

(i) Read level AUROC and AUPR:

Methylated reads are extracted from high-confidence GLORI-positive sites (stoichiometry Inline graphic ), and unmethylated reads from GLORI-negative sites. These are used as positive and negative read level labels, respectively. Based on the predicted per-read methylation probabilities, we generate ROC and PR curves and calculate the AUC (AUROC and AUPR) for each.
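A compact way to compute the two read-level summary metrics is sketched below. AUROC is obtained from pairwise comparisons of positive and negative scores (equivalent to the Mann-Whitney statistic), and AUPR is approximated by average precision. The function names and toy data are hypothetical; the pairwise AUROC uses memory proportional to the number of positive-negative pairs, so it suits small examples rather than full datasets.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    y = np.asarray(labels, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve."""
    y = np.asarray(labels, dtype=int)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y_sorted = y[order]
    precision = np.cumsum(y_sorted) / np.arange(1, y.size + 1)
    return float((precision * y_sorted).sum() / y_sorted.sum())

# Toy reads: 1 = from a high-confidence positive site, 0 = from a negative site.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
roc_value = auroc(toy_labels, toy_scores)
pr_value = average_precision(toy_labels, toy_scores)
print(roc_value, pr_value)
```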

(ii) Site level correlation with GLORI stoichiometry (PCC):

We compute PCCs for each method, between its predicted methylation levels (i.e. the fraction of reads predicted as methylated) and the GLORI-reported stoichiometry. Only sites with predicted stoichiometry Inline graphic are included.

(iii) Recall:

Recall is defined as the proportion of GLORI-positive sites (stoichiometry Inline graphic ) that are also predicted as positive (stoichiometry ) by a given method.

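$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{12}$$

where TP and FN are the numbers of GLORI-positive sites predicted as positive and as negative, respectively.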

(iv) False positive rate:

FPR is defined as the fraction of GLORI-negative sites (i.e. those not detected by GLORI) that are incorrectly predicted as positive (stoichiometry Inline graphic ).

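$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \tag{13}$$

where FP and TN are the numbers of GLORI-negative sites predicted as positive and as negative, respectively.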
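The site-level metrics can be computed as in the following sketch. The stoichiometry threshold is left as a required argument because its value is not recoverable from the text here; the value 0.1 in the usage lines, the function names, and the toy arrays are purely illustrative.

```python
import numpy as np

def recall_and_fpr(pred_stoich, glori_positive, threshold):
    """Site-level recall and false positive rate at a stoichiometry cutoff."""
    pred_pos = np.asarray(pred_stoich, dtype=float) >= threshold
    truth_pos = np.asarray(glori_positive, dtype=bool)
    tp = np.sum(pred_pos & truth_pos)
    fn = np.sum(~pred_pos & truth_pos)
    fp = np.sum(pred_pos & ~truth_pos)
    tn = np.sum(~pred_pos & ~truth_pos)
    return float(tp / (tp + fn)), float(fp / (fp + tn))

def pearson_cc(pred_stoich, ref_stoich):
    """Pearson correlation between predicted and reference stoichiometry."""
    return float(np.corrcoef(pred_stoich, ref_stoich)[0, 1])

# Four toy sites: the first two are reference-positive, the last two negative.
toy_pred = [0.8, 0.05, 0.3, 0.0]
toy_truth = [True, True, False, False]
recall, fpr = recall_and_fpr(toy_pred, toy_truth, threshold=0.1)  # 0.1 is illustrative
pcc = pearson_cc([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
print(recall, fpr, pcc)
```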

Calculation of gene methylation level

For a given gene $g$, the gene methylation level is computed as the coverage-weighted average of site-level modification ratios across all adenine sites:

$$M_g = \frac{\sum_{i \in S_g} c_i \, r_i}{\sum_{i \in S_g} c_i} \tag{14}$$

where $S_g$ denotes the set of adenine sites in gene $g$, $r_i$ is the predicted modification ratio at site $i$, and $c_i$ is the read coverage at site $i$.
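A short worked example of the coverage-weighted average follows. The function name and the two toy sites are hypothetical.

```python
import numpy as np

def gene_methylation_level(ratios, coverages):
    """Coverage-weighted mean of site-level modification ratios."""
    r = np.asarray(ratios, dtype=float)
    c = np.asarray(coverages, dtype=float)
    return float(np.sum(r * c) / np.sum(c))

# Toy gene with two adenine sites: ratio 0.5 at 20x and ratio 0.1 at 80x.
level = gene_methylation_level([0.5, 0.1], [20, 80])
print(level)
```

Here the result is (0.5 × 20 + 0.1 × 80) / 100 = 0.18, so the deeply covered low-stoichiometry site dominates the gene-level value.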

Comparison of m6Astorm with existing tools

To ensure a fair comparison between m6Astorm and other existing tools, we strictly followed the official GitHub workflows for preprocessing in each method.

For mAFiA (https://github.com/dieterich-lab/mAFiA), all parameters were kept at their default values except that --min_coverage (minimum number of reads covering a site) was reduced from the default 50 to 20; --mod_prob_thresh was kept at its default value of 0.5.

For m6Anet (https://github.com/GoekeLab/m6anet), default parameters were used (DEFAULT_MIN_READS = 20, DEFAULT_READ_THRESHOLD = 0.033379376), and the default threshold was applied for stoichiometry calculation.

For CHEUI (m6A model, https://github.com/comprna/CHEUI), default parameters were also used, where -n 20 specifies a minimum of 20 reads covering a site for inclusion in the analysis, and -d/--double_cutoff uses the default double cutoff values of 0.3 and 0.7 (probability < 0.3 is considered unmodified, > 0.7 is considered modified).

For TandemMod (https://github.com/yulab2021/TandemMod), the analysis was performed strictly following the recommended workflow provided by the authors, with all parameters kept at their default values.

For m6Aiso (https://github.com/SYSU-Wang-LAB/m6Aiso), default parameters were used together with the recommended double probability cutoff strategy. Specifically, a site within a read was considered modified if the predicted modification probability exceeded 0.9 and considered unmodified if the probability was below 0.1; reads with intermediate probabilities were excluded from classification.
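The double-cutoff strategy used for read classification can be sketched as follows. Reads above the upper cutoff count as modified, reads below the lower cutoff count as unmodified, and the rest are excluded. Computing site stoichiometry as modified / (modified + unmodified) is an assumption about how the retained reads are summarized, and the function name and toy probabilities are hypothetical.

```python
import numpy as np

def double_cutoff_stoichiometry(probs, low, high):
    """Site stoichiometry from reads passing a double probability cutoff."""
    p = np.asarray(probs, dtype=float)
    n_modified = int(np.sum(p > high))
    n_unmodified = int(np.sum(p < low))
    kept = n_modified + n_unmodified
    if kept == 0:
        return float("nan")  # no read passes either cutoff
    return n_modified / kept

# Toy site: two modified reads, one ambiguous read, one unmodified read.
toy_probs = [0.95, 0.92, 0.50, 0.05]
strict = double_cutoff_stoichiometry(toy_probs, low=0.1, high=0.9)
relaxed = double_cutoff_stoichiometry(toy_probs, low=0.3, high=0.7)
print(strict, relaxed)
```

The ambiguous read is dropped under both cutoff pairs in this toy case, so both settings give the same estimate; with probabilities between 0.7 and 0.9 the two settings would diverge.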

Key Points

We propose m6Astorm, a novel computational framework for stoichiometry-preserving and stochasticity-aware identification of m6A modifications at the single-molecule level.
m6Astorm employs a hybrid convolutional neural networks and bidirectional long short-term memory architecture to jointly encode raw signal-derived features and sequence context, enabling motif-independent detection of m6A sites.
m6Astorm adopts a dual-objective optimization strategy: (i) minimizing binary cross-entropy loss for methylation state classification at the molecule level, regularized by a confidence-aware penalty term suppressing low-certainty predictions; (ii) minimizing the stoichiometry bias for accurate quantification at the nucleotide level.
m6Astorm resolves co-methylation events across individual molecules, revealing coordinated m6A regulatory patterns in transcriptomes.
m6Astorm demonstrates robust performance across HeLa and mouse embryonic stem cell datasets, evidenced by high prediction power (Recall), low FPR, accurate stoichiometry, and high AUROC/AUPR in transcriptome-wide modification profiling.

Supplementary Material

supplementary_bbag132.docx (359.3 KB)

Acknowledgements

The computations were performed on the HPC Platform of Huazhong University of Science and Technology.

Contributor Information

Fangyuan Wang, School of Mathematics and Statistics, Huazhong University of Science and Technology, 1037 Luoyu Road, Hongshan District, Wuhan, Hubei 430074, China.

Menglu Chen, School of Mathematics and Statistics, Huazhong University of Science and Technology, 1037 Luoyu Road, Hongshan District, Wuhan, Hubei 430074, China.

Jinyi Li, School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK–Shenzhen), 2001 Longxiang Road, Longgang District, Shenzhen, Guangdong 518172, China.

Yang Yu, School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK–Shenzhen), 2001 Longxiang Road, Longgang District, Shenzhen, Guangdong 518172, China; School of Medicine, The Chinese University of Hong Kong, Shenzhen (CUHK–Shenzhen), 2001 Longxiang Road, Longgang District, Shenzhen, Guangdong 518172, China.

Zhenxing Guo, School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK–Shenzhen), 2001 Longxiang Road, Longgang District, Shenzhen, Guangdong 518172, China.

Meng Zou, School of Mathematics and Statistics, Huazhong University of Science and Technology, 1037 Luoyu Road, Hongshan District, Wuhan, Hubei 430074, China.

Author contributions

M.Z. and Z.G. conceived and designed the project. F.W., M.C., J.L., Y.Y., Z.G., and M.Z. performed the experimental study. F.W., Z.G., and M.Z. drafted the manuscript. All authors read and approved the final manuscript.

Conflicts of interest

None declared.

Funding

F.W., M.C., and M.Z. were supported by the National Natural Science Foundation of China (NSFC) under Grant No. 12001215. J.L., Y.Y., and Z.G. were partially supported by the National Natural Science Foundation of China (NSFC) under Grants No. 12401650 and 82441027, intramural funding from The Chinese University of Hong Kong, Shenzhen (CUHK–Shenzhen), and the Guangdong Provincial Key Laboratory of Mathematical Foundations for Artificial Intelligence (2023B1212010001).

Data availability

The datasets used in this study are all publicly available. Direct RNA sequencing data for HEK293T cells were obtained from the Singapore Nanopore Expression Project (https://github.com/GoekeLab/sg-nex-data). HeLa and mESC datasets were downloaded from the NCBI Sequence Read Archive under accession numbers PRJNA777450 [34] and SRP166020 [24], respectively. Site-specific quantitative m6A modification data for HEK293T, HeLa, and mESC generated using the GLORI method are available from the Gene Expression Omnibus (GEO) under accession number GSE210563 [2].

Direct RNA sequencing data generated using the RNA004 chemistry for HEK293T cells were downloaded from the European Nucleotide Archive (ENA) under accession number PRJEB80229 [35]. Translation efficiency (TE) data used in this study were obtained from the GEO under accession number GSE63591 [6].

All code used in this study is available at the GitHub repository: https://github.com/HUSTzoulab/m6Astorm.

References

1. Boccaletto P, Machnicka MA, Purta E et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res 2018;46:D303–7. doi:10.1093/nar/gkx1030
2. Liu C, Sun H, Yi Y et al. Absolute quantification of single-base m6A methylation in the mammalian transcriptome using GLORI. Nat Biotechnol 2023;41:355–66. doi:10.1038/s41587-022-01487-9
3. Ke S, Pandya-Jones A, Saito Y et al. m6A mRNA modifications are deposited in nascent pre-mRNA and are not required for splicing but do specify cytoplasmic turnover. Genes Dev 2017;31:990–1006. doi:10.1101/gad.301036.117
4. Wang X, Lu Z, Gomez A et al. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 2014;505:117–20. doi:10.1038/nature12730
5. Xiao W, Adhikari S, Dahal U et al. Nuclear m6A reader YTHDC1 regulates mRNA splicing. Mol Cell 2016;61:507–19. doi:10.1016/j.molcel.2016.01.012
6. Wang X, Zhao BS, Roundtree IA et al. N6-methyladenosine modulates messenger RNA translation efficiency. Cell 2015;161:1388–99. doi:10.1016/j.cell.2015.05.014
7. Delaunay S, Frye M. RNA modifications regulating cell fate in cancer. Nat Cell Biol 2019;21:552–9. doi:10.1038/s41556-019-0319-0
8. Barbieri I, Kouzarides T. Role of RNA modifications in cancer. Nat Rev Cancer 2020;20:303–22. doi:10.1038/s41568-020-0253-2
9. Meyer KD, Saletore Y, Zumbo P et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 2012;149:1635–46. doi:10.1016/j.cell.2012.05.003
10. Linder B, Grozhik AV, Olarerin-George AO et al. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods 2015;12:767–72. doi:10.1038/nmeth.3453
11. Koh CWQ, Goh YT, Goh WSS. Atlas of quantitative single-base-resolution N6-methyl-adenine methylomes. Nat Commun 2019;10:5936. doi:10.1038/s41467-019-13561-z
12. Sun H, Li K, Liu C et al. Regulation and functions of non-m6A mRNA modifications. Nat Rev Mol Cell Biol 2023;24:714–31. doi:10.1038/s41580-023-00622-x
13. Wang S, Lv W, Li T et al. Dynamic regulation and functions of mRNA m6A modification. Cancer Cell Int 2022;22:48. doi:10.1186/s12935-022-02452-x
14. Garalde DR, Snell EA, Jachimowicz D et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods 2018;15:201–6. doi:10.1038/nmeth.4577
15. Liu H, Begik O, Lucas MC et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun 2019;10:4079. doi:10.1038/s41467-019-11713-9
16. Jenjaroenpun P, Wongsurawat T, Wadley TD et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res 2021;49:e7. doi:10.1093/nar/gkaa620
17. Hendra C, Pratanwanich PN, Wan YK et al. Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nat Methods 2022;19:1590–8. doi:10.1038/s41592-022-01666-1
18. Chan A, Naarmann-de Vries IS, Scheitl CPM et al. Detecting m6A at single-molecular resolution via direct RNA sequencing and realistic training data. Nat Commun 2024;15:3323. doi:10.1038/s41467-024-47661-2
19. Acera Mateos P, Sethi AJ, Ravindran A et al. Prediction of m6A and m5C at single-molecule resolution reveals a transcriptome-wide co-occurrence of RNA modifications. Nat Commun 2024;15:3899. doi:10.1038/s41467-024-47953-7
20. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018;34:3094–100. doi:10.1093/bioinformatics/bty191
21. Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 2015;12:733–5. doi:10.1038/nmeth.3444
22. You W, Shao W, Yan M et al. Transfer learning enables identification of multiple types of RNA modifications using nanopore direct RNA sequencing. Nat Commun 2024;15:4049. doi:10.1038/s41467-024-48437-4
23. Guo W, Ren Z, Huang X et al. Single-molecule m6A detection empowered by endogenous labeling unveils complexities across RNA isoforms. Mol Cell 2025;85:1233–1246.e7. doi:10.1016/j.molcel.2025.01.014
24. Cruciani S, Delgado-Tejedor A, Pryszcz LP et al. De novo basecalling of RNA modifications at single molecule and nucleotide resolution. Genome Biol 2025;26:38. doi:10.1186/s13059-025-03498-6
25. Zhao Y, Li Y, Zhu R et al. RPS15 interacted with IGF2BP1 to promote esophageal squamous cell carcinoma development via recognizing m6A modification. Signal Transduct Target Ther 2023;8:224. doi:10.1038/s41392-023-01428-1
26. Alarcón CR, Goodarzi H, Lee H et al. HNRNPA2B1 is a mediator of m6A-dependent nuclear RNA processing events. Cell 2015;162:1299–308. doi:10.1016/j.cell.2015.08.011
27. Lin Z, Bao X, Zhang L et al. Rmpore: a comprehensive database of single-molecule RNA modifications detected by nanopore direct RNA sequencing. Nucleic Acids Res 2025;54:D291–302. doi:10.1093/nar/gkaf1144
28. Huang D, Chen K, Song B et al. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res 2022;50:10290–310. doi:10.1093/nar/gkac830
29. Xiao Y-L, Liu S, Ge R et al. Transcriptome-wide profiling and quantification of N6-methyladenosine by enzyme-assisted adenosine deamination. Nat Biotechnol 2023;41:993–1003. doi:10.1038/s41587-022-01587-6
30. Zhao X, Ye H, He D et al. m6AConquer: a consistently quantified and orthogonally validated database for the N6-methyladenosine (m6A) epitranscriptome. Nucleic Acids Res 2025;54:D204–18. doi:10.1093/nar/gkaf1204
31. Huang NE, Shen Z, Long SR et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond Ser A 1998;454:903–95.
32. Huang NE, Wu Z. A review on Hilbert–Huang transform: method and its applications to geophysical studies. Rev Geophys 2008;46:RG2006.
33. Joseph ER, Jakir H, Thangavel B et al. Comparative analysis of Hilbert–Huang transform and wavelet transform for non-stationary signal processing. Symmetry 2024;16:1223. doi:10.3390/sym16091223
34. Tavakoli S, Nabizadeh M, Makhamreh A et al. Semi-quantitative detection of pseudouridine modifications and type I/II hypermodifications in human mRNAs using direct long-read sequencing. Nat Commun 2023;14:334. doi:10.1038/s41467-023-35858-w
35. Zou Y, Ahsan MU, Chan J et al. A comparative evaluation of computational models for RNA modification detection using nanopore sequencing with RNA004 chemistry. Brief Bioinform 2025;26:bbaf404. doi:10.1093/bib/bbaf404

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplementary_bbag132

supplementary_bbag132.docx^{(359.3KB, docx)}

Data Availability Statement

All code used in this study is available at the GitHub repository: https://github.com/HUSTzoulab/m6Astorm.

[ref1] 1. Boccaletto P, Machnicka MA, Purta E et al. MODOMICS: a database of RNA modification pathways. 2021 update. Nucleic Acids Res 2018;46:D303–7. 10.1093/nar/gkx1030 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] 2. Liu C, Sun H, Yi Y et al. Absolute quantification of single-base m6A methylation in the mammalian transcriptome using glori. Nat Biotechnol 2023;41:355–66. 10.1038/s41587-022-01487-9 [DOI] [PubMed] [Google Scholar]

[ref3] 3. Ke S, Pandya-Jones A, Saito Y et al. m6A mRNA modifications are deposited in nascent pre-mRNA and are not required for splicing but do specify cytoplasmic turnover. Genes Dev 2017;31:990–1006. 10.1101/gad.301036.117 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] 4. Wang X, Zhike L, Gomez A et al. N 6-methyladenosine-dependent regulation of messenger RNA stability. Nature 2014;505:117–20. 10.1038/nature12730 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref5] 5. Xiao W, Adhikari S, Dahal U et al. Nuclear m6A reader YTHDC1 regulates mRNA splicing. Mol Cell 2016;61:507–19. 10.1016/j.molcel.2016.01.012 [DOI] [PubMed] [Google Scholar]

[ref6] 6. Wang X, Zhao BS, Roundtree IA et al. N6-methyladenosine modulates messenger RNA translation efficiency. Cell 2015; RNA161:1388–99. 10.1016/j.cell.2015.05.014 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] 7. Delaunay S, Frye M. RNA modifications regulating cell fate in cancer. Nat Cell Biol 2019;21:552–9. 10.1038/s41556-019-0319-0 [DOI] [PubMed] [Google Scholar]

[ref8] 8. Barbieri I, Kouzarides T. Role of RNA modifications in cancer. Nat Rev Cancer 2020;20:303–22. 10.1038/s41568-020-0253-2 [DOI] [PubMed] [Google Scholar]

[ref9] 9. Meyer KD, Saletore Y, Zumbo P et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3 UTRs and near stop codons. Cell 2012;149:1635–46. 10.1016/j.cell.2012.05.003 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] 10. Linder B, Grozhik AV, Olarerin-George AO et al. Single-nucleotide-resolution mapping of m6A and m6am throughout the transcriptome. Nat Methods 2015;12:767–72. 10.1038/nmeth.3453 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] 11. Koh CWQ, Goh YT, Goh WSS. Atlas of quantitative single-base-resolution n 6-methyl-adenine methylomes. Nat Commun 2019;10:5936. 10.1038/s41467-019-13561-z [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] 12. Sun H, Li K, Liu C et al. Regulation and functions of non-m6A mRNA modifications. Nat Rev Mol Cell Biol 2023;24:714–31. 10.1038/s41580-023-00622-x [DOI] [PubMed] [Google Scholar]

[ref13] 13. Wang S, Lv W, Li T et al. Dynamic regulation and functions of mRNA m6A modification. Cancer Cell Int 2022;22:48. 10.1186/s12935-022-02452-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref14] 14. Garalde DR, Snell EA, Jachimowicz D et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods 2018;15:201–6. 10.1038/nmeth.4577 [DOI] [PubMed] [Google Scholar]

[ref15] 15. Liu H, Begik O, Lucas MC et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun 2019;10:4079. 10.1038/s41467-019-11713-9 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] 16. Jenjaroenpun P, Wongsurawat T, Wadley TD et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res 2021;49:e7–7. 10.1093/nar/gkaa620 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] 17. Hendra C, Pratanwanich PN, Wan YK et al. Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nat Methods 2022;19:1590–8. 10.1038/s41592-022-01666-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref18] 18. Chan A, Naarmann-de IS, Vries CPM et al. Detecting m6A at single-molecular resolution via direct RNA sequencing and realistic training data. Nat Commun 2024;15:3323. 10.1038/s41467-024-47661-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref19] 19. Acera, Mateos P, Sethi AJ, Ravindran A et al. Prediction of m6A and m5C at single-molecule resolution reveals a transcriptome-wide co-occurrence of RNA modifications. Nat Commun 2024;15:3899. 10.1038/s41467-024-47953-7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref20] 20. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018;34:3094–100. 10.1093/bioinformatics/bty191 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref21] 21. Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 2015;12:733–5. 10.1038/nmeth.3444 [DOI] [PubMed] [Google Scholar]

[ref22] 22. You W, Shao W, Yan M et al. Transfer learning enables identification of multiple types of RNA modifications using nanopore direct RNA sequencing. Nat Commun 2024;15:4049. 10.1038/s41467-024-48437-4 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] 23. Guo W, Ren Z, Huang X et al. Single-molecule m6A detection empowered by endogenous labeling unveils complexities across RNA isoforms. Mol Cell 2025;85:1233–1246.e7. 10.1016/j.molcel.2025.01.014 [DOI] [PubMed] [Google Scholar]

[ref24] 24. Cruciani S, Delgado-Tejedor A, Pryszcz LP et al. De novo basecalling of RNA modifications at single molecule and nucleotide resolution. Genome Biol 2025;26:38. 10.1186/s13059-025-03498-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref25] 25. Zhao Y, Li Y, Zhu R et al. RPS15 interacted with IGF2BP1 to promote esophageal squamous cell carcinoma development via recognizing m6A modification. Signal Transduct Target Ther 2023;8:224. 10.1038/s41392-023-01428-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref26] 26. Alarcón CR, Goodarzi H, Lee H et al. HNRNPA2B1 is a mediator of m6A-dependent nuclear RNA processing events. Cell 2015;162:1299–308. 10.1016/j.cell.2015.08.011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref27] 27. Lin Z, Bao X, Zhang L et al. Rmpore: a comprehensive database of single-molecule RNA modifications detected by nanopore direct RNA sequencing. Nucleic Acids Res 2025;54:D291–302. 10.1093/nar/gkaf1144 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref28] 28. Huang D, Chen K, Song B et al. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res 2022;50:10290–310. 10.1093/nar/gkac830 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref29] 29. Xiao Y-L, Liu S, Ge R et al. Transcriptome-wide profiling and quantification of n 6-methyladenosine by enzyme-assisted adenosine deamination. Nat Biotechnol 2023;41:993–1003. 10.1038/s41587-022-01587-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref30] 30. Zhao X, Ye H, He D et al. m6AConquer: a consistently quantified and orthogonally validated database for the N6-methyladenosine (m6A) epitranscriptome. Nucleic Acids Res 2025;54:D204–18. 10.1093/nar/gkaf1204

[ref31] 31. Huang NE, Shen Z, Long SR et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond Ser A 1998;454:903–95.

[ref32] 32. Huang NE, Wu Z. A review on Hilbert–Huang transform: Method and its applications to geophysical studies. Rev Geophys 2008;46:RG2006.

[ref33] 33. Joseph ER, Jakir H, Thangavel B et al. Comparative analysis of Hilbert–Huang transform and wavelet transform for non-stationary signal processing. Symmetry 2024;16:1223. 10.3390/sym16091223

[ref34] 34. Tavakoli S, Nabizadeh M, Makhamreh A et al. Semi-quantitative detection of pseudouridine modifications and type I/II hypermodifications in human mRNAs using direct long-read sequencing. Nat Commun 2023;14:334. 10.1038/s41467-023-35858-w

[ref35] 35. Zou Y, Ahsan MU, Chan J et al. A comparative evaluation of computational models for RNA modification detection using nanopore sequencing with RNA004 chemistry. Brief Bioinform 2025;26:bbaf404. 10.1093/bib/bbaf404
