Abstract
Post-transcriptional RNA modifications, prevalent in multiple RNA species such as mRNA, rRNA, and tRNA, play a significant role in biological processes by altering RNA structures. With recent advances in prediction algorithms, it is now possible to predict RNA secondary structure for sequences containing modified bases. In this study, we introduce StructRMDB, the first database designed to characterize the impact of chemical modifications on RNA secondary structure. StructRMDB comprises more than 880,000 N6-methyladenosine (m6A), pseudouridine (Ψ), and adenosine-to-inosine (A-to-I) editing sites from nine species, together with their predicted structural impacts in both pre-RNA and mature RNA. Two RNA secondary structure prediction tools (RNAstructure and ViennaRNA), along with four scoring methods (Similarity Score, Relative Score, Distance, and SMC Score), were adopted to assess structural changes induced by these modifications. Additionally, we visualized RNA secondary structures with and without modifications to highlight structural alterations. A user-friendly graphical interface facilitates querying, downloading, and sharing of site-level evaluation and annotation data, offering new insights into the effects of RNA modifications on secondary structure. StructRMDB serves as a valuable resource for studying the structural impact of RNA modifications and is available at: http://www.rnamd.org/StructRMDB/index.html.
Keywords: RNA modification, RNA secondary structure, N6-methyladenosine, Pseudouridine, Adenosine-to-inosine editing
1. Introduction
Single-stranded RNA molecules can fold into a wide spectrum of secondary and tertiary structures that underpin essential biological mechanisms, including catalytic ribozyme activity, temperature or metabolite sensing, and the epigenetic regulation of long non-coding RNAs [1], [2], [3], [4]. RNA structural states themselves may act as heritable carriers of epigenetic information, a concept referred to as RNA structural memory [1]. Furthermore, mutations that alter RNA structure have been linked to human diseases such as dysduplication, retinoblastoma, and breast cancer [5]. Therefore, characterizing RNA folding and structure is critical for advancing our understanding of RNA function.
RNA secondary structure, a central component of RNA folding, is characterized by distinct features such as hairpins, long-range interactions, G-quadruplexes, R-loops, and pseudoknots. These structures arise from interactions between non-adjacent nucleotides [6] and profoundly influence key mRNA processes, including transcription, splicing, and translation [7]. Accurately identifying RNA secondary structure is therefore highly informative. For example, incorporating secondary structure information has been shown to improve the specificity of triplex-forming oligonucleotide (TPX/TFO) prediction for lncRNAs [8]. Additionally, transcriptional regulation, including transcriptional directionality and dynamics, as well as RNA splicing, relies on RNA secondary structures, which add a layer of complexity beyond the primary sequence [9]. However, RNA secondary structures are highly dynamic and often regulated by RNA-binding proteins (RBPs), making them difficult to predict from primary sequence alone [6].
In addition to secondary structure, RNA chemical modifications also play pivotal roles in regulating RNA stability and function by altering the chemical properties of individual nucleotides [6]. These modifications can reshape RNA–protein interactions because many RBPs either directly recognize specific modifications or are sensitive to the structural changes they induce [10]. Moreover, modifications can alter secondary structure formation, ranging from slight stabilization to significant destabilization, depending on the position and sequence context of the modification. In some cases, the modification can induce substantial structural rearrangements, such as converting a hairpin to a duplex [11]. For example, m6A modifications destabilize RNA helices and modulate regulatory processes [12], while the positively charged N1-methyladenosine (m1A) can locally alter mRNA structure near translation initiation sites by disrupting Watson-Crick base pairing [13]. Similarly, the secondary structure can influence the modification level, as some studies suggest that folded RNA secondary structures can prevent motifs from being methylated after transcription [14].
This bidirectional relationship underscores the necessity of accurately identifying structural regions affected by chemical modifications. Such interplay has been observed in mitochondrial diseases, where tRNA mutations alter modification levels and structural stability, ultimately impairing translation [14]. Moreover, the interaction between modifications and secondary structure has been demonstrated through the analysis of RNA structuromes in HIV, yeast, Arabidopsis, and mammalian cells and tissues, highlighting the importance of RNA modifications in secondary structure, particularly in the context of human diseases [15].
The prediction of RNA secondary structure has long been a major focus in computational biology. Traditional algorithms commonly identify the structure with the minimum free energy (MFE) using thermodynamic parameters derived from experimental data [16]. However, integrating modification effects into RNA structure prediction remains challenging owing to the lack of specialized algorithms and precise thermodynamic parameters. Existing tools have made preliminary attempts to address this issue. For example, RNAstructure extends the thermodynamic parameters of its nearest-neighbor model to allow folding predictions for modified nucleotides such as m6A [17]. In contrast, ViennaRNA adopts a flexible constraint framework that dynamically adjusts base-pairing probabilities to reflect the energetic effects of modifications such as m6A, inosine, and pseudouridine [18].
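MFE-based folding algorithms rest on dynamic programming over subsequences. As a much simpler stand-in for illustration only, the sketch below implements Nussinov base-pair maximization in Python: it maximizes the number of nested Watson-Crick and G-U pairs rather than minimizing a nearest-neighbor free energy, so it does not reproduce RNAstructure or ViennaRNA output. The minimum hairpin-loop size of 3 is an assumed parameter.

```python
# Canonical Watson-Crick and G-U wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return the maximum number of nested pairs and one optimal dot-bracket structure."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs within seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n - 1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns a three-pair hairpin. Roughly speaking, thermodynamic MFE algorithms keep this recursive decomposition but replace the pair count with nearest-neighbor energy terms for stacks and loops.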
Given the regulatory significance and widespread occurrence of RNA modifications [19], structure prediction based on modifications can help researchers locate modification sites that may influence RNA structure and interpret potential indirect functional effects mediated by these modifications. To address this need, we developed StructRMDB, a predicted structure-centric database dedicated to screening and identifying RNA secondary structures altered by single-site RNA modifications, including m⁶A, pseudouridine (Ψ), and A-to-I RNA editing.
In contrast to other RNA structural databases, such as the Nucleic Acid Circular Dichroism Database, a repository of experimentally derived circular dichroism spectra of nucleic acids [17], StructRMDB is dedicated to exploring the potential effects of chemical modifications on RNA secondary structures through large-scale predictive data. It integrates advanced structure prediction algorithms with more than 880,000 high-confidence modification sites across nine species, enabling direct comparisons between modified and unmodified RNA structural states. Furthermore, the database offers comprehensive functional annotations, including gene regions, gene types, and RNA sequence categories (pre-RNA and mature RNA), as well as overlaps of modification sites with RNA-binding proteins (RBPs), miRNAs, and single nucleotide polymorphisms (SNPs). RNAplot is used to display the predicted RNA secondary structures with and without modifications (Fig. 1). StructRMDB is freely accessible at: http://www.rnamd.org/StructRMDB/index.html.
Fig. 1.
Layout of StructRMDB database. StructRMDB is a comprehensive database that focuses on RNA secondary structure changes influenced by single-base modifications, including m6A, pseudouridine (Ψ), and RNA-editing (A-to-I). Both ViennaRNA and RNAstructure were applied to predict RNA secondary structures with and without modifications, and four scores were used to quantify the resulting structural alterations. In addition, the database provides extensive functional annotations and visualization of RNA secondary structures.
2. Material and methods
2.1. Workflow
The workflow of StructRMDB involves three main steps: data and materials collection, sequence extraction, and analysis and visualization (Fig. 2). Information on modification sites was collected from RMBase V3.0 and m6A-Atlas V2.0, along with annotation and reference genome files from GENCODE 2021, the UCSC Genome Browser, and Ensembl. During sequence extraction, the genomic region of each site was annotated. By integrating these annotations with transcript sequences, both mature and pre-RNA sequences containing the modified sites were generated. The RNA sequences were then analysed using the prediction tools RNAstructure V6.4 and ViennaRNA 2.6.4, whose results were formatted as Connectivity Table (CT) files or in dot-bracket notation. The predicted structures of modified and unmodified RNA were compared using four indices to assess the influence of each modification, and the results were classified into four categories. Finally, the predicted secondary structures of modified and unmodified RNA were visualized through secondary structure plots.
Fig. 2.
Workflow of data processing. The process involves three parts: data and materials collection, sequence extraction, and analysis and visualization. (A) Data and materials collection. Modification information, reference files, and annotation files were sourced from public databases, including RMBase V3.0, m6A-Atlas V2.0, GENCODE, Ensembl, and UCSC Genome Browser. (B) Sequence extraction. Genomic regions were annotated and mapped to transcripts to generate mature and pre-RNA sequences containing the modified sites. For each modification type (e.g., m⁶A, Ψ, A-to-I), sequences were generated in two versions: one containing the original base and another incorporating the specific modified base symbol. (C) Analysis and visualization. RNA sequences were analysed by using RNAstructure and ViennaRNA, and the results were formatted in Connectivity Table (CT) and dot-bracket notation. Predicted structures of modified and unmodified RNA were compared using four indices, classified into categories, and visualized through secondary structure plots.
2.2. Data and material collection
Information on three types of chemical modification sites across nine species was collected to assess their impact on secondary structure. Data on m6A-modified sites were collected from the m6A-Atlas V2.0 database [20], while information on pseudouridine and adenosine-to-inosine editing sites was obtained from RMBase v3.0 [21]. The annotation files for each species were downloaded from GENCODE 2021 [22] and Ensembl [23]. The reference genome sequences were mainly downloaded from Ensembl, with some obtained from GENCODE 2021 and the UCSC Genome Browser (Supplementary Table 1). The downloaded GTF files were then used to annotate each modification site, identify its genomic region, and determine the corresponding RNA sequence type. If a modified site is annotated as intergenic, no RNA sequence is generated. If the annotated genomic region contains introns but no exons, the site yields only a pre-RNA sequence, whereas a site whose annotated genomic region contains exons yields a mature RNA sequence. Because both pre-RNA and mature RNA sequences were considered in this database, a single modified site can have both sequence types.
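The annotation rule above can be sketched as follows. The `Transcript` record, its 1-based inclusive coordinates, and the assumption that every transcribed site also yields a pre-RNA sequence (so that exonic sites yield both types) are illustrative choices, not the StructRMDB code; strand handling is omitted.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical transcript record built from GTF lines (1-based, inclusive)."""
    name: str
    start: int
    end: int
    exons: list[tuple[int, int]]


def sequence_types(site: int, transcripts: list[Transcript]) -> set[str]:
    """Return the RNA sequence types a modification site yields."""
    covering = [t for t in transcripts if t.start <= site <= t.end]
    if not covering:
        return set()  # intergenic: no RNA sequence is generated
    # Assumption: any site inside a transcript yields a pre-RNA sequence.
    types = {"pre-RNA"}
    if any(s <= site <= e for t in covering for s, e in t.exons):
        types.add("mature RNA")  # exonic in at least one isoform
    return types
```

For a transcript spanning positions 100-500 with exons at 100-200 and 400-500, a site at 300 yields only a pre-RNA sequence, while a site at 150 yields both.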
2.3. Sequence extraction
The genomic coordinates of transcripts in the GTF file were extracted using the R package GenomicFeatures [24] and subsequently used to retrieve transcript sequences from the reference genome. After transcriptome-wide modification sites and their corresponding sequences were obtained, a 401 bp sequence (comprising 200 bp upstream, the modification site, and 200 bp downstream) was extracted for secondary structure prediction. When a modification site mapped to multiple transcript isoforms, the sequence from the longest transcript was preferentially selected. When the full 401 bp sequence could not be obtained, either because the site was close to a transcript terminus or because the transcript was too short (Supplementary Figure 1A), the maximum available sequence length was used instead. This occurred, for example, when sites were located within short non-coding RNAs such as tRNAs and rRNAs.
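The window extraction and isoform selection can be sketched as below, with positions 0-based on the transcript. The function names are hypothetical, and the sketch assumes that a window near a transcript end is simply truncated rather than shifted to keep 401 nucleotides.

```python
def extract_window(transcript_seq: str, site: int, flank: int = 200) -> tuple[str, int]:
    """Return the window around a 0-based site and the site's offset within it.

    Assumption: near transcript ends the window is truncated, not shifted.
    """
    if not 0 <= site < len(transcript_seq):
        raise ValueError("site lies outside the transcript")
    start = max(0, site - flank)
    end = min(len(transcript_seq), site + flank + 1)
    return transcript_seq[start:end], site - start


def pick_longest(isoforms: dict[str, str]) -> str:
    """Name of the longest isoform among those containing the site (first wins on ties)."""
    return max(isoforms, key=lambda name: len(isoforms[name]))
```

With the default flank of 200, an interior site produces a 401-nucleotide window with the site at offset 200; a site 10 nucleotides from the 5' end produces a 211-nucleotide window.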
For mature RNA sequence extraction, the Bioconductor [25] packages GenomicFeatures [24], BSgenome [25], and Biostrings [25] were used to construct the regions of mature RNA. The corresponding exon information was extracted and merged using the species-specific BSgenome object to construct the mature RNA sequence. As with pre-RNA sequence extraction, the longest sequence was prioritized (Supplementary Figure 1B). In this way, each modification site in the genome could have up to two RNA sequences (pre-RNA and mature RNA), each with a corresponding transcript name. In the modified sequences, the base at the modification site was represented by a specific symbol to distinguish them from the original, unmodified sequences.
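A minimal sketch of mature-RNA construction and site marking, written in plain Python instead of the Bioconductor packages used in the pipeline: exons (1-based, inclusive, forward-strand coordinates) are concatenated, minus-strand transcripts are reverse-complemented, and the genomic site is mapped to its mature-RNA index before the base is replaced. The symbol passed to `mark_site` is a placeholder; the actual characters each prediction tool expects for modified bases are not specified here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def mature_rna(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Splice exons (1-based, inclusive, forward-strand coordinates) into a mature RNA."""
    spliced = "".join(genome[s - 1:e].upper() for s, e in sorted(exons))
    if strand == "-":
        spliced = spliced.translate(COMPLEMENT)[::-1]  # reverse complement
    return spliced.replace("T", "U")


def genomic_to_mature(pos: int, exons: list[tuple[int, int]], strand: str) -> int:
    """Map a 1-based genomic position to a 0-based index in the mature RNA."""
    offset = 0
    for start, end in sorted(exons):
        if start <= pos <= end:
            index = offset + (pos - start)
            break
        offset += end - start + 1
    else:
        raise ValueError("position is not within an exon")
    total = sum(end - start + 1 for start, end in exons)
    # On the minus strand the RNA is read from the highest coordinate downwards.
    return index if strand == "+" else total - 1 - index


def mark_site(rna: str, index: int, symbol: str) -> str:
    """Replace the base at `index` with a (hypothetical) modification symbol."""
    return rna[:index] + symbol + rna[index + 1:]
```

Keeping the unmodified string and the marked string side by side gives the two versions of each sequence that are later folded and compared.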
2.4. Analysis and visualization
2.4.1. Prediction and comparison of secondary structure
The extracted sequences were subjected to secondary structure prediction using RNAstructure 6.4 [26] and ViennaRNA 2.6.4 [27]. The ViennaRNA tool is capable of predicting secondary structures containing three types of RNA modifications: m6A, pseudouridine, and adenosine-to-inosine. In contrast, RNAstructure can only predict secondary structures with m6A modifications. To compare the differences between modified and unmodified secondary structures, the predicted secondary structures, provided in the form of dot-bracket files or Connectivity Tables (CT), were input into RNAforester 2.0.1 [27] and RNAsmc 0.8.0 [28] for evaluation. A total of four types of scores were employed for this evaluation: three from RNAforester—namely “similarity”, “relative similarity”, and “distance”—and one from RNAsmc, the SMC score.
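Because RNAforester reads dot-bracket strings while RNAsmc reads CT files, a conversion step links the two formats. The sketch below assumes pseudoknot-free structures using only `(`, `)`, and `.`, and writes a simplified CT layout (a header line with the length and a title, then index, base, previous index, next index, partner, and natural index for each nucleotide); column alignment and the energy annotation written by RNAstructure are omitted.

```python
def pair_table(dot_bracket: str) -> list[int]:
    """1-based partner index for each position (0 = unpaired), pseudoknot-free input."""
    partners = [0] * len(dot_bracket)
    stack: list[int] = []
    for i, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partners[i - 1], partners[j - 1] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return partners


def to_ct(seq: str, dot_bracket: str, title: str = "structure") -> str:
    """Render a simplified, whitespace-separated Connectivity Table."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    partners = pair_table(dot_bracket)
    n = len(seq)
    lines = [f"{n} {title}"]
    for i, (base, partner) in enumerate(zip(seq, partners), start=1):
        next_index = i + 1 if i < n else 0
        lines.append(f"{i} {base} {i - 1} {next_index} {partner} {i}")
    return "\n".join(lines)
```

For instance, `to_ct("GGAACC", "((..))")` pairs position 1 with 6 and 2 with 5, leaving the two adenines unpaired.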
2.5. RNAforester
RNAforester accepts an RNA sequence together with its secondary structure in dot-bracket notation and compares structures by computing alignments under a tree alignment model. The similarity score and the distance were obtained from RNAforester through a global alignment of RNA secondary structures. The scoring schemes are summarized in Table 1.
Table 1.
Scoring schemes for global similarity and distance (indel = insertion–deletion).
| Operation | Global similarity | Distance |
|---|---|---|
| Pair match | 10 | 0 |
| Pair indel | -5 | 3 |
| Base match | 1 | 0 |
| Base replacement | 0 | 1 |
| Base indel | -10 | 2 |
The relative similarity score is derived from the similarity score through the following normalization:

$$RS(x, y) = \frac{2\,S(x, y)}{S(x, x) + S(y, y)}$$

where x and y denote the secondary structures of the two sequences, RS denotes the relative similarity, and S denotes the similarity.
The relative similarity score therefore has a maximum of 1, reached when the two structures are identical, which effectively removes the influence of secondary structure length. The distance score, another evaluation metric provided by RNAforester, increases with the disparity between two structures, in contrast to the similarity score; when two structures are identical, the distance score is 0. Note that the distance score is also influenced by sequence length. Only the optimal alignment score is considered.
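To make Table 1 and the normalization concrete, the sketch below scores a fixed alignment from hypothetical counts of each edit operation under both schemes, then applies the relative similarity formula. It does not compute the tree alignment itself, which is RNAforester's job; the operation counts and self-similarity values in the example are invented for illustration.

```python
# Operation scores from Table 1.
SIMILARITY = {"pair_match": 10, "pair_indel": -5, "base_match": 1,
              "base_replacement": 0, "base_indel": -10}
DISTANCE = {"pair_match": 0, "pair_indel": 3, "base_match": 0,
            "base_replacement": 1, "base_indel": 2}


def score_alignment(op_counts: dict[str, int], scheme: dict[str, int]) -> int:
    """Sum the score of each edit operation in an already computed alignment."""
    return sum(scheme[op] * count for op, count in op_counts.items())


def relative_similarity(s_xy: float, s_xx: float, s_yy: float) -> float:
    """RS(x, y) = 2 S(x, y) / (S(x, x) + S(y, y)); equals 1 for identical structures."""
    return 2 * s_xy / (s_xx + s_yy)


# Hypothetical operation counts for one modified/unmodified alignment.
ops = {"pair_match": 10, "pair_indel": 2, "base_match": 25, "base_replacement": 1}
similarity = score_alignment(ops, SIMILARITY)   # 100 - 10 + 25 + 0 = 115
distance = score_alignment(ops, DISTANCE)       # 0 + 6 + 0 + 1 = 7
rs = relative_similarity(similarity, 130, 128)  # hypothetical self-similarities
```

The same alignment thus receives a similarity that grows with the number of matched pairs and bases, and a distance that grows only with edits, which is why the two scores move in opposite directions.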
2.6. RNAsmc
RNAsmc implements a dynamic alignment strategy based on structural motifs. It accepts a Connectivity Table (CT) file as input and identifies and annotates the structural motifs within the RNA. The output is a score ranging from 0 to 10, where a score of 10 indicates no structural change. Notably, RNAsmc is robust to variations in sequence length [28].
The SMC score is computed over six motif types: bulge loop, external loop, hairpin loop, interior loop, multi-branch loop, and stem. For each motif type, the score compares the sets describing the spatial arrangement of that motif in RNA1 and RNA2, and it also uses the number of motifs of that type in each RNA. The first term is the Jaccard similarity coefficient between the two spatial-arrangement sets, and the second term is a likelihood ratio.
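The Jaccard term can be illustrated on a single motif type. The sketch below is not the RNAsmc score: it extracts only hairpin-loop nucleotides (unpaired positions enclosed by a base pair with no nested pairs) from two pseudoknot-free dot-bracket strings and compares them with the Jaccard coefficient. Treating motif positions as plain index sets is an assumption made for illustration.

```python
def hairpin_loop_positions(dot_bracket: str) -> set[int]:
    """0-based positions of unpaired nucleotides enclosed by a pair with no nested pairs."""
    stack: list[int] = []
    loops: set[int] = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at index {i}")
            j = stack.pop()
            inner = dot_bracket[j + 1:i]
            if inner and set(inner) == {"."}:  # no nested pairs: a hairpin loop
                loops.update(range(j + 1, i))
    if stack:
        raise ValueError("unbalanced '('")
    return loops


def jaccard(a: set[int], b: set[int]) -> float:
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Comparing `"(((...)))"` with `"((.....))"` gives hairpin-loop sets of sizes 3 and 5 sharing 3 positions, so the Jaccard coefficient is 0.6. Extending this to stems, bulges, and the other loop types would require a full loop decomposition of the structure, which RNAsmc performs when it annotates motifs.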
Finally, the RNA secondary structures before and after modification were visualized using RNAplot [29], with the modified position highlighted in the plot.
2.7. RNA secondary structure modification classification
Research on RNA secondary structure is still at a developmental stage, and it remains computationally challenging to determine whether differences in RNA structure affect function. To help users identify RNA structures with substantial differences, this study uses the Relative, Distance, and SMC scores for classification. The global similarity score is omitted because it varies with sequence length and is therefore not robust in this setting. Specifically, sequences with no structural differences (Relative score = 1, Distance = 0, and SMC score = 10) are classified as “No alteration”. Sequences with differences are ranked by each of the three scores. For example, a sequence is classified in the “Top 10 %” category only if all three scores fall within the top 10 % most severe structural changes in the dataset (the Relative and SMC scores are ranked in ascending order, the Distance score in descending order, and the top 10 % of each ranking is selected). Similarly, sequences whose three scores all fall between 10 % and 50 % are classified in the “10 %-50 %” category, and those with all three scores between 50 % and 99.9 % are classified in the “50 %-99.9 %” category (Table 2).
Table 2.
Classification rules for modified RNA secondary structures.
| Category | Criteria | Structural interpretation |
|---|---|---|
| No alteration | Distance = 0; Relative score = 1.0; SMC score = 10 | No change in RNA secondary structure |
| 50 %-99.9 % | Site ranks within 50 %-99.9 % for all of: Relative score, Distance, SMC score | Modest alterations in RNA secondary structure |
| 10 %-50 % | Site ranks within 10 %-50 % for all of: Relative score, Distance, SMC score | Substantial structural changes |
| Top 10 % | Site ranks within the top 10 % for all of: Relative score, Distance, SMC score | The most significant structural alterations |
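The classification rule can be sketched as follows. Ranking is done only among altered sites, with the Relative and SMC scores ranked in ascending order and Distance in descending order; ties are broken by input order. Two points are assumptions not specified in the text: a site whose three percentiles fall into different bins is left unassigned (`None`), and so is a site beyond the 99.9 % cutoff.

```python
from typing import Optional

# Upper percentile bound of each bin, from most to least severe.
BINS = [(10.0, "Top 10 %"), (50.0, "10 %-50 %"), (99.9, "50 %-99.9 %")]


def severity_percentiles(values: list[float], higher_is_severe: bool) -> list[float]:
    """Percentile of each value when ordered from most to least severe (stable on ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_severe)
    pct = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        pct[i] = 100.0 * rank / len(values)
    return pct


def bin_of(pct: float) -> Optional[str]:
    for upper, label in BINS:
        if pct <= upper:
            return label
    return None


def classify(relative: list[float], distance: list[float], smc: list[float]) -> list[Optional[str]]:
    """Assign a category to each site from three parallel score lists."""
    labels: list[Optional[str]] = [None] * len(relative)
    altered = []
    for i in range(len(relative)):
        if relative[i] == 1.0 and distance[i] == 0 and smc[i] == 10:
            labels[i] = "No alteration"
        else:
            altered.append(i)
    if not altered:
        return labels
    # Lower Relative/SMC and higher Distance indicate a more severe change.
    rel_p = severity_percentiles([relative[i] for i in altered], higher_is_severe=False)
    dist_p = severity_percentiles([distance[i] for i in altered], higher_is_severe=True)
    smc_p = severity_percentiles([smc[i] for i in altered], higher_is_severe=False)
    for k, i in enumerate(altered):
        bins = {bin_of(rel_p[k]), bin_of(dist_p[k]), bin_of(smc_p[k])}
        if len(bins) == 1:
            labels[i] = bins.pop()  # all three scores agree (None if beyond 99.9 %)
    return labels
```

Requiring agreement across all three rankings makes each category conservative: a site enters “Top 10 %” only when every metric independently places it among the most severe changes.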
2.8. Molecular interaction annotation
In addition, StructRMDB annotates RNA-binding protein (RBP) binding sites, microRNA (miRNA) target sites, and single nucleotide polymorphisms (SNPs) associated with modification sites. These annotations can help users investigate connections with other epitranscriptomic markers and the roles of modifications in gene expression regulation and disease development. RBP binding sites were obtained from POSTAR3 [30], miRNA target sites were retrieved from starBase v2.0 [31], and SNP information was annotated using Ensembl [23].
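The overlap annotation can be sketched with a sorted interval index queried by binary search. The class name and labels are hypothetical; the sketch assumes 1-based inclusive intervals on a single chromosome and strand, and it is not the procedure used to query POSTAR3, starBase, or Ensembl.

```python
from bisect import bisect_left, bisect_right


class IntervalIndex:
    """Static index of 1-based, inclusive intervals on one chromosome and strand."""

    def __init__(self, intervals: list[tuple[int, int, str]]):
        self.intervals = sorted(intervals)
        self.starts = [start for start, _, _ in self.intervals]
        # The longest interval bounds how far left a containing interval can start.
        self.max_len = max((end - start + 1 for start, end, _ in self.intervals), default=0)

    def hits(self, pos: int) -> list[str]:
        """Labels of all intervals that contain `pos`."""
        lo = bisect_left(self.starts, pos - self.max_len + 1)
        hi = bisect_right(self.starts, pos)
        return [label for start, end, label in self.intervals[lo:hi] if end >= pos]
```

Each query inspects only intervals whose start lies within one maximum interval length of the site, which keeps lookups fast when binding sites are short, as RBP peaks, miRNA seed matches, and SNPs typically are.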
2.9. Database construction
The MySQL Database Management System was applied to store and manage all datasets in StructRMDB. Hypertext Preprocessor (PHP) and JavaScript were used to develop the database queries and user interface. The layout and rendering of the web interface were built using HyperText Markup Language (HTML) and Cascading Style Sheets (CSS). Query results can be visualized in various statistical graph forms using DataTables, ECharts, and HighCharts. JBrowse was implemented to navigate all genomic tracks on the web server.
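As an illustration of how site-level results might be stored and filtered, the sketch below uses Python's built-in `sqlite3` module as a lightweight stand-in for MySQL. The table name, columns, and example identifiers are hypothetical and do not describe the actual StructRMDB schema; the query mirrors the gene, tool, and species filters offered on the web interface.

```python
import sqlite3

# Hypothetical schema: one row per prediction result (site x tool x RNA type).
SCHEMA = """
CREATE TABLE results (
    result_id    TEXT PRIMARY KEY,
    modification TEXT NOT NULL,  -- e.g. 'm6A', 'Psi', 'A-to-I'
    species      TEXT NOT NULL,
    gene_name    TEXT,
    tool         TEXT NOT NULL,  -- 'RNAstructure' or 'ViennaRNA'
    rna_type     TEXT NOT NULL,  -- 'pre-RNA' or 'mature RNA'
    relative     REAL,
    distance     REAL,
    smc          REAL,
    category     TEXT
);
CREATE INDEX idx_results_filter ON results (gene_name, tool, species);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical schema in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_results(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert tuples whose field order matches the column order above."""
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def query_gene(conn: sqlite3.Connection, gene: str, tool: str, species: str) -> list[tuple]:
    """Return (result_id, category) pairs for one gene, mirroring the web filters."""
    cur = conn.execute(
        "SELECT result_id, category FROM results "
        "WHERE gene_name = ? AND tool = ? AND species = ? ORDER BY result_id",
        (gene, tool, species),
    )
    return cur.fetchall()
```

The composite index on gene, tool, and species reflects the assumption that most queries start from a gene name with a chosen tool and species.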
3. Results
StructRMDB, equipped with a user-friendly web interface, facilitates the comprehensive exploration of chemical modifications, RNA secondary structures, and corresponding annotations. The platform offers multiple options for filtering and selecting modification sites, allowing users to tailor their analyses. For instance, users can select results from RNAstructure or ViennaRNA and subsequently choose detailed options for further analysis.
3.1. StructRMDB assesses structural changes induced by several RNA modifications
A total of 791,825 m6A modification sites were collected from m6A-Atlas V2.0. After processing, approximately 1,446,500 unique result IDs were generated in StructRMDB for each prediction tool (RNAstructure and ViennaRNA). Of these, around 740,000 result IDs pertain to pre-RNA and around 700,000 to mature RNA. Detailed information is shown in Table 3. Results for A-to-I editing and pseudouridine modification were generated by ViennaRNA only and include both pre-RNA and mature RNA results; detailed information is provided in Table 4. Annotation information, including RBPs, miRNAs, and SNPs, is shown in Supplementary Table 2.
Table 3.
Statistics of m6A modification sites from m6A-Atlas V2.0 and the corresponding numbers of StructRMDB results for each prediction tool.
| Species | Assembly | Site number | Pre-RNA (RNAstructure) | Pre-RNA (ViennaRNA) | Mature RNA (RNAstructure) | Mature RNA (ViennaRNA) |
|---|---|---|---|---|---|---|
| Homo sapiens | hg38 | 422,730 | 394,631 | 394,594 | 363,242 | 363,212 |
| Mus musculus | mm10 | 266,632 | 261,142 | 261,142 | 252,205 | 252,193 |
| Rattus norvegicus | rn6 | 6144 | 5470 | 5469 | 5378 | 5377 |
| Arabidopsis thaliana | TAIR10 | 35,329 | 24,204 | 24,203 | 23,781 | 23,780 |
| Drosophila melanogaster | BDGP6 | 25,570 | 25,335 | 25,333 | 24,588 | 24,582 |
| Saccharomyces cerevisiae | sacCer3 | 10,560 | 10,555 | 10,553 | 10,520 | 10,518 |
| Danio rerio | GRCz10 | 24,860 | 23,108 | 23,089 | 22,275 | 22,332 |
Table 4.
Statistics of the other two modification types from RMBase v3.0, predicted by ViennaRNA only.
| Species | Assembly | Modification | Site number | Pre-RNA | Mature RNA |
|---|---|---|---|---|---|
| Homo sapiens | hg38 | A-to-I | 142,648 | 127,985 | 29,854 |
| Homo sapiens | hg38 | Pseudouridine | 4835 | 3868 | 3162 |
| Mus musculus | mm10 | A-to-I | 6144 | 7401 | 2486 |
| Mus musculus | mm10 | Pseudouridine | 35,329 | 3528 | 3354 |
| Rattus norvegicus | rn6 | A-to-I | 10 | 4 | 4 |
| Rattus norvegicus | rn6 | Pseudouridine | 1317 | 897 | 861 |
| Bos taurus | bosTau9 | Pseudouridine | 381 | 68 | 5 |
| Oryctolagus cuniculus | OryCun2 | Pseudouridine | 225 | 2 | 0 |
We computed pairwise Pearson correlation coefficients among the four scores across the entire dataset (Fig. 3A). The absolute values of the correlation coefficients between each pair of scores were above 0.9, indicating a strong relationship among these scores. This result suggests that a classification based on the combination of these scores reflects the status of the RNA secondary structure. As shown in Fig. 3B, the score distribution within the “Top 10 %” category highlights sites that may lead to significant structural changes. Under this classification method, the results from RNAstructure and ViennaRNA showed a narrower gap.
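The correlation analysis can be reproduced on any table of per-site scores with a few lines of code. The sketch below computes pairwise Pearson coefficients in plain Python; the four short score vectors are toy values, not StructRMDB data.

```python
from itertools import product
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(scores: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """All pairwise Pearson coefficients, keyed by (score_a, score_b)."""
    return {(a, b): pearson(scores[a], scores[b]) for a, b in product(scores, repeat=2)}


# Toy per-site scores (not StructRMDB data).
toy = {
    "similarity": [120, 95, 60, 30],
    "relative": [1.0, 0.8, 0.5, 0.25],
    "distance": [0, 4, 10, 16],
    "smc": [10, 8, 5, 2],
}
matrix = correlation_matrix(toy)
```

Because Distance grows as structures diverge while the other three scores shrink, its coefficients with them are negative; this is why the analysis compares absolute values.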
Fig. 3.
Comparison of RNA secondary structure based on RNAforester and RNAsmc. (A) Heatmap showing the Pearson correlation coefficients among four structure-comparison scores: similarity, relative, distance, and SMC scores. The coefficients quantify the degree of linear correlation between each pair of metrics, where values close to 1 or –1 indicate strong positive or negative relationships, respectively. (B) Boxplots showing the distribution of the four scores within the “10 % alteration” structural category, computed separately using RNAstructure and ViennaRNA predictions.
3.1.1. Case study on gene METTL3
Methyltransferase-like 3 (METTL3) is the best-known m6A methyltransferase and plays a crucial role in the reversible epitranscriptomic regulation of m6A modification [32]. A previous study indicated that m6A can affect RNA secondary structure in METTL3-knockout cells, an effect that may result from the structural selectivity of the m6A modification machinery for unpaired bases [33]. In StructRMDB, users can search for "METTL3" in the "Gene name" search box after selecting a prediction tool, such as RNAstructure, and the species "Mus musculus". Additional filtering options on the webpage allow users to quickly retrieve results that meet their requirements (Fig. 4A). Based on the selection, the webpage generates statistical graphs and returns the filtered modification sites (Fig. 4B and Fig. 4C). Clicking a specific site ID (e.g., m6A_mm10_StruRM_RNAstructure_345677) displays detailed information such as the primary structure, secondary structure, and data source (Fig. 4D). Additionally, RBP, miRNA, and SNP information is integrated into the website to assist in exploring the post-transcriptional machinery. For the site m6A_mm10_StruRM_RNAstructure_345677, the database indicates that it is located within the interaction range of 6 miRNAs (Fig. 4E). A visualization at the bottom of the page helps users understand the effect of this site on RNA secondary structure more intuitively (Fig. 4F). Users can also click “JBrowse” to view the modification site in the genome browser (Fig. 4G).
Fig. 4.
The m6A site of gene Mettl3 and related information in StructRMDB. (A) Filtering options for modification sites. (B) Statistical graphs returned for the chosen settings. (C) Basic information on the filtered sites. (D) Site details, including the primary structure, secondary structure, and data source. (E) RBP, miRNA, and SNP information associated with the modification site; shown here are the miRNAs associated with site m6A_mm10_StruRM_RNAstructure_345677. (F) Visualization of site m6A_mm10_StruRM_RNAstructure_345677. (G) JBrowse view of site m6A_mm10_StruRM_RNAstructure_345677 (m6A_mm10_262588).
3.1.2. Case study on MALAT1
An m6A site located within a hairpin stem of the human long noncoding RNA Metastasis Associated Lung Adenocarcinoma Transcript 1 (MALAT1) was identified by Liu et al. [12]. In StructRMDB, this site corresponds to m6A_hg38_StruRM_RNAstructure_692104 (RNAstructure) and m6A_hg38_StruRM_vienna_692047 (ViennaRNA). As shown in Fig. 5A, the m6A modification occurs at position 2577 and is highlighted in red. Both prediction programs reproduced the hairpin stem structure in the original, unmodified sequence (Figs. 5B and 5C). After introduction of the m6A modification, the secondary structure predicted by RNAstructure changed, whereas ViennaRNA retained the original conformation. Notably, both prediction tools yielded smaller absolute MFE values after m6A modification, suggesting reduced structural stability [34]. This observation is consistent with experimental findings from Liu et al. [12], who demonstrated that m6A residues within RNA stems can destabilize RNA duplexes. Specifically, m6A at position 2577 in MALAT1 tends to disrupt base stacking and pairing interactions, thereby decreasing local thermodynamic stability, an effect that aligns well with our computational predictions. In addition, HNRNPC, which recognizes m⁶A-induced RNA structural rearrangements through the “m6A-switch” mechanism described by Liu et al. [9], was identified in StructRMDB’s RBP annotation section. This allows users to conveniently access relevant information and facilitates the generation of more concrete biological insights from the data.
Fig. 5.
The m6A modification on the MALAT1 gene destabilizes its RNA secondary structure, and the corresponding information in StructRMDB. (A) A validated secondary structure of the MALAT1 hairpin, with the m6A site at position 2577 highlighted in red. (B) The local secondary structure and MFE values before and after the m6A modification, obtained using RNAstructure (m6A_hg38_StruRM_RNAstructure_692104). (C) The local secondary structure and MFE values before and after the m⁶A modification, obtained using ViennaRNA (m6A_hg38_StruRM_vienna_692047). Since the pre-RNA and mature RNA share the same sequence at this site, only one is shown.
4. Discussion
The prediction of RNA secondary structure is crucial for understanding RNA function, as structure largely determines RNA stability, localization, and intermolecular interactions. Accurate structural modelling is also vital for computational design strategies, as demonstrated by recent work showing that modelling the target secondary structure can significantly improve nucleic acid library design and screening efficiency [35]. However, most existing predictive tools share a significant limitation: they fail to adequately account for the influence of RNA modifications on secondary structure. Against this backdrop, RNAstructure and ViennaRNA have emerged as two of the few tools capable of handling large-scale datasets while incorporating modification-specific thermodynamic parameters. This combination of features makes them the most practical options for modification-inclusive predictions at present, despite their inherent limitations. In StructRMDB, we currently focus on the effects of three well-characterized RNA modifications, m6A, pseudouridine (Ψ), and adenosine-to-inosine (A-to-I) editing, on RNA secondary structures. Because these modifications have been experimentally validated in previous research, our predictions carry greater reliability and biological relevance [17], [27].
While StructRMDB serves as a valuable platform for exploring the effects of RNA modifications on secondary structure, its current version has limitations in data coverage, algorithmic integration, and model complexity, which also point to directions for future improvement. First, the scope and depth of the database can be expanded. Future versions could incorporate additional modification types, such as 5-methylcytosine (m5C), N7-methylguanosine (m7G), and N1-methyladenosine (m1A), and broaden the range of included species. Second, discrepancies between prediction tools are an unavoidable challenge. Different software packages use distinct built-in thermodynamic parameters, which can lead to different predicted structures for the same modified sequence; for example, RNAstructure and ViennaRNA may yield different results depending on how each model incorporates modification-induced energy changes. At present, StructRMDB provides separate outputs for each algorithm, as reconciling these methodological differences is not yet feasible. Third, the way the underlying algorithms model modified bases remains imperfect and may introduce inaccuracies. Errors may arise from limitations in the underlying thermodynamic data, particularly data derived from optical melting experiments on small synthetic oligonucleotides [36], or from insufficient experimental evidence for certain modification contexts.
At the algorithmic level, the core limitation lies in the simplified single-modification assumption and limited data coverage. The lack of experimentally determined thermodynamic parameters for different combinations of modifications restricts structural predictions to a single modification type per RNA sequence [18]. This simplified model fails to capture the synergistic or antagonistic effects of multiple co-occurring modifications, which are biologically prevalent in molecules such as tRNAs and rRNAs [37]. Meanwhile, it is also important to develop specialized algorithms for different types of RNA that account for modifications. For example, accurately predicting the secondary structure of lncRNAs remains challenging because pseudoknots remain a major obstacle [38]. Addressing these issues relies on two primary strategies. (1) The field needs innovation in predictive tools and experimental validation to expand the range of modifications that can be accurately modelled. The recent inclusion of RNA structure prediction in AlphaFold 3 [39], for instance, presents a promising avenue for the future development of tools capable of predicting RNA secondary structures that incorporate modifications. (2) It is crucial to integrate high-resolution datasets from technologies such as nanopore sequencing and high-resolution mass spectrometry, which provide site-specific information on coexisting modifications and thereby the data foundation for constructing complex energy models [40], [41]. Although the accuracy of nanopore sequencing requires further improvement, its unique principle, distinct from that of second-generation sequencing, holds promise for resolving challenges related to multiple coexisting modifications and transcript isoform ambiguity, thereby enhancing the accuracy of RNA structure prediction in complex modification contexts.
At the methodological level, technical challenges remain, particularly sequence length bias and the approach used to classify structural alterations. Although the default sequence length of 401 nt is not necessarily optimal, it does not appear to substantially reduce prediction accuracy. Previous studies report that for sequences of ≤ 700 nt, minimum free energy (MFE)-based methods reach an average prediction accuracy of approximately 73% [38]. Moreover, our case study on MALAT1 reproduced the experimentally validated local secondary structure even though the analysed sequence length differed from that of the original study [12]. Nevertheless, the influence of sequence length should not be overlooked. Both the overall RNA length and the position of the modified base within the sequence can affect prediction outcomes at multiple stages of analysis. For instance, in the experiments used for parameter testing, the chemical stability of phosphodiester bonds in certain oligoribonucleotides has been shown to vary with sequence length [42], suggesting that sequence-dependent physicochemical properties may partly account for prediction variability. Currently, there is no established benchmark for defining categories of structural alteration. Our classification method therefore offers users a practical way to distinguish modification sites associated with structural changes and to efficiently identify those likely to have a substantial impact. Future versions could adopt more robust methods for assessing structural alterations and further refine the classification.
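The effect of the modified base's position can be illustrated with a window-extraction sketch. We assume, as an illustration not stated above, that the 401-nucleotide default corresponds to 200 nt of flanking sequence on each side of the modified site. Under that assumption, a site near a transcript end yields a truncated window in which the modified base is no longer centred. The function name and the truncate-rather-than-pad behaviour are hypothetical; StructRMDB may handle boundary sites differently.

```python
def extract_window(sequence: str, site: int, flank: int = 200) -> tuple[str, int]:
    """Return the subsequence around a 0-based `site` and the site's offset within it.

    Assumption (hypothetical): flank=200 so a full window spans 2 * 200 + 1 = 401 nt.
    Near transcript ends the window is truncated rather than padded, so the
    modified base is no longer centred.
    """
    if not 0 <= site < len(sequence):
        raise IndexError("site lies outside the sequence")
    start = max(0, site - flank)
    end = min(len(sequence), site + flank + 1)
    return sequence[start:end], site - start


transcript = "ACGU" * 250  # hypothetical 1,000-nt transcript
for site in (500, 50, 990):
    window, offset = extract_window(transcript, site)
    print(site, len(window), offset)
```

For this hypothetical 1,000-nt transcript, a site at position 500 yields a full 401-nt window with the site at offset 200, whereas a site at position 50 yields only a 251-nt window with the site at offset 50. Predictions for such boundary sites are therefore made on shorter, asymmetric contexts.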
In summary, the future objectives for StructRMDB are to expand its repertoire of modification models, incorporate additional modification types and species to provide broader insights into modification–structure relationships, and integrate a wider range of modification-aware tools for comprehensive structural evaluation. Improved assessment methods could also yield more scientifically grounded interpretations of structural alterations. By systematically addressing these limitations, StructRMDB is poised to become a more comprehensive and reliable resource for elucidating how RNA modifications regulate structural dynamics, thereby facilitating the discovery of their potential biological functions.
CRediT authorship contribution statement
Jia Meng: Writing – review & editing, Supervision, Resources, Project administration, Funding acquisition. Jingxian Zhou: Writing – review & editing. Xuan Wang: Visualization, Resources, Methodology. Ziyan Zhang: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation. Jiongming Ma: Writing – review & editing, Validation, Supervision, Project administration, Methodology, Investigation. Yuheng Cai: Writing – review & editing, Supervision. Yuxin Liang: Visualization. Bowen Song: Writing – review & editing, Supervision.
Funding
National Natural Science Foundation of China [31671373]; XJTLU Key Program Special Fund [KSF-E-51 and KSF-P-02]. This work was supported by the Supercomputing Platform of Xi’an Jiaotong-Liverpool University.
Declaration of Competing Interest
The authors declare no competing interests.
Acknowledgements
Author contributions: Jia Meng conceived the study and initiated the project. Ziyan Zhang collected and analysed the epitranscriptomic data under the supervision of Jiongming Ma, who provided guidance on analytical challenges. Xuan Wang designed and implemented the StructRMDB website. Ziyan Zhang, Jingxian Zhou and Jiongming Ma drafted the manuscript. Yuheng Cai and Jiongming Ma supervised the project. All authors reviewed, critically revised, and approved the final version.
Footnotes
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.csbj.2025.11.058.
Contributor Information
Yuheng Cai, Email: Yuheng.Cai@liverpool.ac.uk.
Jiongming Ma, Email: J.Ma39@liverpool.ac.uk.
Appendix A. Supplementary material
References
1. Cai C., Yu J., Zhang X., Zhou T., Chen Q. A model for propagation of RNA structural memory through biomolecular condensates. Nat Cell Biol. 2025;27:1381–1386. doi: 10.1038/s41556-025-01736-4.
2. Hull C.M., Bevilacqua P.C. Discriminating self and non-self by RNA: roles for RNA structure, misfolding, and modification in regulating the innate immune sensor PKR. Acc Chem Res. 2016;49:1242–1249. doi: 10.1021/acs.accounts.6b00151.
3. Pyle A.M. Ribozymes: a distinct class of metalloenzymes. Science. 1993;261:709–714. doi: 10.1126/science.7688142.
4. Zhang J., Fei Y., Sun L., Zhang Q.C. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat Methods. 2022;19:1193–1207. doi: 10.1038/s41592-022-01623-y.
5. Wan Y., Kertesz M., Spitale R.C., Segal E., Chang H.Y. Understanding the transcriptome through RNA structure. Nat Rev Genet. 2011;12:641–655. doi: 10.1038/nrg3049.
6. Georgakopoulos-Soares I., Parada G.E., Hemberg M. Secondary structures in RNA synthesis, splicing and translation. Comput Struct Biotechnol J. 2022;20:2871–2884. doi: 10.1016/j.csbj.2022.05.041.
7. Zhao Q., Zhao Z., Fan X., Yuan Z., Mao Q., Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol. 2021;17. doi: 10.1371/journal.pcbi.1009291.
8. Cicconetti C., Lauria A., Proserpio V., Masera M., Tamburrini A., Maldotti M., Oliviero S., Molineris I. 3plex enables deep computational investigation of triplex forming lncRNAs. Comput Struct Biotechnol J. 2023;21:3091–3102. doi: 10.1016/j.csbj.2023.05.016.
9. Sridhar A., More A.S., Jadhav A.R., Patil K., Mavlankar A., Dixit V.M., Bapat S.A. Pattern recognition in the landscape of seemingly random chimeric transcripts. Comput Struct Biotechnol J. 2023;21:5153–5164. doi: 10.1016/j.csbj.2023.10.028.
10. Krautwurst S., Lamkiewicz K. RNA-protein interaction prediction without high-throughput data: An overview and benchmark of in silico tools. Comput Struct Biotechnol J. 2024;23:4036–4046. doi: 10.1016/j.csbj.2024.11.015.
11. Choi Y.J., Gibala K.S., Ayele T., Deventer K.V., Resendiz M.J.E. Biophysical properties, thermal stability and functional impact of 8-oxo-7,8-dihydroguanine on oligonucleotides of RNA-a study of duplex, hairpins and the aptamer for preQ1 as models. Nucleic Acids Res. 2017;45:2099–2111. doi: 10.1093/nar/gkw885.
12. Liu N., Dai Q., Zheng G., He C., Parisien M., Pan T. N(6)-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature. 2015;518:560–564. doi: 10.1038/nature14234.
13. Dominissini D., Nachtergaele S., Moshitch-Moshkovitz S., Peer E., Kol N., Ben-Haim M.S., Dai Q., Di Segni A., Salmon-Divon M., Clark W.C., et al. The dynamic N(1)-methyladenosine methylome in eukaryotic messenger RNA. Nature. 2016;530:441–446. doi: 10.1038/nature16998.
14. Wang Y., Wang S., Meng Z., Liu X.M., Mao Y. Determinant of m6A regional preference by transcriptional dynamics. Nucleic Acids Res. 2024;52:3510–3521. doi: 10.1093/nar/gkae169.
15. Cui L., Ma R., Cai J., Guo C., Chen Z., Yao L., Wang Y., Fan R., Wang X., Shi Y. RNA modifications: importance in immune cell biology and related diseases. Signal Transduct Target Ther. 2022;7:334. doi: 10.1038/s41392-022-01175-9.
16. Mathews D.H. Revolutions in RNA secondary structure prediction. J Mol Biol. 2006;359:526–532. doi: 10.1016/j.jmb.2006.01.067.
17. Kierzek E., Zhang X., Watson R.M., Kennedy S.D., Szabat M., Kierzek R., Mathews D.H. Secondary structure prediction for RNA sequences including N(6)-methyladenosine. Nat Commun. 2022;13:1271. doi: 10.1038/s41467-022-28817-4.
18. Varenyk Y., Spicher T., Hofacker I.L., Lorenz R. Modified RNAs and predictions with the ViennaRNA package. Bioinformatics. 2023;39. doi: 10.1093/bioinformatics/btad696.
19. Roundtree I.A., Evans M.E., Pan T., He C. Dynamic RNA modifications in gene expression regulation. Cell. 2017;169:1187–1200. doi: 10.1016/j.cell.2017.05.045.
20. Liang Z., Ye H., Ma J., Wei Z., Wang Y., Zhang Y., Huang D., Song B., Meng J., Rigden D.J., Chen K. m6A-Atlas v2.0: updated resources for unraveling the N6-methyladenosine (m6A) epitranscriptome among multiple species. Nucleic Acids Res. 2024;52:D194–D202. doi: 10.1093/nar/gkad691.
21. Xuan J., Chen L., Chen Z., Pang J., Huang J., Lin J., Zheng L., Li B., Qu L., Yang J. RMBase v3.0: decode the landscape, mechanisms and functions of RNA modifications. Nucleic Acids Res. 2024;52:D273–D284. doi: 10.1093/nar/gkad1070.
22. Frankish A., Diekhans M., Jungreis I., Lagarde J., Loveland J.E., Mudge J.M., Sisu C., Wright J.C., Armstrong J., Barnes I., et al. Gencode 2021. Nucleic Acids Res. 2021;49:D916–D923. doi: 10.1093/nar/gkaa1087.
23. Harrison P.W., Amode M.R., Austine-Orimoloye O., Azov A.G., Barba M., Barnes I., Becker A., Bennett R., Berry A., Bhai J., et al. Ensembl 2024. Nucleic Acids Res. 2024;52:D891–D899. doi: 10.1093/nar/gkad1049.
24. Lawrence M., Huber W., Pages H., Aboyoun P., Carlson M., Gentleman R., Morgan M.T., Carey V.J. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9. doi: 10.1371/journal.pcbi.1003118.
25. Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5. doi: 10.1186/gb-2004-5-10-r80.
26. Reuter J.S., Mathews D.H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinforma. 2010;11:129. doi: 10.1186/1471-2105-11-129.
27. Lorenz R., Bernhart S.H., Honer Zu Siederdissen C., Tafer H., Flamm C., Stadler P.F., Hofacker I.L. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011;6:26. doi: 10.1186/1748-7188-6-26.
28. Wang H., Lu X., Zheng H., Wang W., Zhang G., Wang S., Lin P., Zhuang Y., Chen C., Chen Q., et al. RNAsmc: A integrated tool for comparing RNA secondary structure and evaluating allosteric effects. Comput Struct Biotechnol J. 2023;21:965–973. doi: 10.1016/j.csbj.2023.01.007.
29. Ponty Y., Leclerc F. Drawing and editing the secondary structure(s) of RNA. Methods Mol Biol. 2015;1269:63–100. doi: 10.1007/978-1-4939-2291-8_5.
30. Zhao W., Zhang S., Zhu Y., Xi X., Bao P., Ma Z., Kapral T.H., Chen S., Zagrovic B., Yang Y.T., Lu Z.J. POSTAR3: an updated platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins. Nucleic Acids Res. 2022;50:D287–D294. doi: 10.1093/nar/gkab702.
31. Li J.H., Liu S., Zhou H., Qu L.H., Yang J.H. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42:D92–97. doi: 10.1093/nar/gkt1248.
32. Liu S., Zhuo L., Wang J., Zhang Q., Li Q., Li G., Yan L., Jin T., Pan T., Sui X., et al. METTL3 plays multiple functions in biological processes. Am J Cancer Res. 2020;10:1631–1646.
33. Spitale R.C., Flynn R.A., Zhang Q.C., Crisalli P., Lee B., Jung J.W., Kuchelmeister H.Y., Batista P.J., Torre E.A., Kool E.T., Chang H.Y. Structural imprints in vivo decode RNA regulatory mechanisms. Nature. 2015;519:486–490. doi: 10.1038/nature14263.
34. Liu B., Merriman D.K., Choi S.H., Schumacher M.A., Plangger R., Kreutz C., Horner S.M., Meyer K.D., Al-Hashimi H.M. A potentially abundant junctional RNA motif stabilized by m(6)A and Mg(2+). Nat Commun. 2018;9:2761. doi: 10.1038/s41467-018-05243-z.
35. Chen L., Zhang B., Wu Z., Liu G., Li W., Tang Y. In Silico discovery of aptamers with an enhanced library design strategy. Comput Struct Biotechnol J. 2023;21:1005–1013. doi: 10.1016/j.csbj.2023.01.002.
36. Zuber J., Cabral B.J., McFadyen I., Mauger D.M., Mathews D.H. Analysis of RNA nearest neighbor parameters reveals interdependencies and quantifies the uncertainty in RNA secondary structure prediction. RNA. 2018;24:1568–1582. doi: 10.1261/rna.065102.117.
37. Jackman J.E., Alfonzo J.D. Transfer RNA modifications: nature's combinatorial chemistry playground. Wiley Inter Rev RNA. 2013;4:35–48. doi: 10.1002/wrna.1144.
38. Ballarino M., Pepe G., Helmer-Citterich M., Palma A. Exploring the landscape of tools and resources for the analysis of long non-coding RNAs. Comput Struct Biotechnol J. 2023;21:4706–4716. doi: 10.1016/j.csbj.2023.09.041.
39. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A.J., Bambrick J., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi: 10.1038/s41586-024-07487-w.
40. Lucas M.C., Pryszcz L.P., Medina R., Milenkovic I., Camacho N., Marchand V., Motorin Y., Ribas de Pouplana L., Novoa E.M. Quantitative analysis of tRNA abundance and modifications by nanopore RNA sequencing. Nat Biotechnol. 2024;42:72–86. doi: 10.1038/s41587-023-01743-6.
41. Yuan X., Su Y., Johnson B., Kirchner M., Zhang X., Xu S., Jiang S., Wu J., Shi S., Russo J.J., et al. Mass Spectrometry-Based Direct Sequencing of tRNAs De Novo and Quantitative Mapping of Multiple RNA Modifications. J Am Chem Soc. 2024;146:25600–25613. doi: 10.1021/jacs.4c07280.
42. Kierzek R. Hydrolysis of oligoribonucleotides: influence of sequence and length. Nucleic Acids Res. 1992;20:5073–5077. doi: 10.1093/nar/20.19.5073.