. 2025 Sep 22;41(11):btaf527. doi: 10.1093/bioinformatics/btaf527

TVAE-RNA: ensemble-based RNA secondary structure prediction via transformer variational autoencoders

Xiyuan Mei 1, Hanbo Liu 2, Yuheng Zhu 3, Enshuang Zhao 4, Longyi Li 5, Hao Zhang 6
Editor: Jianlin Cheng
PMCID: PMC12640237  PMID: 40981507

Abstract

Motivation

Accurate prediction of RNA secondary structure remains challenging due to the presence of pseudoknots, long-range dependencies, and limited labeled data.

Results

We propose TVAE, a novel framework that integrates a Transformer encoder with a Variational Autoencoder (VAE). The Transformer captures global dependencies in the sequence, while the VAE models structural variability by learning a probabilistic latent space. Unlike deterministic models, TVAE generates diverse and biologically plausible secondary structures, enabling more comprehensive structure discovery. To obtain discrete predictions, we introduce GHA-Pairing, a fast and biologically constrained base-pairing algorithm. TVAE demonstrates strong generalization across different RNA families and achieves state-of-the-art performance on benchmark datasets, reaching an F1 score of 0.89 and 83% accuracy, surpassing existing methods by 10%. These results highlight the advantage of probabilistic modeling for RNA structure prediction and its potential to enhance biological insights.

Availability and implementation

Code and pretrained models are available at https://github.com/mei-rna/TVAE-RNA. The released version of the dataset and models can also be accessed via DOI: 10.5281/zenodo.16946114.

Graphical abstract


1 Introduction

RNA plays essential roles in gene regulation (Cao et al. 2024), translation (Holbrook 2005), and genome maintenance (Wan et al. 2011). Its biological functions are tightly linked to secondary structures (Higgs 2000), which govern interactions with proteins, ions, and small molecules (Svoboda and Cara 2006). Therefore, accurate prediction of RNA secondary structure is critical for understanding RNA biology (Butcher and Pyle 2011) and enabling therapeutic design (Gebauer et al. 2021).

Traditional methods, including dynamic programming algorithms (Nussinov and Jacobson 1980, Do et al. 2006), thermodynamic models (Reuter and Mathews 2010, Lorenz et al. 2011), and chemical probing techniques (Hajdin et al. 2013, Ding et al. 2014, Tian and Das 2016), face limitations in handling long sequences, pseudoknots, and data scarcity. Deep learning approaches, especially Transformers (Zhao et al. 2021, Han et al. 2023) and VAEs (Bao et al. 2017, Burgess et al. 2018, Dai and Wipf 2019), have made significant progress by capturing long-range dependencies and modeling structural uncertainty. Hybrid models such as KnotFold (Gong et al. 2024), SPOT-RNA (Singh et al. 2019), SPOT-RNA2 (Singh et al. 2021), RNAMoIP (Loyer and Reinharz 2024), and UFold (Fu et al. 2022) further improve performance through architectural integration or data fusion (Sato and Hamada 2023).

However, most existing models remain deterministic, predicting a single “optimal” structure, despite strong experimental evidence that RNA molecules adopt diverse structures under physiological conditions (Zhang et al. 2024). This limits their biological realism and therapeutic applicability.

To address these challenges, we propose TVAE, a Transformer-VAE hybrid that combines global sequence modeling with probabilistic latent representations. While our approach focuses on secondary structure diversity, we acknowledge that complete RNA structure ensembles involve tertiary structure dynamics, ligand-induced conformational changes, and environmental factors. Our work represents a first step toward ensemble-based prediction at the secondary structure level. TVAE generates diverse (Zhang et al. 2023), biologically plausible structures per RNA sequence (Cordero and Das 2015), capturing structural uncertainty and improving predictions for complex features like pseudoknots and non-canonical pairs.

We further introduce GHA-Pairing, a scalable post-processing algorithm for translating probabilistic outputs into discrete, interpretable secondary structures while preserving biological constraints.

In summary, TVAE advances RNA structure prediction by shifting from deterministic to diversity-aware modeling, offering a robust and biologically grounded framework for both basic and translational RNA research.

2 Materials and methods

2.1 Datasets and preprocessing

Our 150k pooled corpus includes sequences that overlap with the standard RNA-Strand and Rfam benchmark sets. To avoid any leakage, we treated these benchmarks as held-out evaluation resources: entries belonging to the standard RNA-Strand test set and to Rfam families designated for benchmarking were excluded from the training and validation splits of BRNdata. We additionally report performance on these held-out benchmarks to verify robustness. Following common practice for reducing homologous bias in RNA benchmarks, we applied CD-HIT-EST at 80% identity to balance the removal of near-duplicate sequences with retention of family-level diversity. Detailed filtering statistics (counts retained and removed at each step; length distributions before and after) are provided in Tables 1 and 2, available as supplementary data at Bioinformatics online, to justify the ∼87% reduction.

In this study, RNA sequences were collected from three public databases: bpRNA-1m (Danaee et al. 2018), Rfam (Griffiths-Jones et al. 2003), and lncRNAdb (Amaral et al. 2011), totaling over 150 000 entries across 4000+ RNA families. To improve model generalization, sequences were aggregated and processed via a standardized pipeline. Low-quality entries were removed, and FASTA files were reformatted uniformly. Redundancy was reduced using CD-HIT-EST (Li and Godzik 2006) (≥80% identity), and homologous sequences were filtered using BLAST-N (Altschul et al. 1997) (E-value ≤10). After deduplication, platform consistency analysis showed no significant batch effects, allowing for simple integration. From the cleaned data, 30% were sampled to construct BRNdata, a refined dataset with 18 328 unique RNA sequences.

2.2 RNA-FM embedding model

We employed RNA-FM (Chen et al. 2022), a 12-layer Transformer encoder pretrained with masked nucleotide prediction (15% masking rate), to extract RNA sequence embeddings. The model architecture (640-dimensional feedforward networks, 20-head self-attention) generates contextualized embeddings (L × 640 matrix) that simultaneously encode positional, structural, and functional features at nucleotide resolution.

2.3 Transformer-based variational autoencoder model

To capture long-range dependencies and structural variability in RNA, we propose TVAE, a Transformer-based Variational Autoencoder. It processes RNA-FM embeddings through a shared Transformer encoder–decoder architecture, augmented with dynamic attention spans and relative position encoding. A latent variable is sampled via the reparameterization trick and used to reconstruct base-pairing matrices. The overall workflow is outlined in Algorithm 1.

Algorithm 1: Transformer-VAE Workflow with Dynamic Attention and Relative Positional Encoding.

Input: RNA-FM embedding E ∈ ℝ^{L×640}

Output: Predicted base-pairing probability matrix P ∈ ℝ^{L×L}

1. Embedding Projection

   H ← Linear(E) → Activation → LayerNorm

2. Transformer Encoder with Dynamic Attention

   For each Transformer encoder layer:

    a. For each position i:

     - Compute dynamic attention span:

     span ← min(S_max, 1 + growth_rate * i)

     - Define attention mask:

     mask[i] ← [j for j in range(i - span, i + span + 1)]

   b. For each (i, j) in attention:

    # Relative position bias (proposed)

    - Compute attention score with relative bias:

    AttnScore[i][j] ← Q[i] • K[j] + RelBias[i - j]

   c. Apply Multi-head Attention + Residual + FFN + LayerNorm

3. Latent Variable Sampling (VAE Module)

   - Compute mean and variance:

     Z_mean, Z_logvar ← Linear(H)

   - Reparameterization:

     Sample Z ← Z_mean + ε * exp(0.5 * Z_logvar)

4. Transformer Decoder

   For each Transformer decoder layer:

     Apply Multi-head Attention + Residual + FFN + LayerNorm

5. Output Layer

   - Compute base-pairing probabilities:

     Output base-pairing matrix ← Sigmoid(Linear(Z))

Algorithm 1. The workflow integrates two novel mechanisms to enhance RNA structure modeling:

  1. Dynamic Attention Span, which adapts attention windows based on nucleotide position to balance locality and efficiency in long sequences;

  2. Relative Positional Encoding, which adds learnable distance-aware bias to attention scores, improving the modeling of RNA folding constraints.

These innovations allow the model to better capture long-range and context-specific interactions critical to secondary structure prediction.
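The two mechanisms can be sketched in NumPy as follows. This is an illustrative sketch, not the paper's implementation: the helper names (`dynamic_span_mask`, `biased_scores`) and the √d score scaling are our assumptions.

```python
import numpy as np

def dynamic_span_mask(L, s_max, growth_rate):
    """Boolean attention mask: position i attends only within its dynamic span."""
    mask = np.zeros((L, L), dtype=bool)
    for i in range(L):
        # span grows with position, capped at S_max (as in Algorithm 1, step 2a)
        span = int(min(s_max, 1 + growth_rate * i))
        mask[i, max(0, i - span):min(L, i + span + 1)] = True
    return mask

def biased_scores(Q, K, rel_bias):
    """Dot-product attention scores plus a learnable relative-position bias.

    rel_bias has length 2L - 1 and is indexed by the offset (i - j) + (L - 1),
    mirroring AttnScore[i][j] = Q[i] . K[j] + RelBias[i - j] in Algorithm 1.
    """
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    offsets = np.arange(L)[:, None] - np.arange(L)[None, :]  # i - j
    return scores + rel_bias[offsets + (L - 1)]
```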

An overview of the Transformer-based architecture is illustrated in Fig. 1, which shows the full data flow of our proposed TVAE framework. The model takes RNA-FM embeddings as input and applies a shared Transformer encoder–decoder backbone, augmented with a dynamic attention span mechanism and relative positional encoding. The encoder generates contextual representations and estimates the latent distribution parameters (mean and variance), from which a latent variable is sampled. The decoder reconstructs a base-pairing matrix from the sampled latent variable. This architecture enables diverse and biologically plausible RNA secondary structure predictions.

Figure 1.


The Transformer architecture includes a linear layer, multi-head self-attention, feedforward network, and normalization.

Beyond the Transformer architecture, TVAE introduces a variational inference component to model structural diversity. As illustrated in Fig. 2, the encoder outputs are mapped to the mean and variance of a latent variable distribution, from which a latent vector is sampled using the reparameterization trick. This sampled latent representation captures the uncertainty and variability of RNA folding and is subsequently decoded to reconstruct the base-pairing probability matrix. The variational component enables the generation of diverse yet biologically plausible RNA secondary structures.
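The sampling step itself is compact; a minimal NumPy sketch of the reparameterization trick (the function name and the explicit random generator are our own, as the paper does not specify an API):

```python
import numpy as np

def sample_latent(z_mean, z_logvar, rng):
    """Reparameterization trick: z = mu + eps * sigma, with eps ~ N(0, I).

    Sampling a fresh eps for each draw yields a different latent vector,
    and hence a different decoded base-pairing matrix, for the same input.
    """
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + eps * np.exp(0.5 * z_logvar)
```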

Figure 2.


The VAE uses Transformer-based encoder and decoder. The encoder generates latent variable Z, which the decoder uses to predict RNA secondary structure.

In summary, the TVAE framework integrates a Transformer-based encoder–decoder backbone with variational inference, enabling the model to capture both global sequence dependencies and structural uncertainty. The incorporation of dynamic attention span and relative positional encoding enhances modeling of RNA-specific structural constraints, while the probabilistic latent space allows for diverse and biologically meaningful predictions. This architectural design lays the foundation for accurate, robust, and diversity-aware RNA secondary structure modeling.

2.4 Loss function

The model is trained end-to-end using a loss function combining reconstruction loss and KL divergence to capture RNA’s probabilistic structure.

The KL divergence is used to measure the distance between the latent variable distribution q(z|x) = N(μ, σ²) generated by the encoder and the standard normal distribution p(z) = N(0, I). The expression is as follows:

L_KL = (1/2) Σ_{i=1}^{L} (1 + log σ_i² − μ_i² − σ_i²) (1)

We apply masking to exclude padding regions and ensure numerical stability to prevent NaNs from log or float errors. The KL loss is batch-averaged and negated during training.

The reconstruction loss Lrecon is calculated using the mean absolute error (MAE) between the reconstructed output and the target matrix, formulated as follows:

L_recon = (1/N) Σ_{i=1}^{N} |y_pred(i) − y_true(i)| (2)

where N denotes the total number of elements in the matrix, ypred(i) and ytrue(i) represent the values at the ith position in the reconstructed output and target matrix, respectively.

The final loss function of the VAE model is obtained by summing the reconstruction loss and the regularization loss, and then multiplying by a scaling coefficient:

L = α L_recon + β L_KL (3)

The weights α and β in Equation (3) were empirically determined via grid search on the validation set. To assess robustness, we conducted sensitivity analysis by varying α and β within the range [0, 1.0]. Figure 1, available as supplementary data at Bioinformatics online, shows that model performance is stable across this range, with α = 0.9 and β = 0.1 providing the best balance between reconstruction accuracy and KL regularization.
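A minimal NumPy sketch of the combined objective, assuming the masking and batch-averaging described above (helper names are ours; the KL term of Equation (1) is negated here so that the total loss is minimized, as the text notes):

```python
import numpy as np

def kl_loss(z_mean, z_logvar):
    """KL(N(mu, sigma^2) || N(0, I)), averaged over the batch axis."""
    kl = -0.5 * np.sum(1 + z_logvar - z_mean**2 - np.exp(z_logvar), axis=-1)
    return kl.mean()

def recon_loss(y_pred, y_true, mask=None):
    """Mean absolute error over valid (non-padding) matrix entries."""
    err = np.abs(y_pred - y_true)
    if mask is not None:
        err = err[mask]  # exclude padded positions from the average
    return err.mean()

def total_loss(y_pred, y_true, z_mean, z_logvar, alpha=0.9, beta=0.1):
    """Weighted sum from Equation (3), with the grid-searched defaults."""
    return alpha * recon_loss(y_pred, y_true) + beta * kl_loss(z_mean, z_logvar)
```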

The framework integrates RNA-FM embeddings with a Transformer-based VAE. The encoder maps sequences to a latent space, and the decoder reconstructs RNA structures from sampled variables. Both are trained jointly to minimize reconstruction and KL divergence losses, enabling diverse and meaningful structural predictions (see Fig. 3).

Figure 3.


The full training pipeline. Starting from RNA sequence input, the framework uses the RNA-FM and VAE models to predict RNA secondary structure.

Training the VAE-based model significantly enhances its adaptability to diverse RNA sequence distributions. Experimental results demonstrate that our method consistently outperforms current state-of-the-art approaches, particularly on RNA families not seen during training. This improved generalization is reflected in higher prediction accuracy across various RNA types, underscoring the model’s robustness and broad applicability in RNA secondary structure prediction.

2.5 GHA-Pairing: Greedy Hungarian approximation for RNA pairing

To efficiently convert predicted base-pairing probability matrices into discrete RNA secondary structures, we propose GHA-Pairing, a fast greedy approximation of the Hungarian algorithm (Mills-Tettey et al. 2007). This method preserves high-quality pairings while substantially reducing computational cost.

Given a predicted foreground pairing probability matrix P^fg ∈ ℝ^{L×L} and a background pairing probability matrix P^bg ∈ ℝ^{L×L}, GHA-Pairing constructs a score matrix S ∈ ℝ^{L×L} by computing the log-odds contrast between foreground and background as follows:

S_{i,j} = log(P^fg_{i,j} / P^bg_{i,j}) + log((1 − P^bg_{i,j}) / (1 − P^fg_{i,j})) − φ (4)

where φ is a penalty term used to discourage weak or spurious pairings. This scoring function encourages pairings that are consistently favored by the model, while penalizing uncertain or conflicting evidence.

The algorithm then sorts all valid candidate pairs (i, j) that satisfy base-pairing constraints in descending order of Si,j. Starting from the highest-scoring pair, the algorithm greedily assigns the pair if neither base has been paired before. This process continues until no further valid assignments can be made.

The GHA-Pairing algorithm is highly efficient and scalable, with a time complexity of approximately O(n² log n), significantly lower than the classical Hungarian algorithm's O(n³). Despite its simplified greedy approach, GHA-Pairing achieves accuracy comparable to or surpassing more complex optimization methods. Thus, it offers a fast, accurate, and stable solution for converting predicted pairing probabilities into precise RNA secondary structures.
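A minimal sketch of the greedy procedure under these definitions. The `min_loop` hairpin constraint, the ε guard for numerical stability, and the early stop at non-positive scores are our assumptions, since the exact base-pairing constraints are not spelled out here.

```python
import numpy as np

def gha_pairing(p_fg, p_bg, phi=0.0, min_loop=3):
    """Greedy Hungarian approximation: score, sort, then greedily match bases."""
    eps = 1e-9
    # Equation (4): log-odds contrast between foreground and background
    S = (np.log((p_fg + eps) / (p_bg + eps))
         + np.log((1 - p_bg + eps) / (1 - p_fg + eps)) - phi)
    L = S.shape[0]
    # candidate pairs respecting an assumed minimum hairpin-loop distance
    cands = [(S[i, j], i, j)
             for i in range(L) for j in range(i + min_loop + 1, L)]
    cands.sort(reverse=True)  # descending score
    paired = {}
    for s, i, j in cands:
        if s <= 0:
            break  # remaining candidates are weak or spurious
        if i not in paired and j not in paired:
            paired[i] = j
            paired[j] = i
    return paired
```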

3 Results

3.1 Dataset and preprocessing results

3.1.1 Dataset statistics

For benchmark evaluation, we reserve the standard RNA-Strand test sequences and a disjoint set of Rfam families, both strictly excluded from the training splits of BRNdata. This design ensures representativeness while preventing train-test leakage. Results are summarized in Table 3, available as supplementary data at Bioinformatics online.

We constructed BRNdata, a benchmark dataset for RNA secondary structure prediction, by integrating sequences from bpRNA-1m (Danaee et al. 2018), Rfam (Griffiths-Jones et al. 2003), and lncRNAdb (Amaral et al. 2011). BRNdata contains 18 328 sequences mostly between 50 and 512 nucleotides (average ∼200). The dataset is split into training (70%), validation (20%), and test (10%) sets. After filtering low-quality and redundant sequences, the final dataset is balanced and representative.

3.1.2 Data preprocessing and redundancy removal

We report step-wise filtering statistics (deduplication, CD-HIT-EST at 80% identity, BLAST-N filtering) to quantify each step's contribution to the final dataset size. Sequence length distributions before and after deduplication are provided in Figs 2 and 3, available as supplementary data at Bioinformatics online, supporting the representativeness of BRNdata.

To ensure robust model generalization, we implemented rigorous redundancy removal procedures. Specifically, we applied CD-HIT-EST and BLAST-N with an 80% sequence similarity threshold to eliminate potentially homologous sequences. This preprocessing step reduced the dataset size by 70% while significantly enhancing sequence diversity and minimizing overfitting risks.

3.2 Model performance

3.2.1 Prediction accuracy evaluation

We conducted comprehensive performance evaluation using three widely accepted metrics in RNA structure prediction: precision, recall, and F1 score. These metrics provide complementary perspectives on prediction quality, defined as:

Prec = TP / (TP + FP) (5)
Recall = TP / (TP + FN) (6)
F1 = (2 × Recall × Prec) / (Recall + Prec) (7)

TP, FP, and FN denote true positives (correctly predicted pairs), false positives (incorrect predictions), and false negatives (missed reference pairs), respectively.
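These metrics can be computed directly from sets of predicted and reference base pairs; a small sketch (the function name is ours):

```python
def pair_metrics(pred_pairs, true_pairs):
    """Precision, recall, and F1 over predicted vs. reference base pairs.

    Pairs are normalized to sorted (i, j) tuples so (i, j) and (j, i) match.
    """
    pred = {tuple(sorted(p)) for p in pred_pairs}
    true = {tuple(sorted(p)) for p in true_pairs}
    tp = len(pred & true)  # correctly predicted pairs
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(true) if true else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```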

To assess the effectiveness of our proposed TVAE model, we conducted a comprehensive comparison against eight representative baseline methods—E2Efold (Shen et al. 2022), Knotfold (Gong et al. 2024), MXfold2 (Sato et al. 2021), MXfold (Akiyama et al. 2018), RNAfold (Lorenz et al. 2011), RNAstructure (Reuter and Mathews 2010), SPOT-RNA (Singh et al. 2019), and UFold (Fu et al. 2022)—on an identical test dataset. These methods span a range of strategies, including end-to-end deep learning, pseudoknot modeling, thermodynamic integration, and multimodal feature fusion.

TVAE consistently outperforms baseline models across multiple metrics. As shown in Table 1, it achieves an F1 score of 0.8592, precision of 0.8280, and recall of 0.8928 on the test set. To assess generalization, we further evaluated the model on the independent PDB dataset containing sequences unseen during training. The F1 score distributions shown in Fig. 4 demonstrate that TVAE not only yields high accuracy but also maintains stable performance across diverse RNA families on both datasets.

Table 1.

Comparison of F1 score, precision, and recall across nine methods.

Model         Precision  Recall  F1 score
TVAE          0.8280     0.8928  0.8592
UFold         0.5681     0.6687  0.5953
SPOT-RNA      0.5417     0.5987  0.5778
RNAstructure  0.6380     0.7487  0.6723
MXFold        0.5829     0.6434  0.6125
MXFold2       0.6101     0.7096  0.6433
KnotFold      0.7164     0.8327  0.7677
E2EFold       0.2697     0.2574  0.2531
RNAFold       0.5324     0.6487  0.5683

Bold values indicate the metrics of our model and represent the highest performance for each metric.

Figure 4.


Violin plots comparing F1 score distributions of TVAE and baseline methods. (a) Results on the standard test dataset. (b) Results on the independent PDB dataset, which contains RNA sequences not seen during training. TVAE consistently achieves higher and more stable scores across both datasets, demonstrating strong predictive accuracy and generalization.

Table 1 compares F1 score, precision, and recall across nine methods, with TVAE achieving the best performance on all metrics.

To illustrate structural prediction performance, we visualized the predicted and reference base-pairing matrices of two representative sequences from the PDB and test sets (Fig. 5). The results highlight the accuracy and robustness of our method.

Figure 5.


Visual comparison between predicted and ground truth pairing matrices.

To further assess TVAE’s performance on the test set, we conducted Wilcoxon signed-rank tests (Taheri and Hesamian 2013) against MXFold2 and RNAfold using F1 scores. Results show that TVAE significantly outperforms both baselines on the test set (P < .01), with consistently higher and more stable F1 distributions (Fig. 6), demonstrating stronger structural modeling and generalization capabilities.

Figure 6.


Wilcoxon signed-rank test results between TVAE, MXFold2, and RNAfold on the same test set. TVAE significantly outperforms the baselines on most samples (P < .01).

In summary, we systematically compared TVAE with state-of-the-art baselines (e.g. SPOT-RNA, RNAfold) across multiple RNA datasets. Violin plots and Wilcoxon signed-rank tests (P < .01) demonstrate that TVAE consistently achieves higher accuracy and stability. These advantages stem from the integration of Transformer and VAE architectures, enabling better modeling of latent RNA structural patterns. TVAE thus offers a robust and promising solution for RNA secondary structure prediction.

In addition to Wilcoxon signed-rank tests, we report 95% confidence intervals and effect sizes (Cohen’s d) for pairwise comparisons, with multiple testing corrections applied using the Benjamini–Hochberg procedure to control the false discovery rate. The results of these statistical analyses are provided in Figs 4 and 5 and Tables 4 and 5, available as supplementary data at Bioinformatics online, which clearly demonstrate that TVAE achieves consistently higher F1 scores compared to RNAFold and MXFold2.

3.2.2 Prediction visualization

To demonstrate TVAE’s superior modeling, we selected representative RNA sequences and compared predicted secondary structures (Fig. 7). Using standard arc notation from the RNAstructure web server (Reuter and Mathews 2010), the diagrams show TVAE’s improved accuracy over UFold, MXFold2, and RNAstructure in capturing complex base-pairing.

Figure 7.


Visual comparison of the RNA secondary structures predicted by the four methods against the ground truth.

To highlight TVAE’s ability to generate multiple distinct structures for one RNA sequence, we show in Fig. 8 different base-pairing matrices from multiple latent space samplings. This demonstrates TVAE’s capacity to model structural heterogeneity with diverse, biologically plausible conformations.

Figure 8.


Two predicted secondary structures for the same RNA sequence by TVAE are mostly similar with subtle differences, showing the model’s ability to generate diverse yet reasonable outputs.

3.3 Generative diversity and biological plausibility

A distinguishing feature of TVAE is its ability to generate diverse, biologically meaningful RNA secondary structures for each input sequence by probabilistically sampling from the learned latent space. This diversity directly addresses a key limitation of deterministic models, which produce a single “optimal” prediction and fail to capture the range of structural possibilities observed in experimental studies such as SHAPE-MaP (Weeks 2010) and mutate-and-map assays (Rouskin et al. 2014).

Importantly, such structural diversity is biologically functional, not an artifact of modeling. Numerous studies have shown that variations in RNA folding play essential roles in regulatory processes like riboswitch activation, translational attenuation, and splice site accessibility (Breaker 2012). For instance, the HIV-1 5′ UTR dynamically shifts its secondary structure to regulate translation versus packaging (Watts et al. 2009), and RNA viruses often exploit structural variability to evade immune detection (Dethoff et al. 2012).

By embracing diversity in structural prediction, TVAE enables more comprehensive identification of functionally relevant patterns. In the context of siRNA and antisense oligonucleotide (ASO) design, e.g. consistently exposed binding regions across diverse structural predictions can increase targeting efficiency and reduce off-target effects (Kamola et al. 2015). Similarly, in RNA vaccine design, the ability to identify stable and accessible epitopes across multiple predicted structures enhances immunogenicity and robustness (Damase et al. 2021).

In summary, TVAE extends beyond accuracy metrics by offering a diversity-aware framework that supports functional interpretation and therapeutic innovation. Its probabilistic structure generation enables exploration of alternative but plausible RNA foldings, paving the way for RNA-based precision medicine strategies sensitive to genetic variation and environmental context.

To quantify structural diversity, we implemented two complementary metrics based on sampled conformations. First, we estimated base-pairing probabilities P_ij from K sampled structures, from which we derived per-position entropy H_i and the global entropy H_global. Second, we calculated the Pairwise Structural Diversity Index (PSDI), defined as the mean base-pair distance between all pairs of sampled structures. These two measures together capture both local uncertainty and global conformational variability. As summarized in Table 6, available as supplementary data at Bioinformatics online, our results demonstrate that TVAE generates conformational ensembles with substantial entropy and non-trivial PSDI values, highlighting its ability to produce structurally diverse RNA secondary structures.
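A sketch of how these two metrics might be computed from K sampled binary pairing matrices. The exact normalizations (base-2 entropy, dividing the matrix difference by 2 so each base pair is counted once) are our assumptions; the paper's precise formulas are in the supplementary material.

```python
import numpy as np

def ensemble_diversity(samples):
    """Global entropy and PSDI over sampled pairing matrices of shape (K, L, L)."""
    K, L, _ = samples.shape
    p = samples.mean(axis=0)  # estimated base-pairing probabilities P_ij
    eps = 1e-12
    # binary entropy of each (i, j) entry, summed per position -> H_i
    h = -(p * np.log2(p + eps) + (1 - p) * np.log2(1 - p + eps))
    h_global = h.sum(axis=1).mean()
    # PSDI: mean base-pair distance over all pairs of sampled structures
    dists = [np.abs(samples[a] - samples[b]).sum() / 2
             for a in range(K) for b in range(a + 1, K)]
    psdi = float(np.mean(dists)) if dists else 0.0
    return h_global, psdi
```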

In terms of biological validation, we emphasize that entropy and PSDI are widely recognized metrics for characterizing RNA conformational heterogeneity. Elevated entropy values reflect local flexibility at individual nucleotide positions, while larger PSDI values capture substantial structural differences across sampled ensembles. Together, these measures align with established principles of RNA structural dynamics and support the biological plausibility of the diversity generated by TVAE. Moreover, SHAPE-MaP experimental ensembles serve as an important benchmark for assessing conformational heterogeneity. Although direct integration of SHAPE-MaP data is beyond the scope of the present study, our framework is readily extensible to incorporate such datasets in future work, which will further enhance the biological relevance of the predicted structural diversity.

3.4 Ablation analysis

We conducted ablation studies to evaluate the impact of key architectural choices, including input representations (RNA-FM versus one-hot) and VAE regularization (with versus without KL divergence). All models were trained under identical conditions for fair comparison. Notably, removing KL divergence reduces the model to a deterministic Transformer, enabling direct assessment of diversity and generative capacity.

Results show that RNA-FM embeddings consistently lead to better performance, highlighting the value of pretrained features. Incorporating KL divergence further improves generalization, especially with one-hot inputs. To illustrate training dynamics, we plotted F1 score trajectories (Fig. 9), showing that KL-regularized models converge more smoothly and reach higher final accuracy, demonstrating the benefit of variational modeling.

Figure 9.


F1 score progression during training under four ablation settings.

In addition to the ablation experiments, we further analysed the impact of different α and β weight settings. The results confirm that the ablation conclusions remain valid across a broad range of weight values, demonstrating that the benefits of KL divergence are not sensitive to the precise choice of α and β.

3.5 Hyperparameter settings and training details

To balance performance and model size, we set the embedding dimension to 256 and used four Transformer encoder layers with eight attention heads, totaling ∼3 million parameters. This configuration was chosen as a tradeoff between accuracy and efficiency. While larger models with more layers and attention heads provided marginal improvements, the computational cost increased substantially. Therefore, we adopted the reported configuration as the optimal balance for practical training and evaluation.

TVAE was trained on two NVIDIA RTX 3080 GPUs (16 GB) with a batch size of 4 for 400 000 steps. We used the AdamW optimizer with a 0.0001 learning rate, β1 = 0.9, β2 = 0.999, and L2 weight decay of 0.01.

3.6 Computation optimization and model compression

3.6.1 Model compression results

To evaluate computational scalability, we compared the wall-clock time of TVAE with baseline methods across different sequence lengths (128–1024 nt). Results in Table 7, available as supplementary data at Bioinformatics online, demonstrate that TVAE achieves very high efficiency, requiring only a fraction of a second to predict a single sequence, which is substantially faster than most mainstream methods. Moreover, TVAE scales approximately linearly with sequence length and remains feasible up to 1024 nt on a single 16 GB GPU. For sequences longer than 512 nt, we employed a segmented prediction strategy that partitions the input into overlapping windows, followed by structure stitching to ensure global consistency (Fan et al. 2024). Furthermore, we systematically investigated the trade-off between ensemble size and performance. As shown in Fig. 6, available as supplementary data at Bioinformatics online, generating 10–20 samples provides an optimal balance between structural diversity and predictive accuracy, while larger ensembles yield diminishing returns.
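The windowing step of the segmented strategy can be sketched as follows; the 512-nt window and the overlap size are illustrative defaults, and the structure-stitching step is omitted.

```python
def make_windows(length, window=512, overlap=128):
    """Partition a long sequence into overlapping windows for segmented prediction.

    Returns (start, end) index pairs; consecutive windows share `overlap`
    nucleotides so that stitched predictions agree in the shared regions.
    """
    if length <= window:
        return [(0, length)]
    step = window - overlap
    starts = range(0, length - window + step, step)
    windows = [(s, min(s + window, length)) for s in starts]
    if windows[-1][1] < length:
        # make sure the final window reaches the end of the sequence
        windows.append((length - window, length))
    return windows
```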

To enhance inference efficiency, we applied model quantization and pruning. These techniques reduced weight precision and removed redundant parameters, respectively, yielding a 30% reduction in model size and 40% lower FLOPs. The optimized model achieved up to 5× faster inference with less than 5% accuracy loss, maintaining competitive performance for practical deployment.

3.6.2 Knowledge distillation for improved computational efficiency

To improve scalability, we used knowledge distillation to create a lightweight student model. It learns from ground truth and TVAE’s soft predictions, inheriting ensemble capabilities in compressed form. This yields a three-fold speedup while retaining 90% of original accuracy, enabling efficient deployment in resource-limited settings and demonstrating knowledge distillation’s practical value for RNA structure prediction.

4 Discussion

In this study, we proposed TVAE, a Transformer-based Variational Autoencoder designed to address key challenges in RNA secondary structure prediction, including long-range dependencies, pseudoknot modeling, and structural variability. By integrating a self-attention-based encoder–decoder backbone with probabilistic latent space modeling, TVAE captures both global contextual information and intrinsic structural uncertainty, enabling the generation of diverse, biologically plausible secondary structures.

Unlike traditional deterministic models that output a single “optimal” structure, TVAE leverages variational inference to learn a latent distribution, from which multiple structure candidates can be generated. This allows the model to identify regions of structural ambiguity, which may correspond to flexible, functionally important RNA segments. The ability to reflect such uncertainty is particularly valuable in cases where experimental data is sparse or conflicting, making TVAE a useful tool for guiding further biological investigation.

While our model shows strong predictive performance across multiple datasets, including unseen RNA families, there are several limitations. First, TVAE is limited to secondary structure prediction and does not incorporate tertiary interactions or account for structural changes induced by proteins, ligands, or environmental factors such as temperature, pH, and ionic concentration. Second, the model does not capture RNA kinetic folding pathways, which may influence structural equilibria in vivo. Future extensions could integrate high-throughput structure probing data, folding kinetics, or 3D constraints to enhance the biological realism and scope of the framework.

Despite these limitations, this work represents a meaningful first step toward uncertainty-aware and diversity-enabled RNA structure prediction. The proposed TVAE model not only achieves high prediction performance but also provides valuable analytical capabilities for RNA biology and therapeutic applications. First, it can assist experimental design by highlighting structurally uncertain regions that merit further investigation. Second, it supports mutation prioritization by identifying sequence positions where changes are likely to induce structural shifts. Third, it enables confidence estimation by quantifying predictive variability, thereby improving the interpretability and reliability of structure predictions in downstream tasks. Together, these features make TVAE a practical and versatile tool for both fundamental research and translational studies in RNA science.
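The confidence-estimation capability listed above can be made concrete with a small sketch: given a stack of sampled binary structures, compute each position's pairing frequency and its Shannon entropy. This is an illustrative NumPy recipe under the assumption that sampled structures are available as contact matrices; it is not taken from the released code.

```python
import numpy as np

def positional_uncertainty(samples):
    """samples: (n, L, L) binary contact matrices from n sampled structures.
    Returns per-position pairing frequency and its Shannon entropy in bits.
    High entropy flags positions whose pairing status varies across samples."""
    paired = samples.max(axis=2)   # (n, L): 1 if position i is paired in a sample
    p = paired.mean(axis=0)        # pairing frequency per position
    eps = 1e-12                    # guard against log(0)
    entropy = -(p * np.log2(p + eps) + (1 - p) * np.log2(1 - p + eps))
    return p, entropy
```

Positions with entropy near 1 bit are maximally uncertain (paired in roughly half the samples) and are natural targets for experimental probing or mutation prioritization.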

Supplementary Material

btaf527_Supplementary_Data

Contributor Information

Xiyuan Mei, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Hanbo Liu, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Yuheng Zhu, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Enshuang Zhao, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Longyi Li, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Hao Zhang, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Supplementary data

Supplementary data are available at Bioinformatics online.

Author contributions

Xiyuan Mei (Resources [lead], Software [lead], Supervision [lead], Validation [lead], Visualization [lead], Writing—original draft [lead], Writing—review & editing [lead]), Hanbo Liu (Software [supporting]), Yuheng Zhu (Supervision [supporting]), Enshuang Zhao (Resources [supporting]), Longyi Li (Validation [supporting]), and Hao Zhang (Writing—review & editing [supporting])

Conflict of interest: None declared.

Funding

This work was supported by the National Natural Science Foundation of China [62072210].

Data availability

Code and pretrained models are available at https://github.com/mei-rna/TVAE-RNA. The released version of the dataset and models can also be accessed via DOI: 10.5281/zenodo.16946114.

References

1. Akiyama M, Sato K, Sakakibara Y. A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model. J Bioinform Comput Biol 2018;16:1840025.
2. Altschul SF, Madden TL, Schäffer AA et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402.
3. Amaral PP, Clark MB, Gascoigne DK et al. lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res 2011;39:D146–51.
4. Bao J, Chen D, Wen F et al. CVAE-GAN: fine-grained image generation through asymmetric training. In: Proceedings of the IEEE International Conference on Computer Vision, Italy. New York, NY, USA: IEEE, 2017, 2745–54.
5. Breaker RR. Riboswitches and the RNA world. Cold Spring Harb Perspect Biol 2012;4:a003566.
6. Burgess CP, Higgins I, Pal A et al. Understanding disentangling in β-VAE. arXiv preprint arXiv:1804.03599, 2018.
7. Butcher SE, Pyle AM. The molecular interactions that stabilize RNA tertiary structure: RNA motifs, patterns, and networks. Acc Chem Res 2011;44:1302–11.
8. Cao X, Zhang Y, Ding Y et al. Identification of RNA structures and their roles in RNA functions. Nat Rev Mol Cell Biol 2024;25:784–801.
9. Chen J, Hu Z, Sun S et al. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. arXiv preprint arXiv:2204.00300, 2022.
10. Cordero P, Das R. Rich RNA structure landscapes revealed by mutate-and-map analysis. PLoS Comput Biol 2015;11:e1004473.
11. Dai B, Wipf D. Diagnosing and enhancing VAE models. In: Proceedings of the International Conference on Learning Representations (ICLR), 2019.
12. Damase TR, Sukhovershin R, Boada C et al. The limitless future of RNA therapeutics. Front Bioeng Biotechnol 2021;9:628137.
13. Danaee P, Rouches M, Wiley M et al. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic Acids Res 2018;46:5381–94.
14. Dethoff EA, Chugh J, Mustoe AM et al. Functional complexity and regulation through RNA dynamics. Nature 2012;482:322–30.
15. Ding Y, Tang Y, Kwok CK et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 2014;505:696–700.
16. Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 2006;22:e90–e98.
17. Fan X, Cui Y, Zhao Z et al. Partition, predict and assemble: targeting long RNA secondary structure prediction. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024, 172–6.
18. Fu L, Cao Y, Wu J et al. UFold: fast and accurate RNA secondary structure prediction with deep learning. Nucleic Acids Res 2022;50:e14.
19. Gebauer F, Schwarzl T, Valcárcel J et al. RNA-binding proteins in human genetic disease. Nat Rev Genet 2021;22:185–98.
20. Gong T, Ju F, Bu D. KnotFold for RNA secondary structure prediction with pseudoknots. Commun Biol 2024;7:297.
21. Griffiths-Jones S, Bateman A, Marshall M et al. Rfam: an RNA family database. Nucleic Acids Res 2003;31:439–41.
22. Hajdin CE, Bellaousov S, Huggins W et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc Natl Acad Sci USA 2013;110:5498–503.
23. Han K, Wang Y, Chen H et al. A survey on vision transformer. IEEE Trans Pattern Anal Mach Intell 2023;45:87–110.
24. Higgs PG. RNA secondary structure: physical and computational aspects. Q Rev Biophys 2000;33:199–253.
25. Holbrook SR. RNA structure: the long and the short of it. Curr Opin Struct Biol 2005;15:302–8.
26. Kamola PJ, Kitson JDA, Turner G et al. In silico and in vitro evaluation of exonic and intronic off-target effects forms a critical element of therapeutic ASO design. Nucleic Acids Res 2015;43:8638–50.
27. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006;22:1658–9.
28. Lorenz R, Bernhart SH, Höner zu Siederdissen C et al. ViennaRNA Package 2.0. Algorithms Mol Biol 2011;6:1–14.
29. Loyer G, Reinharz V. Concurrent prediction of RNA secondary structures with pseudoknots and local 3D motifs in an integer programming framework. Bioinformatics 2024;40:btae022.
30. Mills-Tettey GA, Stentz A, Dias MB. The dynamic Hungarian algorithm for the assignment problem with changing costs. Technical Report CMU-RI-TR-07-27. Robotics Institute, Carnegie Mellon University, 2007.
31. Nussinov R, Jacobson AB. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci USA 1980;77:6309–13.
32. Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010;11:129.
33. Rouskin S, Zubradt M, Washietl S et al. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 2014;505:701–5.
34. Sato K, Hamada M. RNA information science trends: a review of machine learning and deep learning in RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023;24:bbad186.
35. Sato K, Akiyama M, Sakakibara Y. RNA secondary structure prediction using deep learning with thermodynamic integration. Nat Commun 2021;12:941.
36. Shen T, Hu Z, Peng Z et al. E2Efold-3D: end-to-end deep learning method for accurate de novo RNA 3D structure prediction. Nat Commun 2022;13:1–10.
37. Singh J, Hanson J, Paliwal K et al. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019;10:5407.
38. Singh J, Paliwal K, Zhang T et al. Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning. Bioinformatics 2021;37:2589–600.
39. Svoboda P, Cara AD. Hairpin RNA: a secondary structure of primary importance. Cell Mol Life Sci 2006;63:901–8.
40. Taheri SM, Hesamian G. A generalization of the Wilcoxon signed-rank test and its applications. Stat Papers 2013;54:457–70.
41. Tian S, Das R. RNA structure through multidimensional chemical mapping. Q Rev Biophys 2016;49:e7.
42. Wan Y, Kertesz M, Spitale R et al. Understanding the transcriptome through RNA structure. Nat Rev Genet 2011;12:641–55.
43. Watts JM, Dang KK, Gorelick RJ et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 2009;460:711–6.
44. Weeks KM. Advances in RNA structure analysis by chemical probing. Curr Opin Struct Biol 2010;20:295–304.
45. Zhang H, Li S, Zhang L et al. LazySampling and LinearSampling: fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2. Nucleic Acids Res 2023;51:e7.
46. Zhang S, Li J, Chen SJ. Machine learning in RNA structure prediction: advances and challenges. Biophys J 2024;123:2647–57.
47. Zhao H, Jiang L, Jia J et al. Point transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021, 16259–68.


