Bioinformatics. 2013 Jun 19;29(13):i308–i315. doi: 10.1093/bioinformatics/btt217

A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution

Vladimir Reinharz 1, Yann Ponty 2,3,*, Jérôme Waldispühl 1,*
PMCID: PMC3694657  PMID: 23812999

Abstract

Motivations: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to control explicitly the nucleotide distribution, such as the GC-content, of the designed sequences. However, the latter is an important criterion for large-scale applications, as it could presumably be used to design sequences with better transcription rates and/or structural plasticity.

Results: In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. its running time is comparable to or better than that of local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches for sampling sequences with a specific GC-content and target structure.

Availability: IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/

Contact: jeromew@cs.mcgill.ca or yann.ponty@lix.polytechnique.fr

Supplementary Information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

At the core of the emerging field of synthetic biology resides our capacity to design and reengineer molecules with target functions. RNA molecules are well suited to such applications. The ease with which they are synthesized (they are directly transcribed from DNA) and the broad diversity of catalytic and regulatory functions they can perform make it possible to integrate de novo logic circuits within living cells (Rodrigo et al., 2012) or to reprogram existing regulation mechanisms (Chang et al., 2012). Future advances and applications of these techniques in gene-therapy studies will strongly rely on efficient computational methods to design and reengineer RNA molecules.

Most RNA functions are, at least partially, encoded by the 3D molecular structures, which are themselves primarily determined by the secondary structures. The development of efficient algorithms for designing RNA sequences with predefined secondary structures is thus a milestone to enter the synthetic biology era. RNAinverse pioneered RNA secondary structure design algorithms. It was developed and distributed with the Vienna RNA package (Hofacker et al., 1994). However, only later experimental studies revealed the potential and practical impact of these techniques. Accordingly, during the past 6 years, many improvements and variants of RNAinverse have been proposed. Conceptually, almost all existing algorithms follow the same approach. First, a seed sequence is selected; then, a local search strategy is used to mutate the seed and find, in its vicinity, a sequence with the desired folding properties. Using this strategy, INFO-RNA (Busch and Backofen, 2006), RNA-SSD (Aguirre-Hernández et al., 2007) and NUPACK:Design (Zadeh et al., 2011) significantly improved the performance of RNA secondary structure design algorithms. More recent studies aimed to include more constraints in the selection criteria. RNAexinv focused on the design of sequences with enhanced thermodynamical and mutational robustness (Avihoo et al., 2011), while Frnakenstein makes it possible to design RNAs with multiple target structures (Lyngsø et al., 2012).

We recently introduced, with RNA-ensign, a novel paradigm for the search strategy of RNA secondary structure design algorithms (Levin et al., 2012). Instead of a local search approach, we proposed a global sampling strategy of the mutational landscape based on the RNAmutants algorithm (Waldispühl et al., 2008). This methodology offered promising performance, but suffered from prohibitive runtime and memory consumption. Following our work, Garcia-Martin et al. (2013) proposed RNAiFold, an alternate methodology that uses constraint programming techniques to prune the mutational landscape. While it also suffers from prohibitive running times, it is worth noting that this latter algorithm also proposes a seedless approach to the RNA secondary structure design problem.

In this article, we introduce IncaRNAtion, an RNA secondary structure design algorithm that benefits from our recent algorithmic advances (Reinharz et al., 2013) to expand our original RNA-ensign algorithm (Levin et al., 2012). IncaRNAtion addresses previous limitations of RNA-ensign and offers new functionalities. First, while our previous program had a running time complexity of Inline graphic, IncaRNAtion now runs in linear time and space, allowing it to achieve speeds similar to those of local search algorithms. Next, IncaRNAtion is seedless. Unlike RNA-ensign, it does not require a seed sequence to initiate its search. Finally, IncaRNAtion implements a novel algorithm based on weighted sampling techniques (Bodini and Ponty, 2010) that enables us to control explicitly, for the first time, the GC-content of the solution. This functionality is essential because wild-type sequences within living organisms often present medium or low GC-content, presumably to offer better transcription rates and/or structural plasticity. Previous programs do not allow the user to control this parameter and tend to output sequences having high GC-contents (Lyngsø et al., 2012).

We demonstrate the performance of our algorithms on a set of real RNA structures extracted from the RNA STRAND database (Andronescu et al., 2008). To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies such as the one implemented in RNAinverse. Remarkably, our glocal methodology outperforms both local and global approaches for sampling sequences with a specific GC-content and target structure.

2 METHODS

We introduce a probabilistic model for the design of RNA sequences with a specific GC-content and folding into a predefined secondary structure. For the sake of simplicity, we choose to base this proof-of-concept implementation on a simplified free-energy function $E$, which only considers the contributions of stacked canonical base pairs. We show how a modification of the dynamic programming scheme used in RNAmutants allows for the sampling of good and diverse design candidates, in linear time and space.

2.1 Definitions

A targeted secondary structure $S$ of length $n$ is given as a non-crossing arc-annotated sequence, where $S_i$ stands for the base-pairing partner of position $i$ in $S$ if any (and, reciprocally, $S_{S_i} = i$), or −1 otherwise. In addition, let us denote by $\#GC(s)$ the number of occurrences of G and C in an RNA sequence $s$.

2.1.1 Simplified energy model

We use a simplified free-energy model, which only includes additive contributions from stacking base pairs, using individual values from the Turner 2004 model [retrieved from the NNDB (Turner and Mathews, 2010)]. Given a secondary structure $S$, the free energy of any candidate sequence $s$ of length $|S|$ is given by

$$E(s) = \sum_{\substack{(i,j) \to (i',j') \in S \\ \text{stacking pairs}}} E^{\beta}_{s_i s_j \to s_{i'} s_{j'}}$$

where $E^{\beta}_{ab \to a'b'}$ is set to 0 if $ab = \varnothing$ (no base pair to stack onto), to the tabulated free energy of the stacking pairs $(ab)/(a'b')$ in the Turner model if available, or to $\beta$ for non-Watson-Crick/Wobble pairs (i.e. not in $\{\text{GU}, \text{UG}, \text{CG}, \text{GC}, \text{AU}, \text{UA}\}$). This latter parameter allows one to choose whether to simply penalize invalid base pairs ($\beta > 0$), or forbid them altogether ($\beta = +\infty$). Position-specific sequence constraints can also be enforced at this level (details omitted for the sake of clarity) by assigning to $E^{\beta}_{ab \to a'b'}$ a $+\infty$ penalty (leading to a null probability) in the presence of a base incompatible with a user-specified constraint mask.

2.1.2 GC-weighted Boltzmann ensemble and distribution

To counterbalance the documented tendency of sampling methods to generate GC-rich sequences (Levin et al., 2012), we introduce a parameter $x \in \mathbb{R}^{+}$, whose value will influence the GC-content of generated sequences. For any secondary structure $S$, the GC-weighted Boltzmann factor of a sequence $s$ is $\mathcal{B}(s)$ such that

$$\mathcal{B}(s) = x^{\#GC(s)} \cdot e^{-E(s)/RT} \qquad (1)$$

where $R$ is the gas constant (the Boltzmann constant expressed per mole) and $T$ the temperature in Kelvin.

Summing the GC-weighted Boltzmann factor over all possible sequences of a given length $|S|$, one obtains the GC-weighted partition function $\mathcal{Z}$, from which one defines the GC-weighted Boltzmann probability $\mathbb{P}(s)$ of each sequence $s$, such that

$$\mathcal{Z} = \sum_{s' \in \{\text{A},\text{C},\text{G},\text{U}\}^{|S|}} \mathcal{B}(s') \quad \text{and} \quad \mathbb{P}(s) = \frac{\mathcal{B}(s)}{\mathcal{Z}} \qquad (2)$$
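To make the definitions of Section 2.1 concrete, the following minimal sketch enumerates every sequence for a tiny target structure and computes the GC-weighted Boltzmann distribution by brute force. It is an illustration under stated assumptions, not the IncaRNAtion implementation: the function names, the dot-bracket input format and, above all, the stacking energies in `toy_stack_energy` are hypothetical placeholders (the real Turner 2004 values are not reproduced here).

```python
import itertools
import math

RT = 0.0019872 * 310.15          # gas constant x temperature, kcal/mol at 37 °C
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def pair_table(structure):
    """Return S with S[i] = partner of position i, or -1 if unpaired (0-based)."""
    table, stack = [-1] * len(structure), []
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table

def toy_stack_energy(outer, inner, beta):
    """Hypothetical stand-in for the Turner stacking table (NOT real values)."""
    if inner not in VALID_PAIRS:
        return beta                  # penalty for an invalid base pair
    if outer is None or outer not in VALID_PAIRS:
        return 0.0                   # nothing valid to stack onto
    return -1.0 - sum(p in ("GC", "CG") for p in (outer, inner))

def energy(seq, structure, beta=15.0):
    """Sum the toy contributions over all base pairs of the structure."""
    S, total = pair_table(structure), 0.0
    for i, j in enumerate(S):
        if j > i:                    # visit each base pair (i, j) once
            stacked = i > 0 and j + 1 < len(S) and S[i - 1] == j + 1
            outer = seq[i - 1] + seq[j + 1] if stacked else None
            total += toy_stack_energy(outer, seq[i] + seq[j], beta)
    return total

def gc_count(seq):
    return sum(c in "GC" for c in seq)

def boltzmann_factor(seq, structure, x, beta=15.0):
    """GC-weighted Boltzmann factor: x^#GC(s) * exp(-E(s)/RT)."""
    return x ** gc_count(seq) * math.exp(-energy(seq, structure, beta) / RT)

def distribution(structure, x, beta=15.0):
    """Brute-force GC-weighted Boltzmann probabilities (exponential in n)."""
    seqs = ["".join(t) for t in itertools.product("ACGU", repeat=len(structure))]
    weights = [boltzmann_factor(s, structure, x, beta) for s in seqs]
    Z = sum(weights)                 # GC-weighted partition function
    return {s: w / Z for s, w in zip(seqs, weights)}

def expected_gc(structure, x):
    return sum(p * gc_count(s) for s, p in distribution(structure, x).items())
```

Calling `expected_gc("((..))", x)` for increasing values of `x` shows the expected number of G and C nucleotides growing with the weight, which is the behaviour the parameter is introduced for. The enumeration is only practical for very short structures.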

2.2 Linear-time stochastic sampling algorithm for the GC-weighted Boltzmann ensemble

Let us now describe a linear-time algorithm to sample sequences at random in the GC-weighted Boltzmann distribution. This algorithm follows the general principles of the recursive approach to random generation (Wilf, 1977), pioneered in the context of RNA by the SFold algorithm (Ding and Lawrence, 2003). The algorithm starts by precomputing the partition function restricted to each substructure occurring in the target structure, and then performs a series of recursive stochastic backtracks, using precomputed values to decide on the probability of each alternative.

2.2.1 Precomputing the GC-weighted partition function

First, a dynamic programming algorithm computes $\mathcal{Z}^{m}_{[a,b]}(S)$, the GC-weighted partition function (the dependency on $x$ is omitted here for the sake of clarity) for a structure $S$, assuming its (previously chosen) flanking nucleotides are $a$ and $b$, respectively, either forming a closing base pair ($m = \text{True}$) or not ($m = \text{False}$). Note that the empty structure $\varepsilon$ only supports the empty sequence, having energy 0, so one has

$$\mathcal{Z}^{m}_{[a,b]}(\varepsilon) = 1 \qquad (3)$$

The general recursion scheme consists of three different terms, depending on the first position in $S$. In the sums below, $a'$ and $b'$ range over $\{\text{A},\text{C},\text{G},\text{U}\}$, and $\delta_c$ equals 1 if $c \in \{\text{G},\text{C}\}$ and 0 otherwise.

Case 1: First position is unpaired ($S = \bullet\, S'$):

$$\mathcal{Z}^{m}_{[a,b]}(S) = \sum_{a'} x^{\delta_{a'}} \cdot \mathcal{Z}^{\text{False}}_{[a',b]}(S') \qquad (4)$$

Case 2: First position is paired with last position [$S = (S')$], stacking onto a preexisting exterior pair ($m = \text{True}$):

$$\mathcal{Z}^{\text{True}}_{[a,b]}(S) = \sum_{a',b'} x^{\delta_{a'} + \delta_{b'}} \cdot e^{-E^{\beta}_{ab \to a'b'}/RT} \cdot \mathcal{Z}^{\text{True}}_{[a',b']}(S') \qquad (5)$$

Case 3: First position is involved in a base pair [$S = (S')\,S''$], which is not stacking onto an exterior base pair ($m = \text{False}$ or $S'' \neq \varepsilon$):

$$\mathcal{Z}^{m}_{[a,b]}(S) = \sum_{a',b'} x^{\delta_{a'} + \delta_{b'}} \cdot e^{-E^{\beta}_{\varnothing \to a'b'}/RT} \cdot \mathcal{Z}^{\text{True}}_{[a',b']}(S') \cdot \mathcal{Z}^{\text{False}}_{[b',b]}(S'') \qquad (6)$$

Note that the number of combinations of $a$, $b$ and $m$ remains bounded by a constant, thus the complexity of computing $\mathcal{Z}^{m}_{[a,b]}(S)$ mainly depends on the values taken by $S$ on subsequent recursive calls. Such values are entirely determined by $S$ at any given step of the recursion, and their dependency can be summarized in a tree having $\mathcal{O}(n)$ nodes. Therefore, the computation of $\mathcal{Z}^{m}_{[a,b]}(S)$ requires $\mathcal{O}(n)$ time and space using dynamic programming.

[Algorithm 1: stochastic backtrack procedure SBx, given as an image in the original article]

2.2.2 Stochastic backtrack

Once the GC-weighted partition functions have been computed and memorized, a stochastic backtrack starts from the target structure with arbitrary exterior bases $a$ and $b$ and no enclosing base pair, corresponding to the initial call to Algorithm 1 (SBx). At each step, a suitable assignment for one or several positions is chosen, using probabilities derived from the precomputation, as illustrated by Figure 1. One or several recursive calls over the appropriate substructures are then performed. On each recursive call, the algorithm assigns at least 1 nt to a previously unassigned position. Moreover, the number of executions of each loop is bounded by a constant. Consequently, the complexity of Algorithm 1 is in $\mathcal{O}(n)$ time and space.

Fig. 1.

Stochastic backtrack procedure for a given substructure S. Either the first position is left unpaired (top), a base pair is formed between the two extremities, stacking onto an exterior base pair (middle) or paired without creating a stacking, defining two regions on which subsequent recursive calls are needed (bottom). For the empty structure (omitted here), the empty sequence is returned. Positions indicated in red are assigned at the current stage of the backtrack
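The precomputation and stochastic backtrack of Sections 2.2.1 and 2.2.2 can be sketched as follows. This is a minimal illustration written for this text, not the authors' Algorithm 1: `make_sampler`, `toy_energy` and the dot-bracket input are hypothetical names and conventions, and the stacking energies are placeholders rather than Turner values. Each subproblem is the span of positions `i..j-1` together with its two flanking bases and a flag `m` telling whether those bases form an enclosing pair, so the number of memoized states stays linear in the length of the structure.

```python
import math
import random
from functools import lru_cache

RT = 0.0019872 * 310.15                 # kcal/mol at 37 °C
BASES = "ACGU"
VALID = {"AU", "UA", "GC", "CG", "GU", "UG"}

def toy_energy(outer, inner, beta):
    """Hypothetical stacking energies (placeholders, NOT Turner values)."""
    if inner not in VALID:
        return beta                      # penalised invalid pair
    if outer is None or outer not in VALID:
        return 0.0                       # nothing valid to stack onto
    return -1.0 - sum(p in ("GC", "CG") for p in (outer, inner))

def make_sampler(structure, x, beta=15.0):
    n, S, stack = len(structure), [-1] * len(structure), []
    for i, c in enumerate(structure):    # pair table: S[i] = partner or -1
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            S[i], S[j] = j, i

    def gc(bases):                       # number of G/C among the given bases
        return sum(c in "GC" for c in bases)

    def pair_weight(outer, c, d):        # GC weight times Boltzmann factor
        return x ** gc(c + d) * math.exp(-toy_energy(outer, c + d, beta) / RT)

    def outer_pair(i, j, a, b, m):       # enclosing pair if (i, S[i]) stacks on it
        return a + b if m and S[i] == j - 1 else None

    @lru_cache(maxsize=None)
    def Z(i, j, a, b, m):
        """Weighted partition function of positions i..j-1, flanked by a and b."""
        if i == j:                       # empty structure
            return 1.0
        if S[i] == -1:                   # first position unpaired
            return sum(x ** gc(c) * Z(i + 1, j, c, b, False) for c in BASES)
        k, outer = S[i], outer_pair(i, j, a, b, m)
        # paired: stacking and non-stacking cases differ only through `outer`
        return sum(pair_weight(outer, c, d) * Z(i + 1, k, c, d, True)
                   * Z(k + 1, j, d, b, False) for c in BASES for d in BASES)

    def sample(rng=random):
        """Draw one sequence with probability proportional to its weight."""
        seq, todo = [None] * n, [(0, n, "A", "A", False)]
        while todo:
            i, j, a, b, m = todo.pop()
            if i == j:
                continue
            if S[i] == -1:
                wts = [x ** gc(c) * Z(i + 1, j, c, b, False) for c in BASES]
                seq[i] = rng.choices(list(BASES), wts)[0]
                todo.append((i + 1, j, seq[i], b, False))
                continue
            k, outer = S[i], outer_pair(i, j, a, b, m)
            opts = [(c, d) for c in BASES for d in BASES]
            wts = [pair_weight(outer, c, d) * Z(i + 1, k, c, d, True)
                   * Z(k + 1, j, d, b, False) for c, d in opts]
            seq[i], seq[k] = rng.choices(opts, wts)[0]
            todo += [(i + 1, k, seq[i], seq[k], True), (k + 1, j, seq[k], b, False)]
        return "".join(seq)

    return Z(0, n, "A", "A", False), sample
```

For example, `Z, sample = make_sampler("((..))", x=1.0)` returns the partition function and a sampling function. The recursive `Z` is kept for readability; for long structures an iterative, bottom-up fill would avoid Python's recursion limit.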

2.2.3 Self-adaptive sampling strategy

Recall that our goal is to produce a set of sequences whose GC-content matches a prescribed value gc. An absolute tolerance κ may be allowed, so that the GC-content of any valid sequence must fall in $[gc - \kappa, gc + \kappa]$. Because sequences of arbitrary GC-content may be generated by Algorithm 1, we use a rejection-based approach (Bodini and Ponty, 2010), previously adapted by the authors in a similar context (Waldispühl and Ponty, 2011). This gives an algorithm that generates k valid sequences in expected time $\mathcal{O}(k \cdot n\sqrt{n})$ when $\kappa = 0$ [or $\mathcal{O}(k \cdot n)$ when κ is a positive constant] and memory in $\mathcal{O}(n)$. A complete analysis of the rejection process can be found in an earlier contribution (Waldispühl and Ponty, 2011), but let us briefly outline the approach, and the main arguments used to establish its complexity.

As summarized by Figure 2, our adaptive sampling approach simply generates sets of sequences by repeatedly running the stochastic backtrack algorithm. The average GC-content induced by the current value of the x parameter can then be adequately estimated from the sample, or computed exactly using recent algorithmic advances (Ponty and Saule, 2011). The set of sequences is filtered to only retain valid sequences. The value of the parameter x is then adapted to match the average GC-content (induced by the value of x) with the targeted one. It can be shown that the expected GC-content is a continuous and strictly increasing function of x, whose limits are 0 when x = 0 and n when $x \to +\infty$. Consequently, for any targeted GC-content $gc \in [0, n]$, there exists a unique value $x_{gc}$ such that generated sequences feature, on average, the right GC-content. In practice, a simple binary search (Waldispühl and Ponty, 2011) is used in our implementation, and typically converges after a few iterations. An optimal value for x can also be derived analytically using interpolation after $\mathcal{O}(n)$ evaluations of $\mathcal{Z}$ for different candidate values of x, as previously noted (Waldispühl and Ponty, 2011), and could be implemented using the Fast Fourier Transform (Senter et al., 2012).

Fig. 2.

General workflow of our adaptive sampling algorithm (Waldispühl and Ponty, 2011)
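The adaptive loop (sample, filter, re-estimate, adjust x) can be sketched as below. This is a simplified illustration and not the IncaRNAtion code: the sampler is a stand-in in which every position is unpaired, so each nucleotide is simply drawn with weight x for G or C and 1 otherwise; the function names, the batch size, the search bounds on x and the use of a geometric midpoint are all assumptions made for the example. In the actual method, the stochastic backtrack would replace `sample_sequence`.

```python
import random

def sample_sequence(n, x, rng):
    """Toy sampler: all positions unpaired, weight x for G/C and 1 for A/U."""
    return "".join(rng.choices("ACGU", weights=[1.0, x, x, 1.0], k=n))

def gc_fraction(seq):
    return sum(c in "GC" for c in seq) / len(seq)

def adaptive_sampling(n, target, k, kappa=0.0, batch=200, max_rounds=100, seed=0):
    """Collect k sequences whose GC fraction lies within kappa of the target."""
    rng = random.Random(seed)
    lo, hi = 1e-3, 1e3               # assumed search interval for x
    x, valid = 1.0, []
    for _ in range(max_rounds):
        seqs = [sample_sequence(n, x, rng) for _ in range(batch)]
        # rejection step: keep only sequences with an acceptable GC-content
        valid += [s for s in seqs if abs(gc_fraction(s) - target) <= kappa + 1e-12]
        if len(valid) >= k:
            break
        # binary search on x, using the mean GC-content estimated from the batch
        mean = sum(map(gc_fraction, seqs)) / batch
        if mean < target:
            lo = x
        else:
            hi = x
        x = (lo * hi) ** 0.5         # geometric midpoint, since x spans decades
    return valid[:k], x
```

For instance, `adaptive_sampling(40, 0.25, 20)` requests 20 sequences of length 40 with exactly 25% GC; the search moves x below 1 because the unweighted sampler would produce 50% GC on average.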

2.2.4 Overall complexity

It was previously established (Waldispühl and Ponty, 2011) that, for each value of x, there exist constants $\mu_x$ and $\sigma_x$ such that the distribution of GC-content asymptotically converges toward a normal law having expectation in $\mu_x \cdot n$ and standard deviation in $\sigma_x \cdot \sqrt{n}$. Furthermore, the distribution of GC-content is highly concentrated, as asserted by its limited standard deviation; therefore, the expected number of attempts required to generate a valid sequence when $\kappa = 0$ [respectively $\kappa > 0$] grows like $\Theta(\sqrt{n})$ [respectively $\Theta(1)$, i.e. a constant], leading to the announced complexities. Formally, because a suitable weight x must be recomputed for each targeted structure and GC-content, the number M of iterations required for convergence can be accounted for explicitly, leading to time complexities in Inline graphic (if $\kappa = 0$, i.e. without any tolerance) and Inline graphic (if $\kappa > 0$).
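The square-root factor can be recovered by a short back-of-the-envelope calculation. The sketch below assumes that x has already been tuned so that the mean GC-count equals the target $g$, that the normal approximation holds locally, and that κ is a tolerance expressed as a fraction of the sequence length; the symbols $\mu_x$, $\sigma_x$ and $g$ are notations chosen for this illustration.

```latex
% Assumptions: mean GC-count equal to the target g = \mu_x n,
% standard deviation \sigma_x \sqrt{n}, tolerance \kappa on the GC fraction.
\begin{align*}
\Pr[\#GC(s) = g] &\approx \frac{1}{\sigma_x \sqrt{2\pi n}}
  && \text{(no tolerance, } \kappa = 0\text{)} \\
\mathbb{E}[\text{attempts}] &= \frac{1}{\Pr[\#GC(s) = g]}
  \approx \sigma_x \sqrt{2\pi n} \in \Theta(\sqrt{n}) \\
\Pr\bigl[\,|\#GC(s) - g| \le \kappa n\,\bigr]
  &\approx \operatorname{erf}\!\left(\frac{\kappa \sqrt{n}}{\sigma_x \sqrt{2}}\right)
  \xrightarrow[n \to \infty]{} 1
  && \text{(constant } \kappa > 0\text{)}
\end{align*}
```

Since each attempt costs linear time, the first case gives an expected cost on the order of $n\sqrt{n}$ per valid sequence, while in the second case almost every attempt is accepted and the cost per sequence stays linear.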

2.3 Postprocessing unpaired regions: a local/global (glocal) hybrid approach

Owing to our simplified energy model, unpaired regions are not subject to design constraints other than the GC-content, leading to modest probabilities for refolded design candidates to match the targeted structure. To improve these performances and test the complementarity of our global sampling approach with previous contributions based on local search, we used the RNAinverse software to redesign unpaired regions. We specified a constraint mask to prevent stacking base pairs from being modified and, whenever necessary, reestablished their content a posteriori, as RNAinverse has been witnessed to take some liberties with constraint masks. As shown in the Supplementary Material, this postprocessing does not drastically alter the GC-content, so the glocal approach reasonably addresses the constrained GC-content design problem.
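A minimal sketch of the masking and restoration steps is given below. It relies on two assumptions that are not stated in the text: that every paired position is to be protected, and that RNAinverse keeps lower-case characters of its start sequence fixed (this is how its manual describes fixed positions, but it should be checked against the installed version). The function names are hypothetical, and the call to RNAinverse itself is deliberately left out.

```python
def paired_positions(structure):
    """Indices of all positions involved in a base pair (dot-bracket input)."""
    return {i for i, c in enumerate(structure) if c in "()"}

def rnainverse_start(design, structure):
    """Build a start sequence: paired positions lower-case (assumed fixed),
    unpaired positions upper-case (free to be mutated by the local search)."""
    fixed = paired_positions(structure)
    return "".join(c.lower() if i in fixed else c.upper()
                   for i, c in enumerate(design))

def restore_paired(design, redesigned, structure):
    """Re-establish the original content of paired positions a posteriori,
    keeping the redesigned nucleotides only in unpaired regions."""
    fixed = paired_positions(structure)
    return "".join(design[i].upper() if i in fixed else redesigned[i].upper()
                   for i in range(len(structure)))
```

For a design `GGAAACC` on the structure `((...))`, `rnainverse_start` yields `ggAAAcc`; if the local search returns `GCUUUGC`, `restore_paired` gives back `GGUUUCC`, so that only the loop has effectively been redesigned.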

3 RESULTS

3.1 Implementation

Our software, IncaRNAtion, was implemented in Python 2.7. We used RNAinverse from the Vienna Package 2.0 (Hofacker et al., 1994). All timing benchmarks were run on a single AMD Opteron(tm) 6278 Processor at 2.4 GHz with 512 KB of cache. The penalty β, associated with invalid base pairs, was set to 15.

Figure 3 presents the average time spent running IncaRNAtion + RNAinverse to generate one sequence with the required GC-content. As expected, the running time of IncaRNAtion grows linearly with the length of the structures.

Fig. 3.

Average time in seconds to generate one sequence for IncaRNAtion and RNAinverse

3.2 Dataset

To evaluate the quality of our method, we used secondary structures from the RNA STRAND database (Andronescu et al., 2008). Those are known secondary structures from a variety of organisms. We considered a subset of 50 structures selected by Levin et al. (2012), whose length ranges between 20 and 100 nt. To ease the visualization of results, we clustered together structures having similar length, stacks density and proportion of free nucleotides in loops, leading to distributions of structures shown in Figure 4.

Fig. 4.

Number of secondary structures per bin, according to our three clustering criteria

3.3 Design

We ran our method as follows. First, we sampled approximately 100 sequences per structure. Then, we used these sequences as seeds in RNAinverse. Finally, we computed the Minimal Free-Energy (MFE) structure with the RNAfold program from the Vienna Package 2.0 (Hofacker et al., 1994).

Before starting our benchmark, we assessed the need for our methods by analyzing the GC-content drift of state-of-the-art software. Using our dataset of 50 structures, we generated 100 samples per structure with classical programs that do not control the GC-content, namely RNAinverse, INFO-RNA, NUPACK:Design and Frnakenstein. Figure 5 shows the distribution of the GC-content of the sequences produced by these programs.

Fig. 5.

Overall GC-content distribution for sequences designed using RNAinverse, INFO-RNA, NUPACK:Design and Frnakenstein folding in the desired structure

As anticipated, we observe a clear bias toward high GC-contents and a complete absence of sequences with <30% GC. This striking result motivates the need for methods that make it possible to control the GC-content explicitly and, more precisely, to design sequences with low GC-content (i.e. ≤30%). To provide a complete overview of the performance of IncaRNAtion, we provide additional statistics for these programs in the Supplementary Material.

3.4 Success rate

We started by estimating the success rate of our methodology, computed as the percentage of sequences whose MFE structure is identical to the target secondary structure. Figure 6 shows our results. Before the postprocessing step (i.e. RNAinverse), the sequences sampled by IncaRNAtion have a low success rate (first row). As mentioned earlier, this could be explained by the fact that no selection criterion has been applied to unpaired nucleotides at this stage. Remarkably, after the local search optimization (with RNAinverse) of nucleotides in unpaired regions (second row), we observe a dramatic improvement in the success rate. As expected, length is, in general, not a good predictor of the hardness of designing a structure. Instead, a high number of free nucleotides in the structure seems to be a good measure of the hardness of its design. These data also show that designing sequences with low GC-content is challenging for all types of targets.

Fig. 6.

Success rate of IncaRNAtion before and after RNAinverse postprocessing. The first row shows the percentage of sampled sequences folding into the target when using only IncaRNAtion. The second row shows the same percentage after processing these sequences with RNAinverse

We investigated further the quality of the sequences generated by IncaRNAtion. In particular, we estimated the capacity of our methods to generate ‘good’ sequences with desired folding capabilities regardless of the property to fold exactly into the target structure. In Figure 7, we show the ratio of well-predicted base pairs in the MFE structure of our sampled sequences. As above, we can observe that, in all cases, the sequences that are the hardest to design are those with an extremely low GC-content. Indeed, the energetic contribution of the base pairs to the stability of the structure is weaker. Interestingly, we also notice that the most accurate sequences yield a GC-content of Inline graphic. Overall, we observe that all our samples have good folding properties, and that there is a correlation between the ‘precision’ of the samples and the hardness of the design.

Fig. 7.

Structural sensitivity (i.e. number of well-predicted base pairs/number of base pairs in target) of the MFE structures of the sampled sequences
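The two measures used in this section, the success rate and the structural sensitivity, can be computed from dot-bracket strings as sketched below. The function names and the dot-bracket representation are assumptions made for the example; the predicted structures would typically be MFE structures obtained from a folding program, which is outside the scope of this sketch.

```python
def base_pairs(structure):
    """Set of base pairs (i, j), i < j, encoded by a dot-bracket string."""
    pairs, stack = set(), []
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs

def structural_sensitivity(predicted, target):
    """Well-predicted base pairs divided by the base pairs in the target."""
    target_pairs = base_pairs(target)
    if not target_pairs:
        return 1.0                   # convention chosen here for an empty target
    return len(base_pairs(predicted) & target_pairs) / len(target_pairs)

def success_rate(predicted_structures, target):
    """Fraction of predicted structures that are identical to the target."""
    hits = sum(p == target for p in predicted_structures)
    return hits / len(predicted_structures)
```

A prediction `(....)` against the target `((..))` recovers one of the two target base pairs, so its sensitivity is 0.5 even though it does not count as a success.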

We noticed a highly decreased structural sensitivity for the sequences with 15% free nucleotides in the loops. However, one must remain careful interpreting this observation, as the structures within this class all originate from the PDB, and are relatively small (for the complete STRAND DB, the average length is Inline graphic nt, compared with Inline graphic nt around 15% unpaired bases).

3.5 Properties of designed sequences

In this section, we further analyze the generated sequences with a MFE structure that folds into the target structure.

A desirable feature in sequence design is to produce samples with a high sequence diversity and stable secondary structure. Therefore, in the following, we will use two useful measures, which are the sequence identity of the samples, and the Boltzmann probability of the target structure in the low-energy ensemble.

The sequence identity is defined over a set Inline graphic of aligned sequences (in our case, all sequences have the same length and can be trivially aligned) as follows:

graphic file with name btt217m7.jpg (7)

where $s_i$ is the nucleotide at position i in sequence s. Intuitively, this measure captures the diversity of sequences generated by a given method. Next, the Boltzmann frequency is defined for a structure $S$ and a sequence $s$ as follows:

$$\mathbb{P}(S \mid s) = \frac{e^{-E(s, S)/RT}}{\mathcal{Z}_s} \qquad (8)$$

where $E(s, S)$ is the free energy of $s$ folded into $S$ and $\mathcal{Z}_s$ is the partition function of sequence s. This measure tells us how dominant a structure $S$ is in the Boltzmann ensemble of structures over a sequence s. A high value implies a stable structure. We compute this frequency with RNAfold from the Vienna Package 2.0 (Hofacker et al., 1994).
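Both measures are straightforward to compute once the relevant quantities are available. In the sketch below, the sequence identity is taken to be the average pairwise identity over all unordered pairs of sequences, which is one plausible reading of the measure and may differ from the article's exact normalization. The Boltzmann frequency is computed from two energies assumed to come from a folding program: the free energy of the target structure on the sequence, and the ensemble free energy $G = -RT \ln \mathcal{Z}_s$. Function names and the temperature of 37 °C are assumptions.

```python
import math
from itertools import combinations

RT = 0.0019872 * 310.15          # gas constant x temperature, kcal/mol at 37 °C

def pairwise_identity(s, t):
    """Fraction of aligned positions carrying the same nucleotide."""
    return sum(a == b for a, b in zip(s, t)) / len(s)

def mean_sequence_identity(seqs):
    """Average identity over all unordered pairs of equal-length sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_identity(s, t) for s, t in pairs) / len(pairs)

def boltzmann_frequency(structure_energy, ensemble_energy):
    """exp(-E/RT) / Z, with Z written as exp(-G/RT) for the ensemble energy G."""
    return math.exp(-(structure_energy - ensemble_energy) / RT)
```

A low mean identity indicates a diverse set of designs, while a Boltzmann frequency close to 1 means the target structure dominates the ensemble. Energies are expected in kcal/mol.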

Figure 8 shows the number of solutions generated (i.e. sequences with a MFE structure identical to the target structure). Here, we note that low GC-contents have a strong negative influence on the number of sequences generated, and also negatively affect the sequence diversity. This observation emphasizes the difficulty of designing sequences with low GC-content. Once again, large percentages of free nucleotides increase the difficulty of the task.

Fig. 8.

Number of solutions generated with IncaRNAtion + RNAinverse on the first row and their average sequence identity on the second

The thermodynamical stability of the target structure on the designed sequence is another important property when estimating the performance of RNA design algorithms. We estimate the quality of our solutions in Figure 9. First, we observe a slow decline of the structure stability (i.e. the frequency) when the target structure increases in size. Yet, for an average GC-content, the frequency stays >10% even at a size of 100 nt. Next, we note that for the most difficult target structures (i.e. the longer ones or those with high percentages of unpaired nucleotides in loops) the GC-content has a limited (almost null) influence on the stability of the target structure on the designed sequence. By contrast, this is less true for the easiest, small structures with only a few free nucleotides in internal loops.

Fig. 9.

Thermodynamical stability of the target structure. The curves report the average Boltzmann probability of the target structure (which is also the MFE structure) at various GC-contents with respect to the length of the target (left), density of stacked base pairs (centre) and number of unpaired nucleotides in loops (right)

3.6 Global sampling versus Local search versus Glocal approach

To conclude this study, we estimate the impact of the design methodology on performance. More precisely, we aim to determine the merits of a global sampling approach (IncaRNAtion), compared with a glocal procedure (IncaRNAtion + RNAinverse) and a local search methodology (RNA-SSD). To the best of our knowledge, RNA-SSD is, besides IncaRNAtion, the only software that implements explicit control of the GC-content.

Here, we compare the running time and the sequence diversity of the solutions produced by each software. In addition, we focus on the design of sequences with low GC-contents (≤30%) as they are almost impossible to design with classical software (Figure 5).

Figure 3 shows the running time of each software. These data demonstrate the efficiency and scalability of our techniques. In particular, this figure suggests that our strategy has the potential to be applied efficiently for designing sequences on long (and difficult) target secondary structures at low GC-content, a task that could not have been achieved before due to time requirements.

Next, we show in Figure 10 the average sequence identity achieved by the various methods. Our results show that at extremely low GC-contents (i.e. 10%), IncaRNAtion slightly outperforms RNA-SSD, while this advantage becomes less evident when the GC-content increases. Our experiments on higher GC-contents (i.e. ≥50%) showed that our glocal strategy and the local search approach perform similarly. Likewise, we did not find any clear evidence that a global, local or glocal approach outperforms the others when comparing the thermodynamical stability of the target structure (data not shown).

Fig. 10.

Sequence identity of IncaRNAtion and RNAinverse for 10 and 30% of GC

4 CONCLUSION

In this article, we described a novel algorithm, IncaRNAtion, for the RNA secondary structure design problem, i.e. the design of an RNA sequence adopting a predefined secondary structure as its minimal free-energy fold. Implementing a global sampling approach, it optimizes affinity toward the target secondary structure, while granting the user full control over the GC-content of the resulting sequences. This extended control does not necessarily induce additional computational demands, and we showed the linear complexity of both the preprocessing stage and the generation of candidate sequences for the design, allowing for the design of larger and more complex secondary structures in a matter of minutes on a single processor (e.g. ∼28 min for 100 candidate sequences for a ∼1500 nt 16S rRNA). We evaluated the method on a benchmark composed of target secondary structures extracted from the RNA STRAND database. We observed good overall success rate, with the notable exception of very low targeted GC-content (10%), and a good to excellent entropy within designed candidates. Finally, we implemented a hybrid approach, using the RNAinverse software as a postprocessing step for unpaired regions. This approach greatly increased the success rate of the method, allowing for the design of highly diverse candidates for almost all of the structures in our benchmark, while largely preserving the targeted GC-content.

In the future, we would like to complement this study by further investigating the potential of hybrid local/global—or glocal—approaches. A global sampling approach would capture the positive aspects of design, optimizing affinity toward a given structure while allowing the specification of expressive systems of constraints. Designed sequences would serve as a seed for a restricted local approach which, by breaking unwanted symmetries, would perform the negative part of the design, while ideally maintaining obedience to the constraints. Another perspective of this work is the incorporation of the full Turner energy model, which should in principle yield better designs for unpaired regions.

Supplementary Material

Supplementary Data

ACKNOWLEDGMENTS

The authors would like to thank Rob Knight for his suggestions and comments.

Funding: This work was funded by the French Agence Nationale de la Recherche (ANR) through the Magnum ANR 2010 BLAN 0204 project (to Y.P.), the FQRNT team grant 232983 (to V.R. and J.W.) and NSERC Discovery grant 219671 (to J.W.). This article has been developed as a result of a mobility stay funded by the Erasmus Mundus Programme of the European Commission under the Transatlantic Partnership for Excellence in Engineering TEE Project (V.R.).

Conflict of Interest: none declared.

REFERENCES

  1. Aguirre-Hernández R, et al. Computational RNA secondary structure design: empirical complexity and improved methods. BMC Bioinformatics. 2007;8:34. doi: 10.1186/1471-2105-8-34.
  2. Andronescu M, et al. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics. 2008;9:340. doi: 10.1186/1471-2105-9-340.
  3. Avihoo A, et al. RNAexinv: an extended inverse RNA folding from shape and physical attributes to sequences. BMC Bioinformatics. 2011;12:319. doi: 10.1186/1471-2105-12-319.
  4. Bodini O, Ponty Y. 2010. Multi-dimensional Boltzmann Sampling of Languages. In DMTCS Proceedings, number 01 in AM, Vienne, Autriche, pp 49–64.
  5. Busch A, Backofen R. INFO-RNA–a fast approach to inverse RNA folding. Bioinformatics. 2006;22:1823–1831. doi: 10.1093/bioinformatics/btl194.
  6. Chang AL, et al. Synthetic RNA switches as a tool for temporal and spatial control over gene expression. Curr. Opin. Biotechnol. 2012;23:679–688. doi: 10.1016/j.copbio.2012.01.005.
  7. Ding Y, Lawrence E. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 2003;31:7280–7301. doi: 10.1093/nar/gkg938.
  8. Garcia-Martin JA, et al. RNAiFold: a constraint programming algorithm for RNA inverse folding and molecular design. J. Bioinform. Comput. Biol. 2013;11:1350001. doi: 10.1142/S0219720013500017.
  9. Hofacker IL, et al. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 1994;125:167–188.
  10. Levin A, et al. A global sampling approach to designing and reengineering RNA secondary structures. Nucleic Acids Res. 2012;40:10041–10052. doi: 10.1093/nar/gks768.
  11. Lyngsø RB, et al. Frnakenstein: multiple target inverse RNA folding. BMC Bioinformatics. 2012;13:260. doi: 10.1186/1471-2105-13-260.
  12. Ponty Y, Saule C. 2011. A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms. In: WABI - 11th Workshop on Algorithms in Bioinformatics - 2011, Saarbrucken, Allemagne.
  13. Reinharz V, et al. 2013. A linear inside-outside algorithm for correcting sequencing errors in structured RNA sequences. In: Research in Computational Molecular Biology. Lecture Notes in Computer Science, Vol. 7821. Springer, pp. 199–211.
  14. Rodrigo G, et al. De novo automated design of small RNA circuits for engineering synthetic riboregulation in living cells. Proc. Natl Acad. Sci. USA. 2012;109:15271–15276. doi: 10.1073/pnas.1203831109.
  15. Senter E, et al. Using the fast fourier transform to accelerate the computational search for RNA conformational switches. PLoS One. 2012;7:e50506. doi: 10.1371/journal.pone.0050506.
  16. Turner DH, Mathews DH. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010;38:D280–D282. doi: 10.1093/nar/gkp892.
  17. Waldispühl J, Ponty Y. An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure. J. Comput. Biol. 2011;18:1465–1479. doi: 10.1089/cmb.2011.0181.
  18. Waldispühl J, et al. Efficient Algorithms for probing the RNA mutation landscape. PLoS Comput. Biol. 2008;4:e1000124. doi: 10.1371/journal.pcbi.1000124.
  19. Wilf HS. A unified setting for sequencing, ranking, and selection algorithms for combinatorial objects. Adv. Math. 1977;24:281–291.
  20. Zadeh JN, et al. Nucleic acid sequence design via efficient ensemble defect optimization. J. Comput. Chem. 2011;32:439–452. doi: 10.1002/jcc.21633.
