Abstract
Predicting the secondary structure of RNA is an intermediate step in predicting RNA 3D structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested that the energy contributions of dinucleotide bulges were sequence dependent, and a sequence dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5’-purine-pyrimidine-3’, and 2.41 kcal/mol for 5’-pyrimidine-purine-3’). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a −0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence independent model. This model and the new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
INTRODUCTION
RNA has more biological functions in nature than serving as an intermediate in protein synthesis. Among its numerous additional roles, RNA catalyzes reactions (1), regulates function (2), controls gene expression through riboswitches (3), and participates in mRNA splicing as snRNAs (4). Many scientists are interested in predicting the free energy or the secondary structure of a particular RNA sequence. RNAstructure (5), mfold (6), and the Vienna package (7) make these predictions using a nearest neighbor model based on thermodynamic parameters for all secondary structure motifs. The improvement of the free energy parameters for any particular motif could improve the secondary structure and free energy predictions made by these programs. Subsequently, the secondary structure can be utilized as an intermediary in tertiary structure prediction based on the sequence (8).
Bulges are a common RNA secondary structure motif in nature. A bulge is made up of one or more adjacent unpaired nucleotides in one strand of an RNA duplex and is assumed to prevent the adjacent base pairs from stacking with one another when the bulge consists of two or more nucleotides (9). Bulges can perform a variety of functions such as serving a role in gene expression (10), intron splicing (11), ligand binding (12), and the formation of tertiary structures (13). Dinucleotide bulges occur naturally in several organisms, such as the telomerase holoenzyme of Tetrahymena thermophila (14), the 5’-UTR of multiple enteroviruses such as poliovirus type 1, and rhinoviruses such as Human Rhinovirus-14. The dinucleotide bulge in these enteroviruses and rhinoviruses is part of a consensus sequence which has been shown to regulate translation as well as replication of these viruses (15). One virus of specific interest is HIV-2 where dinucleotide bulges have previously been identified and determined to play a significant role in viral replication (16).
Thermodynamic data has previously been collected for only six dinucleotide bulges. Based on this limited data set, a model was derived that attributed a 2.8 kcal/mol free energy penalty to all dinucleotide bulges independent of the identity of the nucleotides in the bulge and the nearest neighbors adjacent to the bulge (17). However, the identities of these nucleotides have proven to be important in thermodynamic model development, as seen for single nucleotide bulges (18) and trinucleotide bulges (19). This study aims to determine whether the current sequence independent model is the most accurate predictor of the free energy contribution of dinucleotide bulges. This study reports the thermodynamic parameters for 18 dinucleotide bulges (and includes an additional bulge from the literature), most of which frequently occur in nature. These experimental results can be incorporated into secondary structure prediction software. Also, a sequence dependent model utilizing the identity of the nucleotides in the bulge and the nearest neighbors was derived. This improved model can also be incorporated into secondary structure prediction software to be used for predicting the free energy contributions of dinucleotide bulges for which we do not have experimental thermodynamic data.
MATERIALS AND METHODS
Compiling and Searching a Database for RNA Dinucleotide Bulges
A previously compiled database of various RNA secondary structures (20) was searched for dinucleotide bulges. G-U pairs were considered to be canonical base pairs for this study. The resulting list of naturally occurring dinucleotide bulges was used to identify the dinucleotide bulges and the nearest neighbor combinations studied here.
Design of Sequences
Based on the list of dinucleotide bulges described above, short, synthetic RNA duplexes were designed. The RNA duplexes were composed of one strand with 10 nucleotides and another with 8 nucleotides. The dinucleotide bulge with its nearest neighbors was centered in the middle of the duplex with three additional Watson-Crick base pairs on either side. To prevent end fraying during the melting experiment, the terminal base pairs of the duplexes were G-C pairs. The designed sequences were then checked (by free energy calculations) for possible competing unimolecular or alternative bimolecular folding. A total of 18 duplexes with dinucleotide bulges were selected for this study.
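The layout rules above can be sketched as a small consistency check. This is only an illustration under stated assumptions: the function name `check_design` is hypothetical, the bottom strand is supplied 3' to 5' so that it aligns with the top strand, G-U pairs are accepted only at the two closing positions, and the screen for competing unimolecular or bimolecular folding is not modeled.

```python
# Hypothetical helper (not from the study): checks the 10-nt / 8-nt duplex layout.
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = frozenset("GU")


def check_design(top_5to3, bottom_3to5):
    """Return a list of layout problems for a bulge duplex (empty if none).

    Assumes positions 5 and 6 (1-based) of the top strand are the bulge and
    that the bottom strand is written 3'->5'.
    """
    if len(top_5to3) != 10 or len(bottom_3to5) != 8:
        return ["expected a 10-nt top strand and an 8-nt bottom strand"]
    problems = []
    # Drop the two bulged nucleotides so the strands align pair by pair.
    paired_top = top_5to3[:4] + top_5to3[6:]
    for i, (x, y) in enumerate(zip(paired_top, bottom_3to5)):
        pair = frozenset((x, y))
        is_closing = i in (3, 4)  # the two pairs adjacent to the bulge
        if not (pair in WATSON_CRICK or (is_closing and pair == WOBBLE)):
            problems.append(f"pair {i + 1} ({x}-{y}) is not allowed at this position")
    # Terminal pairs should be G-C to limit end fraying.
    for i in (0, 7):
        if frozenset((paired_top[i], bottom_3to5[i])) != frozenset("GC"):
            problems.append(f"terminal pair {i + 1} is not G-C")
    return problems


print(check_design("GACUACCGUG", "CUGAGCAC"))
```

The example call uses one of the duplexes listed in Table 2; an empty list means the layout rules are satisfied.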
RNA Synthesis and Purification
The RNA oligonucleotides were ordered from Integrated DNA Technologies (Coralville, IA). The synthesis and purification were standard, and the procedures have been described previously (21).
Optical Melting Experiments and Thermodynamics
Optical melting experiments were performed in a buffer solution containing 1 M NaCl, 20 mM sodium cacodylate, and 0.5 mM EDTA (pH 7). The optical melting studies were performed utilizing standard procedures as described previously (20–21). The resulting melt curves were analyzed with Meltwin software (22). Thermodynamic parameters were then obtained for each duplex as described previously (23). Because multiple reference duplexes would have been necessary, the nearest neighbor model was utilized to account for the thermodynamic parameters of the stem (24). In addition, the nearest neighbor model was utilized because nearest neighbor parameters will be used along with the newly derived bulge parameters to predict the free energy and secondary structure.
Linear Regression and Dinucleotide Bulge Thermodynamic Parameters
The data from the 18 dinucleotide bulge and closing pair combinations determined experimentally in this study were combined with data from 1 additional bulge previously studied (17). Four other bulges were previously studied (17) but were not utilized during model development due to possible competition from unwanted bimolecular associations of the two strands and non-two-state melting.
For example, the authors of the previous study recognized these possible issues by stating that one of the sequences, , had significant concentrations of homoduplex formation and that most strands did not melt in a two-state manner (17). The experimental free energy contribution of the dinucleotide bulge was utilized as a constant when performing linear regression using the LINEST function in Microsoft Excel. Multiple combinations of parameters were tested, and those that yielded the greatest predictive accuracy accounted for the closing base pair as well as the identity (purine or pyrimidine) of the nucleotides that made up the dinucleotide bulge.
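The regression setup described above can be sketched as ordinary least squares with no intercept, where each bulge contributes one row of indicator variables (bulge class, number of adjacent A-U pairs, number of adjacent G-U pairs). This is a minimal sketch, not the LINEST workflow used in the study: the function names are hypothetical, and the response values below are synthetic, generated without noise from the Table 4 parameters purely to show that the design matrix identifies all six parameters.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Least squares without an intercept, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(top4, bottom2):
    """Columns [RR, YY, RY, YR, n_AU, n_GU] for one bulge and its closing pairs."""
    kind = "".join("R" if n in "AG" else "Y" for n in top4[1:3])
    row = [float(kind == k) for k in ("RR", "YY", "RY", "YR")]
    pairs = [{top4[0], bottom2[0]}, {top4[3], bottom2[1]}]
    row.append(float(sum(p == {"A", "U"} for p in pairs)))
    row.append(float(sum(p == {"G", "U"} for p in pairs)))
    return row


# Motifs taken from Table 3 (top strand 5'->3', bottom strand 3'->5').
MOTIFS = [("GAAG", "CC"), ("UAGU", "GA"), ("GAAU", "CA"), ("UAAC", "GG"),
          ("GUCG", "CC"), ("GAUC", "CG"), ("CCAG", "GC"), ("UACU", "AA")]
TRUE_PARAMS = [3.06, 2.94, 2.71, 2.41, 0.45, -0.28]  # Table 4 values

rows = [design_row(t, b) for t, b in MOTIFS]
y = [sum(x * p for x, p in zip(row, TRUE_PARAMS)) for row in rows]  # synthetic
fitted = least_squares(rows, y)
print([round(v, 2) for v in fitted])
```

Replacing `y` with measured bulge contributions would give a fit to experimental data; no claim is made here that such a fit reproduces the published parameters exactly.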
RESULTS
Database Searching
In the compiled database of RNA secondary structures, 1839 dinucleotide bulges were found. The first dataset in Table 1 presents the frequency and percent occurrence when both the bulge and closing base pairs are specified. When the data was categorized in this manner, 220 unique bulge and closing pair combinations were identified. The top 20 most frequent bulge and closing pair sequences are displayed in Table 1. These bulges and corresponding closing pair sequences account for 63% of all dinucleotide bulge and closing pair combinations found in the database. Many of the combinations found do not occur frequently. The most frequently occurring bulge and corresponding closing pair sequence is 5’GAAG3’/3’CC5’, which accounts for 13% of all combinations found. The thermodynamic parameters for this bulge and closing pair sequence were previously studied, and those results are included in this study. Some combinations of bulge and corresponding closing pair sequences frequently found in the secondary structure database were not studied here due to possible competing structures; however, this study investigated 5 of the top 10 most frequent bulges and closing pair sequences in the database and incorporated the most frequent bulge from the literature (17). These six bulges account for 27% of the combinations found in the database. Overall, 41% of the combinations found in the database were accounted for in this study.
Table 1.
Frequency of Occurrence of Dinucleotide Bulge Sequences in a Secondary Structure Database [a]

| Dataset 1: Bulge and NN [b] | Frequency [c] | Percentage [d] | Reference [e] | Dataset 2: Bulge [f] | Frequency [c] | Percentage [d] | Reference [e] |
|---|---|---|---|---|---|---|---|
| 5'GAAG3'/3'CC5' | 240 | 13.05 | g | AA | 507 | 27.57 | g,h |
| 5'GCAU3'/3'UA5' | 85 | 4.62 | | CA | 249 | 13.54 | h |
| 5'GUGA3'/3'CU5' | 81 | 4.40 | | AC | 217 | 11.80 | h |
| 5'CCAG3'/3'GC5' | 76 | 4.13 | h | AG | 180 | 9.79 | h |
| 5'AAGG3'/3'UC5' | 75 | 4.08 | | UG | 152 | 8.27 | |
| 5'UAAA3'/3'GU5' | 65 | 3.53 | | GA | 144 | 7.83 | h |
| 5'UACC3'/3'AG5' | 62 | 3.37 | h | UA | 132 | 7.18 | |
| 5'UAGU3'/3'GA5' | 52 | 2.83 | h | CG | 55 | 2.99 | |
| 5'UACG3'/3'AC5' | 44 | 2.39 | h | UC | 40 | 2.18 | h |
| 5'UACU3'/3'AA5' | 43 | 2.34 | h | AU | 39 | 2.12 | h |
| 5'GUAU3'/3'CG5' | 41 | 2.23 | | GU | 32 | 1.74 | |
| 5'AAAC3'/3'UG5' | 39 | 2.12 | | UU | 32 | 1.74 | |
| 5'GCAG3'/3'CC5' | 39 | 2.12 | h | CC | 18 | 0.98 | |
| 5'AACG3'/3'UC5' | 38 | 2.07 | | GG | 17 | 0.92 | |
| 5'UGAA3'/3'GU5' | 34 | 1.85 | h | GC | 14 | 0.76 | |
| 5'GAAU3'/3'CA5' | 33 | 1.79 | h | CU | 11 | 0.60 | |
| 5'CGAA3'/3'GU5' | 31 | 1.69 | | | | | |
| 5'UGAC3'/3'AG5' | 26 | 1.41 | | | | | |
| 5'UAAC3'/3'AG5' | 24 | 1.31 | | | | | |
| 5'CAGC3'/3'GG5' | 23 | 1.25 | | | | | |
[a] Not all bulges found in the database are shown due to space limitations.
[b] Dinucleotide bulge and nearest neighbor sequence.
[c] Frequency of occurrence in the database searched.
[d] Percent out of 1839 dinucleotide bulges, the total number found in the database search.
[e] Reference where data was reported.
[f] Dinucleotide bulge sequence.
[g] From Ref. 17.
[h] From this work.
The second dataset in Table 1 displays the frequency and percent occurrence when only the bulge sequence is specified. There were 16 unique dinucleotide bulge sequences identified in the database, which includes all possible dinucleotide bulges. Table 1 includes all 16 of these unique sequences. The most frequent dinucleotide bulge is (5’ AA 3’) which accounts for 28% of all dinucleotide bulges in the database. This is the same dinucleotide bulge found within the most prevalent bulge and closing base pair combination discussed earlier. This study investigates the thermodynamics of seven different bulge sequences (not considering nearest neighbors) which represent 76% of the bulges present in the secondary structure database.
Thermodynamic Parameters
Table 2 presents the thermodynamic parameters for the formation of the duplexes containing the dinucleotide bulges. These thermodynamic parameters are derived from the analysis of individual melt curves and the analysis of the 1/TM vs log(CT) plots. The duplexes in the table are listed in order of decreasing frequency of occurrence in the secondary structure database.
Table 2.
Thermodynamic Parameters for the Formation of Duplexes Containing Dinucleotide Bulges [a]

| Frequency [b] | Duplex [c] | Tm dependence: ΔH° (kcal/mol) | Tm dependence: ΔS° (cal/K·mol) | Tm dependence: ΔG°37 (kcal/mol) | Tm dependence: Tm (°C) [d] | Melt curve fits: ΔH° (kcal/mol) | Melt curve fits: ΔS° (cal/K·mol) | Melt curve fits: ΔG°37 (kcal/mol) | Melt curve fits: Tm (°C) [d] |
|---|---|---|---|---|---|---|---|---|---|
| 240 | GC G**AA**G CGA / ACG C C GC [e] | −54.4 | −154 | −6.7 | 38.2 | −35.2 | −91.6 | −6.8 | 39.5 |
| | GC G**AA**G UCA / ACG C C AG [e] | −49.4 | −138 | −6.6 | 37.5 | −40.9 | −110 | −6.8 | 39.0 |
| 76 | GAG C**CA**G GUG / CUC G C CAC | −70.5 ± 4.5 | −198.4 ± 14.1 | −8.92 ± 0.13 | 47.9 | −66.4 ± 4.9 | −185.6 ± 15.8 | −8.85 ± 0.10 | 48.2 |
| 62 | GAC U**AC**C GUG / CUG A G CAC | −61.6 ± 2.2 | −179.7 ± 7.3 | −5.86 ± 0.05 | 33.7 | −62.3 ± 3.1 | −182.1 ± 10.2 | −5.86 ± 0.09 | 33.7 |
| 52 | GAG U**AG**U GUC / CUC G A CAG | −62.4 ± 6.3 | −183.4 ± 20.8 | −5.49 ± 0.22 | 31.9 | −62.7 ± 4.9 | −184.3 ± 15.9 | −5.52 ± 0.13 | 32.1 |
| 44 | GAC U**AC**G CUG / CUG A C GAC | −63.7 ± 3.7 | −184.9 ± 12.1 | −6.40 ± 0.07 | 36.4 | −64.1 ± 6.8 | −186.0 ± 21.8 | −6.45 ± 0.17 | 36.6 |
| 43 | GAC U**AC**U CUG / CUG A A GAC | −70.2 ± 10.3 | −212.0 ± 34.2 | −4.46 ± 0.48 | 28.1 | −69.8 ± 15.3 | −210.4 ± 50.9 | −4.51 ± 0.53 | 28.3 |
| 39 | GAG G**CA**G GUG / CUC C C CAC | −89.9 ± 3.8 | −262.4 ± 12.0 | −8.55 ± 0.08 | 44.1 | −85.5 ± 8.4 | −248.3 ± 26.8 | −8.51 ± 0.09 | 44.4 |
| 38 | GAG A**AC**G CUG / CUC U C GAC | −61.7 ± 2.3 | −178.3 ± 7.5 | −6.43 ± 0.05 | 36.5 | −60.2 ± 5.9 | −173.1 ± 19.4 | −6.50 ± 0.08 | 36.8 |
| 34 | GAC U**GA**A CUG / CUG G U GAC | −58.6 ± 3.0 | −172.5 ± 10.1 | −5.09 ± 0.11 | 29.5 | −56.7 ± 3.8 | −166.2 ± 12.5 | −5.13 ± 0.17 | 29.5 |
| 33 | GAG G**AA**U CUG / CUC C A GAC | −65.0 ± 9.8 | −188.9 ± 31.5 | −6.45 ± 0.47 | 36.6 | −63.0 ± 8.7 | −182.1 ± 28.7 | −6.52 ± 0.33 | 36.9 |
| 26 | GAC U**GA**C CUG / CUG A G GAC | −64.6 ± 8.8 | −188.8 ± 28.6 | −6.03 ± 0.36 | 34.6 | −64.2 ± 10.9 | −187.4 ± 35.6 | −6.05 ± 0.32 | 34.7 |
| 24 | GAC U**AA**C CUG / CUG A G GAC | −68.4 ± 4.5 | −200.5 ± 14.6 | −6.20 ± 0.09 | 35.5 | −68.1 ± 7.6 | −199.5 ± 24.5 | −6.23 ± 0.09 | 35.6 |
| 23 | GAG C**AG**C GUG / CUC G G CAC | −68.3 ± 3.8 | −196.2 ± 12.1 | −7.41 ± 0.06 | 41.0 | −63.6 ± 3.4 | −181.2 ± 10.6 | −7.38 ± 0.10 | 41.2 |
| 21 | GAC C**AA**C GUG / CUG G G CAC | −72.2 ± 3.61 | −209.0 ± 11.6 | −7.35 ± 0.05 | 40.6 | −67.0 ± 2.9 | −192.3 ± 8.9 | −7.34 ± 0.14 | 40.8 |
| 11 | GAG G**CA**U CUG / CUC C A GAC | −59.6 ± 2.51 | −169.3 ± 8.1 | −7.12 ± 0.03 | 40.1 | −64.4 ± 11.7 | −184.6 ± 37.7 | −7.14 ± 0.18 | 40.0 |
| 11 | GAC U**AA**C CUG / CUG G G GAC | −53.4 ± 5.6 | −149.3 ± 18.1 | −7.08 ± 0.20 | 40.2 | −67.2 ± 17.0 | −194.0 ± 55.0 | −7.01 ± 0.33 | 39.2 |
| 10 | GAG G**AU**C GUG / CUC C G CAC | −75.5 ± 2.9 | −217.6 ± 9.2 | −7.98 ± 0.05 | 43.1 | −74.0 ± 8.0 | −213.1 ± 25.4 | −7.95 ± 0.17 | 43.1 |
| 7 | GAC A**UC**G CUG / CUG U U GAC | −61.3 ± 0.8 | −181.0 ± 2.7 | −5.17 ± 0.03 | 30.2 | −62.9 ± 4.0 | −186.2 ± 13.3 | −5.10 ± 0.16 | 30.1 |
| 6 | CUG G**UC**G CUC / GAC C C GAG | −66.2 ± 0.9 | −186.2 ± 2.8 | −8.48 ± 0.02 | 46.4 | −69.1 ± 3.3 | −195.1 ± 10.4 | −8.55 ± 0.10 | 46.4 |
[a] Measurements were made in 1 M NaCl, 20 mM sodium cacodylate, and 0.5 mM EDTA, pH 7.
[b] Frequency of occurrence obtained from the database described in Materials and Methods.
[c] The dinucleotide bulge is identified in bold letters. The nearest neighbors and bulge are set apart for easy identification. The top strand of each duplex is written 5' to 3', and each bottom strand is written 3' to 5'.
[d] All values are calculated at 10⁻⁴ M oligomer concentration.
[e] Melt data from ref 17.
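The relationship between the tabulated ΔH°, ΔS°, ΔG°37, and Tm values can be sketched as follows. This is a hedged illustration, assuming two-state melting of a non-self-complementary duplex, for which 1/Tm = (R/ΔH°) ln(CT/4) + ΔS°/ΔH°; the function names are hypothetical and this is not the Meltwin implementation.

```python
import math

R = 1.987e-3  # gas constant in kcal/(K*mol)


def delta_g37(dh_kcal, ds_cal):
    """Free energy at 37 degrees C (310.15 K); dH in kcal/mol, dS in cal/(K*mol)."""
    return dh_kcal - 310.15 * ds_cal / 1000.0


def melting_temp_c(dh_kcal, ds_cal, ct_molar=1e-4):
    """Two-state Tm in degrees C for a non-self-complementary duplex."""
    ds_kcal = ds_cal / 1000.0
    tm_kelvin = dh_kcal / (ds_kcal + R * math.log(ct_molar / 4.0))
    return tm_kelvin - 273.15


# Values for the duplex with frequency 62 in Table 2 (Tm dependence columns).
print(round(delta_g37(-61.6, -179.7), 2), round(melting_temp_c(-61.6, -179.7), 1))
```

Because the tabulated ΔH° and ΔS° are rounded, values recomputed this way can differ from the table in the last digit for some duplexes.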
Contribution of the Dinucleotide Bulge to Duplex Thermodynamics
The dinucleotide bulge contribution to duplex thermodynamics is listed in Table 3. These values were calculated as described in the Materials and Methods. The measured free energy contribution of the dinucleotide bulges ranged from 2.33 to 3.76 kcal/mol. The most destabilizing dinucleotide bulge is 5’UACU3’/3’AA5’ (3.76 kcal/mol), while the least destabilizing dinucleotide bulge is 5’GAUC3’/3’CG5’ (2.33 kcal/mol).
Table 3.
Contribution of Dinucleotide Bulge to Duplex Thermodynamics
| Sequence [a] | Measured ΔG°37 (kcal/mol) [b] | Sequence independent model: Prediction (kcal/mol) [c] | Sequence independent model: Difference (kcal/mol) [d] | Sequence dependent model: Prediction (kcal/mol) [e] | Sequence dependent model: Difference (kcal/mol) [f] |
|---|---|---|---|---|---|
| G**AA**G/CC [g] | 3.57 | 2.8 | 0.77 | 3.06 | 0.51 |
| G**AA**G/CC [g] | 3.08 | 2.8 | 0.28 | 3.06 | 0.02 |
| C**CA**G/GC | 2.45 | 2.8 | 0.35 | 2.41 | 0.04 |
| U**AC**C/AG | 2.98 | 2.8 | 0.18 | 3.16 | 0.18 |
| U**AG**U/GA | 3.25 | 2.8 | 0.45 | 3.23 | 0.02 |
| U**AC**G/AC | 3.34 | 2.8 | 0.54 | 3.10 | 0.21 |
| U**AC**U/AA | 3.76 | 2.8 | 0.96 | 3.61 | 0.15 |
| G**CA**G/CC | 2.66 | 2.8 | 0.14 | 2.41 | 0.25 |
| A**AC**G/UC | 3.42 | 2.8 | 0.62 | 3.16 | 0.26 |
| U**GA**A/GU | 3.16 | 2.8 | 0.36 | 3.23 | 0.07 |
| G**AA**U/CA | 3.24 | 2.8 | 0.44 | 3.51 | 0.27 |
| U**GA**C/AG | 3.55 | 2.8 | 0.75 | 3.51 | 0.04 |
| U**AA**C/AG | 3.38 | 2.8 | 0.58 | 3.51 | 0.13 |
| C**AG**C/GG | 3.06 | 2.8 | 0.26 | 3.06 | 0.00 |
| C**AA**C/GG | 3.12 | 2.8 | 0.32 | 3.06 | 0.06 |
| G**CA**U/CA | 2.57 | 2.8 | 0.23 | 2.86 | 0.29 |
| U**AA**C/GG | 2.64 | 2.8 | 0.16 | 2.78 | 0.14 |
| G**AU**C/CG | 2.33 | 2.8 | 0.47 | 2.71 | 0.38 |
| A**UC**G/UU | 3.33 | 2.8 | 0.53 | 3.10 | 0.23 |
| G**UC**G/CC | 2.73 | 2.8 | 0.07 | 2.93 | 0.20 |
| Average [h] | | | 0.42 ± 0.23 | | 0.17 ± 0.13 |
[a] The dinucleotide bulge is identified by bold letters. The top strand of each duplex is written 5' to 3', and each bottom strand is written 3' to 5'.
[b] The experimental free energy contribution of the bulge calculated as mentioned in the text.
[c] The free energy prediction made by the sequence independent model (Ref 17).
[d] The absolute difference between the free energy predicted by the sequence independent model (Ref 17) and the experimental free energy.
[e] The free energy prediction made by the sequence dependent model proposed here.
[f] The absolute difference between the free energy predicted by the sequence dependent model and the experimental free energy.
[g] Data from Ref 17.
[h] The average (absolute value) deviation.
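The measured bulge contributions can be understood as the measured duplex free energy minus a nearest neighbor estimate of the two flanking helices. The sketch below is an assumption-laden illustration of that subtraction: the Watson-Crick stacking values, the 4.09 kcal/mol initiation term, and the 0.45 kcal/mol terminal A-U term are quoted here as assumptions from the widely used Watson-Crick parameter set and should be checked against the source cited in the Methods (24). The function names are hypothetical, the bottom strand is taken to be fully complementary, and duplexes with G-U closing pairs are not handled.

```python
# Assumed Watson-Crick stacking free energies at 37 C (kcal/mol), keyed by the
# 5'->3' dinucleotide on one strand; the other strand is its complement.
NN_STACK = {
    "AA": -0.93, "AU": -1.10, "UA": -1.33, "CU": -2.08, "CA": -2.11,
    "GU": -2.24, "GA": -2.35, "CG": -2.36, "GG": -3.26, "GC": -3.42,
}
INITIATION = 4.09   # assumed duplex initiation term
TERMINAL_AU = 0.45  # assumed penalty per helix-terminal A-U pair
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def stack(step):
    """Stacking energy for a 5'->3' step, using strand symmetry if needed."""
    if step in NN_STACK:
        return NN_STACK[step]
    return NN_STACK[COMPLEMENT[step[1]] + COMPLEMENT[step[0]]]


def helix(top):
    """Stacking energy of one helix plus a penalty for each terminal A-U pair."""
    total = sum(stack(top[i:i + 2]) for i in range(len(top) - 1))
    return total + TERMINAL_AU * sum(top[i] in "AU" for i in (0, -1))


def stem_energy(top_5to3):
    """Both helices of a 10-nt top strand whose positions 5 and 6 are the bulge."""
    return INITIATION + helix(top_5to3[:4]) + helix(top_5to3[6:])


def bulge_contribution(top_5to3, measured_duplex):
    """Measured duplex free energy minus the nearest neighbor stem estimate."""
    return measured_duplex - stem_energy(top_5to3)


print(round(bulge_contribution("GACUACCGUG", -5.86), 2))
```

Under these assumptions, the A-U pair next to the bulge is treated as a helix terminus, which is why it receives the terminal A-U term in the stem estimate.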
Free Energy Parameters for Dinucleotide Bulges
Currently, the free energy penalty for all dinucleotide bulges regardless of bulge and closing pair sequence is 2.8 kcal/mol (17). Multiple models were tested to improve upon the current method of prediction. The model that resulted in the lowest average deviation between the predicted and experimental values is:
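| $\Delta G^\circ_{37,\,\text{dint bulge}} = \Delta G^\circ_{37,\,\text{bulge}} + \Delta G^\circ_{37,\,\text{AU}} + \Delta G^\circ_{37,\,\text{GU}}$ | (1) |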
As shown in Table 4, ΔG°37, bulge depends on the type of nucleotides in the bulge (purine or pyrimidine) and their order. Thus, ΔG°37, bulge is a 3.06 kcal/mol penalty for two bulged purines, a 2.94 kcal/mol penalty for two bulged pyrimidines, a 2.71 kcal/mol penalty for a bulge of 5’-purine-pyrimidine-3’, and a 2.41 kcal/mol penalty for a bulge of 5’-pyrimidine-purine-3’. ΔG°37, AU is a 0.45 kcal/mol penalty for each A-U pair adjacent to the bulge; ΔG°37, GU is a −0.28 kcal/mol bonus for each G-U base pair adjacent to the bulge. It is important to note that the A-U adjacent pair penalty of 0.45 kcal/mol is in addition to the 0.45 kcal/mol penalty for a terminal A-U pair that is used when calculating the free energy contribution of the stem. As an example, the stability of 5’GACUACCGUG3’/3’CUGAGCAC5’ is predicted to be −5.68 kcal/mol, as shown below:
| (2) |
Table 4.
Sequence Dependent Model for Predicting the Free Energy Contribution of Dinucleotide Bulges
| ΔG°37, dint bulge Parameters | Free Energy Contribution (kcal/mol) |
|---|---|
| ΔG°37, bulge [a] | |
| 2 purines | 3.06 ± 0.10 |
| 2 pyrimidines | 2.94 ± 0.20 |
| 1 purine, 1 pyrimidine (5' RY 3') | 2.71 ± 0.16 |
| 1 pyrimidine, 1 purine (5' YR 3') | 2.41 ± 0.15 |
| ΔG°37, AU [b] | 0.45 ± 0.11 |
| ΔG°37, GU [c] | −0.28 ± 0.16 |

[a] The free energy contribution attributed to the two bulged nucleotides. One of the four values will be applied depending on the purine/pyrimidine composition of the bulge.
[b] The free energy penalty applied for each A-U closing pair adjacent to the dinucleotide bulge.
[c] The free energy bonus applied for each G-U closing pair adjacent to the dinucleotide bulge.
Substituting eq 1 into eq 2 yields:
| (3) |
Replacing the terms with values yields:
| (4) |
which is close to the measured value of −5.86 kcal/mol. This new sequence dependent model predicted the experimental values of the dinucleotide bulges with an average difference of 0.17 kcal/mol from the experimental value (Table 3).
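The sequence dependent model can be sketched in a few lines. This is an illustrative sketch using the Table 4 parameters; the function name `predict_bulge` and the motif encoding (four top-strand nucleotides written 5' to 3' and the two bottom-strand closing nucleotides written 3' to 5') are assumptions made for this example. The stem value of −8.84 kcal/mol used in the worked case is inferred from the tables as the measured duplex free energy (−5.86, Table 2) minus the measured bulge contribution (2.98, Table 3).

```python
# Table 4 parameters (kcal/mol). Table 4 lists 2.94 for two pyrimidines;
# the abstract and Table 3 use 2.93.
BULGE_PENALTY = {"RR": 3.06, "YY": 2.94, "RY": 2.71, "YR": 2.41}
AU_PENALTY = 0.45   # per A-U pair adjacent to the bulge
GU_BONUS = -0.28    # per G-U pair adjacent to the bulge


def predict_bulge(top4, bottom2):
    """Predicted free energy contribution of a dinucleotide bulge (kcal/mol)."""
    kind = "".join("R" if n in "AG" else "Y" for n in top4[1:3])
    closing = ({top4[0], bottom2[0]}, {top4[3], bottom2[1]})
    n_au = sum(pair == {"A", "U"} for pair in closing)
    n_gu = sum(pair == {"G", "U"} for pair in closing)
    return BULGE_PENALTY[kind] + n_au * AU_PENALTY + n_gu * GU_BONUS


# Worked case for 5'GACUACCGUG3'/3'CUGAGCAC5'.
stem = -5.86 - 2.98                      # inferred stem contribution
bulge = predict_bulge("UACC", "AG")      # 2.71 (RY bulge) + 0.45 (one A-U)
print(round(bulge, 2), round(stem + bulge, 2))
```

Keeping the bulge class, the A-U count, and the G-U count as separate terms mirrors the additive form of the model, so alternative parameter sets can be swapped in without changing the function.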
DISCUSSION
Database Searching
While the database search yielded 220 possible combinations of bulge and closing pair sequence combinations, it is likely that some of the possible sequence combinations that were not found in the database do exist in nature and would have been found in a larger secondary structure database. There may also be a structural explanation for why certain sequence combinations occur more frequently than others in nature, though this would require extensive structural studies.
It is interesting to note that 8 of the top 10 most frequent bulge sequences in the database contain at least one A-U closing base pair and account for 28% of the bulge and corresponding closing base pair sequences in the database. Similarly, 3 of the top 10 most frequent sequences contain at least one G-U closing base pair and account for 11% of the total dinucleotide bulges and corresponding closing base pair sequences in the database. The remaining closing base pair, G-C, is present in 6 of the top 10 most frequent bulge sequences in the database and accounts for 31% of the total dinucleotide bulges and closing pair sequences in the database.
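The closing pair tallies above can be reproduced from the first ten rows of Table 1. The sketch below is a simple recount, assuming each motif is encoded as the four top-strand nucleotides (5' to 3') and the two bottom-strand nucleotides (3' to 5'); the helper names are hypothetical.

```python
TOTAL = 1839  # dinucleotide bulges found in the database

# Ten most frequent bulge and closing pair combinations from Table 1.
TOP10 = [
    ("GAAG", "CC", 240), ("GCAU", "UA", 85), ("GUGA", "CU", 81),
    ("CCAG", "GC", 76), ("AAGG", "UC", 75), ("UAAA", "GU", 65),
    ("UACC", "AG", 62), ("UAGU", "GA", 52), ("UACG", "AC", 44),
    ("UACU", "AA", 43),
]


def closing_pairs(top4, bottom2):
    """The two base pairs adjacent to the bulge, as unordered pairs."""
    return [frozenset((top4[0], bottom2[0])), frozenset((top4[3], bottom2[1]))]


def summarize(pair):
    """Count of top-10 motifs containing the pair and their share of all bulges."""
    target = frozenset(pair)
    hits = [n for top4, bottom2, n in TOP10 if target in closing_pairs(top4, bottom2)]
    return len(hits), 100.0 * sum(hits) / TOTAL


for pair in ("AU", "GU", "GC"):
    count, percent = summarize(pair)
    print(pair, count, round(percent))
```

A motif with two different closing pairs is counted once for each pair type, so the three percentages are not expected to sum to the top-10 total.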
Thermodynamic Contributions of a Dinucleotide Bulge to Duplex Thermodynamics
All of the dinucleotide bulges included in this study destabilized the duplex. This destabilization is expected, and it is assumed that every dinucleotide bulge disrupts what otherwise would be stabilizing stacking interactions between neighboring nucleotides (9). Also, the presence of a bulge likely places strain on the adjacent base pairs, and the resulting bend or kink at the bulge site may destabilize the helix.
The bulges that consist of only pyrimidines or only purines seem to impart a larger penalty than the bulges that consist of one pyrimidine and one purine. This seems to suggest that the identity of the bulge does play a role in the overall destabilization of the dinucleotide bulge. This conclusion favors a sequence dependent model over a sequence independent model which attributes a constant value for all dinucleotide bulges regardless of sequence. If nature selected bulges based on stability, we would expect bulges of one purine and one pyrimidine to be the most common. However, the most common dinucleotide bulge (5’-AA-3’) contains two purines.
Petrov, Zirbel, and Leontis (25) have identified several structural motifs adopted by dinucleotide bulges (Fig. 1). Some examples of these structural motifs include (a) a bulge with its nucleotides in an extensive stacking arrangement with the closing base pairs and the nucleotides in the bulge but no pairing by the bulge nucleotides, (b) a bulge with its nucleotides “flipped out” and not stacking on each other or the closing base pairs and with no pairing by the bulge nucleotides, (c) a bulge with one bulge nucleotide pairing with a closing nucleotide on the opposite strand creating a base triple, and (d) a bulge with one bulge nucleotide pairing with a closing nucleotide on the opposite strand creating a base triple and the other bulge nucleotide pairing with a closing nucleotide on the same strand creating a second base triple. Although we have observed some sequence-stability patterns and Petrov, Zirbel, and Leontis identified some structural motifs, we were unable to correlate the thermodynamic contribution of the bulge to stacking, hydrogen bonding, or structural features in general. Our limited knowledge about structural motifs in dinucleotide bulges does not provide enough information about stacking of the bulge nucleotides with each other, stacking of the bulge nucleotides with the closing base pairs, stacking of the closing base pairs with each other, pairing of the bulge nucleotides with each other, and pairing of the bulge nucleotides with the closing base pairs to develop any meaningful relationship between structure and stability. A significant effort to identify sequence-structure patterns in dinucleotide bulges may shed some light on the relationship between thermodynamic stability and structure.
Figure 1.
Dinucleotide bulge motifs identified from 3D structures (25). The nucleotides are colored by residue type, A is red, C is yellow, G is green, and U is blue, and are labeled by residue type and sequence number. (top left) Motif IL_04466.1 with sequence 5’CAAC3’/3’GG5’ (PDB ID 3U5H). In this structure, the closing G-C pairs are canonical Watson-Crick pairs, and the bulge nucleotides are inserted into the helix and stack on each other and the C’s of the closing G-C pair. The bulge nucleotides do not form any pairs. (top right) Motif IL_23448.1 with sequence 5’CGAC3’/3’GG5’ (PDB ID 4ERD). In this structure, the closing G-C pairs are canonical Watson-Crick pairs, and the bulge nucleotides are “flipped out” and do not stack on each other or the closing base pairs. The bulge nucleotides do not form any pairs, and the bulge disrupts the closing base pairs from stacking with each other. (bottom left) Motif IL_37964.2 with sequence 5’CAUU3’/3’GA5’ (PDB ID 3J7A). In this structure, the closing A933-U1034 and G934-C1031 pairs are canonical Watson-Crick pairs, and there is significant stacking between the residues. Bulge residue A1032 is pairing with A933 on the opposite strand, forming an A-A-U base triple. (bottom right) Motif IL_73000.4 with sequence 5’CAGC3’/3’GG5’ (PDB ID 2AW7). In this structure, the closing G-C pairs are canonical Watson-Crick pairs, with some stacking between the nucleotides. Bulge residue G1131 is pairing with G1143 on the opposite strand, forming a G-G-C base triple. In addition, bulge residue A1130 is pairing with the adjacent nucleotide C1129, forming a G-C-A base triple.
Improving the Model Used to Predict Dinucleotide Thermodynamics
The current model used by RNAstructure to predict the free energy contribution of dinucleotide bulges was derived by averaging the measured free energy of only six dinucleotide bulges. Four of these six sequences were not included in this study as they had issues with possible self-association and/or exhibited a non-two-state melt as discussed previously (17). The two that were used had the same bulge and nearest neighbors. The large range of bulge contributions seen here, approximately 1.5 kcal/mol (Table 3), suggested that a sequence dependent model may be better at predicting the free energy contribution of a dinucleotide bulge. Therefore, the data collected here (a threefold increase in sample size in comparison to the previous dinucleotide bulge dataset) was used to derive a sequence dependent model (Table 4). On average, the sequence independent model (17) predicts a free energy that differs from the experimental value by 0.42 kcal/mol. The sequence dependent model proposed here predicts a free energy that differs from the experimental value by only 0.17 kcal/mol on average, more than a two-fold improvement. Additionally, the standard deviation of the deviations for the sequence independent model is 0.23 kcal/mol, which is almost double that of the sequence dependent model (0.13 kcal/mol) (Table 3). Notably, bulges whose stabilities are poorly predicted by the sequence independent model are more accurately predicted with the sequence dependent model. For example, when using the sequence independent model, there are seven bulges with predicted values varying from the experimental values by >0.5 kcal/mol, with the highest difference being 0.96 kcal/mol. In contrast, the sequence dependent model has only one bulge whose predicted value varies from the experimental value by >0.5 kcal/mol, and that difference is only 0.51 kcal/mol.
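The summary statistics quoted above can be recomputed from the Difference columns of Table 3. This is a small sketch, assuming the reported ± values are sample standard deviations of the absolute deviations; the variable names are illustrative.

```python
from statistics import mean, stdev

# Absolute deviations (kcal/mol) transcribed from Table 3, in table order.
INDEPENDENT = [0.77, 0.28, 0.35, 0.18, 0.45, 0.54, 0.96, 0.14, 0.62, 0.36,
               0.44, 0.75, 0.58, 0.26, 0.32, 0.23, 0.16, 0.47, 0.53, 0.07]
DEPENDENT = [0.51, 0.02, 0.04, 0.18, 0.02, 0.21, 0.15, 0.25, 0.26, 0.07,
             0.27, 0.04, 0.13, 0.00, 0.06, 0.29, 0.14, 0.38, 0.23, 0.20]

for name, diffs in (("independent", INDEPENDENT), ("dependent", DEPENDENT)):
    n_large = sum(d > 0.5 for d in diffs)  # deviations above 0.5 kcal/mol
    print(name, round(mean(diffs), 2), round(stdev(diffs), 2), n_large, max(diffs))
```

The counts of deviations above 0.5 kcal/mol (seven versus one) and the largest deviations (0.96 versus 0.51 kcal/mol) follow directly from the same lists.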
Not only does the sequence dependent model predict the experimental dinucleotide bulge contribution well, it is also consistent with the published model used to predict the free energy contribution of trinucleotide bulges (19). Both models are sequence dependent; they rely on the identity of the bulge nucleotides and the adjacent base pairs to predict the free energy contribution of the bulge. Also, for dinucleotide and trinucleotide bulges, bulges consisting of all pyrimidines or all purines are more destabilizing than bulges consisting of both pyrimidines and purines. Finally, the models for both dinucleotide bulges and trinucleotide bulges utilize a penalty for an A-U adjacent pair (0.45 kcal/mol for dinucleotide bulges and 0.49 kcal/mol for trinucleotide bulges) and a bonus for a G-U adjacent pair (−0.28 kcal/mol for dinucleotide bulges and −0.56 kcal/mol for trinucleotide bulges) (19). This G-U bonus is also present in the predictive model for single nucleotide bulges (18).
With the collection of new thermodynamic data for 18 dinucleotide bulges (plus one additional bulge from the literature), new experimental data is available for free energy and secondary structure prediction software. The newly derived sequence dependent predictive model can also be incorporated into prediction software. Both should improve prediction of RNA stability and secondary structure from sequence.
Acknowledgments
FUNDING
Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number [2R15GM085699-02].
Footnotes
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
References
- 1. Reymond C, Beaudoin JD, Perreault JP. Modulating RNA structure and catalysis: Lessons from small cleaving ribozymes. Cell Mol Life Sci. 2009;66:3937–3950. doi: 10.1007/s00018-009-0124-1.
- 2. Morris KV, Mattick JS. The rise of regulatory RNA. Nat Rev Genet. 2014;15:423–437. doi: 10.1038/nrg3722.
- 3. Hollands K, Proshkin S, Sklyarova S, Epshtein V, Mironov A, Nudler E, Groisman EA. Riboswitch control of Rho-dependent transcription termination. Proc Natl Acad Sci USA. 2012;109:5376–5381. doi: 10.1073/pnas.1112211109.
- 4. Valadkhan S. snRNAs as the catalysts of pre-mRNA splicing. Curr Opin Chem Biol. 2005;9:603–608. doi: 10.1016/j.cbpa.2005.10.008.
- 5. Mathews DH, Disney MD, Childs JC, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci USA. 2004;101:7287–7292. doi: 10.1073/pnas.0401799101.
- 6. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415. doi: 10.1093/nar/gkg595.
- 7. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431. doi: 10.1093/nar/gkg599.
- 8. Li X, Quon G, Lipshitz HD, Morris Q. Predicting in vivo binding sites of RNA-binding proteins using mRNA secondary structure. RNA. 2010;16:1096–1107. doi: 10.1261/rna.2017210.
- 9. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700.
- 10. Gerdeman MS, Henkin TM, Hines JV. Solution structure of the Bacillus subtilis T-box antiterminator RNA: Seven nucleotide bulge characterized by stacking and flexibility. J Mol Biol. 2003;326:189–210. doi: 10.1016/s0022-2836(02)01339-6.
- 11. McManus CJ, Schwartz ML, Butcher SE, Brow DA. A dynamic bulge in the U6 RNA internal stem-loop functions in spliceosome assembly and activation. RNA. 2007;13:2252–2265. doi: 10.1261/rna.699907.
- 12. Peattie DA, Douthwaite S, Garrett RA, Noller HF. A bulged double helix in a RNA-protein contact site. Proc Natl Acad Sci USA. 1981;78:7331–7335. doi: 10.1073/pnas.78.12.7331.
- 13. Woese CR, Gutell RR. Evidence for several higher-order structural elements in ribosomal-RNA. Proc Natl Acad Sci USA. 1989;86:3119–3122. doi: 10.1073/pnas.86.9.3119.
- 14. O’Connor CM, Collins K. A novel RNA binding domain in Tetrahymena telomerase p65 initiates hierarchical assembly of telomerase holoenzyme. Mol Cell Biol. 2006;26:2029–2036. doi: 10.1128/MCB.26.6.2029-2036.2006.
- 15. Du Z, Yu J, Ulyanov NB, Andino R, James TL. Solution structure of consensus stem-loop D RNA domain that plays important roles in translation and replication in enteroviruses and rhinoviruses. Biochemistry. 2004;43:11959–11972. doi: 10.1021/bi048973p.
- 16. Rhim H, Rice AP. Functional significance of the dinucleotide bulge in stem-loop 1 and stem-loop 2 of HIV-2 TAR RNA. Virology. 1994;202:202–211. doi: 10.1006/viro.1994.1336.
- 17. Longfellow CE, Kierzek R, Turner DH. Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides. Biochemistry. 1990;29:278–285. doi: 10.1021/bi00453a038.
- 18. Znosko BM, Silvestri SB, Volkman H, Boswell B, Serra MJ. Thermodynamic parameters for an expanded nearest-neighbor model for the formation of RNA duplexes with single nucleotide bulges. Biochemistry. 2002;41:10406–10417. doi: 10.1021/bi025781q.
- 19. Murray MH, Hard JA, Znosko BM. Improved model to predict the free energy contribution of trinucleotide bulges to RNA duplex stability. Biochemistry. 2014;53:3502–3508. doi: 10.1021/bi500204e.
- 20. Christiansen ME, Znosko BM. Thermodynamic characterization of the complete set of sequence symmetric tandem mismatches in RNA and an improved model for predicting the free energy contribution of sequence asymmetric tandem mismatches. Biochemistry. 2008;47:4329–4336. doi: 10.1021/bi7020876.
- 21. Wright DJ, Rice JL, Yanker DM, Znosko BM. Nearest neighbor parameters for inosine-uridine pairs in RNA duplexes. Biochemistry. 2007;46:4625–4634. doi: 10.1021/bi0616910.
- 22. McDowell JA. Meltwin. 1996. version 3.5.
- 23. Vanegas PL, Horwitz TS, Znosko BM. Effects of non-nearest neighbors on the thermodynamic stability of RNA GNRA hairpin tetraloops. Biochemistry. 2012;51:2192–2198. doi: 10.1021/bi300008j.
- 24. Xia T, SantaLucia J, Jr, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry. 1998;37:14719–14735. doi: 10.1021/bi9809425.
- 25. Petrov AI, Zirbel CL, Leontis NB. Automated classification of RNA 3D motifs and the RNA 3D motif atlas. RNA. 2013;19:1327–1340. doi: 10.1261/rna.039438.113.

