Nucleic Acids Research. 2009 Jun 9;37(14):4696–4706. doi: 10.1093/nar/gkp465

Thermodynamic characterization of tandem mismatches found in naturally occurring RNA

Martha E Christiansen 1, Brent M Znosko 1,*
PMCID: PMC2724281  PMID: 19509311

Abstract

Although all sequence symmetric tandem mismatches and some sequence asymmetric tandem mismatches have been thermodynamically characterized and a model has been proposed to predict the stability of previously unmeasured sequence asymmetric tandem mismatches [Christiansen,M.E. and Znosko,B.M. (2008) Biochemistry, 47, 4329–4336], experimental thermodynamic data for frequently occurring tandem mismatches is lacking. Since experimental data is preferred over a predictive model, the thermodynamic parameters for 25 frequently occurring tandem mismatches were determined. These new experimental values, on average, are 1.0 kcal/mol different from the values predicted for these mismatches using the previous model. The data for the sequence asymmetric tandem mismatches reported here were then combined with the data for 72 sequence asymmetric tandem mismatches that were published previously, and the parameters used to predict the thermodynamics of previously unmeasured sequence asymmetric tandem mismatches were updated. The average absolute difference between the measured values and the values predicted using these updated parameters is 0.5 kcal/mol. This updated model improves the prediction for tandem mismatches that were predicted rather poorly by the previous model. This new experimental data and updated predictive model allow for more accurate calculations of the free energy of RNA duplexes containing tandem mismatches, and, furthermore, should allow for improved prediction of secondary structure from sequence.

INTRODUCTION

Non-canonical regions account for about half of the secondary structure of RNA (1). In particular, tandem mismatches, or 2 × 2 nucleotide internal loops, are widespread. For example, tandem mismatches have been found in the rRNA of Escherichia coli (2), Haloferax volcanii (3) and Haloarcula marismortui (4); the mitochondrial 16S-like rRNA of Chlamydomonas reinhardtii (3); the genomic RNA of the dengue-3 virus (5); turnip crinkle viral RNA (6); and myotonic dystrophy type 2 RNA (7), to name a few. Tandem mismatches not only occur in a variety of RNAs and organisms, but they also serve functional roles within the RNA. For example, the aminoglycoside apramycin binds to a tandem mismatch in bacteria and halts replication (8). Another functional role of tandem mismatches occurs in the J4/5 loop of the group I intron of Candida albicans. This tandem mismatch is the docking site for the first step of the self-splicing reaction required for processing of rRNA (9).

Due to the natural occurrence of tandem mismatches and their functional roles within certain RNAs, the ability to accurately predict tandem mismatches in secondary structures can lead to a better understanding of tertiary interactions and structure–function relationships and aid in the design of pharmaceuticals. As a result, thermodynamics of tandem mismatches have been studied quite extensively (10–19). Recently, we reported a complete periodic table of sequence symmetric tandem mismatches and proposed an updated model for predicting the free energy contribution of previously unmeasured sequence asymmetric tandem mismatches (19). However, despite these studies, only 8% of all possible tandem mismatch—nearest neighbor combinations have experimentally determined thermodynamic parameters. Due to a lack of data, current algorithms that predict RNA secondary structure from sequence (1,20–24) are based on several assumptions and may be oversimplified.

An obvious way to improve the current algorithms is to expand the experimental data set for tandem mismatches. Ideally, the algorithms should contain experimental parameters for each possible tandem mismatch (1095 when defined as the tandem mismatch and the adjacent nearest neighbors). Although this is unrealistic, what is possible is to measure the thermodynamics of tandem mismatches that occur frequently in nature. This would provide algorithms with experimental parameters for the tandem mismatches that users of these programs would most likely be interested in. Thus, the only tandem mismatches that would rely on a predictive model would be those that occur less frequently in nature. In addition, more experimental data would increase the amount of tandem mismatch data from which the predictive model is derived, enhancing the accuracy of the model. With this in mind, the frequency of occurrence of tandem mismatches in a dataset of RNA secondary structures was determined. Twenty-five of the most frequently occurring tandem mismatches were then optically melted in order to derive their corresponding thermodynamic parameters. In addition, this thermodynamic data was used to update the predictive model used for tandem mismatches that do not have experimental values. This additional experimental data and updated predictive model can now be incorporated into secondary structure algorithms.

MATERIALS AND METHODS

Compiling and searching a database for tandem mismatches

A database of 1899 RNA secondary structures was compiled, containing 123 small subunit rRNAs (3), 223 large subunit rRNAs (25,26), 309 5S rRNAs (27), 484 tRNAs (28), 91 signal recognition particles (29), 16 RNase P RNAs (30), 100 group I introns (31,32), 3 group II introns (33) and 450 miRNAs (34–36). This database was searched for tandem mismatches, and the number of occurrences for each type of tandem mismatch was tabulated. In this work, G–U pairs are considered to be canonical base pairs.
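The search procedure is not described in detail here. As a rough illustration only, the sketch below assumes that each secondary structure is available as a sequence plus a dot-bracket string (an assumption; the source databases use their own formats) and reports every 2 × 2 internal loop together with its closing base pairs. The function names are hypothetical, and the sketch does not merge motifs that are identical when read from the opposite strand.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner; unpaired positions are absent."""
    stack, partner = [], {}
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            opener = stack.pop()
            partner[opener] = pos
            partner[pos] = opener
    return partner


def find_tandem_mismatches(sequence, dot_bracket):
    """Return motifs such as 5'CGAG3'/3'GAGC5' for every 2 x 2 internal loop."""
    partner = pair_table(dot_bracket)
    motifs = []
    for i, j in partner.items():
        if i > j:
            continue  # visit each base pair once, from its 5' side
        loop_positions = (i + 1, i + 2, j - 2, j - 1)
        inner_pair_closes_loop = partner.get(i + 3) == j - 3
        loop_is_unpaired = not any(p in partner for p in loop_positions)
        if inner_pair_closes_loop and loop_is_unpaired:
            top = sequence[i:i + 4]                 # read 5' to 3'
            bottom = sequence[j - 3:j + 1][::-1]    # read 3' to 5'
            motifs.append(f"5'{top}3'/3'{bottom}5'")
    return motifs


# Toy hairpin (hypothetical sequence) containing one tandem G.A/A.G mismatch.
print(find_tandem_mismatches("GCGAGCUUUGCGAGC", "((..((...))..))"))
```

Tabulating the returned motif strings (for example with `collections.Counter`) would then give occurrence counts per tandem mismatch type.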

Design of sequences for optical melting studies

Sequences of tandem mismatches and nearest neighbors were designed to represent those found most frequently in the database described above. Each duplex chosen was 10 base pairs long, with the tandem mismatches occurring in the middle of the duplex. On either side of the tandem mismatch are the adjacent nearest neighbors and three additional canonical base pairs. The terminal base pairs are G–C base pairs in order to prevent end fraying of the sequence during melting. The duplexes were also designed to have a melting temperature between 35°C and 55°C and to have minimal formation of hairpin structures or mis-aligned duplexes.

RNA synthesis and purification

Oligonucleotides were ordered from the Keck Lab at Yale University (New Haven, CT), Azco BioTech, Inc. (San Diego, CA) or Integrated DNA Technologies (Coralville, IA). The synthesis and purification of the oligonucleotides followed standard procedures that were described previously (19,37,38).

Optical melting experiments and thermodynamics

Optical melting experiments were performed in 1 M NaCl, 20 mM sodium cacodylate and 0.5 mM Na2EDTA (pH 7.0). Melting curves (absorbance versus temperature) were obtained, and duplex thermodynamics were determined as described previously (19,37,38). The thermodynamic contributions of tandem mismatches to duplex thermodynamics (ΔG°37,tandem mismatch, ΔH°tandem mismatch and ΔS°tandem mismatch) were determined by subtracting the Watson–Crick contribution (39) from the measured duplex thermodynamics. This type of calculation was described previously (19).
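The subtraction described above can be sketched as follows. This is a minimal illustration, assuming that the measured duplex values and the summed Watson–Crick contribution are already available for each quantity; the dictionary keys and all numbers are placeholders and are not taken from this study.

```python
def mismatch_contribution(measured, watson_crick):
    """Return measured minus Watson-Crick for each thermodynamic quantity."""
    return {key: measured[key] - watson_crick[key] for key in measured}


# Placeholder values for illustration only (not data from the paper).
# dG37 and dH are in kcal/mol; dS is in cal/(K mol).
measured_duplex = {"dG37": -8.0, "dH": -70.0, "dS": -200.0}
watson_crick_sum = {"dG37": -9.0, "dH": -80.0, "dS": -229.0}

contribution = mismatch_contribution(measured_duplex, watson_crick_sum)
print(contribution)
```

A positive free energy contribution, as in this placeholder case, means the tandem mismatch destabilizes the duplex relative to the Watson–Crick stem alone.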

Linear regression and tandem mismatch thermodynamic parameters

Data collected for the 25 sequence asymmetric tandem mismatches in this study were combined with previously published data for 72 sequence asymmetric tandem mismatches (12,15–18). Similar to what was done previously to derive parameters for sequence asymmetric tandem mismatches (19), six variables were used for linear regression: (i) tandem mismatches with a U•U pair adjacent to an R•R pair, tandem mismatches with one G•A or A•G pair adjacent to a Y•Y pair, or tandem mismatches with any combination of A•C, U•C, C•U, C•C, C•A or A•A pairs; (ii) tandem mismatches with any combination of adjacent G•A and A•G pairs or adjacent U•U pairs; (iii) tandem mismatches with a U•U pair adjacent to a Y•Y (not U•U), C•A, or A•C pair; (iv) tandem mismatches with a single G•G pair not adjacent to a U•U pair; (v) tandem mismatches with an A–U or U–A nearest neighbor; and (vi) tandem mismatches with a G–U or U–G nearest neighbor. The calculated experimental contribution of the tandem mismatch to duplex stability was used as a constant when doing linear regression. To simultaneously solve for each variable, the LINEST function of Microsoft Excel was used for linear regression. Based on this dataset, the fourth parameter did not contribute to tandem mismatch stability. Therefore, this parameter was removed and linear regression was re-run with the remaining parameters. Various other parameters and/or combinations of parameters were tried, but the parameters described here resulted in free energy contributions that agreed closely with the experimental data and error values that are comparable to those of the RNAstructure algorithm (1,20,21).
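The regression itself was run with LINEST in Excel. Purely as an illustration of the idea, the sketch below fits the five retained variables by ordinary least squares in plain Python. It assumes no intercept term, encodes the three loop categories as 0/1 indicators and the two nearest neighbor variables as counts, and uses noise-free synthetic observations generated from the increments in Table 4, so it only shows that the procedure recovers known parameters. None of the rows correspond to measured duplexes.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def least_squares(design, observed):
    """Ordinary least squares without an intercept: solve (X^T X) b = X^T y."""
    n_params = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(r[i] * y for r, y in zip(design, observed))
           for i in range(n_params)]
    return solve_linear_system(xtx, xty)


# Columns: category (i), category (ii), category (iii), A-U count, G-U count.
design = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 2, 0],
]
# Synthetic, noise-free observations built from the Table 4 increments.
table4_increments = [1.0, -0.7, 0.6, 1.0, 1.0]
observed = [sum(x * b for x, b in zip(row, table4_increments)) for row in design]

fitted = least_squares(design, observed)
print([round(value, 3) for value in fitted])
```

With real data the observations would carry experimental error, and a parameter whose fitted value is indistinguishable from zero (as reported for the fourth variable) would be dropped before refitting.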

Structural investigation

Six previously solved three-dimensional structures containing frequently occurring tandem mismatches were downloaded from the Protein Data Bank (PDB) (40). These structures were viewed using Insight II software (Accelrys Software Inc.), and base–base hydrogen bonding between the mismatch nucleotides was investigated. A hydrogen bond was considered to be present when the distance between the proton on the donor atom and the heavy atom acceptor was less than 2.5 Å and when the angle between the heavy atom donor, proton, and heavy atom acceptor was between 120° and 180°.
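The geometric criterion above can be written as a small check. The sketch below assumes Cartesian coordinates in ångströms for the donor heavy atom, its proton, and the acceptor heavy atom; the function names and the test coordinates are hypothetical and do not come from any PDB entry.

```python
import math


def angle_degrees(donor, proton, acceptor):
    """Angle at the proton between the donor and acceptor heavy atoms."""
    to_donor = [d - p for d, p in zip(donor, proton)]
    to_acceptor = [a - p for a, p in zip(acceptor, proton)]
    dot = sum(u * v for u, v in zip(to_donor, to_acceptor))
    norm = math.hypot(*to_donor) * math.hypot(*to_acceptor)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(donor, proton, acceptor, max_distance=2.5, min_angle=120.0):
    """Apply the distance (< 2.5 A) and angle (120 to 180 degrees) criteria."""
    close_enough = math.dist(proton, acceptor) < max_distance
    angle = angle_degrees(donor, proton, acceptor)
    return close_enough and min_angle <= angle <= 180.0


# Hypothetical coordinates: a linear donor-proton-acceptor arrangement.
print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
```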

RESULTS

Database searching

The database described above was searched for tandem mismatches. In this database, 1092 tandem mismatches were found, averaging about one occurrence in every two sequences. Table 1 shows a summary of the database results obtained. The first set of data lists frequency and percent occurrence when the mismatch nucleotides and nearest neighbors are specified. Categorizing tandem mismatches in this fashion results in 350 types of mismatches in the database. The 30 tandem mismatch types listed in the first data set (Table 1) account for 51% of the total number of tandem mismatches found. The 320 types of tandem mismatches not shown account for the remaining 49%; however, each type represents <0.6% of the total number of tandem mismatches found. When categorized in this manner, previous studies account for only 21% of the total number of tandem mismatches found, but after adding the data reported here, this percentage increases to 59%. Similarly, previous studies thermodynamically characterized only seven types of tandem mismatches in the top 30, but after adding the data reported here, all of the tandem mismatches in the top 30 have been studied.
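Percent occurrence is simply the count for a motif divided by the 1092 tandem mismatches found. The short sketch below illustrates the arithmetic using the four occurrence counts that are quoted elsewhere in this article; the helper names are hypothetical, and the full tabulation is in Table 1.

```python
TOTAL_TANDEM_MISMATCHES = 1092  # total found in the database

# Occurrence counts quoted in the text for four tandem mismatches.
occurrences = {
    "5'GGAG3'/3'CAGU5'": 57,
    "5'CGAG3'/3'GAGC5'": 41,
    "5'CUUG3'/3'GUUC5'": 19,
    "5'UGAG3'/3'AAGC5'": 16,
}


def percent_occurrence(count, total=TOTAL_TANDEM_MISMATCHES):
    """Share of all tandem mismatches represented by one motif, in percent."""
    return 100.0 * count / total


def combined_coverage(counts, total=TOTAL_TANDEM_MISMATCHES):
    """Share of all tandem mismatches covered by a set of motifs, in percent."""
    return percent_occurrence(sum(counts.values()), total)


for motif, count in occurrences.items():
    print(motif, round(percent_occurrence(count), 1))
print("combined", round(combined_coverage(occurrences), 1))
```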

Table 1.

Summary of database search results for tandem mismatches

[Table 1 is available only as an image in the source: gkp465i3.jpg]

The second set of data (Table 1) lists frequency and percent occurrence when only the tandem mismatch sequence is specified (nearest neighbors are not considered). Categorizing tandem mismatches in this fashion results in 55 types of tandem mismatches in the database. The 30 tandem mismatches listed in the second data set (Table 1) account for 93% of the total number of tandem mismatches found. The 25 types of tandem mismatches not shown account for the remaining 7%, with each mismatch representing <0.5% of the total number of tandem mismatches found. When categorized in this manner, previous studies account for 93% of the total number of tandem mismatches found. Adding the data reported here did not change this percentage.

The third set of data (Table 1) lists frequency and percent occurrence of 5′ and 3′ nearest neighbor combinations. Categorizing tandem mismatches in this fashion results in 21 types of nearest neighbor combinations in the database, representing all possible types of nearest neighbor combinations. When categorized in this manner, previous studies account for 55% of all nearest neighbor combinations, but after adding the data reported here, this percentage increases to 92%.

The fourth set of data (Table 1) lists frequency and percent occurrence of the tandem mismatch nucleotides when A and G are categorized as purines (R) and C and U are categorized as pyrimidines (Y). Categorizing tandem mismatches in this fashion results in 10 types of tandem mismatches, representing all possible combinations. When categorized in this manner, previous studies account for 99.5% of all combinations. Adding the data reported here did not change this percentage. The only combination that has not been measured is 5′RR3′/3′YY5′ (with 5′AA3′/3′CC5′ being the only possible tandem mismatch sequence in this category).

Thermodynamic parameters

Table 2 shows the thermodynamic parameters of duplex formation that were obtained from fitting each melting curve to the two-state model and from the van’t Hoff plot of T_M^−1 versus log(C_T/4). Data for the duplexes containing the 30 most frequently occurring tandem mismatches in the database are shown in order of decreasing frequency. However, data for 33 duplexes are shown because three tandem mismatches were melted with two different stem sequences.
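For readers unfamiliar with the van't Hoff analysis, the sketch below assumes the standard relation for non-self-complementary duplexes, 1/T_M = (R/ΔH°) ln(C_T/4) + ΔS°/ΔH°, with the natural logarithm and T_M in kelvin. That relation is assumed here and is not quoted from this article. The melting temperatures are synthetic: they are generated from chosen ΔH° and ΔS° values and then refitted, so the example only demonstrates the fitting step and contains no data from Table 2.

```python
import math

R = 1.987  # gas constant in cal/(K mol)


def fit_line(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vant_hoff_fit(total_concentrations, melting_temps_kelvin):
    """Return (dH in kcal/mol, dS in cal/(K mol), dG37 in kcal/mol)."""
    xs = [math.log(c / 4.0) for c in total_concentrations]
    ys = [1.0 / t for t in melting_temps_kelvin]
    slope, intercept = fit_line(xs, ys)
    dh_cal = R / slope            # slope = R / dH
    ds = intercept * dh_cal       # intercept = dS / dH
    dg37_cal = dh_cal - 310.15 * ds
    return dh_cal / 1000.0, ds, dg37_cal / 1000.0


# Synthetic input: choose dH and dS, then generate melting temperatures.
true_dh_cal, true_ds = -60000.0, -170.0
concentrations = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3]  # total strand concentration, M
melting_temps = [true_dh_cal / (true_ds + R * math.log(c / 4.0))
                 for c in concentrations]

dh, ds, dg37 = vant_hoff_fit(concentrations, melting_temps)
print(round(dh, 2), round(ds, 1), round(dg37, 2))
```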

Table 2.

Thermodynamic parameters for duplex formation

[Table 2 is available only as an image in the source: gkp465i4.jpg]

Contribution of tandem mismatches to duplex thermodynamics

The contributions of the 33 tandem mismatches to duplex stability are listed in Table 3. The examination of the thermodynamic contributions of tandem mismatches to duplex thermodynamics indicates a large variance in the obtained thermodynamic parameters. Tandem mismatch contributions to enthalpy, entropy and free energy changes range from −31.1 to 38.4 kcal/mol, −101.7 to 117.1 cal/(K mol) and −1.7 to 3.7 kcal/mol, respectively.

Table 3.

Contributions of 33 tandem mismatches to duplex stability

[Table 3 is available only as an image in the source: gkp465i5.jpg]

Updated model for predicting the thermodynamics of previously unmeasured sequence asymmetric tandem mismatches

An updated model to predict the thermodynamics of previously unmeasured sequence asymmetric tandem mismatches was derived by compiling data for 72 sequence asymmetric tandem mismatches from previous works (12,15–18) and the data for 25 sequence asymmetric tandem mismatches from this work. Linear regression was then used to derive nearest neighbor parameters for predicting the contribution of a tandem mismatch to duplex thermodynamics. These parameters are shown in Table 4. Table 4 also lists parameters for calculating enthalpy and entropy contributions of rare sequence asymmetric tandem mismatches.

Table 4.

Model for predicting the free energy contribution of previously unmeasured sequence asymmetric tandem mismatches at 37°C

| Tandem mismatches with (a) | ΔG°37 increments proposed previously (19) (kcal/mol) | ΔG°37 increments (kcal/mol) | ΔH° increments (kcal/mol) | ΔS° increments (eu) |
|---|---|---|---|---|
| a U·U pair adjacent to an R·R pair, or a G·A or A·G pair adjacent to a Y·Y pair, or any combination of A·C, U·C, C·U, C·C, C·A, or A·A pairs | 1.1 ± 0.1 | 1.0 ± 0.1 | −6.2 ± 2.2 | −23.5 ± 6.9 |
| any combination of adjacent G·A and A·G pairs, or two U·U pairs | −1.2 ± 0.3 | −0.7 ± 0.2 | −13.6 ± 3.3 | −41.3 ± 10.4 |
| a U·U pair adjacent to a Y·Y (not U·U), C·A, or A·C pair | 0.8 ± 0.2 | 0.6 ± 0.2 | −4.3 ± 3.0 | −14.3 ± 9.5 |
| a G·G pair not adjacent to a U·U pair | −0.3 ± 0.2 | 0 | 0 | 0 |
| per A-U nearest neighbor | 0.5 ± 0.2 | 1.0 ± 0.1 | −5.3 ± 2.2 | −20.6 ± 6.9 |
| per G-U nearest neighbor | 1.2 ± 0.1 | 1.0 ± 0.1 | −5.0 ± 2.6 | −19.5 ± 8.1 |

(a) Any other base pair combinations in a tandem mismatch do not contribute to duplex stability.
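As a hedged illustration of how the updated ΔG°37 increments in Table 4 might be applied, the sketch below assumes that the prediction is the sum of one loop-category increment and a penalty per A–U or G–U nearest neighbor. The category labels and function name are hypothetical, and assigning a given loop to its category is left to the caller.

```python
# Updated dG37 increments from Table 4, in kcal/mol.
LOOP_INCREMENTS = {
    "i": 1.0,      # U.U next to R.R; G.A or A.G next to Y.Y; A.C, U.C, C.U, C.C, C.A, A.A combinations
    "ii": -0.7,    # adjacent G.A and A.G pairs, or two U.U pairs
    "iii": 0.6,    # U.U next to Y.Y (not U.U), C.A, or A.C
    "other": 0.0,  # any other combination contributes nothing
}
PER_AU_NEIGHBOR = 1.0
PER_GU_NEIGHBOR = 1.0


def predict_dg37(loop_category, n_au=0, n_gu=0):
    """Sum the loop increment and the closing-pair penalties (assumed additive)."""
    return (LOOP_INCREMENTS[loop_category]
            + PER_AU_NEIGHBOR * n_au
            + PER_GU_NEIGHBOR * n_gu)


# Example: a category (ii) loop closed by one A-U pair and one G-C pair.
print(round(predict_dg37("ii", n_au=1), 2))
```

The same pattern would apply to the ΔH° and ΔS° columns by swapping in the corresponding increments.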

We previously proposed a model for predicting the free energy contribution of previously unmeasured sequence asymmetric tandem mismatches (19). Since the database of sequence asymmetric tandem mismatches has increased from 72 to 97 tandem mismatches with the data reported here, we re-derived the parameters used in the previous model to determine if the additional data would change any of the parameters published previously (19). With the addition of the new data, there were some minor changes to the previous parameters (Table 4). Initially, as done previously, a parameter was derived for loops containing a G•G pair not adjacent to a U•U pair. It was found that these loops did not contribute to duplex stability. Therefore, this mismatch combination was removed as a parameter. Four of the remaining five parameters derived with the additional data are within experimental error of the previous data. The one value not within experimental error was the penalty added per A–U nearest neighbor; this penalty changed from 0.5 to 1.0 kcal/mol. The difference between the previous penalty and the new penalty is likely due to the small sample size (eight occurrences) used to generate this parameter previously; with the data reported here, the number of tandem mismatches with A–U nearest neighbors has nearly quadrupled (31 occurrences). A small sample size should not be a problem with any other parameter, as all parameters now have at least 16 occurrences contributing to the derivation of that parameter. The penalty per A–U nearest neighbor is the same as the penalty per G–U nearest neighbor, 1.0 kcal/mol. Having an identical penalty for an A–U and G–U pair adjacent to a tandem mismatch is consistent with previous findings for tandem mismatches (1,20,21) and for 1 × 2 nucleotide internal loops (41). In our previous model (19), we proposed penalties that were slightly different, 0.5 kcal/mol per A–U nearest neighbor and 1.2 kcal/mol per G–U nearest neighbor. As noted above, this difference was likely due to the small sample size of tandem mismatches with A–U nearest neighbors.

DISCUSSION

Database searching

Due to the size and diversity of the RNA secondary structure database that was searched, we have assumed that the number and type of tandem mismatches found in this database are representative of tandem mismatches found in naturally occurring RNA.

It is clear from the first set of data in Table 1 that the tandem mismatch–nearest neighbor combinations that have been studied previously are not representative of those found in nature. When looking at the second set of data in Table 1, however, it appears as if the most frequent tandem mismatches (considering the mismatch nucleotides only) have already been studied. In 1997, Xia et al. (15) reported the thermodynamics of asymmetric tandem mismatches adjacent to G–C base pairs. As a result, most of the tandem mismatches (considering the mismatch nucleotides only) have already been studied. However, it has been shown that the stability of tandem mismatches depends not only on the identity of the nucleotides in the loop but also on the identity of the closing base pairs (1,10–21). Therefore, this work (i) focuses on frequently occurring tandem mismatches when considering both the nucleotides in the loop and the nearest neighbors, (ii) complements the work done in 1997 by Xia et al. (15), and (iii) provides additional, useful information about the thermodynamic stability of tandem mismatches.

We previously compiled the thermodynamics of all 60 sequence symmetric tandem mismatches (19). It is interesting to note that only 3 of the 30 most frequently occurring tandem mismatches (loop nucleotides and nearest neighbors) are sequence symmetric (Table 1).

The database search to compile the values in Dataset 2 of Table 1 revealed 55 types of tandem mismatches in the database, representing all possible types of tandem mismatches.

Comparisons can be made between the most common nearest neighbor combinations listed here (Dataset 3 of Table 1) and those found most frequently adjacent to single mismatches (38) and 1 × 2 nucleotide loops (41). For tandem mismatches, a 5′C–G and 3′C–G combination was the most frequent (see the first entry of Dataset 3 in Table 1). This combination was the sixth most common combination for single mismatches (38) and the most common combination for 1 × 2 nucleotide loops (41). The most common combination for single mismatches was a 5′G–C and 3′C–G combination (38) (which is the 10th most common for tandem mismatches). It is unclear why single mismatches prefer different nearest neighbor combinations than 1 × 2 loops and tandem mismatches.

When categorizing the loop nucleotides as purines and pyrimidines (Dataset 4 of Table 1), it is interesting to note that the type of mismatch with the most possible combinations, 5′RY3′/3′RY5′ (with 16 possible sequences), is only the third most frequent. The top two types, all purine and all pyrimidine loops, each only have 10 possible sequences. Therefore, there appears to be a preference for all purine and all pyrimidine tandem mismatches, with all purine being the most preferred. In fact, all purine and all pyrimidine tandem mismatches account for 47% and 24%, respectively, of all the mismatches found in the database. Similar results were observed for 1 × 2 nucleotide internal loops; all purine (56%) and all pyrimidine (15%) internal loops accounted for ∼75% of all the 1 × 2 nucleotide internal loops in the database (41).

Thermodynamic contributions of tandem mismatches to duplex thermodynamics

From the data in Table 3, it is evident that the stability of a tandem mismatch does not determine its frequency of occurrence. For example, the second most stable tandem mismatch (5′GGAG3′/3′CAGC5′) is the 28th most common. Similarly, a tandem mismatch that contributes an unfavorable 3.1 kcal/mol toward duplex stability (5′AAUA3′/3′UCUU5′) appears in the top 30. Interestingly, all four tandem mismatches that have favorable free energy contributions (5′GA3′/3′AG5′, 5′AA3′/3′AG5′, 5′AA3′/3′GG5′ and 5′UU3′/3′UU5′) also have unfavorable free energy contributions when situated between different nearest neighbor combinations.

It is interesting to note that there are three tandem mismatches (loop nucleotides plus adjacent nearest neighbors) that were studied in two different stem sequences (Table 3). Two of the tandem mismatches, 5′CAUC3′/3′GGUG5′ and 5′CUUC3′/3′GUUG5′, differ by only 0.2 kcal/mol when placed in the two different stems. However, 5′CGAG3′/3′GAGC5′ differed by 1.1 kcal/mol when placed in the two different stems. This difference is likely due to non-nearest neighbor effects. Similar non-nearest neighbor effects have been observed previously (37,38,41–44). It has recently been reported that the accuracy of RNA secondary structure prediction by free energy minimization is limited by non-nearest neighbor effects (45). Since these effects may be complicated to interpret, non-nearest neighbor effects were ignored here, and data were treated as if stability relied only upon immediate nearest neighbors and the identity of the loop nucleotides. The role of non-nearest neighbor effects, however, is currently being investigated.

Updated model for predicting thermodynamics of tandem mismatches

Because we have collected thermodynamic data for 25 tandem mismatches that previously did not have experimental values, when predicting the free energy contributions of these mismatches in an RNA duplex, the experimental values can be used. These new experimental values, on average, are 1.0 kcal/mol different from the values predicted for these mismatches using the previous model.

In order to test the accuracy of this new model, the free energies of all 97 sequence asymmetric tandem mismatches compiled here were predicted using the updated model (although this model would never be used for these loops since they have experimental parameters available) (see Supplementary Table S1). For the sequence asymmetric tandem mismatches compiled here, the average absolute difference between the measured values and the values predicted using these updated parameters is 0.53 ± 0.41 kcal/mol. Stated differently, 57% of the experimental free energies were predicted within 0.5 kcal/mol, 86% were predicted within 1.0 kcal/mol and 96% were predicted within 1.5 kcal/mol. As suggested by these numbers, there are some idiosyncrasies associated with the free energy contributions of tandem mismatches that we still do not understand. Using the model proposed previously (19) and the same dataset of asymmetric tandem mismatches, the average absolute difference between the measured values and the values predicted is 0.58 ± 0.51 kcal/mol. Overall, there is not a significant difference between the two models. However, the new model does improve the prediction for tandem mismatches that were predicted rather poorly by the previous model. For example, for eight tandem mismatches that were predicted poorly (greater than 1 kcal/mol difference between the predicted value and the experimental value) by the previous model, the average difference between experimental and predicted values was 1.7 kcal/mol. For the same eight loops using the updated model, this average difference decreases to 0.9 kcal/mol.
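The accuracy figures above are an average absolute difference and the share of predictions falling within fixed error bounds. A minimal sketch of that bookkeeping is shown below; the function name is hypothetical and the four measured/predicted pairs are placeholders, not values from Supplementary Table S1.

```python
def prediction_summary(measured, predicted, thresholds=(0.5, 1.0, 1.5)):
    """Return the mean absolute difference and the percent within each threshold."""
    diffs = [abs(m - p) for m, p in zip(measured, predicted)]
    mean_abs_diff = sum(diffs) / len(diffs)
    percent_within = {
        limit: 100.0 * sum(d <= limit for d in diffs) / len(diffs)
        for limit in thresholds
    }
    return mean_abs_diff, percent_within


# Placeholder free energies in kcal/mol, for illustration only.
measured_values = [1.0, 2.0, -0.5, 0.3]
predicted_values = [1.2, 1.0, 0.5, 0.3]

mean_diff, within = prediction_summary(measured_values, predicted_values)
print(round(mean_diff, 2), within)
```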

Hydrogen bonding patterns in tandem mismatches

In order to investigate both the influence of structure on the stability of tandem mismatches and whether structural features are important for determining a tandem mismatch's frequency of occurrence, the hydrogen bonding patterns between tandem mismatch nucleotides in previously solved three-dimensional structures were investigated. Structures of four tandem mismatches in the top 30 were downloaded from the PDB. One of the tandem mismatches was the most frequently occurring tandem mismatch in the secondary structure database, 5′GGAG3′/3′CAGU5′. This tandem mismatch occurred 57 times in the secondary structure database and had a slightly favorable free energy contribution of −0.10 kcal/mol. The PDB structure (1J5E) of the Thermus thermophilus 30S ribosomal subunit from the Ramakrishnan laboratory (46) revealed that both G•A pairs form sugar-edge/Hoogsteen pairs. One of the G•A pairs contained one base–base hydrogen bond from the G amino proton to the A N7. The other G•A pair contained two base–base hydrogen bonds, one between an A amino proton and G N3 and one between a G amino proton and A N7 (see Supplementary Figure S1). With the extensive hydrogen bonding within the tandem mismatch, it is unclear why this tandem mismatch is only slightly stabilizing.

The next tandem mismatch investigated structurally was the third most frequently occurring tandem mismatch in the secondary structure database, 5′CGAG3′/3′GAGC5′. This tandem mismatch occurred 41 times in the secondary structure database and had a favorable free energy contribution of −0.69 or −1.74 kcal/mol, depending on the identity of the stem duplex. Interestingly, this tandem mismatch also contains two adjacent G•A pairs, similar to the tandem mismatch described above. Hydrogen bonding of this tandem mismatch was investigated in three different PDB structures, the PDB structure (1NKW) of the Deinococcus radiodurans large ribosomal subunit from the Yonath laboratory (47), the PDB structure (1S72) of the H. marismortui large ribosomal subunit from the Moore and Steitz laboratories (48) and the PDB structure (1YFV) of a synthetic RNA from the Turner laboratory (49). The tandem mismatch in each of these structures had slightly different features. In PDB ID 1NKW, both G•A pairs are in a sugar-edge/Hoogsteen conformation; however, there are no base–base hydrogen bonds (see Supplementary Figure S2). In both PDB ID 1S72 and 1YFV, the GA pairs are also in a sugar-edge/Hoogsteen conformation; however, both pairs in both structures have two base–base hydrogen bonds, a G amino proton to A N7 and an A amino proton to G N3 (see Supplementary Figure S3). It is difficult to interpret the influence of the structure on either the thermodynamics or frequency of occurrence for this tandem mismatch due to the difference in stability when this tandem mismatch is placed within different stems and due to the different hydrogen bonding patterns found for this tandem mismatch in various PDB structures.

A third tandem mismatch with two G•A pairs was investigated, 5′UGAG3′/3′AAGC5′. This tandem mismatch occurred 16 times in the secondary structure database and had an unfavorable free energy contribution of 2.13 kcal/mol. Structural features of this tandem mismatch were investigated in the PDB structure (1NYI) of the hammerhead ribozyme from the Scott laboratory (50). Once again, both G•A pairs are in a sugar-edge/Hoogsteen conformation with both pairs having an A amino proton to G N3 hydrogen bond and a G amino proton to A N7 hydrogen bond (see Supplementary Figure S3). After investigating all five structures containing tandem G•A pairs, it is not straightforward to identify the structural features that lead to the varying thermodynamic stabilities.

Perhaps major contributors to the thermodynamic stabilities are the identities of the closing base pairs. The last tandem mismatch discussed above, 5′UGAG3′/3′AAGC5′, has one U–A and one G–C adjacent base pair and was the most unfavorable thermodynamically of the three. The second tandem mismatch discussed above, 5′CGAG3′/3′GAGC5′, has one C–G and one G–C adjacent base pair and was the most favorable thermodynamically. The first tandem mismatch discussed above, 5′GGAG3′/3′CAGU5′, has one G–C and one G–U adjacent base pair and was more stable than the third and less stable than the second tandem mismatch.

Another tandem mismatch that was investigated was 5′CUUG3′/3′GUUC5′. This tandem mismatch occurred 19 times in the secondary structure database and had a favorable free energy contribution of −0.44 kcal/mol. Structural features of this tandem mismatch were investigated in the PDB structure (280D) of an RNA dodecamer from the Kundrot laboratory (51). Both U•U pairs are in a Watson–Crick/Watson–Crick conformation with both pairs having two U imino proton to U carbonyl oxygen hydrogen bonds (see Supplementary Figure S4).

The base pairing between the tandem mismatch nucleotides in the PDB structures discussed here sheds little light on the relationship between structure and thermodynamics and the reason why nature prefers particular tandem mismatch sequences over others. In addition to base–base hydrogen bonds, perhaps base–backbone hydrogen bonds, the hydrogen bonding pattern of the adjacent base pairs, the amount of stacking between the loop nucleotides and between the loop nucleotides and the nearest neighbors, sugar puckers, dynamics, helix distortion, and available functional groups situated in the grooves would provide more insight to help answer these questions. Perhaps a more extensive PDB search and comparison that investigates a much larger number of tandem mismatches and more structural features would help in understanding structure–stability relationships and nature's preference for certain tandem mismatch sequences.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Saint Louis University College of Arts and Sciences; Department of Chemistry, Saint Louis University; a Saint Louis University Summer Research Award (to B.M.Z.); the Saint Louis University Beaumont Faculty Development Fund (to B.M.Z.); and two Sigma Xi Grants-in-Aid of Research (to M.E.C.). Award Number R15GM085699 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official view of the National Institute of General Medical Sciences of the National Institutes of Health. Funding for open access charge: National Institute of General Medical Sciences.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors would like to thank Mr Matthew Alexander and Miss Mary (Molly) Burke for checking all of our thermodynamic calculations.

REFERENCES

  • 1. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700.
  • 2. Gutell RR, Woese CR. Higher order structural elements in ribosomal RNAs: Pseudo-knots and the use of noncanonical pairs. Proc. Natl Acad. Sci. USA. 1990;87:663–667. doi: 10.1073/pnas.87.2.663.
  • 3. Gutell RR. Collection of small-subunit (16S- and 16S-like) ribosomal-RNA structures: 1994. Nucleic Acids Res. 1994;22:3502–3507. doi: 10.1093/nar/22.17.3502.
  • 4. Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science. 2000;289:905–920. doi: 10.1126/science.289.5481.905.
  • 5. Shi P-Y, Brinton MA, Veal JM, Zhong YY, Wilson WD. Evidence for the existence of a pseudoknot structure at the 3′ terminus of the flavivirus genomic RNA. Biochemistry. 1996;35:4222–4230. doi: 10.1021/bi952398v.
  • 6. Zhang JC, Zhang GH, Guo R, Shapiro BA, Simon AE. A pseudoknot in a preactive form of a viral RNA is part of a structural switch activating minus-strand synthesis. J. Virol. 2006;80:9181–9191. doi: 10.1128/JVI.00295-06.
  • 7. Sobczak K, de Mezer M, Michlewski G, Krol J, Krzyzosiak WJ. RNA structure of trinucleotide repeats associated with human neurological diseases. Nucleic Acids Res. 2003;31:5469–5482. doi: 10.1093/nar/gkg766.
  • 8. DeNap JCB, Thomas JR, Musk DJ, Hergenrother PJ. Combating drug-resistant bacteria: small molecule mimics of plasmid incompatibility as antiplasmid compounds. J. Am. Chem. Soc. 2004;126:15402–15404. doi: 10.1021/ja044207u.
  • 9. Disney MD, Haidaris CG, Turner DH. Recognition elements for 5′ exon substrate binding to the Candida albicans group I intron. Biochemistry. 2001;40:6507–6519. doi: 10.1021/bi002008r.
  • 10. SantaLucia J, Jr, Kierzek R, Turner DH. Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry. 1990;29:8813–8819. doi: 10.1021/bi00489a044.
  • 11. SantaLucia J, Jr, Kierzek R, Turner DH. Stabilities of consecutive A.C, C.C, G.G, U.C, and U.U mismatches in RNA internal loops: evidence for stable hydrogen-bonded U.U and C.C+ pairs. Biochemistry. 1991;30:8242–8251. doi: 10.1021/bi00247a021.
  • 12. Peritz AE, Kierzek R, Sugimoto N, Turner DH. Thermodynamic study of internal loops in oligoribonucleotides: symmetric loops are more stable than asymmetric loops. Biochemistry. 1991;30:6428–6436. doi: 10.1021/bi00240a013.
  • 13. Walter AE, Wu M, Turner DH. The stability and structure of tandem GA mismatches in RNA depend on closing base pairs. Biochemistry. 1994;33:11349–11354. doi: 10.1021/bi00203a033.
  • 14. Wu M, McDowell JA, Turner DH. A periodic table of symmetric tandem mismatches in RNA. Biochemistry. 1995;34:3204–3211. doi: 10.1021/bi00010a009.
  • 15. Xia TB, McDowell JA, Turner DH. Thermodynamics of nonsymmetric tandem mismatches adjacent to G–C base pairs in RNA. Biochemistry. 1997;36:12486–12497. doi: 10.1021/bi971069v.
  • 16. Burkard ME, Xia T, Turner DH. Thermodynamics of RNA internal loops with a guanosine–guanosine pair adjacent to another noncanonical pair. Biochemistry. 2001;40:2478–2483. doi: 10.1021/bi0012181.
  • 17. Schroeder SJ, Turner DH. Thermodynamic stabilities of internal loops with GU closing pairs in RNA. Biochemistry. 2001;40:11509–11517. doi: 10.1021/bi010489o.
  • 18. Bourdelat-Parks BN, Wartell RM. Thermodynamics of RNA duplexes with tandem mismatches containing a uracil-uracil pair flanked by C–G/G–C or G–C/A–U closing base pairs. Biochemistry. 2005;44:16710–16717. doi: 10.1021/bi051659q.
  • 19. Christiansen ME, Znosko BM. Thermodynamic characterization of the complete set of sequence symmetric tandem mismatches in RNA and an improved model to predict the free energy contribution of sequence asymmetric tandem mismatches. Biochemistry. 2008;47:4329–4336. doi: 10.1021/bi7020876.
  • 20. Mathews DH, Disney MD, Childs JC, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl Acad. Sci. USA. 2004;101:7287–7292. doi: 10.1073/pnas.0401799101.
  • 21. Lu ZJ, Turner DH, Mathews DH. A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 2006;34:4912–4924. doi: 10.1093/nar/gkl472.
  • 22. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415. doi: 10.1093/nar/gkg595.
  • 23. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431. doi: 10.1093/nar/gkg599.
  • 24. Zuker M. On finding all suboptimal foldings of an RNA molecule. Science. 1989;244:48–52. doi: 10.1126/science.2468181.
  • 25. Gutell RR, Gray MW, Schnare MN. A compilation of large subunit (23S and 23S-like) ribosomal-RNA structures: 1993. Nucleic Acids Res. 1993;21:3055–3074. doi: 10.1093/nar/21.13.3055.
  • 26. Schnare MN, Damberger SH, Gray MW, Gutell RR. Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23S-like) ribosomal RNA. J. Mol. Biol. 1996;256:701–719. doi: 10.1006/jmbi.1996.0119.
  • 27. Szymanski M, Specht T, Barciszewska MZ, Barciszewski J, Erdmann VA. 5S rRNA data bank. Nucleic Acids Res. 1998;26:156–159. doi: 10.1093/nar/26.1.156.
  • 28. Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1998;26:148–153. doi: 10.1093/nar/26.1.148.
  • 29. Larsen N, Samuelsson T, Zwieb C. The signal recognition particle database (SRPDB). Nucleic Acids Res. 1998;26:177–178. doi: 10.1093/nar/26.1.177.
  • 30. Brown JW. The ribonuclease P database. Nucleic Acids Res. 1998;26:351–352. doi: 10.1093/nar/26.1.351.
  • 31. Waring RB, Davies RW. Assessment of a model for intron RNA secondary structure relevant to RNA self-splicing: a review. Gene. 1984;28:277–291. doi: 10.1016/0378-1119(84)90145-8.
  • 32. Damberger SH, Gutell RR. A comparative database of group I intron structures. Nucleic Acids Res. 1994;22:3508–3510. doi: 10.1093/nar/22.17.3508.
  • 33. Michel F, Umesono K, Ozeki H. Comparative and functional-anatomy of group-II catalytic introns: a review. Gene. 1989;82:5–30. doi: 10.1016/0378-1119(89)90026-7.
  • 34. Griffiths-Jones S. The microRNA registry. Nucleic Acids Res. 2004;32:D109–D111. doi: 10.1093/nar/gkh023.
  • 35. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: MicroRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–D144. doi: 10.1093/nar/gkj112.
  • 36. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36:D154–D158. doi: 10.1093/nar/gkm952.
  • 37. Wright DJ, Rice JL, Yanker DM, Znosko BM. Nearest neighbor parameters for inosine–uridine pairs in RNA duplexes. Biochemistry. 2007;46:4625–4634. doi: 10.1021/bi0616910.
  • 38. Davis AR, Znosko BM. Thermodynamic characterization of single mismatches found in naturally occurring RNA. Biochemistry. 2007;46:13425–13436. doi: 10.1021/bi701311c.
  • 39. Xia T, SantaLucia J, Jr, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry. 1998;37:14719–14735. doi: 10.1021/bi9809425.
  • 40. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
  • 41. Badhwar J, Karri S, Cass CK, Wunderlich EL, Znosko BM. Thermodynamic characterization of RNA duplexes containing naturally occurring 1×2 nucleotide internal loops. Biochemistry. 2007;46:14715–14724. doi: 10.1021/bi701024w.
  • 42. Kierzek R, Burkard ME, Turner DH. Thermodynamics of single mismatches in RNA duplexes. Biochemistry. 1999;38:14214–14223. doi: 10.1021/bi991186l.
  • 43. Siegfried NA, Metzger SL, Bevilacqua PC. Folding cooperativity in RNA and DNA is dependent on position in the helix. Biochemistry. 2007;46:172–181. doi: 10.1021/bi061375l.
  • 44. Longfellow CE, Kierzek R, Turner DH. Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides. Biochemistry. 1990;29:278–285. doi: 10.1021/bi00453a038.
  • 45. Mathews DH. Revolutions in RNA secondary structure prediction. J. Mol. Biol. 2006;359:526–532. doi: 10.1016/j.jmb.2006.01.067.
  • 46. Wimberly BT, Brodersen DE, Clemons WM, Morgan-Warren RJ, Carter AP, Vonrhein C, Hartsch T, Ramakrishnan V. Structure of the 30S ribosomal subunit. Nature. 2000;407:327–339. doi: 10.1038/35030006.
  • 47. Harms JM, Schluenzen F, Zarivach R, Bashan A, Gat S, Agmon I, Bartels H, Franceschi F, Yonath A. High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell. 2001;107:679–688. doi: 10.1016/s0092-8674(01)00546-3.
  • 48. Klein DJ, Moore PB, Steitz TA. The roles of ribosomal proteins in the structure, assembly and evolution of the large ribosomal subunit. J. Mol. Biol. 2004;340:141–177. doi: 10.1016/j.jmb.2004.03.076.
  • 49. SantaLucia J, Jr, Turner DH. Structure of (rGGCGAGCC)2 in solution from NMR and restrained molecular dynamics. Biochemistry. 1993;32:12612–12623. doi: 10.1021/bi00210a009.
  • 50. Dunham CM, Murray JB, Scott WG. A helical twist-induced conformational switch activates cleavage in the hammerhead ribozyme. J. Mol. Biol. 2003;332:327–336. doi: 10.1016/s0022-2836(03)00843-x.
  • 51. Lietzke SE, Barnes CL, Berglund JA, Kundrot CE. The structure of an RNA dodecamer shows how tandem U–U base pairs increase the range of stable RNA structures and the diversity of recognition sites. Structure. 1996;4:917–930. doi: 10.1016/s0969-2126(96)00099-8.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
