Abstract
Myotonic dystrophy (DM) type 1 is associated with an expansion of (>50) CTG repeats within the 3′ untranslated region (UTR) of the dystrophia myotonica protein kinase gene (dmpk). In the corresponding mRNA transcript, the CUG repeats form an extended stem-loop structure. The double-stranded RNA of the stem sequesters RNA binding proteins away from their normal cellular targets resulting in aberrant transcription, alternative splicing patterns, or both, thereby leading to DM. To better understand the structural basis of DM type 1, we determined to 1.58-Å resolution the x-ray crystal structure of an 18-bp RNA containing six CUG repeats. The CUG repeats form antiparallel double-stranded helices that stack end-on-end in the crystal to form infinite, pseudocontinuous helices similar to the long CUG stem loops formed by the expanded CUG repeats in DM type 1. The CUG helix is very similar in structure to A-form RNA with the exception of the unique U-U mismatches. This structure provides a high-resolution view of a toxic, trinucleotide repeat RNA.
Keywords: toxic RNA, U-U mismatches, x-ray crystallography
Myotonic dystrophy (DM) is the most common form of adult muscular dystrophy, affecting ≈1 in 8,000 individuals. There are two types of DM: type 1 (DM1) and type 2 (DM2). Both types are caused by an RNA gain-of-function mechanism (1, 2). DM1 is caused by a CTG repeat expansion within the 3′ untranslated region (UTR) of the dystrophia myotonica protein kinase (dmpk) gene, whereas DM2 is caused by a CCTG repeat expansion within intron 1 of the Zn finger 9 (znf9) gene (3, 4). In both DM1 and DM2, normal individuals have <50 repeats, whereas individuals with DM1 and DM2 have hundreds or even thousands of repeats (5). Severity and age of onset of DM are correlated with repeat length. The clinical features of DM1 and DM2 are very similar and include muscle weakness, myotonia, cardiac arrhythmias, insulin resistance, cognitive impairment, and serological changes (6, 7). However, only DM1 has a severe congenital form.
Dmpk and znf9 genes are on different chromosomes and encode very different proteins, yet both CUG and CCUG expansions cause DM, suggesting that loss of function of DMPK and ZnF9 is not the primary cause of DM and that the repeat tracts themselves might be toxic. The repeats clearly cause DM because the expansion is what distinguishes affected individuals from normal individuals. In support of this model, the expression of an RNA containing ≈250 CUG repeats in mice causes characteristic features of DM1 (8). The CUG repeats form an extended stem-loop structure with U-U mismatches and G-C Watson-Crick base pairs (9, 10).
The clinical features of DM appear to be caused by a “toxic RNA gain-of-function” mechanism in which the CUG repeat tracts bind and sequester specific RNA and DNA binding proteins. The CUG binding protein 1 appears to be up-regulated in the presence of extended CUG repeats and this increase might affect alternative splicing of genes relevant to the clinical features of DM1 (11–14). The muscleblind proteins (MBNL) specifically bind long CUG repeat tracts and colocalize in vivo with CUG and CCUG repeats in DM1 and DM2 cells (15, 16). A mouse knockout of MBNL displays several of the characteristic phenotypes of DM1 (17). At the DNA level, CTG repeat expansions affect the transcription of the neighboring genes and this change may also play a role in DM1 pathogenesis (18). However, the primary pathogenic element in DM1 appears to be the long double-stranded r(CUG) repeats that sequester MBNL leading to inappropriate gene expression.
Presently, no high-resolution structural information is available to provide insight into an RNA containing CUG repeats. U-U pairs can adopt a range of conformations that vary in the extent of their propeller twist, imino proton hydrogen bonding, and backbone distortion. The thermodynamic contribution of U-U pairs in an RNA duplex depends heavily on the adjacent base pairs (19). Tandem U-U pairs have been reported to stabilize conformations inaccessible to A-form RNA (20). In addition, the U-U pair presents a strong electronegative patch in the exposed minor groove (two O2 atoms) and an unusual number of hydrogen bond acceptors that may provide unique RNA–RNA or RNA–protein interfaces. To better understand the structures of U-U mismatches and the CUG trinucleotide repeat, and their roles in DM1, we determined the crystal structure of a CUG repeat RNA.
Materials and Methods
RNA Purification and Crystallization. The r(CUG)6 oligonucleotides were synthesized by 5′-silyl-2′-orthoester RNA chemistry (Dharmacon RNA Technologies). The oligonucleotides were purified on a 10% polyacrylamide (19:1) gel containing 6 M urea. RNA was located by UV shadow, excised, eluted in 0.3 M ammonium acetate, and precipitated in 3 volumes of ethanol overnight at –80°C. Samples were resuspended in ddH2O and desalted by using a Micro Bio-Spin 6 chromatography column (Bio-Rad). The RNA was concentrated to 0.35 mM and exchanged into a solution of 300 mM NaCl and 50 mM Mops (pH 7.0). RNA was annealed by heating at 95°C for 5 min and slow cooling to room temperature over 60 min. Crystals were grown at room temperature by vapor diffusion with the hanging drop method from a mixture of 2 μl of RNA solution and 2 μl of well solution containing 50 mM Mops (pH 7.0), 300 mM NaCl, 20 mM MgCl2, and 40% 2-methyl-2,4-pentanediol. Crystals appeared within 1–2 weeks. Isomorphous crystals of oligonucleotides with brominated (position 5) or iodinated (position 2) uridine incorporated were grown under similar conditions.
Data Collection. Crystals 0.2 × 0.2 × 0.05 mm in dimension were mounted in rayon loops directly from the crystallization drops for data collection. Three-wavelength Br-MAD data were collected from a crystal of a brominated sequence (bromine at the C5 position on the U5) at Advanced Light Source BL 8.2.2 to a resolution of 2.3 Å. The same crystal was used on a second trip to Advanced Light Source BL 8.2.2 to collect 1.66 Å monochromatic data. Monochromatic data were also collected from a crystal of an iodinated oligonucleotide to 2.4 Å at Stanford Synchrotron Radiation Laboratory BL 9-1 and from a crystal of the unmodified sequence to a resolution of 1.58 Å at Advanced Light Source BL 8.2.1. The x-ray data were integrated, merged, and scaled with the hkl-2000 program suite (21) and converted to structure factors with the ccp4 program (22). Data were truncated employing the method of French and Wilson (23). Data reduction trials were done with alternative unit cells and symmetry. The selected R3 symmetry gave the fullest description of the crystallographic symmetry of the crystal, and the Rsymm for the data reduced as R3 was not significantly greater than that for the data reduced as P1, showing that the R3 symmetry was correct.
Structure Determination. The programs shelxc/d/e were used with the Br-MAD data to solve the heavy atom substructure to a resolution of 2.6 Å and to extend the phases to 1.58 Å by using the native data (24). Four, rather than two, heavy atom positions were found by using the Bijvoet and dispersive differences. Further validation of the heavy atom positions came from difference Fourier maps computed by using the x-ray data from the native, brominated (Fig. 1B), and iodinated structures (Table 3, which is published as supporting information on the PNAS web site). The positions of the heavy atoms correspond to two superimposed duplexes related by a full helical turn of 11 base pairs (Fig. 1B). In other words, each duplex assumes one of two different positions in the unit cell. The heavy atom positions were independently confirmed with Bijvoet and SIR differences by using x-ray data from the iodinated sequence.
Fig. 1.
Overall structure of r(CUG)6. (A) Secondary structure of the r(CUG)6 duplex. (B) Overlay of the two half-occupied duplexes in the asymmetric unit. The red 18 base-pair duplex is translated one helical turn relative to the blue duplex. The orange electron density is contoured at the 4.0 sigma level and is from a difference map made by subtracting the native structure factors from the brominated structure factors and by using the native model as the source of phase angles. There are six peaks: two near the C5 position in the U5 of each strand in the blue duplex, two likewise positioned near the red duplex, and two at the ends that are associated with symmetry-related duplexes that are not shown. (C) Two parallel stacks of three duplexes stacked end-on-end. The three duplexes form five helical turns that span the length of the unit cell. The middle duplex in each stack has its van der Waals surface displayed in red. The two stacks are merged in the disordered crystal structure. (D) Stereoview of model of r(CUG)6 in electron density map made with experimental phases determined by Br-MAD. The map is contoured at the 1.0 σ level.
The observed lattice is the result of the merging of two lattices: One lattice is composed of a set of parallel, pseudocontinuous helices and a second identical lattice is translated by one helical turn along the c-axis relative to the first lattice (Fig. 1C). The occupancies of the two pairs of bromine atoms are similar, indicating that side-by-side crystal packing interactions between the helices do not favor one lattice over the other.
Each double helix packs head-to-tail in a pseudocontinuous helix (Fig. 1C). As a result, chain A stacks on top of a symmetry-related chain A and, likewise, chain B stacks on top of a symmetry-related chain B. The stacks of helices are aligned parallel to the c-axis of the R3 unit cell. The c-axis unit cell edge (Table 3) is long enough to accommodate three duplexes stacked end-on-end (helical rise of 2.63 Å). A stack of three duplexes (54 base pairs) parallel to the c-axis of the unit cell gives five helical turns per unit cell and 10.8 base pairs per helical turn. These unit cell-derived helical parameters closely agree with the helical parameters measured from the actual structure (Table 4, which is published as supporting information on the PNAS web site).
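The unit cell-derived helical parameters quoted above follow from simple arithmetic, sketched here in Python. The numbers are those given in the text; the function and variable names are illustrative only, and the stack length is the value implied by the quoted rise, not the measured c-axis edge from Table 3.

```python
# Values quoted in the text; all names here are illustrative.
BASE_PAIRS_PER_DUPLEX = 18
DUPLEXES_PER_CELL = 3   # duplexes stacked end-on-end along the c-axis
TURNS_PER_CELL = 5      # helical turns spanning the unit cell
RISE_PER_BP = 2.63      # helical rise in angstroms


def helical_parameters(bp_per_duplex, duplexes, turns, rise):
    """Return (total bp, bp per turn, twist in degrees, stack length in angstroms)."""
    total_bp = bp_per_duplex * duplexes
    bp_per_turn = total_bp / turns
    twist = 360.0 / bp_per_turn
    stack_length = total_bp * rise
    return total_bp, bp_per_turn, twist, stack_length


total_bp, bp_per_turn, twist, stack_length = helical_parameters(
    BASE_PAIRS_PER_DUPLEX, DUPLEXES_PER_CELL, TURNS_PER_CELL, RISE_PER_BP
)
print(f"{total_bp} bp, {bp_per_turn:.1f} bp/turn, "
      f"{twist:.1f} deg twist, {stack_length:.1f} A stack")
```

The 54 base pairs and 10.8 base pairs per turn match the values stated in the text; the implied twist of about 33.3° per base pair is a derived quantity.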
The electron density maps showed that the backbone was well defined with ribose rings, phosphates, and glycosidic bonds being essentially superimposable between the two models (Figs. 1D and 2). This result was surprising because the formation of two hydrogen bonds between the uridines of U-U base pairs requires the C1′ atoms of the glycosidic bonds to be ≈2 Å closer than in Watson-Crick base pairs. The superimposition of such a U-U base pair on a Watson-Crick base pair would lead to the spreading of the electron density about the glycosidic bonds, which we did not observe. The electron density around the bases clearly showed the superimposition of two bases (Fig. 2). The outlines of the electron density around superimposed U-U/G-C or U-U/C-G base pairs (Fig. 2 A and C) were distinct from that around the superimposed C-G/G-C or G-C/C-G base pairs (Fig. 2 B and D). The former superimposed base pairs have three exocyclic atoms projecting into the minor groove, whereas the latter superimposed base pairs have two exocyclic atoms projecting into the minor groove. The slight differences in the outlines of the electron density around the C-G/G-C base pairs in Fig. 2 B and D reflect subtle structural differences due to the presence and absence in the minor groove of the backbone from a symmetry-related duplex (Fig. 2 B and D, respectively).
Fig. 2.
SigmaA weighted electron density maps at the 1.0 σ contour level at selected base pairs. The base pairs from symmetry-related molecules are represented by red stick models and are labeled with asterisks. Red spheres represent water molecules. (A) Base pair G6A-C13B on base pair U17C-U2D. (B) Base pair C7A-G12B on base pair G18C*-C1D*. (C) Base pair U8A-U11B on base pair C1C-G18D. (D) Base pair C10A-G9B on base pair G3C-C16D. The letters A, B, C, and D appended to each nucleotide refer to the chains: chains A and B form one duplex and chains C and D form the other.
Similar static disorder has been found in several other RNA crystal structures. The 17-mer r(CACCGGAUGGUUCGGUG) had fourfold disorder along its helical axis (25). The 18-mer heteroduplex of 5′-CACCGUUGGUAGCGGUGC-3′ and 5′-CACCGCUACCAACGGUGC-3′ had thirty-sixfold disorder, which reduced the unique part of the electron density map to a single nucleotide with an averaged base composition (26). A crystal structure of a chemically modified A-form DNA duplex was also found to have twofold translational disorder similar to that observed in this structure (27).
We anticipated static disorder for the crystal structure of r(CUG)6 because of the symmetrical nature of the repeating sequence. Instead, we found the disorder to be dependent on the A-form helical geometry rather than on the base sequence. That is, the disordered helices were related by a translation vector with a length equivalent to one helical turn. This arrangement of overlapped helices can be described as screw disorder where one duplex from one lattice is related to a duplex from the other lattice by one complete helical turn. However, the rotational component of the helical screw motion is a trivial 360° rotation, so it is simpler to describe this disorder as translational disorder. The one backbone–minor groove interaction that occurs between neighboring helices creates the observed packing arrangement, but this crystal packing interaction is not specific enough to prevent the translational disorder.
The electron density map made with the Br-MAD phases was used to build the initial model of two overlapping duplexes by using xtalview (28). The occupancies of the two duplexes were set to 0.5. Duplexes were refined with refmac (29) by using noncrystallographic symmetry (NCS) restraints relating the opposing strands of each duplex. TLS parameterization in refmac was introduced to describe the collective translation, libration, and screw-rotation displacements of each base pair (30). The parameterization added 720 parameters (20 unique parameters per rigid group × 36 base pairs). After trial refinements with varying numbers of cycles, four cycles of TLS refinement led to a minimum in Rfree, so the parameters from the four-cycle TLS refinement were fixed before continuing with positional and individual B factor refinement. The inclusion of TLS parameters led to a 2% decrease in both Rwork and Rfree (Table 1). The NCS restraints were gradually relaxed and eventually removed when the model was nearly complete.
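The TLS parameter count can be checked with a short Python sketch. The split of the 20 parameters per rigid group into 6 translation, 6 libration, and 8 screw terms is the standard TLS bookkeeping (two symmetric 3 × 3 tensors plus a 3 × 3 screw tensor whose trace is not determined) and is stated here as background, not taken from the text. The constant names are illustrative.

```python
# Standard TLS bookkeeping per rigid group (background assumption, not from the text).
T_PARAMS = 6   # symmetric 3x3 translation tensor
L_PARAMS = 6   # symmetric 3x3 libration tensor
S_PARAMS = 8   # 3x3 screw tensor; its trace is indeterminate, leaving 8 terms

# From the text: each base pair is one rigid group, and the model
# contains two overlapping 18-bp duplexes.
BASE_PAIRS_PER_DUPLEX = 18
DUPLEXES_IN_MODEL = 2

params_per_group = T_PARAMS + L_PARAMS + S_PARAMS
rigid_groups = BASE_PAIRS_PER_DUPLEX * DUPLEXES_IN_MODEL
total_tls_params = params_per_group * rigid_groups

print(params_per_group, rigid_groups, total_tls_params)
```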
Table 1. Refinement statistics.
| Measurement | Value |
|---|---|
| Resolution, Å | 1.58 |
| No. of reflections | 11,503 |
| Rwork/Rfree, % | 21.8/27.9 |
| No. of RNA atoms (half occupied) | 1,500 |
| No. of solvent (full and partially occupied) | 81 |
| Average RNA B factors, Å² | 28.4 |
| Average solvent B factors, Å² | 40.4 |
| rms deviation bond lengths, Å | 0.011 |
| rms deviation bond angles, ° | 2.060 |
The helical parameters were determined by using curves 5.2 (31). The electrostatic potentials were computed with qnifft by solving the nonlinear Poisson-Boltzmann equation with parameters optimized for RNA (32). Figures were prepared with xtalview, raster3d (33), nuccyl (www.biosci.ki.se/groups/ljo/software/nuccyl.html), and pymol (http://pymol.sourceforge.net).
Results
An 18-nucleotide RNA consisting of six CUG repeats was crystallized in space group R3. The RNA pairs with a second antiparallel strand to form a double helix that contains 12 Watson-Crick G-C base pairs and six U-U mismatches (Fig. 1A). This double helix, r(CUG)6, stacks end-to-end with helices above and below in the crystal to form infinite pseudocontinuous CUG helices (Fig. 1C) similar to expanded CUG repeats found in the dmpk gene in patients with DM1. The formation of Watson-Crick G-C base pairs was also observed in NMR studies of r(CUG)97 (10, 34). This agreement suggests that the crystal structure of r(CUG)6 is likely relevant to the stem of the stem loops that long CUG repeats form.
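The base-pair composition of the duplex can be reproduced with a minimal Python sketch that pairs the 18-mer with an identical antiparallel copy of itself, in the full-length register shown in Fig. 1A. The helper names are hypothetical and only illustrate the counting.

```python
REPEAT = "CUG"
N_REPEATS = 6
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def pair_antiparallel(strand):
    """Pair base i of one strand with base n-1-i of an identical antiparallel strand."""
    n = len(strand)
    return [(strand[i], strand[n - 1 - i]) for i in range(n)]


def count_pairs(pairs):
    """Return (number of Watson-Crick pairs, number of U-U mismatches)."""
    watson_crick = sum(1 for pair in pairs if pair in WATSON_CRICK)
    uu_mismatches = sum(1 for pair in pairs if pair == ("U", "U"))
    return watson_crick, uu_mismatches


strand = REPEAT * N_REPEATS
pairs = pair_antiparallel(strand)
watson_crick, uu_mismatches = count_pairs(pairs)
print(len(strand), watson_crick, uu_mismatches)  # 18 12 6
```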
CUG Repeats Are A-Form RNA. The overall structure of the CUG repeats is very similar to A-form RNA, as shown in the superposition of the r(CUG)6 structure with A-form nucleic acid (Fig. 3A). The average helical rise for r(CUG)6 is 2.6 Å, which is similar to the helical rise of fiber A-DNA and other single crystal structures of A-RNA (Table 4). The average helical twist is 33.7° (10.6 base pairs per helical turn), which is more than the helical twist expected for A-form (32.7°, 11 base pairs per turn) and not much less than the 34.8° helical twist (10.4 base pairs per turn) found for B-DNA in solution (35). The base pairs are both steeply inclined (17°) with respect to the helical axis and displaced away from the helical axis and toward the minor groove as expected for the A-form. The displacement of the base pairs leads to a wide and shallow minor groove and a narrow and deep major groove. The widths of both grooves are more like those of fiber A-DNA than the single-crystal structure of an 18-mer RNA containing all Watson-Crick base pairs (Table 4). Although there are a few subtle differences between r(CUG)6 and standard A-form helical nucleic acid, it is striking that an RNA containing mismatches every third base pair is basically A-form.
Fig. 3.
Similarity of r(CUG)6 to A-form RNA. (A) Superposition of r(CUG)6 (blue) and A-form (red). (B) Electrostatic surface of r(CUG)6. (C) U-U pairs highlighted in aqua on gray background.
Previous studies suggested that CUG repeats form A-form RNA. Electron microscopy studies of an RNA molecule containing 130 CUG repeats revealed a rod-like structure consistent with A-form (36). Also, the double-stranded RNA-dependent kinase PKR binds CUG repeats with high affinity similar to its binding to A-form RNA of generic sequences (37).
The r(CUG)6 has an electrostatic potential surface that suggests how proteins might recognize this RNA with specificity (Fig. 3B). The minor groove displays a repeating pattern of positive and negative electrostatic potential distinct from the patterns of electrostatic potential found in the minor groove of RNA double helices without U-U mismatches. This distinctive electrostatic pattern may provide a unique binding site for proteins. In contrast, the major groove has a rather homogeneous negative electrostatic potential. In Fig. 3C, the U-U mismatches are highlighted in aqua, showing the accessibility of the U-U mismatches in the minor groove.
It has been demonstrated that RNA duplexes are stabilized by monovalent and, in particular, by divalent cations (38). Specific divalent cation binding usually occurs in regions of unique structural and electrostatic potential (32, 38, 39). We observe no such hallmarks of divalent cation binding sites in electrostatic potential maps of r(CUG)6. Because of the translational disorder, such cations would be half occupied at most and, hence, difficult to distinguish from fully occupied water molecules in the electron density map.
U-U Mismatches Appear to Lack H Bonds. In known RNA structures, mismatch base pairs are accommodated in double helices by distortions in the backbones to permit the formation of one or two hydrogen bonds between the mismatched bases. However, in the structure of r(CUG)6, the backbone is not distorted to accommodate the formation of hydrogen bonds between the U-U pairs. In other structures with U-U pairs (20, 40, 41), the C1′ positions are separated by ≈ 8.5 Å rather than 10.5 Å as in Watson-Crick base pairs (Fig. 4). We find no evidence for such a close approach of the C1′ atoms in any of the 12 U-U pairs (Fig. 4C). Such large displacements of the glycosidic bonds from their Watson-Crick positions would be obvious in the electron density maps despite the translational disorder (Fig. 2).
Fig. 4.
Variation in U-U base pairs found in several RNA crystal structures. The C1′-C1′ distances and hydrogen bonding distances are quoted in angstroms. The dashed line between the C1′ atoms is used to define the λ-angle (N1-C1′-C1′), which describes the orientation of one base relative to the other member of the base pair. Carbon atoms are shown in green, nitrogen atoms in blue, oxygen in red, and phosphorus in yellow. (A) Type I U-U base pair [e.g., U2610:U2546 from 50S ribosome (PDB ID code 1jj2/Nucleic Acid Data Bank accession no. RR0033)] with the carbonyl O4 oxygen atom on the left strand projecting into the major groove. (B) Type II U-U base pair (e.g., U43D:U30C from 280D/URL050), with the carbonyl O4 oxygen atom of the right strand projecting into the major groove. (C) Type III base pair [e.g., U17D:U2C from r(CUG)6] with the maintenance of the Watson-Crick C1′-C1′ distance (typically 10.4–10.7 Å) and one λ-angle (typically 54–57°).
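The two quantities used in this figure, the C1′-C1′ distance and the λ-angle, can be computed from atomic coordinates as in the following Python sketch. The coordinates are synthetic and planar, constructed only to illustrate the geometry; the 1.47 Å N1-C1′ bond length is an assumed typical value, and none of these numbers are taken from the deposited structure.

```python
import math


def lambda_angle(n1, c1_own, c1_partner):
    """Angle (degrees) at c1_own between the bonds to n1 and to the partner C1'."""
    to_n1 = [a - b for a, b in zip(n1, c1_own)]
    to_partner = [a - b for a, b in zip(c1_partner, c1_own)]
    dot = sum(a * b for a, b in zip(to_n1, to_partner))
    norm = math.hypot(*to_n1) * math.hypot(*to_partner)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Synthetic, planar test geometry (hypothetical values, not from the structure).
BOND = 1.47                   # assumed N1-C1' bond length in angstroms
ANGLE = math.radians(55.0)    # chosen lambda-angle for both bases
c1_left = (0.0, 0.0, 0.0)
c1_right = (10.5, 0.0, 0.0)
n1_left = (BOND * math.cos(ANGLE), BOND * math.sin(ANGLE), 0.0)
n1_right = (10.5 - BOND * math.cos(ANGLE), BOND * math.sin(ANGLE), 0.0)

c1_distance = math.dist(c1_left, c1_right)
left_lambda = lambda_angle(n1_left, c1_left, c1_right)
right_lambda = lambda_angle(n1_right, c1_right, c1_left)
print(f"{c1_distance:.1f} {left_lambda:.1f} {right_lambda:.1f}")
```

With real data, the three coordinates for each base would be read from the N1 and C1′ atom records of the paired uridines.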
Within the constraint of the glycosidic bond separation found in this structure, one exocyclic oxygen (O2 or O4) may form a hydrogen bond with the N3 of the opposing uridine. This bonding appears to happen in 2 of 12 U-U pairs. For the remaining 10 U-U pairs, van der Waals clashes between the exocyclic oxygens of each opposing uridine restrict the N3···O2=C2 (or N3···O4=C4) angle to ≈180°, which weakens the hydrogen bond to such an extent that it is doubtful that it actually exists. In any event, the contribution of this hydrogen bond is likely to be canceled by van der Waals clashes between the exocyclic oxygens. The weakness of the interaction between the uridines is reflected in the observation that the geometry of the U-U pairs varies substantially with changes in the crystalline environment, whereas the geometry of the Watson-Crick C-G and G-C base pairs varies much less (Table 2).
Table 2. Conformation distances and angles of U-U base pairs.
| Type | Sample size | C1′—C1′, Å | Left λ angle, ° | Right λ angle, ° |
|---|---|---|---|---|
| Type I* | 11 | 8.6 ± 0.3 | 77.1 ± 2.3 | 46.3 ± 2.5 |
| Type II† | 5 | 8.4 ± 0.2 | 47.2 ± 2.7 | 78.7 ± 1.9 |
| Type III‡ | 12 | 10.1 ± 0.6 | 47.9 ± 12.2 | 51.0 ± 7.9 |
| C-G | 24 | 10.5 ± 0.3 | 54.3 ± 4.6 | 53.9 ± 4.7 |
The parameters for the C-G base pairs from r(CUG)6 are shown for comparison. The U-U base pairs in the sample population are as follows: *, 1jj2.pdb: U83:U850, U70:U111, U69:U112, U286:U366, U26:U517, U2546:U2610, U1984:U1977, U163:U172, U1304:U1350; 280D.pdb: U7A:U18B, U31C:U42D; †, 1jj2.pdb: U391:U398, U2495:U2527, U1001:U967; 280D.pdb: U6A:U19B, U30C:U43D; and ‡, (CUG)6: U2A:U17B, U5A:U14B, U8A:U11B, U11A:U8B, U14A:U5B, U17A:U2B, U2C:U17D, U5C:U14D, U8C:U11D, U11C:U8D, U14C:U5D, U17C:U2D. Measurements are mean ± SD.
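The distinctions drawn in Table 2 can be expressed as a simple rule, sketched below in Python. The 9.3 Å cutoff is an illustrative assumption chosen to fall between the Type I/II means and the Type III mean; it is not a threshold used in this work, and the function name is hypothetical.

```python
SPACING_CUTOFF = 9.3  # angstroms; illustrative assumption, not from the paper


def classify_uu_pair(c1_distance, left_lambda, right_lambda, cutoff=SPACING_CUTOFF):
    """Assign a U-U pair to Type I, II, or III from its C1'-C1' distance and lambda-angles."""
    if c1_distance >= cutoff:
        return "III"  # Watson-Crick-like glycosidic bond separation
    # Short C1'-C1' distance: the larger lambda-angle identifies the type.
    return "I" if left_lambda > right_lambda else "II"


# Mean values from Table 2: (C1'-C1' distance, left lambda, right lambda).
TABLE_2_MEANS = {
    "Type I": (8.6, 77.1, 46.3),
    "Type II": (8.4, 47.2, 78.7),
    "Type III": (10.1, 47.9, 51.0),
}

labels = {name: classify_uu_pair(*values) for name, values in TABLE_2_MEANS.items()}
print(labels)
```

Because Type III pairs show large standard deviations in Table 2, a single distance cutoff is only a rough guide for individual pairs.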
Water molecules in the major and minor grooves of r(CUG)6 probably satisfy the apparently unmet hydrogen bonding potential of the uridines. Water molecules have been seen bridging two uridines that share only one hydrogen bond (42). In our structure, translational disorder affects the solvent structure, thereby obscuring potential bridging water molecules and coordinated metal cations.
The importance of stacking interactions in maintaining A-form RNA structure has previously been observed in an RNA duplex containing two phenyl groups (43). The phenyl groups in this structure stack within the helix to maintain geometry that is quite similar to an RNA containing all Watson-Crick base pairs in an A-form conformation.
Poor Stacking of C/U Base Steps Dominate Base Stacking. The CUG structure has three different pairs of adjacent base pairs (base steps): CU/UG, UG/CU, and GC/GC (the letters represent bases along a strand in the 5′ → 3′ direction and the slash separates strands). The GC/GC base-step has two purines and two pyrimidines, whereas the other two base steps are unusual in that they have one purine and three pyrimidines. The GC/GC base step has the greatest amount of base overlap. This overlap is between the six-membered rings of bases within each strand. This overlap maximizes the base-stacking interactions and minimizes the exposure of aromatic surface area to the solvent. This type of base stacking is general for purine-pyrimidine base steps in A-form (44).
The CU/UG base step has minimal intrastrand base overlap on the 5′ → 3′ strand and some overlap of the uridine six-membered ring and the guanosine five-membered ring on the 3′ → 5′ strand. This base step also has some cross-strand overlap between the six-membered rings of the uridine of the 5′ → 3′ strand and of the guanosine of the 3′ → 5′ strand. The UG/CU step shows minimal intrastrand overlap in both strands and more interstrand overlap of the six-membered rings of the guanosine of the 5′ → 3′ strand and the U of the 3′ → 5′ strand. This interstrand overlap leads to the large values in the base pair slide parameter of pyrimidine/purine base steps in A-form helices (44). Taken together, the GC/GC base step has the greatest base stacking, whereas the CU/UG base step has the least base stacking. These results suggest that the GC/GC base steps dominate the base-stacking interactions and that base-stacking interactions would be easiest to disrupt at CU/UG base steps.
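The three base steps and their purine and pyrimidine content can be enumerated with a short Python sketch that uses the notation defined above (both strands written 5′ → 3′, separated by a slash). It assumes the self-complementary, full-length pairing register of the crystal structure; the helper names are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}


def base_steps(strand):
    """List the base steps of a duplex formed by two identical antiparallel strands."""
    n = len(strand)
    steps = []
    for i in range(n - 1):
        top = strand[i:i + 2]               # bases i, i+1 read 5'->3'
        bottom = strand[n - 2 - i:n - i]    # their partners, also read 5'->3'
        steps.append(f"{top}/{bottom}")
    return steps


def purine_count(step):
    """Count the purines among the four bases of a step such as 'CU/UG'."""
    return sum(1 for base in step if base in PURINES)


strand = "CUG" * 6
step_counts = Counter(base_steps(strand))
for step, occurrences in step_counts.items():
    purines = purine_count(step)
    print(step, occurrences, f"{purines} purine(s), {4 - purines} pyrimidine(s)")
```

The output lists CU/UG and UG/CU with one purine and three pyrimidines each and GC/GC with two of each, in line with the description in the text.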
Discussion
Trinucleotide repeats are linked to many hereditary neurological diseases, and several of these expanded repeats function at the RNA level. Most of these expanded triplet repeats (CUG, CCG, CAG, and CGG) form stem-loop structures (45). For the CUG repeats of DM1, the stem is thought to sequester the MBNL RNA-binding proteins (15). Stem loops formed by other repetitive sequences may also sequester RNA binding proteins or inhibit RNA processing and translation.
The r(CUG)6 structure suggests CCG repeats, which also have a pyrimidine in the second position, may form a very similar structure and adopt an A-form helical structure. On the other hand, triplet repeats containing purines in the second position (CAG and CGG) will not be able to form a standard A-form helix because the mismatched purines will push the phosphate backbone outward, leading to bulges in the backbone. Therefore, triplet repeats can be divided into two groups: (i) triplet repeats with pyrimidines in the second position that are likely to fold into a largely undistorted A-form structure that may lack direct hydrogen bonds between the mismatched pyrimidines and (ii) triplet repeats containing purines in the second position that form structures with bulged backbones with mismatched purines forming direct hydrogen bonds with one another.
The U-U mismatches do not generally appear to form direct hydrogen bonds because the most thermodynamically stable r(CUG)6 structure is driven by base-stacking interactions at the G-C base steps. The r(CUG)6 structure is another example of the importance of base stacking in determining the fold of a nucleic acid structure. The mismatched uridines are almost certainly satisfying all of their hydrogen bonding potential by interacting with water molecules. We found several water molecules in the minor groove bridging the exocyclic O2 oxygen atom and the ribose ring O3′ oxygen atom and several waters in the major groove hydrogen bonded to the exocyclic O4 oxygen atoms. Additional water molecules, such as ones bridging the O4 or O2 atoms, may be present but they were not visible in the electron density map presumably because of the translational disorder of this structure (discussed in Materials and Methods).
Our r(CUG)6 structure suggests expanded CUG repeats in vivo will adopt a structure very similar to A-form RNA and should be targets of the RNA interference (RNAi) machinery. Expanded CUG repeats do not appear to be subject to RNAi but instead form aggregates called nuclear foci (46). MBNL proteins are the primary determinant for the formation of these foci (47). We predict that if the MBNL proteins are not binding to the expanded CUG repeats, then the DMPK transcript would be subject to RNAi because it contains a region of A-form RNA.
Based on the r(CUG)6 structure, we propose that if MBNL proteins directly recognize the U-U pairs, this recognition would occur primarily through the minor groove because the major groove is relatively inaccessible and the backbones are not distorted (Fig. 2C). In alternate models, MBNL either distorts the RNA upon binding (induced fit model) or binds CUG repeats in a different conformation such as when some or all of the uridines are flipped out of the helix. In the induced fit model, MBNL senses the conformational flexibility of the U-U base pairs and reorients the uridines upon binding. More recently, yeast three-hybrid and in vitro binding studies have shown that MBNL binds CCUG repeats with higher affinity than CUG repeats (48). CCUG repeats also form a stem-loop structure (45). The difference between the CCUG stem and CUG stem is that the CCUG stem contains two C-U mismatches sandwiched between two Watson-Crick base pairs. It is tempting to speculate that the CCUG stem loop forms an A-form-like structure in which the C-U pairs are in a similar conformation to that observed for the U-U pairs. These C-U mismatches would form at most one hydrogen bond while maintaining the A-form helical width, and they would place two O2 oxygen atoms in the minor groove in a fashion similar to that seen in the structure of r(CUG)6. The two adjacent C-U/U-C mismatches would expand the electronegative patch in the minor groove and, perhaps, lead to tighter binding of CCUG repeats by MBNL in comparison to CUG repeats. Even if MBNL binds pyrimidine mismatches within A-form RNA irrespective of the size of the electronegative patch, it would not be surprising if MBNL binds CCUG with higher affinity because CCUG repeats contain 50% more pyrimidine mismatches for a given length of repeating RNA sequence. Studies are underway to determine whether CCUG repeats do indeed adopt the standard A-form conformation.
As a general rule, mismatches in RNA helices distort the RNA from the standard A-RNA conformation, and it is these distortions that create protein recognition sites, small molecule binding sites, or RNA docking sites for RNA tertiary structure formation. The structure of r(CUG)6 does not follow this general pattern of mismatches creating local or overall distortions of the RNA from standard A-form. Other RNA helices containing U-U pairs tend to have decreased width in the major and minor groove at such sites because of the formation of direct hydrogen bonds between the U-U pairs (Fig. 4 A and B) (20, 40). A previous comparison of U-U pairs suggested that they are flexible (20). An extreme example of such flexibility was found in base pair U6-U19 of an RNA dodecamer (40), which forms one U-U hydrogen bond and has one uridine with a propeller twist of 34°. The r(CUG)6 structure supports the notion that U-U pairs are flexible because the U-U pairs in this structure adopt conformations that leave the double helix in the A-form conformation while maximizing the GC/GC stacking interactions. Other U-U pairs in known structures contained at least one hydrogen bond, which results in at least some local distortion of the RNA backbone.
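The figure of 50% more pyrimidine mismatches for CCUG repeats can be verified with a brief Python sketch. It takes the mismatch counts per repeat from the text (one U-U pair per CUG repeat, two C-U pairs per CCUG repeat) and compares mismatches per nucleotide of repeating sequence; the function name is illustrative.

```python
from fractions import Fraction


def mismatch_density(mismatches_per_repeat, repeat_length):
    """Mismatched base pairs per nucleotide of repeating sequence."""
    return Fraction(mismatches_per_repeat, repeat_length)


cug_density = mismatch_density(1, 3)    # one U-U pair per CUG repeat
ccug_density = mismatch_density(2, 4)   # two C-U pairs per CCUG repeat
excess_percent = (ccug_density / cug_density - 1) * 100

print(cug_density, ccug_density, excess_percent)  # 1/3 1/2 50
```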
Many U-U pairs are found in crystal structures of the 30S and 50S subunits of the ribosome (41, 49). Many of the U-U pairs from the higher resolution 50S structure and most U-U pairs from crystal structures of RNA oligonucleotides form two hydrogen bonds in one of two conformations (Type I or Type II) depending on which strand the exocyclic O4 carbonyl oxygen atom is unpaired and free to project into the major groove (Fig. 4 A and B) (40). The remaining U-U bases with single hydrogen bonds are often not coplanar and are frequently found in structures other than antiparallel double-stranded helices. These U-U pairs may be important for the folding or function of the ribosome because some of these U-U pairs are conserved (50).
We refer to U-U pairs with ≈10.2 Å C1′-C1′ distances (Watson-Crick ≈10.5 Å) and Watson-Crick-like orientations as type III U-U pairs. The glycosidic bonds of these type III U-U pairs have orientations (described by the λ angle, Table 2) that are very similar to those of Watson-Crick base pairs. Molecular modeling trials suggest that the Watson-Crick C1′-C1′ distance can accommodate at most one direct hydrogen bond between opposing uridines, although only 2 of 12 U-U mismatches appear to form a direct hydrogen bond in r(CUG)6.
Only Type III U-U pairs were observed in the CUG repeat structure. Type I and Type II conformations would have required distortion of the backbone at every third base pair to accommodate the 2-Å reduction in the distance between the glycosidic bonds. Distortions of the RNA backbone due to U-U pairs every third base pair may not be thermodynamically favorable in the context of the flanking GC/GC dinucleotides whereas the loss of hydrogen bonds between the uridines may be offset by base-stacking interactions that are more favorable. Previously observed U-U pairs have not occurred in repeating sequences and have been flanked by other types of base pairs. Perhaps A-form helices accommodate the backbone distortions required for Type I and Type II base pairs only when there is a greater separation of U-U pairs in the sequence or when two or more U-U pairs or other mismatches are adjacent in the sequence.
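The periodicity of the mismatches can be made concrete with a short Python sketch. It assumes that two identical r(CUG)6 strands pair antiparallel and in full register, so that residue i faces residue 19 − i; the function names are illustrative only.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antiparallel_pairs(strand):
    """Pair residue i with residue n-1-i of an identical antiparallel strand."""
    n = len(strand)
    return [(strand[i], strand[n - 1 - i]) for i in range(n)]


def classify(pair):
    """Label a pair as Watson-Crick or as a named mismatch."""
    a, b = pair
    if COMPLEMENT[a] == b:
        return "Watson-Crick"
    return f"{a}-{b} mismatch"


strand = "CUG" * 6  # r(CUG)6, 18 nucleotides
labels = [classify(p) for p in antiparallel_pairs(strand)]

# 1-based positions of the U-U mismatches along the duplex.
mismatch_positions = [i + 1 for i, label in enumerate(labels) if label == "U-U mismatch"]
print(mismatch_positions)
```

Under this assumption, every third base pair is a U-U mismatch and each mismatch is flanked on both sides by C-G and G-C pairs, which is the sequence context discussed above.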
The crystal structure of r(CUG)6 can be used to design drugs for DM that recognize and bind the CUG repeats of DM1, thereby triggering the release of sequestered MBNL proteins. Potential drugs would need to bind the CUG repeats with a higher affinity than MBNL proteins. Drugs might be designed to recognize and bind the unique repeating electrostatic patches in the minor groove. Alternatively, drugs might target the weakly paired uridines by providing complementary hydrogen bonding partners in the minor groove, allowing the drug to block MBNL binding if these U-U pairs are indeed part of its binding site.
Acknowledgments
We thank the user support staff at Stanford Synchrotron Radiation Laboratory and Advanced Light Source for their assistance during data collection; Dr. Ken Prehoda for collecting the Br-MAD data; and Drs. Brian Matthews, Alice Barkan, and Ken Prehoda for their comments on the manuscript. J.S.L. dedicates this work to Cheryl Logue. B.H.M.M. was supported by National Institutes of Health Grants GM21967 and GM20066 (to B. W. Matthews). This work is supported by Muscular Dystrophy Association Grant 3591 (to J.A.B.).
Author contributions: B.H.M.M. and J.S.L. performed research; J.A.B. designed research; B.H.M.M., J.S.L., and J.A.B. analyzed data; and B.H.M.M., J.S.L., and J.A.B. wrote the paper.
Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: DM, myotonic dystrophy; DMn, DM type n; DMPK, dystrophin myotonin protein kinase; MBNL, muscleblind.
Data deposition: The atomic coordinates and structure factors of the native data have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 1ZEV), and the Nucleic Acid Data Bank (accession no. AR0062).
References
- 1. Ranum, L. P. & Day, J. W. (2004) Am. J. Hum. Genet. 74, 793–804.
- 2. Day, J. W. & Ranum, L. P. (2005) Neuromuscul. Disord. 15, 5–16.
- 3. Brook, J. D., McCurrach, M. E., Harley, H. G., Buckler, A. J., Church, D., Aburatani, H., Hunter, K., Stanton, V. P., Thirion, J. P., Hudson, T., et al. (1992) Cell 68, 799–808.
- 4. Liquori, C. L., Ricker, K., Moseley, M. L., Jacobsen, J. F., Kress, W., Naylor, S. L., Day, J. W. & Ranum, L. P. (2001) Science 293, 864–867.
- 5. Pizzuti, A., Friedman, D. L. & Caskey, C. T. (1993) Arch. Neurol. 50, 1173–1179.
- 6. Meola, G. & Moxley, R. T., 3rd (2004) J. Neurol. 251, 1173–1182.
- 7. Machuca-Tzili, L., Brook, D. & Hilton-Jones, D. (2005) Muscle Nerve 32, 1–18.
- 8. Mankodi, A., Logigian, E., Callahan, L., McClain, C., White, R., Henderson, D., Krym, M. & Thornton, C. A. (2000) Science 289, 1769–1773.
- 9. Napierala, M. & Krzyzosiak, W. J. (1997) J. Biol. Chem. 272, 31079–31085.
- 10. Leppert, J., Urbinati, C. R., Hafner, S., Ohlenschlager, O., Swanson, M. S., Gorlach, M. & Ramachandran, R. (2004) Nucleic Acids Res. 32, 1177–1183.
- 11. Timchenko, L. T., Miller, J. W., Timchenko, N. A., DeVore, D. R., Datar, K. V., Lin, L., Roberts, R., Caskey, C. T. & Swanson, M. S. (1996) Nucleic Acids Res. 24, 4407–4414.
- 12. Roberts, R., Timchenko, N. A., Miller, J. W., Reddy, S., Caskey, C. T., Swanson, M. S. & Timchenko, L. T. (1997) Proc. Natl. Acad. Sci. USA 94, 13221–13226.
- 13. Ho, T. H., Bundman, D., Armstrong, D. L. & Cooper, T. A. (2005) Hum. Mol. Genet. 14, 1539–1547.
- 14. Philips, A. V., Timchenko, L. T. & Cooper, T. A. (1998) Science 280, 737–741.
- 15. Miller, J. W., Urbinati, C. R., Teng-Umnuay, P., Stenberg, M. G., Byrne, B. J., Thornton, C. A. & Swanson, M. S. (2000) EMBO J. 19, 4439–4448.
- 16. Fardaei, M., Rogers, M. T., Thorpe, H. M., Larkin, K., Hamshere, M. G., Harper, P. S. & Brook, J. D. (2002) Hum. Mol. Genet. 11, 805–814.
- 17. Kanadia, R. N., Johnstone, K. A., Mankodi, A., Lungu, C., Thornton, C. A., Esson, D., Timmers, A. M., Hauswirth, W. W. & Swanson, M. S. (2003) Science 302, 1978–1980.
- 18. Frisch, R., Singleton, K. R., Moses, P. A., Gonzalez, I. L., Carango, P., Marks, H. G. & Funanage, V. L. (2001) Mol. Genet. Metab. 74, 281–291.
- 19. Kierzek, R., Burkard, M. E. & Turner, D. H. (1999) Biochemistry 38, 14214–14223.
- 20. Lietzke, S. E., Barnes, C. L., Berglund, J. A. & Kundrot, C. E. (1996) Structure (London) 4, 917–930.
- 21. Otwinowski, Z. & Minor, W. (1997) in Methods in Enzymology, Macromolecular Crystallography, part A, eds. Carter, C. W. J. & Sweet, R. M. (Academic, San Diego) Vol. 276, pp. 307–326.
- 22. The ccp4 Suite: Programs for Protein Crystallography (1994) Acta Crystallogr. D 50, 760–763.
- 23. French, S. & Wilson, D. (1978) Acta Crystallogr. A 34, 517–525.
- 24. Sheldrick, G. M. (2002) Z. Kristallogr. 217, 644–650.
- 25. Shah, S. A. & Brunger, A. T. (1999) J. Mol. Biol. 285, 1577–1588.
- 26. Klosterman, P. S., Shah, S. A. & Steitz, T. A. (1999) Biochemistry 38, 14784–14792.
- 27. Tereshko, V., Gryaznov, S. & Egli, M. (1998) J. Am. Chem. Soc. 120, 269–283.
- 28. McRee, D. E. (1999) J. Struct. Biol. 125, 156–165.
- 29. Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997) Acta Crystallogr. D 53, 240–255.
- 30. Winn, M. D., Isupov, M. N. & Murshudov, G. N. (2001) Acta Crystallogr. D 57, 122–133.
- 31. Lavery, R. & Sklenar, H. (1989) J. Biomol. Struct. Dyn. 6, 655–667.
- 32. Chin, K., Sharp, K. A., Honig, B. & Pyle, A. M. (1999) Nat. Struct. Biol. 6, 1055–1061.
- 33. Merritt, E. A. & Bacon, D. J. (1997) in Methods in Enzymology: Macromolecular Crystallography Part B, eds. Carter, C. W. & Sweet, R. M. (Academic, San Diego), Vol. 277, pp. 505–524.
- 34. Riedel, K., Leppert, J., Ohlenschlager, O., Gorlach, M. & Ramachandran, R. (2005) J. Biomol. NMR 31, 331–336.
- 35. Wang, J. C. (1979) Proc. Natl. Acad. Sci. USA 76, 200–203.
- 36. Michalowski, S., Miller, J. W., Urbinati, C. R., Paliouras, M., Swanson, M. S. & Griffith, J. (1999) Nucleic Acids Res. 27, 3534–3542.
- 37. Tian, B., White, R. J., Xia, T., Welle, S., Turner, D. H., Mathews, M. B. & Thornton, C. A. (2000) RNA 6, 79–87.
- 38. Serra, M. J., Baird, J. D., Dale, T., Fey, B. L., Retatagos, K. & Westhof, E. (2002) RNA 8, 307–323.
- 39. Ennifar, E., Walter, P. & Dumas, P. (2003) Nucleic Acids Res. 31, 2671–2682.
- 40. Baeyens, K. J., De Bondt, H. L. & Holbrook, S. R. (1995) Nat. Struct. Biol. 2, 56–62.
- 41. Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000) Science 289, 905–920.
- 42. Rould, M. A., Perona, J. J. & Steitz, T. A. (1991) Nature 352, 213–218.
- 43. Minasov, G., Matulic-Adamic, J., Wilds, C. J., Haeberli, P., Usman, N., Beigelman, L. & Egli, M. (2000) RNA 6, 1516–1528.
- 44. Calladine, C. R. & Drew, H. R. (1984) J. Mol. Biol. 178, 773–782.
- 45. Sobczak, K., de Mezer, M., Michlewski, G., Krol, J. & Krzyzosiak, W. J. (2003) Nucleic Acids Res. 31, 5469–5482.
- 46. Davis, B. M., McCurrach, M. E., Taneja, K. L., Singer, R. H. & Housman, D. E. (1997) Proc. Natl. Acad. Sci. USA 94, 7388–7393.
- 47. Dansithong, W., Paul, S., Comai, L. & Reddy, S. (2005) J. Biol. Chem. 280, 5773–5780.
- 48. Kino, Y., Mori, D., Oma, Y., Takeshita, Y., Sasagawa, N. & Ishiura, S. (2004) Hum. Mol. Genet. 13, 495–507.
- 49. Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr., Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2000) Nature 407, 327–339.
- 50. Gutell, R. R., Larsen, N. & Woese, C. R. (1994) Microbiol. Rev. 58, 10–26.