Abstract
RNA can serve as an enzyme, small molecule sensor, and vaccine, and it may have been a conduit for the origin of life. Despite these profound functions, RNA is thought to have limited molecular diversity. A pressing question is whether RNA can adopt novel molecular states that enhance its function. Covalent modifications of RNA have been demonstrated to augment biological function, but much less is known about non-covalent alterations such as novel protonated or tautomeric forms. Conventionally, a G•U wobble has the U located in the major groove. We used a cheminformatic approach to identify four structural families of shifted G•U wobbles in which the G instead resides in the major groove, which requires alternative tautomeric states of either base, or an anionic state of the U. We provide experimental support for these shifted G•U wobbles via the unconventional in vivo reactivity of the U with dimethylsulfate (DMS). These shifted wobbles may play functional roles and could serve as drug targets, as they are common in Bacteria and chloroplasts, but underrepresented in Eukaryotes and Archaea. Our cheminformatics approach can be applied to identify alternative protonation states in other RNA motifs, as well as in DNA and proteins.
Graphical Abstract
Introduction
RNA is a relatively unassuming biopolymer comprised of only four similar sidechains and a simple sugar phosphate backbone. The nucleobase sidechains have only two sizes: the larger purines, A and G, which are comprised of fused five- and six-membered rings, and the smaller pyrimidines, C and U, which are comprised of a single six-membered ring. Additionally, the four nucleobases have highly similar functional groups of exocyclic keto and amino groups, and endocyclic imino nitrogens. Further limiting intermolecular interactions, the amino groups are aromatic, with their lone pairs delocalized into the ring system. In many ways, this limited chemical diversity is incongruous with RNA’s prodigious functional diversity that includes catalysis (ribozymes), small molecule recognition (riboswitches), and synthesis of proteins (tRNAs, rRNAs, and mRNAs) [1–3].
One way RNA is known to enhance its chemical range is through covalent modification. Indeed, tRNA and rRNA have long been known to have myriad covalent modifications such as base modifications (e.g. pseudouridine, 5-methyluridine, and 7-methylguanosine) as well as sugar modifications (e.g. 2′-O-methyl) [4–6]. These modifications control numerous biological functions including codon recognition, reading frame maintenance, and tRNA decay. More recent studies have revealed extensive chemical modification of mRNAs as well. For instance, 6-methyladenosine, 8-oxo-guanosine, and pseudouridine have been found in mRNAs where they affect transcription, translation, and mRNA decay [7, 8].
Less clear is whether non-covalent modifications have a role to play in enhancing the chemical diversity of RNA. The nucleobases have long been thought to resist ionization and tautomerization at neutral pH. For instance, the pKa values of the bases are removed from neutrality, being near 4 for the Watson–Crick–Franklin (WCF) faces of A and C and above 9 for those of G and U [9]; moreover, these values shift further from neutrality upon formation of WCF base pairs owing to coupling with RNA folding [10, 11]. Watson and Crick reasoned that tautomers of the DNA bases must be rare in order to maintain the fidelity of base pairing [12], and calculations have indicated that tautomers of DNA and RNA are energetically unfavorable, with estimated penalties of ∼5 to 10 kcal/mol for the formation of the neutral tautomers of the bases [13–18]. Nonetheless, there is experimental evidence, both direct and indirect, that ionization and tautomerization of the bases do indeed occur. Regarding ionization, we showed that C75 of the HDV ribozyme has a pKa shifted up to 7.1 at biological concentrations of Mg2+ [19, 20], that C8 in the base quartet of beet western yellows virus has a pKa of 8.1 [21], and that the A in an A+•C wobble pair can protonate at neutrality when the nearest-neighbors have strong WCF base pairing [22]. Additionally, Lilley and colleagues showed that A1 in the twister ribozyme serves as the general acid to protonate the leaving group and has a pKa shifted toward 7 [23], and Wohnert and colleagues demonstrated that A11 in a quartet in a GTP-binding aptamer has a pKa in the bound state of at least 8.9 [24]. In all the above cases, the base is A or C and it forms a cationic state. We also provided indirect evidence that the bases and hydrated metal ions can ionize to perform chemistry in ribozymes [25, 26].
Regarding tautomerization, there has been indirect evidence of tautomers playing important roles in the mechanisms of the hairpin and HDV ribozymes [17, 27, 28] and direct evidence that the ribosome can form tautomers in a G•U pair [29]. Ionization of the bases in so-called “reverse protonation” [26, 30, 31], as well as re-protonation of the bases in tautomerization, could lead to novel base-pairing, proton transfer in catalysis, and new sites for protein binding and specific drug targeting of RNA. Despite these anecdotal examples of ionization and tautomerization of the bases, there is little evidence as to whether these phenomena are widespread, involve anionic bases, or depend on specific structures.
In this work, we present a cheminformatic workflow that identifies and analyzes RNA structural motifs containing non-covalently modified residues. We focus on rare G•U wobbles (Fig. 1), in part because they are understudied and in part because anionic bases, which can form in them, are rare in RNA. Recently, Westhof and co-workers provided the first report that shifted G•U pairs can form in bacterial rRNAs [32]. They reported three anionic G•U base pairs, with conserved sequence across bacteria, in structures from Escherichia coli, Thermus thermophilus, Bacillus subtilis, and Staphylococcus aureus. Here, we significantly expand on this work by employing a workflow that identifies four separate structural clusters, one of which has not been observed before, containing alternative forms of G•U wobbles in which the G, rather than the U, is shifted into the major groove; these wobbles are found, for the first time, across all three domains of life. Secondary and 3D models are provided for each cluster, which reveal potential driving forces for formation of the shifted wobble, including metal ion binding in the major groove and extensive intra- and interstrand minor groove interactions within the local RNA fold as well as with amino acids. We also present dimethylsulfate (DMS) probing experiments in three organisms that support the WCF face of the U in a shifted G•U wobble as being deprotonated in vivo.
Figure 1.
Structures of standard and shifted G•U wobbles. (A) Standard and (B) shifted wobbles. For the shifted wobbles, we provide charged (top) and tautomeric (middle and bottom) wobbles between G and U. For the charged wobble, we provide the enolate resonance form to the right. Positions of major and minor grooves are provided in both panels. For the stick drawings, the position of the sugar is depicted by a methyl group.
Materials and methods
Workflow to identify non-redundant shifted G•U wobbles (from Fig. 2A)
Figure 2.
Workflow to identify and analyze shifted G•U wobbles. (A and B) Steps to (A) identify non-redundant shifted G•U wobbles and (B) analyze the shifted G•U wobbles.
Downloading and characterizing RNA structures
The workflow started with collecting structures containing RNA entities from the RCSB Protein Data Bank (PDB) [33]. Structures were collected in the Crystallographic Information File (CIF) format. An initial resolution cut-off of 3.2 Å was applied to ensure satisfactory quality of structures. This resolution cut-off excluded all structures solved by solution NMR, leaving just X-ray diffraction and cryo-EM structures. The selected structures were then characterized by Dissecting the Spatial Structure of RNA (DSSR) software [34]. This step output base pair, hydrogen bond, stacking, glycosidic angle, and sugar pucker information for each structure file.
Identifying wobbles by hydrogen bonds
From the DSSR base pair information, all G•U base pairs were identified and filtered as wobble or non-wobble base pairs. All base pairs called by DSSR as G•U wobbles were considered for the next steps of the analysis as standard wobbles. Any base pairs containing hydrogen bonds between G(N1) and U(O4), as well as G(N2) and U(N3) (see Fig. 1) were binned to shifted wobble base pairs.
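The hydrogen-bond criterion for binning a pair as a shifted wobble can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: the input layout (a list of G-atom/U-atom name pairs per base pair) and the function name are hypothetical and do not reflect the DSSR output format.

```python
# Hydrogen-bond signature of the shifted wobble, taken from the text.
# The data layout and function name are hypothetical, for illustration only.
SHIFTED_HBONDS = {("N1", "O4"), ("N2", "N3")}


def is_shifted_wobble(hbonds):
    """Return True if a G•U pair has both G(N1)-U(O4) and G(N2)-U(N3) bonds.

    hbonds: iterable of (g_atom, u_atom) atom-name pairs for one G•U pair.
    """
    return SHIFTED_HBONDS.issubset(set(hbonds))


pairs = {
    "standard-like": [("O6", "N3"), ("N1", "O2")],
    "shifted-like": [("N1", "O4"), ("N2", "N3")],
}
labels = {name: is_shifted_wobble(h) for name, h in pairs.items()}
print(labels)
```

Requiring both hydrogen bonds, rather than either one, mirrors the wording of the criterion above.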
Additional data extraction, calculation, and filtration
For all the standard and shifted wobble base pairs, a two-step data extraction was performed. In the first step, metadata of the solved structures containing the wobbles were extracted from the RCSB PDB, including the name of source and expressed organisms, type of RNA molecule, segment ID, and length of the reference chain. In the next step, three types of information were calculated for each wobble: distances, dihedral angles, and average temperature factors. To calculate distances, three atoms from the WCF edge of G (O6, N1, and N2) and three atoms from the WCF edge of U (O4, N3, and O2) were considered. For each atom, the distances to the three WCF atoms of the other residue were calculated, resulting in a total of nine distances. To evaluate the coplanarity of the G and U, three dihedral angles were calculated: one along the O6-O4 distance and two along the distances of the hydrogen bonds required for each wobble. These latter two consist of the O6-N3 and N1-O2 distances for the standard wobbles and the N1-O4 and N2-N3 distances for the shifted wobbles. Finally, the average temperature factors for the nucleobase atoms of G and U forming the wobbles were calculated. We excluded cases where base pair forming residues were from different chains or where a residue index included an insertion code (i.e. a letter).
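As an illustration of the nine-distance calculation, the sketch below pairs each WCF atom of G with each WCF atom of U. The coordinate dictionaries and the function name are assumptions made for this example; the toy coordinates are invented and do not come from any solved structure.

```python
from itertools import product
from math import dist

G_ATOMS = ("O6", "N1", "N2")  # WCF edge of G, major to minor groove
U_ATOMS = ("O4", "N3", "O2")  # WCF edge of U, major to minor groove


def wcf_distances(g_coords, u_coords):
    """Return {(g_atom, u_atom): distance in Å} for all nine combinations.

    g_coords / u_coords: hypothetical dicts mapping atom name -> (x, y, z).
    """
    return {
        (g, u): dist(g_coords[g], u_coords[u])
        for g, u in product(G_ATOMS, U_ATOMS)
    }


# Invented coordinates (Å), chosen only to exercise the function.
g = {"O6": (0.0, 0.0, 0.0), "N1": (0.0, 2.3, 0.0), "N2": (0.0, 4.6, 0.0)}
u = {"O4": (3.0, 2.3, 0.0), "N3": (3.0, 4.6, 0.0), "O2": (3.0, 6.9, 0.0)}
d = wcf_distances(g, u)
print(len(d), round(d[("N1", "O4")], 2))
```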
Using the calculated structural parameters, both standard and shifted wobble base pairs were evaluated for the quality of hydrogen bonds in solved structures through a three-step quality check process. Structures were required to have (1) hydrogen bonded distances of ≤3.4 Å and dihedral angles of ≤50°, (2) average temperature factors of ≤100 Å² for G and U, and (3) G(O6) to U(O4) distance of >3.4 Å.
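The three-step quality check can be expressed as a single predicate. The sketch below is a hedged illustration: the `WobbleMetrics` container and the function name are hypothetical, and comparing the absolute value of each dihedral angle against the cut-off is an assumption about how signed angles were handled.

```python
from dataclasses import dataclass


@dataclass
class WobbleMetrics:
    """Hypothetical container for the per-wobble values described above."""
    hbond_distances: tuple  # the two hydrogen-bonded distances, Å
    dihedrals: tuple        # the three coplanarity dihedral angles, degrees
    b_factor_g: float       # average temperature factor of the G base, Å^2
    b_factor_u: float       # average temperature factor of the U base, Å^2
    o6_o4_distance: float   # G(O6) to U(O4) distance, Å


def passes_quality_checks(m, max_hbond=3.4, max_dihedral=50.0,
                          max_b=100.0, min_o6_o4=3.4):
    # Step 1: hydrogen-bonded distances and dihedral angles
    # (absolute value is an assumption for signed dihedrals).
    geometry_ok = (all(d <= max_hbond for d in m.hbond_distances)
                   and all(abs(a) <= max_dihedral for a in m.dihedrals))
    # Step 2: average temperature factors of both bases.
    b_ok = m.b_factor_g <= max_b and m.b_factor_u <= max_b
    # Step 3: G(O6) and U(O4) must not be within hydrogen-bonding distance.
    o6_o4_ok = m.o6_o4_distance > min_o6_o4
    return geometry_ok and b_ok and o6_o4_ok


good = WobbleMetrics((3.0, 3.2), (10.0, -15.0, 20.0), 45.0, 60.0, 3.8)
print(passes_quality_checks(good))
```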
Identifying non-redundant structures
Because of sampling bias inherent in the available pool of structures (e.g. E. coli is highly represented while other organisms have no published structures) we aimed to remove redundancy. Therefore, for each group of structures from the same organism and RNA type with the same (or near identical) residue index, one representative structure was selected. Because the published sequence indexes did not always match between structures of the same RNA in the same organism, indexes were first adjusted by alignment to one reference sequence for each organism and RNA type. Sequence alignments were performed with EMBOSS needle [35]. Following are the steps to identify the representative structure for G•U wobbles from the same organism, RNA type, and adjusted residue indexes. The collection of representative structures constitutes our “non-redundant dataset.”
(A) All groups contain one or more G•U wobble instances. For each of the wobble instances within a group, a six-residue motif (the G•U wobble plus one residue above and below both the G and U) was clipped from the originally solved structure. Instances of motifs with missing residues or atoms were excluded from further analysis if there were other instances within the same group that had all six residues and all atoms.
(B) If there was only one instance of a G•U wobble within a group, then that motif was selected as the representative structure of the group. If there were only two instances, then the one with the lowest average temperature factors for G and U was selected as the representative structure. If there were more than two instances, then steps C and D were performed to select the representative structure.
(C) The various instances of a motif within a group were compared with one another using pairwise root mean square deviation (RMSD) values calculated using Biopython’s superimposer library [36]. Because this calculation requires structures to contain the same number of atoms, we followed the recent approach by Kollmann and colleagues to coarse grain the residues by selecting just five atoms from each residue: P, C4′, N9, C2, and C6 for purines and P, C4′, N1, C2, and C4 for pyrimidines [37].
(D) From all instances within a group, one average structure was generated after aligning the structures of the instances. The instance with an RMSD that is closest to the average structure was selected as the representative structure.
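The selection rules for the representative structure can be summarized in a short sketch. It assumes that the average temperature factor and the RMSD to the group's average structure have already been computed for each instance (for example with Biopython); the dictionary keys and function names are hypothetical, and the alignment itself is not shown.

```python
# Coarse-grained atom names per residue class, as listed in the text.
# PDB files write the prime as an ASCII apostrophe (C4').
COARSE_ATOMS = {
    "purine": ("P", "C4'", "N9", "C2", "C6"),
    "pyrimidine": ("P", "C4'", "N1", "C2", "C4"),
}


def coarse_atoms(resname):
    """Return the five atom names kept for a residue (A/G vs. C/U)."""
    return COARSE_ATOMS["purine" if resname in ("A", "G") else "pyrimidine"]


def pick_representative(instances):
    """Pick one instance id per group following the rules described above.

    instances: hypothetical list of dicts with keys 'id', 'mean_b'
    (average temperature factor of G and U) and 'rmsd_to_average'
    (RMSD to the group's average structure; only read for >2 instances).
    """
    if not instances:
        raise ValueError("group has no instances")
    if len(instances) == 1:
        return instances[0]["id"]
    if len(instances) == 2:
        return min(instances, key=lambda i: i["mean_b"])["id"]
    return min(instances, key=lambda i: i["rmsd_to_average"])["id"]


group = [
    {"id": "model_1", "mean_b": 40.0, "rmsd_to_average": 0.35},
    {"id": "model_2", "mean_b": 55.0, "rmsd_to_average": 0.12},
    {"id": "model_3", "mean_b": 30.0, "rmsd_to_average": 0.48},
]
print(pick_representative(group), coarse_atoms("G"))
```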
Workflow to analyze the shifted G•U wobbles (from Fig. 2B)
Checking the quality of the structure model
Two additional quality assessments were performed: (1) calculating the correlation coefficients between the experimental electron density maps and modeled structures and (2) analyzing the distances and angles of the two hydrogen bonds required for the shifted wobbles. These two quality checks were performed only for the shifted G•U wobbles of the non-redundant dataset.
For the first type of assessment, the electron density map and the modeled structure of the entire molecule for each non-redundant shifted G•U wobble was obtained from the RCSB PDB [33]. Using the Phenix software package, the map file was compared with the corresponding structure file to calculate map-model correlation coefficients (CC) [38]. To assess the fit of the shifted G•U wobble to the experimental electron density map, the map-model CCs of the G and U residues were compared with the mean and median map-model CCs for all residues within the corresponding chain.
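The comparison of per-residue map-model CCs with the chain-wide statistics can be sketched as below. The input is a hypothetical dictionary of residue index to CC, assumed to have been exported from Phenix beforehand; dividing the raw CC of G or U by the chain mean follows the normalization described for Fig. 5D.

```python
from statistics import mean, median


def summarize_cc(chain_cc, g_index, u_index):
    """Compare the CCs of the wobble G and U with the chain mean and median.

    chain_cc: hypothetical dict mapping residue index -> map-model CC.
    """
    values = list(chain_cc.values())
    chain_mean = mean(values)
    return {
        "chain_mean": chain_mean,
        "chain_median": median(values),
        "g_raw": chain_cc[g_index],
        "u_raw": chain_cc[u_index],
        # Mean-normalized values: raw CC divided by the chain mean.
        "g_normalized": chain_cc[g_index] / chain_mean,
        "u_normalized": chain_cc[u_index] / chain_mean,
    }


# Invented CC values for a five-residue toy chain.
toy_cc = {10: 0.80, 11: 0.90, 12: 0.85, 13: 0.75, 14: 0.70}
summary = summarize_cc(toy_cc, g_index=11, u_index=13)
print(summary)
```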
For the second type of assessment, hydrogen atoms were first added, using PyMOL, to a clipped structure containing only the shifted wobble. Then, the hydrogen-acceptor distances and the donor-hydrogen-acceptor angles for the G(N1)-U(O4) and G(N2)-U(N3) hydrogen bonds were calculated. We consider good hydrogen bonding geometries to exhibit distances of ≤2.5 Å and near-linear angles of ≥140°.
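The geometric criterion can be illustrated with plain vector arithmetic. This sketch assumes that the coordinates of the donor, the added hydrogen, and the acceptor are already available as (x, y, z) tuples; the function names are hypothetical and the coordinates are invented.

```python
from math import acos, degrees, dist


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (H...A distance in Å, D-H...A angle in degrees)."""
    h_a = dist(hydrogen, acceptor)
    d_h = dist(donor, hydrogen)
    # Vectors pointing from the hydrogen to the donor and to the acceptor.
    to_donor = [d - h for d, h in zip(donor, hydrogen)]
    to_acceptor = [a - h for a, h in zip(acceptor, hydrogen)]
    cos_angle = sum(x * y for x, y in zip(to_donor, to_acceptor)) / (d_h * h_a)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return h_a, degrees(acos(cos_angle))


def is_good_hbond(donor, hydrogen, acceptor, max_dist=2.5, min_angle=140.0):
    """Apply the distance (≤2.5 Å) and angle (≥140°) cut-offs from the text."""
    h_a, angle = hbond_geometry(donor, hydrogen, acceptor)
    return h_a <= max_dist and angle >= min_angle


# Invented coordinates: a perfectly linear bond and a strongly bent one.
linear = is_good_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0))
bent = is_good_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
print(linear, bent)
```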
Identifying non-WCF interactions for selected atoms
Using PyMOL’s API along with python scripts, interactions within 3.4 Å of the O2′, O4′, N3, N2, O6, and the N7 of G and O2′, O4′, O2, and O4 of U were identified for all the non-redundant shifted G•U wobbles.
Identifying location in secondary structure motifs
From the base pair information extracted from the DSSR characterization output, the non-redundant G•U wobbles were binned based on their location in one of the five secondary structure motifs: (1) inside stem, with one WCF base pair above and one below, (2) terminal, with at least one WCF base pair above, (3) terminal, with at least one WCF base pair below, (4) unstructured, where no WCF base pair is right above or below and the wobble does not occur at the closing base pair of a hairpin loop with a maximum of 10 nucleotides, and (5) inside a loop.
Identifying and analyzing structural clusters
To identify groups of non-redundant shifted G•U wobbles with similar structural orientation, pairwise RMSDs were calculated for the corresponding six-residue motifs, which included one residue before and after the G and the U. The above coarse graining approach was followed to keep the number of atoms the same for the RMSD calculations. Hierarchical clustering was performed on the distance matrix using the centroid method, also known as the Unweighted Pair Group Method with Arithmetic Mean, implemented in the SciPy library [39]. To identify clusters of structures, an RMSD cut-off of 1.23 Å was identified on the basis of the similarity of the G and U residue indexes.
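A minimal sketch of clustering a precomputed RMSD matrix with SciPy at the 1.23 Å cut-off is given below. Note that SciPy names UPGMA `method="average"` and uses `method="centroid"` for a different linkage (UPGMC); the sketch exposes the linkage as a parameter and defaults to `average`, which is an assumption about the linkage intended here. The toy matrix is invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_by_rmsd(rmsd_matrix, cutoff=1.23, method="average"):
    """Cut a hierarchical clustering of pairwise RMSDs at a distance cut-off.

    rmsd_matrix: square, symmetric matrix of pairwise RMSDs (Å), zero diagonal.
    method: SciPy linkage name; "average" is UPGMA (assumed default here).
    """
    condensed = squareform(np.asarray(rmsd_matrix, dtype=float), checks=True)
    tree = linkage(condensed, method=method)
    # Flat cluster labels: members joined below the cut-off share a label.
    return fcluster(tree, t=cutoff, criterion="distance")


# Invented RMSDs (Å) for four motifs forming two tight groups.
toy = [
    [0.0, 0.4, 3.0, 3.2],
    [0.4, 0.0, 3.1, 2.9],
    [3.0, 3.1, 0.0, 0.5],
    [3.2, 2.9, 0.5, 0.0],
]
labels = cluster_by_rmsd(toy)
print(labels)
```

Passing a condensed distance matrix, rather than raw coordinates, keeps the RMSD calculation and the clustering step independent of one another.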
One consensus sequence and one secondary structure were assigned to each cluster. To this end, two different segments of the structure were prepared for each member of a cluster. The first segment includes three residues upstream from the G (or U), the G (or U) itself, the entire span of residues between the G and U (or U and G), the U (or G) itself, and three residues downstream from the U (or G). This structural segment was then characterized by DSSR [34] to extract the sequence, which was later used to perform multiple sequence alignment by pyMSAviz and assign a consensus sequence by following the Cavener rule [40]. The second segment is the same as the first, except that the number of residues upstream from the G (or U) and downstream from the U (or G) may increase depending on the cluster and is indicated in the Results. This segment was then characterized by DSSR to extract secondary structure and Leontis and Westhof notation [41]. The most frequent secondary structures were assigned as the consensus secondary structures.
Analysis of DMS-MaP probing data
Raw sequences were downloaded from the SRA using accessions provided in Supplementary Table S1. Replicates with the same treatment condition were pooled, and reads were trimmed with cutadapt [42] according to the published methods for each library. Trimmed reads were then mapped and mutations were counted using the default settings of ShapeMapper2 [43]. For E. coli and S. cerevisiae, raw reactivities were calculated by subtracting the mutation rate of the untreated sample from that of the DMS-treated sample. For H. sapiens, an untreated sample was not available, so the mutation rates of the DMS-treated sample were used as raw reactivities. Finally, for each organism, raw reactivities were pooled for all three rRNAs and normalized by dividing each mutation rate by the average mutation rate of the 90th-98th percentile. This process was done separately for each nucleotide. Data with reactivities of <0.5 were excluded from the visualization plots to avoid noise from low reactivity, although these data were included in ROC and reactivity histogram analyses. All reactivities over 1 (i.e. > average of the 90th-98th percentile) were set to 1, all values under −0.1 were set to −0.1 rather than 0 in an effort to visualize and report the number of such events, and all nucleotides with raw background modification rates greater than 0.05% were discarded. To assess the comparability of the three sets of DMS reactivity data, area under curve (AUC) was calculated in receiver operating characteristic (ROC) curves for two cases: (i) all residues in rRNAs (5S, 16S, and 23S–E. coli values) and (ii) those residues in the highly conserved peptidyl transferase center (PTC). The PTC residues and neighboring residues in 23S rRNA of E. coli were selected following the approach of Mankin and Polacek [44]. Briefly, the residues identified in 23S rRNA of E. coli (PDB ID: 4YBB) [45] were extracted and aligned using PyMOL to the 25S rRNA of S. cerevisiae (PDB ID: 4V88) [46] and the 28S rRNA of H. sapiens (PDB ID: 8QOI) [47] to identify corresponding PTC and neighboring residues. For each of the three organisms, the DMS reactivities were mapped onto the aligned structures to generate the ROC plots.
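The background subtraction, percentile normalization, and clipping steps can be sketched as follows for the reactivities of one nucleotide type. This is an illustration under stated assumptions rather than the code used here: the function name is hypothetical, the 90th-98th percentile window is taken by rank index in the sorted values (one of several possible percentile conventions), and the background-rate filter is not shown.

```python
def normalize_reactivities(treated, untreated=None, lo_pct=90.0, hi_pct=98.0,
                           floor=-0.1, ceiling=1.0):
    """Normalize mutation rates for one nucleotide type.

    treated / untreated: per-position mutation rates; when no untreated
    sample exists, the treated rates are used as raw reactivities.
    """
    if untreated is None:
        raw = list(treated)
    else:
        raw = [t - u for t, u in zip(treated, untreated)]
    ranked = sorted(raw)
    n = len(ranked)
    # Rank-based percentile window (an assumption about the convention).
    lo = int(n * lo_pct / 100)
    hi = max(lo + 1, int(n * hi_pct / 100))
    window = ranked[lo:hi]
    scale = sum(window) / len(window)
    # Clip to [floor, ceiling]: values above 1 become 1, below -0.1 become -0.1.
    return [min(ceiling, max(floor, r / scale)) for r in raw]


# Invented mutation rates: 100 positions with rates 0.00, 0.01, ..., 0.99.
treated = [i / 100 for i in range(100)]
norm = normalize_reactivities(treated)
background = [0.5] + [0.0] * 99
norm_bg = normalize_reactivities(treated, background)
print(max(norm), min(norm_bg))
```

The sketch assumes that the percentile window has a positive mean; a production version would need to handle sparse or all-zero inputs explicitly.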
Results
Identifying shifted G•U wobble pairs
We were interested in identifying nucleobases with unusual protonation states (Fig. 1). To do so we developed a cheminformatics approach and scanned the RCSB Protein Data Bank (PDB) [33] for RNAs with unconventional base pairing (Fig. 2). As of April 2023, we downloaded 6817 RNA structure files from the PDB in CIF format [33]. These structures were solved using several experimental methods: 3729 by X-ray diffraction, 2331 by cryogenic electron microscopy (cryo-EM), 737 by solution NMR, and 20 by other experimental techniques. In our cheminformatics approach, we first applied a 3.2 Å resolution cutoff, which removed approximately 42% of the structure files. The remaining 3915 structure files, many of which were redundant (see below), included 3038 by X-ray diffraction, 875 by cryo-EM, and 2 by fiber diffraction. We were curious about the distribution of base pairing type (i.e. AU, GC, and G•U) in these structures. To obtain this information, we turned to the software tool Dissecting the Spatial Structure of RNA (DSSR) [34], which takes RNA CIF files as inputs and provides a variety of structural features as outputs such as base pairing partner and type, hydrogen bond distances, and stacking interactions. We found that within these 3915 structure files, there were 389 175 AU, 899 418 GC, and 159 047 G•U base pairs.
Next, we delved into the G•U base pairs. In the standard G•U wobble, the O4 of U is resident in the major groove (Fig. 1A), but we also found shifted G•U wobbles where the O6 of G is instead resident in the major groove (Fig. 1B). As depicted in Fig. 1B, the shifted G•U wobble can be stabilized by either an anionic form of the U or a tautomeric form of either base. Because of our interest in bases with alternative protonation states, we analyzed our collection of G•U base pairs for shifted G•U wobbles. All 159 047 G•U pairs were analyzed for hydrogen bonds between G(N1) and U(O4), as well as between G(N2) and U(N3) (see Fig. 1). Remarkably, this analysis resulted in the identification of 1114 shifted G•U wobbles.
Next, we applied a stringent set of hydrogen bond distance, dihedral angle, and temperature factor cut-offs to both the standard and shifted G•U wobble pairs (see Materials and methods), which resulted in 373 high confidence shifted G•U wobbles (Supplementary Table S3). Data on the standard G•U wobbles can be found in Supplementary Table S4. Within this dataset of high confidence shifted G•U wobbles, certain wobbles were found to be overrepresented (i.e. sharing the same organism, RNA type, and residue index). For instance, there were 137 examples of the shifted G•U wobble between U660 and G696 in the 16S rRNA of Thermus thermophilus (Table 1 and Supplementary Table S3). It is notable that there was just a single example of the standard G•U wobble between these same two residues (Table 1 and Supplementary Table S4), indicating a very strong propensity for formation of the shifted wobble. Indeed, the vast overrepresentation of shifted G•U wobbles relative to standard G•U wobbles held for all cases with multiple examples of the shifted wobbles (Table 1).
Table 1.
Frequency of shifted G•U wobbles in the full dataset
| Domain | Organism | RNA type | First residue^a | Second residue^a | Cluster^b | Number in shifted conf.^c | Number in standard conf.^c |
|---|---|---|---|---|---|---|---|
| Bacteria | Mycobacterium tuberculosis | 16S rRNA | U668 | G704 | C2 | 3 | 0 |
| Bacteria | Mycobacterium tuberculosis | 23S rRNA | G471 | U481 | C3 | 2 | 0 |
| Bacteria | Mycobacterium tuberculosis | 23S rRNA | G2542 | U2550 | C4 | 3 | 0 |
| Bacteria | Mycolicibacterium smegmatis | 23S rRNA | G470 | U480 | C3 | 1 | 0 |
| Bacteria | Thermus thermophilus | 16S rRNA | U660 | G696 | C2 | 137 | 1 |
| Bacteria | Thermus thermophilus | 16S rRNA | U1068 | G1081 | C1 | 8 | 0 |
| Bacteria | Thermus thermophilus | 23S rRNA | G2315 | U2323 | C4 | 80 | 1 |
| Bacteria | Bacillus subtilis | 16S rRNA | U685 | G721 | C2 | 1 | 0 |
| Bacteria | Bacillus subtilis | 23S rRNA | G2130 | U2217 | NC | 1 | 0 |
| Bacteria | Staphylococcus aureus | 16S rRNA | U685 | G721 | C2 | 2 | 0 |
| Bacteria | Staphylococcus aureus | 16S rRNA | U1097 | G1110 | C1 | 1 | 0 |
| Bacteria | Staphylococcus aureus | 23S rRNA | G428 | U438 | C3 | 1 | 0 |
| Bacteria | Staphylococcus aureus | 23S rRNA | G1515 | U1565 | NC | 5 | 1 |
| Bacteria | Staphylococcus aureus | 23S rRNA | G1577 | U1589 | NC | 1 | 0 |
| Bacteria | Staphylococcus aureus | 23S rRNA | G2133 | U2211 | NC | 1 | 0 |
| Bacteria | Listeria monocytogenes | 16S rRNA | U679 | G715 | C2 | 1 | 0 |
| Bacteria | Listeria monocytogenes | 23S rRNA | G2335 | U2343 | C4 | 1 | 0 |
| Bacteria | Enterococcus faecalis | 23S rRNA | G2315 | U2323 | C4 | 2 | 0 |
| Bacteria | Pseudomonas aeruginosa | 16S rRNA | U664 | G700 | C2 | 2 | 0 |
| Bacteria | Pseudomonas aeruginosa | 16S rRNA | U1073 | G1086 | C1 | 1 | 0 |
| Bacteria | Pseudomonas aeruginosa | 23S rRNA | G2290 | U2298 | C4 | 3 | 0 |
| Bacteria | Escherichia coli | 16S rRNA | U676 | G712 | C2 | 47 | 0 |
| Bacteria | Escherichia coli | 16S rRNA | U1085 | G1098 | C1 | 3 | 0 |
| Bacteria | Escherichia coli | 23S rRNA | G2304 | U2312 | C4 | 36 | 0 |
| Bacteria | Escherichia coli | sgRNA | G22 | U40 |  | 2 | 0 |
| Bacteria | Acinetobacter baumannii | 16S rRNA | U674 | G710 | C2 | 3 | 0 |
| Bacteria | Acinetobacter baumannii | 16S rRNA | U1083 | G1096 | C1 | 1 | 0 |
| Archaea | Thermococcus kodakarensis | 23S rRNA | G1015 | U1024 | NC | 1 | 0 |
| Archaea | Haloarcula marismortui | 23S rRNA | U2585 | G2591 | NC | 10 | 0 |
| Eukarya | Spinacia oleracea | 23S rRNA | G394 | U404 | C3 | 1 | 0 |
| Eukarya | Solanum lycopersicum | 25S rRNA | G424 | U643 | NC | 2 | 0 |
| Eukarya | Leishmania donovani | 18S rRNA | G2012 | U2028 | NC | 1 | 0 |
| Eukarya | Schizosaccharomyces pombe | RNA (75mer) | G106 | U179 | NC | 1 | 0 |
| Eukarya | Saccharomyces cerevisiae | 25S rRNA | U441 | G493 | NC | 1 | 0 |
| Eukarya | Saccharomyces cerevisiae | 25S rRNA | U2875 | G2952 | NC | 1 | 0 |
| Eukarya | Encephalitozoon cuniculi | 18S rRNA | U557 | G593 | C2 | 1 | 0 |
| Eukarya | Spraguea lophii | RNA SSU | U998 | G1050 | NC | 1 | 0 |
| Eukarya | Oryctolagus cuniculus | 28S rRNA | G3353 | U3489 | NC | 1 | 0 |
| Eukarya | Homo sapiens | 28S rRNA | G505 | U653 | NC | 1 | 1 |
| Eukarya | Drosophila melanogaster | 28S rRNA | U1194 | G1308 | NC | 1 | 0 |
| Eukarya | Drosophila melanogaster | 28S rRNA | G2621 | U2861 | NC | 1 | 0 |
| Total |  |  |  |  |  | 373 | 4 |
^a Numbering is from alignment of all sequences within an organism and RNA type and may be slightly different from individual RNAs provided elsewhere.
^b Refers to the clusters found in Fig. 7. “NC” stands for non-clustered using an RMSD cutoff of 1.23 Å.
^c All entries in this table had at least one structure in the shifted conformation. We report here, for each entry, how many structures were in the shifted conformation and how many were in the standard conformation. Wobbles that did not clearly conform to either of these conformations were not scored.
While the overrepresented, high confidence shifted G•U wobbles were useful for assessing the prevalence of shifted over standard wobbles, they could lead to overcounting of unique shifted G•U wobbles. To address this, we developed a pipeline to identify a representative shifted G•U wobble for those examples sharing the same organism, RNA type, and residue index (Fig. 2A). Upon filtering for redundancy, 41 unique examples of the shifted G•U wobble resulted (Table 1 and Supplementary Table S3). Among these, 27 were in bacterial RNAs from 10 species, 12 were in eukaryotic RNAs from 10 species, and 2 were in archaeal RNAs from 2 species (Fig. 3 and Table 1). Regarding the distribution of these shifted G•U wobbles across RNA types, most were in rRNAs, with 23 examples in the large subunit and 16 in the small subunit.
Figure 3.
Distribution of shifted G•U wobbles across the three domains of life. Provided are RNA type and the source organism for the 41 non-redundant structures containing shifted G•U wobbles. “LSU” indicates the large subunit of the ribosome and “SSU” indicates the small subunit of the ribosome; “Others” corresponds to a 75-mer and an sgRNA. The number of occurrences of each RNA type in each organism is provided with a heat map and a number. For the rRNA gray boxes, which have no values, we indicate if the shifted wobble was not found (NF) or if there was no structure to analyze (NA).
Assessing the quality of the shifted G•U wobble pair assignments
The hydrogen bonds required for the formation of standard G•U wobbles are between G(O6) and U(N3), and between G(N1) and U(O2) (Fig. 1A). In contrast, the hydrogen bonds necessary for the formation of the shifted G•U wobbles are between G(N1) and U(O4), and between G(N2) and U(N3) (Fig. 1B). To judge the extent to which the standard and shifted G•U wobble candidates uniquely had these two sets of hydrogen bonds, we plotted the distribution of all non-redundant standard (n = 6636) and all non-redundant shifted (n = 41) G•U wobbles for the nine possible pairwise distances between the three WCF face heteroatoms of G and three WCF face heteroatoms of U (Fig. 4). The distance distributions for G(O6)-U(N3) and G(N1)-U(O2) were as expected for the standard and shifted wobbles, with small respective medians of 2.93 and 2.84 Å and narrow interquartile ranges (IQRs) of 0.20 and 0.22 Å for the standard G•U wobbles, but large respective medians of 5.57 and 5.68 Å and broad IQRs of 0.47 and 0.37 Å for the shifted G•U wobbles. Similarly, the distance distributions for G(N1)-U(O4) and G(N2)-U(N3) gave small respective medians of 3.02 and 3.20 Å and IQRs of 0.33 and 0.31 Å for the shifted G•U wobbles, but large respective medians of 5.42 and 5.37 Å and IQRs of 0.33 and 0.37 Å for the standard G•U wobbles. This comparison of these two sets of distances between atoms participating in hydrogen bonds across the standard and shifted G•U wobbles enhances confidence in the assignment of the shifted wobbles. We note that the distances and IQRs are somewhat smaller and tighter, respectively, for the standard wobbles than the shifted ones, which may come from there being much more data for the standard wobbles (n_standard = 6636 versus n_shifted = 41) or from researchers conducting model to electron density fitting refinements with standard protonation states. We also note that the medians from the other five pairwise combinations of distances are above the 3.4 Å line and so are not consistent with hydrogen bonds for either the standard or shifted G•U wobble pairs (Fig. 4).
Figure 4.
Distribution of distances between heteroatoms in all non-redundant G•U wobbles. We provide distances for all nine possible combinations of the three WCF face heteroatoms of G (O6, N1, and N2) and of U (O4, N3, and O2). Combinations are provided in the order from major groove resident atoms to minor groove ones. The 6636 non-redundant standard G•U wobbles are in blue (left side of column) and the 41 non-redundant shifted G•U wobbles are in pink (right side of column). The dashed line depicts the upper limit for a hydrogen bond of 3.4 Å. The standard G•U wobbles show two distinct distributions below this line, for G(O6)-U(N3) and G(N1)-U(O2) (blue asterisks), while the shifted G•U wobbles show two different distinct distributions below this line, for G(N1)-U(O4) and G(N2)-U(N3) (pink asterisks).
This lack of overlap between the distance distributions for the standard and shifted wobbles for the two sets of distances (Fig. 4) supports the conclusion that the electron density map for shifted G•U wobbles cannot be accounted for by standard G•U wobbles. To further test this idea, we overlaid the shifted G•U molecular model on the electron density map for each of the 41 non-redundant examples of the shifted G•U wobble. An example is provided in Fig. 5A, and all 41 examples are provided in Supplementary Fig. S1.1–S1.41. In nearly all cases, the overlay of the model and the map was very good, supporting the interpretation that the shifted wobble is a unique molecular model to fit the electron density map. Next, for each of the 41 non-redundant examples, we plotted the raw map-model correlation coefficients as a function of residue index for all the residues in the chain, with an example provided in Fig. 5B and all 41 examples provided in Supplementary Fig. S1.1–S1.41. These plots revealed that the G and U of the shifted G•U wobble are generally in a well-defined region of the structure. Next, we plotted the distribution of each of the raw map-model correlation coefficients, with an example provided in Fig. 5C and all 41 examples provided in Supplementary Fig. S1.1–S1.41. Inspection of these examples confirmed that the G and U of the shifted G•U wobble have similar correlation coefficients as the rest of their chain. To visualize the correlation coefficient data for all 41 shifted wobbles at once, we prepared box plot distributions of the 41 mean correlation coefficients of all nucleotides within chains containing the 82 G and U residues as well as of the 41 G residues alone and 41 U residues alone, both as raw and mean-normalized values (Fig. 5D). The latter plots reveal that nearly all G and U residues have correlation coefficients similar to those of the rest of their chain. Three outlier G•Us were identified and are denoted in Supplementary Table S3. 
These were included in our analyses but did not make it into the clusters described below. In total, these data provide strong evidence that the electron density maps define a shifted G•U model with confidence.
Figure 5.
Map-to-model analyses for shifted G•U wobbles. (A) Sample model fitting to electron density map for U677 and G713 in 8B0X (chain ID: A). (B) Sample map-model correlation coefficient (cc) plot for the shifted G•U wobble from panel A. (C) Sample map-model cc distribution using the plot in panel (B). For panels B and C, the values for the G and U of the shifted wobble are denoted with green and umber symbols, respectively, and the overlapping mean and median are denoted with a single black dashed line. Panels (A–C) for each of the 41 non-redundant shifted G•U wobbles are provided in Supplementary Fig. S1. (D) Distribution of map-model cc for all 41 shifted G•U wobbles. Column 3: Distribution of mean of map-model cc of all nucleotides in the G•U wobble-containing chains. Columns 2 and 4: Distribution of raw map-model cc for the 41 G (Column 2) and the 41 U (Column 4) residues of the shifted wobble. Columns 1 and 5: Distribution of corrected map-model cc for the 41 G (Column 1) and the 41 U (Column 5) of the shifted wobble in which the raw value (Columns 2 or 4) is divided by its mean value (Column 3). Gs are green and Us are umber.
Finally, to further assess the assignments of the 41 shifted G•U wobbles, we measured the distances and angles of the two hydrogen bonds in each wobble after adding a proton to the donating atom of each hydrogen bond. This approach has been shown to provide an additional evaluation of hydrogen bond quality [48]. As shown in Supplementary Fig. S2, 38 of the 41 wobbles have hydrogen bonds with high-quality distances and angles, supporting their assignments as shifted G•U wobbles. Three outliers were again detected and are the same as those noted above.
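The geometric quantities used in this check can be computed from three coordinates once the proton has been placed. The sketch below is only an illustration: the function name `hbond_geometry` and the coordinates are hypothetical, and no quality thresholds are implied because the criteria of ref. [48] are not restated here.

```python
import math

def hbond_geometry(donor, hydrogen, acceptor):
    """Return (H...A distance in angstroms, D-H...A angle in degrees)."""
    distance = math.dist(hydrogen, acceptor)
    # Vectors from the hydrogen to the donor and to the acceptor
    to_donor = [d - h for d, h in zip(donor, hydrogen)]
    to_acceptor = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(to_donor, to_acceptor))
    cos_angle = dot / (math.hypot(*to_donor) * math.hypot(*to_acceptor))
    # Clamp to guard against rounding slightly outside [-1, 1]
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return distance, angle

# Hypothetical, perfectly linear hydrogen bond along the x-axis
distance, angle = hbond_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
```

A linear arrangement gives an angle of 180°, and departures from linearity lower the angle.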
Sequences and secondary structures containing the shifted G•U wobble pairs
We were curious if there might be certain sequence and structural contexts that drive formation of the shifted wobble. To begin, we divided the 41 non-redundant shifted wobbles into G•U and U•G subclasses based upon which of the two residues, G or U, appeared first in the sequence (Fig. 6A). We found that there was a near even split of the shifted wobbles between the two subclasses, with 22 members in the G•U subclass and 19 members in the U•G subclass (Fig. 6B and C).
Figure 6.
Patterns in sequences flanking the shifted wobbles. (A) Definition of positions of sequence pairs (defined as pairs of nucleobases opposite each other) relative to the shifted G•U wobble (pink circles, filled) up to two steps above or below. The shifted wobble is defined as position 0, and positions below it are negative while those above it are positive. Panels (B) and (C) provide all 16 sequence pairs (y-axis) for all four positions flanking the shifted wobble, numbered as in panel A (x-axis), for the U•G and G•U subclasses, respectively. The U•G and G•U subclasses were assigned on the basis of whether the first residue in the sequence was the shifted U (=U•G subclass) or G (=G•U subclass). Line thickness scales with number of candidates and is consistent across the two panels. The orange field (filled) designates those sequence pairs with WCF complementarity, the majority of which form canonical WCF pairs (Supplementary Table S3).
We then inspected the sequences flanking the shifted wobble. The frequencies of sequence pairs (defined as bases opposite each other but not necessarily base paired) for all 16 dinucleotide combinations are provided in Fig. 6 for the four positions of −2, −1, +1, and +2, relative to the shifted wobble defined as position 0. We found that the vast majority (20 of the 22 members of the G•U subclass and 17 of the 19 members of the U•G subclass) contained at least one canonical WCF base pair as a nearest neighbor (Fig. 6B, C, orange field). For the 20 wobbles in the G•U subclass, canonical WCF base pairs were prevalent at both the −1 and +1 flanking positions, at 15 and 7 instances, respectively. Among these 20 structures, two contained canonical WCF base pairs on both sides of the shifted wobble. Turning to the 17 wobbles in the U•G subclass, canonical WCF base pairs were found predominantly at the +1 flanking position. In particular, the frequencies of canonical WCF base pairs at the −1 and +1 flanking positions were 1 and 16 instances, respectively. None of the U•G subclass structures had canonical WCF base pairs on both sides of the shifted wobble. The most frequent canonical WCF base pair at the −1 flanking position of the G•U subclass was GC, while UA was the most frequent canonical WCF base pair at the +1 flanking position of the U•G subclass. The molecular basis behind these sequence and structural trends was investigated next.
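The subclass assignment and the tally of flanking pairs can be sketched as below. This is a hedged illustration: the record layout, the function names, and the three example records are hypothetical, and the check is for WCF complementarity in sequence only (the orange field of Fig. 6), which does not by itself establish that a canonical pair forms in the structure.

```python
from collections import Counter

WCF_COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def subclass(first_residue):
    """G•U subclass if the shifted G comes first in the sequence, else U•G."""
    return "G•U" if first_residue == "G" else "U•G"

def wcf_neighbor_counts(wobbles):
    """Count WCF-complementary sequence pairs at positions -1 and +1 per subclass."""
    counts = Counter()
    for wobble in wobbles:
        for position in (-1, +1):
            if wobble["flank"][position] in WCF_COMPLEMENTARY:
                counts[(subclass(wobble["first"]), position)] += 1
    return counts

# Hypothetical records (not taken from Supplementary Table S3)
wobbles = [
    {"first": "G", "flank": {-1: ("G", "C"), +1: ("C", "A")}},
    {"first": "U", "flank": {-1: ("A", "G"), +1: ("U", "A")}},
    {"first": "U", "flank": {-1: ("G", "U"), +1: ("C", "G")}},
]
counts = wcf_neighbor_counts(wobbles)
```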
Clustering of the shifted G•U wobble pairs: secondary structures, consensus sequences, and interactions
We were interested in whether the 41 non-redundant shifted G•U wobble-containing sequences shared structural similarities. To address this, we constructed alignments of 3D structures, identified structural clusters, and converted these to consensus sequences and secondary structures. To start, we conducted an all-against-all structural alignment for these sequences (Supplementary Table S5) to sort them into clusters (Fig. 7). Structural motifs used for clustering contained the −1, 0, and +1 sequence pairs as defined in Fig. 6A and no other sequence. Briefly, we generated a distance tree by first coarse graining each of the six nucleotides in a structural motif and then applying hierarchical clustering to all 41 structures using the SciPy library [39] (see Materials and methods). We then applied an RMSD cut-off of 1.23 Å, which resulted in four clusters, each having between four and nine members that have the same rRNA type and similar residue indices (Fig. 7). Additionally, 1.23 Å is close to the length of a C=O bond, which serves as a hydrogen bond acceptor in both standard and shifted G•U wobbles, thereby facilitating better structural clustering.
Figure 7.
Distance tree showing the clustering of the non-redundant shifted G•U wobbles. For each of the 41 shifted wobbles (pink, right-hand side), we included the −1 and +1 sequence pairs (green, right hand side) in the RMSD calculation. Structural overlays (right-hand side) show similarities within each cluster. The distances were calculated from the all-against-all structure alignment (see Materials and methods). The vertical dashed line at 1.23 Å represents the upper limit in RMSD for sorting structures into the same cluster. Clusters were required to have at least four members, which resulted in clusters 1–4, with 5, 9, 4, and 6 members, respectively. The gray data remained unclustered at the 1.23 Å cutoff. The identity of each species is provided in Supplementary Table S3.
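The cut-off and minimum-size rules in this caption can be illustrated with a small dependency-free sketch. The authors used SciPy; the version below is a stand-in that assumes average linkage (the linkage method is not stated in this section), and the function name and the toy distance matrix are hypothetical.

```python
def cluster_by_cutoff(dist, cutoff=1.23, min_size=4):
    """Average-linkage agglomerative clustering of a symmetric distance matrix.

    Merging stops once the closest pair of clusters is farther apart than
    `cutoff`; clusters with fewer than `min_size` members are discarded.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = min(
            (average_distance(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        d, x, y = best
        if d > cutoff:
            break
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return [sorted(c) for c in clusters if len(c) >= min_size]

# Toy RMSD matrix (hypothetical): two tight groups of four and one singleton
groups = [0, 0, 0, 0, 1, 1, 1, 1, 2]
dist = [
    [0.0 if i == j else (0.5 if groups[i] == groups[j] else 3.0) for j in range(9)]
    for i in range(9)
]
kept = cluster_by_cutoff(dist)
```

In this toy input the singleton stays unclustered, analogous to the gray entries in the tree.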
The first cluster was in the U•G subclass of shifted wobbles and had five closely related members from bacterial 16S rRNA, with an average distance between members of 0.45 Å (Fig. 7, top). These were all at the equivalent position in 16S rRNA, with E. coli numbering of U1086 and G1099 (Fig. 8A). (Note that we underline the G and U of the shifted wobbles to help with orientation.) Next, we conducted a multiple sequence alignment of these five members using pyMSAviz (see Materials and methods). We included nucleotides starting three residues before the first residue of the shifted wobble and ending three residues after its second residue (Fig. 8A). Thus, for cluster 1, 20 residues from each of the five sequences were aligned. This resulted in a near perfect alignment, with the only variation being at residue index +18 (Fig. 8A, left). Next, for each of the five members, we retrieved the 3D structure of the 20 residues from the respective PDB files and obtained the underlying secondary structures for each of the five files in dot-bracket notation using DSSR [34] (Fig. 8A, right). In all five structures, the shifted U•G wobble was found at the base of a 4 base pair stem having three GC base pairs and enclosing a UUAAGU hexaloop. The three nucleotides before and after the shifted U•G wobble were largely unpaired in the crystal structures.
Figure 8.
Consensus sequences and secondary structures for the four clusters for the non-redundant shifted G•U wobbles. Clusters 1–4 are provided in panels A–D, respectively. Sequence alignments, secondary structure representations, and consensus sequences were prepared as described in the Materials and methods. The identity of each species is provided as a three-letter code to the left of each alignment and is also found in Supplementary Table S3. E. coli numbering for 16S and 23S rRNA, as appropriate, is provided in red below the shifted G and U in each alignment. Sequences were aligned from three residues before to three residues after the G•U. Position of sequence pairs within each cluster are provided in parentheses next to each secondary structure. Pairing is shown both above the alignment and in the secondary structure to the right of the alignment. As needed, stems were lengthened to accommodate additional base pairs found below the −3 sequence pairs, but these were not used in the alignment and so are not numbered. For the consensus sequence, the IUPAC code was used, where K = G or U; M = A or C; S = G or C; V = A, C, or G; W = A or U; and Y = C or U, and interactions of base pairs follow the convention of Leontis and Westhof [41].
To understand what interactions might drive formation of the shifted wobble in cluster 1, we analyzed the interactions of the heteroatoms between U4 and G17 using an interaction cutoff of 3.4 Å (Fig. 9A). Briefly, this analysis considered pairwise interactions between an atom in the major groove, minor groove, and sugar atoms of the shifted wobble and an atom in the RNA chain, amino acids, metal ions, and water. We provide a stereoview of the minor groove and a stereoview of the major groove of each cluster in Supplementary Fig. S3 to aid viewing. Beginning in the major groove of the shifted U•G wobble of cluster 1, the most frequent (4 of the 5 structures) interaction was a major groove-major groove intrastrand stacking interaction between the oppositely charged G17(O6) and C16(N4). Also present in the major groove was an interaction (2 or 3 of the 5 structures) between the major groove atoms of G17(O6)/U4(O4) and a Mg2+ ion, which is also held in place by ligands from C18(N3) and the phosphate of U3. Note that we report the identity of the ion as modeled by the authors. The metal ion interaction suggests that this shifted G•U wobble might be in the anionic form (Fig. 1) (see Discussion). In the minor groove, only one interaction was present (2 of 5 structures), which was a minor groove-minor groove interstrand stacking interaction between G17(N3) and G5(N1). Moving to the sugar region of the shifted U•G wobble, there was a hydrogen bond between G17(O4′) and C16(O2′). Looking at the “None” interaction column, it is notable that the G17 base has many interactions, albeit not with its N7, while the U4 base has none. We also see that the O6 of G17 is always engaged in some interaction. Finally, we note that two of the five members in cluster 1 have multiple entries in the PDB. 
Specifically, 16S rRNA from Escherichia coli has three entries with a cluster 1 shifted G•U wobble and Thermus thermophilus has eight entries with this shifted wobble, while neither has an entry with a standard wobble at this position (Table 1 and Supplementary Table S3), supporting the shifted wobble.
Figure 9.
Interactions within each of the four clusters for the non-redundant shifted wobbles. Clusters 1–4 are provided in panels A–D, respectively. For each of the 41 shifted wobbles (pink), we included the −1 and +1 sequence pairs (both in green), which were aligned as in Fig. 6. In each panel, an interaction diagram and a representative structure are provided on the left and right, respectively; an overlay of all structures in a cluster is provided in Fig. 7. For each interaction diagram, atoms from the shifted G and U are listed on the y-axis in the order: major groove (M), minor groove (m), sugar (s), while interacting atoms within 3.4 Å from the RNA chain, amino acids, metal ions, or water are provided on the x-axis, with the bases in alphabetical order nested by numerical order. For the representative structure, the base pairs are shown with dashed lines in the color of the respective bases, and vertical interactions are shown as solid gray lines. For clusters 1 and 3, we include metal ions that were frequent. For panels B and D, nucleosides and amino acids making long-range interactions are shown as yellow sticks. For each structure, the view has the same orientation as in Fig. 7, which leaves the minor groove in the foreground. Stereoviews of the minor and major groove of each structure are provided in Supplementary Fig. S3.
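The 3.4 Å criterion used for the interaction diagrams amounts to a pairwise distance filter between wobble heteroatoms and surrounding atoms. A minimal sketch follows, with a hypothetical function name and made-up coordinates chosen only to show one contact inside and one outside the cutoff.

```python
import math

def contacts(wobble_atoms, partner_atoms, cutoff=3.4):
    """List (wobble atom, partner atom, distance) for pairs within `cutoff` angstroms."""
    found = []
    for name_w, xyz_w in wobble_atoms.items():
        for name_p, xyz_p in partner_atoms.items():
            d = math.dist(xyz_w, xyz_p)
            if d <= cutoff:
                found.append((name_w, name_p, round(d, 2)))
    return found

# Hypothetical coordinates in angstroms (not taken from any PDB entry)
wobble_atoms = {"G17(O6)": (0.0, 0.0, 0.0)}
partner_atoms = {"C16(N4)": (3.0, 0.0, 1.0), "MG": (5.0, 0.0, 0.0)}
found = contacts(wobble_atoms, partner_atoms)
```

An atom of the wobble with no entry in the returned list would fall in the "None" column of the diagram.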
The second cluster identified was also in the U•G shifted wobble subclass and had nine members with an average distance of 0.51 Å between them (Fig. 7, second set). Eight of the nine members were from bacterial 16S rRNA and were very closely related in structure, while the remaining member was more distantly related in structure and was from eukaryotic 18S rRNA (PDB ID: 7QEP). All entries, including the eukaryotic one, were found at the equivalent position in 16S rRNA, with E. coli numbering of U677 and G713 (Fig. 8B). The multiple sequence alignment of the 43 residues from these nine members, beginning three residues before the U•G and ending three after it, resulted in a strong consensus sequence, especially near the U•G wobble, of a UA pair above (8 of 9 sequences plus 1 AU pair) and an AG pair below (9 of 9 sequences) (Fig. 8B, left). Again, we retrieved the 3D structures, here of 45 residues (one more residue at each end to capture the full length of the stem below the shifted U•G wobble), for all nine members of cluster 2 and derived the secondary structures to identify the consensus secondary structure (Fig. 8B, right). In all nine structures, the shifted U•G wobble was located within a 14 base pair stem that included a large (17 nt) structured loop. Cluster 2 has in common with cluster 1 a shifted U•G wobble at the base of a stem, but the two clusters differ in the nature of the bases above and below the wobble (Fig. 8A and B).
Again, we considered the interactions of the shifted U•G wobble, here between U4 and G40 of cluster 2 (Fig. 9B). First, we note that the UA above the U•G forms a standard WCF pair. Strikingly, the A and G of sequence pair −1 form a two-hydrogen bond cWW base pair [41], as do the AA and GA at sequence pairs −2 and −3, respectively (Figs 8B and 9B). In the major groove of the shifted U•G wobble, frequent stacking interactions were observed with the adjacent base pairs. Specifically, the shifted U•G wobble stacks on the UA base pair above it: on the left strand, an intrastrand stacking interaction occurs between U4(O4) and U5(O4) in 8 out of 9 structures, while on the right strand, an intrastrand stacking was identified between G40(O6) and A39(N6) in 7 out of 9 structures. Additionally, extensive stacking interactions were observed with the AG base pair below the wobble: on the left strand, stacking was noted between U4(O4) and A3(N6) in 6 out of 9 structures, and on the right strand, between G40(O6) and G41(O6) in 4 out of 9 structures. These intrastrand stacking interactions with residues both above and below the shifted wobble contribute significantly to stabilizing the structure. In the minor groove, there was one dominant interaction (9 of 9 structures) of a hydrogen bond involving U4(O2) and ALR(O2′), where “LR” stands for long-range, which is for any residues outside the aligned region. The U4(O2) also interacted with G41(N2) (5 of 9 structures). These U4(O2) interactions are likely important for holding the U of the U•G wobble in the minor groove (see Discussion). In the sugar region, there were two dominant interactions: one was a hydrogen bond between G40(O2′) and G41(O4′) (9 of 9 structures), and the other is a hydrogen bond between G40(O4′) and A39(O2′) (6 of 9 structures). There was also a sugar-sugar interaction of U4(O4′) and A3(O2′) (3 of 9 structures). 
This extensive intrastrand involvement of O2′-to-O4′ sugar hydrogen bonds along both strands is particularly notable, especially for the O2′ and O4′ of G40, which interact with the O4′ and O2′, respectively, of the sugars below and above the shifted wobble. This network of ribose interactions resembles a ribose zipper, which conventionally holds two different strands together using 2′OH interstrand interactions [49], but it differs in that it also involves the O4′ and in that all the interactions are intrastrand. Looking at the “None” column, it is notable that G40(N7) of the shifted wobble again has no interactions, while G40(O6) is still strongly engaged. Unique to cluster 2 is the heavy engagement of the U of the shifted wobble in interactions, including its major groove resident O4 and minor groove resident O2. Notably, there is no high occupancy metal ion present in these structures. Finally, we note that there are a number of other cluster 2 16S rRNA entries in the full dataset (Table 1 and Supplementary Table S3). These include E. coli (47 entries) and T. thermophilus (137 entries); for the other members in this cluster, there were three or fewer structures available for each in the redundant dataset. Across the nine members of this cluster, which together have nearly 200 shifted G•U wobble entries, only one entry formed a standard G•U wobble (Table 1), strongly supporting the shifted wobble.
Whereas the first two identified clusters were from the shifted U•G subclass of wobbles, the third cluster was from the shifted G•U subclass of wobbles and had four members with an average distance of 0.50 Å between them (Fig. 7, third set). These members were all from 23S rRNA rather than 16S rRNA, with three of the four being from bacteria and the other from a eukaryotic chloroplast (PDB ID: 5MMI). All entries, including the chloroplast one, were found at the equivalent position in 23S rRNA, with E. coli numbering of G382 and U392 (Fig. 8C). The multiple sequence alignment of all 17 residues from the four members of cluster 3, again beginning three residues before the G•U wobble and ending three after it, resulted in a strong consensus sequence near the G•U of a CA pair above (4 of 4 sequences) and a GC pair below (3 of 4 sequences and 1 AU) (Fig. 8C, left). We then retrieved the 3D structures of the 29 residues (six more residues at each end to capture the full length of the stem below the shifted G•U wobble) for all four members of cluster 3 and derived the secondary structures, with the consensus stem-loop secondary structure shown in Fig. 8C, right. In three out of four structures, the shifted G•U wobble was found within a stem of 11 base pairs, with one base pair above and nine base pairs below the shifted G•U wobble, and the wobble closed a loop of 9 nt with the sequence CMCGUGGAA (M = A or C).
Next, we considered the interactions of the shifted G•U wobble between G4 and U14 of cluster 3 (Fig. 9C). First, we note that the GC of sequence pair −1 from below the G•U forms a standard WCF pair. Interestingly, the C and A of sequence pair +1 from above the G•U forms a two-hydrogen bond trans-WCF-Hoogsteen (tWH) pair [41] in which A13(N6) donates one of its protons to the C5(N3) and the other to the C5(O2), which differs from the more conventional AC wobble in which A(N1) has to be protonated [22]. In the major groove of the shifted G•U wobble, there were limited RNA–RNA interactions, with only major groove-major groove intrastrand stacking between complementary charges on the adjacent U14(O4) and C15(N4) (2 of 4 structures) having more than one occurrence. However, like cluster 1, the major groove did host Mg2+, in this case two metal ions, with one metal ion interacting with G4(O6) and the other metal ion interacting with G4(N7). This suggests that, like in cluster 1, this shifted G•U wobble might be in the anionic form (see Discussion). In the minor groove, there was only one dominant RNA-RNA interaction (4 of 4 structures), which involved a minor groove-minor groove intrastrand stack of complementary charges on G4(N2) and C5(O2). Notably, this minor groove intrastrand stack was in the left strand, while the major groove intrastrand stack was in the right strand, providing support for both strands of the structure. In the sugar region, there were no notable interactions. It can be noted that cluster 3 is like cluster 1 in that both are dominated by major groove interactions with a metal ion and both have relatively few minor groove and sugar interactions. Indeed rotation of cluster 1 about its axis coming out of the page reveals similarities, although the lower base pair is a GC in cluster 3 but a CG in cluster 1. 
Finally, there were not many additional cluster 3 23S rRNA entries in the full dataset, except for the shifted wobble in 23S rRNA of Mycobacterium tuberculosis, which has two structures in the redundant dataset (Table 1 and Supplementary Table S3). Nonetheless, there were no examples of a standard G•U wobble formed by the corresponding residues, supporting the shifted wobble (Table 1).
The fourth and final cluster was also in the G•U shifted wobble subclass and had six members, which were related by an average distance of 1.0 Å (Fig. 7, bottom). The six members were all from bacteria and were found at the equivalent position in 23S rRNA, with E. coli numbering of G2304 and U2312 (Fig. 8D). The multiple sequence alignment of all 15 residues from the six members of cluster 4, starting three residues before the G•U wobble and ending three residues after it, resulted in a strong consensus sequence near the G•U consisting of a WA sequence above and a GC pair below (6 of 6 sequences) the G•U (Fig. 8D, left). Upon retrieving the 3D structures of the 21 residues (three more residues at each end to capture the full length of the stem below the shifted G•U wobble) for all six members of cluster 4 and deriving the secondary structures, we obtained a consensus stem-loop (Fig. 8D, right). In all cases, the shifted G•U wobble was found atop a stem having six base pairs below, with a standard GC pair neighboring the G•U, and a 7 nt loop with the sequence WYGGAAA (W = A or U, and Y = C or U).
Next, we investigated the interactions of the shifted G•U wobble between G4 and U12 of cluster 4 (Fig. 9D). As with the shifted G•U wobble from cluster 3, the GC of sequence pair −1 forms a standard WCF pair, but unlike cluster 3, there is no base pairing above the wobble. Very few interactions occur in the major groove. The only one of some note is a major groove-major groove intrastrand stacking interaction between G4(O6) and G3(O6) (2 of 6 structures). A metal ion was found interacting with major groove atoms of the wobble, G4(O6) and U12(O4), but it was present in only one structure and so is not shown. The minor groove has more interactions and in this sense is somewhat reminiscent of cluster 2; moreover, in both clusters 2 and 4, most of the minor groove interactions are to species distal in sequence. In cluster 2, the dominant minor groove interaction was a hydrogen bond with the O2′ of a long-range A, while in cluster 4 all the minor groove interactions are to amino acids from UL5, a universally conserved protein from the large subunit [50]. In particular, there is an aspartate that interacts with the N2 (4 of 6 structures) and N3 (3 of 6 structures) of G4, as well as an asparagine that interacts with U12(O2). In the sugar region, there is a multitude of interactions, again reminiscent of cluster 2. The dominant interaction (6 of 6 structures) is a hydrogen bond between U12(O4′) and A11(O2′) that is akin to the hydrogen bond between U4(O4′) and A3(O2′) in cluster 2. The other interactions in the sugar region are again with amino acids, with the two dominant interactions involving G4(O2′) and the same aspartate that interacted with the N2 and N3 of G4, and U12(O2′) and a threonine. Notably, U12 engages its O2′ and O4′ in 5 of 6 and 6 of 6 cases, respectively, presumably helping to anchor the U of the wobble in the minor groove as is characteristic of the shifted G•U wobble (see Discussion).
We note that there are multiple other cluster 4 23S rRNA entries in the full dataset (Table 1 and Supplementary Table S3). In particular, there are 36 entries from E. coli and 80 from T. thermophilus. For the other members in this cluster, there were three or fewer structures available for each in the redundant dataset. Among the more than 100 entries in this cluster, only one formed the standard G•U wobble (Table 1), strongly supporting the shifted wobble.
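The alignment lengths and wobble indices quoted for the four clusters follow directly from the E. coli numbering and the three-residue flanks. The short check below is a sketch that assumes no insertions or deletions relative to E. coli numbering within the aligned window; the function names are hypothetical.

```python
def aligned_length(first, second, flank=3):
    """Residues from `flank` before the first wobble residue to `flank` after the second."""
    return (second - first + 1) + 2 * flank

def wobble_indices(first, second, flank=3):
    """1-based alignment indices of the two wobble residues."""
    return flank + 1, flank + 1 + (second - first)

# E. coli numbering of the shifted wobble residues, as given in the text
clusters = {1: (1086, 1099), 2: (677, 713), 3: (382, 392), 4: (2304, 2312)}

lengths = {k: aligned_length(*v) for k, v in clusters.items()}
indices = {k: wobble_indices(*v) for k, v in clusters.items()}
```

The resulting lengths (20, 43, 17, and 15 residues) and index pairs (4 and 17, 4 and 40, 4 and 14, 4 and 12) agree with the values used in the text and in Figs 8 and 9.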
Experimental support for the shifted G•U wobble pairs
In an effort to test whether the shifted G•U wobble pairs identified herein by cheminformatics are deprotonated in solution, we examined the DMS reactivity of ribosomes from three different organisms treated in vivo with DMS and read out using mutational profiling. Typically, DMS reacts with the WCF face of A and C, which is deprotonated and can act as a nucleophile to attack a methyl group on DMS [51, 52]. Conventionally, DMS should not react with G or U, as the imino nitrogen atoms on their WCF face are protected by protonation. However, if the U adopts either the anionic or enolic alternative protonation form (Fig. 1B), it becomes deprotonated on the WCF face and has the potential to attack DMS, as confirmed by in vitro DMS experiments at elevated pH [53]. We note that reactivity with a shifted G•U wobble may necessitate opening of the base pair, followed by rapid reaction with DMS.
Here we analyze published rRNA datasets for the reactivity of DMS with G and U in E. coli [54] and in S. cerevisiae and H. sapiens [55]. Because these three in vivo datasets come from two different studies, it was important to ensure that the data are comparable. We therefore calculated the area under the curve (AUC) in receiver operating characteristic (ROC) curves for all three datasets for two cases: (i) all residues in rRNAs (5S, 16S, and 23S; uses E. coli S values) and (ii) those residues in the highly conserved peptidyl transferase center (PTC) (Supplementary Fig. S4). The AUC values for A residues were highly similar among all three organisms for both all rRNA residues and the PTC residues; the same was largely true for C residues, although E. coli performs slightly better. This analysis indicates that the DMS data report on base pairing with similar accuracy across the three datasets. We also note that the experimental conditions for DMS reactivity were similar across the three in vivo datasets: E. coli reacted for 5 min at 37°C, S. cerevisiae reacted for 4 min at 30°C, and H. sapiens reacted for 4–5 min at 37°C.
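The ROC AUC used for this comparison can be computed without plotting the curve, because it equals the probability that a randomly chosen unpaired residue is more reactive than a randomly chosen paired one. The sketch below assumes that residues are labeled as unpaired or paired from a reference structure; the function name and the example values are hypothetical and do not come from the cited datasets.

```python
def roc_auc(reactivities, unpaired):
    """ROC AUC for DMS reactivity as a classifier of unpaired residues.

    Computed as the fraction of (unpaired, paired) residue pairs in which the
    unpaired residue has the higher reactivity; ties count as one half.
    """
    positives = [r for r, u in zip(reactivities, unpaired) if u]
    negatives = [r for r, u in zip(reactivities, unpaired) if not u]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical reactivities for six A residues and their pairing status
reactivities = [0.9, 0.8, 0.1, 0.3, 0.05, 0.4]
unpaired = [True, True, False, True, False, False]
auc = roc_auc(reactivities, unpaired)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so similar AUC values across datasets indicate similar performance.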
Our analysis of the rRNA datasets revealed significant DMS reactivity for three out of the six shifted G•U wobbles with available reactivity data; in all three cases, it was the U of the shifted G•U wobble that reacted (Supplementary Fig. S5; reactivity data are found in Supplementary Tables S6–S8). In E. coli, we identified three shifted G•U wobbles through our cheminformatic approach (three rows in Table 1), and one of these was highly reactive with DMS in vivo (Fig. 10 and Supplementary Fig. S5). This wobble, consisting of U677 and G713 in 16S rRNA, was identified as being in Cluster 2 (Figs 7 and 8B). Supplementary Fig. S5 shows the reactivity of U677 in the context of all U and G reactivities in E. coli rRNA. When compared to average DMS reactivities for all Us, U677 had higher DMS reactivity with a p-value < 0.0005 (Supplementary Table S2). Reactivities of other Us and Gs in shifted wobbles identified in E. coli did not exhibit significantly increased values over the average (Supplementary Fig. S5A). Nonetheless, although not significant, U1086 and G1099 in 16S rRNA of E. coli had moderate reactivity with DMS (Supplementary Figs S5A and S6A), suggestive of either an anionic U or a U-enol tautomer, possibly with deprotonation of G(N1) by U(O4), which would itself be externally deprotonated. Overall, the significant DMS reactivity of U677 of a shifted G•U wobble strongly supports this U being either anionic or enolic (Fig. 1B).
Figure 10.
In vivo reactivity of DMS-treated rRNAs with shifted G•U wobbles. (A–C) Secondary structures illustrating the reactivity of DMS in the (A) 668–738 region of E. coli 16S rRNA, highlighting the high reactivity of U677 in a shifted G•U wobble, (B) 438–499 region of S. cerevisiae 25S rRNA, highlighting the high reactivity of U441 in a shifted G•U wobble, and (C) 2872–2955 region of S. cerevisiae 25S rRNA, highlighting the high reactivity of U2875 in a shifted G•U wobble. Additional examples are found in Supplementary Fig. S6.
Next, we assessed the in vivo DMS reactivity of shifted G•U wobbles in the other two organisms, yeast and human. We identified two shifted G•U wobbles in S. cerevisiae 25S rRNA (two rows in Table 1). Strikingly, both had significant reactivity of the U (Fig. 10B, C), with p-values of 0.0018 for U441•G493 and 0.015 for U2875•G2952 (Supplementary Fig. S5, Supplementary Table S2). As with U677•G713 in E. coli, both U441•G493 and U2875•G2952 are terminal U•G wobbles. However, unlike U677•G713 of E. coli, neither of these pairs in S. cerevisiae is part of a structural cluster (Fig. 7), and the base pair between U2875 and G2952 observed in crystal structures is not reported by covariance analysis [56]. Thus, while the high reactivity of E. coli U677•G713 and its consistent structure within bacterial ribosomes suggest that the deprotonated U is a conserved structural feature of bacterial ribosomes, the same is unclear for eukaryotes, although this may be due to the dearth of eukaryotic ribosomes represented in the PDB. Finally, the shifted G•U wobble in humans between G505 and U653 did not show significant reactivity with DMS (Supplementary Figs S5 and S6).
In summary, there was significant in vivo DMS reactivity for three of the six shifted G•U wobbles identified herein that have available genome-wide DMS chemical probing datasets, and the reactivity was always with the U of the G•U wobble. This observation supports either the anionic or the enolic form of the U (Fig. 1). The three shifted G•U wobbles that did not react significantly with DMS may have more limited opening of the base pair, which is consistent with their being adjacent to stable stems (compare Fig. 10 and Supplementary Fig. S6). We note that we tested for a correlation between DMS reactivity and solvent accessible surface area (SASA) but did not find one (Supplementary Fig. S7). Overall, these data provide in vivo experimental support for the shifted G•U wobbles identified herein by computational means.
Discussion
In this study, we provided a new approach for discovering RNAs with unconventional protonation and tautomeric states. Briefly, our approach is to draw the Lewis structure of a base pair intentionally having hydrogen bonding clashes, with the notion that the clashes can be resolved either by ionizing a base or by shifting a proton from one atom on a base to another atom on the same base as a tautomer, effectively swapping hydrogen bond donating and accepting roles. These changes can resolve one or even two hydrogen bonding clashes, for example if there are adjacent DD and AA clashes, where “D” is hydrogen bond donor and “A” is hydrogen bond acceptor. We then conduct 3D searches for the heteroatoms of such novel base pairs in the PDB, which is rife with base pairs; for instance, herein, we examined nearly 160 000 GU pairs. We then analyze the shifted wobbles to assure that they have good map-to-model coefficients both internally and relative to the rest of the RNA chain, as well as favorable hydrogen bonding distances and angles, and extract their structural features. We then assign them to secondary structure motifs and identify structural clusters. In so doing we identified novel G•U pairs that require alternative protonations.
The shifted G•U pairs identified herein were found in all three domains of life, including multiple examples from bacteria and eukarya, as well as several from archaea. While more examples were found in bacteria and eukarya than archaea, this may just be a consequence of there being fewer archaeal RNAs that have been studied with high resolution structural techniques. Of the 41 uniquely identified shifted G•U pairs, 39 were found in ribosomal RNAs, which may be attributable to their overrepresentation in the PDB. Detection of shifted G•U pairs in other classes of RNA may await a more extensive 3D structural database for RNA. The shifted G•U wobbles were relatively rare compared to the standard wobbles. We identified 6 636 non-redundant standard G•U wobbles and 41 shifted G•U wobbles, so the shifted wobbles make up about 0.61% of the total. This rarity makes the shifted G•U of potentially greater interest. For instance, it may be better for a therapeutic to target a rare motif rather than a common one, leading to less off-target binding and therefore greater specificity. Enhancing target specificity is a major issue in developing drugs that bind RNA [57, 58]. Some of the shifted G•U motifs, such as Clusters 1 and 4, appear to be specific to bacteria, which might allow targeting without binding to RNAs of the human host, either through directly targeting the shifted G•U or by using that G•U as a secondary binding site for two-domain drug binders [59].
While we identified 41 shifted G•U wobbles in the non-redundant dataset, there were many more such wobbles in the full dataset, 377 in total. As shown in Table 1, in each organism, we separated the G•U wobbles into two classes, shifted and standard, as judged by the hydrogen bonding scheme in Fig. 4. Notably, in the full dataset, 373 of these examples belonged to the shifted G•U wobble class, while only 4 were in the standard G•U wobble class. This suggests that the alternative protonation state(s) predominate over the standard one, although other states or hybrid states outside of the shifted and standard G•U wobbles can also occur (Table 1). Other studies suggest that the anionic and enolic protonation states of DNA and RNA have very low populations [60, 61], although those studies did not examine the fully shifted wobbles described herein, in which the G resides in the major groove. Whether the alternative protonation state is anionic or enolic (Fig. 1) cannot be determined from the studies herein and awaits NMR characterization. Unfortunately, reactivity of Us with DMS (Fig. 9) does not reveal which of the alternative protonation states is populated because the N3 of U is deprotonated, and therefore nucleophilic, in both the anionic and enolic states (Fig. 1). It is possible that both anionic and enolic forms are populated and that the former is favored at elevated pH. We recently described a new reagent called ETC, which reacts with U and G in their normal, imino N-protonated forms [54]. Adoption of the deprotonated forms of U and G would lead to a loss of reactivity; as such, ETC reactivity was not pursued herein.
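The proportions quoted in the two preceding paragraphs can be checked with a few lines of arithmetic. The counts are taken from the text; the choice of denominator for the 0.61% figure is an assumption, since the text does not state whether the percentage is relative to all non-redundant wobbles or to the standard wobbles alone, so both are computed.

```python
# Counts quoted in the text.
SHIFTED_NONREDUNDANT = 41
STANDARD_NONREDUNDANT = 6636
FULL_SHIFTED = 373
FULL_STANDARD = 4


def percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Assumption: share of shifted wobbles among all non-redundant wobbles.
share_of_all = percent(SHIFTED_NONREDUNDANT,
                       SHIFTED_NONREDUNDANT + STANDARD_NONREDUNDANT)

# Alternative reading: shifted wobbles relative to standard wobbles only.
ratio_to_standard = percent(SHIFTED_NONREDUNDANT, STANDARD_NONREDUNDANT)

# Full dataset: fraction of the 377 examples classified as shifted.
full_shifted_share = percent(FULL_SHIFTED, FULL_SHIFTED + FULL_STANDARD)

print(f"shifted / (shifted + standard): {share_of_all:.3f}%")
print(f"shifted / standard:             {ratio_to_standard:.3f}%")
print(f"full dataset, shifted class:    {full_shifted_share:.1f}%")
```

Under the first reading the value rounds to the 0.61% given in the text, which suggests, but does not confirm, that the total number of non-redundant wobbles was used as the denominator.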
It is notable that Clusters 1 and 3 both show one or more metal ions interacting with major groove G and U atoms (Fig. 9), which suggests that the G•U may be in the anionic state, in which the enolate of the U(O4) could contribute strongly to metal ion binding (Fig. 1B). Future spectroscopic and theoretical studies will help resolve these questions. We also note that most metal ions were reported as Mg²⁺ ions, although a few were reported as K⁺ ions. It is possible that the identity of the metal ion changes with the geometry of the G•U wobble, which will require future investigation.
Our analysis of the secondary structures of the 41 non-redundant shifted G•U wobbles revealed some prominent patterns. We formally divided the shifted wobbles into G•U and U•G subclasses but found striking symmetry between them. In the G•U subclass, there tended to be a WCF pair at the −1 sequence pair, while in the U•G subclass, there tended to be a WCF pair at the +1 sequence pair, although the G•U subclass had a predominance of GC WCF pairs while the U•G subclass had a predominance of UA WCF pairs (Fig. 6). The symmetry continued in that the +1 sequence pair in the G•U subclass tended to be unpaired, while the −1 sequence pair in the U•G subclass tended to be unpaired. Symmetry extended further to the −2 and +2 positions, which tended to be paired and unpaired, respectively, in the G•U subclass, but unpaired and paired, respectively, in the U•G subclass. The mirror image symmetry between the G•U and U•G subclasses, clear from comparison of the side-by-side panels B and C in Fig. 6, likely arises because one can rotate the structure (Fig. 6A) by 180° about the axis coming out of the page to turn the G•U into a U•G and sequence pairs −1 and −2 into sequence pairs +1 and +2. Overall, one is left with a secondary structure motif in which the G•U sits atop a stem with a loop above it while the U•G sits below a stem with either unpaired or non-canonically paired nucleotides below it, making the symmetry imperfect.
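The 180° rotation argument can be made concrete with a small sketch. The dictionary encoding of the neighboring sequence pairs and the function name `rotate_180` are hypothetical, and the "paired"/"unpaired" labels summarize only the tendencies stated above, not the full distributions in Fig. 6.

```python
from typing import Dict, Tuple


def rotate_180(orientation: str, context: Dict[int, str]) -> Tuple[str, Dict[int, str]]:
    """Rotate a wobble and its neighbors by 180 degrees about the axis out of the page.

    The two strands swap, so "G•U" reads as "U•G", and each neighboring
    sequence pair at offset k ends up at offset -k.
    """
    flipped = orientation[::-1]
    return flipped, {-offset: state for offset, state in context.items()}


# Tendencies described in the text for the G•U subclass.
gu_context = {-2: "paired", -1: "paired", +1: "unpaired", +2: "unpaired"}

orientation, ug_context = rotate_180("G•U", gu_context)
print(orientation)
for offset in sorted(ug_context):
    print(f"{offset:+d}: {ug_context[offset]}")
```

The rotated pattern (unpaired at −2 and −1, paired at +1 and +2) matches the tendencies reported for the U•G subclass. The sketch does not capture the differences in base-pair identity or the non-canonical pairs that make the symmetry imperfect.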
Inspection of Fig. 8, with its secondary structures, supports this model but reveals additional complexity. In the G•U subclass, there can be a complex structure in the loop above that interacts with the G•U; for instance, cluster 3 contains a non-canonical trans-Watson–Crick/Hoogsteen AC base pair at sequence pair +1 that interacts with the G•U wobble. Likewise, in the U•G subclass there can be a complex structure in the stem below that interacts with the U•G; for instance, cluster 2 has a non-canonical cis-Watson–Crick/Watson–Crick AG base pair at sequence pair −1 that interacts with the U•G wobble. In some cases, the pairing neighboring the shifted wobble is relatively simple and strong; for example, clusters 1 and 3 share multiple GC base pairs above and below their respective U•G and G•U wobbles. In other cases, the pairing is more complex and extended; for example, cluster 2 has four non-canonical pairs below it and nine mixed pairs above it, while cluster 4 has six mixed pairs below it.
These motifs raise the question as to what interactions are present in them to stabilize the alternative protonation states of the GU wobbles. Keeping with the theme that clusters 1 and 3 have similar secondary structures, they share key 3D structural features as well. For instance, there is a stabilizing metal ion in the major groove of each structure, although the wobble in cluster 1 has more sugar interactions and the wobble in cluster 3 has more minor groove interactions. Similarly, revisiting the theme that clusters 2 and 4 have similar secondary structures, they share essential 3D structural features too. For instance, there are extensive interactions with the sugars in each structure, with most of the G and U O2′ and O4′ atoms of the cluster members engaged in interactions, although the wobble in cluster 2 satisfies these with intrastrand interactions with neighboring nucleosides while the wobble in cluster 4 satisfies these primarily with amino acids from a ribosomal protein, albeit it does have one dominant intrastrand interaction with a neighboring nucleoside.
We asked what the dominant interactions in the shifted wobble might be that would direct the U towards the minor groove and the G towards the major groove. The structural clustering from the 3D structures reveals broad trends. In cluster 1, there were relatively few interactions in the minor groove and sugar regions and only one major groove interaction, of the G with the 5′-base. However, there were extensive metal ion interactions with both the G and the U. Thus, metal ions can play the dominant role in shifting the wobble. In cluster 2, there were extensive intrastrand major groove interactions of both the G and the U. In the minor groove, interactions were dominated by the U, with a long-range interaction, while in the sugar region the interactions were dominated by the G, with both its O2′ and O4′. Thus, a plethora of interactions of both bases in the major groove, minor groove, and sugar regions can shift the wobble. In cluster 3, the G dominates the major groove, again with extensive metal ion interactions. The minor groove has an intrastrand interaction between the G and the 3′-neighboring U, while there are no sugar interactions. Thus, once again metal ion interactions can play the dominant role in shifting the wobble, although here with just the G. Finally, in cluster 4, there were relatively few major groove interactions, two minor groove interactions, but a plethora of sugar interactions to the G and U. Unusually, these were largely long-range interactions with a ribosomal protein, although there is one dominant intrastrand interaction to the sugar of the U. It is notable that two of the clusters, Cluster 2 and Cluster 4, have long-range interactions and thus may not form outside of the ribosomal context.
As mentioned in the Introduction, Westhof and co-workers conducted seminal studies that identified the first shifted G•U pairs in bacterial rRNAs [32]. They provided three different shifted G•U base pairs, which correspond to our Clusters 1, 2, and 4. To assess similarities, we compared the E. coli numbering and interactions of these three clusters, which we identified in an agnostic fashion, and they were highly similar between the two studies, providing confidence in the results. Our study also extends this important early work in five ways. First, we identified a fourth class of shifted G•U wobbles, in Cluster 3, which is notable for having a G•U wobble above two WCF GC pairs and below an unusual CA tWH pair. Second, we provided the first eukaryotic and archaeal examples of the shifted G•U wobble, and did so in the following ways: (1) we identified a eukaryotic example in Cluster 2 as Ecu_7QEP (Fig. 7), which is from the parasite Encephalitozoon cuniculi, a eukaryote with one of the smallest known genomes [62], (2) we identified a eukaryotic example in Cluster 3 as Sol_5MMI (Fig. 7), which is a chloroplastic RNA from spinach, and (3) we identified unclustered eukaryotic and archaeal examples, 10 and 2 instances respectively, shown in gray in Fig. 7. Third, we examined sequence conservation of the shifted G•U wobble beyond Bacteria, including in Eukarya and Archaea. To gain deeper insight into the presence of Cluster 3 across the three domains of life, we examined the base pair frequency tables in the Comparative RNA Web (CRW) database [56]. While bacterial alignments gave a 14% frequency of the G•U wobble in Cluster 3, chloroplast alignments gave a 57% frequency, supporting the importance of the shifted wobble in chloroplasts; Eukarya and Archaea alignments gave 0%.
Motivated by this observation, we examined the frequency of the G•U wobble in Clusters 1, 2, and 4 and confirmed the values reported by Westhof and co-workers for bacterial alignments of 94% (U•G), 72% (U•G), and 93% (G•U), in Clusters 1, 2, and 4, respectively [32]. We then found that alignment of Eukarya sequences gave low frequencies of the G•U of 1.8%, 8%, and 4%, respectively; alignment of Archaea sequences gave variable frequencies of the G•U of 0%, 2%, and 61%, respectively; and alignment of chloroplast sequences gave largely higher frequencies of the G•U of 81%, 19%, and 94%, respectively. Apparently, the shifted G•U can be found in all domains of life, although it is favored in bacteria. Its presence in chloroplasts may have arisen from this organelle’s endosymbiotic origin from photosynthetic bacteria [63–65]. Notably, the paucity of the G•U in eukaryotes, along with its conservation in bacteria, makes the shifted wobble an attractive motif for therapeutics. Fourth, in addition to identifying the presence of shifted G•U wobbles, we provide chemical explanations for the presence of both anionic and tautomeric forms of the bases (Fig. 1), with either the anionic or U-enol tautomeric form supported by the reactivity of the U with DMS. Fifth, our study was based on structural conservation using pairwise RMSD values calculated using Biopython. This approach expanded the number of members of each cluster and gave rise to an overlapping but different set of interactions, provided in Fig. 9, which might help to identify the driving forces for shifted G•U wobbles.
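The alignment frequencies quoted above for the four clusters can be collected in one place for comparison. In the sketch below, the percentages are those stated in the text, whereas the dictionary layout and the "bacteria minus eukarya" gap used for ranking are illustrative assumptions and not a measure used in this study.

```python
from typing import Dict, List

# G•U frequency (%) at each cluster position, by alignment set, as quoted in the text.
FREQUENCIES: Dict[str, Dict[str, float]] = {
    "Cluster 1": {"Bacteria": 94.0, "Eukarya": 1.8, "Archaea": 0.0, "Chloroplast": 81.0},
    "Cluster 2": {"Bacteria": 72.0, "Eukarya": 8.0, "Archaea": 2.0, "Chloroplast": 19.0},
    "Cluster 3": {"Bacteria": 14.0, "Eukarya": 0.0, "Archaea": 0.0, "Chloroplast": 57.0},
    "Cluster 4": {"Bacteria": 93.0, "Eukarya": 4.0, "Archaea": 61.0, "Chloroplast": 94.0},
}


def bacterial_gap(cluster: str) -> float:
    """Hypothetical selectivity measure: bacterial minus eukaryotic frequency."""
    row = FREQUENCIES[cluster]
    return row["Bacteria"] - row["Eukarya"]


def rank_by_gap() -> List[str]:
    """Order clusters from largest to smallest bacteria-versus-eukarya gap."""
    return sorted(FREQUENCIES, key=bacterial_gap, reverse=True)


ranking = rank_by_gap()
groups = ["Bacteria", "Eukarya", "Archaea", "Chloroplast"]
print("Cluster    " + "".join(f"{g:>12}" for g in groups) + f"{'Gap':>8}")
for cluster in ranking:
    row = FREQUENCIES[cluster]
    cells = "".join(f"{row[g]:>12.1f}" for g in groups)
    print(f"{cluster:<11}{cells}{bacterial_gap(cluster):>8.1f}")
```

Laid out this way, Clusters 1 and 4 show the largest difference between bacterial and eukaryotic alignments, while Cluster 3 stands out for its chloroplast frequency.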
In summary, our study highlights that structural clustering can help identify diverse driving forces, including preferential sequences, unique conformational states, and atypical molecular interactions, all of which lead to the formation of a shifted G•U wobble. The current study also uncovered DMS reactivity with the shifted G•U wobble in a dynamic structural context. Future studies may identify experimental approaches in which a shifted G•U reacts in any context. Finally, we note that the cheminformatics approach developed herein can be applied to other non-WCF base pairs to identify additional novel protonation states of RNA and is applicable to DNA and proteins as well.
Acknowledgements
The authors would like to thank Dr. Andrey Krasilnikov for advice on the analysis of RNA structures. We also thank Kobie Kirven for advice on computational and statistical analysis and Dr. Mrityunjay Gupta for helpful comments on the manuscript.
Author contributions: Project design was by M.S.S., C.A.D., and P.C.B. Analysis of map-model correlation coefficients was performed by M.S.S., A.J.V., A.N.P., and N.H.Y. Analysis of DMS mapping experiments was performed by C.A.D. All other analyses were performed by M.S.S. The paper was written by M.S.S. and P.C.B., with contributions by all authors.
Notes
Present address: Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
Contributor Information
Md Sharear Saon, Department of Chemistry, Pennsylvania State University, University Park, PA 16802, United States; Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States.
Catherine A Douds, Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States.
Andrew J Veenis, Department of Chemistry, Pennsylvania State University, University Park, PA 16802, United States; Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States.
Ashley N Pearson, Department of Chemistry, Pennsylvania State University, University Park, PA 16802, United States; Department of Biology, Pennsylvania State University, University Park, PA 16802, United States.
Neela H Yennawar, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, United States.
Philip C Bevilacqua, Department of Chemistry, Pennsylvania State University, University Park, PA 16802, United States; Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States.
Supplementary data
Supplementary data is available at NAR online.
Conflict of interest
None declared.
Funding
This research was supported by the National Institutes of Health (NIH) grant R35GM127064 and the National Aeronautics and Space Administration (NASA) grant 80NSSC22K0553. Funding to pay the Open Access publication charges for this article was provided by NIH/NASA.
Data availability
Structural analysis of shifted and standard wobbles can be found in Supplementary_file. All data files, including custom search results (in csv format) from the RCSB Protein Data Bank [https://www.rcsb.org/] [33], characterization output (in json format) from Dissecting the Spatial Structure of RNA (DSSR) [34], Python scripts, Jupyter notebooks, and correlation coefficients between electron density maps and modeled structures (in txt and csv format), are available in figshare [https://figshare.com/s/928d1a2773ea32f89396]. The code is also hosted on GitHub [https://github.com/The-Bevilacqua-Lab/identifying_and_analyzing_shifted_wobble].
References
- 1. Wilson TJ, Lilley DMJ. The chemical principles of RNA catalysis. Ribozymes. 2021; Weinheim, Germany: WILEY‐VCH GmbH; 1–22.
- 2. Kavita K, Breaker RR. Discovering riboswitches: the past and the future. Trends Biochem Sci. 2023; 48:119–41. 10.1016/j.tibs.2022.08.009.
- 3. Brito Querido J, Díaz-López I, Ramakrishnan V. The molecular basis of translation initiation and its regulation in eukaryotes. Nat Rev Mol Cell Biol. 2024; 25:168–86. 10.1038/s41580-023-00624-9.
- 4. Pan T. Modifications and functional genomics of human transfer RNA. Cell Res. 2018; 28:395–404. 10.1038/s41422-018-0013-y.
- 5. Suzuki T. The expanding world of tRNA modifications and their disease relevance. Nat Rev Mol Cell Biol. 2021; 22:375–92. 10.1038/s41580-021-00342-0.
- 6. Lucas MC, Pryszcz LP, Medina R et al. Quantitative analysis of tRNA abundance and modifications by nanopore RNA sequencing. Nat Biotechnol. 2024; 42:72–86. 10.1038/s41587-023-01743-6.
- 7. Arribas-Hernández L, Brodersen P. Occurrence and functions of m(6)A and other covalent modifications in plant mRNA. Plant Physiol. 2020; 182:79–96. 10.1104/pp.19.01156.
- 8. Boo SH, Kim YK. The emerging role of RNA modifications in the regulation of mRNA stability. Exp Mol Med. 2020; 52:400–8. 10.1038/s12276-020-0407-z.
- 9. Izatt RM, Christensen JJ, Rytting JH. Sites and thermodynamic quantities associated with proton and metal ion interaction with ribonucleic acid, deoxyribonucleic acid, and their constituent bases, nucleosides, and nucleotides. Chem Rev. 1971; 71:439–81. 10.1021/cr60273a002.
- 10. Legault P, Pardi A. In situ probing of adenine protonation in RNA by 13C NMR. J Am Chem Soc. 1994; 116:8390–1. 10.1021/ja00097a066.
- 11. Legault P, Pardi A. Unusual dynamics and pKa shift at the active site of a lead-dependent ribozyme. J Am Chem Soc. 1997; 119:6621–8. 10.1021/ja9640051.
- 12. Watson JD, Crick FHC. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature. 1953; 171:737–8. 10.1038/171737a0.
- 13. Colominas C, Luque FJ, Orozco M. Tautomerism and protonation of guanine and cytosine. Implications in the formation of hydrogen-bonded complexes. J Am Chem Soc. 1996; 118:6811–21. 10.1021/ja954293l.
- 14. Hobza P, Sponer J. Structure, energetics, and dynamics of the nucleic acid base pairs: nonempirical ab initio calculations. Chem Rev. 1999; 99:3247–76. 10.1021/cr9800255.
- 15. Civcir PÜ. A theoretical study of tautomerism of cytosine, thymine, uracil and their 1-methyl analogues in the gas and aqueous phases using AM1 and PM3. J Mol Struct THEOCHEM. 2000; 532:157–69. 10.1016/S0166-1280(00)00556-X.
- 16. Šponer J, Leszczynski J, Hobza P. Electronic properties, hydrogen bonding, stacking, and cation binding of DNA and RNA bases. Biopolymers. 2001; 61:3–31.
- 17. Bevilacqua PC, Brown TS, Nakano S et al. Catalytic roles for proton transfer and protonation in ribozymes. Biopolymers. 2004; 73:90–109. 10.1002/bip.10519.
- 18. Kersten C, Archambault P, Köhler LP. Assessment of nucleobase protomeric and tautomeric states in nucleic acid structures for interaction analysis and structure-based ligand design. J Chem Inf Model. 2024; 64:4485–99. 10.1021/acs.jcim.4c00520.
- 19. Gong B, Chen JH, Chase E et al. Direct measurement of a pK(a) near neutrality for the catalytic cytosine in the genomic HDV ribozyme using Raman crystallography. J Am Chem Soc. 2007; 129:13335–42. 10.1021/ja0743893.
- 20. Nakano S, Bevilacqua PC. Mechanistic characterization of the HDV genomic ribozyme: a mutant of the C41 motif provides insight into the positioning and thermodynamic linkage of metal ions and protons. Biochemistry. 2007; 46:3001–12. 10.1021/bi061732s.
- 21. Wilcox JL, Bevilacqua PC. A simple fluorescence method for pK(a) determination in RNA and DNA reveals highly shifted pK(a)’s. J Am Chem Soc. 2013; 135:7390–3. 10.1021/ja3125299.
- 22. Wilcox JL, Bevilacqua PC. pKa shifting in double-stranded RNA is highly dependent upon nearest neighbors and bulge positioning. Biochemistry. 2013; 52:7470–6. 10.1021/bi400768q.
- 23. Wilson TJ, Liu Y, Domnick C et al. The novel chemical mechanism of the twister ribozyme. J Am Chem Soc. 2016; 138:6151–62. 10.1021/jacs.5b11791.
- 24. Wolter AC, Weickhmann AK, Nasiri AH et al. A stably protonated adenine nucleotide with a highly shifted pKa value stabilizes the tertiary structure of a GTP-binding RNA aptamer. Angew Chem Int Ed. 2017; 56:401–4. 10.1002/anie.201609184.
- 25. Bevilacqua PC. Mechanistic considerations for general acid-base catalysis by RNA: revisiting the mechanism of the hairpin ribozyme. Biochemistry. 2003; 42:2259–65. 10.1021/bi027273m.
- 26. Frankel EA, Bevilacqua PC. Complexity in pH-dependent ribozyme kinetics: dark pK(a) shifts and wavy rate-pH profiles. Biochemistry. 2018; 57:483–8. 10.1021/acs.biochem.7b00784.
- 27. Pinard R, Hampel KJ, Heckman JE et al. Functional involvement of G8 in the hairpin ribozyme cleavage mechanism. EMBO J. 2001; 20:6434–42. 10.1093/emboj/20.22.6434.
- 28. Ganguly A, Thaplyal P, Rosta E et al. Quantum mechanical/molecular mechanical free energy simulations of the self-cleavage reaction in the hepatitis delta virus ribozyme. J Am Chem Soc. 2014; 136:1483–96. 10.1021/ja4104217.
- 29. Demeshkina N, Jenner L, Westhof E et al. A new understanding of the decoding principle on the ribosome. Nature. 2012; 484:256–9. 10.1038/nature10913.
- 30. Sims PA, Larsen TM, Poyner RR et al. Reverse protonation is the key to general acid−base catalysis in enolase. Biochemistry. 2003; 42:8298–306. 10.1021/bi0346345.
- 31. Knuckley B, Bhatia M, Thompson PR. Protein arginine deiminase 4: evidence for a reverse protonation mechanism. Biochemistry. 2007; 46:6578–87. 10.1021/bi700095s.
- 32. Westhof E, Watson ZL, Zirbel CL et al. Anionic G•U pairs in bacterial ribosomal rRNAs. RNA. 2023; 29:1069–76. 10.1261/rna.079583.123.
- 33. Berman HM, Westbrook J, Feng Z et al. The Protein Data Bank. Nucleic Acids Res. 2000; 28:235–42. 10.1093/nar/28.1.235.
- 34. Lu X-J, Bussemaker HJ, Olson WK. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 2015; 43:e142.
- 35. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000; 16:276–7. 10.1016/S0168-9525(00)02024-2.
- 36. Cock PJA, Antao T, Chang JT et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009; 25:1422–3. 10.1093/bioinformatics/btp163.
- 37. Ramakers J, Blum CF, König S et al. De novo prediction of RNA 3D structures with deep generative models. PLoS One. 2024; 19:e0297105. 10.1371/journal.pone.0297105.
- 38. Adams PD, Afonine PV, Bunkóczi G et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr. 2010; 66:213–21. 10.1107/S0907444909052925.
- 39. Virtanen P, Gommers R, Oliphant TE et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020; 17:261–72. 10.1038/s41592-019-0686-2.
- 40. Cavener DR. Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 1987; 15:1353–61. 10.1093/nar/15.4.1353.
- 41. Leontis NB, Westhof E. Geometric nomenclature and classification of RNA base pairs. RNA. 2001; 7:499–512. 10.1017/S1355838201002515.
- 42. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j. 2011; 17:3. 10.14806/ej.17.1.200.
- 43. Busan S, Weeks KM. Accurate detection of chemical modifications in RNA by mutational profiling (MaP) with ShapeMapper 2. RNA. 2018; 24:143–8. 10.1261/rna.061945.117.
- 44. Polacek N, Mankin AS. The ribosomal peptidyl transferase center: structure, function, evolution, inhibition. Crit Rev Biochem Mol Biol. 2005; 40:285–311. 10.1080/10409230500326334.
- 45. Noeske J, Wasserman MR, Terry DS et al. High-resolution structure of the Escherichia coli ribosome. Nat Struct Mol Biol. 2015; 22:336–41. 10.1038/nsmb.2994.
- 46. Ben-Shem A, Garreau de Loubresse N, Melnikov S et al. The structure of the eukaryotic ribosome at 3.0 Å resolution. Science. 2011; 334:1524–9. 10.1126/science.1212642.
- 47. Holvec S, Barchet C, Lechner A et al. The structure of the human 80S ribosome at 1.9 Å resolution reveals the molecular role of chemical modifications and ions in RNA. Nat Struct Mol Biol. 2024; 31:1251–64. 10.1038/s41594-024-01274-x.
- 48. Baker EN, Hubbard RE. Hydrogen bonding in globular proteins. Prog Biophys Mol Biol. 1984; 44:97–179. 10.1016/0079-6107(84)90007-5.
- 49. Tamura M, Holbrook SR. Sequence and structural conservation in RNA ribose zippers. J Mol Biol. 2002; 320:455–74. 10.1016/S0022-2836(02)00515-6.
- 50. Crowe-McAuliffe C, Murina V, Turnbull KJ et al. Structural basis of ABCF-mediated resistance to pleuromutilin, lincosamide, and streptogramin A antibiotics in gram-positive pathogens. Nat Commun. 2021; 12:3577. 10.1038/s41467-021-23753-1.
- 51. Bevilacqua PC, Assmann SM. Technique development for probing RNA structure in vivo and genome-wide. Cold Spring Harb Perspect Biol. 2018; 10:a032250. 10.1101/cshperspect.a032250.
- 52. Mitchell D 3rd, Assmann SM, Bevilacqua PC. Probing RNA structure in vivo. Curr Opin Struct Biol. 2019; 59:151–8. 10.1016/j.sbi.2019.07.008.
- 53. Mitchell D III, Cotter J, Saleem I et al. Mutation signature filtering enables high-fidelity RNA structure probing at all four nucleobases with DMS. Nucleic Acids Res. 2023; 51:8744–57. 10.1093/nar/gkad522.
- 54. Douds CA, Babitzke P, Bevilacqua PC. A new reagent for in vivo structure probing of RNA G and U residues that improves RNA structure prediction alone and combined with DMS. RNA. 2024; 30:901–19.
- 55. Zubradt M, Gupta P, Persad S et al. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat Methods. 2017; 14:75–82. 10.1038/nmeth.4057.
- 56. Cannone JJ, Subramanian S, Schnare MN et al. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinf. 2002; 3:2.
- 57. Zafferani M, Hargrove AE. Small molecule targeting of biologically relevant RNA tertiary and quaternary structures. Cell Chem Biol. 2021; 28:594–609. 10.1016/j.chembiol.2021.03.003.
- 58. Childs-Disney JL, Yang X, Gibaut QMR et al. Targeting RNA structures with small molecules. Nat Rev Drug Discov. 2022; 21:736–62. 10.1038/s41573-022-00521-4.
- 59. Haniff HS, Tong Y, Liu X et al. Targeting the SARS-CoV-2 RNA genome with small molecule binders and ribonuclease targeting chimera (RiboTac) degraders. ACS Cent Sci. 2020; 6:1713–21. 10.1021/acscentsci.0c00984.
- 60. Kimsey IJ, Petzold K, Sathyamoorthy B et al. Visualizing transient Watson-Crick-like mispairs in DNA and RNA duplexes. Nature. 2015; 519:315–20. 10.1038/nature14227.
- 61. Szymanski ES, Kimsey IJ, Al-Hashimi HM. Direct NMR evidence that transient tautomeric and anionic states in dG·dT form Watson–Crick-like base pairs. J Am Chem Soc. 2017; 139:4326–9. 10.1021/jacs.7b01156.
- 62. Nicholson D, Salamina M, Panek J et al. Adaptation to genome decay in the structure of the smallest eukaryotic ribosome. Nat Commun. 2022; 13:591. 10.1038/s41467-022-28281-0.
- 63. Mereschkowsky C. Über natur und ursprung der chromatophoren im Pflanzenreiche. Biologisches Centralblatt. 1905; 25/18:38–604.
- 64. Schwartz W. Lynn Margulis, origin of eukaryotic cells. Evidence and research implications for a theory of the origin and evolution of microbial, plant, and animal cells on the Precambrian Earth. XXII u. 349 S., 89 abb., 49 tab. New Haven-London 1970: Yale University Press $ 15.00. Z. Allg. Mikrobiol. 1973; 13:186.
- 65. Zimorski V, Ku C, Martin WF et al. Endosymbiotic theory for organelle origins. Curr Opin Microbiol. 2014; 22:38–48. 10.1016/j.mib.2014.09.008.