Abstract
Several variants of a nucleic acid binding motif (RRM1) of putative transcription factor hnRNP LL containing nucleobase amino acids at specific positions have been prepared and used to study binding affinity for the BCL2 i-motif DNA. Molecular modeling suggested a number of amino acids in RRM1 likely to be involved in interaction with the i-motif DNA, and His24 and Arg26 were chosen for modification based on their potential ability to interact with G14 of the i-motif DNA. Four nucleobase amino acids were introduced into RRM1 at one or both of positions 24 and 26. The introduction of cytosine nucleobase 2 into position 24 of RRM1 increased the affinity of the modified protein for the i-motif DNA, consistent with the possible Watson–Crick interaction of 2 and G14. In comparison, the introduction of uracil nucleobase 3 had a minimal effect on DNA affinity. Two structurally simplified nucleobase analogues (1 and 4) lacking both the N-1 and the 2-oxo substituents were also introduced in lieu of His24. Again, the RRM1 analogue containing 1 exhibited enhanced affinity for the i-motif DNA, while the protein analogue containing 4 bound less tightly to the DNA substrate. The modified protein containing 1 in lieu of Arg26 also bound to the i-motif DNA more strongly than the wild-type protein, but a protein containing 1 at both positions 24 and 26 bound to the DNA less strongly than wild type. The results support the idea of using nucleobase amino acids as protein constituents for controlling and enhancing DNA–protein interaction. Finally, modification of the i-motif DNA at G14 diminished RRM1–DNA interaction, as well as the ability of nucleobase amino acid 1 to stabilize RRM1–DNA interaction.
There are numerous proteins which play important roles in cellular functions by selective interaction with DNA or RNA. These proteins include those involved in transcriptional regulation, DNA replication, and repair of DNA damage. DNA interactive proteins include topoisomerases, gyrases, recombinases, transcription factors, and DNA polymerases. Proteins which recognize RNA selectively include aminoacyl-tRNA synthetases, the initiation, elongation, and termination factors which participate in protein translation, ribosomal proteins, RNA polymerases, and proteins involved in RNA splicing complexes. Many of these proteins take part in essential cellular functions, such that modifying their selective targeting could in principle modulate (aberrant) cellular function.
While a number of nucleic acid–protein binding motifs have been documented, including the helix-turn-helix1 and leucine zipper2 DNA binding motifs, the de novo design of new/modified protein motifs to enable selective targeting of predetermined DNA and RNA structures is still quite challenging. As all nucleic acid structures are assembled using only a small number of nucleobases, and only a limited number of base pairing/association schemes are operative, a novel strategy for nucleic acid recognition might involve the use of nucleobases as amino acid side chains, enabling nucleic acid recognition by familiar and well understood interactions such as base pairing and stacking.
Although there is as yet no report of any protein which recognizes a nucleic acid target selectively by employing a nucleobase incorporated within the protein, the basic concept is implicit within a number of related studies involving peptides. Of particular note are the efforts of the Mihara laboratory in preparing a series of peptides containing L-amino-γ-nucleobase butyric acids for the purpose of RNA binding.3 For the synthesis of larger protein mimetics, peptides containing a nucleobase amino acid were synthesized chemically and condensed by means of native chemical ligation.3b A number of interesting results were obtained in these studies. For example, when three nucleobase amino acid residues were included within arginine-rich α-helical peptides targeted to a hairpin RNA, the affinities varied with the nature of the nucleobases present.3a The introduction of nucleobase amino acids within the dimerization domain of HIV-1 protease enhanced affinity, but lowered the catalytic activity to some extent.3b
The nucleobase amino acids employed for this study are of a type first described by the Diederichsen laboratory4 for the preparation of nucleobase functionalized peptide nucleic acids4a and β-peptides.4b These contain a single methylene group between the nucleobase and α-carbon atom of the derived amino acid, i.e. the same spacing as in the four proteinogenic aromatic amino acids (histidine, phenylalanine, tyrosine, and tryptophan). We have recently described5 the synthesis of the amino acids corresponding to the five canonical nucleobases in RNA and DNA (uracil, thymine, cytosine, adenine, and guanine), two of which (2 and 3, Figure 1) are employed in the present study. While these amino acids were previously incorporated into two proteins, the potentially altered functions of the derived proteins were not explored.5
Figure 1.
Nucleobase amino acids (1–4), and amino acids (5–7), prepared and used to activate a suppressor tRNACUA and incorporated into RRM1 via in vitro ribosomal synthesis.
Presently, we prepared several constructs of RRM1, one of the nucleic acid recognition motifs of transcription factor hnRNP LL which has recently been shown to bind to the lateral loops of the BCL2 i-motif DNA and to unfold the DNA.6 We identified His24 of RRM1 as a potential binding partner for G14 of the i-motif DNA and report that the modified RRM1 containing cytosine nucleobase 2 in lieu of His24 binds to the i-motif DNA more strongly than wild-type RRM1, while the introduction of uracil nucleobase 3 produces no increase in binding. We also report the syntheses of two new pyrimidine-like nucleobases (1 and 4, Figure 1) which lack both the 2-oxo and N-1 substituents of pyrimidines. In spite of its structural differences from 2, nucleobase 1 seemed to function analogously to 2 in its DNA binding properties when introduced into position 24 of RRM1. As anticipated, nucleobase 4 diminished RRM1–i-motif DNA interaction when introduced into position 24. Also included to facilitate interpretation of the results obtained with the nucleobase amino acids were amino acids p-NH2Phe (5), tyrosine (6), and phenylalanine (7), all of which have been incorporated into proteins previously by nonsense codon suppression.7
The chemical synthesis of derivatives of 1 began from commercially available 5-methyl-2-aminopyridine (8) (Scheme 1).8a The amino group was oxidized to afford the respective 2-nitropyridine (9) in 60% yield, the latter of which was treated with N-bromosuccinimide8b in CCl4 to provide benzylic bromide 10. Regioselective lithiation of the Schöllkopf chiral auxiliary with n-BuLi,8c followed by admixture with 10, gave 11 which was isolated as a single diastereomer.8d Mild hydrolysis using 2 N HCl then afforded the α-substituted amino acid methyl ester 12 in 84% yield. Following hydrogenation of the nitro group over Pd/C, and NVOC protection of amino acid 13 as the activated cyanomethyl ester 14, treatment with the dinucleotide pdCpA9 provided NVOC-1 as its pdCpA ester (15) in 54% yield. The amino-acylated dinucleotide was then ligated to an abbreviated suppressor tRNA transcript (tRNACUA-COH) using T4 RNA ligase, and the NVOC protecting group was removed by photolysis prior to its use for the synthesis of analogues of RRM1. Closely analogous transformations employing synthetic intermediate 12 were used to prepare a suppressor tRNA activated with 4 (SI, Scheme S1).
Scheme 1.
Synthesis of the pdCpA and tRNACUA Esters of Nucleobase Amino Acid 1
As a model system for studying the interaction of nucleobase-containing proteins with DNA, we chose the RRM1 from hnRNP LL, which has been expressed in a cell-free system and shown to bind to the lateral loops of the i-motif DNA found in the promoter region of BCL2.6 Based on an analysis of the likely molecular mode of hnRNP LL–DNA interaction using I-TASSER,10 DP-bind,11 and NCBI CDD12 as well as a knowledge of the important RNA binding roles of the invariant residues His105 and Arg107 in the closely related protein hnRNP L,13,14 we chose the analogous residues (His24 and Arg26) in RRM1 for modification. The structures of the i-motif DNA and RRM1 are shown in Figure 2.
Figure 2.
(A) Structure of the BCL2 i-motif DNA.6 The cytidine-rich i-motif stem is shown in yellow with two of the lateral loops and the central loop. (B) Human hnRNP LL RRM1 domain, including His24, generated by I-TASSER software using the structure of the Mus musculus RRM domain of the BAB28521 protein (PDB 1WEX) as a template.15
Three RRM1 constructs were made, enabling the introduction of different nucleobase amino acids in lieu of His24 or Arg26, or at both of these positions. The yields of modified RRM1s obtained using a suppressor tRNA activated with 1, 2, or 3 were studied (Figure S2A). The suppression yield of modified RRM1 containing 1 at position 24 was ∼3-fold greater than that for 2 or 3 (71% vs 23 and 24%). Also studied were the efficiencies of incorporation of 1 into positions 24 (55%) and 26 (54%), and into both positions (6%) (Figure S2B). While the yield of the modified RRM1 containing two residues of amino acid 1 was lower, as anticipated, it was still sufficient to provide material for biochemical studies. This is a critical finding, since implementing a strategy that uses nucleobase side chains in proteins to achieve selective nucleic acid recognition will clearly require two or more closely spaced nucleobase amino acids. Also prepared to aid in interpretation of i-motif DNA binding studies were RRM1s containing nucleobase amino acid 4 (ref 16), p-NH2Phe (5), Tyr (6), or Phe (7) at position 24.
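The yield comparisons above reduce to simple arithmetic, sketched here in Python using only the percentages quoted in the text. The variable names are illustrative, and the "independent events" estimate for double suppression is an assumption introduced for comparison only; it is not a model proposed in this study.

```python
# Suppression yields (%) quoted in the text (Figures S2A and S2B).
yield_pos24 = {"1": 71.0, "2": 23.0, "3": 24.0}   # amino acid -> yield at position 24
yield_single_1 = {"24": 55.0, "26": 54.0}         # amino acid 1, single-site experiments
yield_double_1 = 6.0                              # amino acid 1 at both positions

# Fold advantage of 1 over the canonical nucleobase amino acids 2 and 3.
fold_vs_2 = yield_pos24["1"] / yield_pos24["2"]
fold_vs_3 = yield_pos24["1"] / yield_pos24["3"]

# ASSUMPTION (illustrative only): if the two suppression events were
# independent, the double-site yield would be the product of the
# single-site yields.
naive_double = yield_single_1["24"] * yield_single_1["26"] / 100.0

print(f"1 vs 2: {fold_vs_2:.1f}-fold; 1 vs 3: {fold_vs_3:.1f}-fold")
print(f"naive double-site estimate: {naive_double:.1f}% "
      f"(observed {yield_double_1:.0f}%)")
```

Under that assumption, the observed double-site yield falls well below the naive product, which is one way to quantify why the text treats the recovery of usable doubly modified protein as noteworthy.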
Four RRM1 samples prepared by in vitro protein synthesis (wild-type and three modified proteins containing 1) were purified by Strep-Tactin chromatography and analyzed by denaturing SDS-polyacrylamide gel electrophoresis (Figure S4A), verifying the electrophoretic homogeneity of the samples. RRM1 containing 1 at position 24 was analyzed by MALDI-MS/MS (Figure S5), confirming the presence and position of 1.
Binding of the BCL2 i-motif DNA to each of the RRM1s was studied using an electrophoretic mobility shift assay (EMSA).17,18 The results are shown in Table 1 and SI, Figure S8A. The RRM1s containing amino acid 1 at position 24 (BC50 53 ± 12 nM) or 26 (BC50 63 ± 11 nM) exhibited increased affinity for the BCL2 i-motif DNA relative to wild-type RRM1 (BC50 96 ± 13 nM). Thus, replacement of either His24 or Arg26 by 1 resulted in energetically improved contacts with the target DNA, relative to the native amino acids. The selective nature of the interactions with i-motif DNA mediated by 1 at position 24 or 26 can be appreciated from the observation that the simultaneous introduction of 1 into both positions substantially diminished RRM1–DNA affinity (BC50 130 ± 28 nM).
Table 1.
Quantification of the Binding of Modified RRM1s Containing 1 to BCL2 i-Motif DNA
Protein | BC50 value (nM) |
---|---|
RRM1 His24 Arg26 (wt) | 96 ± 13 |
RRM1 1-24 Arg26 | 53 ± 12 |
RRM1 His24 1-26 | 63 ± 11 |
RRM1 1-24 1-26 | 130 ± 28 |
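To put the Table 1 differences on an energy scale, the following Python sketch converts BC50 ratios into apparent free-energy differences. It rests on two assumptions that are not taken from this study: that BC50 scales like a dissociation constant, and that the temperature is 25 °C. The function name `apparent_ddg` is hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
TEMP_K = 298.15     # ASSUMPTION: 25 °C; the assay temperature is not stated here

# BC50 values (nM) from Table 1.
bc50_table1 = {
    "wt": 96.0,
    "1-24": 53.0,
    "1-26": 63.0,
    "1-24 1-26": 130.0,
}

def apparent_ddg(bc50_variant, bc50_wt, temp_k=TEMP_K):
    """Apparent ddG in kcal/mol, ASSUMING BC50 behaves like a dissociation constant.

    Negative values mean tighter binding than wild type.
    """
    return R_KCAL * temp_k * math.log(bc50_variant / bc50_wt)

wt = bc50_table1["wt"]
for name, value in bc50_table1.items():
    if name == "wt":
        continue
    ddg = apparent_ddg(value, wt)
    print(f"{name}: BC50 ratio {value / wt:.2f}, apparent ddG {ddg:+.2f} kcal/mol")
```

On this rough scale the single substitutions correspond to changes of a few tenths of a kcal/mol, in line with the later remark that the affinity increases observed were not large.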
Analysis of two other mutant proteins RRM1 2-24 and RRM1 3-24, having cytosine and uracil-based amino acids in position 24, prepared analogously (Figure S4B), provides mechanistic insights (Table 2, SI, Figure S8B). The presence of cytosine nucleobase 2 at RRM1 position 24 also increased DNA affinity relative to wild type (BC50 54 ± 11 nM), providing affinity similar to that of RRM1 containing 1 at position 24, while RRM1 having uracil nucleobase 3 at that position had an affinity comparable to that of wild-type RRM1 (BC50 82 ± 16 vs 80 ± 10 nM). While not definitive, the foregoing observations are consistent with the thesis that H-bonding interactions between the nucleobase moieties of amino acids 1–3 and the nucleobase of G14 play a key role in enhanced DNA binding of the modified RRM1s. While the enhanced binding of RRM1 containing 1 would be consistent with the formation of Watson–Crick-like H-bonds between 1 and G14, one might have expected a diminution of RRM1–DNA binding for the RRM1 containing 3. Plausibly, the RRM1 containing 3 may bind to G14 via a wobble-type interaction (SI, Figure S9).19 Less surprising was the finding that an RRM1 having nucleobase amino acid 4 at position 24 exhibited greatly diminished DNA affinity (BC50 110 ± 10 nM) (Table 2). This seems consistent with what might have been expected from the introduction of a nitro substituent.
Table 2.
Quantification of the Binding of Modified RRM1s Containing 2–7 to BCL2 i-Motif DNA
Protein | BC50 value (nM) |
---|---|
RRM1 His24 Arg26 (wt) | 80 ± 10 |
RRM1 2-24 Arg26 | 54 ± 11 |
RRM1 3-24 Arg26 | 82 ± 16 |
RRM1 4-24 Arg26 | 110 ± 10 |
RRM1 5-24 Arg26 | 92 ± 9 |
RRM1 6-24 Arg26 | 141 ± 5 |
RRM1 7-24 Arg26 | 165 ± 4 |
Also studied were RRM1 analogues having p-NH2Phe (5), Tyr (6), or Phe (7) at position 24 (Table 2 and SI, Figure S8C). RRM1 containing 5 had a BC50 value of 92 ± 9 nM. This loss of affinity relative to the RRM1 containing 1 at position 24 (BC50 53 ± 12 nM) argues that the ring N atom is an H-bond acceptor and suggests that enhanced binding in this system is not due to base stacking. The lesser affinity of RRM1 6-24 Arg26 (BC50 141 ± 5 nM) reinforces the role of the exocyclic N atom in 1 as a source of enhanced affinity, logically through H-bonding. Introduction of Phe into RRM1 7-24 Arg26 further reduced DNA binding.
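Because Tables 1 and 2 report different wild-type BC50 values (96 and 80 nM), comparisons across tables are easier to read after normalizing each variant to the wild-type value in its own table. The Python sketch below does this for the position-24 variants; the normalization is a convention chosen here for illustration, not an analysis reported by the authors, and `relative_bc50` is a hypothetical helper.

```python
# BC50 values (nM) for position-24 variants, grouped by the table reporting them.
tables = {
    "Table 1": {"wt": 96.0, "1-24": 53.0},
    "Table 2": {"wt": 80.0, "2-24": 54.0, "3-24": 82.0, "4-24": 110.0,
                "5-24": 92.0, "6-24": 141.0, "7-24": 165.0},
}

def relative_bc50(table):
    """Divide each variant's BC50 by the wild-type BC50 from the same table."""
    wt = table["wt"]
    return {name: value / wt for name, value in table.items() if name != "wt"}

relative = {}
for table in tables.values():
    relative.update(relative_bc50(table))

# Lower ratio = tighter binding relative to the matching wild-type control.
ranking = sorted(relative.items(), key=lambda item: item[1])
for name, ratio in ranking:
    print(f"{name}: {ratio:.2f} x wild-type BC50")
```

Values below 1 indicate tighter binding than the matching wild-type control, so the ordering can be read directly against the H-bonding argument made in the text.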
We also studied the effect of altering the i-motif DNA binding partner for RRM1. The binding interaction involves the lateral loops of the i-motif, where G14 is located.6 Replacement of G14 with A14 diminished RRM1 affinity for the i-motif (Table 3).
Table 3.
Quantification of the Binding of Modified RRM1s Containing 1 to Wild-Type (G14) and Modified (A14) BCL2 i-Motif DNAs
Protein | BC50 value (nM), G14 i-motif | BC50 value (nM), A14 i-motif |
---|---|---|
RRM1 His24 Arg26 (wt) | 84 ± 3 | 99 ± 8 |
RRM1 1-24 Arg26 | 47 ± 5 | 81 ± 9 |
RRM1 His24 1-26 | 55 ± 10 | 78 ± 11 |
Interestingly, while the introduction of 1 at position 24 or 26 resulted in stronger binding of RRM1 to the modified i-motif, the improvement in affinity was much less pronounced than that for the native i-motif DNA sequence (Table 3, SI, Figure S10).
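The comparison between the two DNA substrates can be made explicit by computing, for each substrate, the fold improvement in BC50 conferred by 1. The Python sketch below also propagates the reported ± values through the ratio. It assumes the uncertainties of the two measurements are independent, so that relative errors add in quadrature; that treatment, and the helper name `fold_improvement`, are assumptions made here rather than part of the original analysis.

```python
import math

# (BC50, reported uncertainty) in nM from Table 3.
table3 = {
    "G14": {"wt": (84.0, 3.0), "1-24": (47.0, 5.0), "1-26": (55.0, 10.0)},
    "A14": {"wt": (99.0, 8.0), "1-24": (81.0, 9.0), "1-26": (78.0, 11.0)},
}

def fold_improvement(wt, variant):
    """Return (wt/variant BC50 ratio, propagated uncertainty of that ratio).

    ASSUMPTION: the two uncertainties are independent, so the relative
    errors combine in quadrature.
    """
    (wt_val, wt_err), (var_val, var_err) = wt, variant
    ratio = wt_val / var_val
    rel_err = math.hypot(wt_err / wt_val, var_err / var_val)
    return ratio, ratio * rel_err

for dna, rows in table3.items():
    for name in ("1-24", "1-26"):
        ratio, err = fold_improvement(rows["wt"], rows[name])
        print(f"{dna} i-motif, RRM1 {name}: {ratio:.2f} ± {err:.2f}-fold")
```

A ratio above 1 means the modified protein binds more tightly than wild type to that substrate; comparing the G14 and A14 ratios side by side shows how much smaller the gain is on the modified DNA.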
We have used computational docking to attempt to understand the molecular basis for the observation that introduction of nucleobase 1 into position 24 or 26 of RRM1 enhanced DNA binding but the presence of 1 in both positions significantly diminished binding. Figure 3A shows one of the most energetically favorable docked structures. The Arg26 guanidine moiety is within H-bonding distance of O6 and N2 of G14 of the i-motif DNA (as well as O2 of C13), while the His24 imidazole N is more than 7 Å from all DNA bases (and ∼8 Å from N2 of G14). Another very favorable docked structure is shown in Figure 3B. The His24 imidazole N is ∼4 Å from O6 of G14, while Arg26 has been displaced close to the DNA backbone. Thus, in these energetically favorable models, either His24 or Arg26 (but not necessarily both) can be H-bonded to G14. This is consistent with the experimental observations noted in Table 1.
Figure 3.
Computationally docked models showing the binding of RRM1 to the i-motif DNA via His24 and Arg26 interaction. The modeling was done using SWISS-MODEL20 homology modeling (A), and the HADDOCK21 web server was used for data-driven docking (B). A scoring algorithm used factors such as backbone flexibility, surface residues, and specific amino acids to score the docked structures. The structure shown is a high-scoring model illustrating the potential distance between His24 and G14.
It seems logical to anticipate that optimizing selective protein–nucleic acid interactions will require the use of multiple nucleobase amino acids in a single protein. In this regard a few observations seem worthy of note. First, because modified nucleobase amino acids such as 1 and 4 (ref 16) appear to be incorporated with greater efficiency than “canonical” nucleobase amino acids such as 2 and 3, once fully optimized they should facilitate the introduction of multiple nucleobase amino acids into a protein. Second, our recent development of modified ribosomes capable of incorporating dipeptides into proteins22 may enable the introduction of two contiguous nucleobase amino acids in a single ribosomal event. The finding (Table 1) that incorporation of amino acid 1 into both positions 24 and 26 of RRM1 substantially reduced the affinity of this protein for the i-motif (BC50 130 ± 28 vs 96 ± 13 nM), as compared with the increases in affinity observed following either of the same substitutions made singly, indicates that developing nucleic acid binding proteins with favorable affinity and selectivity will demand creative new strategies. The full repertoire of interactions known for nucleic acids, including stacking, van der Waals, and electrostatic interactions, may find utility in addition to the H-bonding putatively involved in the current study. Finally, while the increases in protein affinity observed in this study were not large, they may be sufficient in many cases to alter the specificity for their nucleic acid target. The introduction of multiple nucleobase amino acids into a protein should also have that effect. Presumably, increasing affinity would at some point negatively impact biological function.
Acknowledgments
This study was supported by Grant GM103861 from the National Institute of General Medical Sciences, National Institutes of Health. Z. L. thanks the NFSC-joint fund for talent cultivation in Henan Province for a fellowship (U1404206).
Footnotes
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/jacs.6b11825.
Experimental procedures for the synthesis and characterization of the pdCpA derivatives of 1 and 4, for the synthesis and evaluation of modified RRM1s, and the EMSA data for protein–DNA binding (PDF)
ORCID
Basab Roy: 0000-0001-9117-1769
Sidney M. Hecht: 0000-0002-5429-2462
Notes
The authors declare no competing financial interest.
References
- 1. Brennan RG, Matthews BW. J Biol Chem. 1989;264:1903.
- 2. (a) Vinson CR, Sigler PB, McKnight SL. Science. 1989;246:911. doi: 10.1126/science.2683088. (b) Pavletich NP, Pabo CO. Science. 1991;252:809. doi: 10.1126/science.2028256.
- 3. (a) Miyanishi H, Takahashi T, Mihara H. Bioconjugate Chem. 2004;15:694. doi: 10.1021/bc034210n. (b) Takahashi T, Yana D, Mihara H. ChemBioChem. 2006;7:729. doi: 10.1002/cbic.200500422. (c) Watanabe S, Tomizaki K, Takahashi T, Usui K, Kajikawa K, Mihara H. Peptide Sci. 2007;88:131. doi: 10.1002/bip.20662.
- 4. (a) Diederichsen U. Angew Chem, Int Ed Engl. 1996;35:445. (b) Chakraborty P, Diederichsen U. Chem - Eur J. 2005;11:3207. doi: 10.1002/chem.200500004.
- 5. Talukder P, Dedkova LM, Ellington AD, Yakovchuk P, Lim J, Anslyn EV, Hecht SM. Bioorg Med Chem. 2016;24:4177. doi: 10.1016/j.bmc.2016.07.008. For the purposes of this study, we define nucleobase amino acids as species having at least one N atom within the aromatic substituent.
- 6. (a) Kang HJ, Kendrick S, Hecht SM, Hurley LH. J Am Chem Soc. 2014;136:4172. doi: 10.1021/ja4109352. (b) Talukder P, Chen S, Roy B, Yakovchuk P, Spiering MM, Alam MP, Madathil MM, Bhattacharya C, Benkovic SJ, Hecht SM. Biochemistry. 2015;54:7457. doi: 10.1021/acs.biochem.5b01085. (c) Roy B, Talukder P, Kang HJ, Tsuen SS, Alam MP, Hurley LH, Hecht SM. J Am Chem Soc. 2016;138:10950. doi: 10.1021/jacs.6b05036.
- 7. Gao R, Zhang Y, Dedkova L, Choudhury AK, Rahier NJ, Hecht SM. Biochemistry. 2006;45:8402. doi: 10.1021/bi0605179.
- 8. (a) Zhang F, Zhang S, Duan XF. Org Lett. 2012;14:5618. doi: 10.1021/ol3026632. (b) Jones PA, Wilson I, Morisson-IveSon V, Jones C, Woodcraft J, Jackson A, Wynn D. International Patent WO 2009/106564 A2. 2009. (c) Schöllkopf U, Groth U, Westphalen KO, Deng CZ. Synthesis. 1981;1981:969. (d) Talukder P, Chen SX, Arce PM, Hecht SM. Org Lett. 2014;16:556. doi: 10.1021/ol403429e.
- 9. Robertson SA, Noren CJ, Anthony-Cahill SJ, Griffith MC, Schultz PG. Nucleic Acids Res. 1989;17:9649. doi: 10.1093/nar/17.23.9649.
- 10. Roy A, Kucukural A, Zhang Y. Nat Protoc. 2010;5:725. doi: 10.1038/nprot.2010.5.
- 11. Hwang S, Gou Z, Kuznetsov IB. Bioinformatics. 2007;23:634. doi: 10.1093/bioinformatics/btl672.
- 12. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Bryant SH. Nucleic Acids Res. 2015;43:D222. doi: 10.1093/nar/gku1221.
- 13. Zhang W, Zeng F, Liu Y, Zhao Y, Lv H, Niu L, Teng M, Li X. J Biol Chem. 2013;288:22636. doi: 10.1074/jbc.M113.463901.
- 14. These two residues correspond to His 74 and Arg 76 in hnRNP LL. A list of potential DNA interactive residues in RRM1 is summarized in the Supporting Information (SI) Figure S1.
- 15. Zhang Y. BMC Bioinf. 2008;9:40. doi: 10.1186/1471-2105-9-40.
- 16. The incorporation of nucleobase amino acid 4 into positions 24 and 26 was carried out in a separate experiment; the suppression yields were 61% and 48%, respectively (SI, Figure S3).
- 17. Kendrick S, Akiyama Y, Hecht SM, Hurley LH. J Am Chem Soc. 2009;131:17667. doi: 10.1021/ja9076292.
- 18. The assay was optimized using wild-type RRM1 expressed in E. coli (SI, Figures S6 and S7). Wild-type RRM1 prepared in vitro had an affinity for the i-motif DNA quite similar to the sample isolated from E. coli (SI, Figure S7).
- 19. For another possible reason for the behavior of RRM1 3-24 Arg26, see: Hildbrand S, Blaser A, Parel SP, Leumann CJ. J Am Chem Soc. 1997;119:5499.
- 20. Arnold K, Bordoli L, Kopp J, Schwede T. Bioinformatics. 2006;22:195. doi: 10.1093/bioinformatics/bti770.
- 21. Dominguez C, Boelens R, Bonvin AM. J Am Chem Soc. 2003;125:1731. doi: 10.1021/ja026939x.
- 22. Maini R, Dedkova LM, Paul R, Madathil MM, Chowdhury SR, Chen S, Hecht SM. J Am Chem Soc. 2015;137:11206. doi: 10.1021/jacs.5b03135.