Author manuscript; available in PMC: 2015 Sep 15.
Published in final edited form as: Angew Chem Int Ed Engl. 2014 Jul 27;53(38):10124–10128. doi: 10.1002/anie.201405497

Mapping Polyamide–DNA Interactions in Human Cells Reveals a New Design Strategy for Effective Targeting of Genomic Sites**

Graham S Erwin, Devesh B Bhimsaria, Asuka Eguchi, Aseem Z Ansari *
PMCID: PMC4160732  NIHMSID: NIHMS621856  PMID: 25066383

Abstract

Targeting the genome with sequence-specific synthetic molecules is a major goal at the interface of chemistry, biology, and personalized medicine. Pyrrole/imidazole based polyamides can be rationally designed to target specific DNA sequences with exquisite precision in vitro; yet, the biological outcomes are often difficult to interpret using current models of binding energetics. To directly identify the binding sites of polyamides across the genome, we designed, synthesized, and tested polyamide derivatives that enabled covalent crosslinking and localization of polyamide–DNA interaction sites in live human cells. Bioinformatic analysis of the data reveals that clustered binding sites, spanning a broad range of affinities, best predict occupancy in cells. In contrast to the prevailing paradigm of targeting single high-affinity sites, our results point to a new design principle to deploy polyamides and perhaps other synthetic molecules to effectively target desired genomic sites in vivo.

Keywords: genome targeting, molecular recognition, DNA, polyamides, COSMIC


A major goal at the interface of chemistry, biology, and personalized medicine is to design small molecules that selectively target the genome to perturb, and rectify, malfunctioning gene regulatory networks. The greatest success in designing molecules with programmable DNA-binding specificity has been with polyamides.[1] Polyamides containing N-methylpyrrole and N-methylimidazole can be rationally designed to bind to specific sequences in the minor groove of DNA.[2] Polyamides can bind to specific sequences with nanomolar affinity,[3] and unlike most protein-based DNA-binding domains, they retain their affinity and specificity when binding to methylated[4] and chromatinized[5] DNA. Polyamides can also efficiently target viral DNA for degradation,[6] and they can traverse the cell membrane and traffic to the nucleus to modulate gene expression.[7]

To comprehensively examine the specificity of DNA-binding molecules, we previously developed a high-throughput platform to monitor polyamide binding to every possible sequence variant, up to 12 base pairs (bp) in length.[3, 8] The cognate site identifier (CSI) method was used to determine the specificity and affinity of several hairpin and linear polyamides. Because CSI binding intensity is directly proportional to the association constant (Ka) for a given DNA sequence,[3, 8a] the intensity data can be used to assign binding probabilities to every sequence across the genome. These genome-wide binding maps, called genomescapes,[3a] predict thousands of polyamide binding sites of maximal and varying affinity, yet only a small subset of these binding sites perturb gene expression.[7a, 7b, 9] Moreover, analysis of differential gene expression after polyamide treatment of live cells reveals that the degree of transcriptional perturbation varies considerably from gene to gene.[7a, 7b, 9] For example, polyamide 1, which was designed to target the hypoxia-responsive element (HRE), competes with endogenous transcription factors for binding to HRE and thereby reduces the expression of a target gene, VEGF. Yet a different gene, ET-2, with an HRE of a similar predicted binding energy, was downregulated 10-fold more than VEGF.[7a] The basis for such variable impact on transcription of specific genes when targeting energetically similar sites remains poorly understood.

The size and packaging of the genome in the cell nucleus could impact both the specificity and the accessibility of binding sites. In chromatin, DNA is packaged into nucleosomes with 146 bp of DNA wrapped around a histone octamer. The nucleosome can occlude binding sites and interfere with binding of natural factors or synthetic molecules.[10] Genomic DNA is further compacted in the nucleus, consistent with a new report that the in vivo chromatin landscape determines the accessibility of a polyamide for its target sites in live cells.[11]

To investigate how genomic architecture and the local chromatin landscape influence polyamide occupancy in live cells, we mapped polyamide binding across the human genome. In stark contrast to the current paradigm of single high affinity site targeting, we find that the occurrence of multiple clustered binding sites, even sub-optimal sites that display low to moderate affinity, correlates best with polyamide occupancy in live human cells. These data suggest a new design principle for in vivo genome-targeting by polyamides and perhaps other DNA-binding small molecules and therapeutic agents.

Several methods have recently been developed to study interactions between small molecules and nucleic acid targets (DNA or RNA) in a cellular environment.[12] To study polyamide binding in the genome, we devised an approach that we term the crosslinking of small molecules for isolation of chromatin (COSMIC). We designed and synthesized trifunctional derivatives of bioactive polyamides (3 and 6, Figure 1). These compounds consist of a DNA-targeting polyamide, a photocrosslinker (psoralen), and an affinity handle (biotin). Polyamides were synthesized by Boc solid-phase synthesis,[13] cleaved from resin, and conjugated to the psoralen-biotin moiety PB (active ester) to yield 3 and 6, respectively (Figure 1). 3 and 6 were purified by reverse-phase HPLC and characterized for purity and identity by analytical HPLC and MALDI-TOF mass spectrometry (Supporting Information, Figure S1).

Figure 1.


Trifunctional polyamides. a) Molecules employed in this study. Hairpin polyamides 1-3 target the DNA sequence 5′-WACGTW-3′. Linear polyamides 4-6 target 5′-AAGAAGAAG-3′. Rings of N-methylimidazole are bolded for clarity. Open and filled circles represent N-methylpyrrole and N-methylimidazole, respectively. Square represents 3-chlorothiophene, and diamond represents β-alanine. Psoralen and biotin are denoted by P and B, respectively. b) Model of 3 crosslinking to DNA. The polyamide binds to the minor groove of DNA and psoralen intercalates between two base pairs. Upon irradiation with 365-nm light, psoralen crosslinks to thymine bases on opposite strands of DNA.

We focused on two widely used polyamide foldamers, hairpin and linear polyamides (3 and 6, Figure 1), that are known to modulate transcription in live human cells.[7a, 9] First, we tested whether psoralen-mediated crosslinking was driven by the polyamide. We reasoned that the high affinity of polyamides for specific DNA sequences (Ka ≈ 10⁹ M⁻¹) would deliver the psoralen to a specific sequence of DNA, whereas the lower affinity of psoralen (Ka ≈ 10³–10⁴ M⁻¹) would not detectably impact polyamide distribution at nanomolar concentrations.[14] To test this hypothesis, in vitro crosslinking experiments were performed with a 22-bp double-stranded DNA (dsDNA) that contains the cognate site of 3. Hairpin 3 was incubated at a concentration of 40 nM with dsDNA for 1 h at 4 °C and then irradiated at 365 nm. The small molecule–DNA crosslinks were resolved by polyacrylamide gel electrophoresis under conditions that denature duplex DNA (7.5 M urea). We find that crosslinking to DNA requires both the polyamide and psoralen groups, as well as 365-nm UV irradiation (Figure 2a). As expected, crosslinking is greatly reduced in 1- and 2-bp mismatches compared to match DNA, consistent with previous reports of polyamides with tethered crosslinkers (Figure S2a,b).[7b, 12c, 12f] Importantly, these crosslinks can be reversed with hot alkali to enable downstream analysis of the captured DNA (Figure S2).
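The affinity argument above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a simple 1:1 binding isotherm with ligand in excess, which is an illustrative simplification and not a model stated in the text; the function name is hypothetical. It compares the expected fractional occupancy of a site at 40 nM ligand for the polyamide-like and psoralen-like association constants quoted above.

```python
def fractional_occupancy(ka_per_molar: float, ligand_molar: float) -> float:
    """Fraction of sites bound for simple 1:1 binding (assumed model).

    theta = Ka*[L] / (1 + Ka*[L]), with free ligand approximated by total ligand.
    """
    x = ka_per_molar * ligand_molar
    return x / (1.0 + x)


ligand = 40e-9  # 40 nM, the concentration used in the text

# Association constants taken from the orders of magnitude quoted in the text.
polyamide = fractional_occupancy(1e9, ligand)
psoralen_low = fractional_occupancy(1e3, ligand)
psoralen_high = fractional_occupancy(1e4, ligand)

print(f"polyamide-driven occupancy:   {polyamide:.3f}")
print(f"psoralen alone (Ka = 1e3):    {psoralen_low:.2e}")
print(f"psoralen alone (Ka = 1e4):    {psoralen_high:.2e}")
```

Under this assumed model the polyamide term dominates by more than three orders of magnitude at nanomolar concentrations, which is the intuition behind expecting psoralen to follow the polyamide rather than the reverse.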

Figure 2.


In vitro crosslinking of 3 to DNA. 400 nM 3 was incubated with ³²P-labeled DNA containing the match sequence for 3 and a TpA site for psoralen crosslinking, 5′-ATTTACGTGTA-3′. DNA was resolved by gel electrophoresis under denaturing conditions.

We then tested whether polyamides can maintain their sequence specificity in the biochemically active, chromatinized environment of nuclei. Multiple criteria were used to select six genomic loci that would provide insight into the targeting potential of polyamides (see Supporting Information). These loci include regions near Pol II-transcribed genes, loci with and without DNase I hypersensitivity (DNase HS), and loci with transcriptionally permissive chromatin landscapes.

COSMIC followed by quantitative polymerase chain reaction (COSMIC-qPCR) was used to determine the extent of polyamide binding to these sites. Briefly, 40 nM 3 or 6 was incubated with nuclei from human HEK293 cells at 4 °C for 1 h and then crosslinked to DNA with 365-nm UV irradiation. DNA was sheared by sonication, and crosslinked polyamide–DNA complexes were captured with streptavidin-coated magnetic beads. After highly stringent washes under semi-denaturing conditions (4 M urea) to remove non-covalently associated DNA, psoralen crosslinks were reversed, and DNA was purified. Quantitative PCR (qPCR) was used to determine the signal of polyamide binding as a fraction of the affinity-purified (AP) DNA over a reference sample of DNA, called input DNA. It is possible that the PB moiety could be playing a role in the polyamide occupancy, but we do not detect that in our measurements. We applied COSMIC to examine the occupancy of 3, the hairpin polyamide designed to target HREs, at both VEGF and ET-2 enhancers. We find that 3 targets the HRE of ET-2 nearly twice as well as the HRE of VEGF, consistent with the greater magnitude of inhibition for ET-2 (Figure S3b).[7a]
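The text reports qPCR signal as a fraction of AP over input DNA but does not spell out the arithmetic. One common way to compute such a ratio from threshold cycles is sketched below; the Ct-based formula, the 1% input aliquot, the perfect amplification efficiency of 2, and the function name are all assumptions for illustration, not the authors' stated procedure.

```python
import math


def fraction_of_input(ct_ap: float, ct_input: float,
                      input_fraction: float = 0.01,
                      efficiency: float = 2.0) -> float:
    """Affinity-purified (AP) signal as a fraction of total input DNA.

    ct_input is assumed to be measured on an aliquot representing
    `input_fraction` of the material that went into the pull-down.
    """
    # Shift the input Ct to the value expected for 100% of the input.
    ct_input_full = ct_input - math.log(1.0 / input_fraction, efficiency)
    # Each cycle of difference corresponds to one fold-change of `efficiency`.
    return efficiency ** (ct_input_full - ct_ap)


# Hypothetical Ct values: AP amplifies one cycle earlier than the 1% input.
example = fraction_of_input(ct_ap=24.0, ct_input=25.0)
print(f"AP/input = {example:.3f}")
```

With these made-up numbers the AP sample corresponds to about 2% of input. Real primer efficiencies differ from 2, so a measured efficiency per primer pair would normally replace the default.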

To understand the differential occupancy of 3 and 6 at sites with similar binding affinities, we next used genomescapes to bioinformatically score genomic loci based on the number of binding sites of varying affinities that are proximal to the targeted site (Figure 3, Equation (1)). Many bioinformatic methods annotate DNA-binding sites on the genome with only the highest-affinity consensus sites.[15] Yet genome-wide analyses of sites occupied by DNA-binding proteins in cooperative complexes strongly suggest that clusters of moderate-to-low binding sites should also be considered in modeling genomic occupancy profiles.[3a, 16]

Figure 3.


Polyamide binding in a chromatinized environment. a) The process to create CSI genomescapes. Each feature on the DNA microarray displays a unique sequence as a DNA hairpin, with all sequence variants of DNA represented on the array (∼1 million sequences). Polyamides are added to the microarray to simultaneously obtain intensity values for every DNA sequence. Genomescapes are generated by assigning an intensity level to every 12 bp sequence of the locus. b) Genomescapes of three of the six loci studied by COSMIC-qPCR (the remaining three loci are in Figure S3c). Each locus was bioinformatically scored for predicted binding by 3 and 6 (450 bp window shown). We also analyzed the underlying chromatin structure from ENCODE data. P score, predicted score. c) Scatterplot of predicted score and observed signal from nuclei of HEK293 cells treated with 40 nM 3 or 6 (Fraction of AP/input). AP, affinity-purified. Results are mean ± s.e.m.

We therefore summed all of the in vitro binding intensities (Z-score) within a genomic region of interest (Equation (1))

$$\sum_{i=x}^{y-l+1} Z_{PA}\{\mathrm{seq}(i : i+l-1)\} \qquad (1)$$

where x is the start of the locus (seq), y is the end of the locus, Z_PA{seq(i : i + l − 1)} is the Z-score of the given polyamide for the window i to i + l − 1, and l is the length of the CSI oligo. Different loci were therefore scored by summing binding sites with CSI-derived binding energies within a window.[3a] We examined different window sizes from 10 to 2000 bp and found that 420 ± 20 bp correlated best with the COSMIC-based occupancy measurements in nuclei (Figure S3d). Our bioinformatically predicted binding scores, derived from CSI-genomescapes, are directly proportional to the observed polyamide occupancy in nuclei (Figure 3c).
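Equation (1) amounts to sliding an l-bp window along the locus and adding up the Z-score of each l-mer. A minimal sketch follows, assuming the CSI Z-scores are available as a lookup table keyed by sequence; the function name, the toy table, the 3-bp oligo length, and the choice to score unlisted l-mers as 0 are illustrative assumptions. Reverse-complement handling is deliberately left out.

```python
from typing import Mapping


def locus_score(seq: str, zscores: Mapping[str, float], l: int,
                default: float = 0.0) -> float:
    """Sum the Z-scores of every l-mer in `seq` (Equation (1)).

    `seq` spans the locus from x to y, so the window start i takes
    len(seq) - l + 1 values, matching the upper limit y - l + 1.
    """
    seq = seq.upper()
    n_windows = len(seq) - l + 1
    if n_windows <= 0:
        return 0.0
    return sum(zscores.get(seq[i:i + l], default) for i in range(n_windows))


# Toy Z-score table with made-up values, using l = 3 for readability.
toy_z = {"ACG": 2.0, "CGT": 1.5, "AAG": 0.5}
toy_locus = "AACGTAAG"  # 3-mers: AAC ACG CGT GTA TAA AAG

toy_score = locus_score(toy_locus, toy_z, l=3)
print(toy_score)
```

Because every window contributes, a locus with several moderate-scoring l-mers can reach the same total as a locus with one high-scoring l-mer, which is the property the analysis relies on.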

To further examine the specificity of polyamides, we performed COSMIC analysis at an additional concentration, 400 nM. We observed binding profiles that were similar to those obtained at 40 nM (Figure S3e). Thus, the sequence specificity of polyamides observed in vitro is preserved at the genomic level.

We next examined whether our studies with nuclei recapitulated the effects of polyamides in live cells in culture. 6 was selected for further study because linear polyamide architectures display less specificity than hairpin polyamides;[3] 6 therefore represents a stringent test of our bioinformatic predictions in live cells. Cellular morphology did not change after treatment with 400 nM 6, consistent with the low toxicity of polyamides (Figure 4a). 6 was incubated for 16 h with live cells and then crosslinked to DNA, and COSMIC-qPCR was performed at the same six loci studied above. The data from live cells treated with 6 are consistent with the results found in nuclei (Figure 4b). Moreover, genomescapes and cumulative scoring over a defined window correlated well with occupancy across diverse loci in live cells. Based on our findings, we scored the entire human genome in 420 bp windows (Figures 4c and S5). Due to the limitations of histograms,[17] we displayed the distribution of scores as a violin plot. The violin plot provides a density trace to reveal patterns in the dataset. This plot shows that different genomic loci with similar predicted binding scores exhibit diverse clustering of multiple sites of varying affinities.
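Scoring a whole genome in fixed windows can be done cheaply with prefix sums once a per-position score is available. The sketch below assumes non-overlapping (tiled) windows and a precomputed list of per-position Z-scores; the text does not say whether the 420 bp windows overlapped, so the tiling, the function name, and the toy data are assumptions.

```python
from itertools import accumulate
from typing import List, Sequence


def tiled_window_scores(position_scores: Sequence[float],
                        window: int = 420) -> List[float]:
    """Sum per-position scores in consecutive, non-overlapping windows.

    The final window may be shorter than `window` if the sequence length
    is not an exact multiple of it.
    """
    # prefix[k] holds the sum of the first k scores, so any window sum
    # is a difference of two prefix entries.
    prefix = [0.0] + list(accumulate(position_scores))
    n = len(position_scores)
    return [prefix[min(start + window, n)] - prefix[start]
            for start in range(0, n, window)]


# Toy example: five positions scored in windows of two.
toy_windows = tiled_window_scores([1, 2, 3, 4, 5], window=2)
print(toy_windows)
```

The resulting list of window scores is the kind of distribution that could then be summarized with a violin plot or any other density display.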

Figure 4.


COSMIC-qPCR from live HEK293 cells treated with 6. a) HEK293 cells before and after treatment with 400 nM 6. No changes in cellular morphology were observed. b) Comparison of predicted binding signal with empirically-determined signal by COSMIC-qPCR (Fraction of AP/Input). Results are mean ± s.e.m. c) Frequency distribution of predicted scores of 6 binding across the entire genome shown as a violin plot. For reference, loci studied in this work are marked as circles on the violin plot. d) Loci with multiple low- and medium-affinity sequences show similar polyamide occupancy to loci with few high-affinity sequences.

The general strategy of targeting unique or specific high-affinity binding sites has been successfully used to perturb binding of a variety of DNA-binding proteins in cells.[7a, 7g, 7h, 9] However, computational analysis of our COSMIC data at several different genomic loci reveals that polyamide occupancy in cells is strongly correlated with multiple clustered binding sites of varying affinities. In particular, it was surprising that the level of crosslinking at such “multi-site” loci with low- and medium-affinity sequences exceeded that observed at high-affinity “single site” loci. A ∼400 base pair genomic locus with multiple sites of varying affinity best predicts levels of polyamide occupancy in cells. This observation challenges the current paradigm that guides the design of genome-targeting molecules. Such a multi-site targeting strategy would be especially effective in targeting transcription factor binding sites within enhancers, promoter elements where the transcriptional machinery assembles, and the newly discovered regulatory regions called super-enhancers that span more than a thousand base pairs.[18]

Some of the most successful therapeutic agents are molecules that bind to DNA and interfere with an array of genomic transactions.[19] We propose that COSMIC can be used to design new therapeutic strategies with the combinatorial use of DNA-targeting ligands.

Supplementary Material

Supporting Information

Footnotes

**

We thank Professor Peter Dervan, Dr. José Rodriguez-Martinez, Professor Parmesh Ramanathan and members of the Ansari lab for helpful discussions. This work was supported by NIH grants CA133508 and HL099773, the H.I. Romnes faculty fellowship and the W. M. Keck Medical Research Award to A.Z.A. G.S.E. was supported by Molecular Biosciences Training Grant NIH T32 GM07215. A. E. was supported by the Morgridge Graduate Fellowship and D. B. was supported by the NSEC grant from NSF.

Supporting information for this article is available on the WWW under http://dx.doi.org/10.1002/anie.201xxxxxx.

References

  • 1. a) Wemmer DE, Dervan PB. Curr Opin Struct Biol. 1997;7:355–361. doi: 10.1016/s0959-440x(97)80051-6; b) Lown JW, Krowicki K, Bhat UG, Skorobogaty A, Ward B, Dabrowiak JC. Biochemistry. 1986;25:7408–7416. doi: 10.1021/bi00371a024.
  • 2. a) Dervan PB. Bioorg Med Chem. 2001;9:2215–2235. doi: 10.1016/s0968-0896(01)00262-0; b) Pelton JG, Wemmer DE. Proc Natl Acad Sci U S A. 1989;86:5723–5727. doi: 10.1073/pnas.86.15.5723.
  • 3. a) Carlson CD, Warren CL, Hauschild KE, Ozers MS, Qadir N, Bhimsaria D, Lee Y, Cerrina F, Ansari AZ. Proc Natl Acad Sci U S A. 2010;107:4544–4549. doi: 10.1073/pnas.0914023107; b) Puckett JW, Muzikar KA, Tietjen J, Warren CL, Ansari AZ, Dervan PB. J Am Chem Soc. 2007;129:12310–12319. doi: 10.1021/ja0744899.
  • 4. Minoshima M, Bando T, Sasaki S, Fujimoto J, Sugiyama H. Nucleic Acids Res. 2008;36:2889–2894. doi: 10.1093/nar/gkn116.
  • 5. Gottesfeld JM, Melander C, Suto RK, Raviol H, Luger K, Dervan PB. J Mol Biol. 2001;309:615–629. doi: 10.1006/jmbi.2001.4694.
  • 6. a) Edwards TG, Koeller KJ, Slomczynska U, Fok K, Helmus M, Bashkin JK, Fisher C. Antiviral Res. 2011;91:177–186. doi: 10.1016/j.antiviral.2011.05.014; b) Edwards TG, Vidmar TJ, Koeller K, Bashkin JK, Fisher C. PLoS ONE. 2013;8:e75406. doi: 10.1371/journal.pone.0075406; c) He G, Vasilieva E, Harris GD, Jr, Koeller KJ, Bashkin JK, Dupureur CM. Biochimie. 2014;102:83–91. doi: 10.1016/j.biochi.2014.02.011.
  • 7. a) Olenyuk BZ, Zhang GJ, Klco JM, Nickols NG, Kaelin WG, Dervan PB. Proc Natl Acad Sci U S A. 2004;101:16768–16773. doi: 10.1073/pnas.0407617101; b) Dickinson LA, Burnett R, Melander C, Edelson BS, Arora PS, Dervan PB, Gottesfeld JM. Chem Biol. 2004;11:1583–1594. doi: 10.1016/j.chembiol.2004.09.004; c) Xiao X, Yu P, Lim HS, Sikder D, Kodadek T. Angew Chem Int Ed. 2007;46:2865–2868. doi: 10.1002/anie.200604485; d) Janssen S, Cuvier O, Müller M, Laemmli UK. Mol Cell. 2000;6:1013–1024. doi: 10.1016/s1097-2765(00)00100-3; e) Pandian GN, Nakano Y, Sato S, Morinaga H, Bando T, Nagase H, Sugiyama H. Sci Rep. 2012;2. doi: 10.1038/srep00544; f) Yang F, Nickols NG, Li BC, Marinov GK, Said JW, Dervan PB. Proc Natl Acad Sci U S A. 2013;110:1863–1868. doi: 10.1073/pnas.1222035110; g) Mapp AK, Ansari AZ, Ptashne M, Dervan PB. Proc Natl Acad Sci U S A. 2000;97:3930–3935. doi: 10.1073/pnas.97.8.3930.
  • 8. a) Warren CL, Kratochvil NCS, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN, Jr, Ansari AZ. Proc Natl Acad Sci U S A. 2006;103:867–872. doi: 10.1073/pnas.0509843102; b) Tietjen JR, Donato LJ, Bhimsaria D, Ansari AZ. In: Methods Enzymol. Chris V, editor. Vol. 497. Academic Press; 2011. pp. 3–30; c) Hauschild KE, Stover JS, Boger DL, Ansari AZ. Bioorg Med Chem Lett. 2009;19:3779–3782. doi: 10.1016/j.bmcl.2009.04.097.
  • 9. Burnett R, Melander C, Puckett JW, Son LS, Wells RD, Dervan PB, Gottesfeld JM. Proc Natl Acad Sci U S A. 2006;103:11497–11502. doi: 10.1073/pnas.0604939103.
  • 10. Archer TK, Cordingley MG, Wolford RG, Hager GL. Mol Cell Biol. 1991;11:688–698. doi: 10.1128/mcb.11.2.688.
  • 11. Jespersen C, Soragni E, James Chou C, Arora PS, Dervan PB, Gottesfeld JM. Bioorg Med Chem Lett. 2012;22:4068–4071. doi: 10.1016/j.bmcl.2012.04.090.
  • 12. a) Anders L, Guenther MG, Qi J, Fan ZP, Marineau JJ, Rahl PB, Loven J, Sigova AA, Smith WB, Lee TI, et al. Nat Biotechnol. 2014;32:92–96. doi: 10.1038/nbt.2776; b) Lee M, Roldan MC, Haskell MK, McAdam SR, Hartley JA. J Med Chem. 1994;37:1208–1213. doi: 10.1021/jm00034a019; c) Wurtz NR, Dervan PB. Chem Biol. 2000;7:153–161. doi: 10.1016/s1074-5521(00)00085-5; d) Guan L, Disney MD. Angew Chem Int Ed. 2013;52:10010–10013. doi: 10.1002/anie.201301639; e) White JD, Osborn MF, Moghaddam AD, Guzman LE, Haley MM, DeRose VJ. J Am Chem Soc. 2013;135:11680–11683. doi: 10.1021/ja402453k; f) Bando T, Sugiyama H. Acc Chem Res. 2006;39:935–944. doi: 10.1021/ar030287f.
  • 13. Baird EE, Dervan PB. J Am Chem Soc. 1996;118:6141–6146.
  • 14. Hyde JE, Hearst JE. Biochemistry. 1978;17:1251–1257. doi: 10.1021/bi00600a019.
  • 15. Jolma A, Yan J, Whitington T, Toivonen J, Kazuhiro Nitta R, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, et al. Cell. 2013;152:327–339. doi: 10.1016/j.cell.2012.12.009.
  • 16. Panne D, Maniatis T, Harrison SC. Cell. 2007;129:1111–1123. doi: 10.1016/j.cell.2007.05.019.
  • 17. Simonoff JS, Udina F. Comput Stat Data An. 1997;23:335–353.
  • 18. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Cell. 2013;153:307–319. doi: 10.1016/j.cell.2013.03.035.
  • 19. a) Wang D, Lippard SJ. Nat Rev Drug Discov. 2005;4:307–320. doi: 10.1038/nrd1691; b) Hurley LH. Nat Rev Cancer. 2002;2:188–200. doi: 10.1038/nrc749.
