Custom DNA Microarrays Reveal Diverse Binding Preferences of Proteins and Small Molecules to Thousands of G-Quadruplexes

Sreejana Ray; Desiree Tillo; Robert E Boer; Nima Assad; Mira Barshai; Guanhui Wu; Yaron Orenstein; Danzhou Yang; John S Schneekloth, Jr; Charles Vinson

doi:10.1021/acschembio.9b00934

Author manuscript; available in PMC: 2021 Apr 17.

Published in final edited form as: ACS Chem Biol. 2020 Apr 7;15(4):925–935. doi: 10.1021/acschembio.9b00934

PMCID: PMC7263473 NIHMSID: NIHMS1590673 PMID: 32216326

Abstract

Single-stranded DNA (ssDNA) containing four guanine repeats can form G-quadruplex (G4) structures. While cellular proteins and small molecules can bind G4s, it has been difficult to broadly assess their DNA-binding specificity. Here, we use custom DNA microarrays to examine the binding specificities of proteins, small molecules, and antibodies across ~15,000 potential G4 structures. Molecules used include fluorescently labeled pyridostatin (Cy5-PDS, a small molecule), BG4 (Cy5-BG4, a G4-specific antibody), and eight proteins (GST-tagged nucleolin, IGF2, CNBP, FANCJ, PIF1, BLM, DHX36, and WRN). Cy5-PDS and Cy5-BG4 selectively bind sequences known to form G4s, confirming their formation on the microarrays. Cy5-PDS binding decreased when G4 formation was inhibited using lithium or when ssDNA features on the microarray were made double-stranded. Similar conditions inhibited the binding of all other molecules except for CNBP and PIF1. We report that proteins have different G4-binding preferences suggesting unique cellular functions. Finally, competition experiments are used to assess the binding specificity of an unlabeled small molecule, revealing the structural features in the G4 required to achieve selectivity. These data demonstrate that the microarray platform can be used to assess the binding preferences of molecules to G4s on a broad scale, helping to understand the properties that govern molecular recognition.

Both the sequence and the structure of the genome govern gene expression. Transcription factors (TFs) bind to specific double-stranded DNA (dsDNA) sequences and modulate gene expression. Sequence-specific binding of TFs to dsDNA has been observed and described for thousands of proteins.¹ However, estimates suggest that 13% of the genome has the capacity to form non-B-DNA structures.² Several proteins can bind non-B-DNA such as unfolded single-stranded DNAs (ssDNA)³ and folded structures such as G4s.⁴ Understanding the factors that govern both sequence- and structure-dependent binding of DNA is critical to understanding fundamental biological regulatory mechanisms. To date, it has been challenging to develop techniques capable of a high-throughput examination of the sequence specificity of non-B-DNA-binding proteins.

ssDNA containing guanine-rich stretches (G-tracts) spontaneously undergoes Hoogsteen base pairing, resulting in the formation of four-stranded structures known as G-quadruplexes (G4s).^5,6 Physiological concentrations of potassium stabilize G4s in vitro.⁶ G4-forming DNA sequences are enriched in promoter regions of oncogenes⁷ and can be conserved across species.⁸ G4 formation has been implicated in the transcriptional regulation of oncogenes such as c-MYC⁹ and BCL2,¹⁰ and G4s are potential therapeutic targets for small molecules.¹¹ Dozens of proteins⁴ and many small molecules¹² have been identified that bind G4s. Prominent examples of small molecules include pyridostatin,¹³ 5,10,15,20-tetra(N-methyl-4-pyridyl) porphyrin (TMPyP4),¹⁴ and DC-34.¹⁵ G4-binding molecules can silence the expression of G4-associated oncogenes.¹⁵ Examples of G4-binding proteins include helicases,¹⁶ nucleolin,¹⁷ IGF2,¹⁸ and CNBP.¹⁹ Despite strong evidence for G4 formation in vivo,^20,21 progress in understanding G4 function has been constrained by the difficulty of examining the DNA-binding specificity of molecules that bind G4s.

Most TFs bind short dsDNA sequences (6–10 nucleotides),²² allowing for the comprehensive analysis of potential binding sites.¹ Universal protein-binding microarrays (PBMs)^23,24 have been used as a high-throughput method to determine the dsDNA-binding specificity to all possible 8-mers.¹ In contrast, the simplest G4 structure is 15 nucleotides long (i.e., GGGNGGGNGGGNGGG), not counting the nucleotides entering and exiting the structure (the flanking G4 tails). The types of DNA sequences known to form G4s are also expanding: several noncanonical G4s have been described, including those with longer loops and/or insertions in G-tracts (bulges).²⁵ There are limits to the number of sequences that can be placed on a microarray, and thus, determining DNA-binding specificity across such a large sequence space is challenging. We can use this technology to examine all potential mammalian G4s, but this does not include all possible G4-forming sequences.

One intriguing report previously used microarrays to study ~1,900 G4-forming oligonucleotides and probed binding with a fluorescently labeled small molecule,²⁶ suggesting that arrays could be a useful technology to investigate antibody and endogenous protein binding as well. Microarray-based platforms for measuring G4-binding specificity have several advantages over sequencing-based methods. The first is that they do not require a PCR amplification step. PCR amplification is difficult for stable G4 templates, as DNA polymerase can be biochemically inhibited by G4 DNA.^2,27,28 A second advantage is sensitivity. Protein-binding microarrays can detect distinct DNA sequence preferences between molecules even with low (<2-fold) relative differences in binding affinities.²⁹ Finally, the method is not dependent on enrichment/pulldown efficiency: it can show that a molecule does not bind to all G4s present, whereas sequencing-based methods only detect what is efficiently pulled down.

Here, we use three Agilent DNA microarray designs that together contain a total of 24,154 unique sequences to examine the binding specificity of proteins, antibodies, and small molecules to G4s and variants. Using Cy5-conjugated pyridostatin (Cy5-PDS) and a fluorescently labeled antibody BG4 (Cy5-BG4), we show that G4s form on these microarrays, and binding strength can be visualized using fluorescence imaging, validating the platform as a high-throughput method to profile G4-binding specificity. We use these arrays to identify distinct G4-binding preferences of a panel of GST-tagged proteins (CNBP, IGF2, nucleolin, and five helicases). Finally, competition experiments between Cy5-PDS and the small molecule DC-34 reveal the G4-binding specificity of DC-34, highlighting the ability of the platform to examine DNA-binding specificity of unlabeled compounds.

RESULTS

Design of a G4 Microarray

We designed three Agilent DNA microarrays (Table 1), each with four identical sectors that contain ~177,440 ssDNA 60-mers to examine G4-binding specificity. Arrays were designed with 9–73 replicates of each unique sequence to ensure statistical significance (Table 1 and Table S6). Each microarray contains different sets of G4 variants designed to examine several sequence parameters that affect G4 formation and binding specificity such as loop length (Design 1, Table S3), loop sequence (Design 2, Table S3), tail sequence (Design 2, Table S3), and single nucleotide variants of eight known G4s (Design 3, Table S4). All microarrays include a set of 19 sequences from human telomeres and oncogene promoters known to form G4s with various topologies as positive controls (Table S2). Designs 2 and 3 have a set of 295 additional G4-forming sequences from the literature.³⁰ For the loop length variants, we increased the length of the tails and loops of four different MYC G4 sequences (MYC Pu27, MYC Pu18ntd, MYC Pu22, and MYC Pu22 NMR mutant) up to five times their length (Table S3). Loop and tail sequences were varied using A, T, G, and C polynucleotide stretches and a subset of combinations (Table S3B). For the loop sequence variants, we generated 4,096 sequences of the form NGGGNGGGNNGGGNGGGN and 64 variants of the form GGGNGGGNGGGNGGG. For the tail variants, we generated 256 versions of the major MYC G4 with all possible dinucleotide tails (NNGGGTGGGGAGGGTGGGNN). All single nucleotide variations at all positions of eight previously characterized G4 sequences (MYC Pu22, PDGFRβ, BCL2, and human telomeric G4) were generated (Tables 1, S5). Negative controls include 19 oncogene G4s in which all G-tracts are replaced with either A, T, or C, reverse complements of G4 sequences, as well as a set of 86 published non-G4 sequences³⁰ (Tables 1, S3–S5). Design 3 is the most comprehensive of the three designs: it contains sequences found in Designs 1 and 2 as well as additional G4 sequences. This design is thus used for most of the experiments and analyses in this study.
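The variant counts quoted above follow directly from the number of degenerate (N) positions in each template. The following is a minimal Python sketch, not the authors' design pipeline, that enumerates the three templates and illustrates two of the negative-control constructions. The helper names (`expand_template`, `replace_g_tracts`, `reverse_complement`) are hypothetical, and treating a G-tract as any run of three or more guanines is an assumption made only for illustration.

```python
import re
from itertools import product

BASES = "ACGT"


def expand_template(template: str) -> list[str]:
    """Return every sequence obtained by replacing each N with A, C, G, or T."""
    n_positions = [i for i, base in enumerate(template) if base == "N"]
    variants = []
    for combo in product(BASES, repeat=len(n_positions)):
        seq = list(template)
        for pos, base in zip(n_positions, combo):
            seq[pos] = base
        variants.append("".join(seq))
    return variants


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def replace_g_tracts(seq: str, base: str) -> str:
    """Replace each run of >= 3 G's (assumed G-tract definition) with `base`."""
    return re.sub(r"G{3,}", lambda m: base * len(m.group()), seq)


loop_variants = expand_template("NGGGNGGGNNGGGNGGGN")    # 6 N positions
simple_variants = expand_template("GGGNGGGNGGGNGGG")     # 3 N positions
tail_variants = expand_template("NNGGGTGGGGAGGGTGGGNN")  # 4 N positions
print(len(loop_variants), len(simple_variants), len(tail_variants))  # 4096 64 256

example = tail_variants[0]  # first tail variant: all four N positions set to A
print(replace_g_tracts(example, "C"))  # G-tract replacement control
print(reverse_complement(example))     # reverse-complement control
```

Each template yields 4^k sequences for k degenerate positions, which is where the 4,096, 64, and 256 figures come from.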

Table 1.

Summary of Array Designs

sequence type	Design 1	Design 2	Design 3
G4 variants	loop length variants of MYC G4s	tail sequence (NNGGGTGGGGAGGGTGGGNN)	tail sequence
	G4 location (surface vs buried)	loop sequence (NGGGNGGGNNGGGNGGGN, GGGNGGGNGGGNGGG)	loop length variants of MYC G4s
			nucleotide variations of known G4s (MYC, BCL2, telomeric, PDGFR)
positive controls	human oncogene G4s (Table S2)	human oncogene G4s (Table S2)	human oncogene G4s (Table S2)
		G4 sequences from ref 30	G4 sequences from ref 30
negative controls	replacement of G-tracts with (A/C/T)	replacement of G-tracts with (A/C/T)	replacement of G-tracts with (A/C/T)
		non-G4 sequences from ref 30	non-G4 sequences from ref 30
		reverse complements of G4 sequences	reverse complements of G4 sequences
			randomly selected from Universal PBM (GEO platform GPL11260)
no. of 60mer sequences	2,264	18,512	15,671
(no. of replicates)	(73 replicates)	(9 replicates)	(15 replicates)

We evaluated the binding specificities of several molecules (Figure 1A, Table S6). Microarrays were preincubated with 100 mM potassium chloride to induce G4 formation. Binding of each molecule is measured by detection of fluorescence intensity at each of the microarray features. BG4 and pyridostatin were conjugated with Cy5. Cellular proteins were expressed as chimeric proteins containing GST, and binding for these proteins was detected using an anti-GST antibody conjugated with Cy5 (Materials and Methods).

G4 Structures Fold on DNA Microarrays

To evaluate the utility of DNA microarrays to examine G4-binding specificity, we synthesized a Cy5-labeled pyridostatin (Cy5-PDS), a small molecule known to bind broadly to G4 structures.¹³ We also obtained a Cy5 conjugated version of BG4 (Cy5-BG4), an antibody developed to bind G4s.³¹ Figure 1B,C presents replicate binding intensities to 15,491 DNA sequences on the Design 3 microarray (Table S5) using either Cy5-PDS or Cy5-BG4. For Cy5-PDS, robust binding is observed at 1 μM, with fluorescence intensities ranging over 100-fold between strongest and weakest bound DNA features (Figure S4A). The fluorescence-binding intensities of Cy5-PDS are proportional to the concentration of pyridostatin used (Figure S4B,C). For Cy5-PDS, strong binding was observed for 19 known genomic G4s, whereas negative controls (oligonucleotides incapable of folding into G4s) have over 100-fold lower binding, consistent with preferential Cy5-PDS binding only to G4 structures (Table S6). In contrast, Cy5-BG4 binds G4-forming sequences, but it also binds several ssDNA sequences on the microarray incapable of forming G4s (Figures 1C and S5). Antibody binding to non-G4 features increases with higher concentrations, and in some cases non-G4 sequences are more strongly bound by Cy5-BG4 than G4 sequences, including multiple cytosine-rich negative control sequences (Figure S5).

Inhibition of G4 Formation Inhibits Cy5-PDS and Cy5-BG4 Binding

We next examined Cy5-PDS and Cy5-BG4 binding under conditions that inhibit G4 formation to evaluate if G4 structures form on the microarray and are required for binding. In one experiment, we replaced potassium chloride (which stabilizes G4s) with lithium chloride (which does not stabilize G4s)^28,32 and observed a decrease in binding for both Cy5-PDS (Figure 1D) and Cy5-BG4 (Figure 1E). Both Cy5-PDS and Cy5-BG4 showed preferred binding in a potassium solution that stabilizes G4 formation. It is noted that many sequences, in addition to the oncogene G4 sequences, are capable of forming G4s. Cy5-PDS binding to genomic G4s decreased 9–141-fold (>30-fold on average, Figure 1D, Table S7) in lithium solution, while binding to negative controls decreased only 2–30-fold (Figure 1D, Table S7), suggesting that Cy5-PDS specifically binds folded G4s rather than G-rich sequences. For Cy5-BG4, the decrease in binding was up to 270-fold for genomic G4 sequences, while binding to negative controls decreased up to 23-fold (Figure 1E, Table S7).
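The fold decreases reported here are ratios of the fluorescence intensity of a feature in potassium to its intensity in lithium. As a minimal sketch of that arithmetic (the function names and all intensity values below are hypothetical and are not taken from Table S7), the range and mean fold decrease for a group of features could be computed as follows:

```python
from statistics import mean


def fold_decrease(potassium: float, lithium: float) -> float:
    """Fold decrease in binding when K+ is replaced by Li+ (K+ / Li+)."""
    if lithium <= 0:
        raise ValueError("lithium intensity must be positive")
    return potassium / lithium


def summarize(pairs):
    """Return (min, max, mean) fold decrease for (K+, Li+) intensity pairs."""
    folds = [fold_decrease(k, li) for k, li in pairs]
    return min(folds), max(folds), mean(folds)


# Hypothetical (K+, Li+) intensities, for illustration only.
g4_pairs = [(9000.0, 1000.0), (28200.0, 200.0), (6000.0, 200.0)]
control_pairs = [(600.0, 300.0), (900.0, 300.0)]

print(summarize(g4_pairs))       # (9.0, 141.0, 60.0)
print(summarize(control_pairs))  # (2.0, 3.0, 2.5)
```

Comparing the two summaries side by side is what separates loss of a folded G4 from a general cation effect on binding.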

In a second experiment, we examined Cy5-PDS binding following a primer extension reaction that produces dsDNA²⁴ (see Materials and Methods), anticipating that dsDNA would predominate over G4 formation¹³ (Figure S6). Formation of dsDNA for each microarray feature was quantified using a spike-in of fluorescently labeled cytosine (Cy3-dCTP).³³ Many features did not incorporate Cy3-dCTP but retained Cy5-PDS binding, suggesting that dsDNA was not produced. These features tend to be guanine-rich (Figure S6A) and contain known G4 sequences (Figure S6B,C), suggesting that G4 structures form on the microarray and inhibit T7-DNA polymerase processivity, consistent with previous observations.^2,27,28

Protein-Binding Specificity to G4 DNA in Potassium and Lithium

We next examined the G4-binding specificity of eight GST-tagged human cellular proteins: two nucleolin (NCL) constructs (an N-terminal deletion of amino acid residues 1–271 and the RNA-recognition motifs (RRMs) only, i.e., residues 272–647, Table S1), CNBP, IGF2, and full-length and truncated versions of 5 human helicases (Figures S2, S7, Table S1). Each protein construct bound G4 microarray features in the presence of potassium (Figures 2, S7A,B). We observed similar binding of IVT-expressed or purified helicase DHX36 (Figures S3, S7C). Lithium chloride weakened binding for most proteins (IGF2, NCL, FANCJ, BLM, WRN, and DHX36), highlighting their preference for binding folded G4 structures (Figure 2A,C,D,E,H–K). We note that we cannot rule out the effect of the specific cation (potassium or lithium) on protein binding, as a reduction in binding to negative control sequences (orange spots) was also observed for these proteins, similar to that observed for Cy5-BG4, CNBP, and the DNA-binding domain of FANCJ, and to a lesser extent PIF1 (Figure 2B,F,G). CNBP binding to lithium-treated microarrays (Figure 2B) and dsDNA is consistent with previous reports that CNBP binds guanine-rich nucleic acids.¹⁹

Figure 2. Diversity in G4-binding activities of a panel of proteins to potassium- or lithium-treated microarrays. Comparison of fluorescent intensities of GST-tagged human proteins (A) IGF2, (B) CNBP, (C) an N-terminal deletion of nucleolin, (D) the RNA recognition motifs (RRMs) of nucleolin, the helicases (E) FANCJ, (F) the DNA-binding domain of FANCJ (FANCJ (DBD)), (G) PIF1, (H) BLM, (I) WRN, (J) full-length DHX36, and (K) the G4-binding domain (RHAU) of DHX36 in the presence of potassium (K⁺, x-axis) vs binding in the presence of lithium (Li⁺, y-axis).

Diversity of G4-Binding Specificity of Cellular Proteins

Figure 3A presents a heatmap summarizing the different G4-binding specificities of 13 molecules to sequences on the Design 3 microarray. All molecules bind different groups of G4s (Figures 3B, S8–S10). For example, Cy5-PDS preferentially binds G4s with specific sequence properties and topologies, including sequences with more than 4 G-tracts and parallel G4s (i.e., MYC Pu40, MYC Pu22, VEGF, PDGFRβ, RET, BCL2-Pu3055G, and BCL2-P1G4, Table S2). Moderate to low binding intensities were obtained for G4s with mixed/hybrid (hTelomeric, hTelomeric1) or antiparallel (CEB1³⁴) topologies (Figures 3A, S9). Notably, the binding profile of Cy5-BG4 is distinct from that of Cy5-PDS (Figure 3A): Cy5-BG4 preferentially binds a set of G4 sequence and structural variants including hybrid/mixed (telomeric) G4s (Figure S9) and non-G4 ssDNAs (Figure 3A). In all, we identify two distinct binding preferences in this panel of molecules: those that bind only G4 sequences (i.e., IGF2 and the helicase DHX36; Figures 3B,C, S10) and those that also bind other ssDNAs in addition to folded G4s (i.e., BG4, nucleolin, and FANCJ; Figures 3D–F, S10). For example, similar to Cy5-BG4, nucleolin preferentially binds folded G4 structures (Figures 1E, 2C,D), as previously reported.¹⁷ However, the two proteins also appear to bind non-G4 sequences, with nucleolin to a lesser extent, as shown by both potassium–lithium preference and comparison with Cy5-PDS (Figures 1E, 2C, 3D,E).

The Effect of Single Nucleotide Variants on G4 Binding

We next assessed the utility of the microarray platform to detect how single nucleotide variants (SNVs) of known G4s affect binding. We began with an examination of Cy5-PDS binding the MYC Pu22 G4 with the expectation that variation of the nucleotides that are important to the G4 structure would result in weaker binding (Figures 3G, S11–S13). Figure 3G presents two visualizations of Cy5-PDS binding SNVs of the MYC Pu22 G4. The top panel shows a bar graph that summarizes the average change in Cy5-PDS binding of three SNVs at each position of the G4 sequence. The bottom panel shows a heatmap that summarizes the effect of each SNV on binding. In general, alteration of the guanine repeats results in weaker binding, with the largest effect occurring in the central guanine of each G-tract. In the MYC Pu22 G4, there are two G-tracts (positions 8–11 and 17–20) that are four nucleotides long. Sequences with variants at G9 and G10 are more weakly bound by Cy5-PDS, suggesting they participate in one of the four strands of the G4. In contrast, G8 and G11 can accommodate other bases, suggesting that guanine trinucleotides comprised of positions 8–10 or 9–11 can participate in the G4 structure. For the second G-tract, variants of G20 are better bound than the consensus, suggesting it is not in the G4 structure, while variants of G17, G18, and G19 are more weakly bound, suggesting they are the guanine trinucleotide that is part of the G4 structure, consistent with previous reports.^35,36 Variations in the loops (positions 7, 12, and 16) and tail sequences can either weaken or strengthen Cy5-PDS binding, with cytosine or thymine being preferred in the loops.
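The SNV analysis can be sketched in a few lines of Python: every position of a sequence has three possible substitutions, and the bar graph corresponds to an average over those three. This is an illustrative sketch only. The MYC Pu22 sequence below is an assumption (it is not spelled out in the text, although it matches the stated G-tract positions 8–11 and 17–20 and loop positions 7, 12, and 16), the use of a mean log2 ratio is an assumed summary statistic, and the intensities are hypothetical.

```python
from collections import defaultdict
from math import log2

# Assumed MYC Pu22 sequence (not given in the text above).
MYC_PU22 = "TGAGGGTGGGGAGGGTGGGGAA"


def single_nucleotide_variants(seq):
    """Yield (1-based position, ref base, alt base, variant sequence)."""
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield i + 1, ref, alt, seq[:i] + alt + seq[i + 1:]


def mean_log2_change(seq, ref_intensity, variant_intensity):
    """Mean log2(variant / reference) over the measured SNVs at each position."""
    by_position = defaultdict(list)
    for pos, _ref, _alt, variant in single_nucleotide_variants(seq):
        if variant in variant_intensity:
            ratio = variant_intensity[variant] / ref_intensity
            by_position[pos].append(log2(ratio))
    return {pos: sum(vals) / len(vals) for pos, vals in by_position.items()}


snvs = list(single_nucleotide_variants(MYC_PU22))
print(len(snvs))  # 66 (3 substitutions x 22 positions)

# Hypothetical intensities: all three G9 variants bind 4-fold more weakly.
hypothetical = {variant: 2500.0 for pos, _r, _a, variant in snvs if pos == 9}
print(mean_log2_change(MYC_PU22, 10000.0, hypothetical))  # {9: -2.0}
```

A negative value at a position indicates that substitutions there weaken binding on average, as described for the central guanines of each G-tract.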

Examination of Cy5-PDS binding SNVs of five other G4 sequences (MYC Pu26, BCL2 P1G4, BCL2 55G, hTelomeric, and hTelomeric1) also highlighted G-tracts participating in the G-tetrad for all sequences except for the hTelomeric sequence (Figure S11). For example, in the G-rich BCL2 P1G4, which contains five G-tracts, we identify the four G-tracts participating in the G4 structure and the long (12 nucleotides) second loop (Figure S11B), consistent with previous reports.³⁷ Variations affect Cy5-PDS binding for the hTelomeric G4 sequence differently, in which nucleotide substitutions at most positions increase Cy5-PDS binding. This G4 differs from the hTelomeric1 G4 sequence only at the dinucleotide at the 3′ tail (TTAGGGTTAGGGTTAGGGTTAGGGTT for the hTelomeric G4 versus TTAGGGTTAGGGTTAGGGTTAGGGAA for hTelomeric1, Figure S11D,E). These results are indicative of the interplay of the 3′ end and other nucleotides of the sequence in determining the G4 structure and Cy5-PDS binding, consistent with previous results suggesting that these nucleotides affect the structure of the telomeric G4.³⁸ Examination of single nucleotide variants of longer G4s such as the MYC Pu40 and PDGFRβ G4s, which contain more than four G-tracts and can thus potentially form multiple G4 structures, revealed that mutations of these G-tracts have variable effects on Cy5-PDS binding. Thus, different G4 structures may be forming in these sequences (Figure S12A,B). Consistent with this idea, we examined several truncations of the PDGFRβ G4, which contain only four G-tracts, and identified that Cy5-PDS can bind each truncation (Figure S12C).

Examination of protein binding to SNVs of a panel of G4s identifies unique patterns (Figure S13) and provides base-resolution data for investigators interested in G4 structure and G4-protein interactions. For example, mutations of the G-tracts of the MYC Pu26 G4 reduce binding of all proteins examined except for PIF1, for which all variants increase binding (Figure S13B). Another example is the effect of variations of hTelomeric and hTelomeric1 G4s on BLM binding. Here, SNVs have opposite effects on BLM binding, similar to that observed for Cy5-PDS. However, unlike Cy5-PDS, the binding pattern is reversed: substitutions of the G-tracts of the hTelomeric sequence decrease BLM binding, whereas sequence variations at most positions of the hTelomeric1 G4 increase BLM binding (Figure S13E,F), suggesting that G4 topology may be an important determinant of BLM-binding specificity and function.

Effects of G4 Loop and Tail Parameters on Molecule Binding

We next examined the effect of specific sequence parameters on molecule binding. We first examined loop length (Designs 1 and 3) and sequence (NGGGNGGGNNGGGNGGGN, Design 2), both of which influence G4 stability³⁹ and topology.⁴⁰ Figure 4A summarizes the correlation between loop length of the MYC Pu22 G4 sequence and the binding of each molecule. While loop length does not affect binding of Cy5-PDS (R = −0.15), binding decreases with increasing loop length for most molecules including Cy5-BG4 (R < −0.29, Figures 4A–E, S14, S15), with the strongest effect observed for the helicase FANCJ. This suggests that the longer loops disrupt the protein-DNA interface. We identify multiple sequences with long loops that are bound better than the parental sequence by several molecules and proteins (dotted horizontal line of Figure 4B–E). For example, Cy5-PDS preferentially binds MYC Pu22 G4s with loops >2 nucleotides long comprised primarily of poly-G or poly-T stretches (Figure S15A).
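The R values quoted above relate loop length to binding intensity. The text does not state which correlation coefficient was used, so the sketch below assumes a Pearson correlation and uses hypothetical loop lengths and log2 intensities purely to illustrate the calculation; `pearson_r` is an illustrative helper, not a function from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: total loop length (nt) and log2 fluorescence intensity.
loop_lengths = [1, 2, 3, 4, 5]
log2_intensity = [13.0, 12.5, 12.6, 11.8, 11.0]

r = pearson_r(loop_lengths, log2_intensity)
print(f"R = {r:.2f}")  # strongly negative: binding falls as loops lengthen
```

An R near zero, as reported for Cy5-PDS, would indicate that loop length has little linear relationship with binding.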

An examination of all possible loop sequence variants of a simple G4 (GGGNGGGNGGGNGGG, 64 variants) and a MYC Pu22-like G4 sequence (NGGGNGGGNNGGGNGGGN, 4,096 variants) further highlights differences between proteins and Cy5-PDS. For example, Cy5-PDS binds both classes of sequences over a 2–3-fold range (Table S7B). Specifically, we identify distinct patterns within the best-bound sequences, including flexibility for the nucleotides in the central loop of the G4 and an overall preference for thymines in loops (Figure 4F), consistent with previous findings that T nucleotides in loops have a greater propensity for folding into G4s than other nucleotides.³⁹

We find different tail sequence (NNGGGTGGGGAGGGTGGGNN) preferences for all measured molecules (Figure 4G), further underscoring the utility of the platform in identifying sequence features important for G4 binding and highlighting the role of tail sequences in determining binding specificity. For example, DHX36 preferentially binds MYC Pu22 G4 variants containing pyrimidines (C/T) at the 5′ end of the G4, whereas a lack of sequence specificity was observed for the 3′ end. This is consistent with the published DHX36 crystal structure that highlighted the DHX-specific motif interacting with the 5′ tail and surface of the MYC Pu22 G4.⁴¹

Competition Experiments Reveal G4-Binding Specificity of Unlabeled Small Molecules

We next explored whether the microarray platform could be used to reveal the G4-binding specificity of unlabeled molecules via a competition with Cy5-PDS binding. We examined three molecules, unlabeled PDS, TMPyP4 (a planar molecule that nonspecifically binds G4 structures¹⁴), and DC-34 (a molecule that selectively binds the MYC G4¹⁵). A competition experiment with unlabeled pyridostatin indicates no change in binding specificity, with weaker-bound G4s being more easily competed (Figure S16A). Comparison of 1 μM Cy5-PDS binding in the presence or absence of various concentrations of unlabeled TMPyP4 indicated a uniform reduction in Cy5-PDS binding to all G4-containing features (Figures 5A, S16B). These results confirm that TMPyP4 nonspecifically competes with Cy5-PDS for binding to all G4s.

Next, we examined unlabeled DC-34 (Figure 5B, Figure S17). There are no features that are better bound in the presence of DC-34. Instead, some features are poorly bound by Cy5-PDS in the presence of DC-34 (bottom right of the plot), suggestive of specific DC-34 binding. Specifically, 17.5% of G4 sequences decreased in intensity by more than 10-fold, suggesting that DC-34 competitively binds to only a subset of the G4s. Similar results were observed with higher concentrations of DC-34 (Figure S17A).

The difference in Cy5-PDS binding to variants in the tails of the MYC Pu22 G4 in the presence of DC-34 is 3-fold, with sequences containing purine (A or G) directly adjacent to the G4 structure being preferentially bound by DC-34 (i.e., they have the strongest reduction in Cy5-PDS binding in the presence of DC-34, Figure 5B,C). This is consistent with the observation that DC-34 binds the top and bottom surfaces of the G4 and makes specific contacts with purines in the tail sequences.¹⁵ We next examined the general properties of features in which DC-34 reduced binding of Cy5-PDS (ratio of PDS/PDS+DC-34, Figure 5D). DC-34 preferentially binds features that are moderately bound by PDS (variants of telomeric G4s, Figure S18) and those that tend to have signatures of less stable G4s, such as moderate dCTP incorporation and moderate G-content (Figure 5D).
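The competition readout reduces to a per-feature ratio of Cy5-PDS intensity without and with the unlabeled competitor, followed by a threshold on that ratio. A minimal sketch of this calculation is given below; the function names and the intensity pairs are hypothetical, and the 10-fold cutoff is taken from the text.

```python
def competition_ratio(pds_alone: float, pds_with_competitor: float) -> float:
    """Ratio PDS / (PDS + competitor); large values mean strong competition."""
    if pds_with_competitor <= 0:
        raise ValueError("intensity with competitor must be positive")
    return pds_alone / pds_with_competitor


def fraction_competed(pairs, threshold: float = 10.0) -> float:
    """Fraction of features whose intensity drops by more than `threshold`-fold."""
    ratios = [competition_ratio(alone, mixed) for alone, mixed in pairs]
    return sum(ratio > threshold for ratio in ratios) / len(ratios)


# Hypothetical (Cy5-PDS alone, Cy5-PDS + competitor) intensities per feature.
feature_pairs = [
    (20000.0, 1000.0),   # 20-fold drop: competed
    (15000.0, 12000.0),  # 1.25-fold drop: not competed
    (8000.0, 7000.0),    # ~1.1-fold drop: not competed
    (30000.0, 2000.0),   # 15-fold drop: competed
]

print(fraction_competed(feature_pairs))  # 0.5
```

A nonspecific competitor would shift all ratios by a similar amount, whereas a selective one produces a subset of features with large ratios.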

Measurements of G4 Binding on Microarrays Correlate with Sequencing-Based Methods

Finally, we evaluated how well our microarray-based measurements for Cy5-PDS binding correlate with G4 stability measured using high-throughput sequencing.²⁸ We applied a method (G4Detector)⁴² that uses parameters learned from high-throughput sequencing-based measurements of hundreds of thousands of human G4 occurrences²⁸ to predict microarray intensities based on the probe sequence. The predicted intensities show a high positive correlation with our measured array intensities for Cy5-PDS binding (R = 0.66, p-value < 1e-15, Figure S19), indicating good agreement between PDS-binding measurements made using either microarray or sequencing-based technologies. These results further demonstrate the generalizability of our array-based measurements: while the model we used was trained on human genomic sequences, it has good predictive power on unrelated sequences (our array probes).

DISCUSSION

In this work, we used microarrays containing thousands of different ssDNA sequences to evaluate G4 DNA-binding specificity of proteins and small molecules. Previous efforts to use G4 microarrays have focused on examining the binding of labeled small molecules to ~2,000 G4-forming sequences.²⁶ Here, we report the systematic assessment of protein, small molecule, and antibody binding to more than 15,000 G4 sequences, approaching the number and sequence diversity of G4s thought to exist at a given time in the human genome.²⁰ We demonstrate the binding preferences of a G4 antibody as well as a variety of helicases and known endogenous G4-binding proteins. We find distinct and coherent patterns/preferences of each molecule for different sequences even with low relative differences in intensities, highlighting the sensitivity of the approach. Furthermore, we demonstrate that in competitive assays, the selectivity of unlabeled small molecules can also be assessed, revealing a label-free method for quantifying G4-binding specificity.

This work highlights the utility of the microarray platform to assess the specificity of G4-binding molecules. For example, BG4 is an antibody developed to bind G4s³¹ and has been used to examine the occurrences of the G4 structure in vivo.²¹ The G4-binding specificity of BG4 has only been validated using a handful of sequences.³¹ Examination of Cy5-BG4 binding to our G4 microarrays indicates the binding specificity of Cy5-BG4 is distinct from Cy5-PDS, a small molecule that also broadly binds G4s. We report that, unlike Cy5-PDS, Cy5-BG4 has the capacity to bind to some unfolded and non-G-tract containing ssDNA sequences, including multiple cytosine-rich sequences. Still, the possibility exists that BG4 induces a G4-like fold in some G-rich ssDNA sequences. Our analysis of the effect of loop lengths on binding indicates that Cy5-BG4 preferentially binds G4s with short loops, unlike Cy5-PDS, which binds similarly to G4 sequences with various loop lengths. Because BG4 does not bind to all G4s, it is possible that pulldown assays such as ChIP-seq with BG4 may either underrepresent or overrepresent the occurrence of G4s in cells or lysates. Thus, caution should be exercised in considering pulldown assays with BG4.

Experiments using this approach can also provide insights into G4-mediated regulation of biological processes. Transcription initiation is a dynamic process that involves several mechanical and topological changes to dsDNA.⁴³ We demonstrate that the microarray platform can distinguish the binding specificity of a given molecule or protein for structured or linear DNAs. For example, examination of protein binding in the presence of lithium (disfavoring G4 formation) in comparison with potassium (stabilizing G4 formation) demonstrates that inhibiting G4 formation does not inhibit DNA binding of the known G4-binding proteins CNBP and PIF1. It may therefore be more appropriate to consider these proteins as binding to purine-rich sequences of multiple conformations, and it is tempting to speculate that the flexibility in binding DNA in multiple conformations may allow these proteins to bind genomic regions undergoing transitions in DNA conformation. In contrast, proteins such as IGF2 and DHX36 only bind to folded G4 sequences. IGF2 traditionally is known to act extracellularly, binding to the surface of cells and activating multiple signaling pathways.⁴⁴ The possibility that it also functions by directly binding to DNA is another example of a protein having multiple functions by binding totally unrelated cellular components.⁴⁵ hTelomeric G4 is structurally polymorphic, which may be important for its function. Interestingly, our data show that BLM specifically binds the wt hTelomeric sequence that forms hyb-2 G4, while WRN can bind both hTelomeric (hyb-2 G4) and hTelomeric1 (hyb-1 G4) sequences, suggesting that G4 topology may be an important determinant of different binding specificities and functions of BLM and WRN. We also note the differences in binding to G4 sequences between proteins and Cy5-PDS (Figures 3B–F, S10), suggesting that they may recognize distinct surfaces of the G4 structure. Analysis of future structures of small molecules and proteins in complex with G4 DNA such as the one already described⁴¹ would aid in understanding the array data, such as the contribution of different SNVs to binding specificity.

In conclusion, we show that the microarray-based analysis of G4-binding events is a robust and sensitive technology to examine DNA-binding specificity of small molecules and proteins to tens of thousands of ssDNA structures including G4s in a single experiment. Our data provide a rich resource for investigators interested in noncanonical nucleic acid structures and G4 molecule-binding specificity. We also highlight the customizability and flexibility in using microarrays to examine various aspects of G4 structure, stability, and binding by small molecules and proteins. Many G4s are polymorphic and have topologies dependent on temperature,⁴⁶ cation identity (K⁺, Na⁺, or Li⁺), or concentrations.³² Experiments conducted using differing conditions (salt concentrations or alternative ions) could allow for determination of aspects of G4 formation and stability. Parameters affecting cooperative G4-binding specificity can be examined via additional custom array designs in which the number of G-tracts within a DNA probe is varied systematically. Finally, this platform presents a unique approach to understand the sequence and structure parameters that govern nucleic acid recognition by antibodies, proteins, and small molecules in an unbiased format.

MATERIALS AND METHODS

Synthesis of Cy5 Conjugated Pyridostatin

To a 1-dram vial was added alkynyl pyridostatin (1.0 mg, 0.00102 mmol)⁴⁷ from a 5 mg mL⁻¹ stock in DMSO. The solution was diluted with a water/tert-butyl alcohol mixture (1.0 mL, 1:1 v/v). Cy5-N₃ (1.03 mg, 0.00123 mmol) was then added from a 10 mM aqueous stock solution, followed by cupric sulfate (0.065 mg, 0.00041 mmol) and sodium ascorbate (0.2 mg, 0.00102 mmol) which were added from 5 mg mL⁻¹ aqueous stock solutions. The reaction was stirred at RT for 1 h, at which time LC/MS indicated consumption of the starting material. The reaction was diluted with water (3 mL), and the solution was directly purified by reverse-phase preparative HPLC (5–90% MeCN/0.1% aqueous (NH₄)HCO₃). The product-containing fractions were lyophilized to afford Cy5-PDS (1.3 mg, 76%) as a blue solid (Figure S1).
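The stock volumes and molar equivalents implied by the amounts above can be checked with a short calculation. The following is a minimal sketch, not part of the reported procedure: the function names are hypothetical, and the only inputs are the masses, millimole amounts, and stock concentrations stated in the preceding paragraph.

```python
# Sketch: derive dispensed volumes and equivalents from the stated amounts.
# Function names are illustrative; all numeric inputs come from the text above.

def volume_ul_from_mass(mass_mg, stock_mg_per_ml):
    """Volume (in microliters) of a mg/mL stock needed to deliver mass_mg."""
    return mass_mg / stock_mg_per_ml * 1000.0


def volume_ul_from_mmol(amount_mmol, stock_mm):
    """Volume (in microliters) of a mM (mmol/L) stock needed to deliver amount_mmol."""
    return amount_mmol / stock_mm * 1e6


def equivalents(amount_mmol, limiting_mmol):
    """Molar equivalents relative to the limiting reagent."""
    return amount_mmol / limiting_mmol


LIMITING_MMOL = 0.00102  # alkynyl pyridostatin, the limiting reagent

# Volumes dispensed from each stock solution.
print("alkynyl PDS (5 mg/mL):", volume_ul_from_mass(1.0, 5.0), "uL")
print("Cy5-N3 (10 mM):", volume_ul_from_mmol(0.00123, 10.0), "uL")
print("CuSO4 (5 mg/mL):", volume_ul_from_mass(0.065, 5.0), "uL")
print("Na ascorbate (5 mg/mL):", volume_ul_from_mass(0.2, 5.0), "uL")

# Equivalents relative to alkynyl pyridostatin.
print("Cy5-N3 equiv:", round(equivalents(0.00123, LIMITING_MMOL), 2))
print("CuSO4 equiv:", round(equivalents(0.00041, LIMITING_MMOL), 2))
print("Na ascorbate equiv:", round(equivalents(0.00102, LIMITING_MMOL), 2))
```

Under these inputs the azide is in slight excess over the alkyne, and copper is used substoichiometrically with ascorbate as the reductant.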

Sources of Antibody, Small Molecule, and Protein Constructs

BG4,³¹ conjugated with FluoProbes647H (Cy5-BG4), was obtained from Absolute Antibody (product number Ab00174-1.1). TMPyP4 was obtained from Sigma-Aldrich (catalog number 613560). N-terminal glutathione S-transferase (GST) tagged human nucleolin, IGF2, CNBP, and helicase plasmids were synthesized by GenScript. Purified, recombinant bovine DHX36⁴¹ was provided as a gracious gift by the Ferré-D’Amaré Lab (National Institutes of Health, Bethesda). The sequences of all proteins used are listed in Table S1. All chimeric proteins were expressed via in vitro translation (IVT) reactions using the PURExpress In Vitro Protein Synthesis Kit (NEB) as described previously.²³ For all IVT reactions, 288 ng of plasmid was added to 80 μL of an IVT mixture, and reactions were carried out at 37 °C for 2 h. Expression of all protein constructs was confirmed via Western blot (Figure S2).

Binding Experiments

Microarrays were preincubated with a 100 mM potassium chloride solution for 1 h at RT to induce G4 formation. Protein binding microarray experiments were then performed as previously described.²³ Microarrays were blocked with 4% nonfat dry milk in a potassium phosphate buffer before incubation with proteins or small molecules. Expressed proteins were blocked with 4% nonfat dry milk, ssDNA, and BSA. For the validation experiments, microarrays were also treated with 100 mM lithium chloride to inhibit G4 formation. For experiments examining dsDNA, single-stranded DNA probes were made double-stranded using a primer complementary to a 24-mer constant sequence following the method described previously.²³,²⁴ Double-stranding efficiency was monitored using 4% Cy3-dCTP.

Data Processing and Analysis

Protein or molecule-bound microarrays were scanned with the G5761A SureScan Dx Microarray Scanner System (Agilent) to detect a Cy5 signal at two laser settings (30 and 100 PMT) to ensure signal intensities were below saturation. Spot intensities from microarray images were extracted using the Agilent Feature Extraction Software and are reported as raw fluorescence units. All binding assays were performed at least twice (Figure S3), with high agreement between replicates (R > 0.8). Microarrays with the fewest number of saturated spots were used for further analysis. Median intensity was then computed for probes containing identical sequence on each microarray. Sequence logos were generated from a position frequency matrix generated from selected sequences using ggseqlogo.⁴⁸
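The scan selection and per-sequence median steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in this study: the function names, the `(sequence, intensity)` input layout, and the saturation ceiling of 65535 are all hypothetical, and the actual saturation value depends on the scanner and Feature Extraction settings.

```python
from collections import defaultdict
from statistics import median

# Hypothetical saturation ceiling (assumption; not stated in the text).
SATURATION = 65535


def count_saturated(spots, ceiling=SATURATION):
    """Count spots whose raw intensity is at or above the ceiling."""
    return sum(1 for _, intensity in spots if intensity >= ceiling)


def pick_least_saturated(scans, ceiling=SATURATION):
    """Return the name of the scan with the fewest saturated spots.

    scans maps a scan name (e.g. a PMT setting) to a list of
    (probe_sequence, intensity) pairs.
    """
    return min(scans, key=lambda name: count_saturated(scans[name], ceiling))


def median_by_sequence(spots):
    """Collapse replicate spots: median intensity per identical probe sequence."""
    grouped = defaultdict(list)
    for sequence, intensity in spots:
        grouped[sequence].append(intensity)
    return {sequence: median(values) for sequence, values in grouped.items()}


# Toy example with made-up intensities for two scans of the same array.
scans = {
    "pmt30": [("GGGTGGGTGGGTGGG", 100), ("GGGTGGGTGGGTGGG", 300), ("TTTTTTTT", 50)],
    "pmt100": [("GGGTGGGTGGGTGGG", 65535), ("GGGTGGGTGGGTGGG", 65535), ("TTTTTTTT", 500)],
}
best = pick_least_saturated(scans)
feature_medians = median_by_sequence(scans[best])
print(best, feature_medians)
```

Taking the median rather than the mean keeps a single aberrant spot from dominating the summary value for a probe sequence.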

To gauge the correlation between G4-seq and our microarray data, we used G4detector⁴² with a pretrained model on human genomic G4s stabilized by K⁺ and PDS with randomized negative genomic sequence.²⁸ For each microarray probe sequence, we used G4detector to predict the probability of it being a G4, i.e., a number between 0 and 1. We normalized both the measured array data (Design 3, PDS) and predictions using the following

Y_{i} = \log(1 + X_{i} - \min(X)) \qquad (1)

where X is the vector of array intensity measurements or G4 probability predictions. We reported the Pearson correlation between log normalized predicted probabilities and log normalized intensities.
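A minimal sketch of eq 1 followed by the Pearson correlation is given below. The function names and the toy input values are hypothetical, and the natural logarithm is an assumption, since the base is not specified; the choice of base only rescales Y by a constant and therefore does not change the Pearson correlation.

```python
import math


def log_normalize(values):
    """Apply eq 1: Y_i = log(1 + X_i - min(X)), so the smallest value maps to 0."""
    lowest = min(values)
    return [math.log(1.0 + v - lowest) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy inputs (made up): median array intensities and predicted G4
# probabilities for the same probes, in the same order.
intensities = [120.0, 950.0, 15000.0, 40000.0, 300.0]
probabilities = [0.05, 0.40, 0.92, 0.99, 0.10]

r = pearson_r(log_normalize(intensities), log_normalize(probabilities))
print(round(r, 3))
```

Subtracting the minimum before taking the logarithm places both vectors on a scale that starts at zero, which avoids taking the logarithm of zero or of very small probabilities.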

Supplementary Material

Table S6: NIHMS1590673-supplement-Table_S6.xlsx (15 MB, xlsx)

Supplementary Material: NIHMS1590673-supplement-Supplementary_Material.pdf (22.3 MB, pdf)

Tables S1–S5, S7–S8: NIHMS1590673-supplement-Table_S1-S5__S7-S8.xlsx (46.8 KB, xlsx)

ACKNOWLEDGMENTS

We thank M. Banco and N. Demeshkina from the Ferré-D’Amaré Lab (National Institutes of Health, Bethesda) for the purified DHX36 protein.

Funding

This work was supported by the intramural program of the National Institutes of Health.

Footnotes

The authors declare no competing financial interest.

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acschembio.9b00934.

Structure of Cy5-PDS (Figure S1), Western blots of all in vitro translation (IVT) reactions for all proteins used in this study (Figure S2), summary of all pairwise Pearson correlations for all experiments (Figure S3), effect of concentration on PDS-binding intensities (Figure S4), distribution of BG4 intensities on design 3 microarray at different concentrations (Figure S5), Cy5-PDS binding to dsDNA (Figure S6), summary of protein binding to G4 microarrays (Figure S7), summary of normalized-binding intensities for Design 1 and Design 2 microarray experiments (Figure S8), Cy5-PDS preferentially binds parallel G4s (Figure S9), comparison of Cy5-PDS binding to features on Design 3 microarray versus all other proteins (Figure S10), effects of single nucleotide variants on Cy5-PDS binding to several G4 sequences (Figure S11), effects of variants on Cy5-PDS binding to G4 sequences that contain more than 4 G-tracts (Figure S12), effects of single nucleotide variations of known G4s on protein binding (Figure S13), relationship between loop length for MYC Pu22 G4 sequence and fluorescence intensity for all measured molecules (Figure S14), motif logos for molecules binding loop length variants of the MYC Pu22 G4 (Figure S15), effects of unlabeled PDS or TMPyP4 on Cy5-PDS binding (Figure S16), competition between Cy5-PDS and DC-34 (Figure S17), DC-34 binding versus pyridostatin for different sequence sets (Figure S18), and correlation of array data (Cy5-PDS) with predictive model of G4 formation potential (Figure S19) (PDF)

Protein constructs used (Table S1), list of G4 sequences from oncogene promoters (Table S2), description of features on all three microarray designs (Tables S3–S5), fold-range in binding for all sequence sets and all array designs (Table S7), and summary of decrease in binding in lithium for the design 3 array (Table S8) (XLSX)

All fluorescent-binding intensities for all three array designs (Table S6) (XLSX)

Data Availability: Data (array probe sequences, raw probe intensities, and median feature intensities) are available at the NCBI GEO database under accession no. GSE133368.

Contributor Information

Sreejana Ray, Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.

Desiree Tillo, Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.

Robert E. Boer, Chemical Biology Laboratory, National Cancer Institute-Frederick, Frederick, Maryland 21702, United States.

Nima Assad, Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.

Mira Barshai, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.

Guanhui Wu, Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, West Lafayette, Indiana 47907, United States.

Yaron Orenstein, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.

Danzhou Yang, Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, West Lafayette, Indiana 47907, United States.

John S. Schneekloth, Jr., Chemical Biology Laboratory, National Cancer Institute-Frederick, Frederick, Maryland 21702, United States.

Charles Vinson, Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.

REFERENCES

(1) Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G, Walhout AJM, Bouget FY, Ratsch G, Larrondo LF, Ecker JR, and Hughes TR (2014) Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443.
(2) Guiblet WM, Cremona MA, Cechova M, Harris RS, Kejnovska I, Kejnovsky E, Eckert K, Chiaromonte F, and Makova KD (2018) Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate. Genome Res. 28, 1767–1778.
(3) Ashton NW, Bolderson E, Cubeddu L, O’Byrne KJ, and Richard DJ (2013) Human single-stranded DNA binding proteins are essential for maintaining genomic stability. BMC Mol. Biol. 14, 9.
(4) Mishra SK, Tawani A, Mishra A, and Kumar A (2016) G4IPDB: A database for G-quadruplex structure forming nucleic acid interacting proteins. Sci. Rep. 6, 38144.
(5) Gellert M, Lipsett MN, and Davies DR (1962) Helix formation by guanylic acid. Proc. Natl. Acad. Sci. U. S. A. 48, 2013–2018.
(6) Rhodes D, and Lipps HJ (2015) G-quadruplexes and their regulatory roles in biology. Nucleic Acids Res. 43, 8627–8637.
(7) Hansel-Hertsch R, Di Antonio M, and Balasubramanian S (2017) DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential. Nat. Rev. Mol. Cell Biol. 18, 279–284.
(8) Konig SL, Evans AC, and Huppert JL (2010) Seven essential questions on G-quadruplexes. Biomol. Concepts 1, 197–213.
(9) Siddiqui-Jain A, Grand CL, Bearss DJ, and Hurley LH (2002) Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. U. S. A. 99, 11593–11598.
(10) Dai J, Dexheimer TS, Chen D, Carver M, Ambrus A, Jones RA, and Yang D (2006) An intramolecular G-quadruplex structure with mixed parallel/antiparallel G-strands formed in the human BCL-2 promoter region in solution. J. Am. Chem. Soc. 128, 1096–1098.
(11) Yang D, and Okamoto K (2010) Structural insights into G-quadruplexes: towards new anticancer drugs. Future Med. Chem. 2, 619–646.
(12) Neidle S (2016) Quadruplex Nucleic Acids as Novel Therapeutic Targets. J. Med. Chem. 59, 5987–6011.
(13) Rodriguez R, Muller S, Yeoman JA, Trentesaux C, Riou JF, and Balasubramanian S (2008) A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J. Am. Chem. Soc. 130, 15758–15759.
(14) Parkinson GN, Ghosh R, and Neidle S (2007) Structural basis for binding of porphyrin to human telomeres. Biochemistry 46, 2390–2397.
(15) Calabrese DR, Chen X, Leon EC, Gaikwad SM, Phyo Z, Hewitt WM, Alden S, Hilimire TA, He F, Michalowski AM, Simmons JK, Saunders LB, Zhang S, Connors D, Walters KJ, Mock BA, and Schneekloth JS Jr. (2018) Chemical and structural studies provide a mechanistic basis for recognition of the MYC G-quadruplex. Nat. Commun. 9, 4229.
(16) Mendoza O, Bourdoncle A, Boule JB, Brosh RM Jr., and Mergny JL (2016) G-quadruplexes and helicases. Nucleic Acids Res. 44, 1989–2006.
(17) Gonzalez V, Guo K, Hurley L, and Sun D (2009) Identification and characterization of nucleolin as a c-myc G-quadruplex-binding protein. J. Biol. Chem. 284, 23622–23635.
(18) Connor AC, Frederick KA, Morgan EJ, and McGown LB (2006) Insulin capture by an insulin-linked polymorphic region G-quadruplex DNA oligonucleotide. J. Am. Chem. Soc. 128, 4986–4991.
(19) Armas P, Nasif S, and Calcaterra NB (2008) Cellular nucleic acid binding protein binds G-rich single-stranded nucleic acids and may function as a nucleic acid chaperone. J. Cell. Biochem. 103, 1013–1036.
(20) Kouzine F, Wojtowicz D, Baranello L, Yamane A, Nelson S, Resch W, Kieffer-Kwon KR, Benham CJ, Casellas R, Przytycka TM, and Levens D (2017) Permanganate/S1 Nuclease Footprinting Reveals Non-B DNA Structures with Regulatory Potential across a Mammalian Genome. Cell Syst. 4, 344–356.e347.
(21) Mao SQ, Ghanbarian AT, Spiegel J, Martinez Cuesta S, Beraldi D, Di Antonio M, Marsico G, Hansel-Hertsch R, Tannahill D, and Balasubramanian S (2018) DNA G-quadruplex structures mold the DNA methylome. Nat. Struct. Mol. Biol. 25, 951–957.
(22) Stewart AJ, Hannenhalli S, and Plotkin JB (2012) Why transcription factor binding sites are ten nucleotides long. Genetics 192, 973–985.
(23) Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang CF, Coburn D, Newburger DE, Morris Q, Hughes TR, and Bulyk ML (2009) Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723.
(24) Berger MF, and Bulyk ML (2009) Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protoc. 4, 393–411.
(25) Varizhuk A, Ilyinsky N, Smirnov I, and Pozmogova G (2016) G4 Aptamers: Trends in Structural Design. Mini-Rev. Med. Chem. 16, 1321–1329.
(26) Iida K, Nakamura T, Yoshida W, Tera M, Nakabayashi K, Hata K, Ikebukuro K, and Nagasawa K (2013) Fluorescent-ligand-mediated screening of G-quadruplex structures using a DNA microarray. Angew. Chem. Int. Ed. 52, 12052–12055.
(27) Weitzmann MN, Woodford KJ, and Usdin K (1996) The development and use of a DNA polymerase arrest assay for the evaluation of parameters affecting intrastrand tetraplex formation. J. Biol. Chem. 271, 20958–20964.
(28) Chambers VS, Marsico G, Boutell JM, Di Antonio M, Smith GP, and Balasubramanian S (2015) High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 33, 877–881.
(29) Andrilenas KK, Penvose A, and Siggers T (2015) Using protein-binding microarrays to study transcription factor specificity: homologs, isoforms and complexes. Briefings Funct. Genomics 14, 17–29.
(30) Bedrat A, Lacroix L, and Mergny JL (2016) Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 44, 1746–1759.
(31) Biffi G, Tannahill D, McCafferty J, and Balasubramanian S (2013) Quantitative visualization of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–186.
(32) Bhattacharyya D, Mirihana Arachchilage G, and Basu S (2016) Metal Cations in G-Quadruplex Folding and Stability. Front. Chem. 4, 38.
(33) Khund-Sayeed S, He X, Holzberg T, Wang J, Rajagopal D, Upadhyay S, Durell SR, Mukherjee S, Weirauch MT, Rose R, and Vinson C (2016) 5-Hydroxymethylcytosine in E-box motifs ACATIGTG and ACACIGTG increases DNA-binding of the B-HLH transcription factor TCF4. Integr. Biol. (Camb) 8, 936–945.
(34) Adrian M, Ang DJ, Lech CJ, Heddi B, Nicolas A, and Phan AT (2014) Structure and conformational dynamics of a stacked dimeric G-quadruplex formed by the human CEB1 minisatellite. J. Am. Chem. Soc. 136, 6297–6305.
(35) Ambrus A, Chen D, Dai J, Jones RA, and Yang D (2005) Solution structure of the biologically relevant G-quadruplex element in the human c-MYC promoter. Implications for G-quadruplex stabilization. Biochemistry 44, 2048–2058.
(36) Dai J, Carver M, Hurley LH, and Yang D (2011) Solution structure of a 2:1 quindoline-c-MYC G-quadruplex: insights into G-quadruplex-interactive small molecule drug design. J. Am. Chem. Soc. 133, 17673–17680.
(37) Onel B, Carver M, Wu G, Timonina D, Kalarn S, Larriva M, and Yang D (2016) A New G-Quadruplex with Hairpin Loop Immediately Upstream of the Human BCL2 P1 Promoter Modulates Transcription. J. Am. Chem. Soc. 138, 2563–2570.
(38) Dai J, Carver M, and Yang D (2008) Polymorphism of human telomeric quadruplex structures. Biochimie 90, 1172–1183.
(39) Kim M, Kreig A, Lee CY, Rube HT, Calvert J, Song JS, and Myong S (2016) Quantitative analysis and prediction of G-quadruplex forming sequences in double-stranded DNA. Nucleic Acids Res. 44, 4807–4817.
(40) Cheng M, Cheng Y, Hao J, Jia G, Zhou J, Mergny JL, and Li C (2018) Loop permutation affects the topology and stability of G-quadruplexes. Nucleic Acids Res. 46, 9264–9275.
(41) Chen MC, Tippana R, Demeshkina NA, Murat P, Balasubramanian S, Myong S, and Ferre-D’Amare AR (2018) Structural basis of G-quadruplex unfolding by the DEAH/RHA helicase DHX36. Nature 558, 465–469.
(42) Barshai M, and Orenstein Y (2019) Predicting G-Quadruplexes from DNA Sequences Using Multi-Kernel Convolutional Neural Networks, In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp 357–365, Association for Computing Machinery, Niagara Falls, NY, USA, DOI: 10.1145/3307339.3342133.
(43) Levens D, Baranello L, and Kouzine F (2016) Controlling gene expression by DNA mechanics: emerging insights and challenges. Biophys. Rev. 8, 23–32.
(44) Chao W, and D’Amore PA (2008) IGF2: epigenetic regulation and role in development and disease. Cytokine Growth Factor Rev. 19, 111–120.
(45) Chapple CE, Robisson B, Spinelli L, Guien C, Becker E, and Brun C (2015) Extreme multifunctional proteins identified from a human protein interaction network. Nat. Commun. 6, 7412.
(46) Phan AT, and Patel DJ (2003) Two-repeat human telomeric d(TAGGGTTAGGGT) sequence forms interconverting parallel and antiparallel G-quadruplexes in solution: distinct topologies, thermodynamic properties, and folding/unfolding kinetics. J. Am. Chem. Soc. 125, 15021–15027.
(47) Rodriguez R, Miller KM, Forment JV, Bradshaw CR, Nikan M, Britton S, Oelschlaegel T, Xhemalce B, Balasubramanian S, and Jackson SP (2012) Small-molecule-induced DNA damage identifies alternative DNA structures in human genes. Nat. Chem. Biol. 8, 301–310.
(48) Wagih O (2017) ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647.


[R1] (1).Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS,Govindarajan S, Shaulsky G, Walhout AJM, Bouget FY, Ratsch G, Larrondo LF, Ecker JR, and Hughes TR (2014) Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] (2).Guiblet WM, Cremona MA, Cechova M, Harris RS, Kejnovska I, Kejnovsky E, Eckert K, Chiaromonte F, and Makova KD (2018) Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate. Genome Res. 28, 1767–1778. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] (3).Ashton NW, Bolderson E, Cubeddu L, O’Byrne KJ, and Richard DJ (2013) Human single-stranded DNA binding proteins are essential for maintaining genomic stability. BMC Mol. Biol. 14, 9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] (4).Mishra SK, Tawani A, Mishra A, and Kumar A (2016) G4IPDB: A database for G-quadruplex structure forming nucleic acid interacting proteins. Sci. Rep. 6, 38144. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] (5).Gellert M, Lipsett MN, and Davies DR (1962) Helix formation by guanylic acid. Proc. Natl. Acad. Sci. U. S. A. 48, 2013–2018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] (6).Rhodes D, and Lipps HJ (2015) G-quadruplexes and their regulatory roles in biology. Nucleic Acids Res. 43, 8627–8637. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] (7).Hansel-Hertsch R, Di Antonio M, and Balasubramanian S (2017) DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential. Nat. Rev. Mol. Cell Biol. 18, 279–284. [DOI] [PubMed] [Google Scholar]

[R8] (8).Konig SL, Evans AC, and Huppert JL (2010) Seven essential questions on G-quadruplexes. Biomol. Concepts 1, 197–213. [DOI] [PubMed] [Google Scholar]

[R9] (9).Siddiqui-Jain A, Grand CL, Bearss DJ, and Hurley LH (2002) Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci U. S. A. 99, 11593–11598. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] (10).Dai J, Dexheimer TS, Chen D, Carver M, Ambrus A, Jones RA, and Yang D (2006) An intramolecular G-quadruplex structure with mixed parallel/antiparallel G-strands formed in the human BCL-2 promoter region in solution. J. Am. Chem. Soc. 128, 1096–1098. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] (11).Yang D, and Okamoto K (2010) Structural insights into G- quadruplexes: towards new anticancer drugs. Future Med. Chem. 2, 619–646. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] (12).Neidle S (2016) Quadruplex Nucleic Acids as Novel Therapeutic Targets. J. Med. Chem. 59, 5987–6011. [DOI] [PubMed] [Google Scholar]

[R13] (13).Rodriguez R, Muller S, Yeoman JA, Trentesaux C, Riou JF, and Balasubramanian S (2008) A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J.Am. Chem. Soc. 130, 15758–15759. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] (14).Parkinson GN, Ghosh R, and Neidle S (2007) Structural basis for binding of porphyrin to human telomeres. Biochemistry 46, 2390–2397. [DOI] [PubMed] [Google Scholar]

[R15] (15).Calabrese DR, Chen X, Leon EC, Gaikwad SM, Phyo Z, Hewitt WM, Alden S, Hilimire TA, He F, Michalowski AM, Simmons JK, Saunders LB, Zhang S, Connors D, Walters KJ, Mock BA, and Schneekloth JS Jr. (2018) Chemical and structural studies provide a mechanistic basis for recognition of the MYC G-quadruplex. Nat. Commun. 9, 4229. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] (16).Mendoza O, Bourdoncle A, Boule JB, Brosh RM Jr., and Mergny JL (2016) G-quadruplexes and helicases. Nucleic Acids Res. 44, 1989–2006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] (17).Gonzalez V, Guo K, Hurley L, and Sun D (2009) Identification and characterization of nucleolin as a c-myc G- quadruplex-binding protein. J. Biol. Chem. 284, 23622–23635. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] (18).Connor AC, Frederick KA, Morgan EJ, and McGown LB (2006) Insulin capture by an insulin-linked polymorphic region G- quadruplex DNA oligonucleotide. J. Am. Chem. Soc. 128, 4986–4991. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] (19).Armas P, Nasif S, and Calcaterra NB (2008) Cellular nucleic acid binding protein binds G-rich single-stranded nucleic acids and may function as a nucleic acid chaperone. J. Cell. Biochem. 103, 1013–1036. [DOI] [PubMed] [Google Scholar]

[R20] (20).Kouzine F, Wojtowicz D, Baranello L, Yamane A, Nelson S,Resch W, Kieffer-Kwon KR, Benham CJ, Casellas R, Przytycka TM, and Levens D (2017) Permanganate/S1 Nuclease Footprinting Reveals Non-B DNA Structures with Regulatory Potential across a Mammalian Genome. Cell Syst 4, 344–356.e347. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] (21).Mao SQ, Ghanbarian AT, Spiegel J, Martinez Cuesta S, Beraldi D, Di Antonio M, Marsico G, Hansel-Hertsch R, Tannahill D, and Balasubramanian S (2018) DNA G-quadruplex structures mold the DNA methylome. Nat. Struct. Mol. Biol. 25, 951–957. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] (22).Stewart AJ, Hannenhalli S, and Plotkin JB (2012) Why transcription factor binding sites are ten nucleotides long. Genetics 192, 973–985. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] (23).Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang CF, Coburn D, Newburger DE, Morris Q, Hughes TR, and Bulyk ML (2009) Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] (24).Berger MF, and Bulyk ML (2009) Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protoc. 4, 393–411. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] (25).Varizhuk A, Ilyinsky N, Smirnov I, and Pozmogova G (2016) G4 Aptamers: Trends in Structural Design. Mini-Rev. Med. Chem. 16, 1321–1329. [DOI] [PubMed] [Google Scholar]

[R26] (26).Iida K, Nakamura T, Yoshida W, Tera M, Nakabayashi K, Hata K, Ikebukuro K, and Nagasawa K (2013) Fluorescent-ligand- mediated screening of G-quadruplex structures using a DNA microarray. Angew. Chem. Int. Ed. 52, 12052–12055. [DOI] [PubMed] [Google Scholar]

[R27] (27).Weitzmann MN, Woodford KJ, and Usdin K (1996) The development and use of a DNA polymerase arrest assay for the evaluation of parameters affecting intrastrand tetraplex formation. J. Biol. Chem. 271, 20958–20964. [DOI] [PubMed] [Google Scholar]

[R28] (28).Chambers VS, Marsico G, Boutell JM, Di Antonio M, Smith GP, and Balasubramanian S (2015) High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol 33, 877–881. [DOI] [PubMed] [Google Scholar]

[R29] (29).Andrilenas KK, Penvose A, and Siggers T (2015) Using protein-binding microarrays to study transcription factor specificity: homologs, isoforms and complexes. Briefings Funct. Genomics 14, 17–29. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] (30).Bedrat A, Lacroix L, and Mergny JL (2016) Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 44, 1746–1759. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] (31).Biffi G, Tannahill D, McCafferty J, and Balasubramanian S (2013) Quantitative visualization of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–186. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] (32).Bhattacharyya D, Mirihana Arachchilage G, and Basu S (2016) Metal Cations in G-Quadruplex Folding and Stability. Front. Chem. 4, 38. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] (33).Khund-Sayeed S, He X, Holzberg T, Wang J, Rajagopal D, Upadhyay S, Durell SR, Mukherjee S, Weirauch MT, Rose R, and Vinson C (2016) 5-Hydroxymethylcytosine in E-box motifs ACATIGTG and ACACIGTG increases DNA-binding of the B-HLH transcription factor TCF4. Integr Biol. (Camb) 8, 936–945. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] (34).Adrian M, Ang DJ, Lech CJ, Heddi B, Nicolas A, and Phan AT (2014) Structure and conformational dynamics of a stacked dimeric G-quadruplex formed by the human CEB1 minisatellite. J. Am. Chem. Soc. 136, 6297–6305. [DOI] [PubMed] [Google Scholar]

[R35] (35).Ambrus A, Chen D, Dai J, Jones RA, and Yang D (2005) Solution structure of the biologically relevant G-quadruplex element in the human c-MYC promoter. Implications for G-quadruplex stabilization. Biochemistry 44, 2048–2058. [DOI] [PubMed] [Google Scholar]

[R36] (36).Dai J, Carver M, Hurley LH, and Yang D (2011) Solution structure of a 2:1 quindoline-c-MYC G-quadruplex: insights into G- quadruplex-interactive small molecule drug design. J. Am. Chem. Soc. 133, 17673–17680. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] (37).Onel B, Carver M, Wu G, Timonina D, Kalarn S, Larriva M, and Yang D (2016) A New G-Quadruplex with Hairpin Loop Immediately Upstream of the Human BCL2 P1 Promoter Modulates Transcription. J. Am. Chem. Soc. 138, 2563–2570. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] (38).Dai J, Carver M, and Yang D (2008) Polymorphism of human telomeric quadruplex structures. Biochimie 90, 1172–1183. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] (39).Kim M, Kreig A, Lee CY, Rube HT, Calvert J, Song JS,and Myong S (2016) Quantitative analysis and prediction of G- quadruplex forming sequences in double-stranded DNA. Nucleic Acids Res. 44, 4807–4817. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] (40).Cheng M, Cheng Y, Hao J, Jia G, Zhou J, Mergny JL, and Li C (2018) Loop permutation affects the topology and stability of G-quadruplexes. Nucleic Acids Res. 46, 9264–9275. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] (41).Chen MC, Tippana R, Demeshkina NA, Murat P, Balasubramanian S, Myong S, and Ferre-D’Amare AR (2018) Structural basis of G-quadruplex unfolding by the DEAH/RHA helicase DHX36. Nature 558, 465–469. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] (42).Barshai M, and Orenstein Y (2019) Predicting G- Quadruplexes from DNA Sequences Using Multi-Kernel Convolu-tional Neural Networks, In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp 357–365, Association for Computing Machinery, Niagara Falls, NY, USA, DOI: 10.1145/3307339.3342133. [DOI] [Google Scholar]

[R43] (43).Levens D, Baranello L, and Kouzine F (2016) Controlling gene expression by DNA mechanics: emerging insights and challenges. Biophys. Rev. 8, 23–32. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] (44).Chao W, and D’Amore PA (2008) IGF2: epigenetic regulation and role in development and disease. Cytokine Growth Factor Rev. 19, 111–120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R45] (45).Chapple CE, Robisson B, Spinelli L, Guien C, Becker E, and Brun C (2015) Extreme multifunctional proteins identified from a human protein interaction network. Nat. Commun. 6, 7412. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R46] (46).Phan AT, and Patel DJ (2003) Two-repeat human telomeric d(TAGGGTTAGGGT) sequence forms interconverting parallel and antiparallel G-quadruplexes in solution: distinct topologies, thermodynamic properties, and folding/unfolding kinetics. J. Am. Chem. Soc. 125, 15021–15027.

[R47] (47).Rodriguez R, Miller KM, Forment JV, Bradshaw CR, Nikan M, Britton S, Oelschlaegel T, Xhemalce B, Balasubramanian S, and Jackson SP (2012) Small-molecule-induced DNA damage identifies alternative DNA structures in human genes. Nat. Chem. Biol. 8, 301–310.

[R48] (48).Wagih O (2017) ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647.
