Abstract
Determining the sequence specificity of DNA binding molecules is a nontrivial task. Here we describe the development of a platform for assaying the sequence specificity of DNA ligands using label-free detection on high-density DNA microarrays. This is achieved by combining Cognate Site Identifier (CSI) arrays with the Fluorescent Intercalator Displacement (FID) assay to create CSI-FID. We use the well-studied small molecule DNA ligand netropsin to develop this high throughput platform. Analysis of the DNA binding properties of protein- and small molecule-based libraries with CSI-FID will advance the development of genome-anchored molecules for therapeutic purposes.
A critical challenge at the interface of biology, chemistry, and molecular medicine is developing highly specific small molecules that target the genome to regulate its function1–10. A greater understanding of the principles that govern specificity will enhance our ability to predict the biological action of such molecules on genomes, advancing the development of genome-anchored therapeutics. Similarly, understanding natural DNA binding proteins will help elucidate their regulatory function in cells. Given its importance, many methods have been developed to assess the binding of small molecules and proteins to DNA, including low throughput methods, such as nuclease protection11, 12, affinity cleavage13, and electrophoretic mobility shift assays14, 15 (EMSAs), mid throughput assays including fluorescence anisotropy16 and fluorescence resonance energy transfer17, label-free methods such as surface plasmon resonance (SPR)18 and photonics based approaches19, and high throughput assays including SELEX20 and DNA microarrays21–24.
Among these, two high throughput methods that can determine the DNA binding specificity of biomolecules or synthetic ligands in a rapid and unbiased manner are the Cognate Site Identifier (CSI) arrays and the Fluorescent Intercalator Displacement (FID) assay (Fig 1A)23, 25, 26. CSI arrays determine the specificity and affinity of DNA ligands using a microarray displaying double-stranded DNA (dsDNA) hairpin oligonucleotides containing all permutations of up to 12 positional variants (~2 million sequences). For CSI, a fluorescently labeled DNA ligand of interest is applied to the array to provide a distribution of intensities related to DNA binding affinity. Sequences with the highest intensities are evaluated to identify a consensus motif. The array data, therefore, provide the full spectrum of binding specificities across the entire sequence space of a 12mer.
The FID assay is a plate-based technique that measures the amount of ligand-induced displacement of an intercalated dye (commonly ethidium bromide, EtBr) from a DNA hairpin to determine the sequence preference of an unlabeled DNA binding ligand. The assay is performed in solution and provides a rapid means of measuring binding affinities (Fig 1A).
Both CSI and FID assays have been used successfully to determine the specificity and affinity of several DNA binding small molecules, as well as triplex forming oligonucleotides, proteins, and polyamides23, 26–30. These methods offer complementary strengths toward the goal of understanding DNA ligand specificity. CSI can be used to interrogate the entire sequence space of at least 12mer DNA binding sites with a high dynamic range, and FID offers the benefit of label-free detection. Both approaches can also be used to determine DNA dissociation constants for the DNA ligand of interest23, 25–28. However, with CSI the detection of DNA binding factors on the dsDNA array depends on either direct fluorescent labeling or indirect detection methods. This hinders its ability to analyze the DNA binding of unlabeled proteins and small molecules. In addition, labeling small molecules or proteins with fluorescent tags may perturb their DNA binding properties. With FID, the major limitation is the DNA library size, with most assays being performed on all permutations of 5mer DNA (512 sequences). Larger DNA binding sites are possible; however, these assays typically use only a subset of the total library members to avoid an exponential increase in the cost of synthesis and purification of complex (>5mer) libraries. By combining CSI and FID we can overcome the limitations of both, using label-free detection of ligand-DNA interactions on arrays with a 3–4 orders of magnitude increase in DNA library complexity; this is the genesis of CSI-FID (Fig 1B). Using CSI-FID, we examined the comprehensive binding profile of netropsin, a minor groove DNA binding small molecule that exhibits antiviral and antitumor properties (Fig 2A)31. Netropsin has served as a model for a class of sequence specific minor groove DNA binding agents32, 33, and previous FID assays have shown that it is an excellent candidate for assay validation of CSI-FID26.
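The library sizes quoted for these assays can be checked with a short enumeration. The sketch below is illustrative only: it assumes that the 512-member 5mer library counts a site and its reverse complement as the same duplex (4^5 is 1,024 before that collapse), and the function names are hypothetical rather than taken from any published analysis code.

```python
from itertools import product

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one representative for a site and its reverse complement."""
    return min(seq, reverse_complement(seq))


def unique_duplex_sites(k: int) -> int:
    """Count k-mers that remain distinct once both strands are considered.

    Assumption: a site and its reverse complement describe the same duplex.
    """
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})


if __name__ == "__main__":
    for k in (4, 5, 9):
        print(k, 4 ** k, unique_duplex_sites(k))
```

Under this assumption the 5mer count reproduces the 512 sequences cited for FID, and the same collapse applied to 4mers gives 136 sites.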
Development and optimization of CSI-FID
Successful implementation of CSI-FID arrays required adapting the dye displacement ability of FID for a CSI microarray platform. The intercalating dye EtBr was chosen because it has several desirable properties: it increases in fluorescence upon DNA binding, thereby decreasing background signal, equilibrates rapidly with DNA, and has low sequence specificity26, 34. For the CSI-FID assay we designed an array that contained all possible 9 base pair (bp) DNA permutations. This makes the low sequence specificity of EtBr an especially important property for the optimal performance of the assay, as the EtBr dye should be bound to each dsDNA probe. Initial titrations of EtBr in binding buffer (100 mM NaCl, 100 mM Tris pH 8.0) with the CSI-FID arrays indicated an optimal range of 3–6 μM EtBr, above its micromolar KD 35. This EtBr concentration generated a 10-fold signal-to-noise (S/N) ratio for the intensity of EtBr binding to dsDNA over the array surface. Higher concentrations of EtBr increased surface binding and decreased the S/N ratio, while lower concentrations showed minimal dsDNA binding.
CSI-FID for the analysis of netropsin DNA binding
To assess assay performance, netropsin DNA binding was examined using a CSI-FID array. First, EtBr at 6 μM was incubated with the array for 1 hour. Subsequently, 3 μM netropsin (KD of 1–100 nM26, 36) was added to the EtBr solution, and the array was incubated for another hour to allow the binding reaction to reach equilibrium. To account for any sequence bias of EtBr, and as a control for displacement, a second array was run with EtBr alone. The CSI-FID arrays were then imaged with a readily available 5 micron microarray scanner using a standard 532 nm (Cy3 compatible) laser.
For both arrays we obtained a distribution of intensities indicating EtBr bound to the dsDNA probes. A comparison of the histograms shows that the EtBr-only array has a Gaussian distribution centered on 100% EtBr binding (Fig 2B), whereas the netropsin array shows a subset of sequences with a distinct decrease in EtBr binding (Fig 2B, yellow bars on left of center), demonstrating the sequence dependent displacement of EtBr by netropsin. Assuming full displacement of EtBr from the netropsin binding site, a fluorescence decrease of 23–29% was expected, given a 4–5 bp netropsin binding site within the 17 bp duplex of each DNA hairpin (9 bp of variable region plus 4 bp of constant flanking sequence on either side). The data indicate that a displacement of 20% was obtained for the best netropsin probes, close to the maximum expected for this array design. Further analysis of the netropsin CSI-FID data indicated that a library of 7mer sequences (16,384 members) was sufficient to represent the full binding profile of netropsin. Using 9mer arrays therefore had the added benefit of increasing the number of internal replicates and adding greater sequence context for each 7mer probe. When the netropsin displacement data are plotted, there is a clear preference for netropsin binding to AT-rich DNA (Fig 2C). A sequence motif obtained from the strongest netropsin binding sites further confirms this result (Fig 2D).
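The expected 23–29% signal loss follows from simple proportions, which the sketch below reproduces. It rests on an explicit simplifying assumption: EtBr fluorescence is spread evenly over the 17 bp duplex and is lost in proportion to the base pairs covered by the ligand. The function names are illustrative and not part of the original analysis.

```python
VARIABLE_BP = 9   # variable region of each hairpin probe
FLANK_BP = 4      # constant flanking sequence on each side
DUPLEX_BP = VARIABLE_BP + 2 * FLANK_BP  # 17 bp in total


def expected_displacement(site_bp: int, duplex_bp: int = DUPLEX_BP) -> float:
    """Fraction of EtBr signal lost if the ligand clears site_bp base pairs.

    Assumption: dye is distributed uniformly along the duplex.
    """
    return site_bp / duplex_bp


def windows_per_probe(probe_bp: int, site_bp: int) -> int:
    """Number of distinct positions a site of site_bp can occupy in a probe."""
    return probe_bp - site_bp + 1


if __name__ == "__main__":
    for site in (4, 5):
        print(f"{site} bp site: {expected_displacement(site):.1%} expected loss")
    print("7mer windows in a 9mer probe:", windows_per_probe(VARIABLE_BP, 7))
    print("distinct 7mers:", 4 ** 7)
```

Each 9mer variable region contains three overlapping 7mer windows, which is one way to see why the 9mer array provides internal replicates and extra sequence context for every 7mer.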
Sequence specificity landscapes reveal insight into netropsin binding specificity
To further distill the sequence binding preferences of netropsin, we displayed all binding intensities in a sequence specificity landscape (SSL) format (Fig 2E)37. SSLs display all sequences that are a perfect match for a chosen motif in the innermost ring, and subsequent rings display those sequences that contain mismatches to the chosen motif. The height of each peak corresponds to the fluorescent displacement (FD) of sequences on the CSI-FID array. The SSL display allows an unbiased and comprehensive analysis of the entire binding dataset, which is particularly beneficial for netropsin because most motif finding algorithms are unable to identify motifs using input sequences shorter than 8 bp38, 39.
For netropsin, the highest peaks (Fig 2E, red to yellow) are present in the innermost ring, indicating that netropsin prefers DNA regions with a high AT content (perfect match to consensus 5′-WWWWWW-3′). Interestingly, in the 1 mismatch ring, the majority of the ridgeline (Fig 2E, light blue) contains DNA sequences with stretches of at least 4 contiguous AT bp (WSWWWW or SWWWWW), with some higher peaks interspersed on the ridge (Fig 2E). However, the sequences present in the valley regions (dark blue, red arrows in Fig 2E) are dominated by DNA sequences with only 2 to 3 bp AT stretches (WWSWWW or WWWSWW). These results indicate a strong preference of netropsin for DNA with at least 4 contiguous AT bp and agree well with previous studies on the sequence specificity of netropsin13, 26, 31.
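The ring assignment and the AT-stretch argument can be illustrated with a small sketch. It is a simplified reading of the SSL idea, not the published SSL software: a sequence is placed in the ring given by its fewest mismatches to the motif over all windows, and only the IUPAC codes needed here are included. All names are hypothetical.

```python
# Minimal IUPAC table: W = A or T (weak), S = C or G (strong).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "S": "CG", "N": "ACGT"}


def mismatches(site: str, motif: str) -> int:
    """Count positions where the site violates the motif's IUPAC code."""
    return sum(base not in IUPAC[code] for base, code in zip(site, motif))


def ssl_ring(seq: str, motif: str = "WWWWWW") -> int:
    """Fewest mismatches to the motif over every window of the sequence."""
    n = len(motif)
    if len(seq) < n:
        raise ValueError("sequence is shorter than the motif")
    return min(mismatches(seq[i:i + n], motif)
               for i in range(len(seq) - n + 1))


def longest_at_stretch(seq: str) -> int:
    """Length of the longest run of contiguous A/T base pairs."""
    best = run = 0
    for base in seq:
        run = run + 1 if base in "AT" else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    for seq in ("AATTAA", "AGTTAA", "AAGTTA"):
        print(seq, "ring", ssl_ring(seq), "AT run", longest_at_stretch(seq))
```

In this toy example AGTTAA (pattern WSWWWW) and AAGTTA (pattern WWSWWW) both fall in the 1 mismatch ring, but only the first retains a 4 bp AT stretch, which mirrors the ridge versus valley distinction described above.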
CSI-FID versus solution-based assays
Binding to solid-surface immobilized oligonucleotides, as in CSI-FID, can be affected by mass transfer, probe density, surface characteristics, and washing steps. The data obtained by CSI-FID were therefore compared to previously obtained solution-based netropsin FID data26. For this analysis we parsed both datasets to represent all 4mer binding sites (136 members). Comparisons indicate that a clear correlation exists between the two datasets (R² of 0.76, Fig 3A). Notably, all 10 of the possible 4mer AT-rich sequences are represented in the top 10 binders for both datasets. There is also a distinct step (decrease in affinity) when moving from the top CSI-FID sequences to those containing even one GC bp (Fig 3B). This comparison indicates that surface-tethered probes yield results similar to those of solution-based methodologies. Taken together, these results represent a strong validation of the specificity data obtained by the CSI-FID assay.
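A comparison of this kind can be sketched as follows. The source does not state how the datasets were parsed to 4mers, so the averaging scheme here is only one plausible choice: every probe contributes its signal to each 4mer window it contains, with a site and its reverse complement pooled together. The function names and the toy input are hypothetical.

```python
from collections import defaultdict
from statistics import mean

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq: str) -> str:
    """Pick one representative for a site and its reverse complement."""
    return min(seq, seq.translate(COMPLEMENT)[::-1])


def collapse_to_kmers(probe_signal: dict, k: int = 4) -> dict:
    """Average probe signal over every probe containing each canonical k-mer.

    Assumption: simple unweighted averaging; the original parsing may differ.
    """
    pooled = defaultdict(list)
    for seq, signal in probe_signal.items():
        for i in range(len(seq) - k + 1):
            pooled[canonical(seq[i:i + k])].append(signal)
    return {kmer: mean(values) for kmer, values in pooled.items()}


def r_squared(xs: list, ys: list) -> float:
    """Square of the Pearson correlation coefficient between two lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    toy = {"AATTG": 0.20, "GCGCA": 0.02}  # made-up displacement values
    print(collapse_to_kmers(toy))
    print(r_squared([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]))
```

To compare two assays, the collapsed values for the 4mers shared by both datasets would be passed to `r_squared` in matching order.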
CSI-FID: complex dsDNA libraries and label free detection
While there are many assays available for the study of small molecule-DNA and protein-DNA interactions, CSI-FID surmounts several inherent shortcomings of these techniques. CSI-FID can overcome the throughput and library complexity limitations inherent in other label-free detection assays by providing a high throughput assay capable of assessing ligand binding to large DNA libraries. CSI-FID is a rapid, technically straightforward, cost effective, and adaptable assay for the label-free detection of DNA binding by natural or engineered DNA binding molecules.
In the future, CSI-FID will be applied to additional DNA binding molecules, including complex mixtures of proteins and small molecule ligands. CSI-FID should enhance our ability to determine DNA binding motifs for unlabeled proteins and small molecules, which has direct applications for proteomic approaches and small molecule screening. In this way, CSI-FID will contribute to the understanding of ligand-DNA binding and to the development of genome-anchored therapeutics.
Acknowledgments
We thank Mary Ozers and Christopher Warren for helpful discussions of the manuscript, and Clayton Carlson for assistance with specificity landscapes. We gratefully acknowledge the support of the NIH (AZA: GM069420, DLB: CA041986, CA078045), March of Dimes (AZA: FY07-511), USDA (AZA: HATCH) as well as Shaw Scholar, W. M. Keck Foundation and Vilas Associate awards (A.Z.A.). K.E.H. was supported by an NSERC PGS fellowship.
References and Notes
- 1. Geierstanger BH, Wemmer DE. Annu Rev Biophys Biomol Struct. 1995;24:463. doi: 10.1146/annurev.bb.24.060195.002335.
- 2. Chaires JB. Curr Opin Struct Biol. 1998;8:314. doi: 10.1016/s0959-440x(98)80064-x.
- 3. Yang XL, Wang AHJ. Pharmacol Ther. 1999;83:181. doi: 10.1016/s0163-7258(99)00020-0.
- 4. Gottesfeld JM, Turner JM, Dervan PB. Gene Expr. 2000;9:77. doi: 10.3727/000000001783992696.
- 5. Neidle S. Nat Prod Rep. 2001;18:291. doi: 10.1039/a705982e.
- 6. Ptashne M, Gann A. Genes & Signals. Cold Spring Harbor Laboratory Press; New York: 2002.
- 7. Darnell JE., Jr Nat Rev Cancer. 2002;2:740. doi: 10.1038/nrc906.
- 8. Ansari AZ, Mapp AK. Curr Opin Chem Biol. 2002;6:765. doi: 10.1016/s1367-5931(02)00377-0.
- 9. Waring MJ. Sequence-Specific DNA Binding Agents. Royal Society of Chemistry; Cambridge: 2006.
- 10. Hauschild KE, Carlson CD, Donato LJ, Moretti R, Ansari AZ. In: Wiley Encyclopedia of Chemical Biology. Begley T, editor. John Wiley & Sons, Inc; New York: 2008.
- 11. Galas DJ, Schmitz A. Nucleic Acids Res. 1978;5:3157. doi: 10.1093/nar/5.9.3157.
- 12. Trauger JW, Dervan PB. Methods Enzymol. 2001;340:450. doi: 10.1016/s0076-6879(01)40436-8.
- 13. Dyke MWV, Hertzberg RP, Dervan PB. Proc Natl Acad Sci U S A. 1982;79:5470. doi: 10.1073/pnas.79.18.5470.
- 14. Fried M, Crothers DM. Nucleic Acids Res. 1981;9:6505. doi: 10.1093/nar/9.23.6505.
- 15. Garner MM, Revzin A. Nucleic Acids Res. 1981;9:3047. doi: 10.1093/nar/9.13.3047.
- 16. Heyduk T, Ma Y, Tang H, Ebright RH. Methods Enzymol. 1996;274:492. doi: 10.1016/s0076-6879(96)74039-9.
- 17. Heyduk T, Heyduk E. Nature Biotechnol. 2002;20:171. doi: 10.1038/nbt0202-171.
- 18. Brockman JM, Frutos AG, Corn RM. Anal Biochem. 1993;214:251.
- 19. Chan LL, Pineda M, Heeres JT, Hergenrother PJ, Cunningham BT. ACS Chem Bio. 2008;3:437. doi: 10.1021/cb800057j.
- 20. Tuerk C, Gold L. Science. 1990;249:505. doi: 10.1126/science.2200121.
- 21. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson WJ, Bell SP, Young RA. Science. 2000;290(5500):2306–2309. doi: 10.1126/science.290.5500.2306.
- 22. Wang JK, Li TX, Lu ZH. J Biochem Biophys Methods. 2005;63:100. doi: 10.1016/j.jbbm.2005.03.006.
- 23. Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN, Jr, Ansari AZ. Proc Natl Acad Sci U S A. 2006;103:867. doi: 10.1073/pnas.0509843102.
- 24. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML. Nature Genet. 2004;36:1331. doi: 10.1038/ng1473.
- 25. Tse WC, Boger DL. Acc Chem Res. 2004;37:61. doi: 10.1021/ar030113y.
- 26. Boger DL, Fink BE, Brunette SR, Tse WC, Hedrick MP. J Am Chem Soc. 2001;123:5878. doi: 10.1021/ja010041a.
- 27. Puckett JW, Muzikar KA, Tietjen J, Warren CL, Ansari AZ, Dervan PB. J Am Chem Soc. 2007;129:12310. doi: 10.1021/ja0744899.
- 28. Tse WC, Ishii T, Boger DL. Bioorg Med Chem. 2003;11:4479. doi: 10.1016/s0968-0896(03)00455-3.
- 29. Keles S, Warren CL, Carlson CD, Ansari AZ. Nucleic Acids Res. 2008;36:3171. doi: 10.1093/nar/gkn057.
- 30. Yeung BKS, Tse WC, Boger DL. Bioorg Med Chem Lett. 2003;13:3801. doi: 10.1016/j.bmcl.2003.07.005.
- 31. Kopka ML, Yoon C, Goodsell D, Pjura P, Dickerson RE. Proc Natl Acad Sci U S A. 1985;82:1376. doi: 10.1073/pnas.82.5.1376.
- 32. Dervan PB, Edelson BS. Curr Opin Struct Biol. 2003;13:284. doi: 10.1016/s0959-440x(03)00081-2.
- 33. White S, Szewczyk JW, Turner JM, Baird EE, Dervan PB. Nature. 1998;391:468. doi: 10.1038/35106.
- 34. Waring MJ. J Mol Biol. 1965;13:269. doi: 10.1016/s0022-2836(65)80096-1.
- 35. LePecq JB, Paoletti C. J Mol Biol. 1967;27:87. doi: 10.1016/0022-2836(67)90353-1.
- 36. Rentzeperis D, Marky LA, Dwyer TJ, Geierstanger BH, Pelton JG, Wemmer DE. Biochemistry. 1995;34:2937. doi: 10.1021/bi00009a025.
- 37. Warren CL, Carlson CD, Hauschild KE, Ozers MS, Qadir N, Bhimsaria D, Ansari AZ. doi: 10.1073/pnas.0914023107. Submitted for publication.
- 38. Das MK, Dai HK. BMC Bioinformatics. 2007;8(Suppl 7):S21. doi: 10.1186/1471-2105-8-S7-S21.
- 39. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ. Nat Biotechnol. 2005;23:137. doi: 10.1038/nbt1053.