Abstract
Gateway-compatible yeast one-hybrid (Y1H) assays provide a convenient gene-centered (DNA-to-protein) approach to identify the repertoire of transcription factors that can bind a DNA sequence of interest. We present a set of Y1H resources, including clones for 988 of 1,434 (69%) predicted human transcription factors, for the interrogation of interactions in either low- or high-throughput settings. These approaches detect both known and novel interactions between human DNA regions and transcription factors.
Interactions between regulatory genomic DNA and transcription factors provide the first level of gene control and, therefore, the backbone of gene regulatory networks. Two complementary types of approaches can be used to identify such interactions1. Transcription factor-centered, or protein-to-DNA approaches (e.g. chromatin immunoprecipitation or ChIP) identify genomic regions bound by a transcription factor. Gene-centered, or DNA-to-protein approaches (e.g. Y1H), on the other hand, define the repertoire of transcription factors that can bind a DNA fragment of interest. The advantages and disadvantages of these techniques are discussed elsewhere2–4. In Y1H assays, a target DNA sequence (“DNA bait”) is cloned upstream of two reporter genes (HIS3 and LacZ) to generate two DNA bait::reporter constructs5. After integration of these constructs into the yeast genome to generate a “DNA bait strain”, interacting transcription factors (“protein preys”) can be identified either by screening complex cDNA or transcription factor mini-libraries, or by testing individual protein preys in a directed pair-wise manner5,6. Activation of the HIS3 reporter permits growth on media lacking histidine and containing 3-Amino-1,2,4-Triazole (3AT), a competitive inhibitor of the His3 enzyme, while activation of the LacZ reporter is detected by a colorimetric assay in which beta-galactosidase turns X-gal into a blue compound.
We have previously combined Y1H assays with Gateway cloning to transfer multiple DNA baits in parallel into the two Y1H reporter Destination vectors5 and have applied these assays to delineate C. elegans gene regulatory networks7–10 and to screen Arabidopsis gene promoters11. In an accompanying paper, we describe the development of a novel C. elegans Y1H pipeline, referred to as "enhanced Y1H" or eY1H4. eY1H employs a robotic setup together with an arrayed collection of yeast strains expressing transcription factor preys that can be mated with a DNA bait strain.
Currently, no gene-centered assays are available to identify human DNA-transcription factor interactions in a high-throughput manner. Here, we present a resource of human transcription factor-encoding open reading frames (ORFs) fused to the Gal4 activation domain (Gal4-AD) and apply this collection to several Y1H configurations, including the high-throughput eY1H pipeline, to map human regulatory interactions.
The human genome encodes 1,434 regulatory transcription factors, 1,116 of which are currently available in large clone collections12,13 (Methods and Supplementary Table 1). We transferred these ORFs to the Gal4-AD Y1H prey vector by Gateway cloning and, after sequence verification, obtained 988 full-length transcription factor prey clones (Fig. 1a and Supplementary Table 1). These clones can be transformed directly into DNA bait strains for haploid-based Y1H experiments6. We transformed these clones into the Y1H prey strain to generate a human transcription factor yeast array (Supplementary Table 2) that can be used in small-scale mating-based Y1H experiments as well as in eY1H assays4. We also added 236 clones for unconventional DNA-binding proteins (uDBPs)13 (Supplementary Tables 2 and 3).
Figure 1.
Human gene-centered Y1H assays. (a) Schematic showing generation of the human transcription factor collection. The 1,116 available human transcription factors were transferred to the AD-2µ Y1H prey Destination vector and 988 of the resulting clones were sequence-verified. (b) Examples of the detection of two known interactions in Y1H assays. Only HIS3 activation is shown as the DNA baits exhibit high background levels of LacZ expression (data not shown). P – permissive growth media; H – media for detecting HIS3 activation. (c) Detection of PRS interactions in different Y1H configurations. Orange – eY1H; teal – diploids by mating; green – haploids by transformation; grey – not tested. (d) Example of eY1H readout plate with the HBG1 promoter as DNA bait. Positive interactions are indicated.
To test the use of Y1H assays for the identification of human interactions, we first generated a small positive reference set (PRS) by literature curation (Supplementary Table 4). We predominantly focused on the well-studied beta-globin locus, but also included a few other regulatory regions and gene promoters (Supplementary Table 5). We tested each of the PRS interactions in different types of Y1H assays: in haploids and diploids, at different readout times, and under different Y1H conditions, because it has been demonstrated in other yeast-based assays that varying the assay format and conditions results in a more comprehensive dataset14. Altogether, we could detect 24 out of 31 known interactions (77%, Fig. 1b,c and Supplementary Table 4), with five of these interactions detected in eY1H assays (16%; Fig. 1d indicates one such interaction between the HBG1 promoter and NFYA). While this PRS detection rate for eY1H assays is comparable to that observed for high-throughput yeast two-hybrid screens15, this result contrasts with that observed with C. elegans bait strains, where eY1H is at least as sensitive as the other Y1H methodologies4. This disparity is likely because the set of DNA baits used in this study exhibits higher levels of background than most C. elegans baits4 and other human DNA baits we have examined (data not shown). In fact, all of the interactions not found by eY1H were detected with only the HIS3 reporter gene because of the high levels of background LacZ expression (data not shown). Essentially, the performance of the various Y1H approaches is intrinsically linked to bait strain behavior, and for optimal results the user can modulate the experimental settings accordingly (Supplementary Fig. 1).
With human eY1H assays, in which we screened 14 baits against the entire collection, we detected 175 DNA-protein interactions involving 13 DNA baits and 100 proteins (Supplementary Table 6). The proteins detected include 95 human transcription factors (~10% of the 988 tested) and five uDBPs (~2% of the 236 tested)13. The eY1H interactions do not exhibit a major bias for or against a particular type of DNA-binding domain (Fig. 2a), complementing our observations in C. elegans experiments7. We did find a larger proportion of nuclear hormone receptors, however, with most of these exclusively interacting with the CSF1 promoter (Supplementary Table 6), suggesting this enrichment is likely due to the small sample size of DNA baits.
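The fractions quoted in the text can be checked with a few lines of arithmetic. The sketch below only recomputes percentages from counts stated in the text; the function name `percent` and the dictionary labels are illustrative and not part of the original analysis.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Counts taken directly from the text; the labels are descriptive only.
reported_counts = {
    "prey clones among predicted transcription factors": (988, 1434),
    "PRS interactions detected in any Y1H configuration": (24, 31),
    "PRS interactions detected in eY1H": (5, 31),
    "transcription factors detected among those tested": (95, 988),
    "uDBPs detected among those tested": (5, 236),
}

for label, (part, whole) in reported_counts.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```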
Figure 2.
eY1H data analysis. (a) DNA binding domain analysis of the transcription factor compendium (1,434 in total), the transcription factor prey yeast array (988 in total), and the transcription factors detected in Y1H interactions (97 in total). ZF – zinc finger; bHLH – basic region helix-loop-helix; ZF-NHR – nuclear hormone receptor; WH – winged helix; HMG – high mobility group. (b) ROC curve of DNA binding site analysis for the HBG2 promoter. Binding sites within the DNA bait sequence are ranked from best to worst (in terms of match to PWM) along the x-axis – only the “best” binding site match for each transcription factor is used. As the curve progresses along the x-axis, it steps up only for binding sites of transcription factors detected by eY1H. If the binding sites provided no information regarding eY1H interactions (i.e. no match between interactions predicted and detected), the curve would be largely below the diagonal. (c) Transcription factor-DNA interactions detected by eY1H depicted in a gene regulatory network (see Supplementary Figure 3 for a more detailed view). The DNA bait 5’HS5 is depicted as a clear box because it had no interactions. Unless otherwise noted, colors indicate transcription factor families as in (a). MH1 – MAD homology 1 domain. The out-going degree k(out) (i.e. number of DNA baits bound per transcription factor) is indicated. Green edges indicate detected PRS interactions.
Validating DNA-transcription factor interactions in complex systems is challenging and a “true negative” is nearly impossible to demonstrate3. While several of the interactions detected by eY1H are part of the PRS, and thus are known to have in vivo relevance, we wanted to assess the overall quality of the eY1H dataset. To this end, we evaluated the relationship between transcription factor interactions observed in eY1H assays and their reported DNA binding sites. We first compiled DNA binding specificity information of human transcription factors or their orthologs16 (Methods). Based on the potential transcription factor binding sites present within each DNA bait, we predicted which factors were expected to bind. Subsequently, we compared our experimentally detected factors to these predictions. An area under the receiver operating characteristic curve was calculated for each DNA bait sequence (Fig. 2b and Supplementary Fig. 2). We found significant enrichment for transcription factor binding sites in five of 11 DNA baits (Supplementary Table 7). This compares favorably to a similar analysis for ChIP-seq data (Supplementary Tables 8 and 9), which suggests that eY1H assays may be more likely to capture direct physical interactions between DNA and transcription factors than ChIP. Two DNA baits (the MPL and HBB promoters) do not exhibit a correlation between the predicted transcription factor binding sites they harbor and the transcription factors retrieved in eY1H assays. This could be because these factors require interactions with co-factors and so are missed in eY1H, or because of differences in binding sites between human transcription factors and their orthologs. However, a more likely explanation is the high background reporter gene expression we observed for both of these DNA baits, which makes them difficult to assay.
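The ranking-based comparison described above can be illustrated with a minimal sketch. It assumes each transcription factor has already been assigned a single best binding-site score for a given DNA bait; the function name `roc_auc`, the placeholder factor names and the scores are all hypothetical, ties in score are not treated specially, and this is not the procedure from the Methods.

```python
def roc_auc(best_site_score, detected):
    """Area under the ROC curve for one DNA bait.

    best_site_score: dict mapping transcription factor -> score of its best
        binding-site match in the bait (higher means a better match).
    detected: set of transcription factors found experimentally for the bait.
    Factors in `detected` that have no score are ignored.
    """
    # Rank factors from best to worst predicted binding site.
    ranked = sorted(best_site_score, key=best_site_score.get, reverse=True)
    n_pos = sum(tf in detected for tf in ranked)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one detected and one undetected factor")

    true_positives = 0
    area = 0
    for tf in ranked:
        if tf in detected:
            true_positives += 1      # curve steps up
        else:
            area += true_positives   # curve steps right; add column height
    return area / (n_pos * n_neg)


# Hypothetical scores and detections, for illustration only.
scores = {"TF_A": 9.1, "TF_B": 7.4, "TF_C": 6.0, "TF_D": 3.2}
found = {"TF_A", "TF_C"}
print(roc_auc(scores, found))  # 0.75
```

In this toy case, three of the four (detected, undetected) pairs are ranked in the expected order, giving an area of 0.75; an uninformative ranking would give a value near 0.5.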
We visualized all eY1H interactions using Cytoscape17, generating the first gene-centered human gene regulatory network (Fig. 2c and Supplementary Fig. 3). Although the network is small, we can already observe both specific and more promiscuous transcription factors. For instance, four factors each bind five DNA baits (i.e. kout = 5), while the majority of factors only interact with a single DNA bait (i.e. kout = 1). We also find several instances where multiple members of a transcription factor family can interact with the same DNA sequence. For instance, we found all four nuclear factor 1 transcription factors (NFIA, NFIB, NFIC and NFIX) interacting with the MPL promoter. This observation could reflect that these transcription factors have similar DNA binding specificities, and may be relevant in different cells or tissues, or under varying physiological conditions.
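The out-going degree is simply the number of distinct DNA baits bound by each transcription factor, and can be computed from an edge list as in the sketch below. The first five edges come from interactions named in the text; `TF_X`, `bait_1` and `bait_2` are made-up placeholders added only to show a factor with kout greater than 1, and the function name `out_degree` is illustrative.

```python
from collections import Counter

def out_degree(edges):
    """Count distinct DNA baits bound per transcription factor (k_out).

    edges: iterable of (dna_bait, transcription_factor) pairs.
    Duplicate pairs are counted once.
    """
    unique_edges = set(edges)
    return Counter(tf for _bait, tf in unique_edges)


edges = [
    ("MPL promoter", "NFIA"),
    ("MPL promoter", "NFIB"),
    ("MPL promoter", "NFIC"),
    ("MPL promoter", "NFIX"),
    ("HBG1 promoter", "NFYA"),
    # Hypothetical edges, not from the dataset:
    ("bait_1", "TF_X"),
    ("bait_2", "TF_X"),
]

k_out = out_degree(edges)
for tf, k in sorted(k_out.items()):
    print(f"{tf}: k_out = {k}")
```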
In summary, we have developed a collection of human transcription factor prey clones and a human transcription factor yeast array and combined these resources with our newly developed eY1H platform4, facilitating the mapping of human gene-centered regulatory networks. The human eY1H pipeline will provide a powerful complement to transcription factor-centered methods, by enabling large-scale characterization of the DNA-binding activity of transcription factors that may be expressed or active only under restricted conditions or in a few cells. However, these resources can also easily be used for mating or direct DNA transformations of one or a few human DNA baits in small-scale studies.
ACKNOWLEDGEMENTS
We thank members of the Walhout lab for discussions and critical reading of the manuscript. Research in the Walhout lab is supported by National Institutes of Health (NIH) grants DK068429 and GM082971. This work is supported by NIH grant HG003143 and a W.M. Keck Foundation Distinguished Young Scholar award, awarded to J.D., and by an Ellison Foundation (Boston, MA) grant and Dana Farber Cancer Institute Sponsored Research funds awarded to the Center for Cancer Systems Biology. Research in the Blackshaw lab is funded by a W.M. Keck Foundation Distinguished Young Scholar award and a grant from the Ruth and Milton Steinbach Fund. H.Z. is supported by NIH grant GM076102.
Footnotes
AUTHOR CONTRIBUTIONS
A.J.M.W. conceived the project; J.D., J.S.R.-H. and A.J.M.W. designed the project; J.S.R.-H. and A.R.B. performed the experiments; J.S.R.-H. and A.J.M.W. analyzed the data; J.J., L.J. and A.M. cherry-picked transcription factor ORF clones; J.J. and L.J. assisted A.R.B. and J.S.R.-H. with Gateway cloning; R.P.M. performed the binding site analysis; H.Z., S.B., K.S.A., X.Y., A.M. and D.E.H. provided transcription factor ORFeome clones; J.S.R.-H. and A.J.M.W. wrote the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
REFERENCES
- 1. Walhout AJM. Unraveling Transcription Regulatory Networks by Protein-DNA and Protein-Protein Interaction Mapping. Genome Res. 2006;16:1445–1454. doi: 10.1101/gr.5321506.
- 2. Arda HE, Walhout AJM. Gene-centered regulatory networks. Briefings in functional genomics and proteomics. 2009. doi: 10.1093/bfgp/elp049.
- 3. Walhout AJM. What does biologically meaningful mean? A perspective on gene regulatory network validation. Genome Biol. 2011;12:109. doi: 10.1186/gb-2011-12-4-109.
- 4. Reece-Hoyes JS, et al. Enhanced yeast one-hybrid (eY1H) assays for high-throughput gene-centered regulatory network mapping. Nature Methods. 2011. doi: 10.1038/nmeth.1748. In press.
- 5. Deplancke B, Dupuy D, Vidal M, Walhout AJM. Gateway-compatible yeast one-hybrid system. Genome Res. 2004;14:2093–2101. doi: 10.1101/gr.2445504.
- 6. Vermeirssen V, et al. Matrix and Steiner-triple-system smart pooling assays for high-performance transcription regulatory network mapping. Nat. Methods. 2007;4:659–664. doi: 10.1038/nmeth1063.
- 7. Deplancke B, et al. A gene-centered C. elegans protein-DNA interaction network. Cell. 2006;125:1193–1205. doi: 10.1016/j.cell.2006.04.038.
- 8. Vermeirssen V, et al. Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res. 2007;17:1061–1071. doi: 10.1101/gr.6148107.
- 9. Martinez NJ, et al. A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev. 2008;22:2535–2549. doi: 10.1101/gad.1678608.
- 10. Arda HE, et al. Functional modularity of nuclear hormone receptors in a C. elegans gene regulatory network. Mol. Syst. Biol. 2010;6:367. doi: 10.1038/msb.2010.23.
- 11. Brady S, et al. A stele-enriched gene regulatory network in the Arabidopsis root. Mol. Syst. Biol. 2011;7:459. doi: 10.1038/msb.2010.114.
- 12. Lamesch P, et al. hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes. Genomics. 2007;89:307–315. doi: 10.1016/j.ygeno.2006.11.012.
- 13. Hu S, et al. Profiling the human protein-DNA interactome reveals ERK2 as a transcriptional repressor of interferon signaling. Cell. 2009;139:610–622. doi: 10.1016/j.cell.2009.08.037.
- 14. Chen YC, Rajagopala SV, Stellberger T, Uetz P. Exhaustive benchmarking of the yeast two-hybrid system. Nat. Methods. 2010;7:667–668. doi: 10.1038/nmeth0910-667.
- 15. Braun P, et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods. 2009;6:91–97. doi: 10.1038/nmeth.1281.
- 16. Badis G, et al. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–1723. doi: 10.1126/science.1162327.
- 17. Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
- 18. Reece-Hoyes JS, et al. A compendium of C. elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. Genome Biol. 2005;6:R110. doi: 10.1186/gb-2005-6-13-r110.
- 19. Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 2009;10:252–263. doi: 10.1038/nrg2538.
- 20. Fulton DL, et al. TFCat: the curated catalog of mouse and human transcription factors. Genome Biol. 2009;10:R29. doi: 10.1186/gb-2009-10-3-r29.
- 21. Walhout AJM, et al. GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes. Methods in enzymology: "Chimeric genes and proteins". 2000;328:575–592. doi: 10.1016/s0076-6879(00)28419-x.
- 22. Deplancke B, Vermeirssen V, Arda HE, Martinez NJ, Walhout AJM. Gateway-compatible yeast one-hybrid screens. CSH Protocols. 2006. doi: 10.1101/pdb.prot4590.
- 23. Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2009;37:D77–D82. doi: 10.1093/nar/gkn660.
- 24. Bryne JC, et al. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 2008;36:D102–D106. doi: 10.1093/nar/gkm955.
- 25. ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816. doi: 10.1038/nature05874.
- 26. Granek JA, Clarke ND. Explicit equilibrium modeling of transcription factor binding and gene regulation. Genome Biol. 2005;6:R87. doi: 10.1186/gb-2005-6-10-r87.