Abstract
At the Drosophila melanogaster bithorax complex (BX-C), over 330 kb of intergenic DNA directs the transcription of just three homeotic (Hox) genes during embryonic development. A number of distinct enhancer cis-regulatory modules (CRMs) control the specific expression patterns of the Hox genes in the BX-C. While it has proven possible to identify orthologs of known BX-C CRMs in different Drosophila species using overall sequence conservation, this approach has not been sufficiently effective for identifying novel CRMs or for defining the key functional sequences within enhancer CRMs. Here we demonstrate that the specific spatial clustering of transcription factor (TF) binding sites is important for BX-C enhancer activity. A bioinformatic search for combinations of putative TF binding sites in the BX-C suggests that simple clustering of binding sites is frequently not indicative of enhancer activity. However, through molecular dissection and evolutionary comparison across the Drosophila genus, we discovered that specific TF binding site clustering patterns are an important feature of three known BX-C enhancers. Sub-regions of the defined IAB5 and IAB7b enhancers were both found to contain an evolutionarily conserved signature motif of clustered TF binding sites that is critical for enhancer function. Together, these results indicate that the spatial organization of specific activator and repressor binding sites within BX-C enhancers is of greater importance than overall sequence conservation and is indicative of enhancer functional activity.
Keywords: Drosophila, bithorax complex, cis-regulation, evolution, enhancer, transcription factor, homeotic gene
Introduction
Enhancer cis-regulatory modules (CRMs) are modular regions of non-protein-coding genomic DNA that bind transcription factors to up- or down-regulate transcription of their target genes (Arnone and Davidson, 1997). Regulation of gene expression by enhancers is a fundamental process that is largely responsible for embryonic patterning during development and for phenotypic variation among metazoans (Borok et al., 2010; Levine and Tjian, 2003; Wray, 2007). However, although accurate models of gene regulation exist for prokaryotes (e.g. the lac operon; Jacob and Monod, 1961) and bacteriophage (e.g. bacteriophage λ; Ptashne, 2004), the inherent complexity of eukaryotic regulation means that many molecular details of eukaryotic enhancer function remain unknown.
One of the existing model systems for the study of eukaryotic enhancers is the bithorax complex (BX-C) of the fruit fly Drosophila melanogaster. The BX-C controls body patterning in the posterior of the developing embryo, specifically in the third thoracic (T3) segment and all abdominal (A1–8) segments of the adult, which correspond to embryonic parasegments (PS) 5–14 (Lewis, 1978; Sanchez-Herrero et al., 1985). The complex is an approximately 330 kb genomic region (Martin et al., 1995) that contains the three homeotic (Hox) genes Ultrabithorax (Ubx), abdominal-A (abd-A) and Abdominal-B (Abd-B) (Lewis, 1978), together with numerous CRMs arranged in the infra-abdominal (iab) intergenic regions that regulate the spatio-temporal expression of the Hox genes (Fig. 1a) (reviewed in Akbari et al., 2006; Maeda and Karch, 2006). Abd-B and its associated regulatory DNA sequences are particularly well studied (Celniker et al., 1990). Abd-B specifies the identity of PS10–14 in the developing embryo and is regulated by the enhancers IAB5 (Busturia and Bienz, 1993), IAB6 (Mihaly et al., 2006), IAB7a (Mihaly et al., 2006), IAB7b (Zhou et al., 1999), and IAB8 (Zhou et al., 1999) (Fig. 1a). The activity of these enhancers is controlled by transcription factors (TFs) that are expressed earlier in embryonic development in specific spatial patterns along the anterior-posterior axis (Fig. 1b). For example, IAB5 is activated by the pair-rule TF FUSHI-TARAZU (FTZ) but is repressed, predominantly in the anterior and central regions of the embryo, by the gap TFs HUNCHBACK (HB), KRUPPEL (KR), and KNIRPS (KNI) (Fig. 1b) (Busturia and Bienz, 1993). KR has also been shown to repress the IAB8 enhancer in the embryo (Zhou et al., 1999). These TFs are recruited through sequence-specific binding sites located within the defined enhancers (Ho et al., 2009; Zhou et al., 1999).
Figure 1. Characterization of putative enhancer CRMs in the Drosophila melanogaster bithorax complex.
(A) The ~330 kb bithorax complex (BX-C) contains only three homeotic genes (leftward arrows), Ultrabithorax (Ubx), abdominal-A (abd-A), and Abdominal-B (Abd-B), but is divided into infra-abdominal regulatory regions (abx/bx, bxd/pbx, iab-2 through iab-8) that direct homeotic gene expression during development in parasegments (PS) 5–13. The complex contains cis-regulatory modules (CRMs) including enhancers (orange rectangles), insulators (black ovals), promoter targeting sequences (white rectangles), polycomb response elements (red rectangles), and an Abd-B promoter tethering element (yellow rectangle). 26 putative enhancers (PCRMs) were identified by searching the entire BX-C for clusters of binding sites for the key embryonic transcription factors HB and KR. Five of these PCRMs (black arrows) overlap known enhancers, while the remaining 21 (red arrows) are novel. (B) The pair-rule transcription factors FUSHI-TARAZU (FTZ) and EVEN-SKIPPED (EVE) act as activators in alternating body segments of the embryo by binding to the BX-C enhancers, while KRUPPEL (KR), KNIRPS (KNI), HUNCHBACK (HB), and BICOID (BCD) predominantly act as repressors at the BX-C enhancers in broad regions of the embryo. (C) A transgenic construct containing the lacZ reporter gene (blue rectangle) driven by the hsp70 promoter was used to assay 16 of the 26 PCRMs (gray circle) for enhancer activity. In situ hybridization was performed on transgenic D. melanogaster embryos carrying the individual constructs. Of the 16 PCRMs tested, the four overlapping known enhancers (R4, 10, 15, 20) recapitulate the expected enhancer-driven reporter gene expression patterns (parasegments in which reporter gene transcription is detected are indicated). One previously uncharacterized enhancer from the bxd/pbx regulatory region (R8) was also identified. The remaining eleven PCRMs tested (R1, 2, 3, 6, 9, 11, 12, 13, 14, 17, and 21) show no detectable expression.
Given the central biological importance of TF binding sites to enhancer function, it was initially assumed that Drosophila enhancers would be subject to significant overall sequence constraint (Costas et al., 2003; Dermitzakis and Clark, 2002). However, evidence from the extensively studied Drosophila even-skipped (eve) stripe 2 enhancer (S2E) has revealed that this is not necessarily the case (Hare et al., 2008; Ludwig et al., 2005; Ludwig et al., 1998). The endogenous S2E drives tightly restricted expression of the second transverse stripe of eve in the embryo by binding a combination of the embryonic activators BICOID (BCD) and HB and the repressors KR and GIANT (Small et al., 1991). However, varying the strength and relative position of TF binding sites in the S2E in a complementary manner does not disrupt enhancer activity (Arnosti et al., 1996). Additionally, despite significant sequence divergence, S2E sequences from divergent species of the Drosophila genus consistently drive reporter gene expression in transgenic D. melanogaster in a spatio-temporal pattern indistinguishable from that of the native D. melanogaster S2E (Ludwig et al., 1998). Identical reporter gene expression is seen even when S2E modules identified by sequence alignment in species of the family Sepsidae, which diverged from Drosophila approximately 100 million years ago, are introduced into transgenic D. melanogaster (Hare et al., 2008). Drosophila and Sepsidae S2Es share several highly conserved 20–30 bp regions that contain clusters of often overlapping predicted TF binding sites, suggesting that these clusters may be responsible for the conserved functional activity of the enhancer orthologs (Crocker and Erives, 2008). Intriguingly, chimeric enhancers created from reciprocal halves of the S2E from D. melanogaster and D. pseudoobscura do not fully maintain their functional activity, suggesting that certain sequence properties may need to be conserved to preserve enhancer function (Ludwig et al., 2000).
This pattern of functional conservation has also recently been seen for embryonic enhancers from the BX-C. Despite significant sequence divergence at the BX-C regulatory regions within the Drosophila genus, IAB5 and IAB8 enhancer orthologs identified by simple sequence alignment from different Drosophila species drive identical reporter gene expression patterns in transgenic D. melanogaster (Ho et al., 2009). These observations raise questions as to what determines enhancer functionality and how we might be able to identify novel enhancers in D. melanogaster and distantly related species with significant sequence divergence (various attempts are reviewed in (Vavouri and Elgar, 2005)). Clustering of conserved TF binding sites, which permits functional interactions between bound TFs at an enhancer, may be important to enhancer function (Arnone and Davidson, 1997; Wasserman and Fickett, 1998) and some studies have even used this concept to identify novel enhancers (Berman et al., 2002; Berman et al., 2004; Markstein et al., 2002).
The aim of our study is to investigate the architectural requirements for BX-C enhancer function. Using previously assembled HB and KR position weight matrices (PWMs; Ho et al., 2009), we developed a simple algorithm for clustering putative TF binding sites that identifies most of the known enhancers and one novel enhancer in the complex. However, this computational search also predicts a number of genomic regions that do not function as enhancers. Therefore, to address the importance of specific patterns of TF binding site clustering, we analyzed the functional sequences of known BX-C enhancers through molecular dissection. This approach identified the minimal functional regions of three different enhancers. In the IAB5 and IAB7b enhancers, a region containing an evolutionarily conserved signature motif of clustered activator (FTZ) and repressor (KR) TF binding sites is critical for enhancer activity.
Materials and Methods
Genomic sequences
Genomic regions of the Drosophila melanogaster bithorax complex (BX-C) in the annotated U31961 sequence were located in the Berkeley Drosophila Genome Project D. melanogaster genome (April 2006 release) using the University of California Santa Cruz (UCSC) Genome Browser (http://www.genome.ucsc.edu); the resulting D. melanogaster chr3R coordinates are given in Table 1.
Table 1. Positions, predicted transcription factor binding site content, and sequence conservation of 26 clusters and associated PCRMs.
Information on the 26 PCRM clusters and corresponding 1 kb regions (all coordinates are UCSC D. melanogaster chr3R).

| Cluster | Regulatory region | Cluster start | Cluster end | Cluster size (bp) | Cluster contents | Relative positions | 1 kb start | 1 kb end | simulans (%) | yakuba (%) | erecta (%) | ananassae (%) | pseudoobscura (%) | 1 kb+ (PCR) start | 1 kb+ (PCR) end | 1 kb+ size (bp) | PCRM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | abx/bx | 12494396 | 12494667 | 271 | Hb-Kr-Hb | 0-141-264 | 12494197 | 12495197 | 100 | 100 | 100 | 83 | 56 | 12494170 | 12495219 | 1050 | R1 |
| 2 | abx/bx | 12507475 | 12507640 | 165 | Kr-Kr-Hb | 0-101-158 | 12506886 | 12507886 | 100 | 100 | 100 | 80 | 79 | 12506886 | 12507894 | 1009 | R2 |
| 3 | abx/bx | 12508851 | 12509149 | 298 | Kr-Kr-Hb | 0-46-291 | 12508249 | 12509249 | 100 | 88 | 88 | 60 | 59 | 12508153 | 12509281 | 1129 | R3 |
| 4 | abx/bx | 12526876 | 12527059 | 183 | Hb-Hb-Kr | 0-168-174 | 12526487 | 12527487 | 100 | 97 | 95 | 78 | 46 | 12526436 | 12527540 | 1105 | R4 |
| 5 | abx/bx | 12530135 | 12530330 | 195 | Hb-Kr-Hb | 0-84-188 | 12529587 | 12530587 | 100 | 100 | 99 | 56 | 13 | 12529519 | 12530601 | 1083 | R5 |
| 6 | abx/bx | 12556946 | 12557001 | 55 | Kr-Hb-Hb | 0-19-48 | 12556696 | 12557696 | 100 | 91 | 91 | 67 | 34 | 12556659 | 12557723 | 1065 | R6 |
| 7 | abx/bx | 12567627 | 12567858 | 231 | Hb-Kr-Hb | 0-31-224 | 12567388 | 12568388 | 100 | 84 | 90 | 63 | 42 | 12567370 | 12568392 | 1023 | R7 |
| 8 | bxd/pbx | 12599075 | 12599333 | 258 | Kr-Hb-Hb | 0-29-251 | 12598700 | 12599700 | 98 | 98 | 98 | 67 | 40 | 12598675 | 12599794 | 1120 | R8 |
| 9 | iab-2 | 12626759 | 12627035 | 276 | Kr-Hb-Kr | 0-166-267 | 12626135 | 12627135 | 100 | 80 | 85 | 53 | 39 | 12626117 | 12627138 | 1022 | R9 |
| 10 | IAB2 | 12636477 | 12636789 | 312 | Kr-Hb-Hb-Hb | 0-120-166-305 | 12636085 | 12637085 | 100 | 100 | 100 | 88 | 57 | 12636075 | 12637112 | 1038 | R10 |
| 11 | iab-2 | 12639077 | 12639354 | 277 | Hb-Hb-Hb-Kr | 0-9-187-268 | 12638659 | 12639659 | 100 | 100 | 99 | 71 | 57 | 12638624 | 12639667 | 1044 | R11 |
| 12 | iab-2 | 12657921 | 12658363 | 442 | Kr-Hb-Kr-Hb | 0-229-239-435 | 12657500 | 12658500 | 98 | 84 | 89 | 51 | 27 | 12657431 | 12658526 | 1096 | R12 |
| 13 | iab-3 | 12664082 | 12664359 | 277 | Hb-Kr-Kr | 0-142-268 | 12663862 | 12664862 | 100 | 91 | 100 | 69 | 18 | 12663839 | 12664862 | 1024 | R13 |
| 14 | iab-4 | 12690841 | 12691024 | 183 | Hb-Kr-Hb | 0-128-176 | 12690346 | 12691346 | 100 | 100 | 100 | 87 | 37 | 12690280 | 12691387 | 1108 | R14 |
| 15 | IAB5 | 12704503 | 12704624 | 121 | Hb-Hb-Kr-Kr | 0-17-94-112 | 12704380 | 12705380 | 100 | 94 | 95 | 62 | 55 | 12704355 | 12705380 | 1026 | R15 |
| 16 | IAB6 | 12717101 | 12717333 | 232 | Hb-Hb-Kr | 0-29-223 | 12716910 | 12717910 | 100 | 96 | 77 | 47 | 20 | 12716856 | 12717976 | 1121 | R16 |
| 17 | iab-6 | 12722577 | 12722826 | 249 | Kr-Hb-Hb | 0-213-242 | 12722477 | 12723477 | 100 | 93 | 98 | 89 | 36 | 12722415 | 12723493 | 1079 | R17 |
| 18 | Fab-7 | 12725282 | 12725476 | 194 | Kr-Hb-Hb | 0-105-187 | 12724576 | 12725576 | 100 | 100 | 98 | 38 | 9 | 12724540 | 12725575 | 1036 | R18 |
| 19 | IAB7a | 12729058 | 12729311 | 253 | Hb-Hb-Kr | 0-71-244 | 12728520 | 12729520 | 100 | 97 | 91 | 78 | 56 | 12728425 | 12729527 | 1103 | R19 |
| 20 | IAB8 | 12747068 | 12747467 | 399 | Hb-Kr-Hb-Kr | 0-99-243-390 | 12746592 | 12747592 | 100 | 100 | 100 | 52 | 46 | 12746531 | 12747637 | 1107 | R20 |
| 21 | - | 12760930 | 12761160 | 230 | Kr-Hb-Hb | 0-91-223 | 12760299 | 12761299 | 100 | 100 | 100 | 28 | 12 | 12760299 | 12761321 | 1023 | R21 |
| 22 | - | 12763276 | 12763546 | 270 | Hb-Hb-Kr-Kr | 0-33-239-261 | 12763004 | 12764004 | 100 | 100 | 100 | 59 | 36 | 12762953 | 12764030 | 1078 | R22 |
| 23 | - | 12777181 | 12777404 | 223 | Hb-Hb-Kr | 0-109-214 | 12776633 | 12777633 | 93 | 93 | 93 | 54 | 36 | 12776611 | 12777636 | 1026 | R23 |
| 24 | - | 12781654 | 12781938 | 284 | Hb-Kr-Hb | 0-219-277 | 12781554 | 12782554 | 97 | 76 | 77 | 37 | 7 | 12781554 | 12782553 | 1000 | R24 |
| 25 | - | 12791634 | 12792211 | 577 | Hb-Hb-Kr-Kr-Hb-Hb | 0-14-150-311-455-570 | 12791417 | 12792417 | 100 | 100 | 100 | 52 | 61 | 12791389 | 12792462 | 1074 | R25 |
| 26 | - | 12800569 | 12800977 | 408 | Hb-Kr-Hb-Hb-Hb-Kr | 0-101-138-229-254-399 | 12800117 | 12801117 | 100 | 100 | 100 | 65 | 66 | 12800051 | 12801116 | 1066 | R26 |
Level of conservation between sequences from D. melanogaster and five other Drosophila species is indicated by color code: >90% red, 60–90% orange, 30–60% yellow, <30% green (calculation for conservation is detailed in Methods).
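To make the color key in the footnote above unambiguous, the following Python sketch bins a % conservation value. The footnote does not say which bin owns the boundary values, so assigning 90 and 60 to orange and 30 to yellow is an assumption.

```python
def conservation_color(percent):
    """Map a % conservation value to the Table 1 color bins.

    Assumed boundary handling: >90 red, 60-90 (inclusive) orange,
    30 up to but excluding 60 yellow, below 30 green.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be in [0, 100], got {percent}")
    if percent > 90:
        return "red"
    if percent >= 60:
        return "orange"
    if percent >= 30:
        return "yellow"
    return "green"


# Cluster 1 (R1): simulans, yakuba, erecta, ananassae, pseudoobscura
row_r1 = [100, 100, 100, 83, 56]
print([conservation_color(p) for p in row_r1])
```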
Sequence alignments and transcription factor binding site analysis
Sequence conservation of the existing CRMs at the D. melanogaster BX-C was analyzed using both the VISTA Genome Browser utility (http://pipeline.lbl.gov/cgi-bin/gateway2; (Frazer et al., 2004)) and the UCSC Genome Browser as previously described (Ho et al., 2009). PATSER (http://rsat.ulb.ac.be/rsat/patser_form.cgi; (Hertz and Stormo, 1999; Thomas-Chollier et al., 2008)) was used with previously assembled position weight matrices (PWMs) for six TFs, BICOID (BCD), EVEN-SKIPPED (EVE), FUSHI-TARAZU (FTZ), HUNCHBACK (HB), KNIRPS (KNI) and KRUPPEL (KR) (Ho et al., 2009), and with a PWM for FTZ-F1 (Bowler et al., 2006), to search the CRMs of six Drosophila species (D. melanogaster, D. simulans, D. erecta, D. yakuba, D. ananassae, and D. pseudoobscura) for putative binding sites. Cutoff values of ln(p-value) for predicted sites were as described previously (Hare et al., 2008).
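PATSER scores each candidate site against a PWM and reports sites whose ln(p-value) passes a cutoff. The Python sketch below shows only the core log-odds scan on both strands, under stated assumptions: a uniform 0.25 background, a pseudocount of 1 per base, and a raw score threshold in place of PATSER's p-value calculation. The one-site toy matrix is purely illustrative and is not one of the Ho et al. (2009) PWMs.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def counts_from_sites(sites):
    """Build a per-position base-count matrix from equal-length aligned sites."""
    width = len(sites[0])
    counts = [{b: 0 for b in BASES} for _ in range(width)]
    for site in sites:
        if len(site) != width:
            raise ValueError("all sites must have the same length")
        for column, base in zip(counts, site.upper()):
            column[base] += 1
    return counts


def log_odds_matrix(counts, background=None, pseudocount=1.0):
    """Natural-log odds of each base versus the background at each position."""
    background = background or {b: 0.25 for b in BASES}
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({b: math.log((column[b] + pseudocount) / total / background[b])
                       for b in BASES})
    return matrix


def score(matrix, site):
    return sum(column[base] for column, base in zip(matrix, site))


def scan(sequence, matrix, threshold):
    """Return (position, strand, score) for windows scoring >= threshold on either strand."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        forward = score(matrix, window)
        reverse = score(matrix, window.translate(COMPLEMENT)[::-1])
        if forward >= threshold:
            hits.append((i, "+", forward))
        if reverse >= threshold:
            hits.append((i, "-", reverse))
    return hits


hb_toy = log_odds_matrix(counts_from_sites(["TTTTATG"]))  # toy matrix, not a real HB PWM
for position, strand, value in scan("GGGTTTTATGGGG", hb_toy, threshold=3.0):
    print(position, strand, round(value, 2))  # -> 3 + 3.29
```

A real analysis would load the published count matrices and convert scores to p-values as PATSER does; the threshold of 3.0 here simply admits only the exact toy consensus.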
Computational BX-C CRM prediction
Previously assembled PWMs for HB and KR (Ho et al., 2009) were converted to IUPAC strings, yielding the consensus motifs TTTTWTG and AAASGGWKN, respectively. These strings were used in FlyEnhancer (http://opengenomics.org/) to search the complete BX-C sequence (U31961) for clusters of putative HB and KR binding sites. The cluster criterion was 1 HB, 1 KR and 1 additional (HB or KR) site within a 300 bp window. Overlapping windows were combined and scored as a single cluster. Each cluster was then expanded to a 1 kb window by including neighboring genomic sequence. The flanking sequence was chosen to include regions conserved across the Drosophila genus, as identified with the VISTA Genome Browser, while keeping the original cluster at least 100 bp from each end of the 1 kb window.
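The clustering criterion above can be expressed as a short Python sketch. This is a hypothetical re-implementation, not FlyEnhancer's code: it assumes both strands are searched, treats each consensus match as one site, and defines a candidate window as starting at a site and spanning 300 bp. The IUPAC consensus strings are those given above; the demo sequence is synthetic.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Consensus strings from the Methods text.
MOTIFS = {"HB": "TTTTWTG", "KR": "AAASGGWKN"}


def iupac_to_regex(consensus):
    return "".join(IUPAC[code] for code in consensus.upper())


def find_sites(sequence, motifs=MOTIFS):
    """Return sorted (start, end, tf) matches on both strands; end is exclusive."""
    sequence = sequence.upper()
    hits = set()
    for tf, consensus in motifs.items():
        reverse = consensus.upper().translate(IUPAC_COMPLEMENT)[::-1]
        for pattern in {consensus.upper(), reverse}:
            # A zero-width lookahead also reports overlapping matches.
            for match in re.finditer(f"(?={iupac_to_regex(pattern)})", sequence):
                hits.add((match.start(), match.start() + len(consensus), tf))
    return sorted(hits)


def find_clusters(sites, window=300):
    """Windows holding >=1 HB, >=1 KR and >=3 sites in total, with overlaps merged."""
    candidates = []
    for anchor, _, _ in sites:
        members = [s for s in sites if s[0] >= anchor and s[1] <= anchor + window]
        tfs = [tf for _, _, tf in members]
        if "HB" in tfs and "KR" in tfs and len(members) >= 3:
            candidates.append((anchor, max(end for _, end, _ in members)))
    merged = []
    for start, end in candidates:  # anchors arrive in ascending order
        if merged and start < merged[-1][1]:  # overlaps the previous cluster
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


demo = ("C" * 50 + "TTTTATG" + "C" * 50 + "AAAGGGTTA" + "C" * 50 + "TTTTATG"
        + "C" * 1000 + "AAAGGGTTA")
print(find_clusters(find_sites(demo)))  # -> [(50, 173)]
```

The isolated KR site at position 1173 is not reported because no HB site and no third site fall within 300 bp of it.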
Molecular cloning and construction of transgenes
Putative CRMs
Genomic regions representing the 26 putative CRMs from the BX-C of D. melanogaster were PCR amplified and cloned into the pGEM-T Easy vector (Promega). PCR primer sequences are available upon request. The coordinates and sizes of the PCR fragments corresponding to each of the 26 putative CRMs are: R1, 12494170–12495219 (1050bp); R2, 12506886–12507894 (1009bp); R3, 12508153–12509281 (1129bp); R4, 12526436–12527540 (1105bp); R5, 12529519–12530601 (1083bp); R6, 12556659–12557723 (1065bp); R7, 12567370–12568392 (1023bp); R8, 12598675–12599794 (1120bp); R9, 12626117–12627138 (1022bp); R10, 12636075–12637112 (1038bp); R11, 12638624–12639667 (1044bp); R12, 12657431–12658526 (1096bp); R13, 12663839–12664862 (1024bp); R14, 12690280–12691387 (1108bp); R15, 12704355–12705380 (1026bp); R16, 12716856–12717976 (1121bp); R17, 12722415–12723493 (1079bp); R18, 12724540–12725575 (1036bp); R19, 12728425–12729527 (1103bp); R20, 12746531–12747637 (1107bp); R21, 12760299–12761321 (1023bp); R22, 12762953–12764030 (1078bp); R23, 12776611–12777636 (1026bp); R24, 12781554–12782553 (1000bp); R25, 12791389–12792462 (1074bp); R26, 12800051–12801116 (1066bp).
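The fragment sizes listed above count both end coordinates (for R1, 12495219 - 12494170 + 1 = 1050 bp). A small consistency check like the following can catch transcription errors in such lists; it is a sketch that assumes entries are written in the "Rn, start–end (sizebp)" form used above.

```python
import re

# Label, start, end and size; accepts an en dash or hyphen between coordinates.
FRAGMENT = re.compile(r"(R\d+), (\d+)[–-](\d+) \((\d+) ?bp\)")


def inconsistent_fragments(text):
    """Return labels whose stated size is not end - start + 1 (inclusive coordinates)."""
    bad = []
    for label, start, end, size in FRAGMENT.findall(text):
        if int(end) - int(start) + 1 != int(size):
            bad.append(label)
    return bad


listing = "R1, 12494170–12495219 (1050bp); R2, 12506886–12507894 (1009bp)"
print(inconsistent_fragments(listing))  # -> []
```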
Each PCR-amplified putative CRM was sub-cloned as a NotI fragment into the NotI site of the placZattB transformation vector (Bischof et al., 2007). DNA sequencing verified each insert and confirmed that all inserts had the same orientation in the transgenic construct.
IAB8 CRM sub-regions
Genomic regions representing the minIAB8, ΔEVE, and EK sub-regions of the IAB8 enhancer CRM in the BX-C of D. melanogaster were PCR amplified and cloned into the pGEM-T Easy vector (Promega). Each PCR-amplified sub-region was sub-cloned as a NotI fragment into the NotI site of the placZattB transformation vector (Bischof et al., 2007). DNA sequencing was used to verify the sequence and the consistent orientation of each insert in the transgenic construct.
| Sub-region (D. mel) | Forward primer (5′-3′) | Reverse primer (5′-3′) | Genomic coordinates (chromosome: location) | Length (bp) |
|---|---|---|---|---|
| minIAB8 | CGTATTATTAAAGCACTTTCTTACTC | AATTAAATTGTGACAGAACAGAATTC | 3R: 12747022-12747623 | 602 |
| IAB8 ΔEVE | TGAAAACATTTGAATGTCAGACAGGT | AATTAAATTGTGACAGAACAGAATTC | 3R: 12747022-12747503 | 482 |
| IAB8 EK | AGAAAGGACGCCCGCTCGAAT | ACCGCGGGCCTCTTTTCGCA | 3R: 12747410-12747550 | 141 |
IAB5 CRM sub-regions
D. melanogaster and D. pseudoobscura fly stocks were obtained from the Tucson Stock Center (D. melanogaster: 14021-0231.36, D. pseudoobscura: 14011-0121.94). The locations of orthologous IAB5 regions in each species were identified by aligning genomic sequences using VISTA (Frazer et al., 2004). IAB5 sub-regions were PCR amplified using primers carrying a linker (a HindIII or NotI restriction site) appended to the 5′ end:
| Sub-region | Species | Forward primer (5′-3′) | Reverse primer (5′-3′) | Forward linker | Reverse linker | Genomic coordinates (chromosome: location) | Length (bp) |
|---|---|---|---|---|---|---|---|
| IAB5.1 | melanogaster | ACGCGTAAGCTTCGATTCTGCTGGCCATGACCAT* | ACGCGTAAGCTTCGCGCCCAGTGAGGTCCTCACA | HindIII | HindIII | 3R: 104744-105037 | 293 |
| IAB5.2 | melanogaster | ACGCGTAAGCTTTGTGAGGACCTCACTGGGCGCG | CCCGGGGCGGCCGCTCCACTTCCGAACTTGGTCGAC* | HindIII | NotI | 3R: 103995-104744 | 749 |
| IAB5.1 | pseudoobscura | ACGCGTAAGCTTTTGTGGCCCTGACAGTGAAGAG* | ACGCGTAAGCTTAGGGCCAGTTTAAATCTACGCA | HindIII | HindIII | 2: 17696120-17696521 | 403 |
| IAB5.2 | pseudoobscura | ACGCGTAAGCTTGTAAGGCACATACTCGTAAGA | CCCGGGGCGGCCGCTTCCATAATGAACCCCGCGGAA* | HindIII | NotI | 2: 17694935-17696119 | 1184 |
| eIAB5 | melanogaster | CCCGGGGCGGCCGCTCCACTTCCGAACTTGGTCGAC | AAGCTTCGATCGCTAAGAAAAGTGA | NotI | HindIII | 3R: 103995-104320 | 325 |
| cIAB5 | melanogaster | GCGGCCGCCACTTTTCTTAGCGATCGC | ACGCGTAAGCTTTGTGAGGACCTCACTGGGCGCG | NotI | HindIII | 3R: 104320-104744 | 424 |
PCR amplified fragments were cloned into the pGEM-T Easy vector (Promega) and verified by DNA sequencing. The IAB5 sub-regions were inserted into the NotI and HindIII sites of the placZattB transformation vector (Bischof et al., 2007).
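The 5′ linkers in the IAB5 primer table can be checked by locating the HindIII (AAGCTT) and NotI (GCGGCCGC) recognition sequences in each primer. Both sites are palindromic, so scanning one strand suffices. This is a minimal sketch; the function name and enzyme dictionary are illustrative, and the table's asterisks are removed from the primers before checking.

```python
SITES = {"HindIII": "AAGCTT", "NotI": "GCGGCCGC"}  # both recognition sites are palindromic


def linker_sites(primer, sites=SITES):
    """Map enzyme name -> 0-based positions of its recognition site in a primer."""
    primer = primer.upper()
    found = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(primer) - len(motif) + 1)
                     if primer.startswith(motif, i)]
        if positions:
            found[enzyme] = positions
    return found


print(linker_sites("ACGCGTAAGCTTCGATTCTGCTGGCCATGACCAT"))    # -> {'HindIII': [6]}
print(linker_sites("CCCGGGGCGGCCGCTCCACTTCCGAACTTGGTCGAC"))  # -> {'NotI': [6]}
```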
IAB7b CRM sub-regions
Genomic regions representing the 2F2K, 2F1K and 1F2K sub-regions of the IAB7b enhancer CRM in the BX-C of D. melanogaster were PCR amplified and cloned into the pGEM-T Easy vector (Promega). Each PCR-amplified sub-region was sub-cloned as a NotI fragment into the NotI site of the placZattB transformation vector (Bischof et al., 2007). DNA sequencing was used to verify the sequence and the consistent orientation of each insert in the transgenic construct.
| Sub-region (D. mel) | Forward primer (5′-3′) | Reverse primer (5′-3′) | Genomic coordinates (chromosome: location) | Length (bp) |
|---|---|---|---|---|
| IAB7b 2F2K | CTAACTCGACTTGCTAACCTT | GTGCGTTTTCCTTTTAAGCCT | 3R: 12741647-12741800 | 154 |
| IAB7b 2F1K | CTAACTCGACTTGCTAACCTT | TGCTCTGTTTGTGTTTGCCCG | 3R: 12741670-12741800 | 131 |
| IAB7b 1F2K | TTTGCTGAGTCAAATCACAGA | GTGCGTTTTCCTTTTAAGCCT | 3R: 12741647-12741760 | 114 |
Transformation assays and in situ hybridization
placZattB reporter transgenes were introduced into the Drosophila germ line by site-specific ΦC31-mediated integration using standard methods (Bischof et al., 2007). All transgenic lines were generated by BestGene, Inc. by ΦC31 integrase-mediated insertion of the reporter construct into the attP landing site at 68E. Multiple transgenic lines were generated for each construct, and reporter gene expression in one line per construct was analyzed by in situ hybridization. Embryos were collected, fixed and hybridized with a digoxigenin-labeled lacZ probe as previously described (Bae et al., 2002).
Results
Clustering of predicted TF binding sites predominantly identifies known enhancers in the BX-C
To examine the importance of clustering of predicted TF binding sites in enhancers from the Drosophila BX-C, we converted previously compiled PWMs for the key embryonic TFs HB and KR (Ho et al., 2009) into consensus sequence strings (see Methods for details). Earlier studies in gap gene mutants demonstrated that the spatio-temporal expression of Abd-B in the embryo is controlled by KR and HB (Casares and Sanchez-Herrero, 1995) (Fig. 1b). Indeed, KR and HB are known to interact directly with BX-C enhancers: IAB2 (Shimell et al., 2000; Shimell et al., 1994) and IAB5 (Busturia and Bienz, 1993; Ho et al., 2009) harbor binding sites for both HB and KR, and IAB7b binds KR (Zhou et al., 1999). The HB and KR sequence strings were used to search the entire 330 kb BX-C region for clusters of binding sites. With the clustering criteria set to 1 HB, 1 KR and 1 (HB or KR) site within a 300 bp window (with overlapping hits combined), the search returned 26 putative CRMs (PCRMs) (Fig. 1a). Each PCRM was expanded to a 1 kb region (approximately the average size of known enhancers in the BX-C) by maximizing conservation with D. pseudoobscura in VISTA alignments, while keeping at least 100 bp between the original cluster of TF binding sites and either end of the 1 kb region. Surprisingly, of the 26 PCRMs (R1–R26), six (23%) overlap known BX-C enhancers (BRE, IAB2, IAB5, IAB6, IAB7a, and IAB8), one overlaps the known Fab-7 insulator, and the remaining 19 map to regions of unknown function (Fig. 1a). At least one PCRM was identified within each genetically defined iab regulatory region, and 5 of the 6 known IAB enhancers (all but IAB7b) were returned, an 83% success rate at identifying existing IAB enhancers in the BX-C.
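The 1 kb expansion step constrains where each window may start: the window must leave at least 100 bp on both sides of the cluster, and among the allowed starts the region most conserved with D. pseudoobscura was chosen. A minimal Python sketch, assuming half-open windows [start, start + 1000) and a hypothetical per-base conservation score array; the study selected windows from VISTA alignments, not with this code.

```python
def allowed_window_starts(cluster_start, cluster_end, window=1000, buffer=100):
    """All window starts that keep >= buffer bp between the cluster and each window end."""
    lowest = cluster_end + buffer - window
    highest = cluster_start - buffer
    if lowest > highest:
        raise ValueError("cluster is too long for this window and buffer")
    return range(lowest, highest + 1)


def best_window(cluster_start, cluster_end, scores, offset, window=1000, buffer=100):
    """Pick the allowed start whose window has the highest summed conservation.

    scores[i] is a hypothetical per-base conservation value for position offset + i.
    Returns (start, total), or None if no allowed window fits inside scores.
    """
    prefix = [0.0]
    for value in scores:
        prefix.append(prefix[-1] + value)
    best = None
    for start in allowed_window_starts(cluster_start, cluster_end, window, buffer):
        i = start - offset
        if i < 0 or i + window > len(scores):
            continue
        total = prefix[i + window] - prefix[i]
        if best is None or total > best[1]:  # ties keep the leftmost start
            best = (start, total)
    return best


# Cluster 1 (Table 1): 12494396-12494667; its chosen 1 kb region starts at 12494197.
print(12494197 in allowed_window_starts(12494396, 12494667))  # -> True
```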
To test the potential functional activity of a subset of these PCRMs, each 1 kb region was individually cloned into a lacZ reporter construct, which was then integrated in a site-specific manner into the D. melanogaster genome. Reporter gene expression driven by each PCRM was visualized by in situ hybridization in transgenic embryos (Fig. 1c). In total, 16 PCRMs (R1, 2, 3, 4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 17, 20, and 21) were tested in the transgenic assay (Tables 1 and 2). Of these, the four that overlap known BX-C enhancers (R4, 10, 15, and 20, overlapping BRE, IAB2, IAB5, and IAB8, respectively) recapitulate the embryonic expression patterns of those enhancers. In addition, the R8 sequence from the bxd/pbx regulatory region drives reporter gene expression in a spatially restricted anterior-posterior pattern during development (Fig. 1c), corresponding to the embryonic domain in which Ubx transcription is controlled by the bxd/pbx region. Therefore, 5 of the 16 regions analyzed (31%) represent genuine in vivo embryonic enhancers. The remaining 11 PCRMs show no activity and therefore represent false positive predictions of the computational search. Thus, the simple clustering algorithm is accurate at identifying known enhancers in the BX-C, but performs less well at identifying novel enhancers in the complex. Furthermore, it fails to identify the known IAB7b enhancer, a false negative prediction. This failure is due to the use of converted sequence strings to search for consensus HB (TTTTWTG) and KR (AAASGGWKN) binding sites (see Methods): a more refined computational analysis of the IAB7b enhancer using position weight matrices in PATSER (see below) reveals a closely spaced cluster of HB and KR binding sites (Fig. 4a). These results suggest that BX-C enhancer function is subject to additional constraints beyond simple clustering of binding sites.
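The percentages reported above follow from simple counts. In the sketch below, calling the 31% the precision of the tested subset and the 83% the recall of known IAB enhancers is our framing; the counts themselves are those given in the text.

```python
# Outcomes of the 16 PCRMs tested in the transgenic assay, as reported in the text.
inactive = (1, 2, 3, 6, 9, 11, 12, 13, 14, 17, 21)
active_ids = (4, 8, 10, 15, 20)
tested = {f"R{n}": False for n in inactive}
tested.update({f"R{n}": True for n in active_ids})

n_active = sum(tested.values())
precision = n_active / len(tested)  # fraction of tested PCRMs that are enhancers
false_positives = len(tested) - n_active

known_iab = {"IAB2", "IAB5", "IAB6", "IAB7a", "IAB7b", "IAB8"}
recovered = {"IAB2", "IAB5", "IAB6", "IAB7a", "IAB8"}  # IAB7b was missed
recall = len(recovered & known_iab) / len(known_iab)

overlap_fraction = 6 / 26  # PCRMs overlapping known enhancers, including BRE

print(f"{n_active}/{len(tested)} active ({precision:.0%}); {false_positives} false positives")
print(f"IAB recall {recall:.0%}; overlap with known enhancers {overlap_fraction:.0%}")
```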
We hypothesized that the precise spacing of specific groups of TF binding sites may in fact be important for functional activity, as previously observed at other CRMs in Drosophila (Erives and Levine, 2004; Swanson et al., 2010; Wittkopp, 2010). To address this possibility, we investigated TF binding site patterns in three of the known IAB enhancers: IAB8, IAB7b and IAB5 (Fig. 1a). All three enhancers regulate transcription of the same gene (Abd-B) and are perhaps the best characterized CRMs in the entire complex, both in terms of the transcription factors with which they interact and the evolutionary conservation of their sequences (Akbari et al., 2006; Ho et al., 2009; Mihaly et al., 2006; Zhou et al., 1999). They therefore represent appealing targets for further study.
Table 2.
Experimentally verified transcription factor binding sites in the 26 BX-C PCRMs.
| PCRM | BCD | CAD | GT | HB | KNI | KR | TLL | FTZ | PRD | SLP1 | Total | Predict | Known | CRM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | No | - | No |
| R2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | No | - | No |
| R3 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | No | - | No |
| R4 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 5 | ? | BRE | Yes |
| R5 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | No | - | ND |
| R6 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 5 | ? | - | No |
| R7 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 7 | Yes | - | ND |
| R8 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 7 | Yes | - | Yes |
| R9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | No | - | No |
| R10 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 6 | Yes | IAB2 | Yes |
| R11 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | No | - | No |
| R12 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | No | - | No |
| R13 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 5 | ? | - | No |
| R14 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 5 | ? | - | No |
| R15 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 6 | Yes | IAB5 | Yes |
| R16 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 7 | Yes | IAB6 | ND |
| R17 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | No | - | No |
| R18 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 4 | ? | - | ND |
| R19 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 5 | ? | IAB7a | ND |
| R20 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 8 | Yes | IAB8 | Yes |
| R21 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 3 | No | - | No |
| R22 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 3 | No | - | ND |
| R23 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 4 | ? | - | ND |
| R24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | No | - | ND |
| R25 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | No | - | ND |
| R26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | No | - | ND |
| Total | 2 | 18 | 8 | 13 | 3 | 21 | 6 | 5 | 13 | 5 | | | | |
Each of the 26 1kb+ PCRMs is scored for the presence (1, green) or absence (0, pink) of overlap with a verified transcription factor binding signal for selected anterior-posterior restricted gap/terminal (green) and pair-rule (yellow) transcription factors in stage 4–5 embryos (1% false discovery rate), using the Berkeley Drosophila Transcription Network Project ChIP/chip track (see Supp. Figs. 1–7). The PCRMs tested in the transgenic assay are indicated by yellow highlight. Row and column totals are indicated. Functional enhancer activity of each PCRM is predicted from its total score: <4, no activity predicted (No); 4 or 5, possible activity predicted (?); >5, activity predicted (Yes). Overlap with known enhancers (Known) and in vivo functional activity in the transgenic reporter gene assay (CRM; ND, not determined) are shown.
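The prediction rule in the Table 2 footnote reduces to a sum over the ten TF columns and two thresholds. A minimal Python sketch (the function name is illustrative):

```python
TFS = ("BCD", "CAD", "GT", "HB", "KNI", "KR", "TLL", "FTZ", "PRD", "SLP1")


def predict_activity(overlaps):
    """Apply the Table 2 thresholds to a dict of TF -> 0/1 ChIP-chip overlap scores."""
    unknown = set(overlaps) - set(TFS)
    if unknown:
        raise ValueError(f"unexpected TF names: {sorted(unknown)}")
    total = sum(overlaps.get(tf, 0) for tf in TFS)
    if total < 4:
        return total, "No"
    if total <= 5:
        return total, "?"
    return total, "Yes"


# Row R20 of Table 2 (overlaps IAB8)
r20 = dict(zip(TFS, (1, 1, 1, 1, 1, 1, 0, 0, 1, 1)))
print(predict_activity(r20))  # -> (8, 'Yes')
```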
Figure 4. The IAB7b enhancer contains the highly conserved signature motif of FTZ and KR binding sites.
(A) The 3′ half (abd-A side) of the IAB7b enhancer is highly conserved across all insect species (UCSC conservation track in black) and also specifically between D. melanogaster and D. pseudoobscura (VISTA track in red). PATSER was used to predict the spatial distribution of binding sites for six embryonic TFs within Drosophila IAB7b orthologs (see Methods for details). Predicted binding sites on the forward (top) and reverse (bottom) strands are indicated by color-coded rectangles: BICOID (BCD, pink), HUNCHBACK (HB, purple), KRUPPEL (KR, red), KNIRPS (KNI, brown), FUSHI-TARAZU (FTZ, green), and EVEN-SKIPPED (EVE, yellow). Rectangle height is proportional to the score of each predicted TF binding site. IAB7b contains a highly evolutionarily conserved FTZ-KR motif consisting of a pair of predicted FTZ binding sites adjacent to at least one predicted KR binding site (yellow box). The sub-regions of IAB7b tested for enhancer activity in transgenic reporter assays (2F2K, 2F1K and 1F2K; gray rectangles) are shown. (B) Detail of the conserved spatial organization of the FTZ (green), HB (purple), and KR (red) binding sites in the IAB7b signature motif. The numbers within the color-coded rectangles indicate the length of each predicted binding site; the start and end positions of each site (where present) in each of six Drosophila species are shown below. (C) The 2F2K and 2F1K sub-regions of IAB7b drive lacZ reporter gene expression in presumptive segments A5, A7, and A9 in transgenic D. melanogaster embryos. In contrast, no reporter gene expression was detectable from constructs carrying the 1F2K sub-region of IAB7b (data not shown).
IAB8 enhancer contains a conserved EVE-KR binding site cluster
In an effort to further investigate the critical regulatory sequences in known enhancers from the BX-C, we analyzed the overall sequence conservation of the IAB8 enhancer between D. melanogaster and D. pseudoobscura. The 5′ and 3′ thirds of the defined 1.7 kb enhancer are relatively well-conserved between the two species, but the middle third of the enhancer sequence is not (Fig. 2a). In order to assess the potential importance of evolutionarily conserved clustering of TF binding sites within IAB8, previously compiled PWMs for the embryonic TFs EVE, KR, HB, BCD, KNI, and FTZ (Ho et al., 2009) were used to predict the spatial arrangement of binding sites across the IAB8 sequence in six different Drosophila species using PATSER (see methods for details). This analysis revealed the presence of a cluster of predicted TF binding sites within the 3′ third (abd-A side) of IAB8 that is conserved across distantly related Drosophila species, including D. melanogaster and D. pseudoobscura (Fig. 2a). The cluster features a distinct conserved motif of a pair of predicted high-scoring EVE binding sites on opposite strands, adjacent to a predicted KR binding site (Fig. 2a). EVE is a good candidate activator for IAB8 as its most posterior (seventh) transverse stripe of expression in early embryogenesis corresponds to the presumptive A8 segment in which IAB8 is active (Fig. 1b). A 602bp region from the D. melanogaster IAB8 enhancer (minIAB8), representing the 3′ third harboring the EVE-KR motif, was found to be capable of driving lacZ reporter gene expression in the presumptive A8 segment in transgenic embryos (Fig. 2b), corresponding to the normal expression pattern driven by the characterized full length IAB8 enhancer.
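PATSER scores each position of a sequence against a position weight matrix on both strands. The sketch below illustrates that idea under simplifying assumptions: it uses raw log-odds scores with an arbitrary cutoff rather than PATSER's ln(p-value) thresholds, assumes a uniform background base composition, and uses a toy four-column matrix in place of the Ho et al. (2009) PWMs. The function names are hypothetical, and the input sequence must be uppercase A/C/G/T.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts into a log-odds weight matrix,
    assuming a uniform background base composition."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            b: math.log((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan_both_strands(seq, matrix, cutoff):
    """Score every window on both strands and report (start, strand, score)
    for windows scoring at or above `cutoff`. Starts are 0-based
    forward-strand coordinates for both strands."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        reverse = window.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, site in (("+", window), ("-", reverse)):
            score = sum(col[base] for col, base in zip(matrix, site))
            if score >= cutoff:
                hits.append((start, strand, round(score, 2)))
    return hits


# Toy matrix built from counts for a "TAAT" core (hypothetical, not a real PWM).
toy = log_odds_matrix([{"T": 10}, {"A": 10}, {"A": 10}, {"T": 10}])
print(scan_both_strands("GGTAATCCATTAGG", toy, cutoff=4.0))
```

Here the forward-strand TAAT at position 2 and the reverse-strand TAAT (read as ATTA on the forward strand) at position 8 are the only windows that pass the cutoff.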
Figure 2. The IAB8 enhancer contains a conserved cluster of binding sites.
(A) The 3′ (abd-A side) and 5′ (Abd-B side) thirds of the 1.7 kb IAB8 enhancer are relatively well-conserved across all insect species (UCSC conservation track in black) and also specifically between D. melanogaster and D. pseudoobscura (VISTA track in red and blue). PATSER was used to predict the spatial distribution of binding sites for six embryonic TFs within Drosophila IAB8 orthologs (see methods for details). Putative binding sites on the forward (top) and reverse (bottom) strands are indicated by color-coded rectangles: BICOID (BCD, pink), HUNCHBACK (HB, purple), KRUPPEL (KR, red), KNIRPS (KNI, brown), FUSHI-TARAZU (FTZ, green), and EVEN-SKIPPED (EVE, yellow). Rectangle height is proportional to the score strength of each predicted TF binding site. IAB8 contains an evolutionarily conserved cluster of TF binding sites (green box) which harbors an EVE-KR motif containing a pair of predicted EVE binding sites adjacent to a predicted KR binding site. The sub-regions of IAB8 (minIAB8, ΔEVE and EK (grey rectangles)) that were tested for enhancer activity in transgenic reporter assays are shown. (B,C) The minIAB8 sub-region of the IAB8 enhancer drives strong lacZ reporter gene expression in presumptive segment A8 in stage 5 (B) and stage 9 (C) transgenic D. melanogaster embryos. In contrast, the ΔEVE subregion of IAB8 drives only very weak reporter gene expression. In stage 5 embryos the EK subregion of IAB8 drives strong reporter gene expression in the presumptive A8 segment and domains immediately posterior in the embryo. In addition, weak expression immediately anterior of A8 and in the far anterior domains of the embryo is detected. Of note is the absence of reporter gene expression from the EK construct in presumptive C3-A4 segments, the domain of KR expression in the embryo.
To determine the functional importance of the pair of evolutionarily conserved high-scoring EVE binding sites in IAB8, we deleted a 120bp region containing these sites from the minIAB8 fragment (ΔEVE, Fig. 2a). In transgenic embryos the ΔEVE construct drove only a very weak stripe of reporter gene expression restricted to the presumptive A8 segment (Fig. 2b). This result suggests that the two deleted EVE sites contribute to the functional activation of the minIAB8 enhancer. Relaxing the significance threshold for predicted sites in PATSER indicates that additional, weaker EVE binding sites are present in the genomic region remaining in the ΔEVE construct. At a ln(p-value) cutoff of −6.0 there are no predicted sites in this region, but more sites are predicted as the cutoff is relaxed: 1 site at −5.75, 3 sites at −5.50, and 5 sites at −5.00. These weaker sites may therefore be responsible for the weak reporter gene expression observed (see discussion). To further assess the combinatorial contribution of the EVE-KR motif to enhancer activity, we tested a small 141bp genomic fragment from IAB8 containing only the EVE-KR motif (EK, Fig. 2a). The EK region directs strong reporter gene expression in the presumptive A8 segment, as well as ectopic expression posterior to A8 and weaker expression immediately anterior to A8 (Fig. 2b). In addition, reporter gene expression from the EK construct was observed in the anterior region of the embryo (Fig. 2b; see discussion for more details). Strikingly, reporter gene expression was not detected in the middle region of the embryo, corresponding to the expression domain of the KR repressor in the presumptive C3-A4 segments (Fig. 1b). This result is consistent with the idea that the single predicted KR binding site on this construct is functional and is sufficient to mediate repression in this domain of the embryo.
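The threshold sweep described above amounts to counting candidate sites whose ln(p-value) falls at or below each cutoff, where a more negative cutoff is more stringent. In this sketch the function names are hypothetical and the list of ln(p-value) values is invented for illustration; it is not the PATSER output for the ΔEVE region.

```python
def sites_passing(ln_pvalues, cutoff):
    """Count candidate sites whose ln(p-value) is at or below the cutoff.
    A more negative cutoff is more stringent."""
    return sum(1 for lp in ln_pvalues if lp <= cutoff)


def sweep(ln_pvalues, cutoffs):
    """Map each cutoff to the number of sites it admits."""
    return {c: sites_passing(ln_pvalues, c) for c in cutoffs}


# Hypothetical ln(p-value) scores for candidate windows (not data from the study).
candidates = [-5.9, -5.6, -5.4, -5.2, -4.1, -3.0]
print(sweep(candidates, [-6.0, -5.75, -5.5, -5.0]))
```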
Regulatory architecture at the IAB5 enhancer
To further explore how the arrangement of TF binding sites in a regulatory module at the BX-C determines its functional output, we examined the sequence conservation at the IAB5 early embryonic enhancer. Although the defined 1027bp IAB5 enhancer is no more conserved at the primary sequence level than neighboring genomic regions (as is the case for all the known enhancer CRMs in the BX-C), it is functionally conserved across the Drosophila genus (Ho et al., 2009). Sequence conservation in IAB5 falls into three broad peaks (Fig. 3a). Chimeric IAB5 enhancers, assembled from reciprocal D. melanogaster and D. pseudoobscura sub-regions (IAB5.1 and IAB5.2, containing one and two peaks of conservation, respectively), drive characteristic IAB5 expression patterns in a transgenic assay (Fig. 3b). Regardless of the order of the species sequence combinations in the chimeric enhancer, reporter gene expression is localized to the presumptive A5, A7, and A9 segments in embryos and is of comparable intensity to expression driven by the full length D. melanogaster IAB5 (Fig. 3b). These results suggest two possibilities: a) that if the overall architecture of IAB5 is intact, it will retain its function regardless of which species the genetic components come from, and/or b) that one of the sub-regions in the chimeric enhancers harbors all the conserved sequences sufficient for the functional activity of the enhancer.
Figure 3. Molecular dissection of the IAB5 enhancer reveals a highly conserved transcription factor binding site signature motif.
(A) The 3′ (abd-A side) and 5′ (Abd-B side) thirds of the defined 1.0 kb IAB5 enhancer have three broad peaks of sequence conservation across all insect species (UCSC conservation track in black) and also specifically between D. melanogaster and D. pseudoobscura (VISTA track in red). PATSER was used to predict the spatial distribution of binding sites for six embryonic TFs within Drosophila IAB5 orthologs (see methods for details). Predicted binding sites on the forward (top) and reverse (bottom) strands are indicated by color-coded rectangles: BICOID (BCD, pink), HUNCHBACK (HB, purple), KRUPPEL (KR, red), KNIRPS (KNI, brown), FUSHI-TARAZU (FTZ, green), and EVEN-SKIPPED (EVE, yellow). Rectangle height is proportional to the score strength of each predicted TF binding site. The sub-regions of IAB5 (IAB5.1 (red), IAB5.2 (blue), cIAB5 (light blue), eIAB5 (yellow)) that were tested for enhancer activity in transgenic reporter assays are shown. The IAB5 enhancer contains an evolutionarily conserved cluster of TF binding sites (blue box) which harbors a FTZ-KR motif containing a pair of predicted FTZ binding sites adjacent to a pair of predicted KR binding sites. (B) Chimeric enhancers consisting of reciprocal genomic regions from the D. melanogaster (mel) and D. pseudoobscura (pse) IAB5 orthologs (chMP and chPM) drive a pattern of lacZ reporter gene expression identical to the normal 1kb D. melanogaster IAB5 enhancer (IAB5 D. mel) in presumptive segments A5, A7, and A9 in transgenic embryos at stage 5 and stage 9 of development. This same pattern of expression is also directed by the IAB5.2 sub-region alone from D. mel or D. pse. In contrast, no reporter gene expression was detectable from constructs carrying the IAB5.1 sub-region from either species (data not shown). Further dissection of the IAB5.2 region revealed that a 424 bp fragment from the center of the D. mel IAB5 module (cIAB5) is sufficient to drive the full reporter gene expression pattern.
No detectable expression is driven in embryos by the remaining sub-region of IAB5.2 (eIAB5; data not shown). (C) Detail of the conserved spatial organization of the predicted FTZ (green) and KR (red) binding sites in the IAB5 signature motif. The numbers within the color-coded rectangles indicate the sequence length of the predicted binding sites, and the confirmed in vivo Superabdominal KR binding site (Ho et al., 2009) is indicated (asterisk). Start and end positions for each site in each of six Drosophila species are shown below.
To test whether the regulatory activity of the chimeric enhancers is derived from the full assembly or smaller regions of the enhancer, the IAB5.1 and IAB5.2 sub-regions from both D. melanogaster and D. pseudoobscura were tested individually for enhancer activity. The larger IAB5.2 sub-regions from both species were sufficient to drive the characteristic three-stripe IAB5 patterns in transgenic embryos (Fig. 3b). In contrast, transgenic embryos carrying the IAB5.1 sub-region from either species did not exhibit reporter gene expression at any stage of embryonic development (data not shown). These data demonstrate that the IAB5.1 region is not necessary for enhancer activity and that the entire functional component of IAB5 is conserved within the IAB5.2 region. In an effort to further refine the functional sequences in IAB5, the IAB5.2 region of D. melanogaster was dissected into smaller sub-regions (Fig. 3a). Of these subregions, a 325bp region from the Abd-B side of IAB5 (eIAB5, Fig. 3a) does not drive reporter gene expression during embryonic development. However, a 424bp region encompassing the central peak of conserved sequence (cIAB5) confers full enhancer function as it drives a characteristic IAB5-regulated pattern of lacZ reporter gene expression in transgenic embryos (Fig. 3b).
Minimal IAB5 enhancer contains a conserved FTZ-KR binding site signature motif
Bioinformatic analysis of the sequence in cIAB5 reveals a distinct evolutionarily conserved cluster of putative TF binding sites (Fig. 3a, blue box). This signature motif consists of a tightly organized combination of two high-scoring FTZ and two high-scoring KR binding sites and is conserved in all six Drosophila species (Fig. 3a). The two predicted FTZ sites are on opposite strands of the DNA sequence and partially overlap. FTZ has been shown to be the activator for IAB5, while KR is responsible for repressing the activity of IAB5 in the anterior of the embryo (Fig. 1b) (Busturia and Bienz, 1993). Furthermore, the highest scoring KR binding site in the center of the signature motif has been shown to be essential for repression of IAB5 activity in vivo (Ho et al., 2009). The molecular architecture of the signature motif is highly conserved in the Drosophila genus. All four binding sites are organized in the same orientation within a 67–100bp interval, and the spacing between sites is consistent (relative to the total size of the enhancer) across all six species, spanning approximately 60 million years of evolutionary time (Fig. 3c).
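One way to make the signature motif definition explicit is as a search over predicted sites: two FTZ sites on opposite strands plus a given number of KR sites, all contained within a maximum span. The sketch assumes site calls are already available (for example, from a PWM scan), takes a 100 bp span from the upper end of the 67–100bp interval above, and uses hypothetical coordinates and function names; it is not the procedure used to produce Fig. 3.

```python
from itertools import combinations
from typing import NamedTuple


class Site(NamedTuple):
    factor: str
    start: int   # inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def span(group):
    """Length in bp of the interval covering every site in `group`."""
    return max(s.end for s in group) - min(s.start for s in group) + 1


def find_signature(sites, activator="FTZ", repressor="KR",
                   n_repressor=2, max_span=100):
    """Return (sites, span) for every combination of two activator sites on
    opposite strands plus `n_repressor` repressor sites that fits in max_span bp."""
    acts = [s for s in sites if s.factor == activator]
    reps = [s for s in sites if s.factor == repressor]
    hits = []
    for pair in combinations(acts, 2):
        if pair[0].strand == pair[1].strand:
            continue  # the motif pairs activator sites on opposite strands
        for rep_group in combinations(reps, n_repressor):
            group = pair + rep_group
            if span(group) <= max_span:
                hits.append((group, span(group)))
    return hits


# Hypothetical predicted sites (not coordinates from Fig. 3c or 4b).
sites = [
    Site("FTZ", 10, 17, "+"),
    Site("FTZ", 14, 21, "-"),  # partially overlaps the first FTZ site
    Site("KR", 40, 49, "+"),
    Site("KR", 70, 79, "-"),
    Site("HB", 200, 209, "+"),
]
print(len(find_signature(sites)))                 # two FTZ + two KR
print(len(find_signature(sites, n_repressor=1)))  # two FTZ + one KR
print(len(find_signature(sites[1:])))             # one FTZ site removed
```

Setting `n_repressor=1` corresponds to the reduced form of the motif with a single KR site, and removing one FTZ site from the input (analogous in spirit to the 1F2K construct) eliminates every hit.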
FTZ-KR signature motif is conserved in the IAB7b enhancer
In order to further investigate the functional importance of clustering of evolutionarily conserved TF binding sites, the IAB7b enhancer was also bioinformatically examined using PATSER. This analysis revealed the presence of a distinct cluster of predicted TF binding sites in the middle of IAB7b that is also highly conserved across distantly related Drosophila species (Fig. 4a). Strikingly, this cluster contains a signature motif very similar to the motif found within the IAB5 enhancer, featuring a pair of high-scoring FTZ binding sites on opposite strands adjacent to two high-scoring KR binding sites (Fig. 4a, yellow box). Despite the absence of the most 5′ (Abd-B side) KR binding site in the most distantly related species (D. ananassae, D. pseudoobscura and D. virilis), the signature motif of two FTZ sites adjacent to at least one strong KR site (the most 3′ KR site) appears in all the Drosophila species examined. The spacing of the FTZ and KR binding sites in IAB7b is also highly conserved across the Drosophila genus (Fig. 4b).
To test the functional activity of the FTZ-KR motif, a minimal 154bp region from the D. melanogaster IAB7b enhancer (2F2K) containing the ‘IAB5-like’ signature motif was tested in our transgenic reporter assay. We hypothesized that this minimal region from IAB7b would exhibit a spatio-temporal output of functional activity consistent with inputs from binding of the FTZ activator and KR repressor. The 2F2K region was sufficient to drive expression in A5, A7 and A9 in stage 5 and stage 9 embryos, with notably stronger expression in A7 (Fig. 4c). This pattern corresponds to the specific nuclei in the embryo in which FTZ is expressed and KR is absent (Fig. 1b) and matches the activity of the known IAB5 enhancer (Fig. 3b). To assess the potential functional redundancy between the two KR binding sites in the signature motif, the 3′ KR binding site was deleted from the minimal fragment (2F1K, Fig. 4a). The absence of this KR binding site did not alter the functional activity of the enhancer module, as reporter gene expression was only detected in A5, A7 and A9 and did not expand into the KR domain (C3-A4) in transgenic embryos (Fig. 4c). In contrast, deletion of the 5′ putative FTZ site from the minimal module (1F2K, Fig. 4a) resulted in a complete loss of reporter gene expression in the developing embryo (data not shown). This result is consistent with the idea that the outer FTZ site is required within the signature motif for functional activity of the enhancer.
Discussion
A simple TF binding site clustering algorithm detects known BX-C enhancers
The clustered organization of TF binding sites has been shown to be crucially important to the functional activity of enhancers (see (Borok et al., 2010) for detailed review). However, despite detailed studies of a small set of enhancers in Drosophila, including the eve stripe 2 (S2E) enhancer (Hare et al., 2008; Ludwig and Kreitman, 1995; Ludwig et al., 1998), the precise rules of cis-regulatory grammar have yet to be fully elucidated. In an effort to investigate the role of clustering of predicted TF binding sites for the identification of enhancers in the 330kb Drosophila BX-C, a search for simple clusters of HB and KR binding sites was performed. The search algorithm returned 26 putative enhancers (PCRMs), of which 6 (23%) overlapped previously identified enhancers (Fig. 1a). The overlapping regions for four of these confirmed enhancers (BRE, IAB2, IAB5 and IAB8) were tested in our transgenic reporter gene assay and recapitulated the known domains of regulatory activity in the embryo (Fig. 1c). Furthermore, the 1037 bp R10 region that we tested, which is able to recapitulate IAB2 enhancer functional activity, refines the boundaries of the previously characterized 1970 bp IAB2 sequence. The search also identified 20 additional PCRM sequences. Twelve of these previously uncharacterized genomic regions were analyzed for enhancer activity and only one (R8 from the bxd/pbx region) was found to be a novel embryonic enhancer capable of driving expression in a pattern indicative of Ubx gene expression (Fig. 1c). This result indicates that the approach of searching for novel enhancers in the BX-C using simple clustering may have significant limitations.
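The parameters of the clustering search are not given in this section, so the following is only a hypothetical sketch of a simple clustering search: slide a fixed-width window along a list of predicted HB and KR site positions, keep windows that meet a minimum count for each factor, and merge overlapping windows into candidate PCRMs. The window width, step size, minimum counts, and site positions are all assumed values.

```python
def cluster_windows(sites, window=1000, step=100, min_counts=None):
    """Return merged (start, end) windows, end exclusive, in which every
    factor in `min_counts` has at least the required number of sites.
    `sites` is a list of (factor, position) pairs."""
    if min_counts is None:
        min_counts = {"HB": 3, "KR": 3}  # hypothetical thresholds
    if not sites:
        return []
    last = max(pos for _, pos in sites)
    clusters = []
    for start in range(0, last + 1, step):
        end = start + window
        counts = {}
        for factor, pos in sites:
            if start <= pos < end:
                counts[factor] = counts.get(factor, 0) + 1
        if all(counts.get(f, 0) >= n for f, n in min_counts.items()):
            if clusters and start <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], end)  # extend overlapping cluster
            else:
                clusters.append((start, end))
    return clusters


# Hypothetical site positions (bp) along a genomic interval.
sites = [("HB", 100), ("KR", 150), ("HB", 250), ("KR", 300),
         ("HB", 400), ("KR", 450), ("HB", 5000)]
print(cluster_windows(sites))
```

A search of this kind only counts sites; it ignores strand, spacing, and activator sites, which is one way a purely count-based cluster search could yield many false positives.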
A key question is why 11 of the 16 PCRMs tested (69%) are false positives. Two possibilities are: a) that the PCRMs may in fact regulate expression of the BX-C genes at later stages of development or in very specific patterns in post-embryonic tissues, and b) that by testing a specific ~1kb genomic fragment from each PCRM we may have removed critical regulatory sequences in neighboring regions. However, the recent availability of in vivo TF binding data (MacArthur et al., 2009) may also offer some potential answers. The binding of anterior-posterior restricted gap/terminal and pair-rule transcription factors in stage 4–5 embryos appears to correlate strongly with the functional activity of the PCRMs. When scored for ten specific TFs that are potential regulators of the BX-C enhancers, all the PCRMs tested in our transgenic assay that had chromatin immunoprecipitation (ChIP) binding peaks (see Supplementary Figs. 1–7) for 6 or more of these factors function as embryonic enhancers (Table 2). For each of these confirmed enhancers, both KR and HB show in vivo binding at the endogenous genomic region corresponding to the enhancer (Supplementary Figs. 1–7). In contrast, none of the false positive PCRMs has binding peaks for more than 5 of the TFs, and most have fewer than 3, often reflecting an absence of binding for KR or HB (Table 2).
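The percentages quoted for the clustering search follow directly from the counts reported in this section (26 PCRMs, 6 overlapping known enhancers, 16 tested, 5 active). A quick arithmetic check, with a hypothetical helper name:

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


n_pcrms = 26          # PCRMs returned by the HB/KR cluster search
n_overlap_known = 6   # PCRMs overlapping previously identified enhancers
n_tested = 16         # PCRMs tested in the transgenic reporter assay
n_active = 5          # tested PCRMs with embryonic enhancer activity

print(percent(n_overlap_known, n_pcrms))       # overlap with known enhancers
print(percent(n_tested - n_active, n_tested))  # false positives among tested
print(percent(n_active, n_tested))             # hit rate among tested
```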
One interpretation of these data is that the predicted TF binding sites in many of the false positive PCRMs do not represent actual in vivo embryonic binding sites and, as a result, the PCRM is not functional. In addition to KR and HB repressor binding sites, it is also important to consider the presence of potential binding sites for an appropriate activator (FTZ or EVE) necessary for the functional activity of the enhancer. Analysis of the 5 PCRMs that demonstrate in vivo activity reveals that each contains at least 3 strong predicted binding sites for the appropriate pair-rule activator (see Supplementary Figs. 8–10). However, many of the false positive PCRMs tested also appear to contain putative activator binding sites. In these cases, additional architectural requirements (for example, close spacing between multiple activator and/or repressor binding sites) may be necessary for in vivo embryonic activity. In support of this idea, our analysis of the genomic fragments that we tested from the iab-2 to iab-8 genomic regions (R10, 11, 12, 13, 14, 15, 17, 20, and 21) predicts that R15 (overlapping IAB5) has a closely spaced cluster of FTZ-KR sites and that R10 (overlapping IAB2) and R20 (overlapping IAB8) each possess a closely spaced cluster of EVE-KR sites within 150 bp of one another, whereas the other regions do not appear to harbor pair-rule activator (FTZ or EVE) and repressor (KR) sites in such close proximity (Supplementary Figs. 9 and 10). A third possibility is that additional protein factors may affect the ability of TFs to access the binding sites within the predicted enhancer sequence. Such proteins, which control the recruitment of chromatin components and nucleosome positioning, are thought to be critical to the regulation of embryonic gene expression through the modulation of TF binding affinity at enhancers (Bradley et al., 2010; Wittkopp, 2010).
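The 150 bp proximity criterion can be sketched as a pairwise distance test between activator and repressor site intervals. This sketch assumes distance is measured edge to edge (the number of bp separating two sites, 0 if they touch or overlap); the 150 bp limit comes from the text, while the function names and site coordinates are hypothetical.

```python
def gap(a, b):
    """bp separating two inclusive (start, end) intervals; 0 if they touch or overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)


def close_pairs(activators, repressors, max_gap=150):
    """All (activator, repressor) interval pairs separated by at most max_gap bp."""
    return [(a, r) for a in activators for r in repressors if gap(a, r) <= max_gap]


# Hypothetical predicted site intervals within one tested fragment.
eve_sites = [(100, 107), (900, 907)]
kr_sites = [(200, 209), (1300, 1309)]
print(close_pairs(eve_sites, kr_sites))
```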
The role of TF binding site clustering in BX-C enhancer function
The presence of a simple cluster of KR and HB binding sites in many of the enhancers of the BX-C argues that certain precise patterns of TF binding site clusters may be responsible for functional activity among similarly-regulated enhancers. In the IAB8 enhancer, a distinct cluster of EVE-KR binding sites (one KR, two EVE sites) is highly conserved across different Drosophila species (Fig. 2a). The 3′ third of IAB8 harboring the EVE-KR motif (minIAB8) is able to drive reporter gene expression in the characteristic IAB8 pattern in the presumptive A8 segment of transgenic D. melanogaster embryos. Deletion of the pair of EVE binding sites (ΔEVE) significantly weakens enhancer activity in A8 (Fig. 2b), suggesting that while these two EVE sites are important, cryptic weak EVE binding sites in the remaining sequence of the enhancer (which score too low to be predicted at the ln(p-value) cutoff of −6.0) are capable of partially compensating for the loss of the two strong predicted EVE sites. In support of this idea is the recent discovery that even low-affinity binding sites contribute to TF occupancy at regulatory regions in Drosophila embryos (Li et al., 2011). In this study the authors found that the level of factor occupancy in vivo correlates more strongly with the degree of chromatin accessibility at a given site than with in vitro measurements of the affinity of a factor for a particular DNA sequence. This observation may be especially relevant in the case of pair-rule factors (such as EVE), where a high localized concentration of the protein in each stripe (see Fig. 1b) may also facilitate the increased occupancy of low-affinity binding sites (Li et al., 2011).
A 141 bp fragment (EK) from within the minIAB8 region containing only the EVE-KR cluster drives strong reporter gene expression in A8, but also ectopic expression immediately posterior to A8 and weaker expression immediately anterior to A8 (Fig. 2b). Ectopic reporter gene expression is also observed in the anterior head domain of the embryo. This result indicates that the EK fragment by itself lacks important binding sites responsible for repression in the anterior head domain of the embryo (such as HB) and in the region immediately anterior to A8 (such as KNI). Several predicted HB and KNI repressor sites capable of performing this role are present within the 602 bp minIAB8 enhancer. Importantly, in the C3-A4 domain of the embryo where the KR repressor protein is expressed (Fig. 1b), there is a lack of enhancer-driven reporter gene expression from the EK fragment, suggesting that the single KR site within the EVE-KR cluster is sufficient to allow KR-mediated repression in that domain of the embryo. The continued presence of the EVE-KR cluster within the IAB8 enhancer, despite extensive reorganization of TF binding sites across the Drosophila orthologs (Fig. 2a), is reminiscent of the architectural constraints in the Drosophila and Sepsid eve S2E orthologs, which possess a highly conserved cluster of overlapping BCD activator and KR repressor binding sites necessary for enhancer function (Hare et al., 2008).
Signature motifs in BX-C enhancers
To extend our analysis of the functional role of clustered TF binding sites we also analyzed the IAB5 and IAB7b enhancers from the Drosophila BX-C. Chimeric enhancers assembled from the D. melanogaster and D. pseudoobscura IAB5 orthologs appear to have their functional activity entirely preserved and drive reporter gene expression in presumptive abdominal segments A5, A7 and A9 (Fig. 3b). This result contrasts with an earlier study in which chimeric enhancers assembled from reciprocal halves of D. melanogaster and D. pseudoobscura S2E orthologs did not accurately recapitulate enhancer activity (Ludwig et al., 1998). It is possible that the regulatory output for the chimeric IAB5 enhancers may be subject to very subtle modifications. Such modifications may result in changes to expression patterns that are beyond the detection of the reporter gene assay. However, one explanation for the difference in functional output between these two examples is that in the case of the S2E the organization of TF binding sites within the chimeric enhancer was sufficiently modified so as to destroy the functional activity of the enhancer (Ludwig et al., 2000), whereas for IAB5 this was not the case.
To further dissect the organization of TF binding sites in IAB5 we examined the predicted TF binding sites in the sequence. This approach reveals a highly evolutionarily conserved signature TF binding site motif consisting of two strong FTZ activator sites close to two strong KR repressor sites in the center of the defined 1 kb enhancer (Fig. 3a). The FTZ-KR signature motif is present and intact in both functional IAB5 chimeric enhancers, chMP and chPM (Fig. 3b). In the case of the chMP enhancer, the signature motif is present in the IAB5.2 half from D. pseudoobscura, while in the case of the reciprocal chPM enhancer, the signature motif is present in the IAB5.2 half from D. melanogaster. Molecular dissection of IAB5 shows that the individual IAB5.2 halves from D. melanogaster and D. pseudoobscura each show functional enhancer activity, while the corresponding IAB5.1 halves that lack the FTZ-KR signature motif do not (Fig. 3b). Furthermore, the 424bp region containing the center peak of sequence conservation of IAB5 (cIAB5) and the FTZ-KR signature motif drives reporter gene expression in the characteristic three-stripe IAB5 pattern in transgenic D. melanogaster. In support of the critical functional role of this region, our previous functional studies showed that the strongest predicted KR binding site within this signature motif is in fact critical to regulate the spatially restricted expression directed by IAB5 to the posterior presumptive A5, A7, and A9 segments in the D. melanogaster embryo (Ho et al., 2009). In the context of the endogenous gene complex a single point mutation in this KR repressor binding site (Superabdominal mutation) causes an anterior expansion of the embryonic domain of Abd-B expression and results in a homeotic transformation of the A3 segment into the more posterior A5 segment (Celniker et al., 1990).
This result confirms that the strong KR binding site in the signature motif is essential for the in vivo functional activity of the IAB5 enhancer.
The IAB7b enhancer, which is active in the presumptive A7 segment of the Drosophila embryo, is thought to be regulated by many of the same activators and repressors as IAB5 (Busturia and Bienz, 1993; Zhou et al., 1999). Bioinformatic analysis reveals that a highly conserved FTZ-KR signature motif, very similar to the one identified in IAB5, is also present in the IAB7b enhancer (Fig. 3c and 4b). Molecular dissection of IAB7b to test the role of the signature motif in the activity of the enhancer demonstrates that a 154bp region containing the motif (2F2K, with two FTZ and two KR sites) from within the D. melanogaster IAB7b enhancer is able to drive reporter gene expression in the presumptive A5, A7 and A9 segments of transgenic D. melanogaster, with notably stronger expression in A7 (Fig. 4c). This expression pattern is very similar to that driven by the IAB5 enhancer. A 114bp region (2F1K, containing two FTZ and one KR site) from within the D. melanogaster IAB7b enhancer also drives this same pattern of reporter gene expression (Fig. 4c), suggesting that the 3′ KR site is dispensable for repression of enhancer activity in the central domains of the embryo. Despite the fact that the 3′ KR site also overlaps predicted BCD and HB repressor binding sites, no ectopic anterior enhancer-driven expression is observed from the 2F1K construct in transgenic embryos (Fig. 4c), suggesting that the single remaining 5′ KR binding site is sufficient for repression. In fact, in more distantly related Drosophila species, the presence of two KR sites positioned near the pair of FTZ sites is lost, and only a single KR site remains (Fig. 4a).
A 110 bp region from IAB7b (1F2K, containing one FTZ and two KR sites; Fig. 4a) does not drive detectable reporter gene expression (data not shown), demonstrating that the outer FTZ site is required for activation of the enhancer. One possible molecular explanation for the necessity of the outer FTZ binding site is that FTZ may act as a dimer to activate IAB5 and IAB7b. In both enhancers a pair of strong FTZ sites is present in the FTZ-KR signature motif. While the ability of FTZ to dimerize has not been reported in the literature, other homeodomain-leucine zipper proteins have been shown to function as dimers (Palena et al., 2001). In many such cases the protein factors are also able to bind DNA target sequences as monomers, albeit with comparatively lower affinity (Palena et al., 2001). There is also evidence that FTZ can interact with other proteins, namely the orphan nuclear receptor FTZ-F1 through its LXXLL motif (Suzuki et al., 2001). In this case the heterodimer is capable of co-activation of target genes (Suzuki et al., 2001). Given that the consensus binding sites for the two factors are very different (FTZ: NNYAATTR; FTZ-F1: BSAAGGDKRDD; Bowler et al., 2006; Ho et al., 2009), it is perhaps to be expected that none of the predicted FTZ and FTZ-F1 binding sites in the IAB5 or IAB7b enhancers directly overlap. However, in future studies it will be of interest to explore the role of FTZ homo- and hetero-dimerization in regulating IAB5 and IAB7b activity.
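The FTZ and FTZ-F1 consensus sequences are written in IUPAC ambiguity codes (Y = C/T, R = A/G, and so on). A minimal sketch of matching such a consensus on both strands with a regular expression follows; a zero-width lookahead is used so that overlapping matches are reported. Consensus matching is cruder than the PWM-based PATSER predictions used in this study, and the function names and test sequence are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_regex(consensus):
    """Compile a consensus; the lookahead lets overlapping matches be found."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")


def find_consensus(seq, consensus):
    """Return sorted (start, strand) hits in 0-based forward-strand coordinates."""
    pattern = consensus_regex(consensus)
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    # Convert reverse-complement start positions back to forward coordinates.
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


print(find_consensus("GGCTAATTAGG", "NNYAATTR"))
```

In this made-up example the two FTZ-consensus hits overlap and lie on opposite strands. The same function accepts the FTZ-F1 consensus, since every code in `BSAAGGDKRDD` is in the table.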
The ability of the 2F2K and 2F1K regions to drive reporter gene expression in an IAB5-like manner in the presumptive A5, A7 and A9 segments of transgenic D. melanogaster suggests that additional inputs into IAB7b are required to spatially restrict endogenous enhancer-driven gene expression to only the A7 segment. A likely candidate for repression of IAB7b activity in the A5 segment of the embryo is KNI, which is expressed in the presumptive A1–A6 segments (Fig. 1b). Our bioinformatic analysis predicts several candidate KNI binding sites in the full length 728bp IAB7b enhancer, whereas the 2F2K and 2F1K regions lack any such predicted KNI sites (Fig. 4a). Previous studies revealed that the repression of the IAB7b enhancer in A5 is mediated by sequences in the 728bp fragment and does not require additional flanking 5′ or 3′ regions (Zhou et al., 1999). In addition, while disruption of the two KR sites in the signature motif does result in reporter gene activation by IAB7b in anterior regions of the embryo, repression persists in the A5 segment (Zhou et al., 1999). This result indicates that a factor other than KR is responsible for repression in A5. In the entire 728bp enhancer only three strong KNI sites are predicted, all located in the ~300bp region on the abd-A side of the signature motif (see Fig. 4a). These sites all lie within an evolutionarily conserved genomic region, and some of the sites are conserved in distantly related Drosophila species. The significance of these KNI sites in restricting the IAB7b-mediated expression pattern is currently under investigation.
Evolutionary implications for binding site turnover and enhancer architecture
A key question in understanding cis-regulatory grammar is why certain arrangements of TF binding sites confer functional enhancer activity while others fail to do so. The turnover of binding sites is common during the evolution of enhancers in different species, yet the functional activity of rapidly-evolving enhancer orthologs from different species is often robust (Ho et al., 2009; Ludwig and Kreitman, 1995), even across several hundred million years of evolutionary divergence (Hare et al., 2008). In the case of the BX-C, our bioinformatic analysis demonstrates that there is extensive binding site turnover in the IAB5, IAB7b, and IAB8 enhancers across the Drosophila genus, particularly in more distantly related species (see Fig. 2a, 3a and 4a). Despite this turnover of TF binding sites, the newly identified FTZ-KR signature motif present in both IAB5 and IAB7b and the functionally important EVE-KR cluster within IAB8 are composed of similar patterns of conserved binding site architecture. Specifically, the organization of sites is such that a pair of strong activator (FTZ or EVE) binding sites and at least one strong repressor (KR) site are in close proximity (<116 bp) to each other. Notably, the spacing between the FTZ and KR sites in the signature motif is largely unchanged across IAB5 and IAB7b enhancer orthologs in distantly related Drosophila species (Fig. 3c and 4b), although in the case of IAB7b the secondary KR binding site is lost in species more distantly related to D. melanogaster. Conservation of the genomic architecture of these TF binding sites in the BX-C enhancers does not directly indicate that the specific spacing between sites is essential.
However, the functional activity of genomic regions containing these motifs supports previous findings that closely spaced activator and repressor binding sites are critical for enhancer function (Erives and Levine, 2004; Swanson et al., 2010) and suggests that the architecture of binding sites within an enhancer is subject to significant evolutionary constraint.
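The shared architecture described above (two strong activator sites plus at least one strong repressor site within a short window) can be phrased as a simple check over a list of predicted sites. The sketch below is illustrative only: the `Site` class and `has_signature_motif` function are hypothetical, the coordinates are invented toy values rather than data from IAB5, IAB7b or IAB8, and it assumes that the 116 bp figure bounds the total span from the start of the first site to the end of the last site in the group.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    """A predicted binding site (hypothetical representation)."""
    factor: str  # e.g. "FTZ", "EVE" or "KR"
    start: int   # 0-based start within the enhancer sequence
    end: int     # exclusive end

def has_signature_motif(sites, activator="FTZ", repressor="KR",
                        n_activators=2, n_repressors=1, max_span=116):
    """True if some group of sites spanning fewer than `max_span` bp contains
    at least `n_activators` activator sites and `n_repressors` repressor sites.

    Each site's start is tried as the left edge of a group; a site joins the
    group if it starts at or after that edge and the distance from the edge
    to the site's end is less than `max_span`.
    """
    ordered = sorted(sites, key=lambda s: s.start)
    for left in ordered:
        inside = [s for s in ordered
                  if s.start >= left.start and s.end - left.start < max_span]
        n_act = sum(s.factor == activator for s in inside)
        n_rep = sum(s.factor == repressor for s in inside)
        if n_act >= n_activators and n_rep >= n_repressors:
            return True
    return False

# Toy coordinates, invented for illustration only.
clustered = [Site("FTZ", 10, 20), Site("FTZ", 40, 50), Site("KR", 90, 100)]
dispersed = [Site("FTZ", 10, 20), Site("FTZ", 40, 50), Site("KR", 300, 310)]
print(has_signature_motif(clustered))  # True: the three sites span 90 bp
print(has_signature_motif(dispersed))  # False: the KR site is too far away
```

Passing `activator="EVE"` applies the same rule to an EVE-KR cluster of the kind found in IAB8. Note that such a check only encodes site composition and spacing; as the text stresses, conservation of this pattern does not by itself show that the spacing is functionally required.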
Computational synthetic evolution studies have recently suggested that the bias toward deletions over insertions in the genome of D. melanogaster (and many other species) may gradually reduce the spacing between TF binding sites (Lusk and Eisen, 2010). In effect, this deletion bias passively draws binding sites together. Thus, although clustering of TF binding sites may not originally have been selected for its functional significance, once established in the genome it may still play a functional role in enhancer activity. Our molecular dissection of IAB5, IAB7b, and IAB8 enhancer function argues that specific clusters of activator and repressor binding sites do play a key role in enhancer activity. Such clusters, once present in enhancers, may therefore be under positive selective pressure, as suggested by the largely invariant organization of binding sites in the IAB5 and IAB7b FTZ-KR signature motif. This selection does not preclude the possibility that binding sites arising de novo nearby may also contribute to enhancer activity; in that scenario, the original TF binding site cluster may no longer be necessary for enhancer function. The ΔEVE region of the IAB8 enhancer tested in our transgenic assay may be an example of this phenomenon: this fragment retains weak IAB8-like enhancer activity despite deletion of the pair of strong predicted EVE binding sites (Fig. 2), potentially through weaker EVE binding sites present in the remaining sequence.
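The deletion-bias argument can be illustrated with a toy simulation. This is not the model used by Lusk and Eisen (2010); it is a minimal sketch that assumes 1-bp indels fall only in the spacer between two binding sites, that the sites themselves are never lost, and that deletions occur with an arbitrary probability of 0.6 versus 0.4 for insertions (an illustrative value, not an estimate for D. melanogaster). The function names are hypothetical.

```python
import random
import statistics

def simulate_spacer(length, steps, p_deletion, rng):
    """Evolve the length (bp) of the spacer separating two binding sites.

    Toy assumptions: every mutation is a 1-bp indel inside the spacer;
    it is a deletion with probability p_deletion (ignored if the spacer
    is already empty) and an insertion otherwise. The flanking binding
    sites are assumed to be kept intact by selection.
    """
    for _ in range(steps):
        if rng.random() < p_deletion:
            if length > 0:
                length -= 1
        else:
            length += 1
    return length

def mean_final_spacing(p_deletion, start=200, steps=500, replicates=200, seed=1):
    """Average spacer length after `steps` indels, over independent replicates."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    finals = [simulate_spacer(start, steps, p_deletion, rng) for _ in range(replicates)]
    return statistics.mean(finals)

unbiased = mean_final_spacing(p_deletion=0.5)  # insertions and deletions equally likely
biased = mean_final_spacing(p_deletion=0.6)    # arbitrary deletion excess (illustrative)
print(f"no bias: {unbiased:.1f} bp; deletion bias: {biased:.1f} bp")
```

Under these assumptions the unbiased spacer wanders around its starting length, while the deletion-biased spacer shrinks steadily on average. This is the sense in which a mutational bias alone, without selection on spacing, can bring binding sites closer together.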
Although the precise spatial arrangement of TF binding sites within an enhancer may not mirror the ancestral arrangement, computational predictions suggest that functional clusters of TF binding sites are likely to arise through the spatial reorganization of older, pre-existing sites during evolution (Lusk and Eisen, 2010). Such clusters are therefore also likely to mark genomic regions with robust enhancer activity. That enhancer activity in the BX-C appears to depend on signature motifs, specific spatial arrangements of TF binding sites within minimal modular regions, indicates that the physical pattern of binding site clustering is functionally significant for enhancer architecture.
Supplementary Material
Transcription factor clusters identified in our bioinformatic search (R cluster), the associated 1kb+ region cloned for functional analysis (R# 1kb+), and known enhancers in the BX-C are shown as a custom track in the UCSC Genome Browser (http://genome.ucsc.edu/; (Kent et al., 2002)). The Berkeley Drosophila Transcription Network Project ChIP/chip track (MacArthur et al., 2009) shows the location of verified binding for selected anterior-posterior gap/terminal (green) and pair-rule (yellow) transcription factors in stage 4–5 embryos (1% false discovery rate). The ORegAnno track (Consortium, 2008) displays previously identified literature-curated regulatory regions (dark olive) and transcription factor binding sites (light olive). The Conservation track shows Multiz alignment and phastCons scores for 12 flies, mosquito, honeybee, and beetle.
Position weight matrices (PWMs) for six transcription factors were used to predict binding sites in regions R1–8 of the 26 genomic regions identified in our bioinformatic search of the BX-C. Binding sites were predicted independently for each 1kb+ region (see Materials and Methods for definition) using PATSER (http://rsat.ulb.ac.be/rsat/patser_form.cgi; (Hertz and Stormo, 1999; Thomas-Chollier et al., 2008)). The original cluster of binding sites identified in our search is indicated (light yellow boxes). Predicted sites for BICOID, HUNCHBACK, KRUPPEL, KNIRPS, FUSHI-TARAZU, and EVEN-SKIPPED are shown for both DNA strands in each region (p-value thresholds indicated). Bar height reflects the predicted binding site p-value, with higher bars representing sites with lower p-values (higher predicted affinities). Some clusters contain fewer than 3 HUNCHBACK/KRUPPEL binding sites because the original cluster search used sequence strings rather than PWMs; as a result, some sites identified in that search are not recovered by PATSER.
Position weight matrices (PWMs) for six transcription factors were used to predict binding sites in regions R9–17 of the 26 genomic regions identified in our bioinformatic search of the BX-C. Binding sites were predicted independently for each 1kb+ region (see Materials and Methods for definition) using PATSER (http://rsat.ulb.ac.be/rsat/patser_form.cgi; (Hertz and Stormo, 1999; Thomas-Chollier et al., 2008)). The original cluster of binding sites identified in our search is indicated (light yellow boxes). Predicted sites for BICOID, HUNCHBACK, KRUPPEL, KNIRPS, FUSHI-TARAZU, and EVEN-SKIPPED are shown for both DNA strands in each region (p-value thresholds indicated). Bar height reflects the predicted binding site p-value, with higher bars representing sites with lower p-values (higher predicted affinities). Some clusters contain fewer than 3 HUNCHBACK/KRUPPEL binding sites because the original cluster search used sequence strings rather than PWMs; as a result, some sites identified in that search are not recovered by PATSER.
Position weight matrices (PWMs) for six transcription factors were used to predict binding sites in regions R18–26 of the 26 genomic regions identified in our bioinformatic search of the BX-C. Binding sites were predicted independently for each 1kb+ region (see Materials and Methods for definition) using PATSER (http://rsat.ulb.ac.be/rsat/patser_form.cgi; (Hertz and Stormo, 1999; Thomas-Chollier et al., 2008)). The original cluster of binding sites identified in our search is indicated (light yellow boxes). Predicted sites for BICOID, HUNCHBACK, KRUPPEL, KNIRPS, FUSHI-TARAZU, and EVEN-SKIPPED are shown for both DNA strands in each region (p-value thresholds indicated). Bar height reflects the predicted binding site p-value, with higher bars representing sites with lower p-values (higher predicted affinities). Some clusters contain fewer than 3 HUNCHBACK/KRUPPEL binding sites because the original cluster search used sequence strings rather than PWMs; as a result, some sites identified in that search are not recovered by PATSER.
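The PWM scanning summarized in these legends can be sketched as follows. This is not PATSER and not its p-value algorithm: it scores both strands with a log2-odds matrix and computes exact p-values by enumerating every possible word under a uniform background, which is only feasible for short matrices. The count matrix (consensus TTAG), the sequence, the threshold, and all function names are hypothetical toy values chosen for illustration.

```python
import bisect
import itertools
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical toy counts (consensus TTAG); NOT a real matrix for any BX-C factor.
TOY_COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 7, "T": 1},
]

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def score(pwm, word):
    """Sum of per-position weights for a word the same length as the matrix."""
    return sum(column[base] for column, base in zip(pwm, word))

def make_pvalue(pwm):
    """Exact P(score >= s) under a uniform background, enumerating all 4^k words."""
    scores = sorted(score(pwm, w) for w in itertools.product(BASES, repeat=len(pwm)))
    n = len(scores)
    def pvalue(s):
        # Count words scoring at least s (small tolerance for float round-off).
        return (n - bisect.bisect_left(scores, s - 1e-9)) / n
    return pvalue

def scan_both_strands(pwm, seq, p_threshold):
    """Return (position, strand, score, p-value) for every window passing the threshold."""
    k = len(pwm)
    pvalue = make_pvalue(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        forward = seq[i:i + k]
        reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, word in (("+", forward), ("-", reverse)):
            s = score(pwm, word)
            p = pvalue(s)
            if p <= p_threshold:
                hits.append((i, strand, round(s, 2), p))
    return hits

pwm = log_odds_pwm(TOY_COUNTS)
for hit in scan_both_strands(pwm, "GGTTAGCCCTAAGG", p_threshold=0.005):
    print(hit)
# (2, '+', 5.29, 0.00390625)
# (8, '-', 5.29, 0.00390625)
```

With this toy matrix only the exact consensus reaches P = 1/256; a word with a single mismatch (for example TAAG at position 9) scores lower and has P = 13/256 ≈ 0.05, so it fails the threshold. This illustrates how a site located by string matching against a permissive consensus need not pass a PWM p-value cutoff.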
Research Highlights.
Transcription factor binding sites identified in bithorax complex CRMs
Spatial clustering of binding sites is functionally important
Organization of specific activator and repressor binding sites is key in CRMs
Evolutionarily conserved signature motif of clustered binding sites is critical
Acknowledgments
This research was supported by funding to R.A.D. from the National Institutes of Health (HD54977 and GM090167) and the National Science Foundation (IOS-0845103), and by Howard Hughes Medical Institute Undergraduate Science Education Program grants (520051213 and 52006301) to the Biology Department at Harvey Mudd College. M.C.W.H. received support from the Merck-American Association for the Advancement of Science (AAAS) Undergraduate Science Research Program.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Akbari OS, et al. Unraveling cis-regulatory mechanisms at the abdominal-A and Abdominal-B genes in the Drosophila bithorax complex. Developmental Biology. 2006;293:294. doi: 10.1016/j.ydbio.2006.02.015.
- Arnone MI, Davidson EH. The hardwiring of development: Organization and function of genomic regulatory systems. Development. 1997;124:1851–1864. doi: 10.1242/dev.124.10.1851.
- Arnosti DN, et al. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development. 1996;122:205–214. doi: 10.1242/dev.122.1.205.
- Bae E, et al. Characterization of the intergenic RNA profile at abdominal-A and Abdominal-B in the Drosophila bithorax complex. PNAS. 2002;99:16847–16852. doi: 10.1073/pnas.222671299.
- Berman BP, et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. PNAS. 2002;99:757–762. doi: 10.1073/pnas.231608898.
- Berman BP, et al. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biology. 2004;5:R61. doi: 10.1186/gb-2004-5-9-r61.
- Bischof J, et al. An optimized transgenesis system for Drosophila using germ-line-specific phiC31 integrases. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:3312–7. doi: 10.1073/pnas.0611511104.
- Borok MJ, et al. Dissecting the regulatory switches of development: lessons from enhancer evolution in Drosophila. Development. 2010;137:5–13. doi: 10.1242/dev.036160.
- Bowler T, et al. Computational identification of Ftz/Ftz-F1 downstream target genes. Developmental Biology. 2006;299:78–90. doi: 10.1016/j.ydbio.2006.07.007.
- Bradley RK, et al. Binding Site Turnover Produces Pervasive Quantitative Changes in Transcription Factor Binding between Closely Related Drosophila Species. PLoS Biology. 2010;8:e1000343. doi: 10.1371/journal.pbio.1000343.
- Busturia A, Bienz M. Silencers in abdominal-B, a homeotic Drosophila gene. The EMBO Journal. 1993;12:1415–25. doi: 10.1002/j.1460-2075.1993.tb05785.x.
- Casares F, Sanchez-Herrero E. Regulation of the infraabdominal regions of the bithorax complex of Drosophila by gap genes. Development. 1995;121:1855–1866. doi: 10.1242/dev.121.6.1855.
- Celniker SE, et al. The molecular genetics of the bithorax complex of Drosophila: cis-regulation in the Abdominal-B domain. The EMBO Journal. 1990;9:4277–86. doi: 10.1002/j.1460-2075.1990.tb07876.x.
- Consortium ORA. ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res. 2008;36:D107–113. doi: 10.1093/nar/gkm967.
- Costas J, et al. Turnover of binding sites for transcription factors involved in early Drosophila development. Gene. 2003;310:215–220. doi: 10.1016/s0378-1119(03)00556-0.
- Crocker J, Erives A. A closer look at the eve stripe 2 enhancers of Drosophila and Themira. PLoS Genetics. 2008;4:e1000276. doi: 10.1371/journal.pgen.1000276.
- Dermitzakis ET, Clark AG. Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002;19:1114–1121. doi: 10.1093/oxfordjournals.molbev.a004169.
- Erives A, Levine M. Coordinate enhancers share common organizational features in the Drosophila genome. Proceedings of the National Academy of Sciences of the United States of America. 2004;101:3851–3856. doi: 10.1073/pnas.0400611101.
- Frazer KA, et al. VISTA: computational tools for comparative genomics. Nucleic Acids Research. 2004;32:W273–279. doi: 10.1093/nar/gkh458.
- Hare EE, et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genetics. 2008;4:e1000106. doi: 10.1371/journal.pgen.1000106.
- Hertz GZ, Stormo GD. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999;15:563–577. doi: 10.1093/bioinformatics/15.7.563.
- Ho MC, et al. Functional evolution of cis-regulatory modules at a homeotic gene in Drosophila. PLoS Genetics. 2009;5:e1000709. doi: 10.1371/journal.pgen.1000709.
- Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology. 1961;3:318–56. doi: 10.1016/s0022-2836(61)80072-7.
- Kent WJ, et al. The human genome browser at UCSC. Genome Research. 2002;12:996–1006. doi: 10.1101/gr.229102.
- Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424:147–151. doi: 10.1038/nature01763.
- Lewis EB. A gene complex controlling segmentation in Drosophila. Nature. 1978;276:565–70. doi: 10.1038/276565a0.
- Li XY, et al. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biology. 2011;12:R34. doi: 10.1186/gb-2011-12-4-r34.
- Ludwig MZ, et al. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature. 2000;403:564–7. doi: 10.1038/35000615.
- Ludwig MZ, Kreitman M. Evolutionary dynamics of the enhancer region of even-skipped in Drosophila. Mol Biol Evol. 1995;12:1002–1011. doi: 10.1093/oxfordjournals.molbev.a040277.
- Ludwig MZ, et al. Functional evolution of a cis-regulatory module. PLoS Biology. 2005;3:588–598. doi: 10.1371/journal.pbio.0030093.
- Ludwig MZ, et al. Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development. 1998;125:949–58. doi: 10.1242/dev.125.5.949.
- Lusk RW, Eisen MB. Evolutionary mirages: selection on binding site composition creates the illusion of conserved grammars in Drosophila enhancers. PLoS Genetics. 2010;6:e1000829. doi: 10.1371/journal.pgen.1000829.
- MacArthur S, et al. Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol. 2009;10:R80. doi: 10.1186/gb-2009-10-7-r80.
- Maeda RK, Karch F. The ABC of the BX-C: the bithorax complex explained. Development. 2006;133:1413–1422. doi: 10.1242/dev.02323.
- Markstein M, et al. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:763–8. doi: 10.1073/pnas.012591199.
- Martin CH, et al. Complete sequence of the bithorax complex of Drosophila. Proceedings of the National Academy of Sciences of the United States of America. 1995;92:8398–402. doi: 10.1073/pnas.92.18.8398.
- Mihaly J, et al. Dissecting the regulatory landscape of the Abd-B gene of the bithorax complex. Development. 2006;133:2983–2993. doi: 10.1242/dev.02451.
- Palena CM, et al. Positively charged residues at the N-terminal arm of the homeodomain are required for efficient DNA binding by Homeodomain-leucine Zipper Proteins. Journal of Molecular Biology. 2001;308:39–47. doi: 10.1006/jmbi.2001.4563.
- Ptashne M. A Genetic Switch, Third Edition: Phage Lambda Revisited. Cold Spring Harbor Laboratory Press; 2004.
- Sanchez-Herrero E, et al. Genetic organization of Drosophila bithorax complex. Nature. 1985;313:108–13. doi: 10.1038/313108a0.
- Shimell MJ, et al. Functional analysis of repressor binding sites in the iab-2 regulatory region of the abdominal-A homeotic gene. Dev Biol. 2000;218:38–52. doi: 10.1006/dbio.1999.9576.
- Shimell MJ, et al. Enhancer point mutation results in a homeotic transformation in Drosophila. Science. 1994;264:968–71. doi: 10.1126/science.7909957.
- Small S, et al. Transcriptional regulation of a pair-rule stripe in Drosophila. Genes Dev. 1991;5:827–839. doi: 10.1101/gad.5.5.827.
- Suzuki T, et al. Segmentation gene product Fushi Tarazu is an LXXLL motif-dependent coactivator for orphan receptor FTZ-F1. Proceedings of the National Academy of Sciences of the United States of America. 2001;98:12403–12408. doi: 10.1073/pnas.221552998.
- Swanson CI, et al. Structural rules and complex regulatory circuitry constrain expression of a Notch- and EGFR-regulated eye enhancer. Developmental Cell. 2010;18:359–370. doi: 10.1016/j.devcel.2009.12.026.
- Thomas-Chollier M, et al. RSAT: regulatory sequence analysis tools. Nucleic Acids Res. 2008;36:W119–27. doi: 10.1093/nar/gkn304.
- Vavouri T, Elgar G. Prediction of cis-regulatory elements using binding site matrices--the successes, the failures and the reasons for both. Curr Opin Genet Dev. 2005;15:395–402. doi: 10.1016/j.gde.2005.05.002.
- Wasserman WW, Fickett JW. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol. 1998;278:167–181. doi: 10.1006/jmbi.1998.1700.
- Wittkopp P. Variable Transcription Factor Binding: A Mechanism of Evolutionary Change. PLoS Biology. 2010;8:e1000342. doi: 10.1371/journal.pbio.1000342.
- Wray GA. The evolutionary significance of cis-regulatory mutations. Nat Rev Genet. 2007;8:206–216. doi: 10.1038/nrg2063.
- Zhou J, et al. Characterization of the transvection mediating region of the abdominal-B locus in Drosophila. Development. 1999;126:3057–65. doi: 10.1242/dev.126.14.3057.