Author manuscript; available in PMC: 2019 Jul 10.
Published in final edited form as: Dev Biol. 2009 Jan 8;328(2):518–528. doi: 10.1016/j.ydbio.2008.12.039

Genome wide ChIP-chip analyses reveal important roles for CTCF in Drosophila genome organization

Sheryl T Smith 1,5, Priyankara Wickramasinghe 1, Andrew Olson 2, Dmitri Loukinov 3, Lan Lin 1, Joy Deng 1, Yanping Xiong 1, John Rux 1, Ravi Sachidanandam 4, Hao Sun 1, Victor Lobanenkov 3, Jumin Zhou 1,*
PMCID: PMC6620017  NIHMSID: NIHMS110998  PMID: 19210964

Abstract

Insulators or chromatin boundary elements are defined by their ability to block transcriptional activation by an enhancer and to prevent the spread of active or silenced chromatin. Recent studies have increasingly suggested that insulator proteins play a role in large-scale genome organization. To better understand insulator function on the global scale, we conducted a genome-wide analysis of the binding sites for the insulator protein CTCF in Drosophila by Chromatin Immunoprecipitation (ChIP) followed by a tiling-array analysis. The analysis revealed CTCF binding to many known domain boundaries within the Abd-B gene of the BX-C including previously characterized Fab-8 and MCP insulators, and the Fab-6 region. Based on this finding, we characterized the Fab-6 insulator element. In genome-wide analysis, we found that dCTCF-binding sites are often situated between closely positioned gene promoters, consistent with the role of CTCF as an insulator protein. Importantly, CTCF tends to bind gene promoters just upstream of transcription start sites, in contrast to the predicted binding sites of the insulator protein Su(Hw). These findings suggest that CTCF plays more active roles in regulating gene activity and it functions differently from other insulator proteins in organizing the Drosophila genome.


In eukaryotes, particularly in metazoans, a vast number of genes and other genetic elements must share a linear DNA molecule regardless of diverse structures and functional states. The regulatory information of one gene could be located tens of kilobases away from the promoter, thus raising the question of how regulatory specificity is achieved. A growing body of evidence suggests that specific DNA elements exist to organize the genome in order to ensure that the proper long-range enhancer-promoter communication is selectively facilitated, while inappropriate regulation is prevented. Chromatin boundary elements, or insulators, are a class of regulatory DNA elements that can block transcription activation when inserted between an enhancer and a promoter. Insulators also provide chromatin barrier function to prevent the spread of active or inactive chromatin; in particular, the encroachment of heterochromatin into active domains of expression (Dorman et al., 2007; Felsenfeld et al., 2004; Gaszner and Felsenfeld, 2006; Wallace and Felsenfeld, 2007).

A number of proteins have been found to interact with and function through insulators. Examples include CTCF in vertebrates (Bell et al., 1999; Hark et al., 2000; Kanduri et al., 2000) and in Drosophila (Moon et al., 2005); Su(Hw) (Dorsett, 1993; Geyer and Corces, 1992), ZW5 (Gaszner et al., 1999) and BEAF (Zhao et al., 1995) in flies; and TFIIIC in yeast (Noma et al., 2006). CTCF is one of the best-characterized zinc finger-containing proteins, and is believed to interact with most insulators identified to date in vertebrates, and with at least one insulator in Drosophila, Fab-8 (Kim et al., 2007; Moon et al., 2005). The enhancer-blocking function of CTCF is well documented. However, the chromatin barrier function of insulators appears to involve other DNA binding proteins, such as USF1 in the case of the HS4 insulator at the β-globin locus (West et al., 2004).

Genetic and molecular studies of insulator proteins have provided mechanistic insights into how these elements might function. In cultured vertebrate cells the HS4 insulator of the β-globin locus associates with CTCF, which has been shown to interact with nucleophosmin, a nucleolar protein, resulting in a tethering of the insulator to the nucleolus (Yusufzai et al., 2004). In Drosophila, targeting of the Su(Hw) insulator to subnuclear compartments termed “insulator bodies” near the nuclear periphery has been demonstrated (Gerasimova et al., 2000). These studies suggest that large-scale movement of DNA within the nucleus (i.e., the movement of DNA loops via interaction of insulators with structural components) may be an important component of insulator function. Other studies suggest an alternative, but not mutually exclusive, possibility that insulators may function by providing a promoter decoy to trap enhancers (i.e., insulators mimic features of promoters to interact with enhancer elements), thus preventing an enhancer element from activating its promoter (Chernukhin et al., 2007; Geyer, 1997).

Despite recent insights into insulator function, our views have largely been limited to the actions of insulators within specific chromosomal contexts. Recent studies from several laboratories have provided direct evidence that the insulator protein CTCF may be involved in organizing the genome into higher order structures in interphase nuclei (Dorman et al., 2007; Wallace and Felsenfeld, 2007). For example, a modified Chromatin Conformation Capture (CCC) technique has led to the discovery that CTCF is essential for inter-chromosomal interactions between the Imprinting Control Region (ICR) from the Igf2/H19 loci on chromosome 7 and Wsb1/Nf1 on chromosome 11 (Ling et al., 2006). More recent studies have shown that CTCF co-localizes with cohesins on the replication origin of KSHV and on mouse chromosomes (Parelho et al., 2008; Rubio et al., 2008; Stedman et al., 2008). Furthermore, CTCF and cohesins are both localized in the ICR of the Igf2/H19 loci and contribute to enhancer blocking (Wendt et al., 2008). These studies, combined with the recent evidence that large-scale organization of chromosomes in the interphase nucleus is highly nonrandom, suggest that insulator proteins may play an active, regulatory role in the functional organization of chromatin in the nucleus.

In this study, we analyzed the CTCF protein-binding sites in the Drosophila genome by Chromatin Immunoprecipitation followed by tiling array analyses (ChIP-chip). We found 3,561 strong binding sites (8-fold or higher enrichment of ChIP signals) and an additional 8,872 weaker but robust binding sites (4-fold or higher enrichment). The ChIP-chip data allowed us to identify the Fab-6 insulator element from the Abd-B locus. In the genome-wide analysis, CTCF is often found to be situated between closely positioned yet divergently transcribed genes. Most importantly, CTCF binding sites are highly enriched near the promoter region at the 5’ end of genes. These results suggest that CTCF may play an important role in regulating gene function and organizing the Drosophila genome.

Results

Drosophila ChIP-chip

Polyclonal antibodies were raised in rabbits against the C-terminal 158 residues of Drosophila CTCF (Covance Research Products). ChIPed material from S2 cells was amplified and sent to NimbleGen for hybridization to tiling arrays, which use 50 bp oligonucleotide probes with a median probe spacing of 97 bp to cover the entire 118.4 Mb euchromatic region of the Drosophila genome (BDGP 2004 release 4) and 20.3 Mb of heterochromatin (Release 3.2 BHGP). ChIPed DNA fragments were in the range of 200–500 bp (data not shown). Therefore, a labeled CTCF fragment is expected to hybridize with two to four probes randomly arranged on the array. NimbleGen provided the array hybridization data in three forms: signal intensity (raw) data, scaled log2-ratio data, and peak data. The processed peak data files were generated from the scaled log2-ratio data. Peaks were detected by identifying four or more probes whose signals were above specified cutoff values using a 500 bp sliding window. Each peak was assigned a False Discovery Rate (FDR) score to estimate the likelihood that it is a false positive. Essentially, the lower the FDR, the more likely a peak corresponds to a CTCF-binding site. Peaks with FDR scores ≤ 0.05 represent the highest confidence protein-binding sites, and were designated by red bars (Fig. 1A). Peaks with FDR scores between 0.05 and 0.2 were designated by orange/yellow bars, and also correspond to protein binding sites. Finally, peaks with FDR scores > 0.2 were designated by grey bars, and represent the lowest confidence protein-binding sites. We analyzed only the red peaks. The peaks vary in width and height. The width of a peak reflects the length of DNA spanned by the consecutive probes to which labeled ChIPed fragments hybridize. The height of each peak represents the ratio of the experimental sample to input control.
The peaks are divided into three categories: 1x peaks correspond to signals greater than 2-fold but less than 4-fold enrichment above background; 2x peaks correspond to CTCF binding signals greater than 4-fold but less than 8-fold enrichment; and 3x peaks are ChIP signals that are greater than 8-fold above background. The data were extracted from scanned images of each array and represented as scaled log2 ratios. The ChIP-chip data from a segment of the X chromosome were compared to polytene staining using CTCF antibody. The major binding sites from ChIP-chip appeared to correlate well with bands detected by immunostaining of polytene chromosomes (Fig. 1C).
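The peak-detection and binning rules described here can be illustrated with a short sketch. It is not NimbleGen's actual algorithm: the function names, the input layout (a list of probe position and log2-ratio pairs), the default cutoff and the handling of values that fall exactly on a category boundary are all assumptions made for illustration. Only the 500 bp window, the four-probe minimum and the 2-, 4- and 8-fold thresholds come from the text.

```python
from typing import List, Optional, Tuple

Probe = Tuple[int, float]  # (probe start in bp, scaled log2 ChIP/input ratio)


def classify_fold(log2_ratio: float) -> Optional[str]:
    """Map a log2 ratio to the 1x/2x/3x categories (2-, 4-, 8-fold enrichment).

    Assumption: a value exactly on a threshold goes to the higher category.
    """
    fold = 2.0 ** log2_ratio
    if fold >= 8:
        return "3x"
    if fold >= 4:
        return "2x"
    if fold >= 2:
        return "1x"
    return None


def call_peaks(probes: List[Probe], cutoff_log2: float = 1.0,
               window: int = 500, min_probes: int = 4) -> List[Tuple[int, int, float]]:
    """Return (start, end, max_log2) for regions in which at least
    `min_probes` probes above `cutoff_log2` fall inside one `window`."""
    hits = [(pos, val) for pos, val in sorted(probes) if val > cutoff_log2]
    peaks: List[Tuple[int, int, float]] = []
    left = 0
    for right in range(len(hits)):
        # Shrink the window until it spans less than `window` bp.
        while hits[right][0] - hits[left][0] >= window:
            left += 1
        if right - left + 1 >= min_probes:
            start, end = hits[left][0], hits[right][0]
            height = max(val for _, val in hits[left:right + 1])
            if peaks and start <= peaks[-1][1]:
                # Overlaps the previous qualifying window: extend that peak.
                prev = peaks[-1]
                peaks[-1] = (prev[0], end, max(prev[2], height))
            else:
                peaks.append((start, end, height))
    return peaks


# Hypothetical probes at the 97 bp median spacing quoted in the text.
example = [(0, 3.2), (97, 3.5), (194, 3.1), (291, 3.3), (388, 0.2),
           (5000, 3.0), (5097, 3.0)]
peaks = call_peaks(example)
categories = [classify_fold(height) for _, _, height in peaks]
```

In this toy input the first four probes form a single peak, while the two isolated probes near 5 kb do not meet the four-probe minimum.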

Figure 1. ChIP-chip analysis of CTCF targets in the Drosophila genome.

A. Validation of ChIP-chip results by conventional ChIP. The major CTCF peaks in the Abd-B region are confirmed by ChIP. Lane 1 iab-4 region; lane 2 region upstream from MCP, lane 3 MCP; lanes 4–6, Fab-6 fragments; lane 7 Fab-8; lane 8 Abd-B promoter.

B. EMSA test of CTCF binding to Fab-6. Three overlapping 100bp DNA fragments from Fab-6c (Fab-6c1, Fab-6c2 and Fab-6c3) were used for EMSA. Only the CTCF motif-containing fragment displayed binding to CTCF.

C. Polytene staining of a segment of the X chromosome (1A to 4F) shows that CTCF bands correspond with the major peaks in ChIP-chip analyses.

D. The Drosophila CTCF consensus is a subset of the human CTCF consensus. A diagram of Drosophila and human CTCF is shown below.

The Drosophila CTCF binding motif was identified by comparing DNA sequences covered by all CTCF peaks obtained from NimbleGen, using sequence data from the Release 4 (Apr. 2004, UCSC version dm2) assembly of the Drosophila genome (UCSC GoldenPath Server) and the discriminating matrix enumerator (DME) algorithm (Kim et al., 2007; Smith et al., 2005). Drosophila CTCF motifs spanning 8 to 15 base pairs were evaluated over all of the 12,433 sites using the motifclass program from the CREAD package (Smith et al., 2005; Smith et al., 2006). The highest ranking motif with respect to relative error rate is the width = 11 motif. The eleven-residue Drosophila CTCF motif corresponds to a subset of the twenty-residue human CTCF sequence described recently, and is similar to the motif reported in a recent ChIP-chip study of the Adh and BX-C regions (Holohan et al., 2007; Kim et al., 2007). We next examined how the CTCF motif correlated with CTCF peaks. For 1x, 2x and 3x CTCF peaks, we compared the distances between each peak and the nearest motif within a 200 bp window. As shown in Fig. 1D and Fig. 5D, in all three cases a CTCF motif could be found at similar frequencies within approximately 70 bp of the center of each peak.
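The peak-to-motif distance comparison can be sketched as below. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, motif positions are assumed to be precomputed and sorted, and the 200 bp window is interpreted as ±100 bp around the peak center.

```python
from bisect import bisect_left
from typing import List, Optional


def nearest_motif_offset(peak_center: int, motif_positions: List[int],
                         window: int = 200) -> Optional[int]:
    """Signed distance (bp) from a peak center to the closest motif.

    `motif_positions` must be sorted in ascending order. Returns None when
    no motif lies within window/2 of the center (assumed interpretation).
    """
    half = window // 2
    i = bisect_left(motif_positions, peak_center)
    # Only the motif just left and just right of the center can be nearest.
    candidates = motif_positions[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda pos: abs(pos - peak_center))
    offset = best - peak_center
    return offset if abs(offset) <= half else None


def fraction_with_motif(peak_centers: List[int], motif_positions: List[int],
                        window: int = 200) -> float:
    """Fraction of peaks that have a motif inside the window."""
    if not peak_centers:
        return 0.0
    found = sum(nearest_motif_offset(c, motif_positions, window) is not None
                for c in peak_centers)
    return found / len(peak_centers)


# Hypothetical coordinates, for illustration only.
motifs = [100, 1050, 5000]
offsets = [nearest_motif_offset(c, motifs) for c in (1000, 3000, 90)]
```

Running the same function separately on 1x, 2x and 3x peak centers would give one offset distribution per signal level, which is the kind of comparison plotted in Fig. 5D.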

Figure 5. CTCF binding near promoter regions.

A. Density of CTCF signals plotted near the promoter. All three levels of enrichment show the same trend of peaking just a few hundred base pairs upstream of the promoter.

B. The positions of the closest neighboring TSSes located upstream of gene promoters are analyzed for CTCF-bound and non-CTCF-bound promoters. The numbers of such neighboring promoters are plotted according to the distance from the CTCF-bound promoter (hollow black rectangles). Promoters that do not have CTCF signals bound to them are represented by blue rectangles with gray fill. When the two groups were superimposed, the first group clearly had more promoters located nearby.

C. The relative distance of CTCF binding motif from an actual peak for intronic CTCF and promoter CTCF.

D. The relative distance between CTCF binding signal and CTCF binding motif for different signal levels.

CTCF interacts with multiple sites within the bithorax complex including the predicted domain boundary Fab-6.

Drosophila CTCF was first shown in our previous study to interact with the Frontabdominal-8 (Fab-8) element located between two adjacent regulatory domains, iab-7 and iab-8, in the Abd-B region (Moon et al., 2005). Mutations of CTCF binding sites disrupt CTCF binding both in vitro and in vivo, resulting in the loss of enhancer blocking in transgenic embryos. Consistent with these results, our ChIP-chip analysis shows a strong peak at the position of Fab-8, while no detectable binding is observed in the surrounding region (Fig. 1A). In addition, CTCF binding is detected at other domain boundaries within the Abd-B locus, including Fab-6 and MCP. Two strong peaks of CTCF were also detected in the 5’ end and in a region near the Abd-B promoter. The ChIP-chip results were confirmed by conventional ChIP and EMSA (Fig. 1A, B); thus, the CTCF peaks in the Abd-B locus correspond to authentic in vivo interactions. These findings suggest that CTCF plays an important regulatory role in organizing the Abd-B locus.

CTCF is essential for the enhancer blocking activity of the Fab-6 insulator at the Abd-B locus

The Bithorax gene complex consists of three homeotic genes, Ultrabithorax, abdominal-A and Abd-B, which organize the developmental program of thoracic and abdominal segments (Duncan, 1987; Lewis, 1978). The Abd-B gene contains four segment-specific regulatory regions that are separated by domain boundary elements (Duncan, 1987; Karch et al., 1985; Maeda and Karch, 2006; Mihaly et al., 2006). Consistent with this model, MCP, Fab-7 and Fab-8 have been identified and characterized in terms of enhancer-blocking activity. Preliminary reports also suggest the existence of a Fab-6 boundary (Maeda and Karch, 2006; Mihaly et al., 2006). To identify the Fab-6 insulator, we characterized a 2 kb DNA sequence containing CTCF sites, which was subsequently tested in an enhancer blocking assay in early Drosophila embryos (Fig. 2A, B).

Figure 2. Identification of the Fab-6 insulator.

A. Transgene construct for detecting insulator activity. It consists of divergently transcribed w and lacZ genes, PE enhancer from twist gene and IAB5 from Abd-B. Test DNA is inserted between the two enhancers.

B. The 2 kb DNA from Fab-6 region effectively blocks enhancer-promoter interactions, as both IAB5-w and PE-lacZ interactions are blocked.

C. Fab-6a fragment exhibits no blocking activity as IAB5 activates w and PE activates lacZ.

D. Fab-6b does not block enhancer activity, similar to Figure 2C.

E. Fab-6c fragment blocks the PE enhancer from activating lacZ and attenuates the IAB5-w interaction.

Two divergently transcribed reporter genes, white (w) and lacZ, and embryonic enhancers PE from the twist gene (Jiang et al., 1992) and IAB5 from the Abd-B gene (Busturia and Bienz, 1993) were inserted into a transgene (Fig. 2A). The PE enhancer activates reporter gene expression in the ventral region of 2- to 4-hour-old embryos, while the IAB5 enhancer activates reporter gene expression in a vertical banding pattern in the posterior third of the embryos. When a control spacer is inserted, the two enhancers direct additive transcription activities. However, when an insulator element is inserted between the two enhancers, only the proximal enhancers activate transcription. For example, if an insulator were inserted, PE would only activate the w promoter, while IAB5 would only activate the lacZ gene. The normal IAB5-w interaction and PE-lacZ interaction would thus be blocked when an insulator is inserted. When the 2 kb Fab-6 is inserted, enhancer blocking is observed (Fig. 2B). We next dissected this region into three 1 kb overlapping fragments (a, b, c) and tested each fragment in the transgene. As can be seen in Fig. 2C–2E, only one of the fragments, Fab-6c, shows enhancer-blocking activity. Compared to Fab-8, Fab-6 exhibits weaker enhancer-blocking activity in early embryos (data not shown). The 1 kb Fab-6c contains two CTCF binding sites and includes a peak of CTCF binding in the ChIP-chip analysis. CTCF binding was confirmed both by independent ChIP (Fig. 1A) and by EMSA (Fig. 1B), strongly suggesting that CTCF is the insulator-binding protein associated with Fab-6.

Fab-6 insulator function was also tested in an enhancer-blocking assay using a vector that contains GFP, RFP and a constitutive PE enhancer (Fig. S1A). We inserted insulator DNA containing either Fab-8 or Fab-6 between the PE enhancer and the GFP promoter. When Drosophila S3 cells were transfected with plasmids containing either the Fab-8 or the Fab-6 insulator, the PE enhancer was found to activate the leftward transcribed RFP gene but only moderately activate the rightward transcribed GFP gene. As a result, the merged image showed mostly red cells, suggesting that both of these elements block the enhancer-promoter interaction in the plasmid (Fig. S1C, D). We also tested a 1 kb region near the Abd-B promoter containing the CTCF binding signal identified in the ChIP-chip experiment (Fig. 1A). After transfection, both GFP and RFP were expressed to a similar extent, suggesting that the Abd-B promoter element does not block enhancers in this context (Fig. S1E). These results suggest that CTCF is essential for the enhancer-blocking activity of the Fab-6 and Fab-8 insulators, and that CTCF may play an important role in organizing the Abd-B locus.

CTCF binds between spatially or temporally divergent genes

The enhancer-blocking activity of insulators, demonstrated both in vitro and in vivo, suggests that these elements might be located between closely positioned genes in the genome, especially genes that are either spatially or temporally divergent. A survey of the ChIP-chip data reveals that this is often the case. Figure 3 provides two examples. In the first (Fig. 3A), CTCF binds between β amyloid protein precursor-like (Appl), the Drosophila orthologue of a human gene implicated in Alzheimer’s disease (Torroja et al., 1999), and an uncharacterized transcript, CG4293. Expression data from Affymetrix show that Appl is expressed in the embryo after 6 hours of development, while CG4293 is likely maternally loaded and is transcribed in early embryogenesis (Fig. 3A). A strong CTCF peak is seen between the divergently transcribed promoters. Interestingly, the CTCF binding coincides with the boundary between cytological bands 1B8 and 1B9. In the second example, CTCF is situated between the leftward transcribed bicoid (bcd) (Lawrence, 1988) and the rightward transcribed Amalgam (Ama) gene (Seeger et al., 1988) (Fig. 3B). bcd is a member of gap genes required for early patterning along the anterior-posterior axis. bcd RNA is maternally loaded into the embryos and is restricted to the anterior of the early embryo (Lawrence, 1988) (Fig. 3C). In contrast, Ama is a ligand for the transmembrane receptor neurotactin and is required for neurotactin-mediated cell adhesion and axon fasciculation in developing flies. It is expressed during embryogenesis and is localized to the dorsal region and ventral neural ectoderm of the embryo (Fremion et al., 2000) (Fig. 3D). These binding patterns imply that CTCF insulators may be necessary to separate neighboring genes that are differentially regulated.

Figure 3. CTCF exists between closely positioned, divergently transcribed promoters.

A. CTCF interacts with a region between Appl and CG4293. 1x, 2x and 3x indicate 2-, 4- and 8-fold ChIP enrichment. Cytobands highlight positions of cytological band and boundaries defined on polytene chromosomes. Affy data sets denote transcripts at different times post egg-laying. Appl, rightward transcribed Appl gene. CG4293, leftward transcribed undefined transcript (Flybase http://flybase.org/).

B. CTCF binds strongly between bcd and Ama. TSS, transcription start site.

C. Expression of bcd and Ama in different embryonic stages. bcd is maternal and can be detected between stages 1–6, while Ama is zygotic and expressed after stage 4 (image courtesy of Flybase http://flybase.org/).

CTCF interacts with the promoter proximal regions.

The genome-wide distribution of CTCF signals was plotted using a 100,000 bp sliding window, with the number of CTCF signals and genes in each window represented on a graph. An example for chromosome 3R is shown in Fig. 4 for either 1x peaks (2-fold enrichment, Fig. 4A) or 3x peaks (8-fold enrichment, Fig. 4B), which shows that CTCF distribution is highly non-random. To gain specific insight into CTCF distribution, we then analyzed the numbers of CTCF binding sites (2x vs. 3x) that localized to each region of the gene. The regions examined were as follows: promoter (−500 bp to first exon), exons, introns, 3’ end of the gene (defined as the last exon), and intergenic regions (Fig. 4C, D).
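As a rough illustration of the windowed density plot, the sketch below counts features per 100,000 bp window. The text does not state the step size of the sliding window, so non-overlapping windows are assumed here; the function name and the coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_per_window(positions: Iterable[int], window: int = 100_000) -> Dict[int, int]:
    """Count features (CTCF peaks or gene starts) per window.

    Keys are window start coordinates. Assumption: windows do not overlap.
    """
    counts = Counter((pos // window) * window for pos in positions)
    return dict(counts)


# Hypothetical peak centers on one chromosome arm.
peak_counts = count_per_window([5, 99_999, 100_000, 250_000])
```

Applying the same function to gene start coordinates gives the second series needed to compare CTCF density with gene density along a chromosome.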

Figure 4. Genome wide distribution of CTCF binding sites.

A. Distribution of 1x CTCF signal (red) vs. random distribution (yellow) on chromosome 3R. The pattern shows highly nonrandom distribution. Genes are shown in blue.

B. Similar distribution of 3xCTCF signals.

C. Percent of 2xCTCF signals in each region of the gene.

D. Distribution of 3xCTCF signals in different regions of the gene.

E. Stacking analyses showing the distribution of 3xCTCF (top) and 2xCTCF near transcription start sites (TSS) in a 6 kb window. In these analyses, each TSS is taken together with the CTCF binding information in the nearby 6 kb of DNA, and the resulting windows are collected and stacked together. The peaks in the center, around the 3 kb position, indicate the close association of CTCF with promoters. Note that the predicted Su(Hw) binding sites do not show such a trend; instead, fewer binding sites are observed near the TSS (see rectangle).

First, we found significantly higher numbers of CTCF binding near TSSes and exons as compared to the other gene regions. For example, of 3,561 3x CTCF binding peaks, 710 are situated near the promoter in 656 nonoverlapping 500 bp bins (Table 1A). This is supported by stacking analyses of CTCF against transcription start sites (TSS), which showed a significant correlation (Fig. 4E and Table 1). Interestingly, the strong bias of CTCF binding to promoter regions is specific to CTCF and not to another well-characterized insulator protein, Su(Hw) (Fig. 4E, Table 2B)(Ramos et al., 2006). To determine the relative CTCF binding distribution to TSSes, we plotted the CTCF binding from −2,000 to +3,000 around the TSSes. As can be seen from Fig. 5A, CTCF binding is concentrated approximately 200–300 bp upstream of the TSS.
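The stacking analysis around transcription start sites amounts to aligning every TSS at position zero and pooling the offsets of nearby CTCF peaks. The sketch below shows one way to do this under stated assumptions: the function name, the 100 bp bin size and the strand convention (negative offsets are upstream) are illustrative choices, and the nested loop favors clarity over speed.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def stack_around_tss(tss_list: Iterable[Tuple[int, str]], peak_centers: List[int],
                     upstream: int = 2000, downstream: int = 3000,
                     bin_size: int = 100) -> Dict[int, int]:
    """Pool peak offsets relative to each TSS into a binned histogram.

    Each TSS is (coordinate, strand). Offsets are strand-aware, so negative
    values always mean "upstream of the TSS". Keys are bin start offsets.
    """
    profile: Counter = Counter()
    for tss, strand in tss_list:
        for center in peak_centers:
            offset = center - tss if strand == "+" else tss - center
            if -upstream <= offset < downstream:
                # Floor division keeps negative offsets in the correct bin.
                profile[(offset // bin_size) * bin_size] += 1
    return dict(profile)


# Hypothetical TSSes on opposite strands, each with a peak 250 bp upstream.
profile = stack_around_tss([(10_000, "+"), (50_000, "-")],
                           [9_750, 50_250, 80_000])
```

Both example peaks land in the bin covering −300 to −200 bp, which corresponds to the upstream region where Fig. 5A shows CTCF binding to be concentrated.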

Table 1. Genome-wide distribution of CTCF sites in different regions of genes.

The distribution of 2x and 3x CTCF signals within the promoter (−500 bp to first exon), exons, introns, 3’ end of the gene (the last exon), and intergenic regions was calculated. Table 1 shows strong enrichment of CTCF binding sites near the TSS, and also lists the frequencies of the CTCF binding motif in each group of CTCF sites. No obvious differences in motif frequency were observed among the different groups.

| CTCF signal strength | 3x (8-fold enrichment): # of sites | 3x: # of non-overlapping 500 bp bins | 2x (4-fold enrichment): # of sites | 2x: # of non-overlapping 500 bp bins |
| --- | --- | --- | --- | --- |
| Total recorded sites | 3561 | | 12433 | |
| CTCF sites overlap with TSS | 710 | 656 | 1794 | 1540 |
| CTCF sites overlap with 3’ | 170 | 386 | 669 | 1245 |
| CTCF sites overlap with exons | 193 | 235 | 565 | 588 |
| CTCF sites overlap with introns | 986 | 14196 | 3964 | 32426 |
| CTCF sites overlap with intergenic | 1502 | 22134 | 5442 | 51626 |

| Motif occurrence | 3x: count | 3x: % | 2x: count | 2x: % |
| --- | --- | --- | --- | --- |
| CTCF sites near TSS with motif | 389 | 54.79 | 947 | 52.79 |
| CTCF sites near TSS no motif | 321 | 45.21 | 847 | 47.21 |
| Intronic CTCF sites with motif | 525 | 53.25 | 2034 | 51.31 |
| Intronic CTCF sites no motif | 461 | 46.75 | 1930 | 48.69 |

Table 2. Genomic features comparison.

The total numbers of each of the two values compared were given. The window size is 1,500 bp. When both values, for example TSS and 2xCTCF, were found within the same 1500 bp region, it is counted as found. The total number found was compared to an expected number if the two values are unrelated. The ratio of found vs. expected is given in the last column. A value of 1.0 is considered as no association. 2A. The table shows strong association between TSS and 2xCTCF. 2B. For TSS and predicted binding sites of Su(Hw), no association is observed. 2C. Association between CTCF and ncRNA is shown here. 2D. Comparison of tRNA and CTCF also reveals an association between the two.

Table 2A. Comparison between TSS and 2x CTCF peaks

| Chromosome | Length of chromosome (kb) | Total # of TSS regions | Length of TSS (kb) | Total # CTCF peaks | # found in TSS regions | # of TSS regions | Expected # if random | Ratio observed/expected |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chr X | 22,224 | 1,690 | 2,762 | 2,546 | 614 | 221 | 316 | 1.94 |
| Chr 4 | 1,282 | 72 | 120 | 105 | 19 | 11 | 9 | 1.93 |
| Chr 2L | 22,408 | 1,847 | 3,083 | 1,944 | 463 | 198 | 267 | 1.73 |
| Chr 2R | 20,767 | 1,956 | 3,345 | 1,596 | 411 | 204 | 257 | 1.59 |
| Chr 3L | 23,719 | 1,980 | 3,314 | 2,801 | 605 | 236 | 390 | 1.54 |
| Chr 3R | 27,905 | 2,504 | 4,201 | 3,114 | 814 | 268 | 468 | 1.73 |
| Total | 118,357 | 10,049 | 16,825 | 12,101 | 2,926 | 1,138 | 1,720 | 1.70 |

Table 2B. Comparison between TSS and predicted Su(Hw) binding sites

| Chromosome | Length of chromosome (kb) | Total # of TSS regions | Length of TSS (kb) | Total # Su(Hw) sites | # found in TSS regions | # of TSS regions | Expected # if random | Ratio observed/expected |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chr X | 22,224 | 1,690 | 2,762 | 558 | 60 | 58 | 69 | 0.86 |
| Chr 4 | 1,282 | 72 | 120 | 20 | 3 | 3 | 1 | 1.6 |
| Chr 2L | 22,408 | 1,847 | 3,083 | 493 | 73 | 67 | 67 | 1.07 |
| Chr 2R | 20,767 | 1,956 | 3,345 | 446 | 73 | 72 | 71 | 1.01 |
| Chr 3L | 23,719 | 1,980 | 3,314 | 485 | 64 | 61 | 67 | 0.94 |
| Chr 3R | 27,905 | 2,504 | 4,201 | 623 | 84 | 81 | 93 | 0.89 |
| Total | 118,357 | 10,049 | 16,825 | 2,625 | 357 | 342 | 373 | 0.95 |

Table 2C. Comparison between noncoding RNAs and 2x CTCF peaks

| Chromosome | Length of chromosome (kb) | Total # of noncoding regions | Length of noncoding DNA (kb) | Total # CTCF peaks | # found in noncoding regions | # of TSS regions | Expected # if random | Ratio observed/expected |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chr X | 22,224 | 44 | 101 | 2,546 | 19 | 6 | 11 | 1.64 |
| Chr 4 | 1,282 | 0 | 0 | 105 | 0 | 0 | 0 | 0 |
| Chr 2L | 22,408 | 103 | 226 | 1,944 | 21 | 12 | 19 | 1.06 |
| Chr 2R | 20,767 | 87 | 227 | 1,596 | 16 | 11 | 17 | 0.91 |
| Chr 3L | 23,719 | 75 | 158 | 2,801 | 22 | 8 | 18 | 1.18 |
| Chr 3R | 27,905 | 89 | 238 | 3,114 | 76 | 23 | 26 | 2.86 |
| Total | 118,357 | 398 | 951 | 12,101 | 154 | 60 | 97 | 1.58 |

Table 2D. Comparison between tRNA genes and 2x CTCF peaks

| Chromosome | Length of chromosome (kb) | Total # of tRNA regions | Length of tRNA (kb) | Total # CTCF peaks | # found in tRNA regions | # of tRNA regions | Expected # if random | Ratio observed/expected |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chr X | 22,224 | 16 | 2,762 | 2,546 | 3 | 2 | 3 | 0.89 |
| Chr 2L | 22,408 | 30 | 3,083 | 1,944 | 6 | 6 | 4 | 1.3 |
| Chr 2R | 20,767 | 50 | 3,345 | 1,596 | 10 | 7 | 7 | 1.29 |
| Chr 3L | 23,719 | 29 | 3,314 | 2,801 | 5 | 2 | 6 | 0.78 |
| Chr 3R | 27,905 | 47 | 4,201 | 3,114 | 43 | 14 | 10 | 4.29 |
| Total | 118,357 | 172 | 16,825 | 12,101 | 67 | 31 | 33 | 1.99 |
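The observed/expected ratios in Table 2 can be reproduced with a simple calculation. The formula for the expected count is not spelled out in the text, so the sketch assumes that, under a random distribution, peaks fall in a feature class in proportion to the fraction of the chromosome that class covers. With that assumption the Chr X row of Table 2A is recovered to rounding; the function names are illustrative.

```python
def expected_overlaps(total_peaks: int, feature_kb: float, chromosome_kb: float) -> float:
    """Expected number of peaks in the feature regions if peaks were placed
    at random (assumed model: proportional to the covered length)."""
    return total_peaks * feature_kb / chromosome_kb


def enrichment_ratio(found: int, total_peaks: int,
                     feature_kb: float, chromosome_kb: float) -> float:
    """Observed/expected ratio; 1.0 indicates no association."""
    return found / expected_overlaps(total_peaks, feature_kb, chromosome_kb)


# Chr X row of Table 2A: 2,546 peaks, 2,762 kb of TSS regions,
# 22,224 kb chromosome length, 614 peaks found in TSS regions.
chr_x_expected = expected_overlaps(2546, 2762, 22224)
chr_x_ratio = enrichment_ratio(614, 2546, 2762, 22224)
```

For Chr X this gives an expected count of about 316 and a ratio of about 1.94, matching the tabulated values.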

It is possible that many of the CTCF binding sites near the promoter may represent examples where insulators are necessary to separate two closely positioned promoters. To test this possibility, we searched the upstream region of all 1x, 2x and 3x CTCF bound promoters for the nearest neighboring gene promoter (again, defined as the sequence between −500 bp to the first exon). We then randomly selected the same number of non-CTCF-bound promoters and conducted the same search. The results for 1x CTCF are shown in Fig. 5B (results for 2x and 3x CTCF are shown in Fig. S2A and S2B). There is a larger number of promoters from neighboring genes present within 1kb upstream from a CTCF-bound promoter compared to the random group, while from 1 kb upstream to 5kb upstream, the chance of finding a promoter element in the CTCF-bound promoter group is significantly less compared to the control group. For example, 365 neighboring TSSs were found within 1kb of a promoter that is bound by 2xCTCF signal as compared to the 293 in control group (Table S1). But when the regions between 1kb upstream to 3kb upstream were examined, the numbers are 192 and 227, respectively. When the same analysis is done with 3xCTCF signals, 170 TSSes were found to be within 1kb of a 3xCTCF-bound promoter as compared to the control group of 136. Between 1kb to 3kb upstream, the number becomes 65 and 113, respectively. At all three signal strengths (1x, 2x and 3x), the CTCF-bound promoters tend to have more neighboring TSSes located within only 1 kb away upstream, but are less likely to have such TSSes between 1 and 3kb upstream. This trend strongly suggests that at least one function of promoter-interacting CTCF peaks is to separate two closely positioned genes (promoters that are within 1 kb of each other).
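The neighboring-promoter comparison can be sketched as follows. This is an illustrative reconstruction, not the code used for the study: the function names are hypothetical, each promoter is reduced to a TSS coordinate plus strand, and "upstream" is taken to mean lower coordinates for plus-strand genes and higher coordinates for minus-strand genes. The control group is drawn with a fixed seed only so that the example is repeatable.

```python
import random
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Promoter = Tuple[int, str]  # (TSS coordinate, strand)


def nearest_upstream_tss(tss: int, strand: str, all_tss_sorted: List[int]) -> Optional[int]:
    """Distance (bp) to the closest other TSS on the upstream side, or None."""
    if strand == "+":
        i = bisect_left(all_tss_sorted, tss)   # entries before i are < tss
        return tss - all_tss_sorted[i - 1] if i > 0 else None
    j = bisect_right(all_tss_sorted, tss)      # entries from j on are > tss
    return all_tss_sorted[j] - tss if j < len(all_tss_sorted) else None


def count_by_distance(promoters: List[Promoter], all_tss_sorted: List[int],
                      near: int = 1000, far: int = 3000) -> Tuple[int, int]:
    """Count promoters whose nearest upstream TSS lies within `near` bp,
    and those where it lies between `near` and `far` bp."""
    within_near = between = 0
    for tss, strand in promoters:
        distance = nearest_upstream_tss(tss, strand, all_tss_sorted)
        if distance is None:
            continue
        if distance <= near:
            within_near += 1
        elif distance <= far:
            between += 1
    return within_near, between


def sample_control(unbound: List[Promoter], n: int, seed: int = 0) -> List[Promoter]:
    """Randomly pick n promoters without CTCF binding as the control group."""
    return random.Random(seed).sample(unbound, n)


# Hypothetical coordinates, for illustration only.
all_tss = [1000, 1600, 5000, 7500, 20000]
bound = [(1600, "+"), (5000, "-"), (1000, "+"), (20000, "+")]
bound_counts = count_by_distance(bound, all_tss)
control = sample_control([(7500, "+"), (20000, "-"), (1000, "-")], 2)
```

Comparing the counts for CTCF-bound promoters with those for an equally sized control sample gives pairs of numbers analogous to the 365 vs. 293 and 192 vs. 227 figures reported for 2xCTCF.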

However, the unusually high level of CTCF overlap with the promoter may not be explained solely by insulator function. Only a small fraction of the CTCF-bound promoters have a nearby neighboring promoter located upstream. For example, of the 796 3xCTCF peaks bound directly adjacent to a promoter, only 170 are there ostensibly to separate closely positioned promoters. In addition, the distribution of predicted Su(Hw) insulator sites does not show such a bias towards promoters (Fig. 4E, Table 2B) (Ramos et al., 2006). Thus, not all CTCF binding sites upstream of a promoter may function as insulators; some might have novel regulatory functions.

Under this scenario, it is possible that some of the CTCF may have been recruited to the promoter region by indirect means. A recent report suggests that CTCF interacts with Pol II; thus, some of the CTCF binding to promoters may be due to recruitment by Pol II (Chernukhin et al., 2007). If CTCF were recruited by Pol II, we would likely observe a lower occurrence of the CTCF binding motif at promoters than at other genomic regions (i.e., intergenic, intronic, etc.). To test this, we searched CTCF sites near promoters for the CTCF motif, and compared the frequency of this motif with that of intronic CTCF binding sites. Table 1 shows that the promoter-binding CTCF sites do not have a lower percentage of sites that contain the CTCF motif than do intronic CTCF binding sites. On the contrary, promoter sites have a slightly higher percentage of CTCF motifs than the intronic CTCF binding sites (54.79% vs. 53.25%) (Table 1) when 3x binding sites are analyzed. When 2x CTCF binding was examined, the percentages of binding sites with the CTCF motif are similar for promoter-associated CTCF and intronic CTCF (52.79% vs. 51.31%) (Table 1). We also calculated the distribution of the CTCF motif relative to CTCF peaks for both promoter-binding CTCF sites and intronic CTCF sites. We found no major differences between the two groups (Fig. 5C); thus, the two are directly comparable. These findings suggest that the increased incidence of CTCF binding to promoter regions is not due to indirect recruitment.
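The motif percentages quoted in this comparison follow directly from the counts in Table 1, as the short check below shows. The function name and the dictionary layout are illustrative; the counts are those listed in Table 1.

```python
def motif_percentage(with_motif: int, total: int) -> float:
    """Percentage of binding sites that contain the CTCF motif."""
    return 100.0 * with_motif / total


# (sites with motif, all sites) taken from Table 1.
table1_counts = {
    ("promoter", "3x"): (389, 710),
    ("intron", "3x"): (525, 986),
    ("promoter", "2x"): (947, 1794),
    ("intron", "2x"): (2034, 3964),
}

percentages = {key: round(motif_percentage(*counts), 2)
               for key, counts in table1_counts.items()}
```

The promoter and intronic values differ by less than two percentage points at both signal strengths, which is the basis for the statement that promoter-bound sites are not depleted of the motif.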

The genome-wide trend of CTCF binding near promoters also extends to noncoding RNA (ncRNA) genes, particularly small noncoding RNAs (snRNA) and tRNAs. It should be noted that the majority of ncRNAs do not have a recognizable TSS. However, due to the small size of most ncRNAs, the binding of CTCF at or near an ncRNA would be considered at or near its promoter as well. Of 398 noncoding RNA genes, we found 154 that have CTCF binding in close proximity to the transcription start site (from −1000 bp to +500 bp), which is 1.58 times what would be expected if the distribution were random (Table 2C). For tRNA genes, 67 of 172 are located near CTCF binding sites (Table 2D). Although many of the CTCF binding sites could serve as insulators that separate these noncoding genes from neighboring coding genes, some of these genes are well separated from nearby genes but still strongly interact with CTCF.

Discussion

In this study, we conducted a genome-wide analysis of the interaction sites of the Drosophila insulator protein CTCF. We observed that CTCF is biased towards genes and strongly prefers the 5’ promoter-proximal region of genes. In gene-rich regions, CTCF is often found between divergently transcribed genes. In the Drosophila Abd-B locus, CTCF interacts with all predicted domain boundaries except Fab-7. Based on the ChIP-chip data, we identified a 1 kb Fab-6 boundary element using enhancer-blocking assays. These results support the role of CTCF as an insulator protein, and suggest that CTCF plays an important role in organizing higher-order chromatin structures.

CTCF was first cloned almost two decades ago as a transcription factor that possesses both activator and repressor functions (Lobanenkov et al., 1990). Only recently has CTCF been found to function as an insulator protein (Bell et al., 1999; Hark et al., 2000; Kanduri et al., 2000). In fact, later studies suggested that CTCF may be the only enhancer-blocking protein in vertebrates, unlike in Drosophila, which contains several different insulator proteins in its genome (West et al., 2002). A large collection of studies suggests that CTCF has diverse functions, including activator or repressor functions for specific genes, enhancer blocking, X-inactivation, imprinting control, cell cycle regulation and apoptosis (Gaszner and Felsenfeld, 2006; Ohlsson et al., 2001). So far no unifying hypothesis exists to explain the apparently diverse functions of CTCF. Posttranslational modifications and diverse binding targets could account for some of the functional diversity of CTCF (Klenova et al., 2002). The current ChIP-chip study and recent studies suggest that fly and vertebrate CTCF recognize a similar consensus sequence (Fig. 1D) (Holohan et al., 2007; Kim et al., 2007). However, the Drosophila consensus is shorter than the human consensus, which makes it difficult to predict CTCF binding based exclusively on the consensus.

Insulators are predicted to be located between neighboring genes, separating the regulatory activity of one gene from that of another. As insulators work over long distances, a popular assumption is that insulators are located in intergenic regions. However, recent studies of the human genome suggest that CTCF binding sites are highly biased towards genes (Kim et al., 2007). This was confirmed by our ChIP-chip study in the Drosophila genome (Table 1). In addition, the majority of CTCF sites are located upstream of transcription start sites. By examining the distribution of CTCF sites relative to predicted TSS, we found that about one in every three CTCF binding sites is located within 1 kb of a TSS. This is significantly higher than expected from a random distribution, with a ratio of observed to predicted of 1.77 (Table 1, Table 2A). This distribution pattern contrasts with that of predicted Su(Hw) insulator protein binding sites (Ramos et al., 2006), which do not show any bias towards promoters (Table 2B, Fig. 4E).

Several interpretations exist to explain the strong promoter bias of CTCF binding sites. First, this bias suggests that some of these CTCF binding sites serve as insulators to separate closely positioned, yet divergently transcribed genes. In one example, CTCF binds near the APPL promoter, separating it from a differentially expressed transcript (Fig. 3A). In a second example, CTCF interacts with a region between bcd and ama, two divergently expressed yet closely juxtaposed genes (Fig. 3B, C). Interestingly, the human β-amyloid precursor protein (APP) gene promoter also interacts with CTCF, which plays a direct role in regulating APP (Burton et al., 2002; Vostrov et al., 2002). Of approximately 800 3x CTCF binding sites, nearly 170 may belong to this class. This suggests a second possibility: the majority of CTCF sites may not be used to block the enhancer of one gene from working on the promoter of another. Rather, CTCF may function directly as an activator or repressor to control gene expression. Examples include the human CTCF activator function for APP (Burton et al., 2002; Vostrov et al., 2002), or the repressor activity for myc and hTERT (Klenova et al., 1993; Lobanenkov et al., 1990; Ohlsson et al., 2001; Renaud et al., 2005). Although no example has been reported in Drosophila, the strong bias of CTCF binding to promoter regions suggests that such a regulatory function exists.

A third possible function of CTCF at gene promoters is to regulate specific chromatin structures within the control region of a gene. The Abd-B locus serves as a good example, where CTCF interacts with multiple sites within the regulatory region (Fig. 1A) (Holohan et al., 2007). Finally, the function of the CTCF binding sites may be structural, i.e., CTCF may target a gene promoter to subnuclear compartments (Yusufzai et al., 2004), such that the promoters of these genes can be precisely regulated. In the Abd-B locus, the location of CTCF may suggest such a scenario. Strong CTCF binding is found near the Abd-Bm promoter (Fig. 1A) (Holohan et al., 2007), but these sites did not function as insulators in enhancer blocking assays (Fig. 2). If CTCF were to organize spatial loops in this locus, the spatial proximity of these four CTCF binding sites would help bring together the Abd-B promoter and different boundary elements. This would allow efficient regulation of the promoter by PRE elements that are located just next to these boundaries (Hagstrom et al., 1997; Holohan et al., 2007; Mihaly et al., 1998). Recent studies have provided evidence consistent with this model (Cleard et al., 2006; Lanzuolo et al., 2007).

More potential examples are found in noncoding RNA promoters, especially those of tRNA genes, where CTCF binding is often found. Although yeast tRNA and 5S RNA genes are tethered to the nucleolus (Thompson et al., 2003), no reports are available on the subnuclear locations of metazoan tRNA and 5S RNA genes. However, vertebrate CTCF is known to tether DNA to the nucleolus (Yusufzai et al., 2004), and the binding of CTCF to these genes suggests that CTCF may bring them to similar locations for early processing (Bertrand et al., 1998). This possibility will be tested in a future study.

Whether CTCF functions strictly as an insulator protein as a constituent of a domain boundary element, as a direct regulator of transcription, or as an organizer of large scale chromatin loops, it appears to play different roles from other insulator proteins such as Su(Hw), which, unlike CTCF, does not show a bias towards gene promoters. The functional differences, if proven to be true, would reveal the regulatory complexity of the 3-D organization of the genome.

Materials and methods

Antibody Production

The terminal 158 amino acid open reading frame of Drosophila CTCF (accession AAL78208) was cloned into a pDEST15 vector (Invitrogen) and expressed according to the manufacturer’s specifications. Resulting pellets from 100 mL LB cultures were resuspended in 6 mL of STE buffer (10 mM Tris, pH 8.0, 150 mM NaCl, 1 mM EDTA) and incubated on ice for 15 minutes. DTT was added to a final concentration of 5 mM, followed by 2 tablets of protease inhibitor cocktail (Roche Molecular Biochemicals) and 100 μg/mL of lysozyme. Sarkosyl was then added to a final concentration of 1.5%, before sonication and centrifugation (10,000 x g) for 5 minutes to clarify the lysate. Triton X-100 was then added to a final concentration of 2% before adding the lysate to pre-equilibrated Glutathione Sepharose 4B resin according to the manufacturer’s instructions (Amersham Biosciences). The column was washed repeatedly with ice-cold PBS (8.4 mM Na2HPO4, 1.9 mM NaH2PO4, pH 7.4, 150 mM NaCl). Protein was eluted with 1.5 mL of elution buffer (1 mM EGTA, pH 8.0, 100 mM KCl, 50 mM Tris-HCl, pH 8.0, 0.5 mM DTT, 20 mM reduced glutathione, 1 mM PMSF). Fractions were collected and assayed for protein molecular weight and purity. The eluted protein was then dialyzed against PBS and concentrated to 1 mg/mL using a spin column (Pierce). Antigens were either injected into pre-screened rabbits (Covance Research Products) or used to generate monoclonal antibodies (The Wistar Institute Hybridoma Facility).

Chromatin Immunoprecipitation

Chromatin was prepared as previously described (Breiling et al., 2004) with the following exceptions. Cell lysates (1 mL per tube) were sonicated using a Sonic Dismembrator, Model 100 (Fisher Scientific) in the presence of 0.25 mL of glass beads. Microcentrifuge tubes were placed in circulating ice/water baths and sonicated under the following conditions: 5 cycles of 30 seconds on, 45 seconds off. Following sonication, sample chromatin was diluted 4X in dilution buffer (10 mM Tris-HCl, pH 8.0, 0.5 mM EGTA, 1% Triton X-100, 140 mM NaCl, 1 mM PMSF, and the protease inhibitors leupeptin, aprotinin, and pepstatin A, each at a final concentration of 2 μg/mL), distributed in 600 μL aliquots, and used directly for immunoprecipitation (or stored at −80°C). ChIP was performed by adding either 5 μL anti-CTCF serum, 5 μL pre-immune serum from the same rabbit, or mock antibody to prepared chromatin that had been pre-cleared with 20 μL of protein A agarose beads (Upstate Biotechnology). Chromatin was further processed as described (Breiling et al., 2004). A pre-cleared aliquot with no antibody added served as the input control and was processed similarly.

Chromatin amplification

Processed ChIP samples (resuspended in 30 μL sterile water) or 20 ng of input DNA (in sterile water) were first treated to repair sheared ends using the End-It DNA Repair Kit (Epicentre Biotechnologies) according to the manufacturer’s instructions. Following heat inactivation of the end repair enzyme (70°C, 20 minutes), each reaction was divided into 2 X 25 μL samples, labeled A and B. To the A sample, the following adaptor was ligated: BamHI/SmaI (5’-GATCCCCCGGG-3’/3’-GGGGCCCp-5’). To the B sample, the following adaptor was ligated: BglII/NotI (5’-GATCTGCGGCCGC-3’/3’-ACGCCGGCp-5’). Adaptors were ligated overnight at 16°C using 400 U T4 DNA ligase. The T4 DNA ligase was then heat inactivated (70°C, 20 minutes), and samples A and B were re-combined. DNA was purified using a Qiaquick PCR purification column (Qiagen) and eluted in 30 μL elution buffer. Purified DNA was then concatemerized by ligating fragments containing compatible (BamHI/BglII) overhangs for 6 hours at 37°C using T4 DNA ligase, after which the enzyme was inactivated (70°C, 20 minutes). The DNA was then ethanol precipitated and dissolved in 1 μL of sterile, distilled water. DNA amplification was achieved using a GenomiPhi DNA amplification kit (Amersham Biosciences) following the manufacturer’s protocol. Following amplification, DNA was digested with NotI to produce 200–500 bp fragments. DNA was then purified and assayed for quality according to specifications by NimbleGen Systems, Inc.

Insulator assay in transfected Drosophila S3 cells

2XR, an RFP/GFP dual fluorescence vector, was a kind gift from Dr. Haini Cai of the University of Georgia at Athens. This vector was modified as follows: the metallothionein enhancer between the DsRed and EGFP genes was removed by an EcoRI/NotI double digest and replaced with the 2xPE enhancer from the 2xPE-IAB5 construct (previously described). Different DNA elements were inserted at the NotI site between the 2xPE enhancer and the EGFP gene to make RPLG or RPLDG constructs, respectively. S3 cells were cultured and transfected in HyQ-SFX insect medium (Hyclone) supplemented with 0.5x Pen-Strep. One ml of cells at 5 × 10^5 cells/ml was seeded per well in a 12-well plate for each transfection. One μg of DNA construct and 2.5 μl of Cellfectin (Invitrogen) were separately diluted into 100 μl of media, incubated for 10 minutes, then mixed and incubated for 45 minutes at room temperature. The cells were rinsed and subsequently loaded with the DNA/Cellfectin mix and 800 μl of media, and incubated at 25°C for 5 hours. Finally, the transfection cocktail was replaced with 2 ml of media, and the fluorescence signal was assessed after 24 hours of incubation at 25°C, using an inverted microscope. A 500 bp CTCF sequence was amplified with T7 promoter-tailed primers (Fwd: T7-AGACTACGCCCAAGAAGCAA; Rev: T7-CTTGTCGGCATTCTCATCCT), then transcribed using an in vitro RNA transcription kit (Ambion MEGAscript T7 kit).

Electrophoretic Mobility Shift Assay (EMSA)

EMSAs were carried out using a LightShift Chemiluminescent EMSA Kit (Pierce), essentially according to the manufacturer’s protocol. Full-length CTCF protein was produced in baculovirus (Wistar Protein Expression Facility) and used at either 1 μg or 3 μg per reaction. Reactions were carried out in 20 μL containing 50 ng/μL poly(dI-dC), 2.5% glycerol, 5 mM MgCl2, 0.05% NP-40, 1.5 mM DTT, and 0.1 mM ZnSO4.

Bioinformatic analyses

1. Comparison Statistics

1x, 2x, and 3x CTCF binding regions, exons, transcription start sites (TSS), other protein binding data and various genomic features were gathered from our experiments, FlyBase, and other publications and converted to genomic intervals (chrom, start, stop). These sets of genomic intervals were indexed in such a way as to allow rapid scale-sensitive range querying. Range queries are run when creating input for LWGV, a genome viewer that renders a web page given a description of the tracks (Figure 3). The indexing also allows histograms to be calculated quickly when moving a sliding window over the genome or regions of interest. Analysis tools were written to help explore relationships between data sets. One such tool, stack features, loops over all query features, e.g., all TSS sites, and searches for target features within a given window. Target features within the query windows are converted to intervals relative to the TSS. These target features are displayed as histograms using LWGV (Figure 4E).
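The stack-features idea described above can be illustrated with a minimal sketch. It is not the authors' tool: the function names, the default window sizes, and the simplification of target features to single positions (for example, peak centers) instead of intervals are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def stack_features(queries, targets, upstream=2000, downstream=2000):
    """Collect target positions relative to each query feature.

    queries: list of (chrom, pos, strand) tuples, e.g. TSSs.
    targets: list of (chrom, pos) tuples, e.g. CTCF peak centers.
    Returns signed offsets; negative means upstream of the query.
    """
    by_chrom = {}
    for chrom, pos in targets:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    offsets = []
    for chrom, pos, strand in queries:
        positions = by_chrom.get(chrom, [])
        # On the minus strand, "upstream" lies at higher coordinates.
        left = pos - (upstream if strand == "+" else downstream)
        right = pos + (downstream if strand == "+" else upstream)
        lo = bisect_left(positions, left)
        hi = bisect_right(positions, right)
        for target in positions[lo:hi]:
            rel = target - pos
            offsets.append(rel if strand == "+" else -rel)
    return offsets


def histogram(offsets, bin_size=500):
    """Count offsets per bin; keys are the left edge of each bin."""
    return Counter((off // bin_size) * bin_size for off in offsets)
```

Sorting targets once per chromosome and using binary search keeps each window lookup fast, which is the property that the indexed range queries described above are meant to provide.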

The genomic coordinates of 5’ end, exon, intron, 3’ end and intergenic regions were calculated based on Drosophila genome 2006 assembly downloaded from UCSC (http://genome.ucsc.edu). We defined the 5’ end region as the region that covers the first exon and 500 bps upstream of the first exon. We defined the 3’ end region as the last exon of the transcript. We calculated the distribution of CTCF binding sites within each genomic region described above by counting the number of CTCF binding sites overlapping with each region.
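As a rough illustration of the region definition and overlap counting described above, the following sketch builds the 5’ end region and counts overlapping binding sites. The function names and the 0-based, half-open coordinate convention are assumptions; the text does not state which convention was used.

```python
def five_prime_region(first_exon_start, first_exon_end, strand, flank=500):
    """Return (start, end) covering the first exon plus `flank` bp upstream.

    Coordinates are assumed 0-based, half-open, with start < end on both strands.
    """
    if strand == "+":
        return max(0, first_exon_start - flank), first_exon_end
    # Minus strand: upstream of the first exon is at higher coordinates.
    return first_exon_start, first_exon_end + flank


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def count_sites_in_regions(sites, regions):
    """Count binding sites that overlap at least one region on the same chromosome."""
    count = 0
    for chrom, s_start, s_end in sites:
        if any(chrom == r_chrom and overlaps(s_start, s_end, r_start, r_end)
               for r_chrom, r_start, r_end in regions):
            count += 1
    return count
```

The linear scan over regions is adequate for a sketch; a genome-scale analysis would more plausibly use a sorted or indexed structure.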

Comparison statistics were obtained only when CTCF sites were bound to the examined region (i.e., TSS, 3’, exon, intron, intergenic). Obtained statistics were as follows: number of CTCF sites overlapping with first exons; total number of non-overlapping 500 bp windows in first-exon regions which overlap with CTCF sites; number of CTCF sites overlapping with introns; total number of non-overlapping 500 bp windows in intronic regions which overlap with CTCF sites.

2. CTCF motif

The Drosophila CTCF binding motif was identified by examining the results of the ChIP-chip experiment performed using a Drosophila genome tiling array from NimbleGen. The NimbleGen analysis yielded a large number of potential CTCF binding sites (12,433 peaks). The sites were ranked by false discovery rate (FDR), and the genome sequences corresponding to those sites with an FDR equal to 0 (1061 peaks) were examined for conserved patterns to determine an initial pool of potential CTCF binding motifs. The sequence data were from the Release 4 (Apr. 2004, UCSC version dm2) assembly of the Drosophila genome (UCSC GoldenPath Server). The discriminating matrix enumerator (DME) algorithm (Smith et al., 2005) was used to identify a CTCF-binding motif that best distinguishes the CTCF-binding sites from their adjacent control sequences, as described (Kim et al., 2007; Smith et al., 2005). Candidate Drosophila CTCF motifs spanning 8 to 15 base pairs were evaluated over all of the 12,433 sites using the motifclass program from the CREAD package (Smith et al., 2005; Smith et al., 2006). The top-ranking motif with respect to relative error rate was the width = 11 motif. The eleven-residue Drosophila CTCF motif corresponds to a subset of the twenty-residue human CTCF sequence described recently, and is similar to the motif reported in a recent ChIP-chip study of the ADH and BX-C regions (Holohan et al., 2007; Kim et al., 2007).

For all CTCF-bound sites associated with known TSS, 200 bp sequences around the CTCF sites were searched for the CTCF motif. Sequences were obtained using the Drosophila 2006 assembly and nibFrag, obtained from (http://www.soe.ucsc.edu~kent/src/unzipped/utils/nibFrag/). For the motif search, the position weight matrix (PWM) obtained from http://www.plosgenetics.org/article/info:doi%2F10.1371%2Fjournal.pgen.0030112;jsessionid=383153B5DBDF7F4A132C36F13FB24737 was used with a core cutoff of 0.80 and a matrix cutoff of 0.65 [1]. The motif position with the highest core score within the 200 bp window was used to calculate the distance between the CTCF-bound position and the CTCF motif position, as well as the distance between the CTCF-bound TSS and the next closest upstream TSS.
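The window scan described above can be sketched in simplified form. The code below is not the procedure used in the study: it applies a single rescaled matrix score in place of the separate core and matrix cutoffs, scans the forward strand only, and uses hypothetical function names. It is meant only to show how a best-scoring motif position within a window can be turned into a distance from the bound position.

```python
def best_motif_hit(sequence, pwm, min_score=0.65):
    """Return (offset, score) of the best PWM match, or None below the cutoff.

    pwm: list of dicts mapping base -> weight, one dict per motif position.
    The score is rescaled to 0..1 as (raw - min) / (max - min).
    """
    width = len(pwm)
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    best = None
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in col for base, col in zip(window, pwm)):
            continue  # skip windows containing N or other ambiguity codes
        raw = sum(col[base] for base, col in zip(window, pwm))
        score = (raw - lowest) / (highest - lowest)
        if best is None or score > best[1]:
            best = (i, score)
    if best is None or best[1] < min_score:
        return None
    return best


def motif_to_peak_distance(hit_offset, motif_width, window_start, peak_pos):
    """Signed distance from the bound (peak) position to the motif center."""
    motif_center = window_start + hit_offset + motif_width // 2
    return motif_center - peak_pos
```

A fuller version would also scan the reverse complement and score the conserved core positions separately from the full matrix.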

3. Genome wide CTCF Density for each chromosome arm

For the purpose of checking the density distribution of CTCF sites along each chromosome arm, the following procedure was applied. The number of CTCF bound sites (score) falling within a non-overlapping 100,000 bp window was calculated for each chromosome arm. For each window, the middle of the window was taken as the position to report the score. Background random distribution was obtained using the same number of sites distributed randomly throughout each chromosome and calculating the score as in the real case. Subsequently, the same procedure was used to obtain the number of genes in each window in the corresponding chromosomes.
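The windowing procedure above can be sketched as follows. The function names are hypothetical, positions are assumed to be 0-based and smaller than the chromosome length, and the random background uses a uniform placement, which is our reading of "distributed randomly".

```python
import random


def window_counts(positions, chrom_length, window=100_000):
    """Count sites per non-overlapping window; returns (midpoint, count) pairs."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return [(i * window + window // 2, c) for i, c in enumerate(counts)]


def random_background(n_sites, chrom_length, window=100_000, seed=0):
    """Place the same number of sites uniformly at random and score them identically."""
    rng = random.Random(seed)
    positions = [rng.randrange(chrom_length) for _ in range(n_sites)]
    return window_counts(positions, chrom_length, window)
```

The same `window_counts` call can be reused with gene start positions to obtain the gene density per window.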

4. Distance distribution of CTCF associated/non-associated TSS to Closest upstream TSS

In the first instance, the closest distances between each CTCF bound TSS to the next upstream TSS were obtained. Secondly, with the exception of those CTCF involved TSS pairs, all other TSS pairs were obtained from the genome as non-CTCF associated cases. Finally, the same number of non-CTCF associated TSS pairs were obtained randomly from the above cases for comparison with the CTCF-associated cases.
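A minimal sketch of this comparison is shown below. It assumes that "upstream" is defined relative to the strand of the TSS in question and that the strand of the neighboring TSS is ignored; neither detail is stated in the text, and the function names are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right


def upstream_tss_distance(tss_pos, strand, sorted_tss):
    """Distance from a TSS to the nearest other TSS lying upstream of it.

    sorted_tss: ascending TSS coordinates on the same chromosome.
    Returns None when no upstream TSS exists.
    """
    if strand == "+":
        i = bisect_left(sorted_tss, tss_pos)  # first index with value >= tss_pos
        return tss_pos - sorted_tss[i - 1] if i > 0 else None
    j = bisect_right(sorted_tss, tss_pos)  # first index with value > tss_pos
    return sorted_tss[j] - tss_pos if j < len(sorted_tss) else None


def matched_control(ctcf_distances, other_distances, seed=0):
    """Randomly draw as many non-CTCF distances as there are CTCF-associated ones."""
    rng = random.Random(seed)
    return rng.sample(other_distances, len(ctcf_distances))
```

`matched_control` requires at least as many non-CTCF pairs as CTCF-associated pairs, which holds when CTCF-bound promoters are a minority of all promoters.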

5. Global CTCF sites distribution around Known TSS

For each known TSS site, a window of 2000 bps upstream and 3000 bps downstream was considered in order to obtain the CTCF-bound site distribution around known TSS. A 500 bp window with 50 bp sliding was used to obtain a CTCF score (a simple count of the number of CTCF sites within the 500 bp window). Reported score coordinates were those in the middle of each window. To get the average score for each window, all the scores were added for each corresponding window and divided by the number of cases considered.
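The sliding-window average described above can be sketched as follows. The function name and input layout are assumptions, CTCF sites are treated as single positions, and offsets are taken relative to the strand of each TSS, which the text does not state explicitly.

```python
def tss_profile(tss_list, sites_by_chrom, upstream=2000, downstream=3000,
                window=500, step=50):
    """Average CTCF count per sliding window around a set of TSSs.

    tss_list: (chrom, pos, strand) tuples.
    sites_by_chrom: dict mapping chrom -> list of CTCF site positions.
    Returns (window midpoint relative to TSS, mean count) pairs.
    """
    starts = range(-upstream, downstream - window + 1, step)
    totals = [0] * len(starts)
    for chrom, pos, strand in tss_list:
        sign = 1 if strand == "+" else -1
        # Site offsets in TSS-relative, strand-aware coordinates.
        rel = [sign * (s - pos) for s in sites_by_chrom.get(chrom, [])]
        for k, start in enumerate(starts):
            totals[k] += sum(1 for r in rel if start <= r < start + window)
    n = len(tss_list)
    return [(start + window // 2, total / n)
            for start, total in zip(starts, totals)]
```

This version rescans every site on the chromosome for each TSS, which is simple but slow; restricting the scan to sites inside the 5 kb window first would give the same profile.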

Supplementary Material

01. Figure S1. Enhancer blocking assay in S3 cells.

A. Reporter transgene construct for testing enhancer-blocking activity in cultured S3 cells.

B. Reporter construct without insulator inserted. Strong GFP and robust RFP are expressed, resulting in green to yellow fluorescent cells.

C. When Fab-8 is inserted, GFP expression is reduced, while RFP remains unchanged.

D. Similar to Fab-8, Fab-6 also selectively reduced GFP expression.

E. A 1kb DNA from Abd-Bm promoter containing strong CTCF binding is tested here, but this element does not block the PE enhancer.

F. Location of individual insulators in the Abd-B locus.

02. Figure S2. Analyses of CTCF binding to promoter regions.

A. Promoter distance analyses for 2x CTCF signals. The positions of the closest neighboring TSS located upstream of gene promoters are analyzed for CTCF-bound promoters and non-CTCF-bound promoters. The numbers of such neighboring promoters are plotted according to the distance from the CTCF-bound promoter (hollow black rectangles). Promoters that do not have CTCF signals bound to them are represented by blue rectangles. When the two groups were superimposed, the first group clearly had more promoters located nearby.

B. Same analysis as in A, with 3x CTCF peaks.

Acknowledgments

We thank Dale Dorsett and Ramana Davuluri for insightful discussions during the analyses. This work was supported in part by NIH grants GM 65391 and NS33768 to J. Zhou, an NCI Cancer Center grant to the Wistar Institute, and an NIH training grant to S. Smith.

References

  1. Bell AC, West AG, Felsenfeld G, 1999. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98, 387–96.
  2. Bertrand E, Houser-Scott F, Kendall A, Singer RH, Engelke DR, 1998. Nucleolar localization of early tRNA processing. Genes Dev 12, 2463–8.
  3. Breiling A, O’Neill LP, D’Eliseo D, Turner BM, Orlando V, 2004. Epigenome changes in active and inactive polycomb-group-controlled regions. EMBO Rep 5, 976–82.
  4. Burton T, Liang B, Dibrov A, Amara F, 2002. Transforming growth factor-beta-induced transcription of the Alzheimer beta-amyloid precursor protein gene involves interaction between the CTCF-complex and Smads. Biochem Biophys Res Commun 295, 713–23.
  5. Busturia A, Bienz M, 1993. Silencers in abdominal-B, a homeotic Drosophila gene. EMBO J 12, 1415–25.
  6. Chernukhin I, Shamsuddin S, Kang SY, Bergstrom R, Kwon YW, Yu W, Whitehead J, Mukhopadhyay R, Docquier F, Farrar D, Morrison I, Vigneron M, Wu SY, Chiang CM, Loukinov D, Lobanenkov V, Ohlsson R, Klenova E, 2007. CTCF interacts with and recruits the largest subunit of RNA polymerase II to CTCF target sites genome-wide. Mol Cell Biol 27, 1631–48.
  7. Cleard F, Moshkin Y, Karch F, Maeda RK, 2006. Probing long-distance regulatory interactions in the Drosophila melanogaster bithorax complex using Dam identification. Nat Genet 38, 931–5.
  8. Dorman ER, Bushey AM, Corces VG, 2007. The role of insulator elements in large-scale chromatin structure in interphase. Semin Cell Dev Biol 18, 682–90.
  9. Dorsett D, 1993. Distance-independent inactivation of an enhancer by the suppressor of Hairy-wing DNA-binding protein of Drosophila. Genetics 134, 1135–44.
  10. Duncan I, 1987. The bithorax complex. Annu Rev Genet 21, 285–319.
  11. Felsenfeld G, Burgess-Beusse B, Farrell C, Gaszner M, Ghirlando R, Huang S, Jin C, Litt M, Magdinier F, Mutskov V, Nakatani Y, Tagami H, West A, Yusufzai T, 2004. Chromatin boundaries and chromatin domains. Cold Spring Harb Symp Quant Biol 69, 245–50.
  12. Fremion F, Darboux I, Diano M, Hipeau-Jacquotte R, Seeger MA, Piovant M, 2000. Amalgam is a ligand for the transmembrane receptor neurotactin and is required for neurotactin-mediated cell adhesion and axon fasciculation in Drosophila. EMBO J 19, 4463–72.
  13. Gaszner M, Felsenfeld G, 2006. Insulators: exploiting transcriptional and epigenetic mechanisms. Nat Rev Genet 7, 703–13.
  14. Gaszner M, Vazquez J, Schedl P, 1999. The Zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction. Genes Dev 13, 2098–107.
  15. Gerasimova TI, Byrd K, Corces VG, 2000. A chromatin insulator determines the nuclear localization of DNA. Mol Cell 6, 1025–35.
  16. Geyer PK, 1997. The role of insulator elements in defining domains of gene expression. Curr Opin Genet Dev 7, 242–8.
  17. Geyer PK, Corces VG, 1992. DNA position-specific repression of transcription by a Drosophila zinc finger protein. Genes Dev 6, 1865–73.
  18. Hagstrom K, Muller M, Schedl P, 1997. A Polycomb and GAGA dependent silencer adjoins the Fab-7 boundary in the Drosophila bithorax complex. Genetics 146, 1365–80.
  19. Hark AT, Schoenherr CJ, Katz DJ, Ingram RS, Levorse JM, Tilghman SM, 2000. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–9.
  20. Holohan EE, Kwong C, Adryan B, Bartkuhn M, Herold M, Renkawitz R, Russell S, White R, 2007. CTCF genomic binding sites in Drosophila and the organisation of the bithorax complex. PLoS Genet 3, e112.
  21. Jiang J, Rushlow CA, Zhou Q, Small S, Levine M, 1992. Individual dorsal morphogen binding sites mediate activation and repression in the Drosophila embryo. EMBO J 11, 3147–54.
  22. Kanduri C, Pant V, Loukinov D, Pugacheva E, Qi CF, Wolffe A, Ohlsson R, Lobanenkov VV, 2000. Functional association of CTCF with the insulator upstream of the H19 gene is parent of origin-specific and methylation-sensitive. Curr Biol 10, 853–6.
  23. Karch F, Weiffenbach B, Peifer M, Bender W, Duncan I, Celniker S, Crosby M, Lewis EB, 1985. The abdominal region of the bithorax complex. Cell 43, 81–96.
  24. Kim TH, Abdullaev ZK, Smith AD, Ching KA, Loukinov DI, Green RD, Zhang MQ, Lobanenkov VV, Ren B, 2007. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231–45.
  25. Klenova EM, Morse HC 3rd, Ohlsson R, Lobanenkov VV, 2002. The novel BORIS + CTCF gene family is uniquely involved in the epigenetics of normal biology and cancer. Semin Cancer Biol 12, 399–414.
  26. Klenova EM, Nicolas RH, Paterson HF, Carne AF, Heath CM, Goodwin GH, Neiman PE, Lobanenkov VV, 1993. CTCF, a conserved nuclear factor required for optimal transcriptional activity of the chicken c-myc gene, is an 11-Zn-finger protein differentially expressed in multiple forms. Mol Cell Biol 13, 7612–24.
  27. Lanzuolo C, Roure V, Dekker J, Bantignies F, Orlando V, 2007. Polycomb response elements mediate the formation of chromosome higher-order structures in the bithorax complex. Nat Cell Biol 9, 1167–74.
  28. Lawrence PA, 1988. Background to bicoid. Cell 54, 1–2.
  29. Lewis EB, 1978. A gene complex controlling segmentation in Drosophila. Nature 276, 565–70.
  30. Ling JQ, Li T, Hu JF, Vu TH, Chen HL, Qiu XW, Cherry AM, Hoffman R, 2006. CTCF mediates interchromosomal colocalization between Igf2/H19 and Wsb1/Nf1. Science 312, 269–72.
  31. Lobanenkov VV, Nicolas RH, Adler VV, Paterson H, Klenova EM, Polotskaja AV, Goodwin GH, 1990. A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5′-flanking sequence of the chicken c-myc gene. Oncogene 5, 1743–53.
  32. Maeda RK, Karch F, 2006. The ABC of the BX-C: the bithorax complex explained. Development 133, 1413–22.
  33. Mihaly J, Barges S, Sipos L, Maeda R, Cleard F, Hogga I, Bender W, Gyurkovics H, Karch F, 2006. Dissecting the regulatory landscape of the Abd-B gene of the bithorax complex. Development 133, 2983–93.
  34. Mihaly J, Hogga I, Barges S, Galloni M, Mishra RK, Hagstrom K, Muller M, Schedl P, Sipos L, Gausz J, Gyurkovics H, Karch F, 1998. Chromatin domain boundaries in the Bithorax complex. Cell Mol Life Sci 54, 60–70.
  35. Moon H, Filippova G, Loukinov D, Pugacheva E, Chen Q, Smith ST, Munhall A, Grewe B, Bartkuhn M, Arnold R, Burke LJ, Renkawitz-Pohl R, Ohlsson R, Zhou J, Renkawitz R, Lobanenkov V, 2005. CTCF is conserved from Drosophila to humans and confers enhancer blocking of the Fab-8 insulator. EMBO Rep 6, 165–70.
  36. Noma K, Cam HP, Maraia RJ, Grewal SI, 2006. A role for TFIIIC transcription factor complex in genome organization. Cell 125, 859–72.
  37. Ohlsson R, Renkawitz R, Lobanenkov V, 2001. CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet 17, 520–7.
  38. Parelho V, Hadjur S, Spivakov M, Leleu M, Sauer S, Gregson HC, Jarmuz A, Canzonetta C, Webster Z, Nesterova T, Cobb BS, Yokomori K, Dillon N, Aragon L, Fisher AG, Merkenschlager M, 2008. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–33.
  39. Ramos E, Ghosh D, Baxter E, Corces VG, 2006. Genomic organization of gypsy chromatin insulators in Drosophila melanogaster. Genetics 172, 2337–49.
  40. Renaud S, Loukinov D, Bosman FT, Lobanenkov V, Benhattar J, 2005. CTCF binds the proximal exonic region of hTERT and inhibits its transcription. Nucleic Acids Res 33, 6850–60.
  41. Rubio ED, Reiss DJ, Welcsh PL, Disteche CM, Filippova GN, Baliga NS, Aebersold R, Ranish JA, Krumm A, 2008. CTCF physically links cohesin to chromatin. Proc Natl Acad Sci U S A 105, 8309–14.
  42. Seeger MA, Haffley L, Kaufman TC, 1988. Characterization of amalgam: a member of the immunoglobulin superfamily from Drosophila. Cell 55, 589–600.
  43. Smith AD, Sumazin P, Das D, Zhang MQ, 2005. Mining ChIP-chip data for transcription factor and cofactor binding sites. Bioinformatics 21 Suppl 1, i403–12.
  44. Smith AD, Sumazin P, Xuan Z, Zhang MQ, 2006. DNA motifs in human and mouse proximal promoters predict tissue-specific expression. Proc Natl Acad Sci U S A 103, 6275–80.
  45. Stedman W, Kang H, Lin S, Kissil JL, Bartolomei MS, Lieberman PM, 2008. Cohesins localize with CTCF at the KSHV latency control region and at cellular c-myc and H19/Igf2 insulators. EMBO J 27, 654–66.
  46. Thompson M, Haeusler RA, Good PD, Engelke DR, 2003. Nucleolar clustering of dispersed tRNA genes. Science 302, 1399–401.
  47. Torroja L, Chu H, Kotovsky I, White K, 1999. Neuronal overexpression of APPL, the Drosophila homologue of the amyloid precursor protein (APP), disrupts axonal transport. Curr Biol 9, 489–92.
  48. Vostrov AA, Taheny MJ, Quitschke WW, 2002. A region to the N-terminal side of the CTCF zinc finger domain is essential for activating transcription from the amyloid precursor protein promoter. J Biol Chem 277, 1619–27.
  49. Wallace JA, Felsenfeld G, 2007. We gather together: insulators and genome organization. Curr Opin Genet Dev 17, 400–7.
  50. Wendt KS, Yoshida K, Itoh T, Bando M, Koch B, Schirghuber E, Tsutsumi S, Nagae G, Ishihara K, Mishiro T, Yahata K, Imamoto F, Aburatani H, Nakao M, Imamoto N, Maeshima K, Shirahige K, Peters JM, 2008. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451, 796–801.
  51. West AG, Gaszner M, Felsenfeld G, 2002. Insulators: many functions, many mechanisms. Genes Dev 16, 271–88.
  52. West AG, Huang S, Gaszner M, Litt MD, Felsenfeld G, 2004. Recruitment of histone modifications by USF proteins at a vertebrate barrier element. Mol Cell 16, 453–63.
  53. Yusufzai TM, Tagami H, Nakatani Y, Felsenfeld G, 2004. CTCF Tethers an Insulator to Subnuclear Sites, Suggesting Shared Insulator Mechanisms across Species. Mol Cell 13, 291–8.
  54. Zhao K, Hart CM, Laemmli UK, 1995. Visualization of chromosomal domains with boundary element-associated factor BEAF-32. Cell 81, 879–89.

Supplementary Materials

01. Figure S1. Enhancer-blocking assay in S3 cells.

A. Reporter transgene construct for testing enhancer-blocking activity in cultured S3 cells.

B. Reporter construct with no insulator inserted. GFP and RFP are both strongly expressed, resulting in green to yellow fluorescent cells.

C. When Fab-8 is inserted, GFP expression is reduced, while RFP remains unchanged.

D. Like Fab-8, Fab-6 selectively reduces GFP expression.

E. A 1 kb DNA fragment from the Abd-Bm promoter that shows strong CTCF binding was also tested, but this element does not block the PE enhancer.

F. Location of individual insulators in the Abd-B locus.

02. Figure S2. Analyses of CTCF binding to promoter regions.

A. Promoter distance analyses for 2xCTCF signals. The position of the closest neighboring TSS located upstream of each gene promoter was analyzed for CTCF-bound and non-CTCF-bound promoters. The numbers of such neighboring promoters are plotted according to their distance from the CTCF-bound promoters (hollow black rectangles). Promoters with no CTCF signal are represented by blue rectangles. When the two groups are superimposed, the CTCF-bound group clearly has more promoters located nearby.

B. Similar to Fig. S2A, with 3xCTCF peaks.
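
The promoter distance analysis summarized in panel A can be illustrated with a minimal sketch. This is not the authors' code: the record layout (`tss`, `strand`, `bound`), the function names, the 1 kb bin size, and the restriction to a single chromosome are all assumptions made for illustration. For each promoter, the sketch finds the closest other TSS on its upstream side (lower coordinates for "+" strand genes, higher coordinates for "-" strand genes) and counts the resulting distances per bin, separately for CTCF-bound and unbound promoters.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def upstream_neighbor_distance(tss_positions, pos, strand):
    """Distance from a TSS to the closest other TSS on its upstream side.

    tss_positions must be a sorted list of unique TSS coordinates on one
    chromosome (assumed layout). Returns None if no upstream neighbor exists.
    """
    if strand == "+":
        # Upstream of a "+" gene means smaller coordinates.
        i = bisect_left(tss_positions, pos) - 1
        return pos - tss_positions[i] if i >= 0 else None
    # Upstream of a "-" gene means larger coordinates.
    i = bisect_right(tss_positions, pos)
    return tss_positions[i] - pos if i < len(tss_positions) else None


def distance_histogram(promoters, bin_size=1000):
    """Count upstream-neighbor distances per bin for bound and unbound promoters."""
    positions = sorted({p["tss"] for p in promoters})
    hist = {True: Counter(), False: Counter()}
    for p in promoters:
        d = upstream_neighbor_distance(positions, p["tss"], p["strand"])
        if d is not None:
            hist[p["bound"]][d // bin_size] += 1
    return hist


# Toy input (hypothetical coordinates, not data from the study).
promoters = [
    {"tss": 1000, "strand": "+", "bound": False},
    {"tss": 1500, "strand": "+", "bound": True},
    {"tss": 9000, "strand": "-", "bound": False},
    {"tss": 9400, "strand": "-", "bound": True},
    {"tss": 20000, "strand": "+", "bound": False},
]
hist = distance_histogram(promoters)
for bound in (True, False):
    label = "CTCF-bound" if bound else "unbound"
    print(label, sorted(hist[bound].items()))
```

Plotting the two counters on the same axes gives the kind of superimposed comparison the legend describes. A real analysis would also need to handle multiple chromosomes and a defined rule for assigning ChIP-chip peaks to promoters, neither of which is specified here.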

03
