Abstract
CTCF is the main architectural protein found in most of the examined bilaterian organisms. The cluster of C2H2 zinc-finger domains involved in recognition of a long DNA-binding motif is the only part of the protein that is evolutionarily conserved, while the N-terminal domain (NTD) differs in sequence between species. Here, we performed biophysical characterization of CTCF NTDs from various species representing all major phylogenetic clades of higher metazoans. With the exception of Drosophilids, the N-terminal domains of CTCFs show an unstructured organization and an absence of folded regions in vitro. In contrast, NTDs of Drosophila melanogaster and Drosophila virilis CTCFs contain folded regions lacking secondary structure that form tetramers and dimers, respectively, in vitro. Unexpectedly, most NTDs are able to self-associate in the yeast two-hybrid and co-immunoprecipitation assays. These results suggest that NTDs of CTCFs might contribute to the organization of CTCF-mediated long-distance interactions and chromosomal architecture.
Subject terms: Chromatin, Protein folding, Transcription
Introduction
Chromosomes in the genomes of all higher eukaryotes have a highly organized architecture and consist of discrete topologically associated domains (TADs)1–5. TADs often also include smaller domains (sub-TADs) that are flanked by short boundary elements or longer regions (inter-TADs) that contain active chromatin and housekeeping genes. In addition, promoters, enhancers, silencers and insulators form a network of specific distance interactions that properly regulate gene transcription6–9. How specific distance interactions between remote regulatory elements are established and maintained through the cell cycle remains an unresolved question10.
Currently, the best-characterized protein involved in the organization of chromosome architecture is CTCF, which was initially found as a transcriptional repressor11. It is believed that CTCF is the main architectural protein in mammals, which is responsible for the organization of TAD boundaries and distance interactions between enhancers and promoters12–16. CTCF was found in most of the higher eukaryotes including all studied bilateral organisms but is absent in yeast and plants17,18. Usually CTCFs from different organisms contain the cluster of eleven C2H2 zinc-finger domains (ZF) localized in the central part of the protein. In human CTCF, ZFs from 3 to 7 recognize specific 15 bp consensus19. The DNA-binding ZFs are the most evolutionary conserved among CTCFs that bind to similar sites in most higher eukaryotic genomes20. Moreover, it was found that even several chromatin domains controlled by CTCF are conserved in distant species21. Other ZF domains are usually less conserved and are involved in recognition of additional minor sequences22, interaction with specific RNAs23 or proteins24–27. The N- and C-termini of CTCF do not have structural domains and are not conserved in evolution28,29.
CTCF can support distance-selective interactions between its sites, suggesting that protein-protein interactions are possibly involved in organization and maintaining long-range chromatin interactions. However, dimerization domains have not been found in hCTCF. The N-terminal domain of hCTCF was shown to be intrinsically disordered30. The current model suggests that the movement of cohesin complexes along chromatin31 is blocked by chromatin-bound CTCF protein, which leads to the formation of chromatin loops between CTCF sites in interphase chromosomes2,32. In vitro studies have shown that the cohesin complex interacts with the C-terminal domain of hCTCF31. The Drosophila CTCF homolog, dCTCF, is often associated with TAD boundaries and insulators33,34. dCTCF supports distance interactions between the GAL4 activator and the white gene reporter in model transgenic lines35,36. A novel multimerization domain was described within Drosophila CTCF (dCTCF) protein37. Deletion of this domain strongly affects the activity of dCTCF.
The existence of the N-terminal dimerization domain in Drosophila melanogaster raised the question about the structure of the N-terminal domain in CTCF from other bilaterian organisms. We found that CTCF in Drosophila virilis (dvCTCF) also has the dimerization domain. However, in other selected organisms from different bilaterian clades, the N-terminal CTCF domains are intrinsically disordered and unable to form dimers in vitro. Unexpectedly, the N-terminal domains from CTCF of human and several other organisms showed self-interaction in the yeast two-hybrid (Y2H) and co-immunoprecipitation assays.
Results
Drosophila CTCF N-terminal multimerization domains display a stable fold in solution but lack secondary structure
Earlier, we described the N-terminal multimerization module between 70 and 163 aa of CTCF protein from Drosophila melanogaster (dmCTCF), which is essential for functional activity of the dmCTCF protein, but the first 70 residues contribute to its stability and together with 70–163 aa most likely are parts of the entire protein domain37. In Drosophila genus, alignment of the N-terminal regions of CTCFs showed a moderate level of homology with a few conserved sequence blocks in the interval of 1–163 aa according to dmCTCF sequence (Fig. S1). A plausible hypothesis is that N-terminal domains of CTCFs from different Drosophila species have a similar organization and dimerization activity. To test this possibility, we selected for further study the N-terminal domain (1–144 aa) of CTCF from Drosophila virilis (dvCTCF), which has the comparatively low sequence (49%) homology to dmCTCF in 1–163 interval (Fig. S1).
The N-terminal domains (NTDs) from dvCTCF (1–144 aa, 16 kDa) and dmCTCF (1–163 aa, 18 kDa) were expressed in bacteria and tested for dimerization using size-exclusion chromatography (SEC) (Fig. 1a) and cross-linking experiment (Fig. 1b). SEC showed that both NTDs have larger size than calculated for monomeric and even dimeric globular protein of that molecular weight (Fig. 1a). As was shown for dmCTCF NTD37, the cross-linking with glutaraldehyde shows that dvCTCF NTD forms dimers (Fig. 1b). Because the values obtained in SEC still are larger than those calculated for dimeric NTDs, they can either form higher-order assemblies that somehow do not cross-link, probably because of the lack of neighboring lysines, or they have unfolded regions that contribute to an increase of the size and shape of the molecule. To study secondary structure, we obtained circular-dichroism (CD) spectrum for dmCTCF-NTD, which revealed a lack of alpha-helices and beta-sheets (Fig. 1c). This observation agrees with secondary structure prediction algorithms that evaluate Drosophila CTCF NTDs as disordered protein domains. It is much more likely they are partially unfolded, therefore resulting in the heavier appearance of these polypeptides in SEC. Interestingly, dimer formation presumes the existence of a stable fold, which these polypeptides should adopt without typical secondary structure elements.
To further assess the oligomeric state and check the monodispersity of the purified dmCTCF NTD sample, we used Dynamic Light Scattering (DLS). The size distribution of both samples contained only one narrow peak. The estimated hydrodynamic radius (Rh) was in the range of 4.4–4.6 nm, which corresponds to a molecular weight of about 110 kDa. However, molecular weights calculated from DLS, like those from SEC, are highly sensitive to the shape of the molecule, which often leads to overestimation of the protein molecular weight. The value of 110 kDa, which would correspond to a hexamer, could therefore result from multimerization as well as from the presence of unfolded regions.
To determine the oligomerization status of the NTDs more reliably, we used Small-Angle X-ray Scattering (SAXS). The calculated molecular weight of dmCTCF NTD was in the range of 71–83 kDa, corresponding to a tetramer (monomer Mw is 18 kDa), in agreement with the SEC data rather than with the cross-linking experiments (Table 1). For dvCTCF NTD, the calculated molecular weight was in the range of 29–35 kDa, corresponding to a dimer (monomer Mw is 16 kDa). Several possible low-resolution models were built based on the scattering profiles (Fig. 1d). The two-fold symmetry of the model suggests that the tetramer is assembled from two dimers, each consisting of tightly bound monomers that effectively cross-link to each other. The elongated shape of the tetramer explains the higher molecular weight of 100 kDa roughly calculated from the SEC profile (Fig. 1d). The model of dimeric dvNTD has a smaller volume, in accordance with the lower molecular weight of the assemblies, and roughly resembles the dimeric part of the dmCTCF NTD (Fig. 1d). The Kratky plot of the SAXS profile shows that both Drosophila NTDs are at least partially folded (Fig. 1e). Because the CD experiment does not show any secondary structure elements in dmCTCF-NTD, it seems likely that both NTDs have an unusual spatial fold lacking secondary structure elements. From the overall shape of the Kratky plot, we can conclude that Drosophila CTCF NTDs have an overall globular structure formed by unfolded regions, which explains why the SEC profile is heavier than could be expected for a globular dmNTD tetramer or dvNTD dimer.
Table 1.
CTCF-NTD | MW of the monomer, kDa | Estimated MW in solution, kDa |
---|---|---|
dmCTCF | 18.7 | 71.0–83.0 |
dvCTCF | 16.3 | 29.0–35.0 |
dpCTCF | 25.1 | 33.0–41.0 |
amCTCF | 26.1 | 25.0–29.0 |
cgCTCF | 19.9 | 27.5–33.5 |
spCTCF | 35.5 | 49.0–54.0 |
skCTCF | 44.2 | 52.0–62.0 |
ciCTCF | 31.6 | 37.0–47.0 |
hsCTCF | 31.4 | 45.0–51.0 |
Scattering parameters for the N-terminal domains of CTCF from various species are shown in Table S1.
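The oligomeric states quoted above follow from dividing the SAXS-estimated molecular weight by the monomer molecular weight. As a minimal sketch of that arithmetic (not the authors' analysis pipeline), the snippet below applies it to the values in Table 1. The names `TABLE_1` and `stoichiometry_range` are illustrative assumptions.

```python
# Monomer MW and SAXS-estimated MW range (all in kDa), copied from Table 1.
TABLE_1 = {
    "dmCTCF": (18.7, 71.0, 83.0),
    "dvCTCF": (16.3, 29.0, 35.0),
    "dpCTCF": (25.1, 33.0, 41.0),
    "amCTCF": (26.1, 25.0, 29.0),
    "cgCTCF": (19.9, 27.5, 33.5),
    "spCTCF": (35.5, 49.0, 54.0),
    "skCTCF": (44.2, 52.0, 62.0),
    "ciCTCF": (31.6, 37.0, 47.0),
    "hsCTCF": (31.4, 45.0, 51.0),
}


def stoichiometry_range(monomer_mw, low_mw, high_mw):
    """Return the apparent number of chains per particle as (low, high)."""
    return low_mw / monomer_mw, high_mw / monomer_mw


if __name__ == "__main__":
    for name, (monomer_mw, low_mw, high_mw) in TABLE_1.items():
        n_low, n_high = stoichiometry_range(monomer_mw, low_mw, high_mw)
        print(f"{name}: {n_low:.1f}-{n_high:.1f} monomer equivalents")
```

A ratio near 4 (dmCTCF) or 2 (dvCTCF) is consistent with a tetramer or dimer, whereas ratios between 1 and 2 cannot be assigned to an integer stoichiometry from this arithmetic alone.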
To provide further insight into structural features underlying their multimerization we studied CTCF NTDs from two Drosophila species (D. virilis and D. melanogaster) using 2D NMR spectroscopy. The 15N,1H HSQC spectra for 15N-labelled dmCTCF NTD and dvCTCF NTD were found to have similar features for both proteins (Fig. S3). The spectra undoubtedly indicate some signals typical for folded proteins. These signals exhibit significant line broadening, which is not fully eliminated by increasing the temperature to 50 °C (Fig. S4). Even at this temperature, there are significant differences in the signal line widths of the residues located in structured and unstructured parts. Such behavior is typical for large structured proteins due to their slow tumbling. Putting it all together we can conclude that both dmCTCF and dvCTCF NTDs have a similar structural organization with a structured protein core, but at least 2/3 of the protein chains represent an unstructured coil.
N-terminal domains of CTCF show unstructured nature in all tested organisms from all major phylogenetic groups of higher metazoans
Since the cluster of zinc-finger domains is the only domain of CTCF proteins (Fig. 2a) that exhibits a high level of conservation within higher metazoans17, we asked whether their NTDs could display multimerization activity like Drosophila NTDs despite a lack of evolutionary conservation. To answer this question, we cloned the NTDs of CTCFs from well-characterized representatives, with known genomes, of diverse phylogenetic groups of higher metazoans (Fig. 2b). The functional role of CTCF in most of these groups has not been characterized yet. D. pulex (dpCTCF, water flea) and A. mellifera (amCTCF, European honey bee) are Arthropods, both belonging to the Ecdysozoa phylum of Protostomia. C. gigas (cgCTCF) belongs to the mollusks, which are also Protostomes and, together with annelids, comprise the Lophotrochozoa phylum (Fig. 2b). The Deuterostomia superphylum comprises three phyla: Chordata, Hemichordata and Echinodermata (Fig. 2b). S. purpuratus (spCTCF) is a representative of Echinodermata. The role of spCTCF in the establishment of TAD borders was shown earlier21. S. kowalewski (skCTCF) is a marine invertebrate, a representative of Hemichordata, which are close to basal Chordates. This organism also displays signs of reduction38. C. intestinalis (ciCTCF) is a lower chordate. Its genome was sequenced in 2002 and, despite being about 1/20 the size of the human genome, it contains an almost complete set of the genes found in vertebrates, although many organs were reduced or secondarily lost39. In vertebrates, CTCF proteins are described as key organisers of chromosomal architecture. Consistent with their important role in transcription regulation, vertebrate CTCFs show high homology in all characterized representatives (Fig. S2). We cloned the NTDs of CTCFs from human (hsCTCF) and zebrafish (drCTCF), which display the maximum difference in amino acid sequences among vertebrates. 
Despite sequence differences, human CTCF was recently found to be able to rescue zebrafish CTCF knockout, which otherwise is lethal40. We did not find CTCF homologs in Radiata (Cnidaria and Ctenophora), basal metazoans — Porifera and Placozoa. Emergence of CTCF protein is often associated with origin of Bilaterian metazoans, but CTCF homologs were not found in flatworms, presumably due to the secondary loss. Also, CTCF is absent in several clades of nematodes29.
Bioinformatic analysis of selected domains using a PredictProtein algorithm41 revealed that all of them are predicted to be mostly disordered. For subsequent biochemical and biophysical analysis, CTCF NTDs were expressed in E.coli. Unfortunately, we were unable to express in bacteria a sufficient amount of drCTCF NTD. Values measured by SEC for all proteins appeared larger than could be expected for monomeric globular form and close to expected for unfolded proteins (Fig. 2c). Chemical cross-linking revealed no multimer formation (Fig. 2d), suggesting that domains are possibly intrinsically disordered, in agreement with previous studies of N-terminal domain from human CTCF protein30.
The SAXS technique was applied to provide further information about the structure of CTCF NTDs. We summarize the results of SAXS data analyses in Table 1. Despite the lack of a stable fold in solution and the absence of multimers in cross-linking experiments, several proteins show a higher estimated molecular weight in SAXS experiments. Analysis of SAXS data from spCTCF and hsCTCF NTDs suggests possible aggregation of molecules; however, these assemblies have a stable size. At the same time, the molecular weight calculated from SAXS data is only about 1.5 times higher than expected from the amino acid sequence, which can be explained by the fact that the molecules are intrinsically disordered. It has been shown that human CTCF NTD is monomeric in solution30. Our cross-linking experiments also did not reveal a high-molecular-weight product (possibly because of the lack of neighbouring lysines), but SAXS data (reproduced in two measurements of independent protein preparations) suggest that assemblies with a larger volume can form under some conditions. Chemical cross-linking with glutaraldehyde and EGS, along with size-exclusion chromatography, was used to test for a possible change in the oligomerization status of hsCTCF NTD induced by concentration to 10 mg/ml and by freeze-thaw cycles, but we still did not detect any hsCTCF-NTD multimers. SAXS is extremely sensitive to the presence of high-molecular-weight particles, so these observations can most likely be attributed to small amounts of aggregates in the samples. NTDs of D. pulex and S. purpuratus also have a slightly larger molecular weight than calculated for the monomer, but both are unstructured, as can be seen from the Kratky plot (Fig. 2e). For all NTDs, Dmax (the maximum linear size of the particles) was several times higher than Rg (the root-mean-square distance of all atoms from the center of mass of the molecule), suggesting an elongated shape of the particles. Analysis of SAXS data using the Kratky plot (Fig. 2e) revealed a bell-shaped curve only for Ciona and Drosophila (Fig. 1e) NTDs, showing that these polypeptides are at least partially folded, whereas the other proteins had a logarithmic shape of the plot that is more characteristic of disordered protein chains, which explains their heavier appearance on the SEC profile.
Thus, Drosophila CTCF NTDs have the unique ability to form multimers in vitro among metazoans, even in contrast to related Apis mellifera. They adopt an unusual fold with the absence of secondary structure elements.
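The Kratky-plot reasoning used in this section can be illustrated with two idealized form factors. The sketch below is not the analysis performed on the experimental data: it assumes a Guinier-type decay as a stand-in for a compact particle and the Debye function for a Gaussian chain, with a hypothetical Rg of 3 nm. It shows that s²I(s) is bell-shaped for the compact model and rises to a plateau for the chain.

```python
import math


def guinier_intensity(s, rg):
    """Idealized compact particle: I(s)/I0 = exp(-(s*Rg)^2 / 3)."""
    return math.exp(-((s * rg) ** 2) / 3.0)


def debye_intensity(s, rg):
    """Gaussian chain (Debye): I(s)/I0 = 2(exp(-x) + x - 1)/x^2, x = (s*Rg)^2."""
    x = (s * rg) ** 2
    if x < 1e-8:
        return 1.0  # limit of the Debye function as x -> 0
    return 2.0 * (math.exp(-x) + x - 1.0) / x ** 2


def kratky(intensity, s_values, rg):
    """Return the Kratky ordinate s^2 * I(s) for each s."""
    return [s * s * intensity(s, rg) for s in s_values]


RG = 3.0  # nm, hypothetical value chosen only for illustration
s_values = [0.01 * i for i in range(1, 301)]  # 0.01 to 3.00 nm^-1

compact = kratky(guinier_intensity, s_values, RG)
chain = kratky(debye_intensity, s_values, RG)

# The compact model peaks near s = sqrt(3)/Rg and then decays (bell shape).
peak_s = s_values[compact.index(max(compact))]

if __name__ == "__main__":
    print(f"compact model: Kratky peak at s = {peak_s:.2f} nm^-1")
    print(f"chain model: s^2*I at largest s = {chain[-1]:.3f} (plateau 2/Rg^2 = {2 / RG**2:.3f})")
```

Real partially folded proteins fall between these two limits, which is why the shape of the plot is read qualitatively.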
Testing dimerization of the N-terminal domains of CTCFs in heterologous in vivo systems
Lack of the homodimerization ability of CTCF NTDs in experiments in vitro does not exclude this property of NTDs in vivo. To test this possibility, we used two different approaches. The first was a yeast two-hybrid assay (Y2H). Sequences encoding the NTDs were fused in-frame to the yeast GAL4 DNA-binding domain (BD) and activation domain (AD). Because steric hindrance can interfere with transcriptional activation in the two-hybrid system, the NTD sequences were placed at both the N-terminus (NTD-AD and NTD-BD) and the C-terminus (AD-NTD and BD-NTD) of the fusion protein.
For dmNTD, we had previously found that a positive result was observed in only one configuration, dmNTD-BD and AD-dmNTD (Bonchuk et al.37). Here we confirmed this observation for dmNTD and found that the other NTDs were also able to interact in only one of the four tested configurations (Table S2). Most of the tested NTDs (am, cg, sp, ci, dp and hs) demonstrated pairing ability (Fig. 3). We did not observe an interaction between skNTDs. We were also unable to test dimerization of drNTD because of strong self-activation induced by the NTD sequence fused with the BD.
To confirm the Y2H results by an independent assay, we analysed co-immunoprecipitation of 3 × FLAG- and 3 × HA-tagged NTDs in transfected S2 cells (Fig. 3). Each NTD was fused with either the 3 × FLAG or the 3 × HA epitope and co-transfected into S2 cells. After immunoprecipitation with HA-Sepharose, bands corresponding to homodimers of NTDs were detectable for all NTDs (strong signal for dm, dv, am, sp, ci, hs and weak for dp, cg, dr), the exception being skCTCF. At the same time, in the reverse experiment with FLAG-Sepharose, we observed homodimer bands for only some of the NTDs (strong signal for dm, sp and weak for am, ci, dr). Such a variable result can be explained by steric difficulties in immunoprecipitation of the proteins. Taken together, the results of the Y2H and co-immunoprecipitation assays show that the NTDs of CTCFs from different organisms are capable of homodimerization. Only skNTD did not show the ability to form dimers in either approach.
Discussion
The CTCF belongs to transcription factors with an arranged array of C2H2 domains. In contrast to TFs of other classes, C2H2 proteins typically bind to 12–20 bp sequences42–44. The C2H2 domains of CTCF are most conserved among this class of the proteins, suggesting the model that CTCF is the ancestral protein from which other C2H2 proteins originated during evolution. According to a hypothesis, CTCF appeared in evolution when long-distance interactions between regulatory elements had emerged in transcription regulation21,45. It seems likely that many other C2H2 proteins originating from CTCF are also involved in the organization of chromosomal architecture. Some of these proteins were discovered in Drosophila initially as insulator proteins Su(Hw), Zw5, Pita and Zipic46–48.
Many C2H2 proteins have N-terminal homodimerization domains. In arthropods and vertebrates49–51, expansion of different domains was observed: ZAD and SCAN, respectively, which exhibit the ability to predominantly form homodimers. It was demonstrated that homodimerization ZAD from three C2H2 proteins (Pita, ZIPIC and Zw5) determines the specificity of long-range interactions52. C2H2 proteins can also have other types of multimerization domains. For example, the C2H2 protein Opbp has the N-terminal C2H2 domain that can form homodimers and is involved in distance interactions53. It was recently shown that YY1 participates in enhancer-promoter interactions by forming oligomers54. Interestingly, YY1 contains 3 C2H2 domains at the C-terminus that are involved at the same time in oligomerization and DNA binding55. Another protein, LDB1, the Lim domain binding 1 protein, contains a dimerization domain that plays an important role in enhancer-promoter interactions in various developmental pathways56–58.
Here, we found that the non-conserved N-terminal domains of CTCFs in all tested metazoans are intrinsically unstructured in vitro, but in most cases they are able to self-associate in vivo. Only for CTCF from the acorn worm (Saccoglossus kowalewski) did we not observe homodimerization of the N-terminal domain in vivo. Thus, most N-terminal CTCF domains retain their structural and functional properties despite the lack of sequence conservation during evolution. The exceptions are CTCFs from Drosophila melanogaster and Drosophila virilis (Drosophilids), whose N-terminal domains are folded in vitro despite the lack of secondary structure elements. It seems likely that such domain organization was acquired in Drosophilids, as the N-terminal domain of CTCF in the honey bee is intrinsically disordered. Even in Drosophilids, the structure of the N-terminal domains varies between the tested species: the N-terminal domain of Drosophila melanogaster forms a tetramer, whereas the N-terminal domain of Drosophila virilis forms only a dimer.
The crucial role of CTCF in supporting specific distance interactions in mammals might suggest the ability of CTCF to homodimerize. It was shown that hCTCF can dimerize by purification of a FLAG-HA-tagged CTCF complex and in the yeast two-hybrid assay59. However, attempts to find a dimerization domain in hCTCF that can support specific distance interactions have thus far been unsuccessful. It was shown in pulldown experiments that the C-terminal part of one CTCF binds to the C2H2 zinc-finger domains of another CTCF60, but the specificity of this interaction has not been proven. It was also found that some RNAs can interact with ZFs 10 and 11 and induce oligomerization of the CTCF protein61. Because many C2H2 domains can interact with RNAs with relatively low specificity62–64, the involvement of RNAs in protein multimerization does not explain how CTCF can support specific distance interactions.
Unstructured N-terminal regions of CTCFs are a good candidate for the role of a domain that supports specific distance interactions between CTCF sites. The strength of pairing between unfolded NTDs can be easily regulated by various post-translational modifications of amino acid residues, which are crucial for effective stimulation/repression of enhancer-promoter interactions. The NTDs in CTCFs lack secondary structure and sequence similarity, therefore, making it impossible to identify such domains using bioinformatics approaches. Thus, there is a probability that unstructured domains are widely distributed at the N-terminal ends of C2H2 proteins, which, however, can only be verified experimentally. Further studies are required to understand the role of the N-terminal domains in the organization and regulation of distance interactions mediated by CTCFs.
Materials and Methods
Plasmid construction
CTCF homologues were identified using BLAST search by similarity with zinc-finger domain of Drosophila and human CTCF proteins. For protein purification purposes, protein fragments were PCR-amplified using corresponding primers (see Table S3) and subcloned into modified pET32a(+) vector (Merck Biosciences) in-frame with TEV-cleavable Thioredoxin-6xHis-tag. Adult bees (Apis mellifera) were obtained from a local apiary, oysters (Crassostrea gigas) were purchased at a local food store, and Daphnia pulex culture was purchased at a pet shop. RNA was isolated using TRIzol reagent, and cDNA was obtained with reverse transcription with oligo(dT) primer following standard protocols. For other cDNAs sources, see the acknowledgements.
Protein expression and purification, size-exclusion chromatography and chemical cross-linking
Protein expression and purification were performed using standard procedures, as described previously37. Stable isotope-labelled proteins were expressed according to65 and purified using the same procedure as for native proteins. Size-exclusion chromatography was performed as described37 using Superdex 200 10/300GL columns (GE Healthcare). Expected Rs values for globular and unfolded proteins were calculated as described66. Chemical cross-linking of proteins was carried out with glutaraldehyde as described previously37.
Circular dichroism
Circular-dichroism measurements were performed using a Chirascan instrument (Applied Photophysics, UK). The instrument was calibrated using camphor-10-sulfonic acid, according to67. Measurements were made in a 0.1 cm isolated cuvette at a sample concentration of 0.05 mg/ml at 20 °C. Sample concentration was calculated from peptide-bond extinction values at 205, 206, 210 and 215 nm68.
SAXS measurements and data processing
Synchrotron radiation X-ray scattering data were collected using standard procedures on the BM29 BioSAXS beamline at the ESRF (Grenoble, France) at a wavelength of 0.099 nm. The 2D detector Pilatus1M and a sample-to-detector distance of 2.87 m were used to acquire scattering data within the momentum transfer (s) range of 0.033–4.9 nm⁻¹ (s = 4π sin θ/λ, where 2θ is the scattering angle). Data collection and processing were performed in an automated manner using the dedicated beamline software BsxCuBE. The samples were measured at two or more concentrations. A volume of 30 μl of sample solution was placed in a 1.8-mm-diameter quartz capillary with a wall thickness of a few tens of microns. Thirty consecutive frames with 1 s exposure were collected from the sample at a constant temperature of 277 K without observing any radiation damage (characterized by systematic deviations in consecutive scattering curves). Solvent scattering was measured to allow for subtraction of the background scattering. The data from consecutive frames were inspected, normalized to the incident beam intensity and averaged in PRIMUS69. Data processing and analysis were done with the ATSAS program suite for small-angle scattering from biological molecules70. The subtraction of the buffer scattering was done manually with the program subtrNc. The radius of gyration Rg of the protein molecule in solution was evaluated using the Guinier approximation at small angles (s < 1.3/Rg), assuming the intensity to be I(s) = I0 exp(−(sRg)²/3). To evaluate the maximum particle dimension Dmax, the pair-distance distribution function P(r) was generated with the program GNOM so that the Rg value of the protein samples agreed with that determined from the Guinier region in PRIMUS. The molecular mass (MM) of the protein was calculated using the extrapolated I0 scattering intensity and protein standards of known Mw as described71. 
Low-resolution ab initio structure models of CTCF(1–163aa) protein representing the protein as an ensemble of dummy atoms were constructed by program DAMMIN. The program was used to build a compact configuration of beads inside a sphere of Dmax diameter with χ = 1.05 minimal discrepancy between intensity of experimental data and that calculated from generated model.
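The Guinier evaluation described above amounts to a straight-line fit of ln I(s) against s², with slope −Rg²/3 and intercept ln I0. The following is a minimal sketch of that fit on synthetic, noise-free data. It is not the PRIMUS implementation, and the function name `guinier_fit` and the test values of Rg and I0 are assumptions made for illustration.

```python
import math


def guinier_fit(s_values, intensities):
    """Least-squares line through (s^2, ln I); returns (Rg, I0).

    Uses ln I(s) = ln I0 - (Rg^2 / 3) * s^2, valid only at small angles.
    """
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rg = math.sqrt(-3.0 * slope)  # slope must be negative for a valid fit
    i0 = math.exp(intercept)
    return rg, i0


# Synthetic data generated from hypothetical values, restricted to s < 1.3/Rg.
TRUE_RG = 3.0    # nm, illustrative only
TRUE_I0 = 100.0  # arbitrary units, illustrative only
s_max = 1.3 / TRUE_RG
s_values = [s_max * i / 20 for i in range(1, 21)]
intensities = [TRUE_I0 * math.exp(-((s * TRUE_RG) ** 2) / 3.0) for s in s_values]

rg, i0 = guinier_fit(s_values, intensities)

if __name__ == "__main__":
    print(f"fitted Rg = {rg:.3f} nm, I0 = {i0:.2f}")
```

With experimental data the fitting range has to be chosen iteratively, because the limit s < 1.3/Rg depends on the Rg being determined.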
Dynamic light scattering (DLS)
Dynamic light scattering (DLS) measurements were performed using a DynaPro Titan instrument (Wyatt Technology Corporation). Light scattering analysis was performed using a laser wavelength of 832 nm, a quartz cuvette of 20 µl volume, the temperature-controlled DynaPro instrument at 4 °C and Dynamics software. The protein samples were analysed in 20 mM TrisHCl buffer (pH 7.4), 200 mM NaCl, containing 1 mM β-mercaptoethanol and 10% (w/v) glycerol. The protein was concentrated to 1 and 7 mg/ml and filtered prior to the measurements. Sequences of 10 sample acquisitions of 1 s duration were collected at each concentration. The value of the solution viscosity was taken from the corresponding table of the instrument. The hydrodynamic radius (Rh) was evaluated with the Stokes-Einstein equation from the autocorrelation function of the DLS measurements following standard procedures. The average MM was estimated using default Mark-Houwink parameters for a hard sphere.
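The Stokes-Einstein relation referred to above is Rh = kB·T/(6πηD), where D is the translational diffusion coefficient and η the solvent viscosity. The sketch below shows the conversion in both directions. The viscosity value is a hypothetical placeholder, not the value taken from the instrument table, and the function names are assumptions.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def hydrodynamic_radius(diffusion, temperature, viscosity):
    """Stokes-Einstein: Rh = kB*T / (6*pi*eta*D), all quantities in SI units."""
    return BOLTZMANN * temperature / (6.0 * math.pi * viscosity * diffusion)


def diffusion_coefficient(radius, temperature, viscosity):
    """Inverse relation: D = kB*T / (6*pi*eta*Rh)."""
    return BOLTZMANN * temperature / (6.0 * math.pi * viscosity * radius)


TEMPERATURE = 277.15  # K, i.e. 4 degrees C as in the measurements
VISCOSITY = 2.0e-3    # Pa*s, hypothetical placeholder for the glycerol buffer
RH = 4.5e-9           # m, midpoint of the reported 4.4-4.6 nm range

d = diffusion_coefficient(RH, TEMPERATURE, VISCOSITY)
rh_back = hydrodynamic_radius(d, TEMPERATURE, VISCOSITY)

if __name__ == "__main__":
    print(f"D = {d:.3e} m^2/s for Rh = {RH * 1e9:.1f} nm")
    print(f"round trip Rh = {rh_back * 1e9:.2f} nm")
```

Because Rh scales inversely with the assumed viscosity, an error in η propagates directly into the radius and hence into the mass estimate.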
NMR spectroscopy
The NMR samples in concentrations of 0.2–0.5 mM for 15N-labelled dmCTCF and dvCTCF were prepared in 95% H2O/5% D2O, 20 mM NaCl, 20 mM sodium phosphate buffer (pH 7.0 or 6.5), and 0.02% NaN3. All spectra were recorded on Bruker AVANCE 600 MHz spectrometer (Moscow State University). For 2D NMR the SOFAST HMQC pulse program was used72. The acquired data were processed using NMRPipe73, and analyzed using NMRFAM-Sparky software74.
Yeast two-hybrid assay
Yeast two-hybrid assay was carried out using yeast strain pJ69-4A (MATa trp1-901 leu2-3,112 ura3-52 his3-200 gal4Δ gal80Δ GAL2-ADE2 LYS2::GAL1-HIS3 met2::GAL7-lacZ), as described previously52.
Co-immunoprecipitation assay
Protein extracts were prepared from S2 cells co-transfected with 3 × FLAG- and 3 × HA-fused plasmids using MACSfectin (Miltenyi Biotec). The co-immunoprecipitation assay was performed as described previously52.
Acknowledgements
We are grateful to Dr. Alexander Kuklin (Joint Institute of Nuclear Research, Dubna) for help in SAXS data collection and to Dr. Vladimir Shubin (A. N. Bach Institute of Biochemistry RAS) for CD measurements and calculation. Strongylocentrotus purpuratus cDNA was the generous gift of Dr. Maria Arnone (Stazione Zoologica Anton Dohrn, Napoli). Saccoglossus kowalewski cDNA was kindly provided by Prof. John Gerhart (University of California, Berkeley). Ciona intestinalis cDNA was a gift of Dr. Erin Newman-Smith (University of California, Santa Barbara). IGB RAS facilities are supported by the Ministry of Science and Education of the Russian Federation. In vivo experiments were supported by the Russian Science Foundation, project no. 19-74-30026 (to P.G.). In vitro experiments were supported by grant 075-15-2019-1661 from the Ministry of Science and Higher Education of the Russian Federation. The NMR study was supported by Russian Science Foundation, project no. 19-14-00115. Funding for open access charge: Russian Science Foundation.
Author contributions
A.B., V.O.P., S.M., O.M., P.G. designed experiments. A.B., S.K., S.M., K.M.B., O.M. performed experiments. A.B., O.M., P.G. wrote the main manuscript text. A.B. and O.M. prepared figures. All authors reviewed the manuscript.
Data availability
All data generated or analysed during this study are included in this published article and its Supplementary Information files.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Oksana Maksimenko, Email: maksog@mail.ru.
Pavel Georgiev, Email: georgiev_p@mail.ru.
Supplementary information is available for this paper at 10.1038/s41598-020-59459-5.
References
- 1. Dekker J, Mirny L. The 3D Genome as Moderator of Chromosomal Communication. Cell. 2016;164:1110–1121. doi: 10.1016/j.cell.2016.02.007.
- 2. Merkenschlager M, Nora EP. CTCF and Cohesin in Genome Folding and Transcriptional Gene Regulation. Annu. Rev. Genomics Hum. Genet. 2016;17:17–43. doi: 10.1146/annurev-genom-083115-022339.
- 3. Acemel RD, Maeso I, Gómez-Skarmeta JL. Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. Wiley Interdisciplinary Reviews: Developmental Biology. 2017;6(3):e265. doi: 10.1002/wdev.265.
- 4. Chetverina D, Fujioka M, Erokhin M, Georgiev P, Jaynes JB, Schedl P. Boundaries of loop domains (insulators): Determinants of chromosome form and function in multicellular eukaryotes. BioEssays. 2017;39(3):1600233. doi: 10.1002/bies.201600233.
- 5. Ramirez F, et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9:189. doi: 10.1038/s41467-017-02525-w.
- 6. Chetverina D, Aoki T, Erokhin M, Georgiev P, Schedl P. Making connections: Insulators organize eukaryotic chromosomes into independent cis-regulatory networks. BioEssays. 2014;36:163–172. doi: 10.1002/bies.201300125.
- 7. Ghavi-Helm Y, et al. Enhancer loops appear stable during development and are associated with paused polymerase. Nature. 2014;512:96–100. doi: 10.1038/nature13417.
- 8. Matzat LH, Lei EP. Surviving an identity crisis: A revised view of chromatin insulators in the genomics era. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 2014;1839(3):203–214. doi: 10.1016/j.bbagrm.2013.10.007.
- 9. Hnisz D, Day DS, Young RA. Insulated Neighborhoods: Structural and Functional Units of Mammalian Gene Control. Cell. 2016;167:1188–1200. doi: 10.1016/j.cell.2016.10.024.
- 10. Maksimenko O, Georgiev P. Mechanisms and proteins involved in long-distance interactions. Frontiers in Genetics. 2014;5. doi: 10.3389/fgene.2014.00028.
- 11. Lobanenkov VV, et al. A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5′-flanking sequence of the chicken c-myc gene. Oncogene. 1990;5:1743–1753.
- 12. Merkenschlager M, Odom DT. CTCF and cohesin: linking gene regulatory elements with their targets. Cell. 2013;152:1285–1297. doi: 10.1016/j.cell.2013.02.029.
- 13. Ghirlando R, Felsenfeld G. CTCF: making the right connections. Genes Dev. 2016;30:881–891. doi: 10.1101/gad.277863.116.
- 14. Hanssen LLP, et al. Tissue-specific CTCF-cohesin-mediated chromatin architecture delimits enhancer interactions and function in vivo. Nat. Cell Biol. 2017;19:952–961. doi: 10.1038/ncb3573.
- 15. Willi M, et al. Facultative CTCF sites moderate mammary super-enhancer activity and regulate juxtaposed gene in non-mammary cells. Nat. Commun. 2017;8:16069. doi: 10.1038/ncomms16069.
- 16. Lee HK, et al. Functional assessment of CTCF sites at cytokine-sensing mammary enhancers using CRISPR/Cas9 gene editing in mice. Nucleic Acids Res. 2017;45:4606–4618. doi: 10.1093/nar/gkx185.
- 17. Heger P, Marin B, Bartkuhn M, Schierenberg E, Wiehe T. The chromatin insulator CTCF and the emergence of metazoan diversity. Proc. Natl Acad. Sci. USA. 2012;109:17507–17512. doi: 10.1073/pnas.1111941109.
- 18. Schoborg T, Labrador M. Expanding the roles of chromatin insulators in nuclear architecture, chromatin organization and genome function. Cell. Mol. Life Sci. 2014;71:4089–4113. doi: 10.1007/s00018-014-1672-6.
- 19. Hashimoto H, et al. Structural Basis for the Versatile and Methylation-Dependent Binding of CTCF to DNA. Mol. Cell. 2017;66:711–720.e3. doi: 10.1016/j.molcel.2017.05.004.
- 20. Schmidt D, et al. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell. 2012;148:335–348. doi: 10.1016/j.cell.2011.11.058.
- 21. Gomez-Marin C, et al. Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proc. Natl Acad. Sci. USA. 2015;112:7542–7547. doi: 10.1073/pnas.1505463112.
- 22. Nakahashi H, et al. A genome-wide map of CTCF multivalency redefines the CTCF code. Cell Rep. 2013;3:1678–1689. doi: 10.1016/j.celrep.2013.04.024.
- 23. Kung JT, et al. Locus-specific targeting to the X chromosome revealed by the RNA interactome of CTCF. Mol. Cell. 2015;57:361–375. doi: 10.1016/j.molcel.2014.12.006.
- 24. Klenova E, et al. YB-1 and CTCF differentially regulate the 5-HTT polymorphic intron 2 enhancer which predisposes to a variety of neurological disorders. J. Neurosci. 2004;24:5966–5973. doi: 10.1523/JNEUROSCI.1150-04.2004.
- 25. Ishihara K, Oshimura M, Nakao M. CTCF-dependent chromatin insulator is linked to epigenetic remodeling. Mol. Cell. 2006;23:733–742. doi: 10.1016/j.molcel.2006.08.008.
- 26. Donohoe ME, Zhang LF, Xu N, Shi Y, Lee JT. Identification of a Ctcf cofactor, Yy1, for the X chromosome binary switch. Mol. Cell. 2007;25:43–56. doi: 10.1016/j.molcel.2006.11.017.
- 27. Donohoe ME, Silva SS, Pinter SF, Xu N, Lee JT. The pluripotency factor Oct4 interacts with Ctcf and also controls X-chromosome pairing and counting. Nature. 2009;460:128–132. doi: 10.1038/nature08098.
- 28. Moon H, et al. CTCF is conserved from Drosophila to humans and confers enhancer blocking of the Fab-8 insulator. EMBO Rep. 2005;6:165–170. doi: 10.1038/sj.embor.7400334.
- 29. Heger P, Marin B, Schierenberg E. Loss of the insulator protein CTCF during nematode evolution. BMC Mol. Biol. 2009;10:84. doi: 10.1186/1471-2199-10-84.
- 30. Martinez SR, Miranda JL. CTCF terminal segments are unstructured. Protein Sci. 2010;19:1110–1116. doi: 10.1002/pro.367.
- 31. Xiao T, Wallace J, Felsenfeld G. Specific sites in the C terminus of CTCF interact with the SA2 subunit of the cohesin complex and are required for cohesin-dependent insulation activity. Mol. Cell Biol. 2011;31:2174–2183. doi: 10.1128/MCB.05093-11.
- 32. Vietri Rudan M, et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 2015;10:1297–1309. doi: 10.1016/j.celrep.2015.02.004.
- 33. Schwartz YB, et al. Nature and function of insulator protein binding sites in the Drosophila genome. Genome Res. 2012;22:2188–2198. doi: 10.1101/gr.138156.112.
- 34. Sexton T, et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell. 2012;148:458–472. doi: 10.1016/j.cell.2012.01.010.
- 35. Kyrchanova O, Chetverina D, Maksimenko O, Kullyev A, Georgiev P. Orientation-dependent interaction between Drosophila insulators is a property of this class of regulatory elements. Nucleic Acids Res. 2008;36:7019–7028. doi: 10.1093/nar/gkn781.
- 36. Kyrchanova O, et al. Selective interactions of boundaries with upstream region of Abd-B promoter in Drosophila bithorax complex and role of dCTCF in this process. Nucleic Acids Res. 2011;39:3042–3052. doi: 10.1093/nar/gkq1248.
- 37. Bonchuk A, et al. Functional role of dimerization and CP190 interacting domains of CTCF protein in Drosophila melanogaster. BMC Biol. 2015;13:63. doi: 10.1186/s12915-015-0168-7.
- 38. Simakov O, et al. Hemichordate genomes and deuterostome origins. Nature. 2015;527:459–465. doi: 10.1038/nature16150.
- 39. Dehal P, et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science. 2002;298:2157–2167. doi: 10.1126/science.1080049.
- 40. Carmona-Aldana F, et al. CTCF knockout reveals an essential role for this protein during the zebrafish development. Mech. Dev. 2018;154:51–59. doi: 10.1016/j.mod.2018.04.006.
- 41. Yachdav G, et al. PredictProtein–an open resource for online prediction of protein structural and functional features. Nucleic Acids Res. 2014;42:W337–343. doi: 10.1093/nar/gku366.
- 42. Razin SV, Borunova VV, Maksimenko OG, Kantidze OL. Cys2His2 zinc finger protein family: classification, functions, and major members. Biochemistry (Mosc.). 2012;77:217–226. doi: 10.1134/S0006297912030017.
- 43. Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. C2H2 Zinc Finger Proteins: The Largest but Poorly Explored Family of Higher Eukaryotic Transcription Factors. Acta Naturae. 2017;9:47–58. doi: 10.32607/20758251-2017-9-2-47-58.
- 44. Najafabadi HS, et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat. Biotechnol. 2015;33:555–562. doi: 10.1038/nbt.3128.
- 45. Gaiti F, Calcino AD, Tanurdzic M, Degnan BM. Origin and evolution of the metazoan non-coding regulatory genome. Dev. Biol. 2017;427:193–202. doi: 10.1016/j.ydbio.2016.11.013.
- 46. Geyer PK, Corces VG. DNA position-specific repression of transcription by a Drosophila zinc finger protein. Genes Dev. 1992;6:1865–1873. doi: 10.1101/gad.6.10.1865.
- 47. Gaszner M, Vazquez J, Schedl P. The Zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction. Genes Dev. 1999;13:2098–2107. doi: 10.1101/gad.13.16.2098.
- 48. Maksimenko O, et al. Two new insulator proteins, Pita and ZIPIC, target CP190 to chromatin. Genome Res. 2015;25:89–99. doi: 10.1101/gr.174169.114.
- 49. Chung HR, Lohr U, Jackle H. Lineage-specific expansion of the zinc finger associated domain ZAD. Mol. Biol. Evol. 2007;24:1934–1943. doi: 10.1093/molbev/msm121.
- 50. Tadepally HD, Burger G, Aubry M. Evolution of C2H2-zinc finger genes and subfamilies in mammals: species-specific duplication and loss of clusters, genes and effector domains. BMC Evol. Biol. 2008;8:176. doi: 10.1186/1471-2148-8-176.
- 51. Emerson RO, Thomas JH. Gypsy and the birth of the SCAN domain. J. Virol. 2011;85:12043–12052. doi: 10.1128/JVI.00867-11.
- 52. Zolotarev N, et al. Architectural proteins Pita, Zw5, and ZIPIC contain homodimerization domain and support specific long-range interactions in Drosophila. Nucleic Acids Res. 2016;44:7228–7241. doi: 10.1093/nar/gkw371.
- 53. Zolotarev N, et al. Opbp is a new architectural/insulator protein required for ribosomal gene expression. Nucleic Acids Res. 2017;45:12285–12300. doi: 10.1093/nar/gkx840.
- 54. Weintraub AS, et al. YY1 Is a Structural Regulator of Enhancer-Promoter Loops. Cell. 2017;171:1573–1588.e28. doi: 10.1016/j.cell.2017.11.008.
- 55. Lopez-Perrote A, et al. Structure of Yin Yang 1 oligomers that cooperate with RuvBL1-RuvBL2 ATPases. J. Biol. Chem. 2014;289:22614–22629. doi: 10.1074/jbc.M114.567040.
- 56. Krivega I, Dale RK, Dean A. Role of LDB1 in the transition from chromatin looping to transcription activation. Genes Dev. 2014;28:1278–1290. doi: 10.1101/gad.239749.114.
- 57. Lee J, Krivega I, Dale RK, Dean A. The LDB1 Complex Co-opts CTCF for Erythroid Lineage-Specific Long-Range Enhancer Interactions. Cell Rep. 2017;19:2490–2502. doi: 10.1016/j.celrep.2017.05.072.
- 58. Liu G, Dean A. Enhancer long-range contacts: The multi-adaptor protein LDB1 is the tie that binds. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 2019;1862:625–633. doi: 10.1016/j.bbagrm.2019.04.003.
- 59. Yusufzai TM, Felsenfeld G. The 5′-HS4 chicken beta-globin insulator is a CTCF-dependent nuclear matrix-associated element. Proc. Natl Acad. Sci. USA. 2004;101:8620–8624. doi: 10.1073/pnas.0402938101.
- 60. Pant V, et al. Mutation of a single CTCF target site within the H19 imprinting control region leads to loss of Igf2 imprinting and complex patterns of de novo methylation upon maternal inheritance. Mol. Cell Biol. 2004;24:3497–3504. doi: 10.1128/mcb.24.8.3497-3504.2004.
- 61. Saldana-Meyer R, et al. CTCF regulates the human p53 gene through direct interaction with its natural antisense transcript, Wrap53. Genes Dev. 2014;28:723–734. doi: 10.1101/gad.236869.113.
- 62. Brown RS. Zinc finger proteins: getting a grip on RNA. Curr. Opin. Struct. Biol. 2005;15:94–98. doi: 10.1016/j.sbi.2005.01.006.
- 63. Brayer KJ, Kulshreshtha S, Segal DJ. The protein-binding potential of C2H2 zinc finger domains. Cell Biochem. Biophys. 2008;51:9–19. doi: 10.1007/s12013-008-9007-6.
- 64. Brayer KJ, Segal DJ. Keep your fingers off my DNA: protein-protein interactions mediated by C2H2 zinc finger domains. Cell Biochem. Biophys. 2008;50:111–131. doi: 10.1007/s12013-008-9008-5.
- 65. Marley J, Lu M, Bracken C. A method for efficient isotopic labeling of recombinant proteins. J. Biomol. NMR. 2001;20:71–75. doi: 10.1023/a:1011254402785.
- 66. Uversky VN. Use of fast protein size-exclusion liquid chromatography to study the unfolding of proteins which denature through the molten globule. Biochemistry. 1993;32:13288–13298. doi: 10.1021/bi00211a042.
- 67. Miles AJ, Wien F, Wallace BA. Redetermination of the extinction coefficient of camphor-10-sulfonic acid, a calibration standard for circular dichroism spectroscopy. Anal. Biochem. 2004;335:338–339. doi: 10.1016/j.ab.2004.08.035.
- 68. Kelly SM, Jess TJ, Price NC. How to study proteins by circular dichroism. Biochim. Biophys. Acta. 2005;1751:119–139. doi: 10.1016/j.bbapap.2005.06.005.
- 69. Konarev PV, Volkov VV, Sokolova AV, Koch MHJ, Svergun DI. PRIMUS: a Windows PC-based system for small-angle scattering data analysis. J. Appl. Crystallogr. 2003;36:1277–1282. doi: 10.1107/S0021889803012779.
- 70. Petoukhov MV, Konarev PV, Kikhney AG, Svergun DI. ATSAS 2.1 - towards automated and web-supported small-angle scattering data analysis. J. Appl. Crystallogr. 2007;40:S223–S228. doi: 10.1107/S0021889807002853.
- 71. Mylonas E, Svergun DI. Accuracy of molecular mass determination of proteins in solution by small-angle X-ray scattering. J. Appl. Crystallogr. 2007;40:S245–S249. doi: 10.1107/S002188980700252x.
- 72. Schanda P, Kupce E, Brutscher B. SOFAST-HMQC experiments for recording two-dimensional heteronuclear correlation spectra of proteins within a few seconds. J. Biomol. NMR. 2005;33:199–211. doi: 10.1007/s10858-005-4425-x.
- 73. Delaglio F, et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR. 1995;6:277–293. doi: 10.1007/bf00197809.
- 74. Lee W, Tonelli M, Markley JL. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics. 2015;31:1325–1327. doi: 10.1093/bioinformatics/btu830.
- 75. Rambo RP, Tainer JA. Characterizing flexible and intrinsically unstructured biological macromolecules by SAS using the Porod-Debye law. Biopolymers. 2011;95:559–571. doi: 10.1002/bip.21638.