The EMBO Journal. 2021 Sep 6;40(20):e107795. doi: 10.15252/embj.2021107795

CTCF binding modulates UV damage formation to promote mutation hot spots in melanoma

Smitha Sivapragasam 1, Bastian Stark 1, Amanda V Albrecht 2, Kaitlynne A Bohm 1, Peng Mao 1,3, Raymond G Emehiser 4, Steven A Roberts 1, Patrick J Hrdlicka 4, Gregory M K Poon 2,5, John J Wyrick 1,6
PMCID: PMC8521319  PMID: 34487363

Abstract

Somatic mutations in DNA‐binding sites for CCCTC‐binding factor (CTCF) are significantly elevated in many cancers. Prior analysis has suggested that elevated mutation rates at CTCF‐binding sites in skin cancers are a consequence of the CTCF‐cohesin complex inhibiting repair of UV damage. Here, we show that CTCF binding modulates the formation of UV damage to induce mutation hot spots. Analysis of genome‐wide CPD‐seq data in UV‐irradiated human cells indicates that formation of UV‐induced cyclobutane pyrimidine dimers (CPDs) is primarily suppressed by CTCF binding but elevated at specific locations within the CTCF motif. Locations of CPD hot spots in the CTCF‐binding motif coincide with mutation hot spots in melanoma. A similar pattern of damage formation is observed at CTCF‐binding sites in vitro, indicating that UV damage modulation is a direct consequence of CTCF binding. We show that CTCF interacts with binding sites containing UV damage and inhibits repair by a model repair enzyme in vitro. Structural analysis and molecular dynamic simulations reveal the molecular mechanism for how CTCF binding modulates CPD formation.

Keywords: CCCTC‐binding factor, DNA damage, DNA repair, skin cancer, ultraviolet light

Subject Categories: Cancer; Chromatin, Transcription & Genomics; DNA Replication, Recombination & Repair


Genome‐wide CPD‐seq analyses and biochemical studies with a model repair enzyme reveal the mechanisms driving UV mutagenesis at CTCF‐insulator binding sites.


Introduction

Sequencing of tumor genomes has revealed many somatic mutation hot spots associated with specific genomic features (Gonzalez‐Perez et al, 2019; Rheinbay et al, 2020). Somatic mutation hot spots are sites in which the same nucleotide(s) in the genome sequence is mutated in multiple, independent tumors. Mutation hot spots are typically found in cancer‐causing “driver” genes, and presumably arise due to selection for their carcinogenic functions (Chang et al, 2016; Rheinbay et al, 2020). Indeed, the presence of such recurrent mutations is often strong evidence for a role in carcinogenesis. In sequenced melanomas, mutation hot spots are not only located in the coding sequences of cancer‐causing “driver” genes (e.g., BRAF V600E and NRAS Q61R), but also frequently located in noncoding DNA (Hayward et al, 2017; Rheinbay et al, 2020). In some cases, hot spot mutations in noncoding DNA promote carcinogenesis, as is the case for hot spot mutations in the promoter of the telomerase reverse transcriptase (TERT) gene (Horn et al, 2013; Huang et al, 2013; Chiba et al, 2017; Heidenreich & Kumar, 2017). However, recent studies have indicated that some hot spot mutations in noncoding DNA may not drive carcinogenesis, but instead function as neutral “passenger” mutations (Fredriksson et al, 2017; Buisson et al, 2019; Roberts et al, 2019). Such “passenger” hot spots are not under selection, but may instead arise from highly mutagenic processes, such as elevated UV damage formation, as occurs at the binding sites of ETS transcription factors (Elliott et al, 2018; Mao et al, 2018; Premi et al, 2019), or inhibition of repair (Poulos et al, 2016; Sabarinathan et al, 2016; Mao & Wyrick, 2019).

Recurrent hot spot mutations have been identified in the DNA‐binding sites of the CCCTC‐binding factor (CTCF) in different cancer types (Katainen et al, 2015; Kaiser et al, 2016; Poulos et al, 2016; Sabarinathan et al, 2016; Umer et al, 2016; Guo et al, 2018). CTCF functions as a key regulator of gene expression and genome organization, since its insulator activity modulates the function of neighboring enhancers and promoters by regulating chromatin topology and looping (Merkenschlager & Nora, 2016). Indeed, CTCF binding is often associated with cohesin loading at the boundaries of topologically associated domains (TADs), which regulate genome organization (Rao et al, 2014). Mutation hot spots in CTCF‐binding sites (CBS) were first identified in gastrointestinal cancers (Katainen et al, 2015; Kaiser et al, 2016; Umer et al, 2016), but are also prominent in skin cancers such as melanoma (Poulos et al, 2016; Sabarinathan et al, 2016; Liu et al, 2019). The mutation type and relative location of the hot spot in the CTCF‐binding motif differ between gastrointestinal cancers and skin cancers. In gastrointestinal cancers, T:A base pairs throughout the binding motif are recurrently mutated (Katainen et al, 2015; Kaiser et al, 2016; Umer et al, 2016), while in skin cancers, hot spot mutations primarily occur at a CC sequence at the 5’ end of the CTCF‐binding motif (Poulos et al, 2016). In some cases, CBS mutations have been associated with functional alterations in CTCF binding, genome topology, and cancer gene expression, indicating these noncoding mutations may drive carcinogenesis (Poulos et al, 2016; Guo et al, 2018; Liu et al, 2019). However, it has also been argued that many CBS mutations in skin cancers are likely passenger hot spots that are not under selection, but instead arise due to elevated mutation rates at CBS that are a consequence of decreased repair of UV damage (Poulos et al, 2016; Sabarinathan et al, 2016). Bioinformatic analysis of excision repair‐sequencing (XR‐seq) data, which measures repair of UV damage by the nucleotide excision repair (NER) pathway (Hu et al, 2015, 2017b; Adar et al, 2016), has further suggested that cohesin loading at a subset of CBS may be required for NER inhibition (Poulos et al, 2016). However, whether CTCF can remain bound to a CBS containing UV damage and directly inhibit repair of this damage is not known.

A unique feature of CBS mutation hot spots in skin cancer is that they are primarily associated with the 5’ end of the cytosine‐rich strand of the CTCF‐binding motif (Poulos et al, 2016; Gonzalez‐Perez et al, 2019), even though NER inhibition occurs throughout the binding motif and extends to DNA immediately adjacent to the CBS (Poulos et al, 2016; Sabarinathan et al, 2016). It has been suggested that uneven NER inhibition across the CTCF‐binding motif may be responsible for this mutation hot spot (Poulos et al, 2016). This conclusion is based on analysis of XR‐seq data, which revealed somewhat lower NER activity at the 5’ mutation hot spot relative to other locations in the binding motif. Moreover, this same study found that the CBS mutation hot spot is absent in cutaneous squamous cell carcinomas (cSCC) derived from patients with a germline defect in the NER pathway (XPC−/− ), which was interpreted as confirming the importance of repair inhibition in inducing CBS mutation hot spots (Poulos et al, 2016).

Somatic mutation hot spots in skin cancers can also be induced by elevated UV damage formation. We and others have recently shown that DNA binding by ETS transcription factors causes very high levels of UV‐induced cyclobutane pyrimidine dimer (CPD) formation at specific sites in the ETS‐binding motif, both in cells and in vitro (Elliott et al, 2018; Mao et al, 2018; Premi et al, 2019; Roberts et al, 2019). Elevated UV damage formation at ETS‐binding sites causes somatic mutation hot spots in melanomas (Elliott et al, 2018; Mao et al, 2018; Gonzalez‐Perez et al, 2019; Premi et al, 2019; Roberts et al, 2019). However, the extent to which this mechanism promotes mutation hot spots at other, non‐ETS transcription factor‐binding sites is unclear.

We wondered whether CTCF binding might similarly promote the formation of UV‐induced CPD lesions, as a potential explanation for CBS mutation hot spots. To test this hypothesis, we analyzed genome‐wide maps of UV‐induced CPD formation in human cells, and measured the impact of CTCF binding on CPD formation in vitro. Here, we report that CTCF binding significantly modulates UV damage formation, specifically promoting the formation of CPD lesions at the 5’ side of the CTCF motif where mutation hot spots form.

Results

Repair inhibition alone cannot explain mutation hot spots at CTCF‐binding sites

To investigate the molecular etiology of mutation hot spots at CTCF‐binding sites (CBS) in melanoma, we analyzed somatic mutation density, UV damage levels, and repair activity at a well‐defined set of CBS (Sabarinathan et al, 2016; Mao et al, 2018). Analysis of somatic mutations derived from whole‐genome sequencing of 183 melanoma tumors (Hayward et al, 2017) revealed a striking peak of somatic mutations at active CBS (Fig 1A), defined as CBS from ENCODE (Consortium EP, 2012) located in melanocyte DNase I hypersensitivity (DHS) regions (Sabarinathan et al, 2016; Mao et al, 2018). The mutation hot spot was located at a 5’ CC dinucleotide on the cytosine‐rich strand in the CTCF consensus motif (positions −4/−3 in Fig 1B), consistent with previous reports analyzing a smaller set of melanoma genomes (Poulos et al, 2016). In contrast, mutations were only slightly enriched at inactive CBS (i.e., CBS not located in a melanocyte DHS, see Fig 1C and D), indicating that active CTCF binding is required for somatic mutation hot spots. Although CC dinucleotides are commonly mutated in melanoma (Hayward et al, 2017), mutation enrichment at position −4/−3 in active CBS was elevated ~10‐fold (Fig 1E and F) after normalizing for the expected mutation count derived from the trinucleotide DNA sequence context (see Materials and Methods).
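The normalization by trinucleotide context can be illustrated with a minimal sketch. It assumes that a per‐context mutation rate is estimated genome‐wide and that the expected count at one motif position is the sum of those rates over all sites; the function names and toy numbers are hypothetical, and the procedure actually used is the one described in Materials and Methods, not this code.

```python
from collections import Counter


def trinucleotide_rates(mutation_contexts, context_counts):
    """Mutation rate per trinucleotide context.

    mutation_contexts: one trinucleotide string per observed mutation.
    context_counts: how often each trinucleotide occurs in the analyzed genome.
    """
    mutated = Counter(mutation_contexts)
    return {ctx: mutated[ctx] / n for ctx, n in context_counts.items() if n > 0}


def position_enrichment(site_contexts, observed, rates):
    """Observed / expected mutations at one motif position across many sites.

    site_contexts: trinucleotide centered on that position, one per binding site.
    observed: number of mutations actually seen at that position.
    """
    expected = sum(rates.get(ctx, 0.0) for ctx in site_contexts)
    return observed / expected if expected > 0 else float("nan")


# Toy illustration (invented numbers, not data from the study).
rates = trinucleotide_rates(["TCC", "TCC", "ACA"], {"TCC": 100, "ACA": 200})
enrichment = position_enrichment(["TCC"] * 50, observed=10, rates=rates)
```

An enrichment near 1 would mean that the sequence context alone accounts for the mutation count at that position, whereas values well above 1 point to an additional, site‐specific process.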

Figure 1. Somatic mutations in melanoma are associated with a specific hot spot in the CTCF‐binding site (CBS), while repair activity is broadly inhibited throughout the CBS.


  • A
    Somatic mutation hot spots in melanoma are associated with active CBS (i.e., CBS in a DNase I hypersensitivity site [DHS] in melanocytes). In parallel, CPD repair activity, as quantified by the XR‐seq method (Adar et al, 2016) in human fibroblasts 1 h after UV irradiation, is diminished at active CBS.
  • B
    Close‐up of somatic mutation density and repair activity at active CBS. Mutation hot spot is located at the CC dinucleotide at positions −4/−3 in the CBS. Top of panel is the sequence logo of active CBS, derived from the WebLogo software (Crooks et al, 2004).
  • C, D
    Same as panel (A) and (B), respectively, except for inactive CBS (i.e., CBS not in a DHS in melanocytes).
  • E, F
    Same as panel (A) and (B), except mutation density is normalized by the expected mutation density based on the trinucleotide DNA sequence context and the 1 h XR‐seq CPD repair data are normalized using initial CPD levels (0 h) as determined by the HS‐Damage‐seq method (Hu et al, 2017a).
  • G
    Same as panel (B), except XR‐seq CPD repair data are shown for all repair time points (1–48 h). Repair data from subsequent time points were standardized so that the total number of XR‐seq reads was equal to the 1‐h time point.
  • H
    Graph of XR‐seq read density at different repair time points at the indicated locations in the CTCF motif (i.e., −4/−3 [mutation hot spot], 0/+1, +5/+6, all [entire] binding site, and flanking regions).
  • I
    Graph of mutation density in melanoma tumors at the same locations in the CTCF motif as in panel (H).
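The read‐depth standardization mentioned for panel (G) amounts to rescaling each later time point so that its total read count matches the 1‐h sample. A minimal sketch follows, assuming simple linear scaling by total reads; the function name and the numbers are hypothetical.

```python
def standardize_reads(position_counts, sample_total, reference_total):
    """Rescale per-position read counts to the sequencing depth of a reference sample.

    position_counts: read counts at each motif position for one time point.
    sample_total: total reads in that time point's library.
    reference_total: total reads in the reference (e.g., 1-h) library.
    """
    if sample_total <= 0:
        raise ValueError("sample_total must be positive")
    scale = reference_total / sample_total
    return [count * scale for count in position_counts]


# Toy illustration: a library sequenced twice as deeply as the reference
# has its counts halved before the time points are compared.
scaled = standardize_reads([2, 4, 10], sample_total=200, reference_total=100)
```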

It has previously been reported that uneven inhibition of NER activity is responsible for the mutation hot spot in CBS in skin cancers (Poulos et al, 2016), based in part on bioinformatics analysis of human XR‐seq data (Hu et al, 2015; Adar et al, 2016). We reanalyzed the same CPD XR‐seq data (Adar et al, 2016), but unlike the previous study (Poulos et al, 2016), we assigned each XR‐seq read count to the likely lesion‐containing dinucleotide within the XR‐seq read (Appendix Fig S1A and B), as previously described (Mao et al, 2020). This method significantly increases the resolution of the XR‐seq data and removes the intrinsic asymmetry of the NER excision reaction from the XR‐seq data, which can otherwise cause oddities in which large numbers of XR‐seq reads are associated with non‐lesion‐forming purine bases. Our analysis indicated that NER activity is very low at active CBS (Fig 1A and B), but is elevated in the DHS region immediately flanking the CBS, consistent with previous reports (Polak et al, 2014; Poulos et al, 2016; Sabarinathan et al, 2016). Notably, repair activity and mutation density oscillate in opposite phases flanking active CBS (Fig 1A and E), likely due to arrays of positioned nucleosomes adjacent to CBS (Appendix Fig S2A and B).

NER activity, as measured by XR‐seq read density, was generally low within the CBS motif, both at 1 h (Fig 1B) and at later repair time points (Fig 1G). However, repair activity was not specifically decreased at positions −4/−3 relative to other locations in the CTCF motif (Fig 1B, G and H). Moreover, NER activity was also similarly low at inactive CBS (Fig 1C and D), yet mutation density is only slightly elevated at inactive CBS. Since the NER activity measured by XR‐seq is affected by initial UV damage levels, we used our published method (Mao et al, 2020) to normalize the XR‐seq read density using initial CPD levels measured by Damage‐seq (0 h) in the same cells (normal human fibroblasts [NHF1]) as the XR‐seq experiments (Hu et al, 2017a). Normalized repair activity was generally low within the CBS relative to flanking DNA and was negatively correlated with the normalized mutation density (Fig 1E and F), as expected. However, at position −4/−3 normalized repair activity was high relative to other positions in the CBS motif (Fig 1F). This analysis reveals that repair inhibition is less pronounced at the mutation hot spot than elsewhere in the CBS motif, suggesting that repair inhibition alone cannot explain the presence of the mutation hot spot at position −4/−3 in active CBS (Fig 1I).

UV‐induced CPD formation is elevated at 5’CC hot spot in CBS in human cells

We wondered whether CTCF binding to DNA might modulate UV damage formation, since this could provide an alternative explanation for the observed CBS mutation hot spots in melanoma. Previous analysis of UV damage formation in human cells using the Damage‐seq method did not observe elevated CPD formation within CBS (Hu et al, 2017a). However, Damage‐seq detected very low numbers of CPD lesions at CC dinucleotides (Hu et al, 2017a), which comprise the majority of lesion‐forming sites in the CTCF‐binding motif. To re‐examine the effects of CTCF binding on UV damage formation, we analyzed CPD maps from UV‐irradiated human fibroblasts using our CPD‐seq method (Mao et al, 2018). Notably, analysis of CPD‐seq data from two different studies (Elliott et al, 2018; Mao et al, 2018) revealed that UV damage formation is consistently altered at active CBS (Fig 2A and Appendix Fig S3A and B). In order to account for the influence of sequence biases in the CBS motif, we normalized the cellular CPD‐seq data with parallel CPD‐seq data sets derived from UV irradiation of human genomic DNA in vitro (i.e., “naked” DNA). This analysis revealed that normalized CPD levels are significantly modulated in active CBS, even after accounting for DNA sequence biases at these sites (Fig 2B). In contrast, there is relatively little effect on CPD enrichment at inactive CBS (Fig 2C), indicating that differences in CPD levels at CBS are a consequence of CTCF binding.
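The naked DNA normalization can be sketched as a depth‐corrected ratio of cellular to naked DNA CPD counts at each motif position. The code below is only an illustration under that assumption; the function name, the position labels, and the counts are hypothetical rather than taken from the CPD‐seq pipeline.

```python
def cpd_enrichment(cell_counts, naked_counts, cell_total, naked_total):
    """Depth-normalized ratio of cellular to naked-DNA CPD counts per position.

    cell_counts / naked_counts: dicts mapping a motif position label to CPD reads.
    cell_total / naked_total: total CPD reads in each library.
    Positions with no naked-DNA reads are reported as None (ratio undefined).
    """
    enrichment = {}
    for position, cell in cell_counts.items():
        naked = naked_counts.get(position, 0)
        if naked == 0:
            enrichment[position] = None
            continue
        enrichment[position] = (cell / cell_total) / (naked / naked_total)
    return enrichment


# Toy illustration (invented counts).
ratios = cpd_enrichment(
    {"-4/-3": 40, "-5/-4": 7},
    {"-4/-3": 10, "-5/-4": 10},
    cell_total=1000,
    naked_total=500,
)
```

Because both libraries sample the same sequences, a ratio above 1 indicates that CPD formation in cells exceeds what the DNA sequence alone would predict, and a ratio below 1 indicates suppression.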

Figure 2. UV‐induced CPD formation in human cells is specifically elevated at the mutation hot spot in active CBS.


  • A
    UV‐induced CPD formation in human cells is modulated at active CBS (i.e., CBS in a DNase I hypersensitivity site [DHS] in melanocytes). CPD formation was quantified using published CPD‐seq data (Elliott et al, 2018; Mao et al, 2018) from UV‐irradiated human cells.
  • B
    Same as panel (A), except CPD enrichment is plotted after normalizing for the effects of DNA sequence context using UV‐induced CPD formation in naked DNA as a control.
  • C
    Same as panel (B), except for inactive CBS (i.e., CBS not in a DNase I hypersensitivity site [DHS] in melanocytes). At these inactive CBS, there is relatively little modulation of CPD formation.
  • D
    Close‐up of somatic mutation density and CPD formation (see panel A) at active CBS. Mutation hot spot is located at the CC dinucleotide at positions −4/−3 in the CBS. Top of panel is the sequence logo of active CBS, from the WebLogo software (Crooks et al, 2004).
  • E
    Same as panel (D), except mutation density is normalized by the expected mutation density based on the trinucleotide DNA sequence context and the cellular CPD‐seq data are normalized using the naked DNA control (see panel B).
  • F
    Same as panel (E), except for inactive CBS (i.e., CBS not in a DHS in melanocytes).

Close inspection revealed that UV‐induced CPD formation is significantly modulated throughout the CBS motif (Fig 2D). After normalizing for the effects of DNA sequence using the naked DNA control, this analysis revealed that CPD enrichment in active CBS has its peak at position −4/−3 (Fig 2E), which coincides with the location of the mutation hot spot. At this location, CPD enrichment is ~2‐fold higher in UV‐irradiated cells than the naked DNA control. CPD levels are also enriched at positions −6/−5 (~1.5‐fold), 0/+1 (~1.7‐fold), and +2/+3 (~1.4‐fold). However, CPD formation is depleted at other locations in active CBS (Fig 2E), including motif positions −5/−4 (~0.7‐fold), −3/−2 (~0.6‐fold), and −2/−1 (~0.6‐fold). This pattern of CPD enrichment and depletion is fairly consistent in independent CPD‐seq experiments (Appendix Fig S3C and D), even though differing cell types and UV doses were used. In contrast, CPD enrichment is hardly affected at inactive CBS (Fig 2F), confirming that active CTCF binding is required for UV damage modulation. Taken together, these findings indicate that in human cells, CTCF binding significantly modulates UV‐induced CPD formation throughout the CBS motif and induces CPD formation at the −4/−3 mutation hot spot.

Most CBS do not have a dipyrimidine sequence at position −6/−5 in the motif, even though this is a site of CPD enrichment, so we analyzed CPD formation at the subset of CBS that have a dipyrimidine at this location. At this subset of CBS, CPD formation was again elevated at position −6/−5, which correlated with a ~2‐ to 4‐fold mutation enrichment in melanoma (Fig EV1A and B). In performing this analysis, we noticed that CPD formation at the −4/−3 hot spot is more significantly elevated at this subset of CBS where position −6/−5 is a dipyrimidine. CPD density and enrichment are increased 2.6‐fold and 1.9‐fold, respectively, at this CBS subset (compare Fig 2D and E with Fig EV1A and B). Importantly, mutation density showed a similar increase at the CBS subset, with ~3.6‐fold higher frequency of mutations at both positions −4 and −3 (Fig EV1A), even though repair activity is increased 5.9‐fold and 3‐fold at positions −4 and −3, respectively, in this CBS subset (Fig EV1C and D). These findings indicate that the level of CPD induction is strongly correlated with mutation frequency at the position −4/−3 hot spot in melanoma.

Figure EV1. UV damage formation and mutagenesis are elevated at a subset of CBS that have a dipyrimidine at position −6/−5 in the CTCF‐binding motif.


  • A
    CPD density in UV‐irradiated human skin cells at a subset of CBS containing a dipyrimidine sequence at position −6/−5 (see sequence logo at top of panel, which was generated using the WebLogo software (Crooks et al, 2004)). Elevated CPD density at position −4/−3 at this CBS subset correlates with a higher mutation density at these positions in melanoma.
  • B
    Same as panel (A), except showing CPD and mutation enrichment at this CBS subset.
  • C, D
    Repair activity, as measured by XR‐seq read density (Adar et al, 2016), is also elevated at position −4/−3 in this CBS subset.

CTCF binding modulates UV‐induced CPD formation in vitro

To determine whether CTCF binding to DNA directly modulates UV damage formation, we measured CPD formation in the presence or absence of bound CTCF at selected CBS in vitro. We analyzed double‐stranded DNA oligonucleotides encompassing CBS located in the promoter region of the Kinase Suppressor of Ras 2 (KSR2) gene (Fig 3A) and a putative enhancer region of the VAX1 gene (Fig EV2A). These particular CBS were chosen because they are among the most highly mutated CBS in sequenced melanoma tumors (Appendix Table S1). Electrophoretic mobility shift assays revealed that purified recombinant CTCF protein tightly bound each set of radiolabeled oligonucleotides (Figs 3B and EV2B). At very high CTCF concentrations, a second supershifted band was detected, potentially representing either CTCF dimerization (Pant et al, 2004; Yusufzai et al, 2004) or non‐specific binding to the DNA substrate.

Figure 3. CTCF binding to DNA modulates UV‐induced CPD formation in vitro.


  • A
    Sequence of the DNA oligonucleotide containing a recurrently mutated CBS upstream of the KSR2 gene. The CTCF‐binding motif is shaded in gray with bold text, and the positions of CPD hot spots in the CBS are indicated (e.g., −4/−3, etc.).
  • B
    Electrophoretic mobility shift assay (EMSA) demonstrating binding of recombinant murine CTCF protein (Lane 1–7: 0, 1.4 pmol, 2.8 pmol, 5.6 pmol, 11.2 pmol, 22.1 pmol, 80 pmol) to double‐stranded DNA oligonucleotides containing the KSR2 CBS (4 pmol).
  • C
    Sequencing gel analysis of CPD lesions induced in UV‐irradiated KSR2 DNA oligonucleotides (~1,800 J/m2 of UVC light). The resulting CPD lesions were cleaved with T4 endonuclease V. In vitro UV irradiation was performed in the absence or presence of increasing concentrations of bound murine CTCF protein. Numbers at the bottom of the gel match the lanes shown in panel (B). A representative gel out of four independent replicates is shown here. The first and last lanes correspond to custom single‐stranded KSR2 oligonucleotides of different sizes and single‐stranded DNA ladder, respectively. Numbers indicate oligonucleotide marker length in nucleotides.
  • D
    Quantification of in vitro CPD enrichment following CTCF binding relative to unbound control for different positions in the KSR2 CBS. For comparison, CPD enrichment in cellular DNA relative to naked DNA for active CBS is shown for the genome‐wide CPD‐seq data.

Figure EV2. CTCF binding to VAX1‐binding site modulates CPD formation in vitro.


  • A
    Sequence of a double‐stranded DNA oligonucleotide containing a recurrently mutated CBS neighboring the VAX1 gene. The CTCF‐binding motif is in gray, and positions of detected CPD hot spots in the CBS (e.g., −4/−3, etc.) are indicated.
  • B
    EMSA showing binding of recombinant murine CTCF protein (Lanes 2 through 7: 1.1 pmol, 2.2 pmol, 4.5 pmol, 9 pmol, 18 pmol, 36.1 pmol) to double‐stranded DNA oligonucleotides containing the VAX1 CBS (4 pmol).
  • C
    Sequencing gel analysis of CPD lesions induced in UV‐irradiated VAX1 CBS DNA oligonucleotides (~1,800 J/m2 of UVC light) in the presence or absence of bound CTCF. CPD lesions were cleaved with T4 endonuclease V. Numbers at the bottom of the gel match the lanes shown in panel (B). A representative gel out of three independent replicates is shown. The first and last lanes correspond to custom single‐stranded VAX1 CBS oligonucleotides of different sizes (numbers indicate length in nucleotides (nt)) and ss10 (Simplex Sciences) single‐stranded DNA ladder (10 nt, 20 nt, and 30 nt markers are shown), respectively.
  • D
    Quantification of CPD enrichment following CTCF binding in vitro relative to the unbound DNA control for different positions in the VAX1 CBS. CPD enrichment from CPD‐seq data for cellular DNA relative to naked DNA for active CBS is shown for comparison.

We characterized the impact of CTCF binding on CPD formation by UV irradiating double‐stranded DNA oligonucleotides containing the KSR2 or VAX1 CBS in vitro in the presence or absence of bound CTCF. CPD formation following UV exposure (1.8 kJ/m2 dose) was monitored by digesting the radiolabeled DNA substrates with T4 endonuclease V (T4 endoV), which specifically cleaves at CPDs to generate nicks, and analyzing the resulting digestion products on denaturing sequencing gels. In the absence of bound CTCF, UV exposure induced CPD lesions throughout the cytosine‐rich strand of the CBS, with the highest damage levels observed at the TC dinucleotide at position −5/−4 (Fig 3C). Binding by CTCF strongly suppressed UV‐induced CPD formation at position −5/−4 and at other positions in the CBS (e.g., positions −3/−2, −2/−1, and −1/0). In contrast, CTCF binding stimulated CPD formation at the −4/−3 mutation hot spot, as well as at other locations in the CBS (e.g., positions −6/−5 and 0/+1). Similar modulation of CPD formation was observed upon CTCF binding to the VAX1 enhancer CBS (Fig EV2C). Notably, the modulation of CPD formation by CTCF binding in vitro closely matches the modulation of CPD formation in CBS observed in the CPD‐seq data derived from UV‐irradiated human cells (Figs 3D and EV2D). These findings indicate that CTCF binding to DNA directly modulates CPD formation in vitro and is likely responsible for the observed CPD hot spots at CBS in human cells.

CTCF binding induces DNA conformational changes that modulate susceptibility to UV damage

To understand how CTCF binding modulates CPD formation, we analyzed published structures of CTCF bound to DNA (Hashimoto et al, 2017; Yin et al, 2017). Biophysical studies have suggested that two geometric parameters play a critical role in regulating the rate of CPD formation: (i) the distance (d) between the C5‐C6 double bonds of adjacent pyrimidines, and (ii) the relative torsion angle (η) between these bonds (Fig 4A). DNA conformations that minimize the distance (d < 4 Å) and torsion angle (η ~ 20°–30°) often enhance susceptibility to CPD formation (Johnson & Wiest, 2007; Law et al, 2008; Yuan et al, 2011; Schreier et al, 2015). Analysis of DNA structures bound by CTCF indicates that the CPD hot spot at position −4/−3 has an average distance of 4.04 ± 0.05 Å and torsion angle of 17.9 ± 2.2° (Fig 4B and C), both of which are favorable for CPD formation. Positions −1/0 and 0/+1 in the CBS also have a somewhat favorable distance (average distance of 4.14 ± 0.11 Å and 4.11 ± 0.05 Å, respectively), whereas other positions in the CBS have distance or torsion angle parameters that are generally unfavorable for CPD formation (Fig 4B and C). These findings suggest that altered DNA conformations upon CTCF binding could potentially explain the modulation of UV susceptibility at CBS.
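The two geometric parameters can be computed directly from atomic coordinates. The following minimal sketch takes d as the distance between the midpoints of the two C5–C6 bonds and η as the dihedral angle C5–C6–C6'–C5' across adjacent pyrimidines; this atom ordering, the function names, and the toy coordinates are assumptions made for illustration and may differ from the exact definitions used in the structural analysis.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_midpoint_distance(c5_a, c6_a, c5_b, c6_b):
    """Distance d between the midpoints of the C5-C6 bonds of two pyrimidines."""
    mid_a = tuple((p + q) / 2 for p, q in zip(c5_a, c6_a))
    mid_b = tuple((p + q) / 2 for p, q in zip(c5_b, c6_b))
    return math.dist(mid_a, mid_b)


def torsion_angle(c5_a, c6_a, c6_b, c5_b):
    """Signed dihedral (degrees) for C5(a)-C6(a)-C6(b)-C5(b); assumed atom order."""
    b0 = _sub(c5_a, c6_a)
    b1 = _sub(c6_b, c6_a)
    b2 = _sub(c5_b, c6_b)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy coordinates in angstroms: two parallel bonds stacked 4 A apart.
d = bond_midpoint_distance((1, 0, 0), (0, 0, 0), (1, 0, 4), (0, 0, 4))
eta = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 4), (1, 0, 4))
```

Applied to each dipyrimidine step of a structure, such a calculation yields per‐position values that can be averaged across structures, as in the comparison of motif positions above.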

Figure 4. CTCF binding alters the DNA structure to modulate susceptibility to UV‐induced CPD formation.


  • A
    Chemical structure of dipyrimidine base step showing key structural parameters important for CPD formation, namely the distance (d) and relative torsion angle (η) between the C5–C6 double bonds of neighboring pyrimidine bases.
  • B, C
    Average distance (B) and torsion angle (C) between neighboring pyrimidine bases derived from X‐ray crystallography structures of DNA sequences bound by CTCF (Hashimoto et al, 2017; Yin et al, 2017). Arrow indicates the location of mutation and CPD hot spot at positions −4/−3 relative to the CBS midpoint. Note that there is no distance and torsion angle data for positions −5 and −6 since these are not a dipyrimidine sequence in the X‐ray structures. Error bars indicate the standard error of the mean (SEM) derived from n = 4 X‐ray structures (5KKQ, 5T00, 5T0U, and 5YEF).
  • D
    Molecular dynamic analysis of a simulated CTCF target in its free (gray) and protein‐bound (red) states. A duplex encoding the KSR2 CBS (i.e., d(TGGTTCCCCCTGGTGGA)) was simulated in explicit TIP3P water and 0.15 M NaCl for 200 ns at 298 K. The CTCF‐bound complex was templated from the co‐crystal structure (5KKQ) and simulated under the same conditions for 200 ns. The trajectories of central pyrimidine–pyrimidine (YY) dinucleotide steps at the −6 to +1 positions during the final 50 ns (50,000 frames) were binned for frequency analysis. Each increment in the ordinate represents 4,000 counts. The distance separation (d) between the midpoints of the C5–C6 bonds between adjacent pyrimidines and improper dihedrals (η) formed by the C5–C6 bonds of adjacent pyrimidines is plotted. Green dashes denote values of fiber B‐form DNA.
  • E
    Representative snapshot of the CPD‐susceptible base steps (C‐4‐C‐3) in the complex at instants when the C5‐C6 separation distances (thickened lines) are at their modal values (4.0 Å for C‐4‐C‐3). For clarity, only DNA atoms of the two base steps are shown, in green. Magenta dashes correspond to H‐bonds with CTCF, and gray dashes correspond to H‐bonds between DNA. Arrow shows an inter‐base‐step H‐bond.
  • F
    Average values of the major groove width, helical twist, tilt, and roll angles of the free (gray) and CTCF‐bound DNA (red), ± standard deviation over the final 50 ns of the trajectory. Green dashed lines denote canonical values for B‐DNA. Blue dashed lines mark the C‐4‐C‐3 and C‐1‐C‐0 base steps. Note that computation of groove widths requires ±2 positions. Cartoons of twist, tilt, and roll are taken from (Lu & Olson, 2003).

To understand the molecular basis of CTCF‐modulated CPD formation at higher resolution, we carried out molecular dynamic (MD) simulations of the CTCF‐binding site in the KSR2 promoter and its complex with CTCF. We used the published structure of a partial CTCF‐DNA‐binding domain (DBD), consisting of the central F3 to F7 zinc fingers, of human CTCF (PDB: 5KKQ, residues 321–465) as a template (Hashimoto et al, 2017). In the co‐crystal structure, the protein courses along the bound DNA, apposing evenly spaced α‐helical segments into the DNA major groove. The helices are punctuated by extended β‐hairpins that, together, comprise the five tandem C2H2 zinc finger motifs. In preparation for MD simulation, we carefully parameterized the protein to preserve its coordination to Zn2+ ions, which were parsimoniously modeled as unbonded receptors. Throughout the 200 ns of unrestrained simulation, all five zinc fingers maintained their occupancies despite significant conformational dynamics in the complex (Appendix Fig S4A–D and Movie EV1). Together with the use of a state‐of‐the‐art DNA forcefield, parmbsc1 (Galindo‐Murillo et al, 2016; Ivani et al, 2016), conservation of the zinc fingers lent credence to the structures and dynamics resulting from analysis of the MD trajectories.

Focusing on the dipyrimidine tract in the KSR2 promoter, we observed perturbations in both separation distance (d) and dihedral angle (η) for the C5‐C6 double bonds of adjacent pyrimidines in the CTCF‐bound DNA relative to free DNA. Of the seven dipyrimidine steps in the KSR2 CBS, the C‐4C‐3 base step (position −4/−3) showed the greatest shift to a shorter distance and smaller dihedral angle in the CTCF/DNA complex (Fig 4D), consistent with our analysis of static 3D structures (Fig 4B and C). These findings confirm that CTCF binding induces a DNA conformation at the −4/−3 base step that promotes CPD formation. The positions and rank order of the CTCF‐induced structural perturbations elsewhere in the KSR2 CBS generally agreed with the observed effect of CTCF binding on CPD formation in vitro and in cells. For example, CPD formation is significantly diminished at the T‐5C‐4 base step upon CTCF binding in vitro (Fig 3C), which can be explained by CTCF binding causing an increase in the distance between the C5–C6 double bonds (Fig 4D). A similar mechanism likely explains why CPD formation is suppressed by CTCF binding at the C‐3C‐2 and C‐2C‐1 base steps (Fig 4D). Favorable distance parameters were also detected at the C‐1C0 and C0T1 base steps (Fig 4D), although CPD formation is only induced at the C0T1 base step (position 0/+1). This discrepancy could reflect limitations of the simulated CTCF‐DNA complex, which contains an oligomeric DNA bound to a truncated DBD of CTCF, or the known propensity of CPD lesions to form on the 3' end of a pyrimidine repeat relative to internal locations (Premi et al, 2019). In longer simulations (to 1 µs), the first and last zinc fingers of the CTCF construct showed reduced occupancy at the fraying ends of the oligomeric DNA (Movie EV2). However, binding by the remaining CTCF zinc fingers to the core CBS still induced similar structural distortions in the DNA (with the exception of the C0T1 base step), resulting in smaller distance and torsion angles at position −4/−3 (Appendix Fig S5A–E) that would favor CPD formation.
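As an illustration of the two geometric parameters discussed above, the following minimal sketch computes a separation distance and a torsion angle for the C5–C6 double bonds of two adjacent pyrimidines from atomic coordinates. It assumes that d is measured between the midpoints of the two bonds and that η is the C5–C6–C6'–C5' dihedral; these definitions, the function names, and the toy coordinates are assumptions for illustration and may differ from the exact procedure used to analyze the MD trajectories.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_distance(c5_a, c6_a, c5_b, c6_b):
    """Distance (same units as input) between the midpoints of two C5-C6 bonds.

    Assumption: d is defined midpoint to midpoint.
    """
    mid_a = tuple((x + y) / 2 for x, y in zip(c5_a, c6_a))
    mid_b = tuple((x + y) / 2 for x, y in zip(c5_b, c6_b))
    return _norm(_sub(mid_b, mid_a))

def bond_torsion_deg(c5_a, c6_a, c5_b, c6_b):
    """Absolute dihedral angle C5(a)-C6(a)-C6(b)-C5(b) in degrees.

    Assumption: eta is this four-atom torsion; 0 degrees means the two
    double bonds are parallel and eclipsed.
    """
    b1 = _sub(c6_a, c5_a)
    b2 = _sub(c6_b, c6_a)
    b3 = _sub(c5_b, c6_b)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    b2_unit = tuple(x / _norm(b2) for x in b2)
    m1 = _cross(n1, b2_unit)
    return abs(math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2))))

# Toy coordinates (angstroms, hypothetical): two bonds stacked 3.4 apart.
lower = ((0.0, 0.0, 0.0), (1.34, 0.0, 0.0))   # (C5, C6) of the 5' pyrimidine
upper = ((0.0, 0.0, 3.4), (1.34, 0.0, 3.4))   # (C5, C6) of the 3' pyrimidine
d = bond_distance(lower[0], lower[1], upper[0], upper[1])
eta = bond_torsion_deg(lower[0], lower[1], upper[0], upper[1])
```

In practice the coordinates would be read from each trajectory frame, and the per-frame values of d and η would be compared between the free and CTCF‐bound DNA.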

To unveil the structural and dynamic basis of the CTCF‐induced DNA distortions that promote CPD formation at the −4/−3 position, we scrutinized the CTCF/DNA interactions at the C‐4C‐3 base step (Fig 4E). Direct readout of the major groove floor was at the complementary dG nucleobases with Lys and Arg sidechains, as the protein did not directly contact the pyrimidine bases in the cytosine‐rich strand. Indirect readout of the phosphate backbone occurred with sidechains of Tyr and basic residues. Moreover, the nucleobases at the C‐4C‐3 base step were brought sufficiently close to form one or more inter‐base‐step hydrogen bonds, which may provide additional stabilization of a CPD‐susceptible conformation. Analysis of several key parameters of DNA structure (Fig 4F) indicates that there is local unwinding of the double helix (decreased helical twist) at the C‐4C‐3 base step to bring the pyrimidine C5–C6 double bonds into angular alignment. Since helical unwinding also increases the helical pitch (i.e., rise), additional distortions occurred to bring the pyrimidines more closely together. Roll and tilt angles of base steps are well‐established markers of DNA curvature and other base‐stack distortions in the context of an intact duplex (Dickerson & Chiu, 1997; Dickerson, 1998). For the C‐4C‐3 base step, this occurred by a change in tilt, closing the gap directly on the pyrimidine side of the stack. This analysis suggests a molecular mechanism for how CTCF binding modulates the susceptibility of DNA to UV‐induced DNA lesions.

CTCF binds to CPD‐containing binding sites and inhibits repair by T4 endonuclease V in vitro

CTCF must remain bound to DNA in the presence of CPD lesions in order to inhibit repair. To directly test this hypothesis, we measured the relative affinity of CTCF to the KSR2 promoter region containing a CPD lesion at position −6/−5 in the CBS, which corresponds to one of the locations of a CPD hot spot. This site was chosen because it contains a T‐T thymine dimer, as phosphoramidites are not available for cytosine‐containing CPDs, due to their chemical instability. Oligonucleotides featuring a cis‐syn thymine dimer at the indicated position in the KSR2 CBS (Fig EV3A) were synthesized using solid‐phase DNA synthesis protocols. Incubation with T4 endoV resulted in near‐complete digestion of the resulting oligonucleotides (e.g., see Fig 5C), confirming site‐specific incorporation of the CPD lesion. Purified CTCF protein readily bound the KSR2 CBS containing a CPD lesion as assessed by EMSA (e.g., Fig EV3B). To more directly compare the impact of a CPD lesion on CTCF binding, we performed competition assays in which unlabeled oligonucleotides containing the KSR2 CBS with or without a CPD lesion were used to compete for CTCF binding to a radiolabeled KSR2 CBS substrate (Fig 5A and B). While both undamaged and CPD‐containing competitor DNA could successfully compete for CTCF binding (Fig 5A and B), quantification of replicate experiments revealed that the affinity of the CPD‐containing competitor was ˜2‐fold lower than the undamaged competitor DNA (Appendix Fig S6).

Figure EV3. CTCF can bind the KSR2 CBS containing a CPD lesion in vitro.

  • A
    Location of site‐specific CPD lesion in CTCF‐binding site (CBS) associated with the KSR2 gene.
  • B
    EMSA demonstrating binding of recombinant murine CTCF (0.5, 1, 2, 3, 4, 6.8, 8.8, 18, and 36.1 pmol) protein to CPD‐containing DNA oligonucleotides containing the KSR2 CBS (4 pmol) (see panel A). Asterisk indicates band likely representing an alternative conformation of CPD‐containing KSR2 oligonucleotides.

Figure 5. CTCF can bind CPD‐containing DNA‐binding sites in vitro and inhibit repair by T4 endonuclease V.

  • A, B
    Competition electrophoretic mobility shift assays (EMSA) measuring relative binding affinity of CTCF protein to undamaged and CPD‐containing binding sites. CTCF (12 pmol) binding to 32P‐labeled undamaged KSR2 CBS (4 pmol) was measured by EMSA in the presence of increasing concentrations of unlabeled competitor DNA (Lanes 3 through 10 – 0.2 to 32 pmol) comprised of (A) undamaged KSR2 CBS or (B) KSR2 CBS containing a site‐specific CPD. Representative gels out of three independent replicates are shown in the figure.
  • C
    CTCF binding inhibits cleavage by T4 endonuclease V (T4 endoV) in vitro. KSR2 CBS containing a site‐specific CPD (8 pmol) was cleaved with T4 endoV (5.2 units) for the indicated times in the presence or absence of bound CTCF (18 pmol).
  • D
    Quantification of T4 endoV cleavage of a site‐specific CPD in the KSR2 CBS in the presence or absence of CTCF. Error bars indicate the standard error of the mean (SEM). *P < 0.01 from two‐sided t‐test for n = 3 biological replicates.

Since CTCF can bind to a CPD‐containing CBS, albeit with marginally lower affinity, we tested whether CTCF binding inhibited repair of the CPD lesion in vitro. We measured repair of the CPD lesion by T4 endonuclease V (T4 endoV) glycosylase, since this involves a single repair factor (unlike the > 20 proteins involved in NER) that has been previously used to study repair of CPD lesions in vitro (Schieferstein & Thoma, 1998). While T4 endoV rapidly cleaved the CPD lesion in the KSR2 CBS in the absence of CTCF, bound CTCF reduced CPD cleavage approximately 4‐fold (Fig 5C and D). These in vitro data suggest that CTCF can remain bound to CPD‐containing binding sites and inhibit their accessibility to a repair enzyme.

A previous bioinformatic study suggested that cohesin recruitment to CTCF‐bound sites is important for repair inhibition and mutation hot spots in melanoma (Poulos et al, 2016). Since our in vitro data suggest that CTCF binding is able to inhibit repair even in the absence of cohesin (Fig 5C and D), we re‐examined this finding. We used ENCODE data mapping the binding sites of the RAD21 subunit of cohesin (Consortium EP, 2012) to stratify CBS into those associated with cohesin binding and those not associated (see Methods). Analysis of these CBS subsets revealed that CBS bound by cohesin showed repair inhibition and mutation enrichment (Figs 6A and EV4A), whereas there was little to no mutation enrichment at CBS not bound by cohesin (Figs 6B and EV4B), consistent with the previous report (Poulos et al, 2016). However, this analysis also revealed that cohesin binding was highly correlated with active CBS (i.e., CBS in a DHS; see Fig 6A and B). Indeed, 91% of active CBS are also bound by cohesin. To test whether cohesin binding impacted repair and mutation enrichment independent of whether a CBS was active (i.e., located in a DHS), we analyzed the small subset of active CBS that did not have an associated cohesin‐binding site (1,363 CBS sites in total). Active CBS lacking a cohesin‐binding site showed a roughly similar level of repair inhibition and elevated mutation density (Fig 6C and D), although the magnitude of mutation enrichment (Fig EV4C) was somewhat lower than cohesin‐bound CBS or active CBS generally. There was also a similar modulation of CPD formation at these sites (Fig EV4D and E). These findings are consistent with our in vitro data and suggest that CTCF binding alone can inhibit repair and promote mutation hot spots in melanoma.

Figure 6. Cohesin binding is not required for repair inhibition or mutation enrichment at CTCF‐binding sites.

  • A
    Somatic mutation hot spots in melanoma are associated with cohesin‐bound CBS (i.e., binding site of RAD21 cohesin subunit within 50 bp of CBS). CPD repair activity, measured using the XR‐seq method (Adar et al, 2016) in human fibroblasts 1 h after UV irradiation, is diminished at cohesin‐bound CBS. Percentage of active CBS (i.e., associated with a melanocyte DHS) is indicated.
  • B
    Same as part (A), except for non‐cohesin‐bound CBS.
  • C, D
    Same as panel (B), except for non‐cohesin‐bound CBS that are active (i.e., associated with a melanocyte DHS).

Figure EV4. Somatic mutation density in melanoma is enriched at active CBS lacking cohesin‐binding sites.

  • A
    Mutation density in melanoma normalized by the expected mutation density due to the trinucleotide sequence context is enriched at CBS associated with cohesin binding. Only CBS sites with a RAD21 (cohesin subunit)‐binding site within 50 bp were included in this analysis. The percentage of active CBS (i.e., located in a melanocyte DHS) in this subset is indicated. RAD21‐binding data are from ENCODE.
  • B
    Same as panel (A), except for CBS not associated with cohesin binding.
  • C
    Same as panel (A), except for active CBS (i.e., located in a melanocyte DHS) not associated with a RAD21‐binding site.
  • D
    Same as panel (C), except CPD density as measured by CPD‐seq is depicted. CPD formation was quantified using published CPD‐seq data (Elliott et al, 2018; Mao et al, 2018) from UV‐irradiated human cells.
  • E
    Close‐up of CPD enrichment at active CBS sites lacking an associated RAD21 binding site. Cellular CPD‐seq data were normalized using the naked DNA CPD‐seq controls.

UV‐induced 6‐4PP formation is not elevated at 5’CC hot spot in CBS

Our analysis suggests that elevated CPD formation at specific sites in CBS promotes mutation hot spots in melanoma. However, a previous report argued that differences in UV damage formation do not play a role in CBS mutation hot spots, since these hot spots disappear in repair‐deficient (XPC−/−) cutaneous squamous cell carcinomas (cSCC) (Poulos et al, 2016). We used published XR‐seq data from UV‐irradiated XPC−/− human skin fibroblasts (Hu et al, 2015) to confirm that overall NER activity was significantly decreased around active CBS (Appendix Fig S7A) compared with WT cells (Fig 1A), even though roughly similar numbers of XR‐seq reads were sequenced in each data set. Residual repair activity in the XPC−/− cells likely represents TC‐NER associated with CBS located in transcribed introns. Notably, TC‐NER activity is still inhibited at active CBS (Appendix Fig S7A and B), consistent with a previous report that transcription factors can inhibit TC‐NER (Sabarinathan et al, 2016). We confirmed that there is no enrichment of single nucleotide variants (SNVs) at CBS in XPC−/− cSCC (Appendix Fig S8A and B), consistent with the previous report (Poulos et al, 2016). UV light can also induce tandem mutations at CC dinucleotides, and our analysis indicates that CC>TT tandem mutations are elevated at the 5’CC hot spot in active CBS in cutaneous melanomas (Appendix Fig S9A–D). However, analysis of tandem mutations (see Methods) indicated that there was no enrichment of tandem mutations at active CBS in the XPC−/− cSCCs (Appendix Fig S10A–D). Comparison of the number of mutations per tumor revealed that tandem mutation density is significantly elevated in flanking DNA in the XPC−/− cSCCs relative to WT cSCCs, consistent with a general repair defect (Appendix Fig S10B and D). Mutation density per tumor is roughly similar at position −4/−3 between WT and XPC−/− cSCCs, but elevated at other locations in the CBS, particularly at position −1/0. Similarly, mutations overall (single and tandem substitutions, etc.) are relatively higher at position −1/0 relative to −4/−3 (see Fig 7D below) in cSCCs. However, because of the overall higher rate of mutations in the XPC−/− tumors, there was relatively little enrichment at these positions in the CBS (see Appendix Fig S11B).

Figure 7. UV‐induced 6‐4PP formation is not induced by CTCF binding at position −4/−3 in the binding motif.

  • A
    Schematic describing the UVDE‐seq method for mapping 6‐4PPs across the human genome. Genomic DNA isolated from UV‐irradiated cells is fragmented by sonication and ligated to the first adapter (green), which contains dideoxy (dd) 3’ ends on one strand. CPD lesions are removed using CPD photolyase and UVA light, and the remaining 6‐4PPs are cleaved using ultraviolet damage endonuclease (UVDE). The resulting free 3’ OH is then ligated to the second adapter (purple), which contains a random hexanucleotide (NNNNNN) overhang. Following PCR amplification (˜8 cycles), the resulting library is sequenced and mapped to the human genome to identify the locations of 6‐4PPs. Adapted from Laughery et al (2020).
  • B
    UVDE‐seq reads are enriched at dipyrimidine sequences in the UV‐irradiated human skin fibroblasts (500 J/m2; UV 0 h) relative to the no UV control.
  • C
    6‐4PP density is modulated at active CBS in human cells, generally showing lower 6‐4PP levels at CBS.
  • D
    Close‐up of 6‐4PP density at active CBS compared to mutation density in XPC−/− cSCCs. 6‐4PP density is lower throughout the CBS, including position −4/−3, except for position −1/0.
  • E
    Model of how CTCF binding modulates UV damage formation and inhibits repair to promote a mutation hot spot at CBS. In GG‐NER‐deficient (XPC−/−) tumors, the lack of repair, particularly of 6‐4PPs (which form less frequently at CBS), results in the absence of a mutation hot spot at CBS.

XPC is important for repair of not only CPD lesions but also UV‐induced 6‐4PPs, which are highly mutagenic but repaired very rapidly in repair‐proficient cells. We wondered whether modulation of 6‐4PP formation in CBS might account for the differences in the mutation pattern in XPC−/− cSCCs. To test this hypothesis, we used our recently published UVDE‐seq method (Laughery et al, 2020) to map the genome‐wide distribution of 6‐4PPs in UV‐irradiated human skin fibroblasts. In this protocol (see Fig 7A), CPD lesions are removed using CPD photolyase prior to cleavage of 6‐4PPs using ultraviolet damage endonuclease (UVDE). UVDE‐seq reads were significantly enriched at dipyrimidine sequences in UV‐irradiated human cells relative to the No UV control (Fig 7B), representing UV‐induced 6‐4PP formation.

Analysis of 6‐4PP formation revealed significant modulation of 6‐4PP at active CBS (Fig 7C), even after normalizing for DNA sequence biases in 6‐4PP formation (Appendix Fig S11A). Closer inspection revealed that unlike CPDs, 6‐4PP formation is reduced throughout the CBS, including the −4/−3 mutation hot spot, relative to flanking DNA (Fig 7D and Appendix Fig S11A and B). One exception is position −1/0 in the CBS, which shows relatively higher levels of 6‐4PP formation (Fig 7D), which could explain why there is a peak of mutations at this CBS location in XPC−/− cSCCs. Similarly, a peak in 6‐4PP formation at position −16/−15 outside the CBS correlates with somewhat elevated mutation density at these positions in XPC−/− cSCCs (Fig 7D and Appendix Fig S11B). These findings indicate that unlike CPD lesions, CTCF binding does not induce 6‐4PP formation at position −4/−3 in the CBS, which may in part explain why the mutation hot spot at this position is less apparent in repair‐deficient skin cancers.

Discussion

Somatic mutations in many cancers are elevated at DNA‐binding sites of the CTCF insulator protein. In melanoma and other skin cancers, these mutations are clustered at a unique hot spot in the CTCF‐binding motif, which is located at a conserved CC dinucleotide at the 5' end of the cytosine‐rich strand of the CBS (i.e., position −4/−3 relative to the motif midpoint). Understanding the mutational processes responsible for this hot spot is critical, as it has been reported that some of these may function as driver mutations to promote melanomagenesis (Liu et al, 2019). It has been previously suggested that CBS mutation hot spots arise from differential inhibition of NER across the CTCF‐binding motif (Poulos et al, 2016). While our re‐analysis of genome‐wide repair data confirms that CTCF binding is associated with decreased NER activity, we find that repair inhibition is not specifically associated with the 5’CC mutation hot spot, but rather occurs generally throughout the CTCF‐binding motif. Instead, our data indicate that CTCF binding likely induces mutations at the 5’CC hot spot by promoting the formation of UV‐induced CPD lesions, both in human cells and in vitro. We further describe the molecular mechanism by which CTCF binding to DNA promotes its susceptibility to UV damage. In summary, these findings reveal the molecular etiology of recurrent CBS mutations in melanoma and advance our understanding of how DNA‐bound proteins such as CTCF alter UV damage susceptibility and repair.

Our analysis of published genome‐wide XR‐seq data (Adar et al, 2016) indicates that repair of UV‐induced CPD lesions is inhibited at CTCF‐binding sites in human cells, consistent with previous reports (Poulos et al, 2016; Sabarinathan et al, 2016). We further show that lower repair activity at CBS is still apparent even after normalizing for initial UV damage levels. In order for CTCF to inhibit repair, it presumably must remain bound to sites containing UV damage, although this has not been previously tested. Our data indicate that the presence of a CPD lesion in a CBS only marginally reduces CTCF‐binding affinity (˜2‐fold), consistent with this hypothesis. Moreover, we show that CTCF binding inhibits repair of a site‐specific CPD lesion by a model repair enzyme in vitro.

A previous bioinformatic analysis suggested that cohesin binding at CBS might be required to inhibit repair (Poulos et al, 2016). However, our analysis indicates that cohesin binding is highly correlated with active CBS (i.e., CBS located in a DNase I hypersensitivity site [DHS] and likely bound by CTCF). Examination of active CBS lacking an associated cohesin‐binding site revealed repair inhibition and a mutation hot spot (Fig 6C and D). These data are supported by our in vitro data demonstrating that CTCF binding directly inhibits repair, even in the absence of bound cohesin. These studies utilized the T4 endonuclease V glycosylase as a model repair enzyme, so it will be important in future work to examine the effects of CTCF binding on repair by purified NER factors. Structural analysis of the CTCF/DNA complex indicates that CTCF makes relatively few direct contacts with the CPD‐forming cytosine‐rich strand of the CTCF‐binding motif, as it primarily interacts with guanine residues on the opposite strand (Hashimoto et al, 2017). This observation could potentially explain why CTCF is able to remain bound to CPD‐containing binding sites and inhibit their repair.

However, our analysis does not support the model that CPD repair is specifically or differentially inhibited at the 5’CC mutation hot spot. Instead, our analysis indicates that repair activity is inhibited throughout the CTCF‐binding site and that repair inhibition at the 5’CC hot spot is no greater than elsewhere in the binding motif. The discrepancy between our results and the prior study (Poulos et al, 2016) is likely due to the method used to analyze the XR‐seq data. The prior study assigned counts of repair activity to each nucleotide in the XR‐seq read, while we assigned repair activity only to the likely damage‐containing dinucleotide in the XR‐seq read (see Appendix Fig S1A and B). Our method of analysis should significantly increase the resolution of the XR‐seq data and remove the intrinsic asymmetry of the NER excision reaction, which may have affected the previous analysis. Moreover, we show that normalizing for initial damage levels using the related HS‐Damage‐seq method (Hu et al, 2017a) yields a higher normalized repair activity at the 5’CC mutation hot spot relative to other locations in the CTCF‐binding motif, which is again inconsistent with the prior model.

Instead, we propose that the 5’CC mutation hot spot is a consequence of elevated CPD formation at this site (i.e., position −4/−3) in the CBS motif. Both genome‐wide CPD‐seq data and in vitro experiments using purified CTCF protein indicate that CTCF binding induces CPD formation at this position in the CBS motif. Our data also indicate that CTCF binding inhibits CPD formation at a number of other locations in the CBS motif, thereby suppressing mutation rates at these locations. CPD formation is also elevated at the 0/+1 position of the CTCF‐binding motif, even though this position is only slightly enriched (˜1.3‐ to 2‐fold) for somatic mutations in melanoma. This may simply reflect the lower mutability of CT dinucleotides (0/+1 position) relative to CC dinucleotides (−4/−3 position) in human skin cancers (Hayward et al, 2017; Mao et al, 2018) or suggest that other factors may also contribute to elevated mutation rates at the −4/−3 position hot spot. A recent analysis of our published CPD‐seq data (Mao et al, 2018) concluded that UV damage is induced primarily by tryptophan cluster transcription factors (Frigola et al, 2020), which consist almost entirely of ETS family members, while mutation hot spots at other transcription factor‐binding sites (including CTCF) were attributed to repair inhibition. However, closer inspection of their analysis indicates they also observed CPD induction at the −4/−3 mutation hot spot in CBS, similar to our results, although this was not discussed.

Taken together, these findings suggest a new model in which CTCF binding promotes its unique pattern of mutation hot spots through two complementary mechanisms: (i) enhancing initial CPD formation at the specific hot spot site in the binding motif and suppressing CPD formation elsewhere, and (ii) remaining bound to the resulting CPD‐containing sites and inhibiting subsequent repair (Fig 7E). A previous study had argued against this UV damage formation model in favor of differential repair inhibition because the 5’CC mutation hot spot in CBS disappeared in repair‐deficient (XPC−/−) cSCC (Poulos et al, 2016). However, XPC is important for repair not only of CPDs but also of UV‐induced 6‐4PPs, which, though highly mutagenic (Otoshi et al, 2000), are normally repaired very rapidly (e.g., 2‐h half‐life vs. 33‐h half‐life for CPDs) (Young et al, 1996). Our UVDE‐seq data indicate that unlike CPDs, 6‐4PP formation is not elevated at the 5’CC mutation hot spot in CBS. Since 6‐4PPs likely make a much greater contribution to UV mutagenesis in repair‐defective tumors, this can potentially explain why mutation enrichment is not elevated at the 5’CC hot spot in XPC−/− cSCCs (Fig 7E), as UV damage formation as a whole (e.g., CPDs and 6‐4PPs) does not show the same level of enrichment as CPDs alone. GG‐NER also repairs many endogenous DNA lesions, which significantly contribute to mutagenesis in XPC−/− tumors (Volkova et al, 2020), but may not be enriched at the 5’CC hot spot. Finally, it is important to note that the XPC−/− cSCCs (Zheng et al, 2014) were derived from an isolated village in Guatemala (Cleaver et al, 2007), while the WT cSCCs were obtained from a clinical study based in San Francisco (Durinck et al, 2011). Differences in patient age (typically < 10 years old vs. 58–87 years old) and ethnicity, factors that can significantly modulate mutagenesis in skin cells (Saini et al, 2021), could also affect mutation rates at CBS. These observations suggest that the XPC−/− cSCC mutation data should be analyzed with caution.

We also describe the DNA conformational changes induced by CTCF binding that alter the susceptibility to UV damage formation at different sites in the binding motif. At the 5’CC hot spot site, the adjacent C5–C6 double bonds of the CC dinucleotide have a smaller distance and more favorable alignment upon CTCF binding, which can explain their enhanced susceptibility to UV damage. A similar mechanism is likely responsible for elevated UV damage formation induced by binding of ETS transcription factors (Elliott et al, 2018; Mao et al, 2018; Premi et al, 2019; Roberts et al, 2019), although the magnitude of CPD induction by CTCF binding is less than that observed upon ETS binding to DNA. These findings point to a common set of DNA conformational changes that enhance susceptibility to UV‐induced CPD formation. Notably, the DNA conformation changes accompanying CTCF binding do not promote 6‐4PP formation at the 5’CC hot spot, consistent with the distinct photochemistry of these two photoproducts (Friedberg et al, 2006). It will be important in future studies to determine whether other DNA‐binding proteins utilize similar or alternative mechanisms to modulate UV damage susceptibility and promote mutagenesis in human skin cells.

Materials and Methods

Analysis of somatic mutations in skin cancers

Single nucleotide variants identified in 183 sequenced melanomas (Hayward et al, 2017) were compiled from the International Cancer Genome Consortium (ICGC) release 20 (https://dcc.icgc.org/releases/release_20/Projects/MELA‐AU) and processed as previously described (Mao et al, 2018). Tandem double nucleotide variants from the subset of 140 cutaneous melanoma tumors were also compiled and analyzed. Somatic mutations from eight repair‐proficient and five repair‐deficient (XPC−/−) cutaneous squamous cell carcinomas (cSCCs) were compiled from Zheng et al (2014) and processed as previously described (Mao et al, 2020). To identify tandem mutations in the WT and XPC−/− cSCCs, filters removing variants supported by reads with more than 1.5 mutations per read and quality sum of mismatches (per read) ≤ 20 were not applied to allow for the inclusion of multiple nucleotide variants in the dataset. This analysis identified 52,295 and 467,833 tandem mutations in the WT and XPC−/− cSCC, respectively. Tandem mutations in the XPC−/− cSCC had a mutation spectrum very similar to that of cutaneous melanoma, with abundant CC>TT mutations, but also frequent AC>TT and CT>TC tandem mutations (Fig EV5A and B). Tandem mutations in the WT cSCC tumors were primarily CC>TT (˜82%) and show a mutation spectrum similar to cutaneous melanomas (Fig EV5C).

Figure EV5. Spectrum of somatic tandem mutations in cutaneous squamous cell carcinomas (cSCCs) resembles that of cutaneous melanoma.

  • A
    Count of each class of tandem mutation in five cSCCs with germline mutations in XPC. Only tandem mutation classes with at least 150 mutations are depicted.
  • B
    Same as panel (A), except for 140 cutaneous melanomas.
  • C
    Same as panel (A), except for eight repair‐proficient (WT) cSCCs, and only tandem mutation classes with at least 100 mutations are depicted.

The density of somatic mutations at inactive and active CTCF‐binding sites (CBS) was determined using custom Perl scripts (code available upon request), as previously described for ETS‐binding sites (Mao et al, 2018). Known CBS compiled from ENCODE (Consortium EP, 2012) were downloaded from http://funseq.gersteinlab.org/data/ (file: ENCODE.tf.bound.union.bed; Khurana et al, 2013), and processed as previously described (Mao et al, 2018). Active CBS were identified by intersecting the set of known CBS with the set of DNase I hypersensitivity sites (DHS) from melanocytes derived from the Epigenome Roadmap Project (Roadmap Epigenomics Consortium et al, 2015), from http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/narrowPeak/ (file: E059‐DNase.hotspot.fdr0.01.peaks.bed.gz). CBS not associated with a melanocyte DHS were labeled as inactive CBS.

To identify CBS associated with cohesin binding, ENCODE data for discovered RAD21 (cohesin subunit)‐binding sites from http://funseq.gersteinlab.org/data/ (file: ENCODE.tf.bound.union.bed; Khurana et al, 2013) were intersected with CBS. CBS with an identified RAD21‐binding site within 50 bp were labeled as cohesin associated (i.e., +Cohesin), while CBS without a RAD21‐binding site within 50 bp were labeled as not cohesin associated (i.e., No Cohesin). It is important to note, however, that the absence of a RAD21‐binding site does not necessarily imply the complete absence of cohesin binding or RAD21 ChIP‐seq signal.
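The stratification described above can be sketched as follows. This is a minimal illustration, not the Perl code used in the study: it assumes BED‐style half‐open intervals, measures "within 50 bp" as the edge‐to‐edge gap between a CBS and the nearest RAD21 site, and uses hypothetical function names and toy coordinates.

```python
def gap_bp(a_start, a_end, b_start, b_end):
    """Gap in bp between two half-open intervals; 0 if they overlap or abut."""
    return max(0, a_start - b_end, b_start - a_end)

def label_cbs_by_cohesin(cbs_sites, rad21_sites, window=50):
    """Label each CBS '+Cohesin' or 'No Cohesin'.

    cbs_sites, rad21_sites: iterables of (chrom, start, end) tuples.
    Assumption: a CBS is cohesin associated when any RAD21 site on the
    same chromosome lies within `window` bp (edge to edge).
    """
    rad21_by_chrom = {}
    for chrom, start, end in rad21_sites:
        rad21_by_chrom.setdefault(chrom, []).append((start, end))

    labels = {}
    for chrom, start, end in cbs_sites:
        nearby = any(
            gap_bp(start, end, r_start, r_end) <= window
            for r_start, r_end in rad21_by_chrom.get(chrom, [])
        )
        labels[(chrom, start, end)] = "+Cohesin" if nearby else "No Cohesin"
    return labels

# Toy example with made-up coordinates.
cbs = [("chr1", 1000, 1019), ("chr1", 5000, 5019), ("chr2", 1000, 1019)]
rad21 = [("chr1", 1060, 1300), ("chr1", 5100, 5300)]
labels = label_cbs_by_cohesin(cbs, rad21)
```

The linear scan over RAD21 sites is adequate for a toy example; for genome‐scale site lists, sorted intervals or an interval tree would be the more practical choice.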

Expected (background) mutation density due to DNA sequence context was determined from the expected mutation density due to the trinucleotide sequence context at each position relative to the CBS motif for the set of active or inactive CBS, as previously described (Mao et al, 2018). The background mutation density at each trinucleotide sequence context was calculated from each tumor mutation dataset (e.g., melanoma, WT cSCC, or XPC−/− cSCC). Expected mutation density due to DNA sequence context was used to normalize the somatic mutation density, in order to account for effects of DNA sequence on mutation frequency in CBS. For tandem mutants and the reanalyzed XPC−/− mutation data (i.e., including newly identified tandem mutations, etc.), the background mutation density was determined for each trinucleotide context in which the mutated base (middle base) was a pyrimidine.
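To make the normalization concrete, the sketch below derives a per‐trinucleotide background rate from a mutation dataset and sums those rates over aligned binding‐site sequences to obtain the expected mutation count at each motif position. It is a simplified illustration under stated assumptions: strand handling and pyrimidine‐centered context folding are omitted, all sequences are pre‐aligned on the motif and of equal length, and the function names and toy data are hypothetical.

```python
from collections import Counter

def background_rates(mutated_contexts, context_occurrences):
    """Mutations per occurrence for each trinucleotide context.

    mutated_contexts: list of trinucleotides, one per observed mutation
        (mutated base in the middle).
    context_occurrences: dict mapping trinucleotide -> number of times
        that context occurs in the reference region considered.
    """
    mutation_counts = Counter(mutated_contexts)
    return {
        ctx: mutation_counts[ctx] / n
        for ctx, n in context_occurrences.items()
        if n > 0
    }

def expected_density(site_sequences, rates):
    """Expected mutation count at each interior position of aligned sites."""
    length = len(site_sequences[0])
    expected = [0.0] * (length - 2)
    for seq in site_sequences:
        for i in range(1, length - 1):
            expected[i - 1] += rates.get(seq[i - 1:i + 2], 0.0)
    return expected

def enrichment(observed, expected):
    """Observed / expected at each position (NaN where expected is 0)."""
    return [o / e if e > 0 else float("nan") for o, e in zip(observed, expected)]

# Toy example with made-up numbers.
rates = background_rates(["TCC", "TCC", "CCT"], {"TCC": 4, "CCT": 2, "ACG": 5})
expected = expected_density(["TCCT", "TCCA"], rates)
ratio = enrichment([2, 1], expected)
```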

Analysis of UV‐induced CPD formation and repair activity at CBS in human skin cells

CPD‐seq data from UV‐irradiated human skin fibroblasts (Mao et al, 2018) and melanoma cells (Elliott et al, 2018), as well as UV‐irradiated naked DNA controls, were used to analyze UV‐induced CPD formation at CTCF‐binding sites. In addition, one of the naked DNA control CPD‐seq libraries from Mao et al (2018) was re‐sequenced to provide more CPD reads for the naked DNA control samples. Only CPD‐seq reads associated with lesion‐forming dipyrimidine sequences were retained for analysis. CPD lesions were assigned half‐integer positions in‐between the positions of the neighboring pyrimidine bases forming the lesion (i.e., if a lesion occurred at positions 99 and 100, the lesion was assigned a coordinate of 99.5).
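The half‐integer coordinate convention can be illustrated with a short sketch. The helper names and the strand‐flipping step are assumptions for illustration; only the midpoint rule (a lesion at positions 99 and 100 receives coordinate 99.5) and the dipyrimidine filter are taken from the description above.

```python
DIPYRIMIDINES = {"TT", "TC", "CT", "CC"}

def is_lesion_forming(dinucleotide):
    """True if the dinucleotide is one of the four dipyrimidines."""
    return dinucleotide.upper() in DIPYRIMIDINES

def lesion_coordinate(pos_a, pos_b):
    """Half-integer coordinate midway between two adjacent bases."""
    if abs(pos_a - pos_b) != 1:
        raise ValueError("lesion must span two adjacent positions")
    return (pos_a + pos_b) / 2

def relative_to_midpoint(lesion_coord, motif_midpoint, motif_on_plus_strand=True):
    """Lesion position relative to the motif midpoint.

    Assumption: positions are flipped for motifs on the minus strand so
    that all sites share one orientation.
    """
    offset = lesion_coord - motif_midpoint
    return offset if motif_on_plus_strand else -offset

coord = lesion_coordinate(99, 100)
rel = relative_to_midpoint(coord, 103)
```

Under this convention, a lesion spanning motif positions −4 and −3 would be plotted at −3.5.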

The density of CPD lesions at inactive and active CTCF‐binding sites (CBS) was determined using custom Perl scripts, as previously described for ETS‐binding sites (Mao et al, 2018). Perl scripts are available upon request. The cellular CPD‐seq data were also normalized using the human genomic DNA samples that were irradiated in vitro (naked DNA control) to account for differences in CPD formation due to DNA sequence context. The cellular CPD‐seq data from the two different studies (Elliott et al, 2018; Mao et al, 2018) were normalized to the matched naked DNA control and then averaged, weighted by the total number of CPD‐seq sequencing reads in each study.
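One way to picture the normalization and weighted averaging is the sketch below. The exact scaling used in the study is not specified here, so the sketch assumes that each profile is first converted to a fraction of its total counts before the cellular/naked ratio is taken; the function names and the numbers are hypothetical.

```python
def normalized_profile(cell_counts, naked_counts):
    """Per-position ratio of cellular to naked-DNA lesion fractions.

    Assumption: each profile is scaled by its own total so that
    sequencing depth does not affect the ratio.
    """
    cell_total = sum(cell_counts)
    naked_total = sum(naked_counts)
    profile = []
    for cell, naked in zip(cell_counts, naked_counts):
        if naked == 0:
            profile.append(float("nan"))
        else:
            profile.append((cell / cell_total) / (naked / naked_total))
    return profile

def weighted_average(profiles, weights):
    """Position-wise average of profiles, weighted by read totals."""
    total = sum(weights)
    n_positions = len(profiles[0])
    return [
        sum(w * p[i] for p, w in zip(profiles, weights)) / total
        for i in range(n_positions)
    ]

# Toy example: two studies, two positions, made-up counts and read totals.
study_1 = normalized_profile([10, 30], [20, 20])
study_2 = normalized_profile([5, 5], [10, 10])
combined = weighted_average([study_1, study_2], weights=[300, 100])
```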

XR‐seq CPD reads derived from different repair time points following UV irradiation of human skin fibroblasts (Adar et al, 2016) were downloaded and processed as previously described (Mao et al, 2020). The likely lesion‐containing site was inferred to be located at position 5 and 6 nucleotides from the 3’ end of the XR‐seq read, as previously described (Mao et al, 2020), due to enrichment of lesion‐forming dipyrimidine sequences at this position (Appendix Fig S1A and B). The HS‐Damage‐seq data for CPD lesions were obtained from Hu et al (2017a), processed as previously described (Mao et al, 2020), and used to normalize the XR‐seq data to account for differences in initial damage levels. XR‐seq CPD reads from repair‐deficient (XPC−/−) human fibroblasts (Hu et al, 2015) were downloaded and processed similarly, except duplicate reads were excluded.
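The assignment of repair activity to a single dinucleotide per read can be sketched as follows. The counting convention is an assumption: the 3’‐terminal nucleotide is taken as position 1, so the lesion dinucleotide is the 6th and 5th nucleotides from the 3’ end. Coordinates are assumed to be 0‐based with the read start as the leftmost genomic position, and the function name is hypothetical.

```python
DIPYRIMIDINES = {"TT", "TC", "CT", "CC"}

def xr_seq_lesion(read_seq, read_start, strand):
    """Infer the likely damaged dinucleotide of an excised XR-seq fragment.

    read_seq: read sequence written 5'->3' on its own strand.
    read_start: 0-based leftmost genomic coordinate of the alignment.
    strand: '+' or '-'.
    Returns (dinucleotide, half-integer genomic coordinate, is_dipyrimidine).
    Assumption: the lesion occupies the 6th and 5th nucleotides counted
    from the 3' end, with the 3'-terminal nucleotide as position 1.
    """
    length = len(read_seq)
    if length < 6:
        raise ValueError("read too short to contain the inferred lesion site")
    dinucleotide = read_seq[-6:-4].upper()
    if strand == "+":
        # Read index i maps to genomic position read_start + i.
        coordinate = read_start + length - 5.5
    else:
        # Read index i maps to genomic position read_start + length - 1 - i.
        coordinate = read_start + 4.5
    return dinucleotide, coordinate, dinucleotide in DIPYRIMIDINES

plus_hit = xr_seq_lesion("GGGGGGTCAAAA", 100, "+")
minus_hit = xr_seq_lesion("GGGGGGTCAAAA", 100, "-")
```

Assigning one count per read at this site, rather than one count per nucleotide of the read, is what concentrates the repair signal at the damaged dinucleotide.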

UVDE‐seq library preparation and sequencing

The emRiboSeq protocol (Ding et al, 2015) and our CPD‐seq protocol (Mao & Wyrick, 2020) were adapted to create the UVDE‐seq method to map 6‐4PP formation in human skin cells. Normal human fibroblasts (NHF1) were cultured in DMEM + 10% FBS media at 37°C and 5% CO2. Cells were grown to approximately 80% confluence, and media was removed. Cells were then washed with 5 ml Dulbecco’s phosphate‐buffered saline (PBS), which was subsequently removed. 2 ml of fresh PBS was then pipetted onto cells, and cells were exposed to 500 J/m2 UVC light. Following UV exposure, PBS was removed and 2 ml trypsin was added to cells. Once cell adhesions were digested (approximately 5 min), 8 ml of DMEM + 10% FBS was added to neutralize the trypsin. Cells were then transferred to a 15‐ml conical tube and centrifuged at 400×g for 4 min to pellet. Media above the pellet was removed, and cells were stored at −80°C until genomic DNA isolation. Unexposed NHF1 cells (“No UV”) were collected the same way, excluding the UV treatment.

NHF1 genomic DNA was isolated using the GenElute Mammalian Genomic DNA Miniprep Kit (Sigma‐Aldrich, G1N70‐1KT). Following isolation, purified NHF1 genomic DNA was sonicated using a Bioruptor 300 Sonicator (Diagenode, UCD‐300 TM) for 15 cycles (30‐s ON/OFF intervals) to create fragments between 200 and 500 bp in length. DNA fragments were then end‐repaired (NEB, E6050L) and dA‐tailed (NEB, E6053L), and the first adapter, a trP1 double‐stranded adapter, was ligated to both ends of the fragments using a quick ligase module (NEB, E6056L). Following PCR confirmation of trP1 ligation, free 3’‐OH groups were blocked with terminal transferase (NEB, M0315L) and dideoxyGTP (Roche Diagnostics, 03732738001). Samples were then treated with E. coli CPD photolyase and incubated under 365 nm UV light for 2 h at room temperature. DNA fragments were purified using a phenol:chloroform:isoamyl alcohol (PCI; 25:24:1, pH ~6.7) extraction, followed by ethanol precipitation. Samples were then treated with T. thermophilus UVDE for 45 min at 55°C. 5’ phosphate groups were removed using shrimp alkaline phosphatase (Affymetrix, AF78390500), and DNA was subsequently denatured at 95°C for 5 min and snap‐cooled on ice. A second double‐stranded adapter, the A adapter, was then ligated to the 3’‐OH created immediately upstream of the cleaved UV lesion (NEB, E6056L). Second adapter ligation was PCR confirmed using a Cy3‐labeled primer complementary to the A adapter. Each A adapter contains a unique 6‐nucleotide barcode (sequences available upon request) that allows for the pooling of different libraries to be analyzed through multiplexed DNA sequencing, as well as a biotin‐labeled strand. DNA containing the biotin label was purified using Streptavidin beads (Thermo Fisher Scientific, 11205D), while the DNA strand lacking the biotin label was removed using 0.15 M NaOH.
The remaining ssDNA was then used as a template for second‐strand synthesis, using the second strand of adapter A as the extension primer. Libraries were PCR amplified for eight cycles using primers complementary to each adapter. Finally, samples were combined at equal volumes and submitted for Ion Proton sequencing (Life Technologies). The resulting UVDE‐seq reads were mapped to the hg19 genome using Bowtie 2 (Langmead & Salzberg, 2012), and the corresponding dinucleotide damage site was identified, essentially as previously described (Laughery et al, 2020). Only UVDE‐seq reads associated with 6‐4PP‐forming dipyrimidine sequences were retained for analysis. 6‐4PP lesions were assigned half‐integer positions in‐between the positions of the neighboring pyrimidine bases forming the lesion, as described above for CPD lesions. UVDE‐seq data were normalized by calculating the expected UVDE‐seq reads (i.e., 6‐4PP frequency) for the indicated DNA sequences using the frequencies of UVDE‐seq reads associated with TT, TC, CT, and CC dipyrimidines.
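The expected‐read calculation can be sketched in Python as follows. This is a simplified, single‐strand illustration under stated assumptions: the function names are hypothetical, the per‐dipyrimidine rate is taken to be reads per occurrence of that dinucleotide, and any counts passed in are placeholders rather than values from this study.

```python
DIPYRIMIDINES = ("TT", "TC", "CT", "CC")

def dipyrimidine_rates(read_counts, genome_counts):
    """Reads per occurrence for each dipyrimidine class.

    Both arguments are dicts keyed by dinucleotide. Assumption: the rate is
    the total reads mapped to a dinucleotide class divided by the number of
    times that dinucleotide occurs in the sequences considered.
    """
    return {d: read_counts[d] / genome_counts[d] for d in DIPYRIMIDINES}

def expected_reads(seq, rates):
    """Expected reads at each half-integer position along one strand.

    Position i + 0.5 lies between bases i and i + 1 (0-based), matching the
    half-integer convention used for lesion positions.
    """
    seq = seq.upper()
    expected = {}
    for i in range(len(seq) - 1):
        dinuc = seq[i:i + 2]
        if dinuc in rates:
            expected[i + 0.5] = rates[dinuc]
    return expected
```

Dividing the observed reads at each position by the corresponding expected value gives an enrichment that is corrected for dipyrimidine sequence context.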

Protein expression and purification

A T7‐based expression plasmid encoding a truncated DNA‐binding domain of murine CTCF, termed HisSUMO‐CTCF(F1–F9), was a gift from Gary Stormo via Addgene (plasmid #102859) (Zuo et al, 2017). The HisSUMO tag was required for expression and solubility of the target protein following purification (Zuo et al, 2017). The DNA was transformed into BL21(DE3) pLysS E. coli (Invitrogen) and cultured in LB medium supplemented with 100 µM ZnCl2 at 37°C. Expression was constitutively maintained for 5 h. Cells were harvested by centrifugation at 4,000 × g and lysed by sonication in 0.1 M Tris–HCl (pH 8.0) containing 0.5 M NaCl, 100 µM ZnCl2, 5 mM imidazole, and 0.5 mM Tris(2‐carboxyethyl)phosphine (TCEP). The cell lysate was passed through a Ni‐NTA HiTrap column (GE) equilibrated with the same buffer under the control of an ÄKTA start instrument (GE) to extract the HisSUMO‐tagged target, extensively washed, and eluted on an imidazole gradient in buffer (Appendix Fig S12A). Protein purity was judged to be at least 90% by Coomassie‐stained SDS–PAGE (Appendix Fig S12B). Protein concentration was determined by UV absorption at 280 nm using an extinction coefficient of 19,370 M−1 cm−1.
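The concentration determination follows the Beer–Lambert law, c = A280 / (ε × l). A minimal Python sketch is given below; the function name, the default 1 cm path length, and the dilution factor argument are assumptions for illustration, and the absorbance in the usage note is a made‐up value.

```python
EXTINCTION_M_CM = 19370.0  # M^-1 cm^-1, as stated in the text

def protein_molar_conc(a280, path_cm=1.0, dilution=1.0):
    """Molar protein concentration from absorbance at 280 nm.

    Beer-Lambert law: c = A / (epsilon * l). `dilution` is the fold
    dilution applied before the reading (1.0 means undiluted).
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return a280 * dilution / (EXTINCTION_M_CM * path_cm)
```

With these assumptions, a hypothetical reading of A280 = 0.3874 in a 1 cm cuvette corresponds to 20 µM protein.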

Electrophoretic mobility shift assay and analysis of CPD formation in vitro

The forward and reverse primers of the KSR2 CBS and the VAX1 gene CBS (Integrated DNA Technologies) were used for EMSA. The pyrimidine‐rich, CBS‐containing strand was labeled with ATP(gamma‐32P) (PerkinElmer) using T4 polynucleotide kinase and annealed with the corresponding complementary strand. Increasing amounts of purified CTCF protein (14–800 pmol or 11.3–361.6 pmol) were incubated with labeled KSR2 CBS and VAX1 gene CBS oligonucleotides (40 pmol) in a 50 µl reaction buffer (25 mM Tris–HCl (pH 7.9), 6 mM MgCl2, 0.5 mM EDTA (pH 8), 60 mM KCl, 10% glycerol, 5 mM DTT, and BSA 160 µg ml−1) on ice for 40 min. Samples (5 µl) were loaded on an 8% native polyacrylamide gel, and electrophoresis was carried out at 200 V for 30 min. The gel was then exposed to a phosphor screen and imaged using a Typhoon FLA7000 scanner (GE Healthcare).

The remainder of each sample used above for EMSA was spotted (four spots of 10 µl) onto a coverslip and exposed to approximately 1.8 kJ m−2 of UVC light. The spots were collected, and DNA was extracted using phenol:chloroform:isoamyl alcohol and precipitated using 100% ethanol. The precipitated DNA was washed with 70% ethanol, air‐dried, dissolved in 1X T4 endoV buffer pH 7.2 (100 mM NaCl, 1 mM DTT, 1 mM EDTA, 25 mM Na2HPO4), and incubated with 15 units of T4 endoV (NEB) in a 10 µl volume for 1 h at 37°C. The reaction was stopped by addition of 10 µl of 100% formamide and heating at 75°C for 5 min, after which the samples were loaded onto a pre‐run 15% denaturing urea sequencing gel. Single‐stranded truncated oligos for the KSR2 CBS and VAX1 gene CBS and a 10‐nucleotide (SS10) marker (Simplex Sciences) were labeled with ATP(gamma‐32P) and used as markers. The truncated oligos, synthesized by IDT, were used to determine the precise location of CPD lesions. Following loading of samples and markers, electrophoresis was carried out at 60 W for 2 h and 10 min. The gel was then exposed to a phosphor screen and imaged using a Typhoon FLA7000 scanner (GE Healthcare). This experiment was repeated at least three times for each DNA substrate.

Synthesis of CPD‐containing oligonucleotides

The CPD‐modified oligonucleotide was synthesized via solid‐phase DNA synthesis (0.2 μmol scale, 500 Å LCAA‐CPG support). Standard machine‐assisted protocols were used for incorporation of protected DNA monomers, whereas the cis‐syn thymine dimer was incorporated via hand‐coupling (4,5‐dicyanoimidazole, 10 min, anhydrous CH3CN) of a commercially available phosphoramidite (Glen Research). The methyl phosphate group of the latter was removed by treating the solid support with thiophenol/triethylamine/THF (1:2:2, v/v/v) for 45 min at room temperature. The support was sequentially washed with THF (1X), methanol (5X), and acetonitrile (3X), dried using argon flow, and treated with 32% aq. ammonia (55°C, 17 h) to ensure deprotection of base‐labile groups and cleavage from the support. The resulting crude DMTr‐protected oligo was purified using TOP trityl‐on oligonucleotide purification cartridges (Agilent Technologies). The purity and identity of the synthesized oligo were verified by analytical HPLC (XTerra MS C18 column using a 0.05 M triethylammonium acetate and acetonitrile gradient, > 90% purity; Appendix Fig S13A) and MALDI‐MS analysis (Quadrupole Time‐of‐Flight mass spectrometer, 3‐hydroxypicolinic acid matrix; Appendix Fig S13B), respectively.

T4 endoV repair assay of CPD‐containing CTCF‐binding site

An oligonucleotide containing the KSR2 CBS region was synthesized with a CPD lesion using a cis‐syn thymine dimer phosphoramidite (Glen Research), labeled with ATP(gamma‐32P), and hybridized to an undamaged complementary DNA oligonucleotide. The hybridized lesion‐containing oligonucleotides (40 pmol) were mixed with increasing amounts of CTCF protein (5–361.6 pmol) in a 50 µl reaction buffer (25 mM Tris‐HCl (pH 7.9), 6 mM MgCl2, 0.5 mM EDTA (pH 8), 60 mM KCl, 10% glycerol, 5 mM DTT, 160 µg/ml bovine serum albumin (BSA)) and incubated on ice for 40 min. 5 µl of each sample (4 pmol) was loaded onto an 8% native polyacrylamide gel. Electrophoresis was carried out at 200 V for 30 min, and the gel was imaged using a Typhoon FLA 7000 scanner. The DNA–protein complex (1:2.2) that showed complete binding was chosen for the repair assay using T4 endoV. T4 endoV (NEB) (11 units) was mixed with 21 µl of naked DNA or DNA–protein complex from above and incubated at 37°C. 10 µl of digested sample (8 pmol) was collected at the 15 and 30 min time points. The DNA was extracted using phenol:chloroform:isoamyl alcohol and precipitated using 100% ethanol. The pelleted DNA was washed, dried, resuspended in formamide (80%), heated at 95°C for 13 min, and loaded on a pre‐run 12% denaturing urea gel. Electrophoresis was carried out at 200 V for 30 min. The gel was then exposed to a phosphor screen, and the image was scanned using a Typhoon FLA7000 scanner (GE Healthcare). The graph showing cleavage inhibition in the presence and absence of CTCF was generated using GraphPad Prism (three independent replicates).

Competition assay for CTCF binding

An oligonucleotide containing the KSR2 CBS (pyrimidine‐rich strand) was 5’ end‐labeled with ATP(gamma‐32P) and annealed to the complementary strand. Radiolabeled KSR2 CBS oligonucleotide (4 pmol) was titrated with increasing amounts (0.2–32 pmol) of unlabeled KSR2 CBS oligonucleotides or unlabeled KSR2 CPD‐containing oligonucleotides in the presence of 12 pmol CTCF protein in a 20 μl reaction buffer containing 25 mM Tris–HCl (pH 7.9), 6 mM MgCl2, 0.5 mM EDTA (pH 8), 60 mM KCl, 10% glycerol, 5 mM DTT, and 160 µg/ml BSA. The samples were incubated for 40 min on ice and were loaded onto an 8% native polyacrylamide gel. Following electrophoresis as described above for EMSA, the gel was exposed to a phosphor screen and the image was scanned using a Typhoon FLA7000 scanner. The experiment was repeated at least three times.

Structural analysis of CTCF‐bound DNA

Structural analysis measured the distance (d) and improper torsion angle (η) between adjacent pyrimidine bases in CTCF‐bound DNA structures, as previously described (Law et al, 2008; Mao et al, 2018). PDB IDs for analyzed CTCF‐DNA co‐crystal structures are as follows: 5KKQ, 5T00, 5T0U, and 5YEF.
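The two geometric parameters can be computed from atomic coordinates as in the following Python sketch. It assumes, following our reading of Law et al (2008), that d is the distance between the midpoints of the C5=C6 bonds of the two adjacent pyrimidines and that η is the improper torsion defined by C5–C6–C6'–C5'. The function names are hypothetical, and the sign of η depends on the dihedral convention chosen.

```python
import numpy as np

def bond_midpoint_distance(c5_a, c6_a, c5_b, c6_b):
    """Distance d between the midpoints of the two C5=C6 bonds."""
    mid_a = (np.asarray(c5_a, float) + np.asarray(c6_a, float)) / 2
    mid_b = (np.asarray(c5_b, float) + np.asarray(c6_b, float)) / 2
    return float(np.linalg.norm(mid_b - mid_a))

def improper_torsion(c5_a, c6_a, c6_b, c5_b):
    """Torsion angle eta (degrees) for C5-C6-C6'-C5'."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (c5_a, c6_a, c6_b, c5_b))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the C6-C6' axis.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

Applied to each adjacent pyrimidine pair in a structure, small values of d together with small absolute values of η indicate a geometry in which the two C5=C6 double bonds are close and nearly parallel.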

Molecular dynamics simulations

Explicit‐solvent simulations of a duplex DNA fragment of the KSR2 promoter, and its complex with human CTCF(F3‐F7), were performed using the Amber14sb/parmbsc1 forcefield (Ivani et al, 2016) in version 2020.2 of the GROMACS environment. The free ksr2‐derived duplex, d(TGGTTCCCCCTGGTGGA), was generated in canonical B‐form using 3DNA (Lu & Olson, 2003). The co‐crystal structure 5KKQ was used as the template for the complex by mutating the DNA to match the ksr2 sequence, also using 3DNA. Each of the five C2H2 zinc fingers was assigned manually such that two coordinating His residues projected unprotonated imidazole nitrogen atoms toward Zn2+ ions (residue types HID or HIE in the Amber naming scheme as required), and the two coordinating Cys sidechains were thiolates (residue type CYM). The Zn2+ ions were handled according to the non‐bonded model using the 6–12 parameters provided by the forcefield. The DNA terminal residues were assigned the residue types DT5 and DA3. The protonation states of other ionizable residues in the protein were computed using PROPKA3 (Olsson et al, 2011).

Each system was set up in dodecahedral boxes at least 1.0 nm wider than the longest dimension of the solute, solvated with TIP3P, and neutralized with Na+ and Cl− to 0.15 M. Electrostatic interactions were handled by particle‐mesh Ewald summation with a 1‐nm distance cutoff. All simulations were carried out at an in silico temperature and pressure of 298 K (modified Berendsen thermostat) (Bussi et al, 2007) and 1 bar (Parrinello‐Rahman barostat). A timestep of 2 fs was used, and bonds involving hydrogen atoms were constrained using LINCS. After the structures were energy‐minimized by steepest descent, the NVT ensemble was equilibrated at 298 K for 1 ns to thermalize the system, followed by another 1 ns of equilibration of the NPT ensemble at 1 bar and 298 K. The final NPT ensemble was simulated without restraints for 200 ns, recording coordinates every 1 ps. Convergence of the trajectories was checked by RMSD from the energy‐minimized structures, after adjustments for periodic boundary effects. As a sanity check, the five Zn2+ ions were closely monitored and remained stably contacted by the C2H2 sidechains throughout the trajectory. Post‐processing and routine analysis of MD trajectories were performed with tools provided in GROMACS. DNA helical parameters were extracted using do_x3dna and dnaMD (Kumar & Grubmüller, 2015). Specific geometries denoting a predisposition to CPD formation were computed as previously described (Mao et al, 2018). Movies of trajectories were generated using VMD from frames at 0.1‐ns intervals and rendered at 60 frames per second.
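The run parameters above imply the following step and frame counts, worked through in a short Python sketch. The function name is hypothetical, and the movie duration assumes that the entire 200 ns production trajectory is rendered, which the text does not state explicitly.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

def production_run_summary(length_ns=200, timestep_fs=2, save_every_ps=1,
                           movie_stride_ps=100, fps=60):
    """Step and frame counts implied by the stated simulation settings.

    Integer femtosecond arithmetic avoids floating-point rounding.
    A movie stride of 100 ps corresponds to the 0.1 ns frame interval.
    """
    total_fs = length_ns * FS_PER_NS
    movie_frames = total_fs // (movie_stride_ps * FS_PER_PS)
    return {
        "md_steps": total_fs // timestep_fs,
        "saved_frames": total_fs // (save_every_ps * FS_PER_PS),
        "movie_frames": movie_frames,
        "movie_seconds": movie_frames / fps,
    }
```

Under these assumptions, the production run corresponds to 100 million integration steps and 200,000 saved coordinate frames, and a movie of 2,000 frames would play for roughly 33 s.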

Author contributions

SS performed the in vitro CTCF binding, repair, and UV damage formation experiments. BS and JJW performed bioinformatic analysis of damage formation, repair, and mutagenesis at CBS. AVA and GMKP purified recombinant CTCF protein and GMKP performed the molecular dynamics simulations. RGE and PJH prepared and characterized the CPD‐containing oligonucleotide. PM performed the CPD‐seq experiments, and KAB performed the UVDE‐seq experiments. SAR provided the UVDE and CPD photolyase proteins, assisted with the analysis of the competition binding experiments, and helped analyze the XPC mutations. JJW directed the study, and JJW, SS, and GMKP wrote the manuscript with assistance from the other authors. All authors edited the manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

Supporting information

Appendix

Expanded View Figures PDF

Movie EV1

Movie EV2

Acknowledgements

We thank Dr. Michael Smerdon for helpful comments. We are grateful to Weiwei Du and Mark Wildung for Ion Proton sequencing. We thank Benjamin Morledge‐Hampton for bioinformatic assistance. We thank the International Cancer Genome Consortium (ICGC) for making publicly available data on somatic mutations in melanomas and Dr. Raymond Cho for assisting with the cSCC mutation data. This study was supported by funding from National Science Foundation grant MCB2028902 (GMKP) and National Institute of Environmental Health Sciences grants R21ES027937 (JJW and SAR), R21ES029655 (JJW and GMKP), R01ES028698 (JJW), and R01ES032814 (JJW and SAR). KAB was supported by a National Institute of General Medical Sciences training grant (T32GM008336).

The EMBO Journal (2021) 40: e107795

Data availability

The datasets generated in this study are available in the following databases:

References

  1. Adar S, Hu J, Lieb JD, Sancar A (2016) Genome‐wide kinetics of DNA excision repair in relation to chromatin state and mutagenesis. Proc Natl Acad Sci USA 113: E2124–2133
  2. Buisson R, Langenbucher A, Bowen D, Kwan EE, Benes CH, Zou L, Lawrence MS (2019) Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 364: eaaw2872
  3. Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity rescaling. J Chem Phys 126: 014101
  4. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB et al (2016) Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol 34: 155–163
  5. Chiba K, Lorbeer FK, Shain AH, McSwiggen DT, Schruf E, Oh A, Ryu J, Darzacq X, Bastian BC, Hockemeyer D (2017) Mutations in the promoter of the telomerase gene TERT contribute to tumorigenesis by a two‐step mechanism. Science 357: 1416–1420
  6. Cleaver JE, Feeney L, Tang JY, Tuttle P (2007) Xeroderma pigmentosum group C in an isolated region of Guatemala. J Invest Dermatol 127: 493–496
  7. Consortium EP (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74
  8. Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: a sequence logo generator. Genome Res 14: 1188–1190
  9. Dickerson RE, Chiu TK (1997) Helix bending as a factor in protein/DNA recognition. Biopolymers 44: 361–403
  10. Dickerson RE (1998) DNA bending: the prevalence of kinkiness and the virtues of normality. Nucleic Acids Res 26: 1906–1926
  11. Ding J, Taylor MS, Jackson AP, Reijns MA (2015) Genome‐wide mapping of embedded ribonucleotides and other noncanonical nucleotides using emRiboSeq and EndoSeq. Nat Protoc 10: 1433–1444
  12. Durinck S, Ho C, Wang NJ, Liao W, Jakkula LR, Collisson EA, Pons J, Chan S‐W, Lam ET, Chu C et al (2011) Temporal dissection of tumorigenesis in primary cancers. Cancer Discov 1: 137–143
  13. Elliott K, Boström M, Filges S, Lindberg M, Van den Eynden J, Ståhlberg A, Clausen AR, Larsson E (2018) Elevated pyrimidine dimer formation at distinct genomic bases underlies promoter mutation hotspots in UV‐exposed cancers. PLoS Genet 14: e1007849
  14. Fredriksson NJ, Elliott K, Filges S, Van den Eynden J, Stahlberg A, Larsson E (2017) Recurrent promoter mutations in melanoma are defined by an extended context‐specific mutational signature. PLoS Genet 13: e1006773
  15. Friedberg EC, Walker GC, Siede W, Wood RD, Schultz RA, Ellenberger T (2006) DNA repair and mutagenesis. Washington, DC: ASM Press
  16. Frigola J, Sabarinathan R, Gonzalez‐Perez A, Lopez‐Bigas N (2020) Variable interplay of UV‐induced DNA damage and repair at transcription factor binding sites. Nucleic Acids Res 49: 891–901
  17. Galindo‐Murillo R, Robertson JC, Zgarbova M, Sponer J, Otyepka M, Jurecka P, Cheatham TE 3rd (2016) Assessing the current state of amber force field modifications for DNA. J Chem Theory Comput 12: 4114–4127
  18. Gonzalez‐Perez A, Sabarinathan R, Lopez‐Bigas N (2019) Local determinants of the mutational landscape of the human genome. Cell 177: 101–114
  19. Guo YA, Chang MM, Huang W, Ooi WF, Xing M, Tan P, Skanderup AJ (2018) Mutation hotspots at CTCF binding sites coupled to chromosomal instability in gastrointestinal cancers. Nat Commun 9: 1520
  20. Hashimoto H, Wang D, Horton JR, Zhang X, Corces VG, Cheng X (2017) Structural basis for the versatile and methylation‐dependent binding of CTCF to DNA. Mol Cell 66: 711–720
  21. Hayward NK, Wilmott JS, Waddell N, Johansson PA, Field MA, Nones K, Patch A‐M, Kakavand H, Alexandrov LB, Burke H et al (2017) Whole‐genome landscapes of major melanoma subtypes. Nature 545: 175–180
  22. Heidenreich B, Kumar R (2017) TERT promoter mutations in telomere biology. Mutat Res 771: 15–31
  23. Horn S, Figl A, Rachakonda PS, Fischer C, Sucker A, Gast A, Kadel S, Moll I, Nagore E, Hemminki K et al (2013) TERT promoter mutations in familial and sporadic melanoma. Science 339: 959–961
  24. Hu J, Adar S, Selby CP, Lieb JD, Sancar A (2015) Genome‐wide analysis of human global and transcription‐coupled excision repair of UV damage at single‐nucleotide resolution. Genes Dev 29: 948–960
  25. Hu J, Adebali O, Adar S, Sancar A (2017a) Dynamic maps of UV damage formation and repair for the human genome. Proc Natl Acad Sci USA 114: 6758–6763
  26. Hu J, Selby CP, Adar S, Adebali O, Sancar A (2017b) Molecular mechanisms and genomic maps of DNA excision repair in Escherichia coli and humans. J Biol Chem 292: 15588–15597
  27. Huang FW, Hodis E, Xu MJ, Kryukov GV, Chin L, Garraway LA (2013) Highly recurrent TERT promoter mutations in human melanoma. Science 339: 957–959
  28. Ivani I, Dans PD, Noy A, Pérez A, Faustino I, Hospital A, Walther J, Andrio P, Goñi R, Balaceanu A et al (2016) Parmbsc1: a refined force field for DNA simulations. Nat Methods 13: 55–58
  29. Johnson AT, Wiest O (2007) Structure and dynamics of poly(T) single‐strand DNA: implications toward CPD formation. J Phys Chem B 111: 14398–14404
  30. Kaiser VB, Taylor MS, Semple CA (2016) Mutational biases drive elevated rates of substitution at regulatory sites across cancer types. PLoS Genet 12: e1006207
  31. Katainen R, Dave K, Pitkänen E, Palin K, Kivioja T, Välimäki N, Gylfe AE, Ristolainen H, Hänninen UA, Cajuso T et al (2015) CTCF/cohesin‐binding sites are frequently mutated in cancer. Nat Genet 47: 818–821
  32. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A et al (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342: 1235587
  33. Kumar R, Grubmüller H (2015) do_x3dna: a tool to analyze structural fluctuations of dsDNA or dsRNA from molecular dynamics simulations. Bioinformatics 31: 2583–2585
  34. Langmead B, Salzberg SL (2012) Fast gapped‐read alignment with Bowtie 2. Nat Methods 9: 357–359
  35. Laughery MF, Brown AJ, Bohm KA, Sivapragasam S, Morris HS, Tchmola M, Washington AD, Mitchell D, Mather S, Malc EP et al (2020) Atypical UV photoproducts induce non‐canonical mutation classes associated with driver mutations in melanoma. Cell Rep 33: 108401
  36. Law YK, Azadi J, Crespo‐Hernandez CE, Olmon E, Kohler B (2008) Predicting thymine dimerization yields from molecular dynamics simulations. Biophys J 94: 3590–3600
  37. Liu EM, Martinez‐Fundichely A, Diaz BJ, Aronson B, Cuykendall T, MacKay M, Dhingra P, Wong EWP, Chi P, Apostolou E et al (2019) Identification of cancer drivers at CTCF insulators in 1,962 whole genomes. Cell Syst 8: 446–455
  38. Lu XJ, Olson WK (2003) 3DNA: a software package for the analysis, rebuilding and visualization of three‐dimensional nucleic acid structures. Nucleic Acids Res 31: 5108–5121
  39. Mao P, Brown AJ, Esaki S, Lockwood S, Poon GMK, Smerdon MJ, Roberts SA, Wyrick JJ (2018) ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat Commun 9: 2626
  40. Mao P, Smerdon MJ, Roberts SA, Wyrick JJ (2020) Asymmetric repair of UV damage in nucleosomes imposes a DNA strand polarity on somatic mutations in skin cancer. Genome Res 30: 12–21
  41. Mao P, Wyrick JJ (2019) Organization of DNA damage, excision repair, and mutagenesis in chromatin: a genomic perspective. DNA Repair (Amst) 81: 102645
  42. Mao P, Wyrick JJ (2020) Genome‐wide mapping of UV‐induced DNA damage with CPD‐seq. Methods Mol Biol 2175: 79–94
  43. Merkenschlager M, Nora EP (2016) CTCF and Cohesin in genome folding and transcriptional gene regulation. Annu Rev Genomics Hum Genet 17: 17–43
  44. Olsson MHM, Søndergaard CR, Rostkowski M, Jensen JH (2011) PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput 7: 525–537
  45. Otoshi E, Yagi T, Mori T, Matsunaga T, Nikaido O, Kim ST, Hitomi K, Ikenaga M, Todo T (2000) Respective roles of cyclobutane pyrimidine dimers, (6–4)photoproducts, and minor photoproducts in ultraviolet mutagenesis of repair‐deficient xeroderma pigmentosum A cells. Cancer Res 60: 1729–1735
  46. Pant V, Kurukuti S, Pugacheva E, Shamsuddin S, Mariano P, Renkawitz R, Klenova E, Lobanenkov V, Ohlsson R (2004) Mutation of a single CTCF target site within the H19 imprinting control region leads to loss of Igf2 imprinting and complex patterns of de novo methylation upon maternal inheritance. Mol Cell Biol 24: 3497–3504
  47. Polak P, Lawrence MS, Haugen E, Stoletzki N, Stojanov P, Thurman RE, Garraway LA, Mirkin S, Getz G, Stamatoyannopoulos JA et al (2014) Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nat Biotechnol 32: 71–75
  48. Poulos RC, Thoms JA, Guan YF, Unnikrishnan A, Pimanda JE, Wong JW (2016) Functional mutations form at CTCF‐Cohesin binding sites in melanoma due to uneven nucleotide excision repair across the motif. Cell Rep 17: 2865–2872
  49. Premi S, Han L, Mehta S, Knight J, Zhao D, Palmatier MA, Kornacker K, Brash DE (2019) Genomic sites hypersensitive to ultraviolet radiation. Proc Natl Acad Sci USA 116: 24196–24205
  50. Rao S, Huntley M, Durand N, Stamenova E, Bochkov I, Robinson J, Sanborn A, Machol I, Omer A, Lander E et al (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159: 1665–1680
  51. Rheinbay E, Nielsen MM, Abascal F, Wala JA, Shapira O, Tiao G, Hornshoj H, Hess JM, Juul RI, Lin Z et al (2020) Analyses of non‐coding somatic drivers in 2,658 cancer whole genomes. Nature 578: 102–111
  52. Roadmap Epigenomics C, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi‐Moussavi A, Kheradpour P, Zhang Z, Wang J et al (2015) Integrative analysis of 111 reference human epigenomes. Nature 518: 317–330
  53. Roberts SA, Brown AJ, Wyrick JJ (2019) Recurrent noncoding mutations in skin cancers: UV damage susceptibility or repair inhibition as primary driver? BioEssays 41: e1800152
  54. Sabarinathan R, Mularoni L, Deu‐Pons J, Gonzalez‐Perez A, Lopez‐Bigas N (2016) Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532: 264–267
  55. Saini N, Giacobone CK, Klimczak LJ, Papas BN, Burkholder AB, Li JL, Fargo DC, Bai R, Gerrish K, Innes CL et al (2021) UV‐exposure, endogenous DNA damage, and DNA replication errors shape the spectra of genome changes in human skin. PLoS Genet 17: e1009302
  56. Schieferstein U, Thoma F (1998) Site‐specific repair of cyclobutane pyrimidine dimers in a positioned nucleosome by photolyase and T4 endonuclease V in vitro. EMBO J 17: 306–316
  57. Schreier WJ, Gilch P, Zinth W (2015) Early events of DNA photodamage. Annu Rev Phys Chem 66: 497–519
  58. Umer HM, Cavalli M, Dabrowski MJ, Diamanti K, Kruczyk M, Pan G, Komorowski J, Wadelius C (2016) A significant regulatory mutation burden at a high‐affinity position of the CTCF motif in gastrointestinal cancers. Hum Mutat 37: 904–913
  59. Volkova NV, Meier B, González‐Huici V, Bertolini S, Gonzalez S, Vöhringer H, Abascal F, Martincorena I, Campbell PJ, Gartner A et al (2020) Mutational signatures are jointly shaped by DNA damage and repair. Nat Commun 11: 2169
  60. Yin M, Wang J, Wang M, Li X, Zhang M, Wu Q, Wang Y (2017) Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites. Cell Res 27: 1365–1377
  61. Young AR, Chadwick CA, Harrison GI, Hawk JL, Nikaido O, Potten CS (1996) The in situ repair kinetics of epidermal thymine dimers and 6–4 photoproducts in human skin types I and II. J Invest Dermatol 106: 1307–1313
  62. Yuan S, Zhang W, Liu L, Dou Y, Fang W, Lo GV (2011) Detailed mechanism for photoinduced cytosine dimerization: a semiclassical dynamics simulation. J Phys Chem A 115: 13291–13297
  63. Yusufzai TM, Tagami H, Nakatani Y, Felsenfeld G (2004) CTCF tethers an insulator to subnuclear sites, suggesting shared insulator mechanisms across species. Mol Cell 13: 291–298
  64. Zheng CL, Wang NJ, Chung J, Moslehi H, Sanborn JZ, Hur JS, Collisson EA, Vemula SS, Naujokas A, Chiotti KE et al (2014) Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep 9: 1228–1234
  65. Zuo Z, Roy B, Chang YK, Granas D, Stormo GD (2017) Measuring quantitative effects of methylation on transcription factor‐DNA binding affinity. Sci Adv 3: eaao1799

    Articles from The EMBO Journal are provided here courtesy of Nature Publishing Group
