Summary
DNA becomes single-stranded (ssDNA) during replication, transcription, and repair. Transiently formed ssDNA segments can adopt alternative conformations, including cruciforms, triplexes, and quadruplexes. To determine whether there are stable regions of ssDNA in the human genome, we used S1-END-seq to convert ssDNA regions to DNA double-strand breaks, which were then processed for high-throughput sequencing. This approach revealed two predominant non-B DNA structures: cruciform DNA formed by expanded (TA)n repeats, which accumulate in microsatellite unstable human cancer cell lines, and DNA triplexes (H-DNA) formed by homopurine/homopyrimidine mirror repeats, which are common across a variety of cell lines. We show that H-DNA is enriched during replication, that its genomic location is highly conserved, and that H-DNA formed by (GAA)n repeats can be disrupted by treatment with a (GAA)n-binding polyamide. Finally, we show that triplex-forming repeats are hotspots for mutagenesis. Our results identify dynamic DNA secondary structures in vivo that contribute to elevated genome instability.
eTOC Blurb
Matos-Rodrigues et al. use S1-END-seq to reveal non-B DNA secondary structures including cruciforms and triplexes formed in vivo. DNA triplexes form during DNA replication, are enhanced in cancer cells, and are hotspots for mutagenesis. This study provides a foundation for research into the biology of non-B DNA structures.
Graphical Abstract

Introduction
Following the discovery of Z-DNA in 1979, it became evident that DNA repeats can adopt multiple structures that are radically different from the typical right-handed double-helical DNA, B-DNA (Mirkin, 2008). Besides Z-DNA, the best studied examples of such alternative DNA structures are cruciform DNA formed by inverted repeats (Murchie and Lilley, 1992), triplexes (H-DNA) formed by homopurine-homopyrimidine (hPu/hPy) mirror repeats (Mirkin and Frank-Kamenetskii, 1994), G-quadruplexes (G4) formed by orderly spaced Gn runs (Maizels and Gray, 2013), and hairpins and/or slipped-strand DNA formed by direct tandem repeats (Sinden et al., 2007). The structural, biophysical, and biochemical characteristics of these structures have been well documented in vitro over recent decades. One fundamental similarity between these structures is that they are thermodynamically unfavorable in linear DNA under physiological conditions but form under conditions favoring DNA unwinding, such as negative supercoiling (Mirkin, 2001). Furthermore, the kinetics of non-B DNA structure formation depend on DNA sequence composition; for example, AT-rich sequences are more prone to unwinding and forming structures than GC-rich sequences (Murchie and Lilley, 1987).
Alternative DNA structures are not permanently present in genomic DNA but rather form transiently during various genetic processes, including DNA replication, repair, and transcription (DePamphilis and Wassarman, 1980; Khristich et al., 2020; Krasilnikova and Mirkin, 2004; Liu and Wang, 1987). This dynamic nature makes it particularly challenging to capture alternative DNA structures and thus to directly prove their existence in vivo in large eukaryotic genomes. Several approaches have been used thus far to detect dynamic non-B DNA structures in vivo. Antibodies have been raised to recognize specific DNA structures within fixed cells. For example, triplex DNA-recognizing antibodies were shown to bind many locations in the human genome (Agazie et al., 1996; Agazie et al., 1994); however, this approach could not identify where H-DNA was present at nucleotide resolution genome-wide. Another approach has relied on chemicals that specifically modify alternative DNA structures, such as chloroacetaldehyde, potassium permanganate, or osmium tetroxide, to identify regions in genomes where non-B DNA forms. While this method was successful in detecting dynamic DNA structures in bacterial plasmids (Dayn et al., 1992; Kohwi and Panchenko, 1993), it has been difficult to apply to eukaryotes. One study combined potassium permanganate treatment with S1 nuclease footprinting followed by next-generation sequencing (ssDNA-seq) and revealed 17,000 sites of H-DNA in the genome of mouse B cells, the prevalence of which correlated with transcription level (Kouzine et al., 2017). The ssDNA-seq results were inconsistent with another recent report that used direct digestion of the genome with S1 nuclease followed by next-generation sequencing (S1-seq) to detect sites of H-DNA at (TC)n repeats within mouse B cells (Maekawa et al., 2022).
Notably, Maekawa et al. did not distinguish whether H-DNA, as detected by S1-seq, formed in vivo or during sample processing (Maekawa et al., 2022).
Here, we use S1-END-seq to demonstrate two types of dynamic DNA structures in living human cells: DNA cruciforms formed at (TA)n repeats and H-DNA formed at long homopurine-homopyrimidine mirror repeats. (TA)n cruciform structures accumulate uniquely in human cancer cell lines characterized by microsatellite instability (MSI). H-DNA, in contrast, is detected across a variety of human cell lines at conserved locations. Triplex formation is enriched in the S phase of actively dividing cells and enhanced in transformed cells compared with primary cell lines, and formation of triplexes consisting of GAA repeats can be inhibited by a (GAA)n-binding polyamide. Moreover, replication stress induces mutagenesis at triplex-forming repeats. Our study thereby provides a foundation for research into the biology of non-B DNA structures in human cells.
Results
Accumulation of DNA cruciform secondary structures in microsatellite unstable cancer cells
Recently, we found that (TA)n dinucleotide repeats undergo large-scale expansion in cancer cells with microsatellite instability (MSI) (van Wietmarschen et al., 2020). Depletion of the Werner helicase (WRN) in MSI cells leads to the formation of highly resected DSBs at (TA)n repeats, as detected by END-seq (van Wietmarschen et al., 2020). We proposed that expanded (TA)n repeats form non-B DNA structures that require WRN for their resolution. In the presence of WRN, (TA)n repeats were susceptible to in situ digestion with purified, recombinant MUS81-EME1 (van Wietmarschen et al., 2020), a structure-specific nuclease that cleaves cruciform structures into DSBs by a nick and counter-nick mechanism (Ehmsen and Heyer, 2008; Gaillard et al., 2003).
To provide further evidence that expanded (TA)n repeats form non-B DNA cruciform structures in MSI cancer cells, we treated MSI cell DNA in agarose plugs with the ssDNA-specific S1 nuclease, which opens hairpins and cleaves diagonally at the four-way junction of a cruciform to generate two-ended DSBs (Haniford and Pulleyblank, 1985; West et al., 1987) (Figure 1A). Following S1 treatment, DNA ends were captured by END-seq (Canela et al., 2016; Wu et al., 2021) (Figure 1A). We compared S1-END-seq signal in microsatellite unstable (MSI: RKO, KM12, SW48) and microsatellite stable (MSS: SW620 and SW837) colon cancer cell lines (Figure 1B). MSI cell lines exhibited 5.8 times more S1-END-seq peaks at (TA)n repeats than MSS cell lines (Figures 1B and 1C). S1-END-seq peaks at (TA)n repeats were symmetrical around the (TA)n sequence, with reads accumulating equally on the plus strand (right end) and minus strand (left end), as expected for S1-derived two-ended DSBs (Figure 1B). Moreover, S1-END-seq peaks detected in WRN-proficient cells overlapped significantly with the subset of expanded (TA)n repeats that underwent massive breakage and resection in the absence of WRN (Figures 1B and 1D; Figure S1A). Thus, structure-forming (TA)n repeats detectable by S1-END-seq in WRN-proficient MSI cells predict sites of breakage in WRN-deficient MSI cells.
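The overlap shown in Figure 1D is, in essence, an interval-intersection count between two peak sets. The sketch below is not the pipeline used in this study (which is not described in this section); it assumes peaks are supplied as half-open (chromosome, start, end) tuples, as in BED files, and all function names are hypothetical. It counts how many peaks in one set overlap at least one peak in the other, using a sorted index with a running maximum of end coordinates so that long intervals containing shorter ones are not missed.

```python
import bisect
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(intervals: List[Interval]) -> Index:
    """Per chromosome: sorted start coordinates plus a running max of end coordinates."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def overlaps_any(index: Index, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) overlaps at least one indexed interval."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    i = bisect.bisect_left(starts, end)  # intervals [0, i) start before `end`
    return i > 0 and max_ends[i - 1] > start


def venn_counts(a: List[Interval], b: List[Interval]) -> Tuple[int, int, int]:
    """Return (peaks in a overlapping b, peaks only in a, peaks only in b)."""
    index_a, index_b = build_index(a), build_index(b)
    a_shared = sum(overlaps_any(index_b, *iv) for iv in a)
    b_shared = sum(overlaps_any(index_a, *iv) for iv in b)
    return a_shared, len(a) - a_shared, len(b) - b_shared


# Toy example (hypothetical coordinates, not data from the study).
s1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
endseq_breaks = [("chr1", 150, 300), ("chr2", 60, 80), ("chr3", 0, 10)]
print(venn_counts(s1_peaks, endseq_breaks))  # (1, 2, 2)
```

Because one peak can overlap several peaks in the other set, "shared" counts computed from each side need not agree; this sketch reports the shared count from the first set's perspective.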
Figure 1: S1-END-seq reveals cruciform structures at expanded (TA)n repeats in microsatellite unstable cells.

(A) Schematic representation of the S1-END-seq method. Cells are embedded in agarose, S1 endonuclease converts ssDNA gaps/breaks into DSBs, and the DSB ends are ligated to biotinylated adaptors. After DNA sonication, DSBs are captured by streptavidin magnetic beads, Illumina sequencing adaptors are added to the DNA ends, and the samples are sequenced. Left-end reads align to the minus strand and right-end reads align to the plus strand. A typical two-ended DSB is displayed. (B) Genome browser screenshots as normalized read density (reads per million, RPM) for END-seq in KM12 cells after WRN knockdown (shWRN) for 48 h (top track) and S1-END-seq in MSI (KM12, SW48 and RKO) and MSS (SW620 and SW837) colon cancer cell lines (2nd to 6th tracks). Plus- and minus-strand reads are displayed in black and grey, respectively. MSI: microsatellite unstable. MSS: microsatellite stable. Black triangles represent the plus-strand repeat annotation. Statistical analysis: Student’s t-test, *p < 0.05. (C) Number of (TA)n repeats detected at S1-END-seq peaks in each cell line. (D) Venn diagram comparing END-seq breaks at (TA)n repeats in KM12 cells after WRN knockdown and S1-END-seq peaks at (TA)n repeats in WRN-proficient KM12 cells.
S1-END-seq peaks in hPu-hPy repeats
In addition to S1-END-seq signal clustering at (TA)n repeats in MSI cancer cells, we found that approximately half of all S1-END-seq peaks localized to homopurine/homopyrimidine (hPu/hPy) mirror repeats that are known to form triplex H-DNA (Figure 2A) (Mirkin and Frank-Kamenetskii, 1994). Importantly, these peaks were undetectable by END-seq without prior S1 treatment (Figures 2B and 2C), indicating that the signal originated from S1 cleavage of ssDNA structures and not from DSBs within the cell. This is distinct from the END-seq signal detected at bona fide DSBs in MSI cells depleted of WRN (Figure 1 and Figure S1). Therefore, the peaks at hPu/hPy mirror repeats correspond to structures with S1-cleavable ssDNA regions rather than DSBs.
Figure 2: S1-END-seq reveals S1-sensitive homopurine/homopyrimidine (hPu/hPy) repeats genome wide.

(A) Number of S1-END-seq peaks at hPu/hPy repeats (red), (TA)n repeats (grey) and other peaks (black) in MSI (KM12, SW48 and RKO) and MSS (SW620 and SW837) colon cancer cell lines. (B) Quantification of S1-END-seq vs. END-seq intensities (RPKM, reads per kilobase per million mapped reads) at hPu/hPy repeat peaks in two independent experimental replicates performed in parallel in KM12 cells. The top, center mark, and bottom hinges of the box plots, respectively, indicate the 90th, median, and 10th percentile values. Statistical analysis: Wilcoxon rank sum test, ****p < 0.0001. (C) Genome browser screenshots as normalized read density (reads per million, RPM) for S1-END-seq and END-seq in KM12 cells. Plus- and minus-strand reads are displayed in black and grey, respectively.
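The RPM and RPKM values used in these captions are simple library-size and length normalizations of read counts. A minimal sketch, assuming reads have already been counted per region and that the denominator is the total number of mapped reads (pipelines differ in exactly which reads they include); the function names are illustrative only.

```python
def rpm(reads: float, total_mapped_reads: float) -> float:
    """Reads per million mapped reads (as used for genome browser tracks)."""
    if total_mapped_reads <= 0:
        raise ValueError("library size must be positive")
    return reads * 1e6 / total_mapped_reads


def rpkm(reads_in_region: float, region_length_bp: int, total_mapped_reads: float) -> float:
    """Reads per kilobase of region per million mapped reads (peak quantification)."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    # Normalize for library size first, then for region length in kilobases.
    return rpm(reads_in_region, total_mapped_reads) * 1000 / region_length_bp


print(rpkm(500, 2000, 25_000_000))  # 10.0
```

For example, a 2 kb peak with 500 reads in a library of 25 million mapped reads has an RPKM of 10.0; the length term lets peaks of different widths be compared on one scale.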
S1-END-seq peaks at (TA)n repeats in MSI cells exhibit symmetrical plus- and minus-strand reads, corresponding to the left and right ends of a cleaved cruciform DNA structure (Figure 1 and Figure S1B). Peaks at hPu/hPy mirror repeats were distinct in that they exhibited a consistent strand polarity (Figures 3A and 3B; Figure S1B). When homopyrimidine repeats were on the plus strand, asymmetrical plus-strand S1-END-seq peaks were detected, whereas homopurine repeats on the plus strand gave asymmetrical minus-strand peaks (Figures 3A and 3B; Figure S1C). Sequencing reads were distributed from the center to the edge of the repeats (Figure 3B, left panel), but peak intensity was greatest near the border, with the 5’ ends of reads (the first nucleotide sequenced) also clustering near the border (Figure 3B, right panel). This distribution was not caused by low mappability within the center of the repeat, as S1-END-seq using paired-end sequencing across the repeats revealed an identical pattern (Figure S2). Homopurine runs on the plus strand showed S1-END-seq peak enrichment flanking the repeat upstream, whereas homopyrimidine runs on the plus strand showed the opposite pattern, with enrichment downstream of the repeats (Figure 3B and Figure S2). Thus, in addition to having strand polarity, S1-END-seq reads associated with hPu/hPy repeats map to the border of these structures.
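The strand polarity described above can be summarized per peak as the fraction of reads on the plus strand: values near 0.5 indicate a symmetric, two-ended break (as at (TA)n cruciforms), and values near 0 or 1 indicate a single-ended, polarized signal. The sketch below assumes per-peak plus/minus read counts are already available; the 0.8 cutoff and all function names are hypothetical choices for illustration, not thresholds taken from the study. The expected-polarity table simply encodes the observation above (hPy on the plus strand with plus-strand peaks, hPu on the plus strand with minus-strand peaks).

```python
def plus_fraction(plus_reads: int, minus_reads: int) -> float:
    """Fraction of a peak's reads that map to the plus strand."""
    total = plus_reads + minus_reads
    if total == 0:
        raise ValueError("peak has no reads")
    return plus_reads / total


def classify_peak(plus_reads: int, minus_reads: int, cutoff: float = 0.8) -> str:
    """Label a peak 'plus', 'minus' or 'symmetric' (cutoff is a hypothetical choice)."""
    f = plus_fraction(plus_reads, minus_reads)
    if f >= cutoff:
        return "plus"
    if f <= 1 - cutoff:
        return "minus"
    return "symmetric"


def repeat_class(plus_strand_seq: str) -> str:
    """'hPu' if the plus-strand repeat is all purines, 'hPy' if all pyrimidines."""
    bases = set(plus_strand_seq.upper())
    if not bases:
        raise ValueError("empty sequence")
    if bases <= set("AG"):
        return "hPu"
    if bases <= set("CT"):
        return "hPy"
    return "mixed"


# Encodes the observation in the text: hPy on plus -> plus-strand peak,
# hPu on plus -> minus-strand peak.
EXPECTED_POLARITY = {"hPy": "plus", "hPu": "minus"}


def matches_expected(plus_strand_seq: str, plus_reads: int, minus_reads: int) -> bool:
    """True if a peak's observed polarity matches the expectation for its repeat."""
    kind = repeat_class(plus_strand_seq)
    return EXPECTED_POLARITY.get(kind) == classify_peak(plus_reads, minus_reads)


print(classify_peak(50, 48))                       # symmetric, cruciform-like
print(matches_expected("GAAAGAAAGAAA", 4, 70))     # True
```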
Figure 3: S1-END-seq peaks in hPu/hPy mirror repeats display asymmetric strand polarity.

(A) Representative genome browser screenshots as normalized read density (reads per million, RPM) for S1-END-seq peaks at hPu/hPy repeats (GAAA and TTTC) in five different colon cancer cell lines: KM12, SW48, RKO, SW620 and SW837. Black triangles represent the plus-strand repeat annotation. Plus- and minus-strand reads are displayed in black and grey, respectively. (B) Aggregate plots (top) and heatmaps (bottom) of S1-END-seq intensity within 500 bp flanking the center of S1-sensitive hPu/hPy mirror repeats in KM12. The data display S1-END-seq intensity using the full read length (left) or only the first (5’) nucleotide sequenced (right). (C) Schematic representation of potential H-DNA structures (H-r5 and H-y3) that are consistent with the strand bias observed in S1-END-seq peaks. Homopurine (hPu) mirror repeats are represented in red and homopyrimidine (hPy) mirror repeats are represented in blue.
S1-END-seq signal strand bias at hPu/hPy repeats was consistent across all five MSI and MSS colon cancer cell lines (Figure S1C). Moreover, the genomic locations of the peaks were highly conserved across the different cell lines (Figure S3A). The number of peaks at hPu/hPy mirror repeats in individual cell lines ranged from 5,554 to 9,474, of which 3,110 were shared across all five cell lines (Figure S3A). The peaks shared among the five cell lines are hereafter referred to as “S1-sensitive” H-motifs. Although hPu/hPy mirror repeats are abundant in the human reference genome (approximately 50,000 annotated), S1-sensitive H-motifs mapped to significantly longer repeats, averaging 202 bp (Figure S3B). S1-sensitive sites also tended to have fewer sequence interruptions in the H-motifs, suggesting that both hPu/hPy length and purity contribute to structure formation (Figure S3C). Among all H-motifs, (GAAA)n/(TTTC)n repeats were the most sensitive to S1 cleavage, both in terms of the absolute number of S1-END-seq peaks and relative to the total number of repeats within each hPu/hPy class, followed by (GGAA)n/(TTCC)n and (GAA)n/(TTC)n (Figure S3D). Finally, S1-sensitive sites tended to be late replicating and were enriched in distal intergenic regions (Figures S3E and S3F).
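The roles of repeat length and purity can be illustrated by scanning one strand for homopurine and homopyrimidine tracts. This is a simplified sketch, not the annotation used in the study: it finds maximal pure [AG] or [CT] runs, merges runs of the same class separated by at most max_gap interrupting bases, and reports each tract's length and purity (fraction of in-class bases). It does not test for mirror symmetry, which H-DNA formation also requires, and the defaults (min_length=20, max_gap=1) are arbitrary assumptions.

```python
import re
from dataclasses import dataclass
from typing import List

PATTERNS = {"hPu": re.compile(r"[AG]+"), "hPy": re.compile(r"[CT]+")}
IN_CLASS = {"hPu": "AG", "hPy": "CT"}


@dataclass
class Tract:
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    kind: str      # "hPu" or "hPy", relative to the scanned strand
    purity: float  # fraction of bases in the tract's class

    @property
    def length(self) -> int:
        return self.end - self.start


def find_tracts(seq: str, min_length: int = 20, max_gap: int = 1) -> List[Tract]:
    """Find homopurine/homopyrimidine tracts, tolerating short interruptions."""
    seq = seq.upper()
    tracts: List[Tract] = []
    for kind, pattern in PATTERNS.items():
        merged: List[List[int]] = []
        for m in pattern.finditer(seq):
            # Merge with the previous run if only a few off-class bases separate them.
            if merged and m.start() - merged[-1][1] <= max_gap:
                merged[-1][1] = m.end()
            else:
                merged.append([m.start(), m.end()])
        for start, end in merged:
            if end - start < min_length:
                continue
            window = seq[start:end]
            in_class = sum(window.count(base) for base in IN_CLASS[kind])
            tracts.append(Tract(start, end, kind, in_class / (end - start)))
    return sorted(tracts, key=lambda t: t.start)


# A (GAA)n tract with a single T interruption, flanked by C runs (toy sequence).
example = "CCCC" + "GAA" * 10 + "T" + "GAA" * 10 + "CCCC"
for t in find_tracts(example):
    print(t.kind, t.start, t.end, t.length, round(t.purity, 3))  # hPu 4 65 61 0.984
```

Lowering max_gap to 0 splits the example into two purer but shorter tracts, which mirrors the trade-off between length and purity discussed above.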
Homopurine-homopyrimidine mirror repeats form DNA triplexes in vivo
hPu/hPy mirror repeats can adopt intramolecular triple-helical DNA structures called H-DNA in vitro (Frank-Kamenetskii and Mirkin, 1995), and the likelihood of H-DNA formation increases with the length of an hPu/hPy mirror repeat (Lyamichev et al., 1989). Within this structure, a DNA strand corresponding to one half of the repeat folds back and anneals to the duplex half of the repeat, forming a triplex. Importantly, the strand that is complementary to the triplex-forming strand becomes single-stranded and susceptible to cleavage by S1 nuclease (Frank-Kamenetskii and Mirkin, 1995; Mirkin et al., 1987; Sakamoto et al., 2001; Wells et al., 1988). After complete S1 cleavage, only one end of the structure would be double-stranded (the other end would contain triplex DNA), and thus during END-seq, a sequencing adapter could be ligated to only one DNA end. Therefore, our finding that S1-END-seq yields asymmetric, single-ended peaks localized to one border of long, uninterrupted hPu/hPy mirror repeats suggests the presence of H-DNA at these sites (Figure 3C).
DNA triplexes consist of either one pyrimidine and two purine strands (YR*R triplex) or one purine and two pyrimidine strands (YR*Y) (Figure 3C). Two isoforms of H-DNA have also been observed (Mirkin and Frank-Kamenetskii, 1994): one single-stranded in the 5’ part of the purine or pyrimidine strand (H-r3 or H-y3, respectively) and the other single-stranded in the 3’ part of the corresponding strands (H-r5 or H-y5, respectively). The polarity of the observed S1-END-seq signal would be consistent with either H-r5 or H-y3 (Figure 3C).
To quantify the number of H-DNA structures per cell, we mixed asynchronous KM12 cells at a 5:1 ratio with a spike-in mouse pre-B cell line that contains a single zinc-finger-induced DSB at the TCRβ enhancer (Canela et al., 2016). In two independent experiments, we estimated that there were 312 and 270 S1-END-seq peaks at hPu/hPy repeats per cell, revealing that hundreds of H-DNA structures can be present in a given cell.
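The per-cell estimate follows from spike-in calibration: the spike-in break provides a known number of DNA ends per spike-in cell, so reads per end can be calibrated and applied to the H-DNA peaks. The sketch below is a hypothetical reconstruction under stated assumptions, not the authors' exact calculation: each spike-in cell carries one DSB contributing two captured ends, each S1-cleaved H-DNA structure contributes one captured end (single-ended signal), S1 cleavage is complete, capture and mapping efficiencies per end are equal in both genomes, and KM12 and spike-in cells are mixed 5:1. The read counts below are toy numbers, not data from the study.

```python
def structures_per_cell(
    peak_reads: float,
    spike_reads: float,
    host_to_spike_ratio: float = 5.0,
    ends_per_structure: int = 1,
    ends_per_spike_break: int = 2,
    breaks_per_spike_cell: int = 1,
) -> float:
    """Estimate structures per host cell from spike-in-calibrated read counts (sketch).

    Spike-in reads come from N_spike cells x (ends per spike-in cell), so
    spike_reads / ends_per_spike_cell is the read yield of "one end in every
    spike-in cell". Host reads at peaks divided by that yield gives ends per
    spike-in cell; dividing by ends_per_structure and by the host:spike-in cell
    ratio converts this to structures per host cell.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads must be positive")
    ends_per_spike_cell = ends_per_spike_break * breaks_per_spike_cell
    return (peak_reads * ends_per_spike_cell) / (
        spike_reads * ends_per_structure * host_to_spike_ratio
    )


# Toy numbers only: 312,000 reads at hPu/hPy peaks and 400 reads at the spike-in break.
print(structures_per_cell(312_000, 400))  # 312.0
```

Each assumption enters the estimate linearly, so, for example, incomplete S1 cleavage or lower capture efficiency at H-DNA ends would make this kind of estimate a lower bound.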
Evidence that H-DNA forms in vivo
Acidic pH can promote the formation of H-DNA through cytosine protonation (Mirkin and Frank-Kamenetskii, 1994). Because acidic pH is necessary for S1 nuclease activity, one possibility is that these structures form ex vivo during S1 treatment in the agarose plug. To assess this, we developed a new protocol using P1 nuclease, which can process ssDNA at neutral pH. This method, which we refer to as P1-END-seq, revealed thousands of peaks in hPu/hPy repeats that overlap with S1-END-seq peaks and have a similar repeat profile (Figure S4A–D). Therefore, acidic pH, though favoring H-DNA formation in vitro, does not account for the triplex formation at AT-rich hPu/hPy repeats that we detect in vivo.
To further test whether S1-END-seq reveals H-DNA structures formed in cells as opposed to ex vivo, we chemically interfered with repeat folding into H-DNA in vivo. Friedreich’s ataxia (FRDA) is caused by expansion of a (GAA)n/(TTC)n repeat in the first intron of the FXN gene, which encodes the mitochondrial protein frataxin (Campuzano et al., 1996). Unaffected individuals harbor 8 to 34 repeats, while affected individuals have more than 70 and very often hundreds of repeats (Filla et al., 1996). Lymphoblasts derived from an FRDA patient (GM15850), but not from an unaffected sibling (GM15851) (Figure 4A), harbored S1-END-seq peaks at the expanded (GAA)n repeat within intron 1 of the FXN locus (Figure 4C). Thus, H-DNA is associated with pathological (GAA)n expansion.
Figure 4: H-DNA formation at (GAA)n repeats is suppressed by a (GAA)n-binding polyamide in vivo.

(A) Schematic representation of (GAA)n repeat size within the first intron of the FXN locus in lymphoblastoid cell lines derived from an FRDA patient (GM15850) and the patient's unaffected sibling (GM15851). (B) Analysis of cell cycle distribution by EdU (S phase) and DAPI (nucleus) staining after treatment with the polyamide PA1 (1 μM) or vehicle (DMSO) for 48 hours. (C) Genome browser screenshots shown as RPM (reads per million) for S1-END-seq at FXN intron 1 of GM15851 and GM15850 treated with PA1 or DMSO for 48 hours. Plus- and minus-strand reads are displayed in black and grey, respectively. Black triangles represent the plus-strand repeat annotation. The (GAA)n repeat annotated in the reference genome is shown. (D) Quantitative analysis of S1-END-seq signal in RPKM (reads per kilobase per million mapped reads) at peaks in (GAA)n repeats from GM15851 or GM15850 cells treated with PA1 or DMSO for 48 hours. The top, center mark, and bottom hinges of the box plots, respectively, indicate the 90th, median, and 10th percentile values. Statistical analysis: Wilcoxon rank sum test, ****p < 0.0001.
Beta-alanine-linked pyrrole-imidazole polyamides are known to bind (GAA)n tracts in B-DNA with high affinity and to disrupt the DNA-DNA annealing necessary for triplex formation in long, uninterrupted (GAA)n repeats (Burnett et al., 2006; Erwin et al., 2017). We treated cells with the (GAA)n-binding polyamide PA1 for 48 hours and then washed them extensively before processing by S1-END-seq. PA1 decreased the intensity of S1-END-seq peaks at the FXN locus in FRDA cells (Figure 4C) without affecting cell cycle distribution (Figure 4B). In addition, PA1 caused a genome-wide decrease of S1-END-seq peaks at (GAA)n repeats in both FRDA and healthy-donor cells (Figure 4D and Figure S4E). In contrast, PA1 had only a minimal impact on S1-END-seq peaks at (GA)n, (GAAA)n and (GGAA)n repeats, to which it does not bind (Figure S4E). The finding that disruption of (GAA)n triplexes by PA1 in cells reduces S1-END-seq signal at these repeats provides additional evidence that H-DNA is formed in vivo.
DNA triplexes are replication-dependent and cell-specific structures
H-DNA formation is thermodynamically unfavorable in linear double-stranded DNA but becomes favorable during DNA replication (Krasilnikova and Mirkin, 2004; Samadashwily et al., 1993). To examine whether DNA replication contributes to H-DNA formation, we first treated cells with aphidicolin (APH), which induces replication stress and increases the proportion of S-phase cells (Figure 5A). APH-treated cells showed increased S1-END-seq signal intensity and an increased number of peaks in hPu/hPy repeats (Figure 5A; Figures S5A and S5B). Next, we treated cells with either CDK4/6 or CDK1 inhibitors to arrest them in G1 or G2, respectively (Figures 5B and 5C). We observed a 4.1-fold decrease in H-DNA peak intensity in G1-arrested cells and a 2.4-fold decrease in G2-arrested cells (Figures 5B and 5C), suggesting that H-DNA formation is cell cycle-dependent, occurring predominantly during S phase. We also compared G1-arrested and asynchronous MCF10A, GM15850 and SW48 cells and found that G1 arrest reduced the S1-END-seq signal at hPu/hPy peaks in all cases (Figure S5C).
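Several comparisons (Figures 2B, 4D and 5) use a Wilcoxon rank sum test on per-peak RPKM values, and the text reports fold decreases in peak intensity. The sketch below implements the two-sided rank-sum test with the large-sample normal approximation (tie-corrected, no continuity correction) using only the standard library; in practice one would typically call scipy.stats.mannwhitneyu. How the 4.1- and 2.4-fold values were computed is not stated here, so the ratio-of-medians helper is an assumption, not the authors' definition.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided p-value). Ties receive average ranks
    and the variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def median_fold_change(reference: Sequence[float], treated: Sequence[float]) -> float:
    """Ratio of medians, one hypothetical way to express a 'fold decrease'."""
    return median(reference) / median(treated)


# Toy RPKM values (not data from the study).
print(rank_sum_test([1, 2, 3], [4, 5, 6]))  # U = 0.0, p close to 0.05
print(median_fold_change([4, 8, 12], [1, 2, 3]))  # 4.0
```

With thousands of peaks per condition, as here, the normal approximation is accurate and very small p-values such as p < 0.0001 are expected even for moderate shifts, so effect sizes like fold changes are worth reporting alongside them.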
Figure 5: H-DNA is formed during replication.

(A, B and C) Analysis of cell cycle distribution by EdU (S phase) and DAPI (nucleus) staining (left panels) and quantification of S1-END-seq peaks in hPu/hPy mirror repeats (right panels) after treatment with (A) aphidicolin (APH, 600 nM), (B) CDK4/6 inhibitor (Palbociclib, 10 μM), (C) CDK1 inhibitor (RO-3306, 10 μM), or vehicle (DMSO) for 24 hours. Experiments were performed in KM12 cells. The top, center mark, and bottom hinges of the box plots, respectively, indicate the 90th, median, and 10th percentile values. Statistical analysis: Wilcoxon rank sum test, ****p < 0.0001.
To determine whether H-DNA formation is generally associated with replication, we performed S1-END-seq in human induced pluripotent stem cells (iPSCs) undergoing differentiation into glutamatergic neurons (i3Neurons) (Fernandopulle et al., 2018; Wang et al., 2017) (Figure 6A). After 5 days, differentiating iPSCs exit the cell cycle, and they are fully differentiated into i3Neurons by day 7 (Fernandopulle et al., 2018; Wang et al., 2017; Wu et al., 2021). Interestingly, unlike the cancer cell lines analyzed, asynchronously growing iPSCs showed virtually no H-DNA S1-END-seq signal (Figures 6B and 6C). However, induction of neuronal differentiation led to the transient generation of thousands of H-DNA peaks, which gradually disappeared as cells continued to differentiate and ultimately exited the cell cycle (Figures 6B and 6C). These results demonstrate that H-DNA is not a feature of all proliferating cells but is dynamically regulated during the cell cycle and differentiation, further supporting the conclusion that H-DNA is formed in vivo.
Figure 6: DNA triplexes are transiently created during the induction of iPSC differentiation into neurons.

(A) Schematic representation of human induced pluripotent stem cell (iPSC) differentiation and cell cycle exit upon neuronal induction via the i3N protocol. (B) Quantification of peaks at hPu/hPy repeats in RPKM (reads per kilobase per million mapped reads) and (C) genome browser screenshots shown as RPM (reads per million) from S1-END-seq performed in asynchronous iPSCs and in iPSCs 1, 2, 3 or 5 days after induction of neuronal differentiation via the i3N protocol. Plus- and minus-strand reads are displayed in black and grey, respectively. (D) (left) S1-END-seq genome browser screenshots shown as RPM and (right) quantification of peaks at hPu/hPy repeats in RPKM in primary normal human epithelial keratinocytes (NHEK) and in HACAT, a transformed cell line derived from human epithelial keratinocytes. Plus- and minus-strand reads are displayed in black and grey, respectively.
All asynchronous transformed cell lines that we tested showed high levels of S1-END-seq signal at triplex-forming sequences, whereas iPSCs did not (see above). We speculated that triplex formation might be associated with the high levels of replication stress in transformed cell lines (Kotsantis et al., 2018). To address this, we first performed S1-END-seq in primary human skin fibroblasts, which also showed low levels of triplex formation (Figure 6B). We then compared S1-END-seq signal in asynchronous primary normal human epidermal keratinocytes (NHEK) and their transformed counterpart (HACAT) (Boukamp et al., 1988). Similar to primary human skin fibroblasts and iPSCs, NHEK cells harbored virtually no triplex S1-END-seq signal (Figure 6D). In contrast, HACAT cells showed an approximately 15-fold increase in triplex-associated S1-END-seq peaks compared with NHEK cells (Figure 6D). Taken together, these data indicate that triplex formation may be enhanced in transformed cells, although triplexes can also form in primary cells (e.g., iPSCs at days 1–3 of neuronal differentiation).
DNA triplexes are hotspots of genome instability
Alternative DNA structures are known to promote genome instability (Wang and Vasquez, 2014). Recently, whole genome sequencing of 10 cancer types showed that somatic mutations are elevated specifically within H-DNA motifs (Georgakopoulos-Soares et al., 2018). Consistent with this, we observed a significant enrichment of cancer somatic mutations from the International Cancer Genome Consortium dataset (Consortium, 2020) at S1-sensitive H-motifs: S1-sensitive hPu/hPy sequences harbored more mutations than hPu/hPy sequences insensitive to S1 treatment (Figure 7A). Thus, mutability is increased at sites that form H-DNA in vivo.
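Aggregate plots such as Figure 7A summarize mutation frequency as a function of distance from repeat centers. A minimal sketch, assuming motif centers and mutation positions are given as (chromosome, position) pairs in the same coordinate system; motif strand orientation is ignored here, and the flank and bin sizes are arbitrary. Running it once for S1-sensitive and once for S1-insensitive H-motifs gives the two curves to compare; the function name is hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def aggregate_profile(
    centers: Sequence[Tuple[str, int]],
    mutations: Sequence[Tuple[str, int]],
    flank: int = 500,
    bin_size: int = 50,
) -> List[float]:
    """Mean mutations per motif in bins spanning [-flank, +flank) around motif centers."""
    if not centers:
        raise ValueError("need at least one motif center")
    if (2 * flank) % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    n_bins = 2 * flank // bin_size
    totals = [0] * n_bins
    for chrom, center in centers:
        positions = by_chrom.get(chrom, [])
        for b in range(n_bins):
            lo = center - flank + b * bin_size
            hi = lo + bin_size
            # Count mutations with lo <= position < hi via binary search.
            totals[b] += bisect.bisect_left(positions, hi) - bisect.bisect_left(positions, lo)
    return [t / len(centers) for t in totals]


# Toy example: two motif centers, three nearby mutations (hypothetical positions).
profile = aggregate_profile(
    [("chr1", 1000), ("chr1", 5000)],
    [("chr1", 1000), ("chr1", 1010), ("chr1", 4990), ("chr2", 1000)],
    flank=100,
    bin_size=50,
)
print(profile)  # [0.0, 0.5, 1.0, 0.0]
```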
Figure 7: H-DNA forming repeats are hotspots for genome instability.

(A) Aggregate plot comparing the frequency of somatic single nucleotide variants (SNVs) in cancer genomes from the International Cancer Genome Consortium at S1-sensitive H-motifs (shared peaks, see Figure S3) versus S1-insensitive H-motifs (annotated hPu/hPy repeats excluding peaks detected by S1 in the 5 colon cancer cell lines) relative to the center of the hPu/hPy repeats. (B) RPE-MLH1 knockout cells were plated on 10 cm plates and treated the next day with 200 nM APH for 24 hours. Cells were allowed to recover in APH-free medium for two to three days. This cycle of APH treatment was repeated 20 times before single-cell clones were picked. Whole genome sequencing was then performed using PacBio long-read sequencing. (C) Aggregate plots comparing the frequency of somatic mutations (left), structural variation breakpoints (middle) and indels (right) at the center of S1-sensitive H-motifs versus S1-insensitive H-motifs for one of the APH-pulsed clones. Analyses of two other clones are shown in Figure S6B. (D) Base substitution profile in S1-sensitive H-motifs (top) and genome-wide (bottom) in RPE-MLH1 knockout cells pulsed with APH. (E) Comparison of the fraction of large (>20 bp) deletions and insertions in S1-sensitive H-motifs, S1-insensitive H-motifs and all large deletions and insertions (All) in RPE-MLH1 knockout cells pulsed with APH. Analyses of two other clones are shown in Figure S7A and B.
Given that replication stress generated by APH treatment is mutagenic (Arlt et al., 2009) and that APH increases the frequency of H-DNA (Figure 5A), we asked whether repeated cycles of APH treatment could induce mutagenesis in H-DNA-forming motifs. To increase the frequency of different classes of mutations, we knocked out the mismatch repair gene MLH1 in RPE-hTERT cells by CRISPR/Cas9-mediated editing (Figure S6A). We then treated the MLH1−/− parental clone with a low dose of APH for 24 hours and allowed the cells to recover for 2 to 3 days. This cycle of APH treatment and recovery was repeated 20 times, after which individual clones were isolated. Whole genome sequencing was performed on the 20-times pulsed cells using the Pacific Biosciences (PacBio) long-read SMRT sequencing platform (Figure 7B). Mutations in three individual APH-pulsed clones were analyzed using the MLH1−/− parental clone that did not receive APH pulses as the reference genome. We found that APH treatment induced a large increase in the frequency of single nucleotide variants (SNVs), small insertions/deletions (indels) and structural variation (including translocations, inversions, large deletions/insertions and copy number changes) at S1-sensitive hPu/hPy repeats relative to S1-insensitive hPu/hPy repeats in all three clones (Figures 7C and S6B).
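The comparison in Figures 7C and S6B rests on two steps: identifying mutations present in an APH-pulsed clone but absent from the unpulsed parental clone, and normalizing counts by the total length of each motif set. The sketch below simplifies the first step to exact matching on (chromosome, position, ref, alt), which real long-read variant-calling pipelines do not rely on, and assumes each motif set is a list of non-overlapping half-open intervals; all names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)
Interval = Tuple[str, int, int]      # (chromosome, start, end), half-open


def clone_specific(clone: Iterable[Variant], parental: Iterable[Variant]) -> Set[Variant]:
    """Variants called in the pulsed clone but not in the parental reference."""
    return set(clone) - set(parental)


def mutations_per_kb(variants: Iterable[Variant], intervals: List[Interval]) -> float:
    """Variants falling inside the (non-overlapping) intervals, per kb of interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    total_bp = 0
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
        total_bp += end - start
    if total_bp <= 0:
        raise ValueError("intervals must have positive total length")
    starts: Dict[str, List[int]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _ in ivs]
    hits = 0
    for chrom, pos, _ref, _alt in variants:
        if chrom not in starts:
            continue
        i = bisect.bisect_right(starts[chrom], pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            hits += 1
    return hits * 1000 / total_bp


# Toy example (hypothetical variants and motif intervals).
parental = {("chr1", 150, "C", "T"), ("chr1", 900, "A", "G")}
clone = {("chr1", 150, "C", "T"), ("chr1", 120, "G", "A"),
         ("chr1", 5000, "T", "C"), ("chr2", 30, "C", "T")}
new = clone_specific(clone, parental)
print(mutations_per_kb(new, [("chr1", 100, 300)]))                        # 5.0
print(mutations_per_kb(new, [("chr1", 4000, 4200), ("chr2", 0, 200)]))    # 2.5
```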
The analysis of single base substitution signatures revealed that mutations at S1-sensitive sites in APH-pulsed cells exhibited an enrichment of C-to-T and T-to-C transitions (Figures 7D and S7A, B). The mutational profile at S1-sensitive sites was distinct from that of overall base substitutions, which resembled the mutational signatures SBS26 and SBS44 associated with MMR deficiency (Alexandrov et al., 2020) (Figure 7D and Figure S7A, B). SNVs were enriched in distal intergenic regions (Figure S7C), consistent with the presence of S1-sensitive repeats at these loci (Figure S3E). Both insertions and deletions occurred within S1-sensitive sites in APH-pulsed cells, but large deletions were enriched over insertions (Figure 7E and Figure S7D). This is reminiscent of recent findings in yeast suggesting that triplex-forming repeats are susceptible to large-scale contractions during the processing of the lagging DNA strand (Khristich et al., 2020). Together, these data demonstrate that H-DNA-forming repeats detected by S1-END-seq are prone to genome instability upon replication stress.
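Base substitution profiles like those in Figure 7D are conventionally reported relative to the pyrimidine of the mutated base pair, so that, for example, G>A and C>T fall into the same class. The sketch below applies that standard six-class collapse; it omits the flanking-base (trinucleotide) context needed to compare against COSMIC signatures such as SBS26 and SBS44, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SBS6 = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def sbs6_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto its pyrimidine reference base."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT" or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":
        # Purine reference: report the same event on the complementary strand.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def sbs6_profile(substitutions: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of substitutions in each of the six pyrimidine-centered classes."""
    counts = Counter(sbs6_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no substitutions")
    return {cls: counts[cls] / total for cls in SBS6}


print(sbs6_class("G", "A"))  # C>T
print(sbs6_profile([("G", "A"), ("C", "T"), ("T", "C"), ("A", "C")]))
```

Comparing such a profile between S1-sensitive sites and the whole genome is the kind of contrast described above, where transitions (C>T, T>C) dominate at S1-sensitive sites.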
Discussion
Sequences that have the capacity to adopt alternative DNA secondary structures have been implicated in numerous hereditary diseases and cancer (Georgakopoulos-Soares et al., 2018; Kaushal and Freudenreich, 2019; Kurahashi et al., 2006; Lopez Castel et al., 2010; McMurray, 2010; Mirkin, 2007; Zhao et al., 2018). The presence of long structure-forming repeats can inhibit DNA replication (Follonier et al., 2013; Gerhardt et al., 2016; Liu et al., 2012; Mirkin and Mirkin, 2007), repress transcription (Belotserkovskii et al., 2013; Grabczyk and Usdin, 2000; Robinson et al., 2021), and promote genome instability (Kaushal and Freudenreich, 2019; van Wietmarschen et al., 2020; Zhang and Freudenreich, 2007; Zhao et al., 2018). For example, highly expanded (TA)n repeats found in cancers with MSI are susceptible to replication fork stalling, collapse and chromosomal deletions (van Wietmarschen et al., 2020). Here, we provide further evidence that the same expanded (TA)n repeats in MSI cell lines that require WRN helicase for their resolution extrude into stable cruciform structures. Previous studies could not detect expansion of (TA)n repeats by PCR amplification or short-read sequencing because Taq polymerases are unable to progress through these sites. Instead, Southern blotting or long-read PacBio SMRT sequencing, which uses polymerases with strand-displacement activity, was necessary (van Wietmarschen et al., 2020). In the current study, we show that S1-END-seq provides an alternative means to detect cruciform-forming (TA)n expansions, a biomarker of microsatellite instability.
Our study also revealed the existence of thousands of hPu/hPy mirror repeats that can form H-DNA structures in the human genome. Based on our findings that H-DNA in human cells is 1) enriched in cycling cells; 2) disrupted by treatment with specific polyamides; 3) enhanced in transformed cells; 4) transiently formed during iPSC differentiation into neurons; 5) increased by replication stress; and 6) targeted for mutagenesis, we conclude that S1-END-seq demonstrates the existence of H-DNA structures in vivo.
We observe thousands of DNA triplexes in transformed human cell lines, whereas human iPSCs, primary human skin fibroblasts and normal human epithelial keratinocytes show virtually no triplex formation. Unexpectedly, during iPSC differentiation into neurons, thousands of H-DNA structures form transiently, until the cells exit the cell cycle. Why H-DNA forms specifically during iPSC differentiation, and its potential impact on genome function, remain to be determined.
During replication, a transiently single-stranded segment of the lagging-strand template, called the Okazaki initiation zone, is formed (DePamphilis and Wassarman, 1980). If not immediately coated by RPA, this ssDNA has the potential to fold into alternative DNA structures such as triplexes (Khristich et al., 2020; Krasilnikova and Mirkin, 2004). Interestingly, the average length of S1-sensitive triplexes (Figure S3B) is similar to the size of an Okazaki fragment (~200 bp) (Ogawa and Okazaki, 1980). Because the affinity of RPA for purines is approximately 50-fold lower than its affinity for pyrimidines (Kim et al., 1992), long stretches of purine ssDNA formed during lagging-strand synthesis may go unprotected, increasing the probability of triplex formation. Indeed, previous reports in yeast and mammalian cells have shown that hPu/hPy mirror repeats can block replication fork progression when purines are replicated on the lagging strand (Follonier et al., 2013; Kim et al., 2008; Krasilnikova and Mirkin, 2004).
Non-B DNA motifs, including potential H-DNA-forming sequences, are correlated with increased mutability in cancer (Georgakopoulos-Soares et al., 2018; Guiblet et al., 2021; Zhao et al., 2018). Consistent with this, we found that a subset of sequences with triplex-forming potential (S1-END-seq peaks) is enriched in mutations in cancers. Moreover, we demonstrate that replication stress induces genetic instability targeted to hPu/hPy triplex repeats. The high levels of replication stress in many cancer cells frequently lead to RPA exhaustion (Toledo et al., 2017; Toledo et al., 2013). This could create naked single-stranded segments in the lagging strand that would be predicted to fuel triplex formation (Follonier et al., 2013; Kim et al., 2008; Krasilnikova and Mirkin, 2004), which in turn could explain why cancer cell lines harbor higher levels of H-DNA than primary cells. Further studies will be needed to explain the variability in H-DNA abundance across cell types and its dynamics during the cell cycle and differentiation.
Limitations of the study
The detection of alternative DNA structures by S1-END-seq is limited to ssDNA regions that are accessible to the S1 nuclease and are converted into DSBs. For this reason, some dynamic DNA structures reported to form in vivo, such as transcription bubbles, replication forks, double-helical Z-DNA and G-quadruplexes, might not be easily detectable by S1-END-seq. Moreover, S1-END-seq can only detect alternative DNA structures that form recurrently at the same genomic locations across a population of cells.
A potential caveat is that S1-END-seq relies on DNA dechromatinization, which creates negative DNA supercoiling, and on low pH, which favors H-DNA. Because both factors are known to fuel triplex formation in vitro (Mirkin and Frank-Kamenetskii, 1994), they could theoretically generate triplexes during sample processing for S1-END-seq. However, we provide several lines of evidence supporting the conclusion that DNA triplexes form in vivo. These include the overlap between triplexes detected by S1-END-seq and by P1-END-seq carried out at physiological pH; their dynamics during the cell cycle; their abundance in transformed vs. non-transformed cells; their transient presence during cell differentiation; and their association with genome instability.
STAR Methods
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Andre Nussenzweig (andre_nussenzweig@nih.gov).
Materials availability
All unique/stable reagents generated in this study are available from the Lead Contact with a completed Materials Transfer Agreement.
Data and Code Availability
The datasets reported in this paper are available at GEO under accession number GSE204808.
This paper does not report original code.
Any additional information required to reanalyze the data reported in this paper is available from the Lead Contact upon request.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Cell lines and cell culture
The KM12 cell line containing a doxycycline (dox)-inducible shWRN transgene was previously generated (Chan et al., 2019). Colon cancer (KM12, SW837, SW48, SW620 and RKO), RPE-hTERT TP53KO, HEK293T, HACAT and HeLa cell lines were grown in medium supplemented with 10% fetal bovine serum (FBS), penicillin 100 μg ml−1 (Sigma Aldrich), streptomycin 100 μg ml−1 (Sigma Aldrich), and L-glutamine 292 μg ml−1 (Gibco). KM12, SW837, and SW48 cells were grown in RPMI 1640 medium (Gibco), and SW620 cells were grown in Leibovitz’s L-15 medium (Gibco). HACAT, HeLa, RPE1-hTERT, HEK293T and RKO cells were grown in Dulbecco’s Modified Eagle Medium (DMEM; Gibco). MCF10A cells were cultured in DMEM/F-12 (Gibco) containing 5% horse serum (Gibco), human epidermal growth factor 20 ng ml−1 (Sigma Aldrich), hydrocortisone 0.5 μg ml−1 (Sigma Aldrich), cholera toxin 100 ng ml−1 (Sigma Aldrich), insulin 10 μg ml−1 (Sigma Aldrich), penicillin 100 μg ml−1 (Sigma Aldrich) and streptomycin 100 μg ml−1 (Sigma Aldrich). Primary human skin fibroblasts (passages 12–14) were grown in DMEM (Gibco) supplemented with 20% FBS, penicillin 100 μg ml−1 (Sigma Aldrich), streptomycin 100 μg ml−1 (Sigma Aldrich), and L-glutamine 292 μg ml−1 (Gibco). Normal human epithelial keratinocytes (passages 5–7) were grown in KGM™ Gold Keratinocyte Growth Medium BulletKit™ (Lonza). Lymphoblasts from a Friedreich’s ataxia (FRDA) patient (GM15850) and an unaffected sibling (GM15851) were grown in RPMI 1640 medium (Gibco) supplemented with 15% FBS, penicillin 100 μg ml−1 and streptomycin 100 μg ml−1. Cell lines were tested for mycoplasma contamination. Cells were maintained at 37°C and 5% CO2.
iPS cell culture
For the studies presented in this paper, we used iPS cells from the WTC11 line, derived from a healthy human male and obtained from the Coriell cell repository. All policies of the NIH Intramural Research Program for the registration and use of this iPS cell line were followed. In brief, iPS cells were grown on tissue culture dishes coated with human embryonic stem cell-qualified Matrigel (Corning). Upon Matrigel removal, cells were maintained in Essential 8 (E8) Medium (Thermo Fisher) supplemented with 10 μM ROCK inhibitor (RI; Selleckchem) in a 37 °C, 5% CO2 incubator. Medium was replaced every 1–2 days as needed. Cells were passaged with accutase (Thermo Fisher) for 5–10 min at 37 °C. Accutase was removed and cells were washed with PBS before re-plating. Following dissociation, cells were plated in E8 medium supplemented with 10 μM RI to promote survival. RI was removed once cells grew into colonies of 5–10 cells.
iPS cell-derived i3Neuron differentiation and culture
For neuronal differentiation, 20–25 million iPS cells were plated on day 0 onto a 15 cm plate in N2 medium (Gibco) with N2 supplement (Thermo Fisher), supplemented with GlutaMAX (Thermo Fisher), MEM nonessential amino acids (NEAA; Thermo Fisher), 10 μM ROCK inhibitor (Selleckchem), and 2 μg ml−1 doxycycline (Sigma Aldrich). N2 medium was changed once a day for two more days. On day 3, cells were replated onto dishes coated with freshly prepared poly-L-ornithine (Sigma Aldrich). Pre-neuron cells were cultured in i3Neuron Culture Medium: BrainPhys medium (StemCell Technologies) supplemented with B27 Plus Supplement (Thermo Fisher Scientific), 10 ng ml−1 BDNF (PeproTech), 10 ng ml−1 NT-3 (PeproTech), 1 μg ml−1 mouse laminin (Sigma Aldrich), and 2 μg ml−1 doxycycline (Sigma Aldrich). i3Neurons were then fed by half-medium changes and collected at different time points after neuronal induction.
METHOD DETAILS
Generation of MLH1 knockout cell line
An sgRNA targeting MLH1 was designed using the Broad Institute design tool CRISPick (https://portals.broadinstitute.org/gppx/crispick/public). The sgRNA target sequence is 5’-GATGGTTCGTACAGATTCCC-3’. The sgRNA sequence was inserted into the pLentiCRISPRv2 vector (Sanjana et al., 2014; Addgene), and cloning was verified by Sanger sequencing. To produce lentivirus, the sgRNA-containing plasmid and packaging plasmids (pRSV-Rev (Dull et al., 1998), pMDLg/pRRE (Dull et al., 1998) and pHCMVG) were transfected into HEK293T cells; after 72 hours, the supernatant was collected and passed through a 0.45 μm filter. RPE-hTERT TP53KO cells were seeded in 6-well plates and transduced with viral supernatant using 10 μg ml−1 polybrene (Sigma Aldrich). After 48 h of transduction, cells were selected with neomycin (2.0 mg ml−1; Sigma Aldrich) for five days and plated for single-cell clones. MLH1 knockout was confirmed by Western blot.
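Before cloning, a protospacer is commonly checked for length, base composition, and internal TTTT runs, which can terminate transcription from the Pol III U6 promoter that drives the sgRNA in lentiCRISPRv2. The sketch below applies these generic design heuristics to the reported MLH1 target sequence; the function names are hypothetical, and these checks are not steps reported in the methods (the guide was designed with CRISPick).

```python
# Generic sanity checks on a 20-nt SpCas9 protospacer (PAM not included).
SGRNA_MLH1 = "GATGGTTCGTACAGATTCCC"  # target sequence reported in the text


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def has_u6_terminator(seq: str) -> bool:
    """True if seq contains TTTT, which can terminate U6 (Pol III) transcription."""
    return "TTTT" in seq.upper()


def check_protospacer(seq: str) -> dict:
    """Summarize simple design properties of a protospacer."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "valid_bases": set(seq) <= set("ACGT"),
        "gc_fraction": gc_fraction(seq),
        "u6_terminator": has_u6_terminator(seq),
    }


print(check_protospacer(SGRNA_MLH1))
```

For this guide the checks report a 20-nt spacer with 50% GC and no TTTT run.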
Western blot
Cells were lysed in a buffer containing 50 mM Tris-HCl (pH 7.5), 200 mM NaCl, 1% Tween-20, 0.2% NP-40, 2 mM PMSF, 50 mM β-glycerophosphate and a protease inhibitor cocktail tablet (Roche). Protein lysates (40 μg) were loaded onto SDS-PAGE mini-gels (Bio-Rad), and proteins were transferred onto nitrocellulose membranes for 2 hours at 100 V. Anti-MLH1 (1:1000; BD Pharmingen) and anti-β-ACTIN (1:10000; Sigma Aldrich) antibodies were diluted in Intercept Blocking Buffer (LI-COR Biosciences) and incubated overnight at 4°C. The fluorescent secondary antibody anti-mouse IgG-AlexaFluor488 (LI-COR Biosciences) was diluted 1:10000 in Intercept Blocking Buffer and incubated for 1 hour at room temperature. Images were acquired using an Odyssey CLx imaging system (LI-COR Biosciences).
Chemicals
The following chemical compounds were used: doxycycline (Sigma Aldrich), aphidicolin (Sigma Aldrich), CDK4/6 inhibitor (palbociclib; Selleckchem) and CDK1 inhibitor (RO-3306; Selleckchem). The beta-alanine-linked pyrrole-imidazole polyamide PA1 was synthesized as previously described (Erwin et al., 2017). Concentrations and treatment durations are described in the figure legends.
Flow cytometry
For cell cycle profiling, cells were incubated with 10 μM 5-ethynyl-2′-deoxyuridine (EdU; Sigma Aldrich) for 30 min at 37 °C. EdU staining was performed using the Click-IT EdU Alexa Fluor 488 Flow Cytometry Assay Kit (Thermo Fisher) according to the manufacturer’s instructions. DNA content was measured by DAPI (4′,6-diamidino-2-phenylindole, 0.5 μg ml−1) staining. A CytoFLEX (Beckman Coulter) was used for data acquisition and analysis.
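The EdU pulse and DAPI stain together resolve the cell cycle: cells that incorporated EdU during the 30 min pulse are in S phase, and EdU-negative cells separate into G1 (2N) and G2/M (4N) by DNA content. A minimal sketch of that gating logic, assuming an EdU gate and a G1 DAPI peak read off the plots and a 2N/4N cut placed halfway at 1.5x the G1 peak; the names and thresholds are hypothetical and do not reproduce the CytoFLEX software.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def assign_phase(edu: float, dapi: float, edu_gate: float, g1_dapi: float) -> str:
    """Classify one cell from its EdU and DAPI intensities."""
    if edu >= edu_gate:
        return "S"  # replicating during the EdU pulse
    # EdU-negative: split 2N from 4N halfway between the G1 and G2/M peaks.
    return "G1" if dapi < 1.5 * g1_dapi else "G2/M"


def phase_fractions(events: Iterable[Tuple[float, float]], edu_gate: float,
                    g1_dapi: float) -> Dict[str, float]:
    """Fraction of (edu, dapi) events assigned to each phase."""
    counts = Counter(assign_phase(e, d, edu_gate, g1_dapi) for e, d in events)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events to classify")
    return {phase: counts[phase] / total for phase in ("G1", "S", "G2/M")}
```

A fixed halfway cut is the simplest choice; dedicated cell-cycle software instead fits the DNA-content histogram.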
END-seq, S1-END-seq and P1-END-seq
For END-seq, S1-END-seq and P1-END-seq, 10–20 million cells were collected and embedded in 1% agarose plugs. Samples were processed following previously described protocols (Canela et al., 2016; Wong et al., 2021; Wu et al., 2021). For S1-END-seq, cell-containing plugs were treated with proteinase K (1 h at 50 °C, followed by 7 h at 37 °C), washed once with plug washing buffer (WB: 10 mM Tris, pH 8.0, 50 mM EDTA, in nuclease-free water), washed three times with Tris-EDTA (TE: 10 mM Tris, pH 8.0, 1 mM EDTA, in nuclease-free water) and then treated with RNase A for 1 h at 37 °C. Plugs were then washed three times with plug washing buffer, washed with elution buffer (EB: 10 mM Tris, pH 8.0, in nuclease-free water) and equilibrated with two washes of S1 nuclease buffer (40 mM sodium acetate pH 4.5, 300 mM NaCl, 2 mM ZnSO4). Each wash lasted 15 minutes. We added 200 U of S1 nuclease (ThermoFisher) to 100 μl of S1 nuclease buffer per plug and incubated at 37 °C for 30 min before adding EDTA (10 mM final concentration) to terminate the reaction. Note that this amount of S1 nuclease differs from that used to detect SSBs in human neurons (1.8 U) (Wu et al., 2021). Finally, plugs were processed through the standard END-seq protocol. P1-END-seq followed the same steps as S1-END-seq, except that NEB1.1 buffer (pH 7) was used as the P1 buffer for washes and for the enzymatic reaction. As in S1-END-seq, P1 nuclease (NEB) was added to 100 μl of NEB1.1 buffer per plug and incubated at 37 °C for 30 min before adding EDTA (10 mM final concentration) to terminate the reaction.
Sequencing libraries of END-seq, P1-END-seq and S1-END-seq
Unless otherwise stated, END-seq, P1-END-seq and S1-END-seq libraries were sequenced using 75-bp single-end read kits on NextSeq 550 platforms (Illumina). S1-END-seq libraries were also sequenced using 150-bp paired-end kits on MiSeq 200 platforms (Illumina). For each treatment condition, the control (vehicle-treated or untreated) sample libraries were sequenced in parallel on the same flow cell. The data were processed using bcl2fastq software.
High molecular weight genomic DNA preparation and PacBio sequencing
Approximately 20 million cells were harvested by centrifugation at 500 g for 5 min at 4°C. High molecular weight DNA was prepared using the Nanobind Big DNA CBB kit (Circulomics) according to the manufacturer’s instructions. In brief, cells were washed with PBS and resuspended in 20 μl PBS, followed by 20 μl proteinase K and 20 μl CLE3 buffer. The tube was pulse vortexed 10 times and incubated in a thermomixer at 55°C for 10 minutes at 900 rpm, followed by RNase A treatment for 3 minutes at room temperature (RT). To lyse the cells, 200 μl of BL3 buffer was added, and the tube was pulse vortexed 10 times and incubated at 55°C at 900 rpm for 10 minutes. A Nanodisk was added to the cell lysate, followed by 300 μl of isopropanol; the tube was mixed 5 times by inversion and then on a tube rotator at 9 rpm at RT for 10 minutes. Tubes were placed on a magnetic tube rack so that the Nanobind bead remained captured near the top, to avoid touching the DNA with the pipette tip. The supernatant was discarded, and the beads were washed 4 times each with buffers CW1 and CW2. High molecular weight DNA was eluted in 100 μl of Qiagen EB buffer. DNA purity was analyzed using a TapeStation. DNA samples were sheared into 10–13 kb fragments for library preparation. WGS libraries were sequenced on PacBio SMRT Cells using the Sequel II system with the standard PacBio protocol. Samples yielded 1.78 M to 2.35 M CCS reads, with a mean read length of 9.44 kb and a mean quality of QV38.0.
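The reported read counts and mean read length allow a back-of-the-envelope estimate of sequencing depth as reads × mean length / genome size. The sketch below assumes a 3.1 Gb genome and that every sequenced base maps, so it gives an upper-bound average; this is an illustrative calculation, not a coverage value reported by the authors.

```python
def estimated_coverage(n_reads: float, mean_read_len_bp: float,
                       genome_size_bp: float = 3.1e9) -> float:
    """Average depth, assuming all sequenced bases map (genome size is an assumption)."""
    return n_reads * mean_read_len_bp / genome_size_bp


for n in (1.78e6, 2.35e6):  # range of CCS read counts reported per sample
    print(f"{n:.2e} reads -> ~{estimated_coverage(n, 9_440):.1f}x")
```

Across the reported range this gives roughly 5x to 7x average depth per sample, before accounting for unmapped reads.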
Genome alignment
END-seq, P1-END-seq and S1-END-seq reads were aligned to the reference genome (hg19) using bowtie (v.1.1.2) (Langmead et al., 2009) with parameters -n 3 -l 50 -k 1. The samtools (v.1.11) (Li et al., 2009) functions ‘view’ and ‘sort’ were used to convert the aligned .sam files into sorted .bam files, which were further converted to .bed files using the bedtools (v.2.29.2) bamToBed command (Quinlan and Hall, 2010). PacBio circular consensus sequencing (CCS) reads were aligned to the hg19 reference genome with pbmm2 (SMRT Analysis v10.1.0; https://github.com/PacificBiosciences/pbmm2) using the parameter --preset CCS.
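The short-read steps above form a linear pipeline. The sketch below only assembles the argument lists for each step so that they can be inspected or handed to `subprocess.run`; the file naming scheme, the index name `hg19`, and the `-S` flag (SAM output from bowtie 1) are assumptions, not details given in the methods.

```python
import shlex
from typing import List


def alignment_commands(sample: str, index: str = "hg19") -> List[List[str]]:
    """Argument lists for the alignment steps described in the text (hypothetical file names)."""
    fastq, sam = f"{sample}.fastq", f"{sample}.sam"
    bam, sorted_bam = f"{sample}.bam", f"{sample}.sorted.bam"
    return [
        # bowtie 1: -n 3 = up to 3 seed mismatches, -l 50 = 50-nt seed,
        # -k 1 = report one alignment per read; -S (SAM output) is assumed.
        ["bowtie", "-n", "3", "-l", "50", "-k", "1", "-S", index, fastq, sam],
        # samtools: SAM -> BAM, then coordinate sort.
        ["samtools", "view", "-b", "-o", bam, sam],
        ["samtools", "sort", "-o", sorted_bam, bam],
        # bedtools bamtobed writes BED to stdout; redirect it to f"{sample}.bed".
        ["bedtools", "bamtobed", "-i", sorted_bam],
    ]


if __name__ == "__main__":
    for cmd in alignment_commands("S1_END_seq_sample"):
        print(shlex.join(cmd))
```

Keeping the commands as lists rather than shell strings avoids quoting problems when file names contain spaces.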
Peak calling
MACS 1.4.3 (Zhang et al., 2008) was used to call peaks with the parameters -p 1e-5 --nolambda --nomodel --keep-dup=all, and peaks were then filtered for at least 10-fold enrichment over background.
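The 10-fold enrichment filter can be applied directly to the MACS peak table. A minimal sketch, assuming the tab-separated MACS 1.4 `*_peaks.xls` layout ('#' comment lines followed by a header row containing a `fold_enrichment` column) and a threshold of at least 10; the function name is hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def filter_macs14_peaks(lines: Iterable[str], min_fold: float = 10.0) -> List[Dict[str, str]]:
    """Keep peaks whose fold_enrichment is at least min_fold.

    Assumes the MACS 1.4 *_peaks.xls layout: '#' comments, then a
    tab-separated header row that includes 'fold_enrichment'.
    """
    body = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(body, delimiter="\t")
    return [row for row in reader if float(row["fold_enrichment"]) >= min_fold]


# Usage (hypothetical file name):
# with open("sample_peaks.xls") as fh:
#     kept = filter_macs14_peaks(fh)
```

Reading columns by header name rather than position keeps the filter working whether or not an FDR column is present.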
QUANTIFICATION AND STATISTICAL ANALYSIS
Data analysis and data visualization
We used the bedtools genomecov function to convert the BED files into bedGraph files. The bedGraph files were then converted into .bigwig files using the UCSC utilities (Karolchik et al., 2004) and visualized on the UCSC Genome Browser (Kent et al., 2002). To quantify the preference for peaks to be one-ended or two-ended, we applied the following formula using the reads on the plus and minus strands: where a = reads on the plus strand and b = reads on the minus strand.
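The one-ended versus two-ended comparison rests on two per-peak tallies, a (plus-strand reads) and b (minus-strand reads). A minimal sketch of counting them, assuming reads given as (chrom, start, end, strand) tuples in half-open BED coordinates; it stops at the tallies and does not implement the score itself. For millions of reads, an indexed tool such as bedtools intersect would replace this linear scan.

```python
from typing import Iterable, Tuple

Peak = Tuple[str, int, int]       # (chrom, start, end), BED half-open
Read = Tuple[str, int, int, str]  # (chrom, start, end, strand)


def strand_counts(peak: Peak, reads: Iterable[Read]) -> Tuple[int, int]:
    """Return (a, b): plus- and minus-strand reads overlapping the peak."""
    chrom, pstart, pend = peak
    a = b = 0
    for rchrom, rstart, rend, strand in reads:
        # Half-open intervals overlap when each starts before the other ends.
        if rchrom != chrom or rstart >= pend or rend <= pstart:
            continue
        if strand == "+":
            a += 1
        elif strand == "-":
            b += 1
    return a, b
```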
Repli-seq tracks for HeLa-S3 cells were obtained from the UCSC Genome Browser (wgEncodeUwRepliSeqHelas3WaveSignalRep1), and those for HCT116 cells were obtained from GEO (GSE158008) (Du et al., 2021). Values shown are the wavelet-smoothed signal, calculated as a weighted average of the percentage-normalized signal from the six cell cycle fractions (G1/G1b, S1, S2, S3, S4, G2), such that higher values correspond to earlier replication.
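The weighted-average timing value can be illustrated at a single genomic position. This sketch assumes evenly spaced weights of 11/12, 9/12, 7/12, 5/12, 3/12 and 0 for G1/G1b, S1 to S4 and G2; these are assumed to match the UCSC track description and should be checked against it before reuse. The wavelet smoothing along the chromosome is omitted, and the function name is hypothetical.

```python
from typing import Sequence

# Assumed weights for G1/G1b, S1, S2, S3, S4, G2: earliest fraction weighted highest.
FRACTION_WEIGHTS = (11 / 12, 9 / 12, 7 / 12, 5 / 12, 3 / 12, 0.0)


def weighted_timing(fraction_signals: Sequence[float]) -> float:
    """Weighted average of percentage-normalized Repli-seq signal at one position.

    Higher values mean earlier replication. Wavelet smoothing is not included.
    """
    if len(fraction_signals) != len(FRACTION_WEIGHTS):
        raise ValueError("expected six fraction signals: G1/G1b, S1-S4, G2")
    total = sum(fraction_signals)
    if total <= 0:
        raise ValueError("no signal at this position")
    percentages = [100.0 * s / total for s in fraction_signals]
    return sum(w * p for w, p in zip(FRACTION_WEIGHTS, percentages))
```

A position whose reads come entirely from the G1/G1b fraction scores about 91.7, and one replicating only in G2 scores 0.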
Somatic mutations in cancer were downloaded from the International Cancer Genome Consortium (ICGC) Data Portal: https://dcc.icgc.org/releases/release_28/. Mutations and indels in PacBio data were called using DeepVariant (v1.1.0) (Poplin et al., 2018), and structural variants were identified using the pbsv command (https://github.com/PacificBiosciences/pbsv) of SMRT Analysis.
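Testing whether S1-END-seq peaks carry more mutations starts with placing each somatic mutation inside or outside a peak. A minimal sketch, assuming non-overlapping (merged) peaks in 0-based, half-open BED coordinates and mutation positions already converted to 0-based (ICGC coordinates are typically 1-based); function names are hypothetical. An enrichment claim would additionally need a matched background density, for example from size- and composition-matched control regions, which is not shown.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def build_peak_index(peaks: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Interval]]:
    """Group peaks by chromosome and sort by start (peaks assumed non-overlapping)."""
    index: Dict[str, List[Interval]] = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_peak(index: Dict[str, List[Interval]], chrom: str, pos: int) -> bool:
    """True if 0-based pos lies in a half-open [start, end) peak on chrom."""
    intervals = index.get(chrom, [])
    i = bisect_right(intervals, (pos, float("inf"))) - 1  # last peak starting <= pos
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def peak_mutation_density(peaks, mutations) -> Tuple[int, float]:
    """Return (mutations inside peaks, mutations per kb of peak sequence)."""
    peaks = list(peaks)
    total_bp = sum(end - start for _, start, end in peaks)
    if total_bp == 0:
        raise ValueError("no peak sequence")
    index = build_peak_index(peaks)
    hits = sum(1 for chrom, pos in mutations if in_peak(index, chrom, pos))
    return hits, 1000.0 * hits / total_bp
```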
Repeats were defined using the annotated simple repeats of the hg19 genome. Heat maps and aggregate plots were calculated using computeMatrix and plotted using plotHeatmap from the deepTools suite (Ramirez et al., 2016). For other data visualization, we used GraphPad Prism v8 or RStudio. Statistical analyses are specified in the figure legends. The figures were compiled in Adobe Illustrator.
Supplementary Material
Key resources table.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Purified Mouse Anti-MLH-1 | BD Pharmingen | Cat# 51-1327GR |
| Monoclonal Anti-β-Actin | Sigma Aldrich | Cat# A5441 |
| IRDye 680RD Goat anti-Mouse IgG (H+L) | LI-COR Biosciences | Cat# 926-68070; RRID:AB_10956588 |
| Bacterial and virus strains | ||
| Lentivirus: lentiCRISPRv2 | (Sanjana et al., 2014) | Cat# 52961; RRID: Addgene_52961 |
| Lentivirus: pHCMVG | ATCC | Cat# 75497 |
| Lentivirus: pRSV-Rev | (Dull et al., 1998) | Cat# 12253; RRID: Addgene_12253 |
| Lentivirus: pMDLg/pRRE | (Dull et al., 1998) | Cat# 12251; RRID: Addgene_12251 |
| Chemicals, peptides, and recombinant proteins | ||
| Puregene Proteinase K enzyme | QIAGEN | Cat# 158920 |
| Puregene RNase A Solution | QIAGEN | Cat# 158924 |
| T4 DNA Polymerase | NEB | Cat# M0203L |
| T4 Polynucleotide Kinase | NEB | Cat# M0201L |
| DNA Polymerase I, Large (Klenow) Fragment | NEB | Cat# M0210L |
| Exonuclease T (ExoT) | NEB | Cat# M0265L |
| Exonuclease VII (ExoVII) | NEB | Cat# M0379L |
| Klenow Fragment (3’ →5’ exo-) | NEB | Cat# M0212L |
| Quick Ligation Kit | NEB | Cat# M2200L |
| USER enzyme | NEB | Cat# M5505L |
| KAPA HiFi HotStart ReadyMix (2X) | Roche | Cat# KK2600 |
| MyOne Streptavidin C1 Beads | Thermo Fisher | Cat# 650-01 |
| Agencourt AMPure XP beads | Beckman Coulter | Cat# A63881 |
| Doxycycline hyclate | Sigma Aldrich | Cat# D9891 |
| DAPI | Thermo Fisher | Cat# 62248 |
| EdU | Thermo Fisher | Cat# A10044 |
| cOmplete, Mini Protease inhibitor cocktail | Roche | Cat# 11836153001 |
| Matrigel | Corning | Cat# 354277 |
| ROCK inhibitor | Selleckchem | Cat# S1049 |
| Accutase | Thermo Fisher | Cat# A1110501 |
| N2 supplement | Thermo Fisher | Cat# 17502048 |
| MEM nonessential amino acids (NEAA) | Thermo Fisher | Cat# 11140050 |
| Poly-L-ornithine | Sigma Aldrich | Cat# P3655 |
| BDNF | PeproTech | Cat# 450-02 |
| NT-3 | PeproTech | Cat# 450-03 |
| Mouse laminin | Sigma Aldrich | Cat# L2020 |
| CDK1 inhibitor (Ro-3306) | Selleckchem | Cat# S7747 |
| CDK4/6 inhibitor (PD-0332991) | Selleckchem | Cat# S1116 |
| Aphidicolin | Sigma Aldrich | Cat# A0781 |
| S1 Nuclease | Thermo Fisher | Cat# EN0321 |
| P1 Nuclease | NEB | Cat# M0660S |
| Critical commercial assays | ||
| KAPA Library Quantification Kit | Roche | Cat# KK4824 |
| Click-IT EdU Alexa Fluor 488 Flow Cytometry Assay Kit | Thermo Fisher | Cat# C10425 |
| Nanobind Big DNA CBB kit | Circulomics | Cat# NB-900-001-01 |
| Deposited data | ||
| Raw and analyzed data | This paper | GEO: GSE204808 |
| Repli-seq HCT116 | (Du et al., 2021) | GEO: GSE158008 |
| Repli-seq HeLa | UCSC Genome Browser | wgEncodeUwRepliSeqHelas3WaveSignalRep1 |
| Somatic mutations in cancer | The International Cancer Genome Consortium Data Portal | https://dcc.icgc.org/releases/release_28/ |
| Experimental models: Cell lines | ||
| Human: iPS cell | Coriell | WTC11 |
| Human: KM12 (shWRN) | (Chan et al., 2019) | N/A |
| Human: SW837 | CCLE Broad Institute | N/A |
| Human: SW48 | CCLE Broad Institute | N/A |
| Human: SW620 | ATCC | CCL-227 |
| Human: RKO | ATCC | CRL-2577 |
| Human: RPE-hTERT TP53KO | Gift from Dr. Daniel Durocher laboratory | N/A |
| Human: RPE-hTERT TP53KO MLH1KO | This paper | N/A |
| Human: HEK293T | ATCC | CRL-3216 |
| Human: HACAT | Gift from Dr. Ramiro Iglesias-Bartolome laboratory | N/A |
| Human: HeLa | ATCC | CCL-2 |
| Human: MCF10A | ATCC | CRL-10317 |
| Human: Primary human skin fibroblasts | Gift from Dr. Christopher Pearson laboratory | 602-05 |
| Human: Normal human epithelial keratinocytes (NHEK) | Gift from Dr. Ramiro Iglesias-Bartolome laboratory | N/A |
| Human: Transformed B-Lymphocyte cell line derived from FRDA patient | Gift from Dr. Karen Usdin laboratory | GM15850 |
| Human: Transformed B-Lymphocyte cell line derived from healthy sibling of FRDA patient | Gift from Dr. Karen Usdin laboratory | GM15851 |
| Oligonucleotides | ||
| MLH1 sgRNA target sequence 5’-GATGGTTCGTACAGATTCCC-3’ | https://portals.broadinstitute.org/gppx/crispick/public | This paper |
| END-seq adaptor 1, 5’-phosphate-GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGUU[Biotin-dT]U[Biotin-dT]UUACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3’ | (Canela et al., 2016) | N/A |
| END-seq adaptor 2, 5’-phosphate-GATCGGAAGAGCACACGTCUUUUUUUUAGACGTGTGCTCTTCCGATC*T-3’ | (Canela et al., 2016) | N/A |
| Software and algorithms | ||
| DeepVariant (v1.1.0) | N/A | https://github.com/google/deepvariant |
| pbsv | N/A | https://github.com/PacificBiosciences/pbsv |
| deepTools suite | (Ramirez et al., 2016) | https://deeptools.readthedocs.io/en/develop/ |
| GraphPad Prism v8 | GraphPad | https://www.graphpad.com |
| Bowtie 1.1.2 | (Langmead et al., 2009) | https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.1.2/ |
| MACS 1.4.3 | (Zhang et al., 2008) | https://pypi.org/pypi/MACS/1.4.3 |
| UCSC database | (Karolchik et al., 2004) | https://genome.ucsc.edu |
| UCSC Genome Browser | (Kent et al., 2002) | https://genome.ucsc.edu |
| Bedtools | (Quinlan and Hall, 2010) | https://github.com/arq5x/bedtools2 |
| R studio | R Studio Team | https://www.r-project.org/ |
| Other | ||
| NuPAGE 4-12% Bis-Tris Protein Gels | ThermoFisher | Cat# NP0321BOX |
| Odyssey® CLx Imaging System | LI-COR Biosciences | N/A |
| CytoFLEX | Beckman Coulter | N/A |
| Nextseq 500 | Illumina | N/A |
Highlights.
S1-END-seq provides a high-resolution view of DNA secondary structures formed in vivo
S1-END-seq detects cruciform structures in expanded (TA)n repeats in MSI cancer cells
S1-END-seq reveals thousands of DNA triplexes formed in hPu/hPy mirror repeats
DNA triplexes are hotspots for genome instability
Acknowledgements
We thank Dr. Joel Gottesfeld and Dr. Robert Wells for the insightful discussions. S.M.M. is supported by the NIH (grant no. 5R35GM130322 from NIGMS). A. A. is supported by the NINDS (NS108376), FARA and ALSAC. MN is supported by the NINDS (NS081366 and NS121038) and FARA. The A.N. laboratory is supported by the Intramural Research Program of the NIH, an Ellison Medical Foundation Senior Scholar in Aging Award (AG-SS-2633–11), the Department of Defense Awards (W81XWH-16–1-599 and W81XWH-19–1-0652), the Alex’s Lemonade Stand Foundation Award, and an NIH Intramural FLEX Award.
Footnotes
Declaration of Interests
The authors declare no competing interests.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Agazie YM, Burkholder GD, and Lee JS (1996). Triplex DNA in the nucleus: direct binding of triplex-specific antibodies and their effect on transcription, replication and cell growth. Biochem J 316 (Pt 2), 461–466.
- Agazie YM, Lee JS, and Burkholder GD (1994). Characterization of a new monoclonal antibody to triplex DNA and immunofluorescent staining of mammalian chromosomes. J Biol Chem 269, 7019–7023.
- Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, et al. (2020). The repertoire of mutational signatures in human cancer. Nature 578, 94–101.
- Arlt MF, Mulle JG, Schaibley VM, Ragland RL, Durkin SG, Warren ST, and Glover TW (2009). Replication stress induces genome-wide copy number changes in human cells that resemble polymorphic and pathogenic variants. Am J Hum Genet 84, 339–350.
- Belotserkovskii BP, Mirkin SM, and Hanawalt PC (2013). DNA sequences that interfere with transcription: implications for genome function and stability. Chem Rev 113, 8620–8637.
- Boukamp P, Petrussevska RT, Breitkreutz D, Hornung J, Markham A, and Fusenig NE (1988). Normal keratinization in a spontaneously immortalized aneuploid human keratinocyte cell line. J Cell Biol 106, 761–771.
- Burnett R, Melander C, Puckett JW, Son LS, Wells RD, Dervan PB, and Gottesfeld JM (2006). DNA sequence-specific polyamides alleviate transcription inhibition associated with long GAA.TTC repeats in Friedreich’s ataxia. Proc Natl Acad Sci U S A 103, 11497–11502.
- Campuzano V, Montermini L, Molto MD, Pianese L, Cossee M, Cavalcanti F, Monros E, Rodius F, Duclos F, Monticelli A, et al. (1996). Friedreich’s ataxia: autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science 271, 1423–1427.
- Canela A, Sridharan S, Sciascia N, Tubbs A, Meltzer P, Sleckman BP, and Nussenzweig A (2016). DNA Breaks and End Resection Measured Genome-wide by End Sequencing. Mol Cell 63, 898–911.
- Chan EM, Shibue T, McFarland JM, Gaeta B, Ghandi M, Dumont N, Gonzalez A, McPartlan JS, Li T, Zhang Y, et al. (2019). WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature 568, 551–556.
- ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium (2020). Pan-cancer analysis of whole genomes. Nature 578, 82–93.
- Dayn A, Malkhosyan S, and Mirkin SM (1992). Transcriptionally driven cruciform formation in vivo. Nucleic Acids Res 20, 5991–5997. [DOI] [PMC free article] [PubMed] [Google Scholar]
- DePamphilis ML, and Wassarman PM (1980). Replication of eukaryotic chromosomes: a close-up of the replication fork. Annu Rev Biochem 49, 627–666. [DOI] [PubMed] [Google Scholar]
- Du Q, Smith GC, Luu PL, Ferguson JM, Armstrong NJ, Caldon CE, Campbell EM, Nair SS, Zotenko E, Gould CM, et al. (2021). DNA methylation is required to maintain both DNA replication timing precision and 3D genome organization integrity. Cell Rep 36, 109722. [DOI] [PubMed] [Google Scholar]
- Dull T, Zufferey R, Kelly M, Mandel RJ, Nguyen M, Trono D, and Naldini L (1998). A third-generation lentivirus vector with a conditional packaging system. J Virol 72, 8463–8471. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ehmsen KT, and Heyer WD (2008). Saccharomyces cerevisiae Mus81-Mms4 is a catalytic, DNA structure-selective endonuclease. Nucleic Acids Res 36, 2182–2195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Erwin GS, Grieshop MP, Ali A, Qi J, Lawlor M, Kumar D, Ahmad I, McNally A, Teider N, Worringer K, et al. (2017). Synthetic transcription elongation factors license transcription across repressive chromatin. Science 358, 1617–1622. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fernandopulle MS, Prestil R, Grunseich C, Wang C, Gan L, and Ward ME (2018). Transcription Factor-Mediated Differentiation of Human iPSCs into Neurons. Curr Protoc Cell Biol 79, e51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Filla A, De Michele G, Cavalcanti F, Pianese L, Monticelli A, Campanella G, and Cocozza S (1996). The relationship between trinucleotide (GAA) repeat length and clinical features in Friedreich ataxia. Am J Hum Genet 59, 554–560. [PMC free article] [PubMed] [Google Scholar]
- Follonier C, Oehler J, Herrador R, and Lopes M (2013). Friedreich’s ataxia-associated GAA repeats induce replication-fork reversal and unusual molecular junctions. Nat Struct Mol Biol 20, 486–494. [DOI] [PubMed] [Google Scholar]
- Frank-Kamenetskii MD, and Mirkin SM (1995). Triplex DNA structures. Annu Rev Biochem 64, 65–95. [DOI] [PubMed] [Google Scholar]
- Gaillard PHL, Noguchi E, Shanahan P, and Russell P (2003). The endogenous Mus81-Eme1 complex resolves Holliday junctions by a nick and counternick mechanism. Mol Cell 12, 747–759. [DOI] [PubMed] [Google Scholar]
- Georgakopoulos-Soares I, Morganella S, Jain N, Hemberg M, and Nik-Zainal S (2018). Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis. Genome Res 28, 1264–1271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gerhardt J, Bhalla AD, Butler JS, Puckett JW, Dervan PB, Rosenwaks Z, and Napierala M (2016). Stalled DNA Replication Forks at the Endogenous GAA Repeats Drive Repeat Expansion in Friedreich’s Ataxia Cells. Cell Rep 16, 1218–1227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grabczyk E, and Usdin K (2000). The GAA*TTC triplet repeat expanded in Friedreich’s ataxia impedes transcription elongation by T7 RNA polymerase in a length and supercoil dependent manner. Nucleic Acids Res 28, 2815–2822. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, and Makova KD (2021). Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 49, 1497–1516. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haniford DB, and Pulleyblank DE (1985). Transition of a cloned d(AT)n-d(AT)n tract to a cruciform in vivo. Nucleic Acids Res 13, 4343–4363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, and Kent WJ (2004). The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32, D493–496. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaushal S, and Freudenreich CH (2019). The role of fork stalling and DNA structures in causing chromosome fragility. Genes Chromosomes Cancer 58, 270–283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, and Haussler D (2002). The human genome browser at UCSC. Genome Res 12, 996–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Khristich AN, Armenia JF, Matera RM, Kolchinski AA, and Mirkin SM (2020). Large-scale contractions of Friedreich’s ataxia GAA repeats in yeast occur during DNA replication due to their triplex-forming ability. Proc Natl Acad Sci U S A 117, 1628–1637. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim C, Snyder RO, and Wold MS (1992). Binding properties of replication protein A from human and yeast cells. Mol Cell Biol 12, 3050–3059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim HM, Narayanan V, Mieczkowski PA, Petes TD, Krasilnikova MM, Mirkin SM, and Lobachev KS (2008). Chromosome fragility at GAA tracts in yeast depends on repeat orientation and requires mismatch repair. EMBO J 27, 2896–2906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kohwi Y, and Panchenko Y (1993). Transcription-dependent recombination induced by triple-helix formation. Genes Dev 7, 1766–1778. [DOI] [PubMed] [Google Scholar]
- Kotsantis P, Petermann E, and Boulton SJ (2018). Mechanisms of Oncogene-Induced Replication Stress: Jigsaw Falling into Place. Cancer Discov 8, 537–555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kouzine F, Wojtowicz D, Baranello L, Yamane A, Nelson S, Resch W, Kieffer-Kwon KR, Benham CJ, Casellas R, Przytycka TM, et al. (2017). Permanganate/S1 Nuclease Footprinting Reveals Non-B DNA Structures with Regulatory Potential across a Mammalian Genome. Cell Syst 4, 344–356.e7.
- Krasilnikova MM, and Mirkin SM (2004). Replication stalling at Friedreich’s ataxia (GAA)n repeats in vivo. Mol Cell Biol 24, 2286–2295.
- Kurahashi H, Inagaki H, Ohye T, Kogo H, Kato T, and Emanuel BS (2006). Palindrome-mediated chromosomal translocations in humans. DNA Repair (Amst) 5, 1136–1145.
- Langmead B, Trapnell C, Pop M, and Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
- Liu G, Myers S, Chen X, Bissler JJ, Sinden RR, and Leffak M (2012). Replication fork stalling and checkpoint activation by a PKD1 locus mirror repeat polypurine-polypyrimidine (Pu-Py) tract. J Biol Chem 287, 33412–33423.
- Liu LF, and Wang JC (1987). Supercoiling of the DNA template during transcription. Proc Natl Acad Sci U S A 84, 7024–7027.
- Lopez Castel A, Cleary JD, and Pearson CE (2010). Repeat instability as the basis for human diseases and as a potential target for therapy. Nat Rev Mol Cell Biol 11, 165–170.
- Lyamichev VI, Mirkin SM, Danilevskaya ON, Voloshin ON, Balatskaya SV, Dobrynin VN, Filippov SA, and Frank-Kamenetskii MD (1989). An unusual DNA structure detected in a telomeric sequence under superhelical stress and at low pH. Nature 339, 634–637.
- Maekawa K, Yamada S, Sharma R, Chaudhuri J, and Keeney S (2022). Triple-helix potential of the mouse genome. Proc Natl Acad Sci U S A 119, e2203967119.
- Maizels N, and Gray LT (2013). The G4 genome. PLoS Genet 9, e1003468.
- McMurray CT (2010). Mechanisms of trinucleotide repeat instability during human development. Nat Rev Genet 11, 786–799.
- Mirkin EV, and Mirkin SM (2007). Replication fork stalling at natural impediments. Microbiol Mol Biol Rev 71, 13–35.
- Mirkin SM (2001). DNA topology: Fundamentals. Encyclopedia of Life Sciences.
- Mirkin SM (2007). Expandable DNA repeats and human disease. Nature 447, 932–940.
- Mirkin SM (2008). Discovery of alternative DNA structures: a heroic decade (1979–1989). Front Biosci 13, 1064–1071.
- Mirkin SM, and Frank-Kamenetskii MD (1994). H-DNA and related structures. Annu Rev Biophys Biomol Struct 23, 541–576.
- Mirkin SM, Lyamichev VI, Drushlyak KN, Dobrynin VN, Filippov SA, and Frank-Kamenetskii MD (1987). DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature 330, 495–497.
- Murchie AI, and Lilley DM (1987). The mechanism of cruciform formation in supercoiled DNA: initial opening of central basepairs in salt-dependent extrusion. Nucleic Acids Res 15, 9641–9654.
- Murchie AI, and Lilley DM (1992). Supercoiled DNA and cruciform structures. Methods Enzymol 211, 158–180.
- Ogawa T, and Okazaki T (1980). Discontinuous DNA replication. Annu Rev Biochem 49, 421–457.
- Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, et al. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983–987.
- Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
- Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dundar F, and Manke T (2016). deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44, W160–165.
- Robinson J, Raguseo F, Nuccio SP, Liano D, and Di Antonio M (2021). DNA G-quadruplex structures: more than simple roadblocks to transcription? Nucleic Acids Res 49, 8419–8431.
- Sakamoto N, Larson JE, Iyer RR, Montermini L, Pandolfo M, and Wells RD (2001). GGA*TCC-interrupted triplets in long GAA*TTC repeats inhibit the formation of triplex and sticky DNA structures, alleviate transcription inhibition, and reduce genetic instabilities. J Biol Chem 276, 27178–27187.
- Samadashwily GM, Dayn A, and Mirkin SM (1993). Suicidal nucleotide sequences for DNA polymerization. EMBO J 12, 4975–4983.
- Sanjana NE, Shalem O, and Zhang F (2014). Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods 11, 783–784.
- Sinden RR, Pytlos-Sinden MJ, and Potaman VN (2007). Slipped strand DNA structures. Front Biosci 12, 4788–4799.
- Toledo L, Neelsen KJ, and Lukas J (2017). Replication Catastrophe: When a Checkpoint Fails because of Exhaustion. Mol Cell 66, 735–749.
- Toledo LI, Altmeyer M, Rask MB, Lukas C, Larsen DH, Povlsen LK, Bekker-Jensen S, Mailand N, Bartek J, and Lukas J (2013). ATR prohibits replication catastrophe by preventing global exhaustion of RPA. Cell 155, 1088–1103.
- van Wietmarschen N, Sridharan S, Nathan WJ, Tubbs A, Chan EM, Callen E, Wu W, Belinky F, Tripathi V, Wong N, et al. (2020). Repeat expansions confer WRN dependence in microsatellite-unstable cancers. Nature 586, 292–298.
- Wang C, Ward ME, Chen R, Liu K, Tracy TE, Chen X, Xie M, Sohn PD, Ludwig C, Meyer-Franke A, et al. (2017). Scalable Production of iPSC-Derived Human Neurons to Identify Tau-Lowering Compounds by High-Content Screening. Stem Cell Reports 9, 1221–1233.
- Wang G, and Vasquez KM (2014). Impact of alternative DNA structures on DNA damage, DNA repair, and genetic instability. DNA Repair (Amst) 19, 143–151.
- Wells RD, Collier DA, Hanvey JC, Shimizu M, and Wohlrab F (1988). The chemistry and biology of unusual DNA structures adopted by oligopurine.oligopyrimidine sequences. FASEB J 2, 2939–2949.
- West SC, Parsons CA, and Picksley SM (1987). Purification and properties of a nuclease from Saccharomyces cerevisiae that cleaves DNA at cruciform junctions. J Biol Chem 262, 12752–12758.
- Wong N, John S, Nussenzweig A, and Canela A (2021). END-seq: An Unbiased, High-Resolution, and Genome-Wide Approach to Map DNA Double-Strand Breaks and Resection in Human Cells. Methods Mol Biol 2153, 9–31.
- Wu W, Hill SE, Nathan WJ, Paiano J, Callen E, Wang D, Shinoda K, van Wietmarschen N, Colon-Mercado JM, Zong D, et al. (2021). Neuronal enhancers are hotspots for DNA single-strand break repair. Nature 593, 440–444.
- Zhang H, and Freudenreich CH (2007). An AT-rich sequence in human common fragile site FRA16D causes fork stalling and chromosome breakage in S. cerevisiae. Mol Cell 27, 367–379.
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.
- Zhao J, Wang G, Del Mundo IM, McKinney JA, Lu X, Bacolla A, Boulware SB, Zhang C, Zhang H, Ren P, et al. (2018). Distinct Mechanisms of Nuclease-Directed DNA-Structure-Induced Genetic Instability in Cancer Genomes. Cell Rep 22, 1200–1210.
Data Availability Statement
The datasets reported in this paper are available at GEO under accession number GSE204808.
This paper does not report original code.
Any additional information required to reanalyze the data reported in this paper is available from the Lead Contact upon request.
