Nature Communications. 2024 Nov 6;15:9582. doi: 10.1038/s41467-024-53917-8

Replication stress induces POLQ-mediated structural variant formation throughout common fragile sites after entry into mitosis

Thomas E Wilson 1,2, Samreen Ahmed 1,2, Amanda Winningham 2, Thomas W Glover 1,2
PMCID: PMC11541566  PMID: 39505880

Abstract

Genomic structural variants (SVs) greatly impact human health, but much is unknown about the mechanisms that generate the largest class of nonrecurrent alterations. Common fragile sites (CFSs) are unstable loci that provide a model for SV formation, especially large deletions, under replication stress. We study SV junction formation as it occurs in human cell lines by applying error-minimized capture sequencing to CFS DNA harvested after low-dose aphidicolin treatment. SV junctions form throughout CFS genes at a 5-fold higher rate after cells pass from G2 into M-phase. Neither SV formation nor CFS expression depends on mitotic DNA synthesis (MiDAS), an error-prone form of replication active at CFSs. Instead, analysis of tens of thousands of de novo SV junctions combined with DNA repair pathway inhibition reveals a primary role for DNA polymerase theta (POLQ)-mediated end-joining (TMEJ). We propose an important role for mitotic TMEJ in nonrecurrent SV formation genome wide.

Subject terms: Genomic instability, Mitosis, Non-homologous-end joining, Targeted resequencing


The mechanisms of nonrecurrent structural variant (SV) formation are poorly understood. Here, the authors sequenced thousands of SV junctions as they formed at common fragile sites in human cell lines to reveal a primary role for DNA polymerase theta-mediated end joining activated during mitosis.

Introduction

Chromosomal rearrangements, i.e., structural variants (SVs), represent a large proportion of genomic diversity and are responsible for numerous genomic disorders1. They also arise in somatic cells and are a major mutation type in cancers. A predominant class of SVs is kb- to Mb-scale copy number variants (CNVs), including simple deletions and duplications and more complex rearrangements2,3. Breakpoint structures reveal two major types of larger (>10 kb) CNVs in human genomes. Recurrent CNVs are formed by meiotic unequal recombination between flanking segmental duplications4. Nonrecurrent germline CNVs, and virtually all CNVs that arise in somatic and cancer cells, can form anywhere in the genome and are characterized by short microhomologies or insertions at the breakpoint junctions5,6.

Despite the large impact of nonrecurrent CNVs on human health, our understanding of the mechanisms responsible for their formation is incomplete. One challenge is that a single mechanism may not explain all SV events. Microhomologies at breakpoint junctions suggested early models of SV formation by nonhomologous end joining (NHEJ) of DNA double-strand breaks (DSBs)7, since microhomologies cannot support the homologous recombination (HR) that drives recurrent CNV formation. However, some nonrecurrent CNVs have multiple breakpoint junctions that are difficult to reconcile with NHEJ but might be explained by fork stalling and template switching (FoSTeS), which invokes a DSB-independent switch of a nascent replication strand to a different template8,9, or by microhomology-mediated break-induced replication (MMBIR), which is similar to FoSTeS but invokes a single-ended DSB at a stalled replication fork8,10. Importantly, FoSTeS and MMBIR are thought to act during S-phase at sites of stalled replication, whereas NHEJ predominates in G1 phase11.

In a series of prior studies, we established common fragile sites (CFSs) as an experimental model for nonrecurrent CNV formation12–18. CFSs are genomic loci that show frequent gaps and breaks on metaphase chromosome spreads under replication stress. CFS expression is frequently induced by low doses of the DNA B-family polymerase inhibitor aphidicolin (APH), but is also observed to arise spontaneously at a lower frequency18,19. CFSs were originally proposed to represent regions of DNA that remained unreplicated past S-phase19, and decades of work support the conclusion that CFSs correspond to the subset of late-replicating genomic loci residing at the largest human genes. When transcribed, these genes become susceptible to impaired replication fork progression due to a paucity of active replication origins resulting from transcription that continues into S-phase17,20,21.

CFS loci are also hotspots for CNV formation under replication stress, both in human cancers18 and in cell culture systems following treatment with multiple agents, including low doses of APH, hydroxyurea, and ionizing radiation15–17,22. Like nonrecurrent CNVs in human genomes, CNVs formed at CFSs are characterized by short microhomologies, insertions, or blunt ends, leading us to propose they could be formed by template switching, NHEJ, or microhomology-mediated end joining13. An alternative potential mechanism was suggested by the findings of Minocherhomji et al.23, who showed that APH-induced unreplicated genome regions complete replication in early mitotic prophase by a rescue process called mitotic DNA synthesis (MiDAS). Notably, peaks of MiDAS synthesis correspond closely with CFS genes24,25. Moreover, MiDAS often results in synthesis on just one sister chromatid, suggesting it occurs by conservative break-induced replication (BIR)26, a variant HR pathway operating at single-ended DSBs that is prone to template switching and other errors27,28. These observations link MiDAS to CFSs, possibly as the mechanism by which SV junctions form at these loci29,30.

Most recently, a series of studies has described a specific form of end-joining catalyzed by DNA polymerase theta (POLQ), a pathway often called theta-mediated end-joining (TMEJ)31. TMEJ creates nonhomologous junctions with a significant frequency of templated insertions, which provide microhomologies for bridging two DSB ends. Intriguingly, POLQ activity and TMEJ become activated upon entry into mitosis through mechanisms that include RHINO-directed recruitment to mitotic DNA breaks and POLQ phosphorylation by the PLK1 kinase32–35. Thus, TMEJ joins NHEJ, FoSTeS, MMBIR, and MiDAS as a candidate mechanism for catalyzing SV junction formation at CFSs and elsewhere.

A limitation of the prior literature is a paucity of prospective experimental tests of the hypothesized connections between replication stress, replication rescue and repair mechanisms, and SV formation based on direct assessment of de novo SV junctions. The high frequency of CNV formation at CFS loci provides a valuable experimental model for obtaining the missing breakpoint junction data needed to complement indirect inferences from cancer genomes and cytological studies. We previously used whole-genome microarrays to establish that large, actively transcribed CFS genes are hotspots for CNV formation, but throughput and resolution were low, and CNVs could only be detected after clonal expansion12–17. More recently, we established svCapture sequencing as a reliable method for detecting and characterizing locus-specific single-molecule SV junctions36.

Here, we used svCapture to enrich whole-genome sequencing within known CFS genes to determine the distribution, structure, breakpoint junctions, and mechanisms of SV formation during specific cell-cycle phases and in cells deficient in key repair processes. We reasoned that identifying the cell cycle phase in which SV junctions formed following APH-induced replication stress would guide identification of the mechanism(s) involved, since FoSTeS and MMBIR would likely predominate in S-phase, MiDAS and TMEJ in M-phase, and NHEJ in the following G1 phase. We found that many APH-induced SVs formed at CFSs during M-phase but that MiDAS was not required for this activity. Instead, by examining SV frequencies in cell populations and the structure of tens of thousands of de novo CFS SV junctions, we identified TMEJ as a primary contributor to SV junction formation, with NHEJ making a lesser contribution. These results reveal a role for mitotic TMEJ in mediating large SV formation at CFSs and provide important insights into possible genome-wide SV mutation mechanisms in normal and cancer cells.

Results

svCapture detects large de novo deletion SVs in CFS genes

We used hybridization target capture, i.e., svCapture36, to enrich whole-genome sequencing near the middle of five previously established CFS genes in three cell lines to enable detection of de novo single-molecule SV formation in cell populations (Fig. 1A). Specifically, we targeted large genes PRKG1, NEGR1, and MAGI2 in fibroblast line UM-HF1 (HF1) and FHIT and WWOX in lymphoblastoid line GM12878 and colon cancer line HCT116 as model systems for replication stress-associated SV formation (Supplementary Fig. 1A and B). PRKG1, NEGR1, and MAGI2 are known CNV hotspots in HF1 cells17, whereas FHIT and WWOX are among the loci with the highest frequency of metaphase breaks and gaps in lymphoblastoid and HCT116 cells37.

Fig. 1. APH-induced SV junctions arise throughout large CFS genes in asynchronous normal and cancer cell lines.


A svCapture sequencing was targeted near CFS gene centers to detect SV junctions with at least one breakpoint in a target region. B Timeline of asynchronous cell experiments, with SV induction by low-dose APH. C Example svCapture junction analysis from paired HF1 fibroblast samples shows the relationship between the sizes of SVs and their microhomologies, blunt ends, and insertions (see Methods). Each point is one intrachromosomal, single-molecule SV between 10 kb and 1.2 Mb. SVs are plotted in random order with a small amount of random noise added to the X-axis to aid visualization of all data points. There are fewer control/blue points (n = 33) than APH-treated points (n = 222). D Induced SVs are strongly biased to deletions. APH doses are in µM. SV frequency is the observed junction count divided by the target region fold-coverage. HF1, UM-HF1; GM, GM12878; HCT, HCT116. Independent biological replicate (total SV) numbers by cell line/APH are: HF1/-, 2 (55); HF1/+, 4 (2295); GM/-, 1 (64); GM/+, 1 (338); HCT/-, 1 (73); HCT/+, 3 (3757). Error bars are mean ± 2 SD of two or more replicates. E Induced deletion SV frequency. Each point aggregates all SVs from one independent biological replicate, e.g., from one set of colored junctions in panel C. Sample point colors denote shared experimental batches handled together. Replicate (SV) numbers by cell line/APH are: HF1/-, 2 (22); HF1/+, 4 (2064); GM/-, 1 (42); GM/+, 1 (313); HCT/-, 1 (51); HCT/+, 3 (3407). Error bars are mean ± 2 SD of two or more replicates. Intergroup comparisons were made using a generalized linear regression model based on an over-dispersed Poisson, i.e., the negative binomial distribution, sampling SV junctions from sequenced haplotypes (see Methods). Throughout, two-tailed p-values of selected intergroup comparisons are marked as ns, not significant; *, p ≤ 0.01; **, p ≤ 0.001; ***, p ≤ 0.0001. HF1 p-value = 3.56e-63.
F APH-induced deletion size distributions by cell line for the same data sources as in (E). Colored vertical lines are median SV sizes. G Distribution of APH-induced deletion breakpoints in HF1 cells for the same data sources as in (E), which extend beyond the capture targets (shaded) but remain within the limits of the interrogation regions (plot width) and gene spans (vertical lines, transcribed left to right, arrow). Data are aggregated in 5 kb bins. H High net HF1 target region read coverage in 100 bp bins for the same data sources as in G, the normalization denominator for panel (I). Low-coverage bins reflect reduced capture probe density and/or sequencing efficiency. PRKG1 carried a clonal deletion in the HF1 cells under study at 51.595–51.645 Mb. I HF1 deletion breakpoint distributions in target regions for the same data sources as in G after normalizing to read coverage ≥500, showing non-focal accumulation of SV junctions. Source data are provided as a Source Data file.

We first used a workflow in which SVs were allowed to accumulate asynchronously under replication stress caused by low doses of APH (Fig. 1B), an inhibitor of the B-family replicative DNA polymerases α, δ, and ε. APH doses were determined empirically for each cell line to slow but not stop replication and to produce an average of 2–5 chromosome gaps and breaks per cell. Throughout, we scored only SV junctions present in a single source DNA molecule to track de novo SVs formed during the experiment. Paired samples showed a clear increase in de novo SV frequency, i.e., SV count normalized to target region coverage, in APH-treated vs. control cells (Fig. 1C). In some cell types, observed frequencies were consistent with as many as half of measured haplotypes acquiring a de novo SV (Fig. 1D), in agreement with prior microarray results17.
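The frequency metric above is a simple ratio. The sketch below is a hypothetical helper, not the authors' pipeline, that divides a single-molecule junction count by target-region fold-coverage, under the assumption that deduplicated fold-coverage approximates the number of source haplotypes sampled. The toy counts are illustrative and are not data from the study; formal group comparisons in the study use a negative binomial model instead (see Methods).

```python
def sv_frequency(junction_count: int, fold_coverage: float) -> float:
    """Return de novo SV frequency: junctions per unit of target-region fold-coverage.

    Hypothetical helper. Assumes fold_coverage approximates the number of
    source haplotypes interrogated across the capture target.
    """
    if junction_count < 0:
        raise ValueError("junction_count must be non-negative")
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return junction_count / fold_coverage


# Toy paired samples (illustrative numbers only, not data from the study).
control = sv_frequency(junction_count=12, fold_coverage=400.0)
treated = sv_frequency(junction_count=90, fold_coverage=300.0)
print(f"control={control:.3f} treated={treated:.3f} fold={treated / control:.1f}")
```

With these toy numbers the script prints `control=0.030 treated=0.300 fold=10.0`. Dividing by coverage rather than reporting raw junction counts keeps samples sequenced to different depths comparable.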

SVs induced at CFS genes were strongly biased toward deletions over duplications, inversions (Fig. 1D), and inter-target translocations, even though svCapture reports all SV junction types detectable by short-read sequencing. The lower level of non-deletion SVs was sometimes significantly induced by APH (Supplementary Fig. 1C–G), but due to the greater abundance of DNA loss events originating from unreplicated DNA17,22, we mainly track deletions below. Figure 1E verifies reproducible deletion induction across multiple experimental batches (denoted by point color) in all cell lines (the single replicate of GM12878 cells is supported by further data below).

APH-induced deletion SVs showed mainly short microhomologies at breakpoint junctions and a median SV size of ~200 kb in all cell lines (Fig. 1C, F, and Supplementary Fig. 1C), matching the 186 kb median for microarray CNVs in human cell lines13,17. SVs tended to be smaller without APH induction and for duplications (Supplementary Fig. 1H and I). We therefore applied filters to only track SVs smaller than 1.2 Mb and larger than 10 kb (50 kb for inversions, Supplementary Fig. 1C) to maintain specificity.
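The size filter can be expressed as a per-type predicate. The sketch below assumes a hypothetical `SV` record holding left and right breakpoint coordinates. The thresholds (10 kb to 1.2 Mb, with a 50 kb minimum for inversions) come from the text; treating the bounds as strict inequalities and excluding SV types without a stated minimum are assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_SIZE_BP = 1_200_000
# Minimum sizes from the text; inversions use a stricter 50 kb floor.
MIN_SIZE_BP = {"deletion": 10_000, "duplication": 10_000, "inversion": 50_000}


@dataclass(frozen=True)
class SV:
    """Hypothetical intrachromosomal SV record (coordinates in bp)."""
    sv_type: str
    left_bp: int
    right_bp: int

    @property
    def size(self) -> int:
        return self.right_bp - self.left_bp


def passes_size_filter(sv: SV) -> bool:
    """Keep SVs larger than the type-specific minimum and smaller than 1.2 Mb."""
    min_size = MIN_SIZE_BP.get(sv.sv_type)
    if min_size is None:  # types without a stated threshold are excluded here
        return False
    return min_size < sv.size < MAX_SIZE_BP


svs = [
    SV("deletion", 1_000_000, 1_200_000),  # 200 kb: kept
    SV("deletion", 1_000, 6_000),          # 5 kb: too small
    SV("inversion", 0, 30_000),            # 30 kb: below the inversion floor
    SV("duplication", 0, 2_000_000),       # 2 Mb: too large
]
kept = [sv for sv in svs if passes_size_filter(sv)]
print(len(kept))
```

Only the 200 kb deletion survives, so the script prints `1`.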

SVs formed under aphidicolin stress arise throughout large gene bodies

Proposed mechanisms for genomic instability at CFS genes variably invoke replication delay17, local sequence features within CFS genes that are prone to polymerase stalling38,39, and additional influences secondary to transcription40,41. Our high-density collection of SV breakpoints informs this longstanding question. As expected, we saw the greatest frequency of deletion breakpoints within capture target regions (Fig. 1G, Supplementary Fig. 2A and D). However, some second SV breakpoints fell outside the capture targets, although almost entirely within the gene bodies. Consistent with prior work17, duplications and inversions were more often located at the gene flanks, whereas deletions clustered in the centers of the genes (Supplementary Fig. 1D).

When we normalized deletion breakpoint density within capture targets to local sequencing coverage to account for variable svCapture efficiency and underlying clonal SVs (Fig. 1H, Supplementary Fig. 2B and E), we observed that breakpoints were distributed throughout the 250 kb or 400 kb capture targets (Fig. 1I, Supplementary Fig. 2C and F). No specific locations in any of the five targeted genes appeared to act as unexpectedly high frequency sites of focal SV formation. The pattern was consistent with replication forks failing stochastically throughout large, transcribed genes.
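The coverage normalization can be sketched as binning breakpoints and dividing each bin's count by its read coverage, masking bins below a coverage floor (the ≥500 threshold from the Fig. 1I legend). This is a simplified, hypothetical version: it uses one bin size for both breakpoints and coverage, whereas the figure bins breakpoints at 5 kb and coverage at 100 bp, and it assumes per-bin mean coverage has already been computed.

```python
from typing import List, Optional, Sequence


def normalized_breakpoint_density(
    breakpoints: Sequence[int],
    bin_coverage: Sequence[float],
    region_start: int,
    bin_size: int,
    min_coverage: float = 500.0,
) -> List[Optional[float]]:
    """Breakpoint count per bin divided by that bin's read coverage.

    Bins with coverage below min_coverage return None (masked) rather than an
    inflated ratio. Breakpoints outside the binned region are ignored.
    """
    n_bins = len(bin_coverage)
    counts = [0] * n_bins
    for pos in breakpoints:
        idx = (pos - region_start) // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [
        count / cov if cov >= min_coverage else None
        for count, cov in zip(counts, bin_coverage)
    ]


density = normalized_breakpoint_density(
    breakpoints=[100, 4_999, 5_000, 7_000, 9_999, 12_000],
    bin_coverage=[1_000.0, 2_000.0, 300.0],
    region_start=0,
    bin_size=5_000,
)
print(density)
```

Masking low-coverage bins, rather than dividing by a small denominator, avoids spurious "hotspots" where capture efficiency is poor or a clonal deletion removed the target sequence.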

Together, svCapture recapitulated all aspects of SV formation at CFS genes under replication stress as previously seen in microarray data but with much greater resolution and data density. We do not know what fraction of SVs observed without APH treatment are library artifacts vs. rare background events, but SVs accumulated above that background must have arisen during an experiment and provide a rich signal of dozens to hundreds of sequenced SV junctions per replicate sample (Fig. 1C and E).

Many SVs induced at CFSs by aphidicolin stress form after passage into M-phase

A motivation in developing single-molecule svCapture was to determine when SVs form at CFS genes relative to replication. Replication stress occurs in S-phase, but we reasoned that subsequent junction formation could occur in S-phase concurrent with fork failure, in G2, in M-phase associated with MiDAS or other processes, or in the next G1 associated with 53BP1 foci42. We therefore purified timed, flow-sorted cells and used real-time svCapture to assess when in the cell cycle APH-induced SV junctions formed. This study addresses the S, G2, and M phases.

Our first experimental paradigm matched that used to study MiDAS23. HCT116 cells were treated with APH and synchronized at the G2-M boundary using the CDK1 inhibitor RO3306 (Fig. 2A). Treatment timing helped ensure that arrested cells had experienced APH-induced replication stress in the prior S-phase. G2 (4N DNA content, phospho-histone H3 [pH3] negative) and M-phase (4N, pH3 positive) cells were harvested by flow cytometry prior to or 3 h after release from RO3306 arrest, respectively (Fig. 2C and Supplementary Fig. 3A–B). svCapture revealed a small but statistically significant increase in deletion SV formation in G2 cells treated with APH in the prior S-phase (Fig. 2D). However, deletion yield increased consistently, by an average of 4.6-fold, in M relative to G2-phase cells (Fig. 2D). Because cells were held in colchicine and flow sorted, they could not have passed into the next G1, indicating greater SV formation in M as compared to G2-phase. SVs that formed in M-phase had similar properties to those from bulk asynchronous cultures, including a large median size, a bias toward deletions, and a lower rate of induced duplications and inversions with distinct breakpoint distributions (Supplementary Fig. 4A–C).
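The G2 and M gates described above combine two measurements: DNA content (4N for both) and phospho-histone H3 signal (pH3-positive only in M). The sketch below is a hypothetical rule-based version of that logic for a single cell. Real gates are drawn on flow plots such as Fig. 2C, and the peak position, pH3 threshold, and tolerance used here are placeholder assumptions.

```python
def gate_cell(dna_content: float, ph3_signal: float, g1_peak: float,
              ph3_threshold: float, tol: float = 0.15) -> str:
    """Assign a cell-cycle gate from DNA content and pH3 signal.

    DNA content is expressed relative to the G1 (2N) peak, so 2N ~ 1.0 and 4N ~ 2.0.
    All thresholds are illustrative placeholders.
    """
    ratio = dna_content / g1_peak
    if abs(ratio - 1.0) <= tol:
        return "G1"
    if abs(ratio - 2.0) <= tol:
        # 4N cells are split by the mitotic marker: pH3-positive = M, pH3-negative = G2
        return "M" if ph3_signal >= ph3_threshold else "G2"
    if 1.0 + tol < ratio < 2.0 - tol:
        return "S"
    return "ungated"  # debris, cell aggregates, or >4N events


cells = [(50_000, 10), (75_000, 20), (100_000, 30), (100_000, 900), (180_000, 5)]
labels = [gate_cell(dna, ph3, g1_peak=50_000, ph3_threshold=500) for dna, ph3 in cells]
print(labels)
```

The two 4N cells differ only in pH3 signal, which is what separates the G2 and M fractions compared in Fig. 2D.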

Fig. 2. APH induces SV junction formation during mitosis but independently of MiDAS.


A, B Two synchronization paradigms used to harvest APH-treated cells in different, timed phases of the cell cycle, where G2 cells were harvested before and after release from RO3306 arrest, respectively. C Example flow sorting of HCT116 cells after release from RO3306. x-axis, DNA content; y-axis, pH3. Gated cell fractions in cell cycle stages G1, S, G2, and M are shown. D SV frequencies from synchronized HCT116 cells harvested in the indicated cell cycle phases, showing increased APH-induced SV yield when cells passed from G2 to M. Points aggregate sequenced SVs from one independent biological replicate. Point colors denote experimental batches. Error bars are mean ± 2 SD of two or more replicates. Replicate (SV) numbers by release/cell cycle/APH are: -/G2/-, 6 (241); -/G2/+, 6 (555); +/G2/-, 1 (54); +/G2/+, 2 (263); +/M/-, 9 (492); +/M/+, 12 (6286). P-values from two-sided negative binomial generalized linear model: -/G2/- vs. -/G2/+, 7.80e-10; -/G2/+ vs. +/M/+, 8.12e-21; +/G2/+ vs. +/M/+, 3.40e-13; +/M/- vs. +/M/+, 4.67e-67. *, p ≤ 0.01; **, p ≤ 0.001; ***, p ≤ 0.0001. E Timeline of experiments where cells in different cell cycle phases were flow sorted from asynchronous cultures. Colchicine improved M-phase cell yield and prevented re-entry into G1/S. F SV frequencies from HCT116 cells using the paradigm in E, with M-phase cells again showing higher SV yield. Plot elements and statistics are the same as D. Independent biological replicate (SV) numbers by cell cycle/APH are: S/-, 4 (145); S/+, 6 (1672); G2/-, 2 (78); G2/+, 4 (546); M/-, 2 (81); M/+, 4 (3680). P-values are: S/- vs. S/+, 9.35e-89; G2/- vs. G2/+, 9.93e-09; M/- vs. M/+, 1.87e-22; G2/+ vs. M/+, 1.88e-10. G Timeline of experiments where high-dose APH (2 µM) was added at release from RO3306 to inhibit MiDAS. EdU was added at RO3306 release only when visualizing MiDAS foci.
H Example images of EdU foci scored in I and J, visualized in M-phase, showing singlet and doublet foci and the absence of foci with high-dose APH. Scale bar (white line): 5 µm. I Summary of EdU focus counts per metaphase from two independent biological experiments. Number of metaphase cells analyzed: no APH, 212; 0.2 µM APH, 231; 0.2 µM + 2 µM APH, 200. J Comparison of EdU singlet and doublet focus yield between untreated and low-dose APH-treated cells over three independent biological experiments. Each point is one metaphase cell with at least one EdU focus, stratified by its singlet (x-axis) vs. doublet (y-axis) focus count. Number of EdU-positive metaphase cells/total metaphase cells analyzed: no APH, 29/364; 0.2 µM APH, 184/317. K SV frequencies from HCT116 cells using the paradigm in G, with G2 cells harvested before RO3306 release, where suppression of MiDAS by high-dose APH in M had no impact on SV yield. Plot elements and statistics are the same as D. Independent biological replicate (SV) numbers by cell cycle/low APH/high APH are: G2/-/-, 6 (241); G2/+/-, 6 (555); M/-/-, 9 (492); M/+/-, 12 (6286); M/-/+, 2 (86); M/+/+, 2 (1239). P-values are: M/+/- vs. M/+/+, 0.109. ns: not significant, p > 0.01. Source data are provided as a Source Data file.

To ensure RO3306 was not influencing results, e.g., by altering replication dynamics or inhibiting replication-associated repair in G2 phase43,44, we modified the workflow by either (i) harvesting G2 cells after release from RO3306 (Fig. 2B) or (ii) omitting RO3306 and using extended flow sorting to collect sufficient S, G2, and M-phase cells from asynchronous cultures (Fig. 2E and Supplementary Fig. 3C). In all cases, svCapture revealed a significant ~5-fold increase in deletion SV yield in M relative to G2-phase cells (Fig. 2D and F). To explore whether G2 cells were less capable of SV formation because they were dying, we assessed cell death by monitoring cleaved caspase 3 relative to a positive control treated with etoposide (Supplementary Fig. 3D). G2-phase HCT116 cells did not show an excess of cleaved caspase 3, despite being substantially less likely than pH3-positive M-phase cells to have formed SV junctions.

MiDAS is not required for SV formation at CFSs under aphidicolin stress

Preferential SV formation at CFSs in M-phase might suggest that SV junctions are created by error-prone MiDAS, which occurs preferentially in large, actively transcribed genes, including CFS genes23–25. To test this possibility, we followed established protocols for inhibiting MiDAS by adding high-dose (2 µM) APH upon release from RO3306 arrest (Fig. 2G)23. To confirm MiDAS suppression, we added EdU to parallel cultures after RO3306 release and examined M-phase-specific EdU foci by fluorescence microscopy (Fig. 2G). High-dose APH abrogated the MiDAS-associated EdU focus formation induced by low-dose APH (Fig. 2H–I). Interestingly, in contrast to some prior reports23, most M-phase MiDAS EdU foci induced by low-dose APH exposure in S-phase were doublets with signal on both chromatids, consistent with semi-conservative DNA replication (Fig. 2J), rather than the singlet foci restricted to one sister chromatid that were taken as evidence for conservative replication23. Most importantly, MiDAS inhibition by high-dose APH in M-phase HCT116 cells did not suppress deletion SV formation (Fig. 2K), a finding supported by a single-replicate experiment in GM12878 cells (Supplementary Fig. 4D).
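The singlet-versus-doublet distinction can be summarized per condition as the fraction of scored foci that are doublets. In the hypothetical helper below, each EdU-positive metaphase is a (singlet_count, doublet_count) pair, as plotted in Fig. 2J; the toy counts are illustrative only. Under the interpretation given in the text, a high doublet fraction favors semi-conservative synthesis on both sister chromatids, whereas singlets would fit conservative, BIR-like synthesis.

```python
from typing import Iterable, Tuple


def doublet_fraction(cells: Iterable[Tuple[int, int]]) -> float:
    """Fraction of EdU foci scored as doublets (signal on both sister chromatids).

    cells: (singlet_count, doublet_count) for each EdU-positive metaphase.
    """
    singlets = doublets = 0
    for s, d in cells:
        if s < 0 or d < 0:
            raise ValueError("focus counts must be non-negative")
        singlets += s
        doublets += d
    total = singlets + doublets
    if total == 0:
        raise ValueError("no EdU foci were scored")
    return doublets / total


toy_cells = [(1, 3), (0, 2), (2, 2)]  # illustrative counts, not study data
print(f"{doublet_fraction(toy_cells):.2f}")
```

Pooling foci across cells, as here, weights cells with many foci more heavily; a per-cell summary, as in the scatter of Fig. 2J, avoids that weighting.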

MiDAS inhibition does not affect CFS expression in HCT116 or GM12878 cells

SV formation and CFS expression are different manifestations of replication stress at CFS loci18. MiDAS inhibition was reported to greatly reduce total gaps and breaks and specific CFS expression in U2OS cells, and to generate CFS-associated ultrafine bridges and increased nondisjunction of chromosomes 3 and 16, which contain the FRA3B and FRA16D loci, in U2OS cells and MRC5 human fetal lung fibroblasts23. Because we found no effect of MiDAS inhibition on FRA3B- or FRA16D-associated SV formation in HCT116 or GM12878 cells, we explored the relationship between MiDAS, chromosome breakage, and specific CFS expression in those cell lines. We first determined the optimal timing of high-dose APH (2 µM) for MiDAS inhibition, as assessed by EdU focus formation in M-phase in unsynchronized cultures (Fig. 3A). In HCT116 cells, a 1–2 h high-APH treatment before chromosome harvest eliminated MiDAS (Fig. 3B). In contrast to HCT116 cells synchronized with RO3306 (Fig. 2H), a small number of foci were seen in HCT116 cells treated with high-dose APH for 3 h, presumably representing cells entering mitosis from early G2 or S-phase (Fig. 3B). For GM12878 cells, a 1 h high-APH treatment eliminated MiDAS, with low levels of EdU foci appearing in cells treated for 2 h (Fig. 3B). Based on these results, we used 1 h and 2 h high-APH treatments for cytogenetic analyses with both cell types. As with SV formation, we did not observe a decrease in APH-induced CFS expression or in total gaps and breaks in either HCT116 or GM12878 cells upon MiDAS inhibition with high-dose APH (Fig. 3C–F), and we observed an increase in HCT116 breaks with increasing high-APH treatment time.

Fig. 3. APH-induced chromosome gaps and breaks and CFS expression do not depend on MiDAS.


A Timeline of experiments where high-dose APH (2 μM) and EdU were added 1 h, 2 h, or 3 h prior to chromosome harvest to inhibit MiDAS and to visualize MiDAS foci, respectively. B Comparison of EdU focus yield between untreated and high-dose APH-treated HCT116 and GM12878 cells. Gray bars are the mean of two independent biological replicates (orange and blue sample points). Number of HCT116 metaphase cells scored: 1 h EdU, 104 + 92; 2 h EdU, 88 + 103; 3 h EdU, 110 + 96; 1 h EdU + 2 µM APH, 109 + 100; 2 h EdU + 2 µM APH, 63 + 89; 3 h EdU + 2 µM APH, 78 + 92. Number of GM12878 metaphase cells: 1 h EdU, 59 + 102; 2 h EdU, 59 + 88; 3 h EdU, 41 + 92; 1 h EdU + 2 µM APH, 21 + 104; 2 h EdU + 2 µM APH, 29 + 83; 3 h EdU + 2 µM APH, 21 + 77. C, D Total gaps and breaks (left) and CFS expression (right) in HCT116 and GM12878 cells, respectively, with and without high-dose APH to suppress MiDAS. Chromosomes were analyzed in 25 metaphases from each of two biological replicates for each cell line. Gray bars, mean. E FRA3B and FRA16D CFS gaps/breaks (arrows) in a representative HCT116 cell treated with high-dose APH for 2 h from the set of experiments described in C. Scale bar (black line), 5 µm. F Additional examples of FRA3B and FRA16D CFS gaps/breaks (arrows) in HCT116 and GM12878 cells treated with high-dose APH for 2 h or 1 h, respectively, from the set of experiments described in (C) and (D). Scale bar (black line), 5 µm. Source data are provided as a Source Data file.

SVs induced by aphidicolin stress have TMEJ-like junction profiles

Our results support SV formation at large CFS genes by a pathway activated in M-phase other than MiDAS. To identify candidate alternatives, we performed a detailed analysis of an extensive database of 11,641 de novo deletion junctions sequenced to base-pair resolution from asynchronous and M-phase cells with unperturbed DNA repair. Short-read sequencing cannot reveal SVs created by HR, but prior microarray work never suggested events of that class17. Instead, both microarray and svCapture data support junctions strongly consistent with DSB end joining, where paired breakpoint positions in SV alleles revealed a range of microhomology usage, blunt joints, and de novo base insertions (Fig. 4A–C). The distribution of these junction classes, characterized by a prominent peak of 2 bp microhomology and a substantial minority of de novo insertions, was strikingly reproducible across APH treatment conditions (Fig. 4C), cell line and cell harvest workflow, cell cycle phase, SV type, and MiDAS status (Supplementary Fig. 5A–D).
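The microhomology, blunt, and insertion classes can be read off as a single signed breakpoint offset, the quantity plotted in Fig. 4C. The sketch below is an illustrative simplification, assuming the junction sequence is already anchored to both flanking reference segments; it uses exact matching from each end rather than the authors' alignment pipeline, and the sequences are made up. Bases shared by both flanks are counted twice (a negative offset, i.e., microhomology), while bases matching neither flank are not counted at all (a positive offset, i.e., an insertion).

```python
def _prefix_match(a: str, b: str) -> int:
    """Number of leading bases shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def breakpoint_offset(junction: str, left_ref: str, right_ref: str) -> int:
    """Signed offset of a deletion junction: <0 microhomology, 0 blunt, >0 insertion.

    junction: sequence spanning the junction.
    left_ref: reference starting at junction[0] and continuing past the left
        breakpoint into the deleted segment.
    right_ref: reference ending at junction[-1] and extending back past the
        right breakpoint into the deleted segment.
    """
    left = _prefix_match(junction, left_ref)
    right = _prefix_match(junction[::-1], right_ref[::-1])
    # Microhomology bases are counted by both matches; inserted bases by neither.
    return len(junction) - left - right


def junction_class(offset: int) -> str:
    if offset < 0:
        return f"microhomology ({-offset} bp)"
    if offset == 0:
        return "blunt"
    return f"insertion ({offset} bp)"


examples = {
    "microhomology": ("TTACGGACCTAAGTTC", "TTACGGACCTGGGGGG", "GGGGGGCTAAGTTC"),
    "blunt": ("TTACGGACAAGTTC", "TTACGGACGGGGGG", "TTTTTTAAGTTC"),
    "insertion": ("TTACGGACTAGAAGTTC", "TTACGGACGGGGGG", "CCCCCCAAGTTC"),
}
for name, (junction, left_ref, right_ref) in examples.items():
    print(name, junction_class(breakpoint_offset(junction, left_ref, right_ref)))
```

The three toy junctions classify as a 2 bp microhomology, a blunt joint, and a 3 bp insertion. One inherent ambiguity remains: an inserted base that happens to match the adjacent flank is absorbed into the match, so exact-match offsets can slightly underestimate insertion size.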

Fig. 4. Junction analysis of >11,000 de novo deletions implicates TMEJ repair of DSBs created by replication fork cleavage.


A Guide to drawing conventions used in this figure and models in supplemental figures. B SVs are oriented to transcription of the relevant CFS gene. Gene-proximal and gene-distal breakpoints can be paired via microhomologies that remove base pairs relative to the initial DSB ends, by blunt joints (not drawn), or by the insertion of novel bases, which might arise by template copying with sequential use of priming and resolving microhomologies. C Distribution of deletion junction microhomology and insertion sizes, i.e., breakpoint offsets, from all cell lines harvested asynchronously or in M-phase, stratified by APH induction. The microhomology peak is at 2 bp. Independent biological replicates (SVs) by APH: -, 14 (501); +, 23 (11140). D Yield of identified templates for the APH-induced samples and deletion SVs plotted in (C), stratified by insert size. 1099 of 3022 insertion-containing SVs analyzed had putative insertion templates identified. Smaller insertions were required to have more bases of flanking microhomology in candidate templates to maintain search specificity (see Methods), leading to the non-monotonic blue random expectation line. All insertion sizes showed significant enrichment of identified templates (p <= 0.05, red), as determined by a one-tailed assessment of the binomial distribution that asked whether the fraction of identified templates for that size exceeded the number expected based on random Poisson sampling of bases (see Methods). Exact p-values are provided in a Source Data file. E Number of templates identified for insertion SVs from the data sources in (D) within 500 bp of junction breakpoints. 2788 of 3022 (92%) of insertion SVs had zero or one template identified, establishing search specificity. F Pileup of identified insertion template locations for the data sources in (D). 
The plot is oriented like panel B with respect to the genomic DNA regions surrounding the two breakpoints, i.e., most templates were found in retained breakpoint segments. 738 templates are plotted in total. G Histogram of the bases contributing to insertion templates plotted in (F), emphasizing offset of foldback relative to cross-junction templates. Solid lines, left breakpoint; dashed lines, right breakpoint. Panels F and G share an X axis. H The recurrent structure of expansion-class insertions. See Supplementary Fig. 7 for models of how expansion might occur so that templates appear to cross into the deleted side of inferred breakpoint positions. I General model by which single-ended DSBs created at replication forks lead to SV formation mechanisms suggested by CFS SV junction analysis. Source data are provided as a Source Data file.

We characterized junction insertions in detail because they are a signature of TMEJ45–47, an end-joining pathway recently implicated in mitotic DSB repair32–34. Specifically, inserted bases are sometimes copied from template sequences near a DSB. The inferred repair process involves synthesis initiated from a priming microhomology flanking the inserted bases and eventual cross-DSB annealing via a second, resolving microhomology on the opposite flank (Fig. 4B)31,46. Three TMEJ insertion classes with varying template orientations have been described, referred to here as foldback (also called inverse), cross-junction synthesis (also called direct), and a more complex and rarer strand-switching mechanism (Supplementary Fig. 6A–C)46.

We searched for insertion templates 500 bp upstream and downstream of each reference genome breakpoint (Fig. 4B), requiring template spans of at least 7 bp to promote specificity. We found a significantly higher fraction of templates than expected by random chance across all insertion sizes from 1 to 20 bp (Fig. 4D and E). However, we did not find templates for most insertions, and the hit rate decreased with insertion size, possibly due to a correlated increase in the frequency of untemplated or multi-template events.
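The template search can be sketched as exact matching of a query, the inserted bases padded with flanking junction bases until the query spans at least 7 bp, against a reference window on both strands. This is a hypothetical simplification: it pads both flanks equally, whereas the authors' search allowed separate priming and resolving microhomologies (see Methods). Here "direct" and "inverse" simply mean a forward or reverse-complement match in the window, which the text associates with cross-junction and foldback synthesis, respectively.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_query(junction: str, ins_start: int, ins_end: int, min_span: int = 7) -> str:
    """Pad the insertion junction[ins_start:ins_end] with equal flanking bases
    until the query is at least min_span long."""
    flank = 0
    while (ins_end - ins_start) + 2 * flank < min_span:
        flank += 1
    lo, hi = ins_start - flank, ins_end + flank
    if lo < 0 or hi > len(junction):
        raise ValueError("junction sequence too short to build the query")
    return junction[lo:hi]


def find_templates(query: str, window: str) -> List[Tuple[int, str]]:
    """All exact matches of query in window, as (start, orientation) pairs."""
    hits = []
    for orientation, pattern in (("direct", query), ("inverse", revcomp(query))):
        start = window.find(pattern)
        while start != -1:
            hits.append((start, orientation))
            start = window.find(pattern, start + 1)
    return sorted(hits)


junction = "TTACGGACTAGAAGTTC"       # toy junction; 3 bp insertion "TAG" at positions 8-10
query = build_query(junction, 8, 11)  # padded to 7 bp: "ACTAGAA"
window = "GGGGACTAGAACCCCTTCTAGTGG"  # toy stand-in for a ±500 bp reference window
print(query, find_templates(query, window))
```

Padding short insertions with flanking bases is what makes a 1 to 3 bp insertion searchable at all; an unpadded 3-mer would match almost anywhere in a 1 kb window.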

A highly informative footprint emerged from the 1099 identified templates that featured a peak of both priming and resolving microhomology lengths at 2 to 3 bp (Supplementary Fig. 5E) and a net total template size, including flanking microhomologies, of typically less than 10 bp (Supplementary Fig. 5F). Templates were almost always found on the retained side of genomic SV breakpoints (Fig. 4F), suggesting the search ensues after a DSB separates replicated/retained DNA from unreplicated/lost DNA. There was roughly equal utilization of foldback and cross-junction synthesis on either side of the SV junction, suggesting a random search (Fig. 4F). However, the search was strongly restricted in distance from the junction, with a strong peak of template bases within 20 bp of breakpoints (Fig. 4F, G). Rare templates found at greater distances have increased likelihood of being random sequence matches (Supplementary Fig. 8A).
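Whether template hits exceed chance, and why distant hits are suspect, can be illustrated with a back-of-the-envelope model. If reference bases were independent and uniformly distributed, a fixed k-bp query would match any given start position on one strand with probability 4^-k, so the chance of at least one spurious hit grows with the size of the searched region. The sketch below combines that approximation with a one-tailed binomial test; the uniform-base model, the searched length, and the counts are illustrative assumptions, not the exact expectation used in the study (see Methods).

```python
import math


def random_hit_probability(k: int, search_bp: int) -> float:
    """P(at least one exact match of a fixed k-mer) on both strands of search_bp
    of uniform random sequence, treating start positions as independent."""
    positions = 2 * max(search_bp - k + 1, 0)
    return 1.0 - (1.0 - 0.25 ** k) ** positions


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """One-tailed P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p ** i * (1.0 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# 500 bp on each side of two breakpoints = 2,000 bp searched (assumed layout).
p_random = random_hit_probability(k=7, search_bp=2_000)
p_value = binomial_upper_tail(successes=40, trials=100, p=p_random)  # toy counts
print(f"p_random={p_random:.3f} p_value={p_value:.2e}")
```

Under this model even a 7 bp query has roughly a one-in-five chance of a spurious hit somewhere in ~2 kb, which is why shorter insertions need more flanking bases and why hits far from the breakpoint deserve less weight.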

Implications of insertion template patterns for fork cleavage leading to SVs

The mechanistic specificity of insertion templates was supported by a shift of foldback template bases nearest the junction toward positions 4 bp further away from the junction as compared to cross-junction synthesis (Fig. 4G and Supplementary Fig. 8B). This shift is consistent with the need for hairpin formation during foldback synthesis in most inverse insertions, which precludes the placement of priming microhomologies at the extreme DSB terminus. However, we did observe a rare class of inverse insertions occurring at short palindromic sequences (Fig. 4F and Supplementary Fig. 6E). Because the palindrome bases could not all anneal in a hairpin, these insertions, and perhaps some that we describe as foldback events, may have occurred by an alternative bimolecular synthesis model that invokes the sister chromatid as template (Supplementary Fig. 6D).

Surprisingly, we found some insertion templates on the top genome strand relative to a deletion that appeared to cross from the retained into the lost portion of the corresponding genomic breakpoint (orange lines in Fig. 4F). The events were characterized by short, often interrupted, direct repeats that were expanded from two to three repeat units in the final junction, so we refer to them as expansion insertions (Fig. 4H). In Supplementary Fig. 7, we draw two parsimonious models for these events (others may be possible). Importantly, designations of retained and lost DNA are relative to sequence alignment breakpoint positions, not the source DSB termini whose structure we do not know. For expansion junctions, the algorithm calls an insertion with a breakpoint upstream of the added repeat unit, which may or may not correspond to the starting DSB end. In one model, the leading strand template is cleaved and used as the template for fill-in synthesis of the resulting 5’ overhanging DSB, where strand slippage would create the observed expansion (Supplementary Fig. 7). An alternative model invokes cleavage of the lagging strand template to yield a 3’ overhanging DSB that again uses the leading strand template, now within a daughter strand gap, to create the expansion insertion (Supplementary Fig. 7). Expansion insertions have not been described for TMEJ occurring at CRISPR/Cas9-mediated DSBs31,46, possibly because the mechanisms invoke either 5’ overhanging DSBs or bimolecular reactions at forks (Fig. 4I) that do not apply to Cas9 blunt ends at simple DSBs.
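A simplified way to flag candidate expansion insertions is to ask whether the inserted bases form a repeat unit present as two tandem copies in the reference around the breakpoint but three copies across the junction. The sketch below is hypothetical: it assumes perfect, uninterrupted repeats (the expansions described above are often interrupted), takes the inserted bases as the repeat unit, and does not attempt to model either mechanism in Supplementary Fig. 7.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of consecutive, perfect copies of `unit` anywhere in `seq`."""
    if not unit:
        return 0
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


def is_expansion_insertion(reference_context: str, junction_context: str, insertion: str) -> bool:
    """True if the insertion adds a third tandem copy of a unit seen twice in the reference.

    reference_context: reference bases spanning the breakpoint (retained plus lost side).
    junction_context:  the corresponding bases read across the SV junction.
    Hypothetical helper; assumes perfect repeats.
    """
    return (max_tandem_copies(reference_context, insertion) == 2
            and max_tandem_copies(junction_context, insertion) == 3)


print(is_expansion_insertion("TTCAGCAGAA", "TTCAGCAGCAGAA", "CAG"))  # True
```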

Chemical POLQ inhibition and POLQ knockout differentially impact SV formation

Because our junction analysis implicated TMEJ as a possible mechanism of SV formation at large CFS genes, we modified asynchronous and M-phase svCapture workflows to incorporate chemical POLQ inhibitors (Fig. 5A, B) and CRISPR-mediated POLQ knockout (KO) cell clones. We validated TMEJ loss using a published assay based on PCR detection of intracellular joining of transfected oligonucleotides (Fig. 5C and Supplementary Fig. 9)48.

Fig. 5. POLQ inhibition and knockdown reduce SV formation in mitosis.


A, B Modification of asynchronous and synchronization timelines, respectively, where DSB repair inhibitors were added prior to either APH addition or release into M-phase. C Loss of TMEJ in POLQ-/- HF1 and HCT116 cells as determined by joining of transfected oligonucleotides. Number of independent biological replicates: HF1 WT, 2; HF1 POLQ-/-, 3; HCT116 WT, 1; HCT116 POLQ-/-, 2. Sample point colors denote experimental batches. Gray bars, mean. D HF1 asynchronous deletion SV frequency with POLQ KO. Sample point colors denote experimental batches. Error bars are mean ± 2 SD of two or more independent biological replicates. Replicate (total SV) numbers by CRISPR/APH are: -/-, 1 (12); -/+, 3 (1785); POLQ/-, 3 (20); POLQ/+, 5 (1363). P-values from two-sided negative binomial generalized linear model: -/+ vs. POLQ/+, 1.83e-47; POLQ/- vs. POLQ/+, 4.90e-64. **, p ≤ 0.001; ***, p ≤ 0.0001. E HCT116 asynchronous deletion SV frequency with POLQ KO and inhibition by ART558. Plot labeling and statistics are the same as (D). Replicate (total SV) numbers by CRISPR/APH/ART558 are: -/-/-, 1 (51); -/+/-, 3 (3407); -/+/+, 2 (1799); POLQ/+/-, 2 (1505). P-values are: -/+/- vs. -/+/+, 1.89e-49; -/+/- vs. POLQ/+/-, 6.09e-05. F HCT116 M-phase deletion SV frequency with POLQ KO and inhibition by ART558 and novobiocin (NVB). Plot labeling and statistics are the same as (D). Replicate (total SV) numbers by CRISPR/APH/ART558/NVB are: -/-/-/-, 8 (425); -/-/+/-, 2 (114); -/-/-/+, 1 (37); -/+/-/-, 11 (5975); -/+/+/-, 4 (1175); -/+/-/+, 2 (352); POLQ/+/-/-, 2 (153). P-values are: -/+/-/- vs. -/+/+/-, 5.42e-04; -/+/-/- vs. -/+/-/+, 7.11e-10; -/+/-/- vs. POLQ/+/-/-, 8.42e-28. G–I APH-induced junction insertion/microhomology size distributions for the data sources in D to F, respectively. SV counts are shown above plots for all sample groups. J Summary of deletion junction property distributions for asynchronous plus M-phase samples.
Each point is one biological replicate across all cell lines. The x-axis is the percent of deletion junctions in a sample that had 2 to 15 bp insertions; the y-axis is the average microhomology length of junctions without insertions. Numbers of replicates by CRISPR/NVB/ART558 are: -/-/-, 25; -/-/+, 7; -/+/-, 2; POLQ/-/-, 9. K Differences between repair suppression paradigms with respect to the SV junctions that are ultimately sequenced. The SV frequency plot is conceptual and does not represent actual data. Source data are provided as a Source Data file.

Results consistently supported a role for POLQ in SV formation at CFSs, but the degree of SV loss varied by cell type and method. Asynchronous POLQ-/- HF1 and HCT116 cells each showed a reproducible, significant, but partial loss of deletions as well as other types of SVs relative to wild-type (Fig. 5D and E and Supplementary Fig. 10A and B). Deletion SV reduction was again partial in HCT116 asynchronous or M-phase cells treated with the POLQ inhibitors ART558 (ref. 49) or novobiocin (NVB)50,51 (Fig. 5E, F). In contrast, APH-treated POLQ-/- HCT116 M-phase cells showed baseline levels of SV formation with no apparent induction by APH (Fig. 5F and Supplementary Fig. 10C).

Because SV frequency alone might not reveal the full role of TMEJ if another repair pathway could partially replace it, we analyzed properties of residual SVs from cells with impaired TMEJ. SV sizes were similar regardless of POLQ status (Supplementary Fig. 10D–F). However, junction distributions shifted, with changes in microhomology lengths and insertion frequencies. POLQ-/- cells across all cell lines and workflows showed a near absence of insertions ≥3 bp and a shift in peak microhomology length from 2 bp to 1 bp (Fig. 5G–I and Supplementary Fig. 10G–I). By comparison, POLQ chemical inhibition did not substantially reduce insertions, although the shift toward shorter microhomologies was sometimes apparent (Fig. 5H–I and Supplementary Fig. 10H–I). As in uninhibited cells, we found templates for insertions detected in ART558- and NVB-treated samples, although with a higher relative rate of expansion insertions (Supplementary Fig. 11). These results were supported by a single experimental replicate in GM12878 cells (Supplementary Fig. 10J–L).
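The two junction properties used to summarize samples in Fig. 5J (the percentage of deletion junctions with 2 to 15 bp insertions, and the average microhomology length of junctions without insertions) can be computed per replicate as in the sketch below. The `Junction` record and its field names are hypothetical stand-ins for whatever per-junction calls an SV pipeline produces.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Junction:
    insertion_bp: int      # number of inserted bases at the junction (0 if none)
    microhomology_bp: int  # microhomology length; only used when insertion_bp == 0


def junction_summary(junctions: list[Junction]) -> tuple[float, float]:
    """Return (% of junctions with 2-15 bp insertions, mean microhomology without insertions)."""
    if not junctions:
        return float("nan"), float("nan")
    n_insertion = sum(1 for j in junctions if 2 <= j.insertion_bp <= 15)
    mh_lengths = [j.microhomology_bp for j in junctions if j.insertion_bp == 0]
    pct_insertion = 100.0 * n_insertion / len(junctions)
    mean_mh = float(mean(mh_lengths)) if mh_lengths else float("nan")
    return pct_insertion, mean_mh


replicate = [Junction(0, 2), Junction(0, 1), Junction(3, 0), Junction(0, 3), Junction(1, 0)]
print(junction_summary(replicate))  # (20.0, 2.0)
```

In this sketch, junctions with a 1 bp insertion fall into neither metric, matching the 2 to 15 bp window given in the Fig. 5 legend.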

Fig. 5J shows how APH-induced POLQ-proficient, POLQ-inhibited, and POLQ-/- samples group with respect to the insertion frequency and average microhomology length of their deletion SVs. The dynamics of these different manipulations must be carefully considered when interpreting results (Fig. 5K). POLQ protein loss is distinct from its chemical inhibition, which may only partially inhibit enzymatic activity and/or permit structural roles to be fulfilled. Moreover, inhibitors were used transiently, whereas KO cells lacked POLQ from the time they were cloned. SVs that arose as background events before an experiment would therefore have formed in a POLQ-proficient state in inhibitor experiments but in a POLQ-deficient state in KO experiments, and these background events become a larger fraction of detected SVs as APH induction decreases (Fig. 5K).
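The background-dilution argument behind Fig. 5K can be expressed as a weighted average. In this conceptual sketch (like Fig. 5K itself, it uses no actual data), detected SVs are a mix of background events and APH-induced events, each with its own insertion frequency; the profiles below, an insertion-rich background formed while POLQ was active and an insertion-poor induced class, are arbitrary illustrative values, not measurements.

```python
def observed_insertion_fraction(background_n: float, background_frac: float,
                                induced_n: float, induced_frac: float) -> float:
    """Insertion frequency expected among all detected SVs (background + induced)."""
    total = background_n + induced_n
    if total == 0:
        raise ValueError("no SVs to summarize")
    return (background_n * background_frac + induced_n * induced_frac) / total


# Hypothetical profiles: background SVs formed while POLQ was active
# (insertion-rich) and induced SVs formed with POLQ impaired (insertion-poor).
BACKGROUND_N, BACKGROUND_FRAC, INDUCED_FRAC = 50, 0.20, 0.02
for induced_n in (500, 100, 10):
    frac = observed_insertion_fraction(BACKGROUND_N, BACKGROUND_FRAC, induced_n, INDUCED_FRAC)
    print(f"induced SVs = {induced_n:>3}: observed insertion fraction = {frac:.3f}")
```

As induction falls, the observed fraction drifts toward the background value (0.036, 0.080, and 0.170 for the three illustrative induction levels), so a weakly induced sample increasingly reflects whichever POLQ state its background events formed in.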

TMEJ and NHEJ cooperate in SV formation in some asynchronous cells

Despite challenges in comparing POLQ chemical inhibitors and KO clones, the results above demonstrate that some SV formation at CFSs can occur without POLQ, especially in asynchronous cultures. To explore the interplay between TMEJ and NHEJ in different cell cycle stages, we added chemical inhibition of DNA-PKcs using NU7441 (ref. 52) and CRISPR/Cas9 KO of the DNA ligase IV gene LIG4 (Supplementary Fig. 12A and B)53,54. NU7441 did not alter svCapture deletion SV yield in asynchronous HF1 cells in either wild-type or POLQ-/- backgrounds (Fig. 6A). In contrast, adding NU7441 to ART558-treated or POLQ KO asynchronous HCT116 cells significantly decreased deletion yield relative to ART558 or POLQ KO alone (Fig. 6B), and similarly in GM12878 cells in a single replicate (Supplementary Fig. 12C). This synergy was especially apparent when ART558 was added to asynchronous cultures of LIG4-/- HCT116 cells, which abrogated APH-induced SV formation (Fig. 6B).

Fig. 6. Cell-cycle dependent interplay between TMEJ and NHEJ in SV formation at CFSs.


A Deletion SV frequency in asynchronous, APH-induced HF1 cells as a function of POLQ knockout and NHEJ inhibition with NU7441. Sample point colors denote experimental batches. Error bars are mean ± 2 SD of two or more independent biological replicates. Replicate (total SV) numbers by CRISPR/NU7441 are: -/-, 3 (1785); -/+, 2 (1343); POLQ/-, 5 (1363); POLQ/+, 2 (524). P-values from two-sided negative binomial generalized linear model: -/- vs. -/+, 0.180; POLQ/- vs. POLQ/+, 0.119. ns, not significant; **, p ≤ 0.001; ***, p ≤ 0.0001. B Deletion SV frequency in asynchronous, APH-induced HCT116 cells as a function of POLQ and LIG4 knockout, and inhibition of NHEJ with NU7441 or TMEJ with ART558. Plot labeling and statistics are the same as A. Replicate (total SV) numbers by CRISPR/ART558/NU7441 are: -/-/-, 3 (3407); -/-/+, 2 (2512); -/+/-, 2 (1799); -/+/+, 2 (848); POLQ/-/-, 2 (1505); POLQ/-/+, 2 (668); LIG4/-/-, 2 (2322); LIG4/+/-, 2 (116). P-values are: -/+/- vs. -/+/+, 5.59e-64; POLQ/-/- vs. POLQ/-/+, 6.23e-08; LIG4/-/- vs. LIG4/+/-, 8.89e-197. C Like A and B, now for HCT116 cells released into M-phase following RO3306 arrest. Plot labeling and statistics are the same as A. Replicate (total SV) numbers by CRISPR/ART558/NU7441 are: -/-/-, 11 (5975); -/-/+, 2 (1473); -/+/-, 4 (1175); -/+/+, 2 (681); POLQ/-/-, 2 (153); POLQ/-/+, 2 (148). P-values are: -/-/- vs. -/-/+, 0.376; -/+/- vs. -/+/+, 0.571. For clarity, panels A to C only show samples induced to form SVs with low-dose APH. Horizontal dashed lines indicate the cell-line-specific SV level consistently observed without APH addition. D Like Fig. 4C, showing no effect of NHEJ loss on junction microhomology and insertion profiles. Independent biological replicates (total SV) numbers by cell cycle/CRISPR/NU7441 are: async(-)/-/-, 8 (4005); async(-)/-/+, 5 (2622); async(-)/LIG4/-, 2 (1603); M/-/-, 14 (6919); M/-/+, 1 (1057). E Timeline for harvesting S-phase HCT116 cells from asynchronous cultures.
F Deletion SV frequency in S-phase HCT116 cells as a function of POLQ and LIG4 knockout, and inhibition of NHEJ with NU7441 or TMEJ with ART558. Plot labeling and statistics are the same as A. Replicate (total SV) numbers by APH/CRISPR/ART558/NU7441 are: -/-/-/-, 4 (145); +/-/-/-, 6 (1672); +/LIG4/-/-, 4 (841); +/LIG4/+/-, 4 (139); +/POLQ/-/-, 4 (565); +/POLQ/-/+, 4 (205). P-values are: -/-/-/- vs. +/-/-/-, 9.35e-89; +/-/-/- vs. +/LIG4/-/-, 5.96e-02; +/-/-/- vs. +/POLQ/-/-, 1.67e-04; +/LIG4/-/- vs. +/LIG4/+/-, 1.95e-14; +/POLQ/-/- vs. +/POLQ/-/+, 3.89e-04. Source data are provided as a Source Data file.

A possible confounder above is that loss of both TMEJ and NHEJ can impair cell growth in some contexts55, although we did not observe excessive cell death. To restrict the timeframe during which cells were doubly deficient, we added NU7441 to M-phase HCT116 cells just before release from RO3306 and observed that NU7441 now had no incremental impact on SV formation (Fig. 6C). As noted above, POLQ KO alone was sufficient to abrogate APH-induced SV formation in M-phase HCT116 cells (Fig. 6C). Throughout, loss of NHEJ by either chemical inhibition or LIG4 KO had no impact on junction insertion/microhomology distributions or insertion template locations (Fig. 6D and Supplementary Fig. 12D), consistent with these baseline junction properties being driven by TMEJ. Interestingly, NU7441 also had no impact on junction insertion/microhomology distributions in POLQ-/- cells (Supplementary Fig. 12E).

Because non-TMEJ pathways might have a greater impact outside of M-phase, we repeated the DNA repair manipulations above and harvested S-phase cells from asynchronous cultures treated with low-dose APH for 18 hours (Fig. 6E). Although SV rates were lower in S-phase than in M-phase, APH clearly induced some SV formation in S-phase (Fig. 6F). Consistent with the results above, impairing NHEJ alone with LIG4 KO had no impact on SV formation except in one of four replicates that we consider to be an outlier (Fig. 6F). In contrast, impairing TMEJ alone with POLQ KO partially reduced SV formation, while combined impairment of both pathways by either of two combinations of manipulations (LIG4 KO with ART558, or POLQ KO with NU7441) reduced SV formation to baseline levels (Fig. 6F). These results further indicate a larger role for TMEJ than for NHEJ in CFS SV formation.

Discussion

This study addresses nonrecurrent SVs that arise in non-repetitive loci by mechanisms linked to negative interactions between transcription and replication. Unlike the early-replicating genome, where these interactions include machinery collisions and R-loop formation22,56–58, instability in the late-replicating genome relates more to conflicts that cause unreplicated DNA to be propagated into M-phase23–25. The most unstable late-replicating loci are at the largest human CFS genes, where S-phase transcription impedes origin usage and completion of replication22,41,59. We show that some of the mechanisms that evolved to resolve the resulting unreplicated DNA prior to cell division can create large nonrecurrent SVs in at least some genomic loci.

We monitored SV junctions as they formed in cells using svCapture. svCapture36 could detect timed de novo SV formation in part due to the high rate of CNV formation at transcribed CFS genes under replication stress17 and in part because it offers single-molecule sensitivity36. Prior work examined features such as single-chromosome end-to-end fusions in Ku-deficient cells, radial formation in Fanconi anemia cells, and the persistence of DNA damage evident as DNA repair foci33,60, but ours is a direct demonstration of intrachromosomal SV junction formation in M-phase, including large SVs spanning hundreds of kb. We characterized 48,362 such de novo SVs in controlled, prospective experiments with a combined 322,303-fold target coverage, which provides a uniquely powerful dataset for exploring junction formation mechanisms. Multiple observations affirm that we are measuring bona fide SV junctions. All aspects of CFS CNVs observed in microarray studies12–18 were recapitulated by svCapture, but with much greater resolution and depth. Moreover, junction microhomology and templated insertions changed with cellular DNA repair capacity and must therefore comprise SVs created in cells.

Mathematical models61 have shown that stochastic fork failure coupled with an inability to rescue replication via dormant origins can alone yield the central peak of SV junctions we see within CFS genes17. Consistently, no subregions in any of the CFS genes we studied acted as more highly localized SV hotspots. The large introns in CFS genes often have AT-rich sequences that have been invoked as important for CFS expression through the formation of focal flexibility peaks38,39. Biochemical studies of DNA polymerase progression through specific focal regions of CFS genes, especially sequences in WWOX, further showed they can impose mechanistic barriers to replication62,63. However, svCapture data linked CFS gene transcription to a diffuse distribution of de novo junctions. A limitation in making these conclusions is that extensive processing of DSBs could make SV distributions more diffuse than the underlying source lesions, whose structure cannot be determined from junction sequences.

CFSs replicate as late as M-phase17,20,21,23,64, and the distribution of SV junctions we observed at CFS genes corresponds well to the location of MiDAS synthesis peaks observed under replication stress24,25. Thus, replication inhibition and the initiation of instability begin in S-phase but are not resolved until late in the cell cycle. We hypothesized that SV formation would also occur in late G2 or M-phase, coincident with completion of replication, and direct measurement confirmed that 80% of pre-G1 CFS SV junction formation occurs after cells pass into M-phase. Specifically, most SV junction formation occurred after the appearance of pH3 in chromatin, coincident with MiDAS timing as previously reported23. Importantly, RO3306 was not artifactually suppressing SV formation in G2-phase secondary to CDK1 inhibition43,44. Colchicine addition and flow sorting further ensured that junctions reported here were not formed secondary to rupture of ultrafine bridges and progression into the next G1. However, the current work does not address the non-exclusive possibility that additional SV formation might occur in G1 associated with the appearance of 53BP1 bodies23,42, especially in the absence of POLQ. Such studies will be complicated by the need to distinguish between SVs formed in M vs. G1 following a stressed S-phase, and by technical factors related to cell purity, but will be important for a complete understanding of SV formation timing.

Within M-phase, multiple possibilities existed for SV junction formation at CFSs. Because MiDAS is an error-prone form of conservative replication23,26,65, and because MiDAS hotspots are found in CFS loci23–25, it was a plausible mechanism for executing SV junction formation. However, MiDAS was not required for SV junction formation, and suppressing it neither significantly increased SV junction formation nor reduced CFS expression in GM12878 or HCT116 cells, contrary to prior studies with other cell types23. Thus, MiDAS appears to be a genome-preserving pathway for completing synthesis of unreplicated DNA in mitosis, but MiDAS and SV formation do not appear to function as competing pathways. Importantly, the spans of unreplicated DNA passing into M-phase at CFS genes are likely exceptionally large and numerous relative to expectations for the rest of the genome and for more typical cellular stresses than chemical inhibition with APH. MiDAS may never be fully effective in completing replication of hundreds of kb of CFS DNA, which might explain our data if SV formation could occur even after aborted MiDAS.

Detailed analysis of 33,279 fully sequenced SV junctions showed a remarkably reproducible junction pattern across SV types, cell cycle phases, and APH-induction status. In repair-proficient cells, that profile included a prominent peak at 2 bp microhomology and a long tail of ~20% de novo base insertions, a signature of TMEJ indicative of DSBs being a key intermediate in CFS SV formation46,47,66. Short-read sequencing has detection biases against longer insertions >20 bp but is fully able to reveal nonhomologous junctions with several bases of microhomology or insertion. Thus, the junction profile we observed is a reliable signature for the mechanism(s) catalyzing de novo SV junction formation at CFSs, especially in M-phase. SV junctions arising at CFS genes used insertion templates restricted to a window of approximately 20 bp into the presumed DSB terminus. These insertion events were lost in POLQ KO cells, which establishes constraining parameters against which biochemical and structural studies of POLQ should be compared. Interestingly, templates were found for a minority of de novo insertions. We suspect this is mostly due to iterative copying of multiple templates and/or untemplated synthesis by POLQ66,67 but cannot rule out other processes. We also cannot rule out non-TMEJ contributions of POLQ, e.g., its role in gap filling following BRCA1/2 loss or PARP inhibition68, but the most parsimonious explanation of our data is that POLQ catalyzes junction formation over long distances through TMEJ. Studies of additional TMEJ factors will help support this conclusion. Even if gaps and associated processing by POLQ prove important in some modes of replication stress, resolution of lesions to the class of SV junctions we observed seems to require DSB end joining.

The best demonstration of a role for POLQ in SV junction formation was the loss of APH-induced M-phase SVs in POLQ-/- HCT116 cells. That result is strongly consistent with recent studies identifying POLQ suppression by RAD52 and BRCA2 prior to M-phase69 and its activation in M-phase by RHINO-mediated recruitment to DSBs and phosphorylation by PLK1 (refs. 32–34). Our results support the view that TMEJ acts as an error-prone rescue pathway for dealing with unreplicated DNA in M-phase to prevent mitotic catastrophe. TMEJ has been shown to produce short 50–200 bp deletions in mutation accumulation experiments in C. elegans70 and 5–50 bp deletions at Cas9-induced DSBs in mouse cells71, but those small SVs are best modeled as occurring via resection of a single two-ended DSB. Our work establishes that POLQ can mediate M-phase formation of large, multi-lesion, spontaneous SVs >100 kb in mammalian cells.

Given the above, we were surprised by the partial effects of POLQ inhibitors on SV formation in multiple cell lines, including the widely used and more specific agent ART558 (refs. 49,72). POLQ inhibition did sometimes reduce microhomology lengths similarly to POLQ KO, arguing that we effectively inhibited POLQ, but with a relative persistence of templated insertions. Because our experiments measured SV junction formation as the primary outcome, they may reveal a separation of POLQ functions73. ART558 binds to the polymerase catalytic domain of POLQ and inhibits its activity49, whereas NVB inhibits the POLQ ATPase50,51. Neither removes the potential for POLQ to act in non-catalytic ways. Based on our results, microhomology use appears to be more impacted by alterations in POLQ catalytic activities, whereas insertion formation strongly depends on structural functions. However, chemical inhibition of POLQ was transient whereas POLQ KO preceded the experimental window, which impacts the SVs that svCapture would detect as background events.

Despite evidence that TMEJ acts in M-phase as a source of large CFS SVs, that association is incomplete. Even with POLQ KO, some de novo SV formation remained in asynchronous cell cultures. This effect could not be explained by different M-phase behavior of cancerous and normal cells74 because it was observed in both the HF1 and HCT116 cell lines. Some SV formation must be catalyzed by another end-joining pathway, at least as a backup to TMEJ. Importantly, in both the current work and our prior study of murine Xrcc4-/- embryonic stem cells75, we saw no impact of the loss of NHEJ alone on SV formation even in S-phase, identifying it as secondary to TMEJ. It is difficult to rule out that some SVs detected in S or G2 were formed in a prior M-phase but given that NHEJ is thought to be largely inactive in M-phase76,77, it seems more likely that non-TMEJ pathways had a greater role in SV formation outside of M-phase. Of note, DNA polymerase lambda has been suggested to mediate end joining independently of NHEJ and TMEJ and might also play a role in SV formation78.

Insertion templates that appeared to cross breakpoint junctions were unexpected but can be modeled as arising from repeat expansions created at the single-ended DSBs known to be formed in CFSs79, presumably in M-phase by fork cleavage that activates replication rescue. Both the MUS81 and GEN1 structure-specific nucleases are required for CFS expression80,81 and are excellent candidates for creating those source DSBs leading to SV formation. If the slippage expansion model is correct, it would implicate MUS81 as it is thought to cleave the leading strand template at stalled forks, the orientation that would yield 5’ overhangs82. The sister-chromatid synthesis model might implicate GEN1 or another nuclease, but we currently favor leading strand cleavage in part because the locations of inverse templates appeared to favor unimolecular foldback synthesis over bimolecular sister synthesis. More work is required to address fork cleavage polarity leading to SV formation, but in either case, symmetrical cleavage of stalled forks has been proposed to lead to HR-independent SCEs35,83 where DSB repair by TMEJ would obligatorily lead to deletion SVs corresponding to unreplicated DNA spans. It is less obvious how TMEJ would lead to duplications and inversions, which, although less frequent, were seen at CFSs and bore the hallmarks of TMEJ.

Our results provide key insights into the ways that error-prone replication rescue in M-phase can proceed from DSB formation to the creation of various types of SVs. It is not yet known to what extent the final resolution of replication-associated damage throughout the genome is deferred until M-phase, but MiDAS-associated propagation of unreplicated, R-loop-associated DNA into M-phase has been observed in BRCA2 and RAD51-deficient cells and cells with cyclin E1 overexpression57,84,85. Moreover, we have observed replication stress-induced CNVs that mimic clinically important mammalian nonrecurrent CNVs at numerous non-CFS loci using microarrays17. Taken together, it appears possible that some SVs in non-CFS loci may also be created in M-phase by TMEJ. Indeed, our results closely match prior descriptions of microhomologies and frequent insertions at SV junctions in both normal human CNVs47,86,87 and SVs in cancers88–90, implicating POLQ and mitotic TMEJ as potential mechanisms in mammalian SV mutagenesis. If true, the risk of TMEJ-mediated SV formation would likely be elevated in cancers and other cells that depend on TMEJ due to down-regulation of HR or NHEJ91,92.

Methods

Cell Culture Models

UM-HF1 fibroblasts

UM-HF1 (abbreviated HF1 throughout) is an XY male, euploid, TERT-immortalized human foreskin-derived fibroblast cell line derived and maintained at the University of Michigan17 and is subject to human data access restrictions. It has known CFSs/CNV hotspots at genes PRKG1, NEGR1, and MAGI2 that provide excellent svCapture signal in a non-cancerous cell line17,36, but HF1 cells are not easily synchronized for cell-cycle analysis. HF1 cells were cultured at 37 °C with 5% CO2 in Dulbecco’s Modified Eagle Medium supplemented with 13% fetal bovine serum (FBS), 2 mM L-glutamine, and 100 U/ml penicillin-streptomycin (Gibco).

GM12878 lymphoblastoid cells

GM12878 (Coriell, RRID CVCL_7526) is a highly studied XX female, euploid, EBV-immortalized human lymphoblastoid cell line generated as part of the HapMap Project. It has CFSs common to lymphoblastoid cells at genes WWOX and FHIT that allow direct comparison of CFSs and SVs in a suspension cell line. GM12878 cells were cultured at 37 °C with 5% CO2 in RPMI 1640 medium supplemented with 15% FBS, 2 mM L-glutamine, and 100 U/ml penicillin-streptomycin.

HCT116 colon cancer cells

HCT116 (ATCC, RRID CVCL_0291) is a highly studied male, mismatch-repair-deficient human colon cancer cell line. This work established that genes WWOX and FHIT are SV hotspots in HCT116, consistent with CFS expression analysis, gene expression analysis using Bru-seq93, and genomic analysis that showed baseline SVs in these genes94. HCT116 cells were cultured at 37 °C with 5% CO2 in McCoy’s 5A medium supplemented with 10% FBS, 2 mM L-glutamine, and 100 U/ml penicillin-streptomycin.

CRISPR-Cas9-mediated gene knockout

For CRISPR-Cas9-mediated knockout (KO) of POLQ or LIG4 in HCT116 cells, single guide RNAs (sgRNAs, Supplementary Data 4) were designed using CHOPCHOP95 and cloned into the PX459 plasmid, which carries the human U6 promoter, Cas9 gene, and puromycin resistance gene96. The plasmids were transfected into HCT116 cells using Lipofectamine 3000 (Invitrogen). For POLQ KO in HF1 fibroblasts, vector pLentiCRISPR v2 with integrated sgRNAs (GenScript) was transfected into HF1 cells by the University of Michigan Vector Core. Following transfection, cells were subjected to selection with 1 µg/ml puromycin, and KO clones were established by plating at low density and isolating single colonies with cloning rings for expansion in multi-well dishes. Following clonal expansion, PCR with primers flanking the sgRNA binding sites was used to amplify the target alleles, followed by Sanger sequencing. The resulting mixed allelic sequence traces were analyzed using Synthego ICE software97. Clones were preferred for further use when each of the two alleles yielded distinct frameshift mutations (Supplementary Data 4). Clones that yielded identical biallelic mutations were confirmed with ddPCR to establish a copy number of two for the mutant allele. LIG4 KO clones were further validated with immunoblotting. The large protein size and low expression of POLQ resulted in inconclusive western blots, so POLQ KO clones were validated using the TMEJ assay described below. At least two independent clones of all mutated cell lines were frozen at early passage post-cloning for subsequent SV analysis.
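The clone-selection rule above can be written as a small decision function. This is a hypothetical sketch, not a description of Synthego ICE output: it assumes each allele has already been summarized as a net indel length (positive for insertions, negative for deletions) plus an edited-sequence string used only to decide whether the two alleles are distinct.

```python
def is_frameshift(net_indel_bp: int) -> bool:
    """An indel shifts the reading frame when its net length is not a multiple of 3."""
    return net_indel_bp % 3 != 0


def classify_clone(allele_a: tuple[int, str], allele_b: tuple[int, str]) -> str:
    """Apply the selection rule to two alleles given as (net_indel_bp, edited_sequence)."""
    if not (is_frameshift(allele_a[0]) and is_frameshift(allele_b[0])):
        return "not_preferred"     # at least one allele keeps the reading frame
    if allele_a[1] != allele_b[1]:
        return "preferred"         # two distinct frameshift alleles
    return "confirm_by_ddPCR"      # identical biallelic frameshift: check mutant copy number is two


print(classify_clone((-4, "ACGTTA"), (1, "ACGATTTA")))  # preferred
```

Python's modulo returns a non-negative result for negative operands (for example, `-4 % 3 == 2`), so the same test handles insertions and deletions.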

Replication stress induction and monitoring

The DNA polymerase inhibitor aphidicolin (APH, Sigma) was dissolved in DMSO at a stock concentration of 200 µM. For CNV and CFS induction, cells were cultured with APH as follows: HF1, 0.6 µM; GM12878, 0.4 µM; HCT116, 0.2 µM. These doses were empirically determined per cell line to be consistent with slowed but continued cell division and to produce approximately 2–5 chromosome gaps and breaks per cell in GM12878 cells. These concentrations of APH are defined as low-dose APH throughout. The duration of low-dose APH treatment and its timing relative to other cell manipulations vary and are shown in timelines in relevant figures. High-dose APH (2 µM) and EdU (10 µM), to suppress MiDAS or reveal MiDAS foci, respectively, were added 1 to 3 h prior to harvest.
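The APH concentrations above translate into small volumes of the 200 µM stock via C1V1 = C2V2. The sketch below works through that arithmetic for a hypothetical 10 ml culture; it assumes the stock is in 100% DMSO (so the final DMSO percentage equals the stock volume fraction) and that the high dose is made from the same stock. The culture volume is an illustrative choice, not a protocol value.

```python
STOCK_UM = 200.0  # APH stock concentration in DMSO (from the text)

DOSES_UM = {      # low-dose APH per cell line, plus the high dose used to suppress MiDAS
    "HF1": 0.6,
    "GM12878": 0.4,
    "HCT116": 0.2,
    "high-dose": 2.0,
}


def stock_volume_ul(final_um: float, final_volume_ml: float, stock_um: float = STOCK_UM) -> float:
    """C1*V1 = C2*V2 solved for V1, returned in microliters."""
    return final_um * final_volume_ml * 1000.0 / stock_um


def dmso_percent(final_um: float, stock_um: float = STOCK_UM) -> float:
    """Final DMSO (% v/v), assuming the stock is dissolved in 100% DMSO."""
    return 100.0 * final_um / stock_um


CULTURE_ML = 10.0  # hypothetical final culture volume
for name, dose in DOSES_UM.items():
    print(f"{name:>9}: {stock_volume_ul(dose, CULTURE_ML):5.1f} µl stock, "
          f"{dmso_percent(dose):.2f}% DMSO")
```

Under these assumptions, 10 ml of medium needs 30, 20, and 10 µl of stock for HF1, GM12878, and HCT116, respectively (0.3%, 0.2%, and 0.1% DMSO), and a 2 µM high dose would need 100 µl (1% DMSO).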

Chromosome breaks and common fragile sites

CFSs and total gaps and breaks were scored on Giemsa-banded metaphase preparations following 24 h low-dose APH induction with or without the addition of high-dose APH as described above. Cells were harvested for chromosome preparations using standard conditions of a 20 to 45 min Colcemid treatment (50 ng/ml; Gibco) followed by a 15 min incubation in 0.075 M KCl hypotonic solution at 37 °C and multiple changes of Carnoy’s fixative (3:1 methanol:acetic acid). Fixed cells were dropped onto glass slides to generate metaphase spreads, and slides were baked overnight at 60 °C before Giemsa banding. For Giemsa banding, slides were dipped in water, treated with trypsin solution (0.0005% trypsin and 0.02% Tyrode’s diluted in HBSS) for 50 s, rinsed twice with 0.9% NaCl, stained in Giemsa staining solution (5% Giemsa in Gurr Buffer, pH 6.8) for 5 min, and rinsed twice in water. Metaphase chromosomes were visualized using a Zeiss Axiophot microscope, and chromosome breaks and gaps were analyzed in 25–50 metaphases from each experimental sample.

Cell synchronization and flow sorting

Cells were treated with low-dose APH for an initial 6 h and then arrested at the G2/M boundary by addition of 9 µM (HCT116) or 10 µM (GM12878) RO3306 (ApexBio) with continued APH for an additional 18 h. Parallel cultures were then either harvested for flow sorting of S and G2 fractions or washed three times with PBS and released into warm media containing 75 ng/ml colchicine for 3 h for flow sorting of G2 and M fractions; for cells used to visualize MiDAS foci, the release media also contained 10 µM EdU. For cells treated with novobiocin (NVB, 150 µM, Sigma), the drug was added together with low-dose APH and added back again after release from RO3306. ART558 (5 µM or 10 µM, MedChemExpress) or NU7441 (2 µM, Fisher Scientific) was added 2 h prior to harvest for S and G2 fractions or prior to RO3306 release and was added back to the media after release. High-dose (2 µM) APH was added for 3 h after cells were released from RO3306. When performing cell cycle analysis without RO3306, asynchronous HCT116 cells were treated for 24 h with low-dose aphidicolin. Three hours prior to harvest, 100 ng/ml Colcemid was added to the media to enrich the M-phase population.

For flow cytometry, cells were harvested by trypsinization, collected in cold media, pelleted (5 min, 500 × g, 4 °C), and fixed overnight in 70% ethanol at −20 °C at a concentration of 1 × 10⁶ to 2 × 10⁶ cells/ml. Cells were then washed with PBS, permeabilized with 0.25% Triton X-100 in PBS on ice for 15 min, pelleted, and stained for 1 h with phospho-histone H3 (pH3) antibody (Cell Signaling Technology) conjugated to Alexa Fluor 488 (Invitrogen) at a 1:50 dilution in antibody staining buffer (0.5% bovine serum albumin (BSA) in PBS). Cells were washed twice with antibody staining buffer and stained with a solution of 100 µg/ml propidium iodide and 100 µg/ml RNase. Samples were then submitted to the University of Michigan Flow Cytometry Core for sorting of cell cycle fractions using a FACSAria III (BD Biosciences) or Bigfoot Cell Sorter (Thermo Fisher Scientific). Gates were set so that S-phase fractions had a DNA content between 2N and 4N and were pH3 negative, G2-phase fractions had 4N DNA content and were pH3 negative, and M-phase fractions had 4N DNA content and were pH3 positive. Flow sorting continued until at least 200,000 cells had been collected from every target cell cycle fraction in a sample.
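
The gating rules above can be written as a simple decision function. This is a minimal sketch, assuming that PI-derived DNA content has been rescaled so that the 2N and 4N peaks sit at 2.0 and 4.0; the peak tolerance and pH3 cutoff are hypothetical values, because the actual gate boundaries are not given here. Doublet discrimination and other pre-gates are omitted.

```python
from typing import Optional

# Hypothetical gate parameters (not from the source).
PEAK_TOLERANCE = 0.25   # allowed deviation from the 2N/4N peaks, in ploidy units
PH3_CUTOFF = 1000.0     # pH3 fluorescence threshold, arbitrary units

def classify_event(dna_content: float, ph3_signal: float) -> Optional[str]:
    """Assign one flow event to the S, G2, or M sort gate.

    dna_content is assumed to be normalized to ploidy units (2.0 = 2N, 4.0 = 4N).
    Returns None for events outside every sort gate (e.g., G1 cells).
    """
    ph3_positive = ph3_signal >= PH3_CUTOFF
    at_4n = abs(dna_content - 4.0) <= PEAK_TOLERANCE
    between = 2.0 + PEAK_TOLERANCE < dna_content < 4.0 - PEAK_TOLERANCE
    if at_4n and ph3_positive:
        return "M"    # 4N DNA content, pH3 positive
    if at_4n:
        return "G2"   # 4N DNA content, pH3 negative
    if between and not ph3_positive:
        return "S"    # between 2N and 4N, pH3 negative
    return None

events = [(2.0, 50.0), (3.1, 40.0), (4.0, 30.0), (3.95, 5000.0)]
print([classify_event(d, p) for d, p in events])  # → [None, 'S', 'G2', 'M']
```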

MiDAS assessment

After initial cell harvesting, cells treated with EdU after release from RO3306 were prepared for metaphase analysis as described above for chromosome spreads. MiDAS activity was then assessed using the Click-iT reaction with Alexa Fluor 488 azide (Invitrogen). Slides were first treated with 4% formaldehyde in PBS for 4 min, washed three times with PBS, and blocked with 3% BSA in PBS for 30 min. Permeabilization and the Click-iT reaction were performed according to the manufacturer's instructions. Slides were then washed three times for 10 min each with 3% BSA/0.5% Triton X-100 in PBS, rinsed with water, and mounted with ProLong Gold Antifade mounting medium with DAPI (Sigma). Metaphase chromosomes were visualized using a Zeiss Axiophot fluorescence microscope, and images were acquired using cellSens software. EdU foci were counted manually at the microscope for every cell and stratified by whether just one (singlet) or both (doublet) chromatids were labeled.

TMEJ and NHEJ assays

To monitor the efficacy of chemical and genetic interventions intended to inhibit POLQ/TMEJ or NHEJ, we used an assay based on transfected oligonucleotide substrates subjected to intracellular end joining48. The substrates were prepared by the Dale Ramsden laboratory by annealing oligonucleotides in a buffer containing 10 mM Tris-HCl, pH 7.5, 100 mM NaCl, and 0.1 mM EDTA. Either 5 ng of NHEJ substrate or 500 ng of TMEJ substrate was electroporated separately into 200,000 cells in a 10 µl total volume containing 500 ng pUC19 plasmid, 0.16 µl 2X PBS, 0.84 µl EB buffer, and buffer R, using the Neon system with a single 20 ms pulse of 1350 V (GM12878) or 1530 V (HCT116). Prior to electroporation, cells were pretreated for 2 h with 10 µM ART558, 150 µM NVB, or 2 µM NU7441. After electroporation, cells were incubated in antibiotic-free media supplemented with the appropriate drug for another 30 min at 37 °C, washed with PBS, and incubated at 37 °C for 15 min in 40 µl HBSS containing 125 U Benzonase and 5 mM magnesium chloride. DNA was extracted using the QIAamp DNA Mini Kit (Qiagen) following the manufacturer's protocol, with 1 mM EDTA added to buffer ATL. Samples were then analyzed by qPCR using primers and probes with TaqMan Fast Advanced Master Mix on a 7500 Real-Time PCR System (Applied Biosystems). Cycling conditions were 50 °C for 2 min and 95 °C for 2 min, followed by 40 cycles of 15 s at 95 °C and 1 min at 60 °C. For POLQ CRISPR KO cells, samples were normalized to the NHEJ substrate, whereas LIG4 CRISPR KO samples were normalized to the TMEJ substrate. Equal amounts of DNA were used for inhibitor-treated samples. Wild-type or untreated samples were used as the reference to calculate ΔΔCt values.
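
The ΔΔCt normalization can be sketched with the standard comparative Ct calculation. This is a minimal illustration that assumes two things the text does not state: ΔΔCt values were converted to relative joining as 2^−ΔΔCt, implying roughly 100% amplification efficiency, and the normalizing measurement is the other substrate (for example, NHEJ for POLQ knockouts). The function name and Ct values are hypothetical.

```python
def relative_joining(ct_target: float, ct_normalizer: float,
                     ct_target_ref: float, ct_normalizer_ref: float) -> float:
    """Comparative Ct (2^-ddCt) estimate of joining relative to a reference sample.

    ct_target / ct_normalizer: Ct values of the assayed and normalizing substrate
    products in the test sample (e.g., TMEJ and NHEJ products in a POLQ KO).
    *_ref: the same two Ct values in the wild-type or untreated reference.
    """
    delta_ct_sample = ct_target - ct_normalizer
    delta_ct_ref = ct_target_ref - ct_normalizer_ref
    delta_delta_ct = delta_ct_sample - delta_ct_ref
    # Each extra cycle means ~2-fold less starting product, assuming ~100% efficiency
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: after normalization the product appears 3 cycles later
# than in the reference, i.e., about 1/8 of the reference joining level.
print(relative_joining(28.0, 20.0, 25.0, 20.0))  # → 0.125
```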

Apoptosis via cleaved caspase

To assess cell death that might result from the cell treatments above, we monitored cleaved caspase-3, a marker of apoptosis. Cells were treated and prepared as described above for synchronization and flow cytometry, except that a cleaved caspase-3 antibody (Cell Signaling Technology) conjugated to Alexa Fluor 647 was added together with the pH3 antibody. As a positive control for apoptosis, cells were separately treated with 10 µM etoposide (Sigma) for 72 h. Data were assessed for the percentage of cells positive for cleaved caspase-3.

svCapture library preparation and sequencing

At least 200,000 cells from either bulk asynchronous cultures or flow-sorted cell cycle fractions were collected and centrifuged for 10 min at 1000 rpm. Supernatant was removed until 100 µl remained, and 200 µl DNA/RNA Shield (Zymo Research) and 15 µl of 20 mg/ml proteinase K were added, followed by incubation for 20 min at room temperature. Genomic DNA was purified using the Quick-DNA Microprep Plus Kit (Zymo Research) according to the manufacturer's instructions. Further processing steps through high-throughput sequencing were performed at the University of Michigan Advanced Genomics Core. Bead-based tagmentation libraries were prepared with the Illumina DNA Prep with Enrichment kit using 300 ng of genomic DNA, IDT for Illumina unique dual barcodes, and nine cycles of library PCR amplification. Libraries were quantified using Qubit and quality checked using an Agilent TapeStation to ensure that at least 350 ng, and preferably 500 ng, of prepared library was available to support robust target capture.

Hybridization capture probes were targeted to the central 250 kb or 400 kb of large CFS genes, the region of peak accumulation of SV breakpoints17. Final probes were designed to be target-specific and synthesized by Twist Bioscience using its proprietary algorithms, and were used as provided by the vendor. Capture was performed by pooling 500 ng of each library and hybridizing with 4 μl Twist Bioscience probes and 6 μl PCR-grade water. Target enrichment on magnetic beads was performed according to the manufacturer's instructions. Retained library fragments were amplified with 12 cycles of PCR for sequencing.

Sequencing reads were obtained in 2 × 150 bp paired-end format using an Illumina NovaSeq 6000 or NovaSeq X Plus. Barcoded samples were pooled and sequenced to a depth calculated, based on experience36, to yield a projected coverage of ∼2,000-fold in the capture target regions, typically 2.5% of a NovaSeq S4 flow cell per sample. Insert size and target region coverage were maintained within narrow ranges across all analyzed samples (Supplementary Fig. 1B).

svCapture data analysis

svCapture pipeline execution

We previously reported the svCapture data analysis pipeline and Shiny app36 constructed in the svx-mdi-tools suite of the Michigan Data Interface (MDI). Version 2.0.0 of the tool suite or higher was used for most data analysis, contemporary with version 3.0.0 of the genomex-mdi-tools suite dependency and versions 1.3.2 and 1.8.2 of the mdi-pipeline-frameworks and mdi-apps-framework, respectively. Additional program dependency versions are set by the conda environment definitions tied to tool suite versions.

The following steps match previous pipeline descriptions36: (i) read trimming, merging, and quality filtering using fastp98, (ii) read alignment to the genome using bwa mem99, (iii) aggregation of reads into read groups representing unique source DNA molecules, and (iv) SV junction detection using discordant alignments of paired reads and split reads. We applied the ‘align’, ‘collate’, and ‘extract’ pipeline actions to individual samples to discover potential discordant read alignments. The ‘find’ action was then applied jointly to all samples sequenced together in an experimental batch to find candidate SV junctions unique to one sample. The GRCh38/hg38 genome was used for all analyses, and capture target regions were padded on each side by 800 kb to define the allowable SV breakpoint regions.

Additions to the svCapture pipeline for this work relate to integrating results across multiple experimental batches, delivered by the ‘assemble’ pipeline action and the svCaptureAssembly Shiny app introduced in svx-mdi-tools v1.8.0. These tools apply standardized SV filtering, coverage assessments, and junction analysis, and assemble SV, target, and sample-level metadata into tables. We created individual assemblies for each cell line and another with all cell lines together. Most figures were generated using the svCaptureAssembly app working from those assemblies.

SV filtering

Filtering when counting SVs is essential to ensure that true, on-target SVs are counted in preference to read artifacts. Our goal throughout was to count only de novo SVs that arose during the experimental window, before they could be expanded by replication. Accordingly, we counted only SV junctions found in a single source DNA molecule, as defined by the molecule's outer endpoints, in a single sample. Each such source molecule had to be sequenced by at least three redundant read pairs to support its validity, since chimeric PCR artifacts arising late in PCR typically have only one matching read pair36. Additional filters included (i) a mapping quality of at least 30 in one flanking alignment and at least 20 in both, (ii) a requirement that at least one SV breakpoint fell in a capture target, with the other falling in a padded target region as defined above, and (iii) exclusion of deletion and duplication SVs smaller than 10 kb or larger than 1.2 Mb, to match the established properties of SVs arising at CFSs17. Inversions smaller than 50 kb were excluded because of a known artifact class in transposase libraries: false small inversions with large microhomology tracts that result from intramolecular hybridization and synthesis during end-filling of Tn5-cleaved DNA ends (Supplementary Fig. 1C)36,100.
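
The filters above can be collected into a single predicate over candidate junctions. This is a minimal sketch: the ‘Junction’ and ‘Region’ types and their field names are hypothetical stand-ins for svCapture output columns, not the pipeline's actual schema, and the target coordinates in the usage line are illustrative only. Positions are in bp on half-open intervals.

```python
from dataclasses import dataclass

PAD_BP = 800_000  # capture targets are padded by 800 kb on each side

@dataclass
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    def contains(self, chrom: str, pos: int, pad: int = 0) -> bool:
        return chrom == self.chrom and self.start - pad <= pos < self.end + pad

@dataclass
class Junction:  # hypothetical summary of one candidate SV junction
    sv_type: str       # "deletion", "duplication", "inversion", or "translocation"
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    mapq1: int
    mapq2: int
    n_molecules: int   # distinct source molecules carrying the junction
    n_samples: int     # samples in which the junction was found
    n_read_pairs: int  # read pairs supporting the source molecule

def passes_filters(j: Junction, targets: list[Region]) -> bool:
    # De novo: one source molecule in one sample, backed by >= 3 read pairs
    if j.n_molecules != 1 or j.n_samples != 1 or j.n_read_pairs < 3:
        return False
    # MAPQ >= 30 in at least one flanking alignment and >= 20 in both
    if max(j.mapq1, j.mapq2) < 30 or min(j.mapq1, j.mapq2) < 20:
        return False

    # One breakpoint in a capture target, the other in a padded target region
    def hit(chrom: str, pos: int, pad: int) -> bool:
        return any(t.contains(chrom, pos, pad) for t in targets)

    if not ((hit(j.chrom1, j.pos1, 0) and hit(j.chrom2, j.pos2, PAD_BP)) or
            (hit(j.chrom2, j.pos2, 0) and hit(j.chrom1, j.pos1, PAD_BP))):
        return False
    size = abs(j.pos2 - j.pos1)
    if j.sv_type in ("deletion", "duplication"):
        return 10_000 <= size <= 1_200_000
    if j.sv_type == "inversion":
        return size >= 50_000  # drop small Tn5 artifact inversions
    return True  # no size limit is stated for translocations

target = Region("chr3", 60_000_000, 60_250_000)  # hypothetical 250 kb target
candidate = Junction("deletion", "chr3", 60_100_000, "chr3", 60_200_000, 60, 25, 1, 1, 4)
print(passes_filters(candidate, [target]))  # → True
```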

Target coverage assessment

Calculating effective target region coverage is essential for comparing samples. Unlike single-nucleotide variants (SNVs), not all bases in reads are equally able to report on the existence of a true SV: SV junctions cannot be detected near the ends of reads because a minimal span of approximately 20 bases must be properly aligned to the genome on each side of the junction. Accordingly, we adjusted non-SV source DNA molecules so that only bases that could have reported an SV junction were included in coverage calculations. Source molecules with fewer than three read pairs were excluded, and 2 × 20 bp = 40 bp was subtracted from the length of each remaining molecule to disregard terminal bases where SV junctions could not be called. Target region coverage was calculated as the sum of all adjusted, on-target source molecule lengths divided by the summed length of all unpadded target regions. For visualization, base-level coverage was averaged over 100 bp bins. Coverage is not uniform throughout target regions because of variable capture probe efficiency, read mappability, and underlying clonal SVs in cell lines; however, these systematic variations applied similarly to all samples of a cell line processed with the same capture probes.
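
The effective coverage calculation can be sketched as follows. For simplicity, this sketch assumes that the input is already restricted to on-target, non-SV molecules and that each molecule's full adjusted length is counted (whether partially overlapping molecules were clipped to the target is not stated); the input format and function name are hypothetical.

```python
MIN_READ_PAIRS = 3  # molecules with fewer read pairs are excluded
FLANK_BP = 20       # aligned bases needed on each side to call a junction

def target_coverage(molecules: list[tuple[int, int]], target_lengths: list[int]) -> float:
    """Effective fold coverage of the unpadded capture targets.

    molecules: (length_bp, n_read_pairs) for non-SV, on-target source molecules
               (hypothetical input format).
    target_lengths: lengths in bp of the unpadded target regions.
    """
    usable_bp = 0
    for length_bp, n_read_pairs in molecules:
        if n_read_pairs < MIN_READ_PAIRS:
            continue
        # The terminal 20 bp at each end cannot report an SV junction
        usable_bp += max(0, length_bp - 2 * FLANK_BP)
    return usable_bp / sum(target_lengths)

# Two usable 340 bp molecules contribute 300 bp each; the 2-pair molecule is dropped
print(target_coverage([(340, 5), (340, 3), (500, 2)], [300]))  # → 2.0
```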

SV junction analysis

SV junction types and local structures

The svCapture pipeline reports, without preference, each of the four canonical types of SV junctions (deletions, duplications, inversions, and translocations1), where, as applied here, the first three types arise within a single capture target while translocations join two different targets. Importantly, short reads can only detect junction sequences consistent with end joining; junctions arising through long blocks of homology are invisible to svCapture1,36. To characterize junctions, the last aligned bases nearest the junction on either side were defined as the two breakpoint positions in two coordinate systems: the reference genome and the SV-containing source DNA molecule. The distance between the breakpoint positions in the reference genome defined the SV size. A junction was labeled as a microhomology event if the two breakpoint positions overlapped in the source molecule, such that the same read bases aligned to both reference breakpoints. The breakpoint positions of blunt joints abutted in the source molecule, whereas de novo insertions were evident as read bases between the breakpoint positions that did not align to either reference breakpoint. We plot insertions as positive offsets of breakpoint positions in the SV molecule and microhomologies as negative offsets, i.e., overlapping alignments.
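
The junction classification can be sketched from the two breakpoint positions in the source molecule. This sketch assumes a hypothetical encoding in which ‘left_end’ is the 0-based molecule index of the last base aligned to the left-side reference breakpoint and ‘right_start’ is the index of the first base aligned to the right-side breakpoint; the signed offset then follows the plotting convention described above (insertions positive, microhomologies negative).

```python
def classify_junction(left_end: int, right_start: int) -> tuple[str, int]:
    """Classify an SV junction from its two breakpoint positions in the SV molecule.

    left_end: molecule index of the last base aligned to the left-side breakpoint.
    right_start: molecule index of the first base aligned to the right-side breakpoint.
    Returns (junction type, signed offset): offset > 0 is the de novo insertion
    length, 0 is a blunt joint, and offset < 0 is minus the microhomology length.
    """
    # Unaligned bases between the two alignments (negative when they overlap)
    offset = right_start - left_end - 1
    if offset > 0:
        return "insertion", offset
    if offset == 0:
        return "blunt", 0
    return "microhomology", offset  # the alignments share -offset read bases

print(classify_junction(99, 100))  # → ('blunt', 0)
print(classify_junction(99, 103))  # → ('insertion', 3)
print(classify_junction(99, 97))   # → ('microhomology', -3)
```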

To compare junction profiles between experimental groups, we first calculated the average microhomology length of all junctions without a de novo insertion, computed with the R expression ‘mean(microhomologyLength[microhomologyLength >= 0])’, to reveal the strand-annealing preferences of the underlying mechanism(s). We further calculated the fraction of all junctions with a de novo insertion of 2 to 15 bp, inclusive, to reveal the extent to which those mechanisms supported potentially templated insertions.
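
A Python equivalent of these two summary metrics is sketched below. It assumes the signed convention implied by the R expression, in which ‘microhomologyLength’ is positive for microhomologies, zero for blunt joints, and negative (minus the inserted length) for de novo insertions, i.e., the opposite sign of the plotted offset; the input values in the usage line are hypothetical.

```python
from statistics import mean

def junction_profile(microhomology_lengths: list[int]) -> tuple[float, float]:
    """Summarize a set of junctions.

    microhomology_lengths: assumed signed values, > 0 for microhomology, 0 for blunt,
    and -k for a k-bp de novo insertion.
    Returns (mean microhomology length of junctions without an insertion,
             fraction of all junctions carrying a 2-15 bp insertion).
    """
    no_insertion = [mh for mh in microhomology_lengths if mh >= 0]
    mean_mh = float(mean(no_insertion)) if no_insertion else float("nan")
    n_ins_2_15 = sum(1 for mh in microhomology_lengths if 2 <= -mh <= 15)
    return mean_mh, n_ins_2_15 / len(microhomology_lengths)

print(junction_profile([0, 2, 4, -1, -5, -20]))  # → (2.0, 0.16666666666666666)
```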

Insertion template discovery and characterization

Some novel bases inserted at junctions are known to be copied from template bases near the reference genome breakpoints46,47,66. Locating templates is complicated by the fact that inserted sequences are often too short to be uniquely identified. However, the sequence constraints when comparing a junction to a candidate template extend beyond the inserted bases to include the priming and resolving microhomologies used during repair46. To maximize sensitivity and specificity when searching for possible insertion templates, we adjusted the required number of flanking microhomology bases so that the template query sequence totaled at least seven bases, with at least one base of flanking homology on each side. Thus, a single-base insertion was queried using three bases of initial microhomology on each side, a four-base insertion was queried using two bases of microhomology on each side, and insertions of five or more bases were queried with one base of microhomology on each side.
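
The flank-length rule can be expressed as a small function. The closed form used here, max(1, ceil((7 − insertion length) / 2)), is our reconstruction: it reproduces the stated cases (three flanking bases for a 1 bp insertion, two for 4 bp, one for 5 bp or more), but its values for 2 and 3 bp insertions are an assumption. The junction sequence and index arguments are hypothetical inputs.

```python
import math

MIN_QUERY_BP = 7  # minimum total query length, insertion plus both flanks

def flank_length(insertion_length: int) -> int:
    """Flanking bases per side so the query spans >= 7 bp with >= 1 bp per side."""
    return max(1, math.ceil((MIN_QUERY_BP - insertion_length) / 2))

def template_query(junction_seq: str, ins_start: int, ins_end: int) -> tuple[str, int, int]:
    """Build the template search query around an insertion.

    The inserted bases are junction_seq[ins_start:ins_end].
    Returns (query, query_start, query_end) in junction_seq coordinates.
    """
    flank = flank_length(ins_end - ins_start)
    q_start, q_end = ins_start - flank, ins_end + flank
    if q_start < 0 or q_end > len(junction_seq):
        raise ValueError("junction sequence too short for the required flanks")
    return junction_seq[q_start:q_end], q_start, q_end

print([flank_length(n) for n in (1, 4, 5, 12)])  # → [3, 2, 1, 1]
print(template_query("AAAACGTTTT", 4, 5))        # → ('AAACGTT', 1, 8)
```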

We searched for exact matches to the minimal query sequence for a given insertion junction on both strands of both reference genome breakpoints, within 500 bp upstream and downstream of the junction position. Thus, we searched widely over the reference bases that were both retained and lost at the junction, covering a total of 2 breakpoints × 2 strands × 2 junction sides × 500 bp per side = 4 kb of genomic sequence. When a template match was found, the flanking microhomologies were extended from the initial query to the maximum possible extent, stopping just before the first mismatch between the junction and template sequences. If multiple possible matches were found, we preferred the template with the longest span, including the flanking microhomologies, or, if span did not differentiate them, the template closest to the junction.
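
The search-and-select logic can be sketched in Python under stated assumptions. Each breakpoint is represented by a hypothetical top-strand window string (in practice spanning 500 bp on each side of the breakpoint) plus the boundary index of the breakpoint within it; the bottom strand is searched as the reverse complement; and flanks are extended base by base until the first mismatch. All names are hypothetical, and the sketch omits template-type classification.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_best_template(junction_seq: str, q_start: int, q_end: int,
                       windows: list[tuple[str, str, int]]) -> Optional[dict]:
    """Search both strands of each breakpoint window for junction_seq[q_start:q_end].

    windows: (label, top_strand_sequence, breakpoint_boundary_index) per breakpoint.
    Exact hits are extended outward until the first junction/template mismatch.
    The longest extended span wins; ties go to the hit closest to the breakpoint.
    """
    query = junction_seq[q_start:q_end]
    best, best_key = None, None
    for label, top, bp in windows:
        # A boundary at index bp on the top strand sits at len(top) - bp on the bottom
        for strand, seq, bp_idx in (("top", top, bp),
                                    ("bottom", revcomp(top), len(top) - bp)):
            hit = seq.find(query)
            while hit != -1:
                left = 0  # extend the left flanking microhomology
                while (q_start - left > 0 and hit - left > 0 and
                       junction_seq[q_start - left - 1] == seq[hit - left - 1]):
                    left += 1
                right, t_end = 0, hit + len(query)  # extend the right flank
                while (q_end + right < len(junction_seq) and t_end + right < len(seq) and
                       junction_seq[q_end + right] == seq[t_end + right]):
                    right += 1
                start, stop = hit - left, t_end + right
                distance = max(0, start - bp_idx, bp_idx - stop)
                key = (-(stop - start), distance)  # longest span, then closest
                if best_key is None or key < best_key:
                    best_key = key
                    best = {"label": label, "strand": strand, "start": start,
                            "span": stop - start, "distance": distance}
                hit = seq.find(query, hit + 1)
    return best
```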

The type of the final selected insertion template, if any, was defined by its strand and placement in the reference breakpoints. As illustrated in the figures, relative to a deletion SV junction, foldback and palindrome insertion templates were found in retained genomic sequence on the bottom genome strand, cross-junction templates were found in retained genomic sequence on the top strand, strand-switching templates were found on the bottom strand at least partially in the lost sequence, and expansion templates were found on the top genome strand in a manner that crossed the breakpoint junction position from retained into lost sequence. Importantly, retained and lost segments are defined relative to breakpoint positions, as defined above, not to the positions of the underlying DSB ends, which cannot be ascertained from junction sequences.

Quantification and statistical analysis

Number and source of experimental replicates

Throughout, figures are labeled to indicate the number of SVs or other relevant input counts that contributed to each plot. The number of samples contributing to each experimental group is evident from the plotted sample data points and enumerated in figure legends. Because of the time and expense involved in svCapture experiments, it was not possible to repeat all controls in all experimental batches. Accordingly, we plot all relevant data points that match an experimental group regardless of when they were acquired and indicate the shared experimental batches of different data points by their sample point color.

In general, our approach was to analyze replicates until the relationship between key experimental groups was established by the statistical methods below. However, for completeness in reporting results, we sometimes show supplemental experiments performed in a single replicate with a given cell line. In such cases the data relationships match results from other cell lines and do not form the primary basis of data interpretation; single-replicate experiments should always be interpreted with caution and integrated in this way. Error bars on SV frequency plots represent the mean ± 2 standard deviations of two or more data points.

Comparing intergroup SV Frequency

SV detection by svCapture can be modeled as a Poisson process in which an integer number of SVs of a particular class, most notably de novo deletions, is detected over an interval manifest as the library read depth, i.e., more SVs are detected the more deeply a sample is sequenced. Thus, the estimated Poisson rate for an svCapture sample is the number of detected SVs divided by the on-target coverage, noting that target regions were the same size in all samples from a given cell line. We label this Poisson rate SV Frequency to emphasize the intuitive sense that it approximates the fraction of aggregated target alleles in cells, i.e., target haploid equivalents, that carried a de novo SV. Consistent with prior microarray results17, this fraction can exceed 50% of alleles under replication stress, reflecting the high mutational potential of the CFS hotspot genes we targeted.
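
In symbols (our notation, not the authors'), for sample i with n_i detected de novo SVs and effective on-target coverage C_i in fold units (target haploid equivalents), the SV Frequency estimate and the Poisson model behind it can be written as:

```latex
\widehat{F}_i = \frac{n_i}{C_i}, \qquad
n_i \sim \operatorname{Poisson}(F\,C_i)
\;\Longrightarrow\;
\log \operatorname{E}[n_i] = \log F + \log C_i
```

For example, a hypothetical sample with 30 de novo deletions at 2,000-fold effective coverage would have an SV Frequency of 30/2,000 = 0.015, i.e., about 1.5% of target haploid equivalents. The log-linear form on the right is why coverage enters the regression model as a log offset with its coefficient fixed at 1.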

Inter-sample differences in factors such as APH preparation potency, library insert sizes, fold target enrichment, and library complexity are expected to increase SV count variance beyond random sampling, so we modeled svCapture data as an overdispersed Poisson process using the negative binomial distribution (NBD). Moreover, svCapture results aggregate libraries prepared and sequenced over several years. Although the data show a high degree of reproducibility, batch effects, such as unidentified differences in conditions and kits over time, might also contribute to inter-sample variance.

To compare SV Frequency between experimental groups, e.g., wild type vs. mutant, we performed pairwise intergroup comparisons using a generalized linear model based on the NBD in which the number of detected SVs, nSvs, varied as a function of experimental group and batch as independent covariates. Target region coverage per sample was included as a log offset with its coefficient fixed at 1 so that the model effectively describes SV Frequency. The relevant R expression is ‘fit = MASS::glm.nb(nSvs ~ group + batch + offset(log(coverage)))’, where ‘group’ and ‘batch’ are categorical variables, and the p-value of the intergroup comparison was obtained for the ‘group’ coefficient as ‘coef(summary(fit))[2,4]’. When glm.nb failed to return a result, the p-value was instead calculated with the ‘glm’ function using a Poisson model without overdispersion; such cases are denoted with a dashed significance line in plots. Throughout, we used p ≤ 0.01 as the significance threshold, with plot labels *, p ≤ 0.01; **, p ≤ 0.001; and ***, p ≤ 0.0001.
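
For readers working outside R, the structure of this model can be illustrated with a dependency-free Python sketch. Note that it fits only the Poisson fallback (no overdispersion) by Newton-Raphson, with a Wald z-test on the group coefficient analogous to column 4 of ‘coef(summary(fit))’; the primary analysis additionally estimated a negative binomial dispersion parameter with ‘MASS::glm.nb’, which this sketch does not do. The record layout and group labels are hypothetical.

```python
import math

def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def compare_groups(records: list[dict], test_group: str) -> tuple[float, float]:
    """Poisson GLM nSvs ~ group + batch + offset(log(coverage)) for two groups.

    records: dicts with keys "group", "batch", "nSvs", "coverage" (hypothetical layout);
    at least one SV must be present overall.
    Returns (SV Frequency ratio of test_group vs. the other group, Wald p-value).
    """
    batches = sorted({r["batch"] for r in records})
    # Design matrix: intercept, group indicator, one dummy per non-reference batch
    X = [[1.0, float(r["group"] == test_group)] +
         [float(r["batch"] == b) for b in batches[1:]] for r in records]
    y = [r["nSvs"] for r in records]
    offset = [math.log(r["coverage"]) for r in records]
    n, p = len(y), len(X[0])
    # Start from the pooled SV Frequency so Newton steps are well behaved
    beta = [math.log(sum(y) / sum(r["coverage"] for r in records))] + [0.0] * (p - 1)
    for _ in range(100):
        mu = [math.exp(sum(X[i][j] * beta[j] for j in range(p)) + offset[i]) for i in range(n)]
        score = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n)) for j in range(p)]
        info = [[sum(X[i][j] * X[i][k] * mu[i] for i in range(n)) for k in range(p)]
                for j in range(p)]
        step = _solve(info, score)  # Newton-Raphson (equivalent to IRLS here)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    # Wald test: Var(beta_group) is the [1][1] element of the inverse information
    var_group = _solve(info, [0.0, 1.0] + [0.0] * (p - 2))[1]
    z = beta[1] / math.sqrt(var_group)
    return math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2))

records = [{"group": g, "batch": "b1", "nSvs": k, "coverage": 1000}
           for g, k in [("WT", 10), ("WT", 12), ("WT", 11), ("KO", 40), ("KO", 42), ("KO", 38)]]
print(compare_groups(records, "KO"))  # ratio is 120/33 ≈ 3.64; p is far below 0.01
```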

Assessing insertion template enrichment

Some insertion templates are expected to be found locally by chance, at a Poisson rate of mu = 4 kb search space × 1/4^nTemplateBases, i.e., the expected number of chance matches to a query of nTemplateBases bases within the 4 kb searched. The probability of finding at least one local match corresponds to the R expression ‘trialSuccessProb = 1 - dpois(0, mu)’. To determine the statistical significance of the actual number of templates found, we treated each junction sequence at a given insertion size as an independent Bernoulli trial and estimated the p-value using the R expression ‘1 - pbinom(nFound - 1, nSearched, trialSuccessProb)’, where nSearched and nFound are the numbers of searched and found insertion templates, respectively, and the resulting p-value is the probability of finding nFound or more templates by random chance. A p-value < 0.05 was taken as significant evidence for the contribution of local templates to the appearance of insertion junctions.
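
The same calculation can be sketched in Python using only the standard library. Names mirror the R expressions above; the 4 kb search space is taken as exactly 4,000 bp (2 breakpoints × 2 strands × 2 sides × 500 bp), and the function name is hypothetical.

```python
import math

SEARCH_SPACE_BP = 2 * 2 * 2 * 500  # breakpoints x strands x sides x bp per side

def template_enrichment_p(n_template_bases: int, n_searched: int, n_found: int) -> float:
    """P(finding >= n_found local templates by chance among n_searched junctions)."""
    mu = SEARCH_SPACE_BP / 4 ** n_template_bases  # expected chance matches per junction
    trial_success_prob = 1 - math.exp(-mu)         # R: 1 - dpois(0, mu)
    # Upper binomial tail, equivalent to R: 1 - pbinom(n_found - 1, n_searched, p)
    return sum(math.comb(n_searched, k) * trial_success_prob ** k *
               (1 - trial_success_prob) ** (n_searched - k)
               for k in range(n_found, n_searched + 1))

# Hypothetical counts: 60 templates found among 100 searched 7-base queries
print(template_enrichment_p(7, 100, 60))
```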

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

41467_2024_53917_MOESM2_ESM.pdf (90.6KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (111KB, xlsx)
Supplementary Data 2 (7.5MB, xlsx)
Supplementary Data 3 (108.7KB, xlsx)
Supplementary Data 4 (14.1KB, xlsx)
Reporting Summary (3.9MB, pdf)

Source data

Source Data (667.9KB, zip)

Acknowledgements

This work was funded by grants CA200731 and GM147026 from the National Institutes of Health to T.E.W. and T.W.G. We thank Dale Ramsden for reagents, advice in establishing TMEJ/NHEJ assays, and for valuable discussion. We thank Pamela Bennett-Baker for assistance with reagent preparation and method validation in early stages of the work and Charles Kazazian for assistance with cytogenetics. We thank the University of Michigan Advanced Genomics Core for skilled handling of svCapture library preparation and sequencing, and the University of Michigan Flow Cytometry Core for expert assistance with long flow sorting sessions. We thank Patrick O’Brien and Martin Arlt for critical reading of the manuscript and valuable input over many years.

Author contributions

T.E.W. designed the experiments, wrote the data analysis pipeline, analyzed data, and wrote the manuscript; S.A. designed and performed the experiments and analyzed data; A.W. assisted in the design and performance of experiments; T.W.G. designed the experiments, performed cytogenetic analysis, analyzed data, and wrote the manuscript.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Data availability

svCapture sequencing data have been deposited in two repositories with sample lists provided as Supplementary Data 1. Data from cell line UM-HF1 require human data access restrictions and were deposited into the Database of Genotypes and Phenotypes (dbGaP) and are available with the dbGaP Study Accession phs003121.v2.p1 [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003121.v2.p1]. Data from commercially available cell lines HCT116 and GM12878 were deposited into the NCBI Sequence Read Archive (SRA) with the BioProject ID PRJNA1085257. Flow cytometry data were deposited into Mendeley Data and are available at 10.17632/mz53d2486n.1 (ref. 101). The main processed data outputs of the svCapture pipeline are provided as Supplementary Data 2 and 3, included in the Zenodo code set linked to GitHub alongside the job scripts that generated them (see below), or in a separate Zenodo dataset carrying larger output files at 10.5281/zenodo.10916986 (ref. 102), including data packages and app bookmarks. Source data are provided with this paper.

Code availability

Code comprising the svCapture data analysis pipeline and app can be found at GitHub (https://github.com/wilsontelab/svx-mdi-tools/tree/v2.0.3)103. Data-specific job scripts used to execute the pipeline for samples in this manuscript and associated support files, including resource files, sample lists, and job logs, can be found at GitHub (https://github.com/wilsontelab/publications/tree/main/CFS-M_phase-PolQ-2023)104.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Thomas E. Wilson, Email: wilsonte@umich.edu

Thomas W. Glover, Email: glover@umich.edu

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-024-53917-8.

References

  • 1. Laufer, V. A., Glover, T. W. & Wilson, T. E. Applications of advanced technologies for detecting genomic structural variation. Mutat. Res. Rev. Mutat. Res. 792, 108475 (2023).
  • 2. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).
  • 3. Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
  • 4. Conover, H. N. & Argueso, J. L. Contrasting mechanisms of de novo copy number mutagenesis suggest the existence of different classes of environmental copy number mutagens. Environ. Mol. Mutagen. 57, 3–9 (2016).
  • 5. Povirk, L. F. Biochemical mechanisms of chromosomal translocations resulting from DNA double-strand breaks. DNA Repair (Amst.) 5, 1199–1212 (2006).
  • 6. Ghezraoui, H. et al. Chromosomal translocations in human cells are generated by canonical nonhomologous end-joining. Mol. Cell 55, 829–842 (2014).
  • 7. Zhao, B., Rothenberg, E., Ramsden, D. A. & Lieber, M. R. The molecular basis and disease relevance of non-homologous DNA end joining. Nat. Rev. Mol. Cell Biol. 21, 765–781 (2020).
  • 8. Harel, T. & Lupski, J. R. Genomic disorders 20 years on-mechanisms for clinical manifestations. Clin. Genet. 93, 439–449 (2018).
  • 9. Lee, J. A., Carvalho, C. M. & Lupski, J. R. A DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders. Cell 131, 1235–1247 (2007).
  • 10. Hastings, P. J., Ira, G. & Lupski, J. R. A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet. 5, e1000327 (2009).
  • 11. Kakarougkas, A. & Jeggo, P. A. DNA DSB repair pathway choice: an orchestrated handover mechanism. Br. J. Radiol. 87, 20130685 (2014).
  • 12. Durkin, S. G. et al. Replication stress induces tumor-like microdeletions in FHIT/FRA3B. Proc. Natl Acad. Sci. USA 105, 246–251 (2008).
  • 13. Arlt, M. F. et al. Replication stress induces genome-wide copy number changes in human cells that resemble polymorphic and pathogenic variants. Am. J. Hum. Genet. 84, 339–350 (2009).
  • 14. Durkin, S. G. & Glover, T. W. Chromosome fragile sites. Annu. Rev. Genet. 41, 169–192 (2007).
  • 15. Arlt, M. F., Ozdemir, A. C., Birkeland, S. R., Wilson, T. E. & Glover, T. W. Hydroxyurea induces de novo copy number variants in human cells. Proc. Natl Acad. Sci. USA 108, 17360–17365 (2011).
  • 16. Arlt, M. F., Rajendran, S., Birkeland, S. R., Wilson, T. E. & Glover, T. W. Copy number variants are produced in response to low-dose ionizing radiation in cultured cells. Environ. Mol. Mutagen. 55, 103–113 (2014).
  • 17. Wilson, T. E. et al. Large transcription units unify copy number variants and common fragile sites arising under replication stress. Genome Res. 25, 189–200 (2015).
  • 18. Glover, T. W., Wilson, T. E. & Arlt, M. F. Fragile sites in cancer: more than meets the eye. Nat. Rev. Cancer 17, 489–501 (2017).
  • 19. Glover, T. W., Berger, C., Coyle, J. & Echo, B. DNA polymerase alpha inhibition by aphidicolin induces gaps and breaks at common fragile sites in human chromosomes. Hum. Genet. 67, 136–142 (1984).
  • 20. Le Beau, M. M. et al. Replication of a common fragile site, FRA3B, occurs late in S phase and is delayed further upon induction: implications for the mechanism of fragile site induction. Hum. Mol. Genet. 7, 755–761 (1998).
  • 21. Wang, L., Darling, J., Zhang, J. S., Huang, H., Liu, W. & Smith, D. I. Allele-specific late replication and fragility of the most active common fragile site, FRA3B. Hum. Mol. Genet. 8, 431–437 (1999).
  • 22.Park, S. H. et al. Locus-specific transcription silencing at the FHIT gene suppresses replication stress-induced copy number variant formation and associated replication delay. Nucleic Acids Res49, 7507–7524 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Minocherhomji, S. et al. Replication stress activates DNA repair synthesis in mitosis. Nature528, 286–290 (2015). [DOI] [PubMed] [Google Scholar]
  • 24.Macheret, M. et al. High-resolution mapping of mitotic DNA synthesis regions and common fragile sites in the human genome through direct sequencing. Cell Res. 30, 997–1008 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Ji, F. et al. Genome-wide high-resolution mapping of mitotic DNA synthesis sites and common fragile sites by direct sequencing. Cell Res. 30, 1009–1023 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Bhowmick, R., Minocherhomji, S. & Hickson, I. D. RAD52 facilitates mitotic DNA synthesis following replication stress. Mol. Cell64, 1117–1126 (2016). [DOI] [PubMed] [Google Scholar]
  • 27.Saini, N. et al. Migrating bubble during break-induced replication drives conservative DNA synthesis. Nature502, 389–392 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Wu, X. & Malkova, A. Break-induced replication mechanisms in yeast and mammals. Curr. Opin. Genet. Dev.71, 163–170 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Smith, C. E., Llorente, B. & Symington, L. S. Template switching during break-induced replication. Nature447, 102–105 (2007). [DOI] [PubMed] [Google Scholar]
  • 30.Pardo, B. & Aguilera, A. Complex chromosomal rearrangements mediated by break-induced replication involve structure-selective endonucleases. PLoS Genet. 8, e1002979 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Ramsden, D. A., Carvajal-Garcia, J. & Gupta, G. P. Mechanism, cellular functions and cancer roles of polymerase-theta-mediated DNA end joining. Nat. Rev. Mol. Cell Biol.23, 125–140 (2022). [DOI] [PubMed] [Google Scholar]
  • 32.van Vugt, M. & Tijsterman, M. POLQ to the rescue for double-strand break repair during mitosis. Nat. Struct. Mol. Biol.30, 1828–1830 (2023). [DOI] [PubMed] [Google Scholar]
  • 33.Brambati, A. et al. RHINO directs MMEJ to repair DNA breaks in mitosis. Science381, 653–660 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Gelot, C. et al. Polθ is phosphorylated by PLK1 to repair double-strand breaks in mitosis. Nature621, 415–422 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Deng, L. et al. Mitotic CDK promotes replisome disassembly, fork breakage, and complex dna rearrangements. Mol. Cell73, 915–929 e916 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Wilson, T. E., Ahmed, S., Higgins, J., Salk, J. J. & Glover, T. W. svCapture: efficient and specific detection of very low frequency structural variant junctions by error-minimized capture sequencing. NAR Genom. Bioinform5, lqad042 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Le Tallec, B. et al. Common fragile site profiling in epithelial and erythroid cells reveals that most recurrent cancer deletions lie in fragile sites hosting large genes. Cell Rep. 4, 420–428 (2013).
  • 38. Zlotorynski, E. et al. Molecular basis for expression of common and rare fragile sites. Mol. Cell. Biol. 23, 7143–7151 (2003).
  • 39. Irony-Tur Sinai, M. & Kerem, B. Insights into common fragile site instability: DNA replication challenges at DNA repeat sequences. Emerg. Top. Life Sci. 7, 277–287 (2023).
  • 40. Brison, O. et al. Transcription-mediated organization of the replication initiation program across large genes sets common fragile sites genome-wide. Nat. Commun. 10, 5693 (2019).
  • 41. Blin, M. et al. Transcription-dependent regulation of replication dynamics modulates genome stability. Nat. Struct. Mol. Biol. 26, 58–66 (2019).
  • 42. Harrigan, J. A. et al. Replication stress induces 53BP1-containing OPT domains in G1 cells. J. Cell Biol. 193, 97–108 (2011).
  • 43. Mocanu, C. et al. DNA replication is highly resilient and persistent under the challenge of mild replication stress. Cell Rep. 39, 110701 (2022).
  • 44. Brison, O. et al. Mistimed origin licensing and activation stabilize common fragile sites under tight DNA-replication checkpoint activation. Nat. Struct. Mol. Biol. 30, 539–550 (2023).
  • 45. Hwang, T. et al. Defining the mutation signatures of DNA polymerase θ in cancer genomes. NAR Cancer 2, zcaa017 (2020).
  • 46. Stroik, S., Luthman, A. J. & Ramsden, D. A. Templated insertions-DNA repair gets acrobatic. Environ. Mol. Mutagen. 65, 82–89 (2024).
  • 47. Schimmel, J., van Schendel, R., den Dunnen, J. T. & Tijsterman, M. Templated insertions: a smoking gun for polymerase theta-mediated end joining. Trends Genet. 35, 632–644 (2019).
  • 48. Luedeman, M. E. et al. Poly(ADP) ribose polymerase promotes DNA polymerase theta-mediated end joining by activation of end resection. Nat. Commun. 13, 4547 (2022).
  • 49. Zatreanu, D. et al. Polθ inhibitors elicit BRCA-gene synthetic lethality and target PARP inhibitor resistance. Nat. Commun. 12, 3636 (2021).
  • 50. Syed, A. et al. Novobiocin blocks nucleic acid binding to Polθ and inhibits stimulation of its ATPase activity. Nucleic Acids Res. 51, 9920–9937 (2023).
  • 51. Zhou, J. et al. A first-in-class Polymerase Theta Inhibitor selectively targets Homologous-Recombination-Deficient Tumors. Nat. Cancer 2, 598–610 (2021).
  • 52. Leahy, J. J. et al. Identification of a highly potent and selective DNA-dependent protein kinase (DNA-PK) inhibitor (NU7441) by screening of chromenone libraries. Bioorg. Med. Chem. Lett. 14, 6083–6087 (2004).
  • 53. Grawunder, U. et al. Activity of DNA ligase IV stimulated by complex formation with XRCC4 protein in mammalian cells. Nature 388, 492–495 (1997).
  • 54. Frank, K. M. et al. Late embryonic lethality and impaired V(D)J recombination in mice lacking DNA ligase IV. Nature 396, 173–177 (1998).
  • 55. Patterson-Fortin, J. et al. Targeting DNA repair with combined inhibition of NHEJ and MMEJ induces synthetic lethality in TP53-mutant cancers. Cancer Res. 82, 3815–3829 (2022).
  • 56. Macheret, M. & Halazonetis, T. D. Intragenic origins due to short G1 phases underlie oncogene-induced DNA replication stress. Nature 555, 112–116 (2018).
  • 57. Groelly, F. J. et al. Mitotic DNA synthesis is caused by transcription-replication conflicts in BRCA2-deficient cells. Mol. Cell 82, 3382–3397.e3387 (2022).
  • 58. Thongthip, S., Carlson, A., Crossley, M. P. & Schwer, B. Relationships between genome-wide R-loop distribution and classes of recurrent DNA breaks in neural stem/progenitor cells. Sci. Rep. 12, 13373 (2022).
  • 59. Letessier, A. et al. Cell-type-specific replication initiation programs set fragility of the FRA3B fragile site. Nature 470, 120–123 (2011).
  • 60. Rogers, C. B. et al. Fanconi anemia-associated chromosomal radial formation is dependent on POLθ-mediated alternative end joining. Cell Rep. 42, 112428 (2023).
  • 61. Moreno, A. et al. Unreplicated DNA remaining from unperturbed S phases passes through mitosis for resolution in daughter cells. Proc. Natl Acad. Sci. USA 113, E5757–E5764 (2016).
  • 62. Walsh, E., Wang, X., Lee, M. Y. & Eckert, K. A. Mechanism of replicative DNA polymerase delta pausing and a potential role for DNA polymerase kappa in common fragile site replication. J. Mol. Biol. 425, 232–243 (2013).
  • 63. Kaushal, S. et al. Sequence and nuclease requirements for breakage and healing of a structure-forming (AT)n sequence within fragile site FRA16D. Cell Rep. 27, 1151–1164.e1155 (2019).
  • 64. Maccaroni, K., Balzano, E., Mirimao, F., Giunta, S. & Pelliccia, F. Impaired replication timing promotes tissue-specific expression of common fragile sites. Genes (Basel) 11, 326 (2020).
  • 65. Deem, A. et al. Break-induced replication is highly inaccurate. PLoS Biol. 9, e1000594 (2011).
  • 66. Carvajal-Garcia, J. et al. Mechanistic basis for microhomology identification and genome scarring by polymerase theta. Proc. Natl Acad. Sci. USA 117, 8476–8485 (2020).
  • 67. Kent, T., Mateos-Gomez, P. A., Sfeir, A. & Pomerantz, R. T. Polymerase θ is a robust terminal transferase that oscillates between three different mechanisms during end-joining. eLife 5, e13740 (2016).
  • 68. Belan, O. et al. POLQ seals post-replicative ssDNA gaps to maintain genome stability in BRCA-deficient cancer cells. Mol. Cell 82, 4664–4680.e4669 (2022).
  • 69. Llorens-Agost, M. et al. POLθ-mediated end joining is restricted by RAD52 and BRCA2 until the onset of mitosis. Nat. Cell Biol. 23, 1095–1104 (2021).
  • 70. Roerink, S. F., van Schendel, R. & Tijsterman, M. Polymerase theta-mediated end joining of replication-associated DNA breaks in C. elegans. Genome Res. 24, 954–962 (2014).
  • 71. Wyatt, D. W. et al. Essential roles for polymerase θ-mediated end joining in the repair of chromosome breaks. Mol. Cell 63, 662–673 (2016).
  • 72. Schimmel, J. et al. Modulating mutational outcomes and improving precise gene editing at CRISPR-Cas9-induced breaks by chemical inhibition of end-joining pathways. Cell Rep. 42, 112019 (2023).
  • 73. Ronson, G. E. et al. Mechanisms of synthetic lethality between BRCA1/2 and 53BP1 deficiencies and DNA polymerase theta targeting. Nat. Commun. 14, 7834 (2023).
  • 74. Graber-Feesl, C. L., Pederson, K. D., Aney, K. J. & Shima, N. Mitotic DNA synthesis is differentially regulated between cancer and noncancerous cells. Mol. Cancer Res. 17, 1687–1698 (2019).
  • 75. Arlt, M. F., Rajendran, S., Birkeland, S. R., Wilson, T. E. & Glover, T. W. De novo CNV formation in mouse embryonic stem cells occurs in the absence of Xrcc4-dependent nonhomologous end joining. PLoS Genet. 8, e1002981 (2012).
  • 76. Terasawa, M., Shinohara, A. & Shinohara, M. Canonical non-homologous end joining in mitosis induces genome instability and is suppressed by M-phase-specific phosphorylation of XRCC4. PLoS Genet. 10, e1004563 (2014).
  • 77. Benada, J., Burdová, K., Lidak, T., von Morgen, P. & Macurek, L. Polo-like kinase 1 inhibits DNA damage response during mitosis. Cell Cycle 14, 219–231 (2015).
  • 78. Chandramouly, G. et al. Polλ promotes microhomology-mediated end-joining. Nat. Struct. Mol. Biol. 30, 107–114 (2023).
  • 79. Corazzi, L. et al. Linear interaction between replication and transcription shapes DNA break dynamics at recurrent DNA break Clusters. Nat. Commun. 15, 3594 (2024).
  • 80. Ying, S. et al. MUS81 promotes common fragile site expression. Nat. Cell Biol. 15, 1001–1007 (2013).
  • 81. Benitez, A. et al. GEN1 promotes common fragile site expression. Cell Rep. 42, 112062 (2023).
  • 82. Nickoloff, J. A., Sharma, N., Taylor, L., Allen, S. J. & Hromas, R. Nucleases and Co-Factors in DNA Replication Stress Responses. DNA (Basel) 2, 68–85 (2022).
  • 83. Heijink, A. M. et al. Sister chromatid exchanges induced by perturbed replication can form independently of BRCA1, BRCA2 and RAD51. Nat. Commun. 13, 6722 (2022).
  • 84. Bhowmick, R. et al. RAD51 protects human cells from transcription-replication conflicts. Mol. Cell 82, 3366–3381.e3369 (2022).
  • 85. Audrey, A. et al. RAD52-dependent mitotic DNA synthesis is required for genome stability in Cyclin E1-overexpressing cells. Cell Rep. 43, 114116 (2024).
  • 86. Kidd, J. M. et al. A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell 143, 837–847 (2010).
  • 87. Conrad, D. F. et al. Mutation spectrum revealed by breakpoint sequencing of human germline CNVs. Nat. Genet. 42, 385–391 (2010).
  • 88. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
  • 89. Fujimoto, A. et al. Whole-genome sequencing with long reads reveals complex structure and origin of structural variation in human genetic variations and somatic mutations in cancer. Genome Med. 13, 65 (2021).
  • 90. Yang, L. et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153, 919–929 (2013).
  • 91. Ceccaldi, R. et al. Homologous-recombination-deficient tumours are dependent on Polθ-mediated repair. Nature 518, 258–262 (2015).
  • 92. Schrempf, A., Slyskova, J. & Loizou, J. I. Targeting the DNA Repair Enzyme Polymerase theta in Cancer Therapy. Trends Cancer 7, 98–111 (2021).
  • 93. Ramkumar, K. et al. Mechanistic evaluation and transcriptional signature of a glutathione S-transferase omega 1 inhibitor. Nat. Commun. 7, 13084 (2016).
  • 94. Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).
  • 95. Labun, K. et al. CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Res. 47, W171–W174 (2019).
  • 96. Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281–2308 (2013).
  • 97. Conant, D. et al. Inference of CRISPR Edits from Sanger Trace Data. CRISPR J. 5, 123–130 (2022).
  • 98. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
  • 99. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (2013).
  • 100. Gregory, T. et al. Characterization and mitigation of fragmentation enzyme-induced dual stranded artifacts. NAR Genom. Bioinform. 2, lqaa070 (2020).
  • 101. Wilson, T. E., Ahmed, S. & Glover, T. W. Flow cytometry data. Mendeley Data, 10.17632/mz53d2486n.1 (2024).
  • 102. Wilson, T. E., Ahmed, S. & Glover, T. W. Pipeline output files. Zenodo, 10.5281/zenodo.10916986 (2024).
  • 103. Wilson, T. E. svx-mdi-tools code repository. Zenodo, 10.5281/zenodo.7871676 (2024).
  • 104. Wilson, T. E. Data-specific job scripts and resource files. Zenodo, 10.5281/zenodo.10728194 (2024).

Associated Data


Supplementary Materials

41467_2024_53917_MOESM2_ESM.pdf (90.6KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (111KB, xlsx)
Supplementary Data 2 (7.5MB, xlsx)
Supplementary Data 3 (108.7KB, xlsx)
Supplementary Data 4 (14.1KB, xlsx)
Reporting Summary (3.9MB, pdf)
Source Data (667.9KB, zip)

Data Availability Statement

svCapture sequencing data have been deposited in two repositories, with sample lists provided as Supplementary Data 1. Data from cell line UM-HF1 are subject to human data access restrictions; they were deposited into the Database of Genotypes and Phenotypes (dbGaP) and are available under dbGaP Study Accession phs003121.v2.p1 [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003121.v2.p1]. Data from the commercially available cell lines HCT116 and GM12878 were deposited into the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA1085257. Flow cytometry data were deposited into Mendeley Data and are available at 10.17632/mz53d2486n.1 (ref. 101). The main processed outputs of the svCapture pipeline are provided as Supplementary Data 2 and 3, in the Zenodo code set linked to GitHub alongside the job scripts that generated them (see below), or in a separate Zenodo dataset of larger output files, including data packages and app bookmarks, at 10.5281/zenodo.10916986 (ref. 102). Source data are provided with this paper.

Code comprising the svCapture data analysis pipeline and app can be found at GitHub (https://github.com/wilsontelab/svx-mdi-tools/tree/v2.0.3) (ref. 103). Data-specific job scripts used to execute the pipeline for the samples in this manuscript, together with associated support files including resource files, sample lists, and job logs, can be found at GitHub (https://github.com/wilsontelab/publications/tree/main/CFS-M_phase-PolQ-2023) (ref. 104).


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
