PLOS Biology. 2021 Mar 24;19(3):e3000886. doi: 10.1371/journal.pbio.3000886

Genome-wide analysis of DNA replication and DNA double-strand breaks using TrAEL-seq

Neesha Kara 1, Felix Krueger 2, Peter Rugg-Gunn 1, Jonathan Houseley 1,*
Editor: Tanya Paull 3
PMCID: PMC8021198  PMID: 33760805

Abstract

Faithful replication of the entire genome requires replication forks to copy large contiguous tracts of DNA, and sites of persistent replication fork stalling present a major threat to genome stability. Understanding the distribution of sites at which replication forks stall, and the ensuing fork processing events, requires genome-wide methods that profile replication fork position and the formation of recombinogenic DNA ends. Here, we describe Transferase-Activated End Ligation sequencing (TrAEL-seq), a method that captures single-stranded DNA 3′ ends genome-wide and with base pair resolution. TrAEL-seq labels both DNA breaks and replication forks, providing genome-wide maps of replication fork progression and fork stalling sites in yeast and mammalian cells. Replication maps are similar to those obtained by Okazaki fragment sequencing; however, TrAEL-seq is performed on asynchronous populations of wild-type cells without incorporation of labels, cell sorting, or biochemical purification of replication intermediates, rendering TrAEL-seq far simpler and more widely applicable than existing replication fork direction profiling methods. The specificity of TrAEL-seq for DNA 3′ ends also allows accurate detection of double-strand break sites after the initiation of DNA end resection, which we demonstrate by genome-wide mapping of meiotic double-strand break hotspots in a dmc1Δ mutant that is competent for end resection but not strand invasion. Overall, TrAEL-seq provides a flexible and robust methodology with high sensitivity and resolution for studying DNA replication and repair, which will be of significant use in determining mechanisms of genome instability.


TrAEL-seq provides genome-wide, base pair resolution maps of exposed DNA 3′ ends. These maps reveal replication fork stalling and normal replication profiles in asynchronous, unlabelled wild-type cell populations, along with the sites of resected DNA breaks.

Introduction

DNA double-strand breaks (DSBs) can be caused by exogenous agents (e.g., ionising radiation), defective cellular processes (e.g., replication–transcription collisions or topoisomerase dysfunction), or intentionally by the cell (e.g., in meiosis or immunoglobulin recombination) [1–3]. Decades of research have given us a detailed understanding of DSB repair pathways [4–6], but we understand much less about which pathways are used in a given genomic context in response to particular types of damage.

Prior to the introduction of high-throughput sequencing methods, genome-wide studies of DSB formation and processing were largely restricted to meiotic recombination, where frequent DSBs at well-defined sites can be stabilised either before or after end resection and mapped on microarrays [7–9]. However, these microarray methods lacked the signal-to-noise ratio required for DSB detection in other situations, so the development of the direct DSB sequencing method BLESS marked a step change in mapping technologies [10]. In BLESS, an adaptor is ligated directly to the DSB end to prime Illumina sequencing reads, allowing precise mapping and relative quantification of breaks. Modifications of BLESS have improved ligation efficiency (END-seq [11], DSB-capture [12]), quantitation (qDSB-seq [13], BLISS [14]), and signal-to-noise and generality (BLISS [14], i-BLESS [15]), and variants have been developed for specific systems, including meiosis (S1-seq [16]). These methods differ in detail, but all involve blunting the DNA end with nuclease activities that remove 3′ extended single-stranded DNA, forming a double-stranded end for adaptor ligation. This is a problem because end resection forms long tracts of 3′ extended single-stranded DNA on each side of a DSB; these tracts are degraded by blunting, so if resection has occurred, the sequencing adaptor is ligated to chromosomal DNA many kilobases from the original break site. Other strategies for DSB mapping include direct labelling of DNA ends with biotin, or extraction of protein-linked DNA on glass fibre, to allow fragment purification prior to ligation of sequencing adaptors (Break-seq, CC-seq) [17–19]; however, like BLESS, these yield the locations of 5′ rather than 3′ ends. Therefore, if resection has occurred, none of these methods can map the original location of a DNA break, as opposed to the end point of resection. This is problematic because DSB repair is often easiest to inhibit postresection (as in the classic rad51Δ or rad52Δ mutants in yeast).

Profiles yielded by DSB mapping methods can rarely be considered in isolation, because replication has a dramatic influence on the distribution of DNA strand breaks in a cell [13,15]: replication defects can be a primary cause of DNA damage, but replication also provides both the opportunity and the requirement to repair existing lesions. Replication forks moving rapidly through chromosomes stall at protein obstacles and DNA damage, and through collisions with the transcription machinery [20–22], and must be restarted by pathways that carry an increased risk of mutation [20–23]. Understanding the distribution and causes of DNA damage across the genome therefore requires integrating DSB profiles with approaches that monitor DNA replication.

Many methods for mapping DNA replication have been developed; they can be broadly divided into those that measure copy number changes through S-phase and those that analyse replication forks or replication bubbles directly. Copy number analysis stratifies the genome based on replication timing and defines early- and late-firing origins [24–27]. This requires separating cell populations at different stages of replication, or replicating from non-replicating cells, either by cell cycle synchronisation or, more flexibly, by flow cytometry. Copy number methods are well refined, and the innate simplicity of this approach has even allowed application to single cells, revealing surprising uniformity in replication profiles across mammalian cells [28,29]. However, these methods lack the resolution to detect individual origins in mammalian cells unless the origins differ markedly in timing, and a range of more specialised approaches have been applied to study replication initiation [30,31], particularly isolation of short nascent DNA strands to identify individual origins or initiation zones [32–34]. Methods have also been developed to detect replication fork directionality through isolation and sequencing of Okazaki fragments (OK-seq) [35,36]; as well as revealing origins, these methods identify regions that are uniformly replicated in the forward or reverse direction, and termination zones in which replication direction varies depending on where forks converge in individual cells. Although powerful, methods for direct analysis of forks and origins are technically demanding, since replication bubbles, short nascent strands, and Okazaki fragments are rare species that must be carefully separated from each other and from contaminating genomic DNA. As an alternative, PU-seq uses a relatively simple DNA library preparation to identify leading and lagging strands based on ribonucleotide incorporation, but it requires very specific DNA polymerase mutants with reduced ribonucleotide discrimination [37].

Direct ligation of a sequencing adaptor to the 3′ end of individual DNA strands would be a very attractive means of quantifying DNA damage irrespective of DNA resection, and direct labelling of DNA 3′ ends may reveal replication fork direction, particularly in mutants unable to ligate Okazaki fragments. Some methods aimed at mapping single-strand breaks and base changes theoretically have this capability [38,39], and very recently, the Ulrich lab described one such method, GLOE-seq, which is capable of replication profiling in DNA ligase-deficient yeast and human cells and also maps DSBs, although its activity on resected substrates was not tested [40]. Here, we describe an alternative method, Transferase-Activated End Ligation sequencing (TrAEL-seq), which accurately maps DNA 3′ ends at DSBs that have undergone DNA resection. Remarkably, in addition to resected DSBs, we find that TrAEL-seq can profile DNA replication fork direction with excellent sensitivity even in wild-type yeast and mammalian cell populations without labelling or synchronisation.

Results

Implementation of TrAEL-seq

Various ligases can attach single-stranded DNA linkers to the 3′ end of single-stranded DNA, but efficiency is generally poor. An alternative method described by Miura and colleagues uses terminal deoxynucleotidyl transferase (TdT) to add 1 to 4 adenosine nucleotides to single-stranded DNA 3′ ends, forming a substrate for DNA adaptor ligation by RNA ligases [41,42] (Fig 1A, steps i and ii). On a test substrate in vitro, TdT added 1 to 3 nucleotide A tails to >95% of single-stranded DNA molecules, and the tailed molecules were ligated to TrAEL-seq adaptor 1 with approximately 10% efficiency using truncated T4 RNA ligase 2 KQ (Fig 1B).

Fig 1. TrAEL-seq accurately maps and quantifies 3′ ends of DNA.


(A) Schematic representation of the TrAEL-seq method. Agarose-embedded genomic DNA is used as starting material; plugs are washed extensively to remove unligated TrAEL adaptor 1, and agarose is removed prior to the Bst 2.0 polymerase step. Blunting and ligation of TrAEL adaptor 2 are performed using a NEBNext Ultra II DNA kit, and TrAEL adaptor 2 homodimers are removed by washing the streptavidin beads before USER enzyme treatment. The finished material is ready for PCR amplification using the NEBNext amplification system. Note that TrAEL-seq reads map antisense to the cleaved strand, reading the complementary sequence starting from the first nucleotide before the cleavage site. *, biotin moiety; U, deoxyuracil; N, any DNA base; rA, adenosine. (B) In vitro assay of adaptor ligation. An 18-nucleotide single-stranded DNA oligonucleotide was treated with or without TdT, then ligated to TrAEL adaptor 1 using T4 RNA ligase 2 truncated KQ. Products were separated on a 15% PAGE gel and visualised by SYBR Gold staining. (C) Scatter plot comparing END-seq and TrAEL-seq read counts at SfiI, PmeI, and NotI sites in digested yeast DNA, along with the genome average. Note that the genome average signal encompasses all single-copy 13 bp regions that do not overlap with a site, while restriction enzyme quantitation represents reads mapping to 13 bp around the recognition site (the SfiI site is 13 bp; NotI and PmeI sites were extended to 13 bp). (D) Precision mapping of SfiI cleavage sites by TrAEL-seq and END-seq. SfiI sites, which contain 5 degenerate bases, were split into those with no A’s at the cleavage site (GGCCNNNB|BGGCC, 87 sites, upper panel) and those with A’s flanking the cleavage site (GGCCNNNA|AGGCC, 15 sites, lower panel), considering cleavage sites on forward and reverse strands separately. Mapped locations of 3′ ends were averaged across each category of site and expressed as a percentage of all 3′ ends mapped by each method to that category of site. (E) Comparison of meiotic DSB profiles from dmc1Δ cells mapped by TrAEL-seq and from sae2Δ cells mapped by S1-seq (SRA accession: SRP261135) [45]. Both techniques should map Spo11 cleavage sites in the given mutants. Regions of 25 kb and 2.5 kb on chromosome III are shown for reads counted in 20 bp windows. The lowest panel shows 500 bp around the major peak for reads counted at single bp resolution. (F) Scatter plot of log-transformed normalised read counts at all 3,907 Spo11 cleavage hotspots annotated by Mohibullah and Keeney, comparing dmc1Δ TrAEL-seq with sae2Δ S1-seq data (right) [16,45,47] (SRA accession: SRP261135). Numerical data underlying this figure can be found in S1 Data, and the gel image in S1 Raw Images. DSB, double-strand break; TdT, terminal deoxynucleotidyl transferase; TrAEL-seq, Transferase-Activated End Ligation sequencing.

TrAEL-seq adaptor 1 is a hairpin that primes conversion of single-stranded ligation products to double-stranded DNA suitable for library construction, incorporates a biotin moiety flanked by deoxyuracil residues that allows selective purification and elution of ligation products, and includes an 8-nucleotide unique molecular identifier (UMI) for bioinformatic removal of PCR duplicates (Fig 1A). Once TrAEL-seq adaptor 1 is ligated, a thermophilic polymerase with strong strand displacement and reverse transcriptase activities extends the hairpin to form unnicked double-stranded DNA (Fig 1A, step iii), then the DNA is fragmented by sonication and adaptor-ligated material is purified on streptavidin magnetic beads (Fig 1A, steps iv and v). The DNA ends formed during fragmentation are polished and ligated to TrAEL adaptor 2 while still attached to the beads (Fig 1A, step vi), then the purified fragments flanked by TrAEL adaptors 1 and 2 are eluted by cleavage of the deoxyuracil residues prior to library amplification (Fig 1A, step vii). The resulting library is sequenced using a primer that anneals to TrAEL-seq adaptor 1, such that the TrAEL-seq read is the reverse complement of the original DNA 3′ end (Fig 1A, step viii).
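Because each TrAEL-seq read is the reverse complement of the labelled strand, the labelled 3′ end can be recovered from the first (5′-most) aligned base of the read, placed on the opposite strand. The sketch below illustrates that coordinate conversion only; it assumes 0-based, half-open alignment coordinates (as used in BAM files) and reads from which adaptor, UMI and A-tail sequence have already been removed. The function and type names are hypothetical and are not taken from the authors' pipeline.

```python
from typing import NamedTuple


class ThreePrimeEnd(NamedTuple):
    """Hypothetical record for one labelled DNA 3' end."""
    chrom: str
    pos: int      # 0-based coordinate of the 3'-terminal nucleotide of the labelled strand
    strand: str   # strand carrying the labelled 3' end ("+" or "-")


def read_to_three_prime_end(chrom: str, start: int, end: int,
                            read_is_reverse: bool) -> ThreePrimeEnd:
    """Convert one trimmed alignment into the 3' end it reports.

    `start`/`end` are 0-based, half-open reference coordinates of the
    aligned read. The first sequenced base of the read pairs with the
    3'-terminal nucleotide of the labelled strand, and the labelled strand
    is antisense to the read.
    """
    if read_is_reverse:
        # Read on "-": its first base is the rightmost aligned position and
        # the labelled strand is "+", whose 3' end is its rightmost base.
        return ThreePrimeEnd(chrom, end - 1, "+")
    # Read on "+": its first base is the leftmost aligned position and
    # the labelled strand is "-", whose 3' end is its leftmost base.
    return ThreePrimeEnd(chrom, start, "-")
```

Note that the figures in this paper describe peaks by the strand the reads map to (for example, "reverse strand reads"), so a plotting script matching those figures would keep the read strand rather than the labelled strand returned here.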

Detection of 3′ extended DNA ends by TrAEL-seq

We tested TrAEL-seq on agarose-embedded yeast genomic DNA digested with the restriction enzymes NotI, PmeI, and SfiI, which yield 5′ extended, blunt, and 3′ extended ends, respectively, and generated a BLESS-type END-seq library from the same digested material for comparison (Fig 1C). The resulting TrAEL-seq library contained fragments of 200 to 2,000 bp as expected (S1A Fig), and sequencing data were processed through a custom bioinformatic pipeline to remove the A-tail, map the reads, and deduplicate by UMI (illustrated in S1B Fig). Comparing TrAEL-seq and END-seq data shows that both methods detect restriction enzyme cleavage sites: efficiency is approximately equal on 3′ extended ends and END-seq is more efficient on 5′ extended ends, while TrAEL-seq unexpectedly performed better on blunt PmeI ends (Fig 1C). Therefore, both methods efficiently detect DSBs even though their labelling strategies are very different.
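UMI deduplication can be sketched as collapsing reads that report the same 3′ end and carry the same UMI, on the assumption that such reads are PCR copies of a single ligation event. This is a minimal illustration, not the authors' pipeline: the record layout is hypothetical, and the exact-match UMI rule is an assumption (the actual pipeline, illustrated in S1B Fig, may treat UMI sequencing errors differently).

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (chrom, pos, strand, umi)
Read = Tuple[str, int, str, str]


def deduplicate_by_umi(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, pos, strand, UMI) combination.

    Reads sharing all four values are treated as PCR duplicates of one
    ligation event; reads at the same position with different UMIs are
    kept as independent molecules.
    """
    seen = set()
    unique: List[Read] = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique
```

Keying on the mapped 3′ end rather than on the whole alignment reflects that TrAEL-seq reports one informative position per molecule; the sonication end at the other side of the fragment does not enter the key in this sketch.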

The restriction enzyme SfiI has a degenerate recognition sequence (GGCCNNNN|NGGCC), which allowed us to assess TrAEL-seq ligation efficiency on different 3′ end sequences and confirm that there is no bias for DNA ends based on the 3′ or adjacent nucleotides (S1C Fig). Fine mapping of cleavage at the SfiI recognition site reveals differences between END-seq and TrAEL-seq: END-seq, in common with other BLESS-type methods, degrades the 3′ overhang and returns a consensus cleavage location 3′ of nucleotides 4 to 5 of the recognition site (Fig 1D). In contrast, TrAEL-seq maps the real cleavage site (3′ of nucleotide 8), and does so for >98% of events, but only for SfiI sites lacking A nucleotides adjacent to the cleavage site (i.e., GGCCNNNB|BGGCC) (Fig 1D, top). This limitation stems from the A-tails added by TdT, which cannot be distinguished from genome-encoded A’s. To resolve this issue, we used a trimming algorithm that removes a maximum of 3 T’s from the start of the read (the A-tail appears as T’s because the read is the reverse complement of the labelled strand). Since the average tail length is 2 to 4 nucleotides, this correctly maps the SfiI cleavage site to nucleotides 7 to 9 in >98% of reads, even when only the most challenging sites for mapping are considered (those with the structure GGCCNNNA|AGGCC) (Fig 1D, bottom). Importantly, this algorithm does not overtrim ends within genome-encoded A tracts: the 10 SfiI sites with 2 or more 3′ A’s (GGCCNNAA|NGGCC) are mapped with the same accuracy (S1D Fig). We suggest that this overall mapping accuracy of >98% within ±1 nucleotide would be sufficient for almost all applications.
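The capped trimming rule described above can be sketched in a few lines: strip at most 3 leading T's (the read-level image of the TdT A-tail) and leave any further T's as genome sequence. This is a minimal illustration that assumes the UMI and adaptor sequence have already been removed from the read; the function name and the `max_t` parameter are illustrative, not the authors' code.

```python
def trim_leading_t(seq: str, max_t: int = 3) -> str:
    """Remove up to `max_t` leading T's from a TrAEL-seq read.

    TdT-added A's appear as T's at the start of the read. Capping the trim
    at `max_t` means a genome-encoded A tract next to the 3' end is at most
    partly consumed, rather than trimmed away entirely.
    """
    n = 0
    while n < max_t and n < len(seq) and seq[n] == "T":
        n += 1
    return seq[n:]
```

For tails of 2 to 4 nucleotides the error of this rule is at most 1 nt: a 4 nt tail leaves one tail base behind, and a 2 nt tail followed by a genome-encoded T (an A on the labelled strand) loses one genomic base. This is consistent with the ±1 nucleotide accuracy reported above.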

A major strength of TrAEL-seq should be the ability to map the original sites of DSBs even after resection, a point in the homologous recombination process that is particularly amenable to stabilisation using mutations that prevent strand invasion. We chose meiosis as an in vivo model system to validate this, as meiotic DSB patterns have been extremely well characterised. Meiotic DSBs formed by Spo11 are processed by Sae2, among other factors, prior to resection, after which strand invasion into a homologous chromosome is mediated by Dmc1 [43,44]. Loss of Sae2 therefore stabilises DSBs prior to resection, whereas loss of Dmc1 stabilises DSBs after resection and before strand invasion. TrAEL-seq mapping of the 3′ ends of resected DSBs in dmc1Δ cells 7 h after induction of meiosis revealed a DSB pattern very similar to that observed for unresected DSBs in an sae2Δ mutant mapped by S1-seq (a BLESS variant specific for meiotic recombination) (Fig 1E) [45]. TrAEL-seq technical replicates are highly reproducible across known hotspots of Spo11 cleavage (R = 0.99) (S1E Fig), and quantitation of these hotspots by TrAEL-seq correlates well with S1-seq in sae2Δ cells (R = 0.87) (Fig 1F, left) and with Spo11 oligonucleotide sequencing (R = 0.85) (S1F Fig) [46,47]. Of the 3,907 known hotspots, TrAEL-seq detects 3,542 based on a threshold of 2 SDs above background, which lies between S1-seq (2,556) and Spo11 oligonucleotide sequencing (3,784; a much more labour-intensive method that forms the gold standard for meiotic DSB mapping). TrAEL-seq sensitivity is broadly similar to that of CC-seq (a method specialised for protein-associated DNA ends [19]), which detects 3,223 sites by the same criteria. This shows that TrAEL-seq accurately maps and quantifies endogenous DSB sites even after end resection. Importantly, meiotic recombination is unusual in that mutants are known that completely stabilise DSBs, whereas stabilising breaks postresection is often more practical in other systems.
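The hotspot comparison above rests on two simple computations: counting hotspots whose signal exceeds a background threshold of mean + 2 SD, and the Pearson correlation of log-transformed normalised counts between methods. The sketch below illustrates both under stated assumptions: how background regions are sampled, the pseudocount of 1, and the function names are all hypothetical choices, not the authors' exact procedure.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def count_detected(hotspot_signal: Sequence[float],
                   background_signal: Sequence[float],
                   n_sd: float = 2.0) -> int:
    """Count hotspots whose signal exceeds mean + n_sd * SD of background.

    `background_signal` is assumed to be normalised counts from regions of
    the same size outside hotspots (a hypothetical sampling choice).
    """
    threshold = mean(background_signal) + n_sd * stdev(background_signal)
    return sum(1 for s in hotspot_signal if s > threshold)


def log_pearson(x: Sequence[float], y: Sequence[float],
                pseudocount: float = 1.0) -> float:
    """Pearson R between log10(count + pseudocount) vectors of equal length."""
    lx: List[float] = [math.log10(v + pseudocount) for v in x]
    ly: List[float] = [math.log10(v + pseudocount) for v in y]
    mx, my = mean(lx), mean(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var_x = sum((a - mx) ** 2 for a in lx)
    var_y = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)
```

Log transformation keeps a handful of very strong hotspots from dominating the correlation; the pseudocount is needed because some hotspots may have zero reads in one dataset.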

Overall, TrAEL-seq provides an effective method for detecting and quantifying DSBs genome-wide even after end resection.

High-resolution mapping of stalled replication forks by TrAEL-seq

Replication forks stall at various impediments during DNA replication, and stalled forks may undergo reversal or cleavage as the cell attempts to restart replication (Fig 2A). The replication fork barrier (RFB) in the rDNA of budding yeast is a classic system for studies of replication fork stalling; it results from replication forks encountering the Fob1 protein bound to DNA [48]. Fob1 binds just downstream of the 35S ribosomal RNA (rRNA) gene and prevents the passage of replication forks moving against the direction of 35S transcription that would otherwise encounter the RNA polymerase I machinery head-on [49,50]. The RFB has been intensely studied as a model for stalled replication forks initiating recombination and genome rearrangement [51,52], and DSBs thought to stem from fork cleavage have been reported at the RFB based on both Southern blotting and qDSB-seq (a BLESS-type method for mapping double-stranded DNA ends) [13,53,54].

Fig 2. Visualisation of replication fork stalling sites by TrAEL-seq.


(A) Potential processing pathways of a stalled replication fork. Lagging strand processing is likely to finish soon after stalling, and at least for the yeast RFB, the lagging strand RNA primer is known to be removed [55]. The fork could then undergo fork reversal to yield a Holliday junction, or be cleaved on the leading or lagging strand. Whereas cleavage is irreversible and requires a recombination event to restart the replication fork, reversed forks can revert to the normal replication fork structure by Holliday junction migration (labelled HJ migration). The 3′ DNA ends predicted to be TrAEL-seq substrates are labelled with green dots. The RNA primer on the Okazaki fragment in the leftmost structure is shown in red. (B) Yeast rDNA RFB signals in TrAEL-seq datasets compared to qDSB-seq (SRA accession: SRX5576747) [13] and GLOE-seq (SRA accessions: SRX6436839 and SRX6436840) [40]. Reads were quantified in 1 nucleotide steps and normalised to reads per million mapped. qDSB-seq data were obtained from S-phase synchronised cells; all other samples are from asynchronous log-phase cell populations growing in YPD media. The schematic diagram shows the positions of RFB elements previously mapped by 2D gel electrophoresis [49,50], and black triangles indicate previously mapped sites of DNA ends [53,55]. (C) rDNA TrAEL-seq reads in hESCs. Two biological replicates are shown, each an average of 2 technical replicates. Reads were summed in 100 bp sliding windows spaced every 10 bp. One rDNA repeat is shown; the RNA polymerase I-transcribed 45S RNA is shown as a grey line, with mature rRNAs marked in green in the schematic diagram. Note that the 45S gene is shown as transcribed right to left to maintain consistency with the yeast data, such that the sequence is the reverse complement of the rDNA reference sequence U13369. The R repeats, which contain the RFBs, are marked in green, while the primary direction of replication is shown by a red arrow labelled “Replication?” to reflect evidence that forks can move in both directions through the human rDNA. (D) Average TrAEL-seq profiles across centromeres ±1 kb for 3 biological replicates of wild-type cells (drawn in red, orange, and purple). Centromeres are categorised, based on replication direction in the yeast genome assembly, into those replicated forward (CEN3, CEN5, CEN13, CEN2), those replicated in reverse (CEN11, CEN15, CEN10, CEN8, CEN12, CEN9), and those in termination zones that could be replicated in either direction (CEN14, CEN16, CEN1, CEN4, CEN7, CEN6); see S2C Fig for details. Read counts per million reads mapped were calculated in nonoverlapping 10 bp bins; vertical lines indicate annotated boundaries of centromeres. (E) Average TrAEL-seq profiles across tRNAs ±200 bp for 3 biological replicates of wild-type cells (drawn in red, orange, and purple). tRNAs are categorised into those for which transcription is codirectional with the replication fork and those for which transcription is head-on to the direction of the replication fork. tRNAs for which the replication direction is not well defined were excluded. Arrows indicate peaks that are dependent on replication direction. Read counts per million reads mapped were calculated in nonoverlapping 5 bp bins; vertical lines indicate annotated boundaries of tRNAs. Numerical data underlying this figure can be found in S2 Data. hESC, human embryonic stem cell; RFB, replication fork barrier; rRNA, ribosomal RNA; TrAEL-seq, Transferase-Activated End Ligation sequencing.

To detect replication forks stalled at the RFB and test the requirement for homologous recombination in resolution of these species, we prepared TrAEL-seq libraries from unsynchronised wild-type, fob1Δ, and rad52Δ cells growing at mid-log phase: fob1Δ cells lack RFB activity, while rad52Δ mutants cannot initiate homologous recombination. RFB signals should therefore be absent from fob1Δ, while signals representing DSBs formed by fork cleavage should accumulate in rad52Δ as this mutant cannot repair such DNA breaks once formed.

Two RFB sites are clearly visible in wild-type TrAEL-seq data as peaks of reverse strand reads but are absent in the fob1Δ mutant (Fig 2B, wild type and fob1Δ panels). These peaks are exactly reproduced between 2 libraries prepared independently from the same fixed cells (by different investigators working 6 months apart, S2A Fig) and are detected with high signal-to-noise in 3 wild-type biological replicates (S2B Fig). These sites correspond well with the RFB sites mapped using high-resolution gels [53,55] and are also visible in published qDSB-seq and GLOE-seq datasets, although TrAEL-seq data contain fewer additional peaks in this region than GLOE-seq data, and the TrAEL-seq RFB peaks correspond more closely to known sites than qDSB-seq peaks do (Fig 2B) [13,40].

To determine the applicability of TrAEL-seq to mammalian cells, we generated 2 technical replicate TrAEL-seq datasets from each of 2 biological replicate samples of 0.5 million human embryonic stem cells (hESCs). A major peak was observed in the rDNA downstream of the RNA polymerase I termination site in both hESC biological replicates, on the reverse strand at the most distal of the known RFB sites (Fig 2C) [56]. This observation is consistent with an efficient polar RFB located just downstream of the RNA polymerase I transcription unit, as seen in diverse species from plants to yeast to mice [49,57–60]. Furthermore, we detect smaller but reproducible peaks on both strands at all 3 RFB sites, consistent with the low-efficiency bidirectional RFB activity reported in human cells based on 2D gels and DNA combing (Fig 2C) [56,61,62].

rDNA RFBs are not the only sites at which replication forks stall; for example, reported GLOE-seq peaks at yeast centromeres likely stem from replication forks stalling at centromeric chromatin [40,63]. To probe this relationship, we first stratified centromeres into those replicated only by reverse forks, those replicated only by forward forks, and those sited in termination zones where forks converge (S2C Fig). At centromeres replicated from one direction only, we observed an accumulation of reads on the strand opposite to the direction of replication, located just before the centromere, while centromeres in termination zones, which can be replicated from either direction, displayed both peaks (Fig 2D and S1 File). A similar analysis of tRNA loci, which are also known to stall replication forks [64], yielded more complex patterns (Fig 2E). These regions displayed peaks upstream or downstream of the tRNA depending on the direction of replication (Fig 2E, arrows), consistent with previous studies reporting that both codirectional and head-on tRNA transcription can stall replication forks, at least in the absence of replicative helicases [64–67]. However, we also observed a major peak covering approximately the first 15 bp of the tRNA gene, which was not affected by replication direction and appears to mark a transcription-associated break on the template strand; this break must be a conserved feature of tRNA transcription, as it is also detected in the hESC samples (S2D Fig). This aside, we find that sites of replication fork stalling, both at the RFB and elsewhere, are revealed by an accumulation of TrAEL-seq reads on the strand opposite to the direction of replication.
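The centromere and tRNA profiles in Fig 2D and 2E are averages of read counts in nonoverlapping bins (10 bp and 5 bp, respectively) across many aligned sites. A minimal sketch of that kind of aggregation follows, assuming per-base RPM values for one read strand stored in a dict keyed by position, and sites aligned on a single reference coordinate (for example an annotated boundary) without flipping orientation by feature strand. These choices and the function name are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence


def metaprofile(signal: Dict[int, float], sites: Sequence[int],
                flank: int = 1000, bin_size: int = 10) -> List[float]:
    """Average binned signal across sites on one strand.

    `signal` maps 0-based positions to RPM values (missing positions count
    as 0). Each site contributes the summed signal in non-overlapping bins
    covering [site - flank, site + flank); the returned list holds the mean
    of each bin across all sites.
    """
    if not sites:
        raise ValueError("need at least one site")
    n_bins = (2 * flank) // bin_size
    totals = [0.0] * n_bins
    for site in sites:
        left = site - flank
        for b in range(n_bins):
            bin_start = left + b * bin_size
            totals[b] += sum(signal.get(p, 0.0)
                             for p in range(bin_start, bin_start + bin_size))
    return [t / len(sites) for t in totals]
```

Running this separately for forward and reverse reads, and separately for each replication-direction category, gives strand-resolved averages of the kind compared in Fig 2D and 2E.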

The structures resulting from stalled fork processing have various double-stranded 3′ ends that should be substrates for TrAEL-seq based on our restriction enzyme analysis (Figs 1C and 2A, green dots). However, no difference in signal intensity was observed between rad52Δ and wild type at the rDNA, centromeres, or tRNAs, showing that these double-stranded ends are not normally processed by the homologous recombination machinery (Fig 2B, S2E and S2F Fig). DSBs formed in the rDNA are known to be repaired by homologous recombination, and although we and others have reported Rad52-independent recombination at the rDNA, these are rare events not known to occur in wild-type cells [68–70]. If TrAEL-seq peaks represented fork cleavage events, we would therefore expect strong stabilisation in the rad52Δ mutant. Based on the lack of stabilisation observed, we consider that the vast majority of DNA ends at sites of replication fork stalling represent reversed forks that can revert to normal replication fork structures by Holliday junction migration without recombination (see Fig 2A and Discussion).

Taken together, these results show that TrAEL-seq allows sensitive and precise mapping of replication fork stalling, most likely through labelling of reversed replication forks.

TrAEL-seq profiles describe replication fork directionality

A striking feature of yeast TrAEL-seq data is the massive variation in strand bias of reads at different sites in the genome: a violin plot of the fraction of reverse reads in 1 kb bins shows 2 distinct peaks, at 15% to 30% and at 70% to 85%, a behaviour much less obvious in comparable GLOE-seq data (Fig 3A) [40]. TrAEL-seq read polarity in asynchronous wild-type cells (calculated from the difference between reverse and forward read densities) forms clear domains when plotted over large genomic regions, and these domains almost perfectly match the GLOE-seq map of Okazaki fragment ends in a Cdc9 DNA ligase depletion experiment, although with the opposite polarity (Fig 3B and S3A Fig) [40]. Mapping of Okazaki fragment ends is a well-validated method for detecting replication forks [35,36], and the tight correlation of TrAEL-seq data with Okazaki fragment distribution strongly suggests that TrAEL-seq detects processive replication forks even in wild-type cells. Indeed, the locations at which TrAEL-seq polarity switches from negative to positive coincide precisely with replication origins (autonomously replicating sequence, or ARS, elements) (Fig 3B, dotted vertical lines), and alignment of TrAEL-seq reads across 30 kb on either side of all ARS elements reveals the switch in polarity expected for replication forks diverging from replication origins (Fig 3C). Furthermore, TrAEL-seq reads in the rDNA reflect the known role of Fob1 in enforcing unidirectional rDNA replication: reads are highly polarised in wild-type cells, but this polarisation is absent in fob1Δ (S3B Fig).
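Read polarity, defined in the Fig 3B legend as (R−F)/(R+F), can be computed per window, and candidate origins can then be flagged where polarity crosses from negative to positive. The sketch assumes per-window reverse and forward read counts have already been tallied (for example in 1,000 bp windows spaced every 100 bp). The crossing rule is an illustrative simplification and the function names are hypothetical: the paper compares polarity switches with annotated ARS elements rather than calling origins de novo.

```python
from typing import List, Optional, Sequence


def polarity(reverse: Sequence[int], forward: Sequence[int]) -> List[Optional[float]]:
    """Per-window read polarity (R - F) / (R + F); None where a window has no reads."""
    out: List[Optional[float]] = []
    for r, f in zip(reverse, forward):
        total = r + f
        out.append((r - f) / total if total else None)
    return out


def negative_to_positive_switches(pol: Sequence[Optional[float]]) -> List[int]:
    """Indices i where polarity is < 0 at window i-1 and > 0 at window i.

    Windows with undefined polarity (None) break the comparison, so
    switches are only called between adjacent informative windows.
    """
    hits: List[int] = []
    for i in range(1, len(pol)):
        prev, cur = pol[i - 1], pol[i]
        if prev is not None and cur is not None and prev < 0 < cur:
            hits.append(i)
    return hits
```

On real data, single-window noise can produce spurious crossings, so some smoothing or a minimum polarity change would likely be needed before treating a crossing as an origin candidate.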

Fig 3. TrAEL-seq is highly sensitive to replication fork direction.


(A) Polarity of TrAEL-seq and GLOE-seq reads assessed in 1 kb windows across the genome excluding windows overlapping multicopy regions, presented as the percentage of total reads that map to the reverse strand. The dotted line marks 50%, which equates to an absence of strand bias. TrAEL-seq libraries are 3 biological replicates of BY4741 wild type. GLOE-seq wild-type samples (SRA accessions: SRX6436839 and SRX6436840) were derived from asynchronous log phase cells growing in YPD, as were the TrAEL-seq samples. The cdc9 dataset is of synchronised cells depleted of the DNA ligase Cdc9 (SRA accession: SRX6436838). (B) Read polarity plots for TrAEL-seq BY4741 wild type growing at log phase on YPD and GLOE-seq Cdc9 depletion data (SRA accession: SRX6436838) across chromosome V, calculated as (R−F)/(R+F) where R and F indicate reverse and forward reads, respectively. TrAEL-seq data are an average of 2 technical replicates. Read polarity was calculated for 1,000 bp sliding windows spaced every 100 bp for all single-copy regions; gaps near 450 kb and 500 kb are Ty elements. Vertical dotted lines show locations of ARS elements. Note that the read polarity axis of the cdc9 data is inverted for easy comparison to TrAEL-seq as the cdc9 mutation enriches for 3′ ends on the lagging strand, whereas TrAEL-seq detects the 3′ end of the leading strand. (C) Average read polarity of TrAEL-seq and GLOE-seq datasets across 30 kb windows either side of annotated ARS elements. Calculated as the %tage of reverse reads amongst all reads. Samples are as in A. (D) Absolute TrAEL-seq read depth in reads per million mapped irrespective of read polarity, for the same sample shown in B. Read depth is broadly uniform across the single-copy genome except for a peak at the centromere (as in Fig 2D) and dips at each active ARS. (E) TrAEL-seq signals in wild-type cells arrested in G1 (top) or released into S (bottom). 
Read counts per million reads mapped were calculated for 1,000 bp sliding windows spaced every 100 bp for all single-copy regions, and strands are shown separately to reveal both the absolute read count and the read polarity at each point—read polarity distribution across the chromosome for S-phase cells is equivalent to Fig 3B. To allow comparison of read counts between 2 samples, G1 and G1->S samples were ligated to TrAEL adaptor 1 variants carrying 2 different barcodes. These samples were then pooled, processed, and sequenced together to maintain the relative read counts between the samples, and normalisation for each sample was to the total reads mapped across both libraries. To ensure that the different adaptor barcodes did not impact the result, 2 technical replicates were performed for each paired sample of G1 and G1->S with the barcode adaptors inverted. Data shown are an average of the technical replicates, but little difference was observed in relative library quantification that could be attributed to barcoding. Two biological replicates for the experiment are shown in red and blue. (F) Strength and reproducibility of read polarity amongst TrAEL-seq and GLOE-seq datasets. Read polarity was calculated in 1,000 bp windows spaced every 1,000 bp and shown as continuous lines. Three biological replicate datasets for wild-type TrAEL-seq are plotted on the upper graph and show the same replication profiles. Two wild-type GLOE-seq datasets are overlaid on the lower graph (SRA accessions: SRX6436839 and SRX6436840). TrAEL-seq and GLOE-seq datasets all derive from asynchronous cultures harvested during log phase growth in YPD [40]. Vertical dotted lines show locations of ARS elements. (G) Read polarity plot as in A for 2 biological replicates of BY4741 wild-type TrAEL-seq datasets compared to the RNase H2 mutants rnh201Δ and rnh202Δ and to topoisomerase I mutant top1Δ. 
(H) Read polarity plots of TrAEL-seq data for asynchronous wild-type hESCs; 2 biological replicates are shown, each an average of 2 technical replicates. GLOE-seq data of LIG1-depleted HCT116 cells (average of SRA accessions: SRX7704535 and SRX7704534) are shown for comparison. Read polarity was calculated in 250 kb sliding windows spaced every 10 kb. Note that the polarity of the HCT116 data has been inverted to aid comparison with TrAEL-seq samples; this is highlighted by the scale being labelled in red. Profiles are broadly similar between the 2 cell types, but some origins are only active in hESCs; examples are indicated by green arrows. Numerical data underlying this figure can be found in S3 Data. ARS, autonomously replicating sequence; hESC, human embryonic stem cell; TrAEL-seq, Transferase-Activated End Ligation sequencing.
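The polarity metrics in panels A to C can be illustrated with a short sketch. It assumes each read has already been reduced to a single genomic coordinate on its strand (for example, the mapped position of the labelled 3′ end), and it does not implement the masking of multicopy regions or Ty elements described above; the function names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def count_in_window(sorted_pos: List[int], start: int, end: int) -> int:
    """Number of read positions p with start <= p < end (positions must be sorted)."""
    return bisect.bisect_left(sorted_pos, end) - bisect.bisect_left(sorted_pos, start)


def read_polarity(fwd: List[int], rev: List[int], chrom_len: int,
                  window: int = 1000, step: int = 100) -> List[Tuple[int, Optional[float]]]:
    """(R - F) / (R + F) in windows of `window` bp spaced every `step` bp.

    Returns (window_start, polarity); windows without reads give None rather than 0.
    """
    fwd, rev = sorted(fwd), sorted(rev)
    out = []
    for start in range(0, chrom_len - window + 1, step):
        f = count_in_window(fwd, start, start + window)
        r = count_in_window(rev, start, start + window)
        out.append((start, (r - f) / (r + f) if r + f else None))
    return out


def percent_reverse(n_fwd: int, n_rev: int) -> float:
    """Share of reads on the reverse strand; 50% means no strand bias."""
    return 100.0 * n_rev / (n_fwd + n_rev)


# Toy example with two non-overlapping 1 kb windows.
profile = read_polarity([100, 200, 1500], [300, 400, 500, 1600], chrom_len=2000,
                        window=1000, step=1000)
print(profile)                # [(0, 0.2), (1000, 0.0)]
print(percent_reverse(1, 9))  # 90.0
```

With the defaults (1,000 bp windows every 100 bp) each read contributes to up to 10 overlapping windows, which smooths the profile at the cost of making adjacent windows non-independent.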

Absolute TrAEL-seq read density is largely uniform across the single-copy genome, except for pronounced dips at each ARS (Fig 3D), suggesting that TrAEL-seq signals are primarily derived from active replication forks with little underlying noise. If so, then TrAEL-seq signals should vary across the cell cycle. However, as with other sequencing methods, quantitative comparison of total TrAEL-seq signal between libraries is not straightforward, as there is no relationship between total read count in a library and amount of substrate in the original sample. To allow such comparisons, we modified the TrAEL-seq pipeline such that 2 samples are barcoded at an early stage and then pooled for processing, sequencing, and postprocessing as a single sample. This approach maintains the absolute ratio of substrate between the 2 samples, allowing quantitative comparison.
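The key point of the pooled design is that both barcoded samples share a single scale factor. A minimal sketch, assuming per-window read counts have already been split by adaptor barcode (the helper names are hypothetical), normalises each sample to reads per million of the combined mapped total and then averages the two barcode-swapped technical replicates:

```python
from typing import List, Tuple


def pooled_rpm(counts_a: List[float], counts_b: List[float]) -> Tuple[List[float], List[float]]:
    """Reads per million of the combined mapped total of both barcoded samples.

    A single shared scale factor preserves the ratio of signal between samples.
    """
    scale = 1e6 / (sum(counts_a) + sum(counts_b))
    return [c * scale for c in counts_a], [c * scale for c in counts_b]


def average_replicates(rep1: List[float], rep2: List[float]) -> List[float]:
    """Average the two barcode-swapped technical replicates window by window."""
    return [(x + y) / 2 for x, y in zip(rep1, rep2)]


# Toy per-window counts: sample A (e.g. G1) and sample B (e.g. G1->S).
g1, s = pooled_rpm([10, 30], [40, 20])
print(g1, s)  # [100000.0, 300000.0] [400000.0, 200000.0]
```

Normalising each library to its own total instead would scale both samples to 1,000,000 reads and erase the 40:60 difference in this toy example, which is exactly the comparison the pooled design is meant to retain.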

We applied this method to compare cells arrested in G1 using α-factor to cells from the same culture after release into S-phase. Two variants of TrAEL-seq adaptor 1 with unique barcodes were ligated to the G1 and G1->S samples, which were then pooled; in each experiment, we performed 2 technical replicates with the barcodes swapped to ensure that no quantitative differences emerged from the adaptors themselves. Two biological replicate experiments yielded essentially identical results, with the TrAEL-seq read count across single-copy regions being dramatically higher in the G1->S samples than in the G1-arrested samples. To illustrate both absolute read quantity and strand bias, we plotted the read counts on forward and reverse strands separately across chromosome V (Fig 3E); S-phase samples show strong signals that phase between forward and reverse reads across the chromosome, whereas signals from G1 cells are almost undetectable. Furthermore, the phasing between forward and reverse matches the read polarity variation of unsynchronised samples (compare Fig 3B and 3E). This experiment shows that TrAEL-seq signals primarily arise from active DNA replication forks and are very low in nonreplicating cells.

Phasing of read polarity was also noted in wild-type samples profiled by GLOE-seq, but only weakly, whereas TrAEL-seq libraries display very strong read polarity differences that are highly reproducible and yield essentially identical replication profiles (Fig 3A and 3F, S3C Fig) [40]. As Sriramachandran and colleagues noted for GLOE-seq [40], the read polarity of this replication signal is opposite to what would be expected from labelling of 3′ ends in normal forks. There should never be fewer 3′ ends on the lagging strand than the leading strand, yet up to 90% of TrAEL-seq reads emanate from the leading strand. To explain the GLOE-seq signal, Sriramachandran and colleagues suggested that GLOE-seq labels sites at which DNA is nicked during removal of misincorporated ribonucleotides [40]. To test this idea, we generated TrAEL-seq libraries from rnh201Δ and rnh202Δ mutants that lack key components of RNase H2, the main enzyme that cleaves DNA at misincorporated ribonucleotides, along with a wild-type control [71,72]. Strikingly, read polarity in these mutants is equivalent to wild type, showing that the leading strand bias of TrAEL-seq reads is not caused by RNase H2 and therefore is unlikely to arise through excision of misincorporated ribonucleotides (Fig 3G and S3D Fig). It is also possible that TrAEL-seq (and indeed GLOE-seq) signals arise when the replication machinery encounters Top1 cleavage complexes [73], but we saw no reduction in TrAEL-seq polarity or signal in top1Δ cells (Fig 3G and S3D Fig). One further observation in this regard is that END-seq data show a polarity bias, albeit weak, that parallels the polarity bias in TrAEL-seq data generated from the same cells (S3E Fig). This suggests that double-stranded ends are also formed during normal replication, although these faint signals could also arise through cleavage of the delicate single-stranded regions of replication forks during processing.

We then asked if an equivalent strand bias is observed in the hESC libraries. The limited read coverage in these libraries only allowed read polarity to be determined in 250 kb windows, but nonetheless, a striking variation was observed across the genome (Fig 3H). Importantly, these profiles were very similar between technical and biological replicates and cannot therefore simply result from noise; this can be observed across defined genomic regions but is also clear in a scatter plot which shows that the average read polarity within each window correlates between the datasets (R = 0.84, S3F and S3G Fig). Furthermore, comparison to GLOE-seq results from a LIG1-depleted human cell line that is defective in Okazaki fragment ligation again revealed a striking similarity to the hESC TrAEL-seq data, although with the opposite polarity (Fig 3H and S3H Fig) [40]. Interestingly, a subset of origins were reproducibly detected in hESC samples but absent in the HCT116 data, consistent with evidence that origin usage differs between these cell lines (Fig 3H, green arrows) [24].
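The replicate comparison behind the reported correlation can be sketched as follows. The text does not state which correlation coefficient was used, so this sketch assumes Pearson's r, computed only over windows that are informative (not None) in both datasets; the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation over windows that are informative in both datasets."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared windows")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Per-window polarity from two replicates; the second window lacks data in one of them.
print(pearson_r([1.0, None, 2.0, 3.0], [3.0, 0.5, 2.0, 1.0]))  # -1.0
```

Because 250 kb windows spaced every 10 kb overlap heavily, neighbouring windows are not independent, so a coefficient like this describes agreement between profiles rather than supporting a naive significance test.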

We therefore conclude that TrAEL-seq primarily detects processive replication forks and does so with exceptionally high signal-to-noise. TrAEL-seq profiles are highly reproducible and can be obtained from wild-type cells without need for cell synchronisation, sorting, or labelling. The 3′ ends detected by TrAEL-seq correspond to the leading rather than the lagging strand, despite the fact that many more 3′ ends occur on the lagging strand, and we suggest that these 3′ ends are exposed by replication fork reversal occurring either in vivo or during sample processing (see Discussion).

Environmental impacts on replication timing and fork progression

Finally, we asked whether TrAEL-seq can reveal replication changes or DNA damage, and in particular whether we can detect collisions between transcription and replication machineries.

Since all the yeast libraries generated up to this point had yielded essentially identical DNA replication profiles outside the rDNA, we were first keen to ensure that changes in replication profile are indeed detectable. We therefore examined cells lacking Clb5, a yeast B-type cyclin that plays a key role in the activation of late-firing replication origins [74]. The TrAEL-seq profile of clb5Δ was very similar to wild type across most of the genome, but certain origins were clearly absent or strongly repressed, resulting in extended tracts of DNA synthesis from adjacent origins visible as regions of very different polarity (Fig 4A, green arrows, S4A Fig). This is as predicted for clb5Δ mutants and confirms that TrAEL-seq is indeed sensitive to changes in replication profile.

Fig 4. Detection of replication variation using TrAEL-seq.

Fig 4

(A) Read polarity plot for TrAEL-seq data of clb5Δ versus wild type over a representative region of chr IX. Arrows indicate ARS elements that are not activated in the absence of Clb5. (B) Line plot showing forward and reverse strand TrAEL-seq read counts across the GAL genes for wild-type cells maintained on YP raffinose or 5 h after addition of galactose to 2%. Reads were quantified in 100 bp sliding windows spaced every 10 bp. (C) MA plots showing the change in read count against the average read count for each 100 bp window in the single-copy genome between cells maintained on raffinose and cells exposed to galactose. Separate plots are shown for forward and reverse reads; read counts were normalised to total library size. (D) Plots of average TrAEL-seq read density around the TSS in the highest or lowest 25% expressed genes based on NET-seq data for wild-type yeast growing on YPD (SRA: SRX031059). Genes were categorised into those orientated head-on or codirectional with replication based on TrAEL-seq replication profiles. Data are shown for wild-type BY4741 cells growing on YPD. (E) Example location in which a termination zone differs depending on carbon source. Read polarity was calculated in 1 kb windows spaced every 1 kb. Green lines show cells grown on glucose and purple lines cells grown on raffinose or raffinose plus galactose. (F) Violin plots of regions showing large and significant read polarity differences between cells grown on glucose and nonglucose carbon sources (defined using sets given below). Read polarity data are shown for wild type and clb5Δ grown on glucose (green) and raffinose or raffinose plus galactose (purple). Differences observed in wild type are suppressed in clb5Δ. To define this set of regions, read polarity was calculated across the single-copy genome in 1 kb windows, then each window was compared between the 2 sets by t test with a Benjamini and Hochberg correction. 
As many samples as possible were included in these sets for best separation based on media: glucose (3 replicates of wild type plus rad52Δ, rnh201Δ, rnh202Δ) and nonglucose (wild type on raffinose, wild type on raffinose + galactose, dnl4Δ rad51Δ on raffinose, dnl4Δ rad51Δ on raffinose + galactose). Windows were then filtered for those with a difference in read polarity >0.4 between the 2 sets, leaving a set of 196 out of 12,182 (2.3%). Plots were split based on the direction of the difference in read polarity for clarity. Numerical data underlying this figure can be found in S7 Data. ARS, autonomously replicating sequence; TrAEL-seq, Transferase-Activated End Ligation sequencing; TSS, transcriptional start site.
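The window selection in panel F (a per-window t test, Benjamini and Hochberg correction, then a read polarity difference above 0.4) can be sketched as below. The sketch assumes the per-window p-values have already been computed (for example with `scipy.stats.ttest_ind` on the replicate polarities of each set), and it assumes the 0.01 threshold quoted in the main text applies to the adjusted values; both helper names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_windows(pol_a: Sequence[float], pol_b: Sequence[float],
                   pvalues: Sequence[float], alpha: float = 0.01,
                   min_diff: float = 0.4) -> List[int]:
    """Indices of windows significant after BH and with |mean polarity difference| > min_diff."""
    q = benjamini_hochberg(pvalues)
    return [i for i in range(len(pvalues))
            if q[i] < alpha and abs(pol_a[i] - pol_b[i]) > min_diff]


# Mean polarity per window on glucose (a) and nonglucose (b) media, with toy p-values.
print(select_windows([0.5, 0.1, -0.3], [0.0, 0.0, 0.3], [1e-4, 1e-4, 1e-4]))  # [0, 2]
```

Applying the effect-size filter after the multiple-testing correction, as the caption describes, keeps windows that are both statistically supported and large enough to reflect a real shift in fork direction.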

We then engineered collisions between RNA polymerase II and the replisome by changing growth conditions to strongly induce certain genes; specifically, we added galactose to cells growing on raffinose, which strongly induces expression of galactose metabolising genes including GAL1, GAL7, and GAL10. Although these genes are adjacent, GAL1 is transcribed codirectionally with the replication fork, whereas GAL7 and GAL10 are orientated head-on to the fork (Fig 4B, schematic). On the one hand, stalled replication forks have not been observed at this locus by 2D gels [65]; on the other hand, strong activation of the GAL1-10 promoter has proven highly recombinogenic in various assays [75–77]. We performed these experiments in wild-type cells and in a strain lacking both Dnl4, the DNA ligase required for nonhomologous end joining, and Rad51, the RecA ortholog that mediates strand invasion for homologous recombination. dnl4Δ rad51Δ double mutants should be unable to repair DSBs irrespective of cell cycle phase and should therefore accumulate any DSBs that form.

Collisions would seem most likely where the replisome passes through the transcribed region of highly expressed genes oriented head-on to the direction of replication (such as GAL10 or GAL7), so we predicted that any consequent replication fork stalling would occur at the 3′ end of the gene or within the open reading frame. However, TrAEL-seq read densities across the GAL gene cluster provided little evidence for transcription-associated replication fork stalling within gene bodies. Instead, peaks of reverse reads formed at the 5′ end of the GAL10 gene and, less prominently, at the 5′ end of the GAL7 gene, which suggests that the replication fork is stalled by chromatin or proteins bound at the promoter after passing through the body of the gene (Fig 4B and S4B Fig). The read accumulation is not dramatic, but compared to the rest of the single-copy genome, these sites showed the largest increase in read count between cells on raffinose only and those on raffinose plus galactose (Fig 4C and S4C Fig). As for the sites of fork stalling described above, we detected little difference between the recombination-defective mutant (dnl4Δ rad51Δ) and the wild type, showing that promoter signals must represent fork stalling events that are rarely processed to recombinogenic DSBs (S4B and S4C Fig). Furthermore, the region in which replication forks passing through the GAL locus encounter oncoming forks from ARS211 was unchanged on galactose, meaning that delays caused by fork stalling must be very transient (S4D Fig). Our evidence for minimal replisome pausing even at the most highly expressed genes contrasts with previous estimates based on DNA polymerase or γH2A occupancy [78,79] but is in keeping with more recent studies that have not observed defects in fork progression or activation of Mec1 when replication forks encounter highly transcribed genes [66,80].
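The MA comparison used to rank windows (Fig 4C) can be sketched as follows, assuming per-window counts for one strand in each condition. The pseudocount, which avoids taking the logarithm of zero in empty windows, is an assumption rather than a documented parameter, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ma_values(counts_x: Sequence[float], counts_y: Sequence[float],
              pseudocount: float = 1.0) -> List[Tuple[float, float]]:
    """Per-window (M, A) for condition y versus condition x.

    Each library is scaled to its own size (reads per million);
    M = log2(y / x) is the fold change and A is the mean log2 normalised count.
    """
    tx, ty = sum(counts_x), sum(counts_y)
    out = []
    for cx, cy in zip(counts_x, counts_y):
        nx = (cx + pseudocount) / tx * 1e6  # condition x, reads per million
        ny = (cy + pseudocount) / ty * 1e6  # condition y, reads per million
        out.append((math.log2(ny / nx), 0.5 * (math.log2(nx) + math.log2(ny))))
    return out


def top_windows(ma: List[Tuple[float, float]], n: int = 5) -> List[int]:
    """Indices of the n windows with the largest increase (highest M)."""
    return sorted(range(len(ma)), key=lambda i: ma[i][0], reverse=True)[:n]


# Toy counts: raffinose (x) versus raffinose plus galactose (y), one strand.
ma = ma_values([3, 7], [15, 5])
print(top_windows(ma, 1))  # [0]
```

Plotting M against A, rather than ranking by M alone, shows whether a large fold change comes from a window with enough reads to be trusted, which is why the GAL promoter windows stand out in the plot.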

To determine whether such signals are unique to the GAL genes, we categorised yeast genes both by orientation to the replication fork and by expression based on published NET-seq data for YPD [81] and derived plots of average TrAEL-seq read density around transcriptional start sites (TSS) for wild-type cells growing on YPD. Highly expressed genes (top 25% by NET-seq) orientated head-on to the replication fork show a small but sharp peak before the TSS (Fig 4D, top panel). This peak is dependent on replication, being absent from highly expressed genes orientated codirectionally with the replication fork, and also from highly expressed head-on genes in G1-arrested cells (Fig 4D, middle panel, S4E Fig). Similarly, the peak depends on transcription and is absent from head-on genes in the bottom 25% of expressed genes (Fig 4D, bottom panel). This shows that replication forks are more prone to pausing at the TSS of highly expressed head-on orientated genes; we also note that TrAEL-seq signals from these genes phase around the TSS with nucleosome spacing, suggesting these interactions reinforce nucleosome positioning.
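The gene categorisation and TSS-centred averaging can be sketched as follows. This assumes strand-specific per-base read densities for one chromosome, a list of TSSs with gene strand, and a per-gene fork direction already inferred from the replication profile; how a polarity sign maps onto fork direction depends on the strand convention, so that step is left to the caller. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quartile_sets(expression: dict) -> Tuple[List[str], List[str]]:
    """Top and bottom 25% of genes by an expression score (e.g. NET-seq signal)."""
    ranked = sorted(expression, key=expression.get)
    q = len(ranked) // 4
    if q == 0:
        return [], []
    return ranked[-q:], ranked[:q]


def is_head_on(gene_strand: str, fork_direction: int) -> bool:
    """Head-on if transcription runs against fork movement.

    gene_strand '+' means transcribed left to right; fork_direction is +1 for a
    rightward-moving fork and -1 for a leftward one.
    """
    return (gene_strand == "+") != (fork_direction > 0)


def tss_metaplot(fwd: Sequence[float], rev: Sequence[float],
                 tss_list: Sequence[Tuple[int, str]], flank: int = 500):
    """Average per-base read density from -flank to +flank around each TSS.

    Minus-strand genes are flipped so upstream is always on the left; 'sense'
    means reads on the gene's own strand.
    """
    width = 2 * flank + 1
    sense, antisense = [0.0] * width, [0.0] * width
    used = 0
    for pos, strand in tss_list:
        lo, hi = pos - flank, pos + flank + 1
        if lo < 0 or hi > len(fwd):
            continue  # TSS too close to a chromosome end
        if strand == "+":
            s, a = fwd[lo:hi], rev[lo:hi]
        else:
            s, a = rev[lo:hi][::-1], fwd[lo:hi][::-1]
        for k in range(width):
            sense[k] += s[k]
            antisense[k] += a[k]
        used += 1
    if used == 0:
        return [], []
    return [v / used for v in sense], [v / used for v in antisense]


top, bottom = quartile_sets({f"g{i}": i for i in range(8)})
print(top, bottom)  # ['g6', 'g7'] ['g0', 'g1']
```

Flipping minus-strand genes before averaging is what allows a peak just upstream of the TSS to line up across genes transcribed in either direction.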

Unexpectedly, we noted changes in termination zones elsewhere in the genome when comparing the 4 samples from the galactose induction experiment, which were grown on raffinose or raffinose with galactose, to other wild-type and mutant TrAEL-seq libraries for which cells were grown on glucose (see, for example, Fig 4E). Comparing cells based on growth media rather than genotype, we discovered significant and substantial (p < 0.01, average read polarity change >0.4) differences in read polarity for approximately 2% of the single-copy genome. The most prominent differences affected a subset of termination zones where the average site at which forks converge moved by up to 10 kb (Fig 4E). This change would be most easily attributed to a change in replication timing, and indeed the clb5Δ mutant, although grown on glucose, showed the same average read polarity at the media-dependent sites as the cells grown on nonglucose carbon sources (raffinose and/or galactose) (Fig 4F). This suggests that the timing of origin firing is altered depending on carbon source, consistent with a previous report that Clb5 nuclear import is suppressed in yeast growing in ethanol [82].
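A shift in a termination zone appears as a moved sign change in the read polarity profile: origins and termination zones produce sign changes in opposite directions, and which direction marks termination depends on the strand convention. The hypothetical sketch below locates sign changes between consecutive informative windows and measures how far each one moves between two conditions. It assumes the profile has already been smoothed (as with the 1 kb windows in Fig 4E), since noisy windows would otherwise create spurious crossings.

```python
from typing import List, Optional, Sequence, Tuple


def polarity_switches(profile: Sequence[Tuple[int, Optional[float]]]) -> List[Tuple[float, str]]:
    """Sign changes between consecutive informative windows of a polarity profile.

    Returns (approximate position, kind); windows that are None or exactly 0 are skipped.
    """
    switches = []
    prev = None
    for start, pol in profile:
        if pol is None or pol == 0:
            continue
        if prev is not None and (prev[1] > 0) != (pol > 0):
            kind = "neg_to_pos" if pol > 0 else "pos_to_neg"
            switches.append(((prev[0] + start) / 2, kind))
        prev = (start, pol)
    return switches


def switch_shifts(a: List[Tuple[float, str]], b: List[Tuple[float, str]],
                  kind: str) -> List[float]:
    """For each switch of `kind` in condition a, signed distance to the nearest same-kind switch in b."""
    b_pos = [p for p, k in b if k == kind]
    shifts = []
    for p, k in a:
        if k == kind and b_pos:
            nearest = min(b_pos, key=lambda q: abs(q - p))
            shifts.append(nearest - p)
    return shifts


profile = [(0, 0.5), (1000, 0.2), (2000, None), (3000, -0.6), (4000, 0.4)]
print(polarity_switches(profile))  # [(2000.0, 'pos_to_neg'), (3500.0, 'neg_to_pos')]
```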

Together, these data show that replication profiling by TrAEL-seq is sufficiently sensitive to reveal differences in fork direction and processivity.

Discussion

Here, we have demonstrated that TrAEL-seq maps the 3′ ends of resected DSBs, sites of replication fork stalling and normal DNA replication patterns genome-wide and with base pair resolution. Methods to map the 3′ ends of resected DNA are desirable for genome-wide studies of homologous recombination as these are the critical species that undergo strand invasion. Similarly, detection of DNA 3′ ends at stalled replication forks is an important indicator of potentially recombinogenic intermediates. TrAEL-seq profiles all these species with excellent signal-to-noise and therefore provides a general method for the detection of DNA processing events that could result in genome instability. It is interesting to note that the primary source of noise in TrAEL-seq is actually normal replication forks. This raises questions as to the frequency with which leading strand 3′ ends become detached during normal replication (discussed below) but also provides a major unanticipated application for the method. In contrast to other methods for profiling replication fork directionality (notably through Okazaki fragment sequencing), TrAEL-seq works in wild-type cells, requires neither labelling nor synchronisation of cells, and does not involve complex sample preparation procedures, making TrAEL-seq versatile and straightforward to implement across a range of experimental contexts.

A proposed mechanism for replication fork detection by TrAEL-seq

TrAEL-seq was designed to detect free 3′ ends of single-stranded DNA and was not expected to label undisturbed replication forks in normal cells. Why, therefore, is TrAEL-seq so sensitive to replication fork direction? Although TrAEL-seq may have some capacity to label 3′ ends in normal replication fork structures, we cannot see why TrAEL-seq would outperform GLOE-seq in detecting such ends, and the bias towards the leading strand would be very hard to explain. Instead, we suggest that replication forks frequently rearrange, either in vivo or during sample processing, to make the leading strand 3′ end accessible to TdT while the lagging strand 3′ end remains largely inaccessible. Transient fork reversal would have this effect, yielding TdT-accessible leading strand ends without irreversible changes in fork structure (Fig 5, free 3′ ends labelled with green dots). Only a small subset of these events needs to undergo sufficient reversal for the nascent lagging and leading strands to anneal, which would form the replication-linked double-stranded DNA ends that we detect by END-seq (Fig 5, middle and right structures, S3E Fig). It remains to be determined whether these rearrangements occur in vivo; if they do, they would require surprisingly frequent fork reversal, although for TrAEL-seq labelling the reversal required is minimal: in reality only a flap displacement (Fig 5, left and middle structures). Although DNA replication is highly processive overall, in vitro measurements have shown that the yeast leading and lagging strand polymerases dissociate after less than 1 kb of DNA synthesis [83], and this may allow helicases to access and unwind the nascent leading strand.

Fig 5. Proposed mechanism for replication fork detection by TrAEL-seq.

Fig 5

Replication forks that would normally be undetectable by TrAEL-seq undergo very limited reversal to yield a free 3′ end that can be labelled by TdT (green dot, middle structure). Further reversal yields a double-stranded end that can be labelled by TrAEL-seq or BLESS-type methods. Purple circles highlight the area of difference between the structures. TdT, terminal deoxynucleotidyl transferase; TrAEL-seq, Transferase-Activated End Ligation sequencing.

Alternatively, it is possible that the TrAEL-seq replication signal derives from cleaved replication forks, but we think this is highly unlikely for the following reasons: (1) The rad52Δ mutant used here had almost no growth defect and showed no detectable difference in TrAEL-seq profile, and (2) there is no difference in detection of early and late replicating genome regions in TrAEL-seq, whereas the activity of structure-specific endonucleases that could cleave replication forks is tightly restricted to G2/M [84]. Replication-linked double-stranded DNA ends have been clearly observed by BLESS-type methods in cells exposed to replication stress [13,15,85] and interpreted as evidence that replication forks are cleaved either during the restart process or as a pathogenic end point. However, fork cleavage is not required to initiate recombination during replication fork restart [86], and it is quite possible that apparent DSBs are actually double-stranded ends of reversed forks. Direct observation of cleaved forks at the rDNA RFB has been reported based on Southern blot [53,54,68], but we note that these signals could also arise from fork reversal (S5 Fig). This distinction is important, as cleaved forks must be resolved by recombination of some sort, whereas reversed forks can revert by Holliday junction migration. Overall, the existence of frequent DSBs in wild-type cells under normal conditions (quantified at 1 DSB per cell per S-phase for the RFB alone [13]) is hard to reconcile with the minimal growth phenotype of mutants lacking critical DNA repair factors such as Rad52. We suggest that the vast majority of such events detected by TrAEL-seq and other DNA end-mapping methods are actually reversed replication forks that are rapidly resolved by fork migration.

Complementary methods probe different aspects of DNA damage

Although TrAEL-seq and the recently described GLOE-seq method in theory act equivalently by labelling and profiling DNA 3′ ends, we find that these methods have completely different strengths and weaknesses. TrAEL-seq proves superior for detection of replication fork direction and stalling, which likely arises through a sensitivity to replication fork structure. In contrast, the DNA denaturing step required for GLOE-seq labelling erases fork structure and reveals real accumulations of strand breaks as opposed to conformational changes in the replication fork. Therefore, future studies employing both methods in parallel are likely to be particularly informative for understanding the dynamics of replication forks on encountering obstacles. It should also be noted that the lack of a denaturing step in TrAEL-seq makes it insensitive to single-strand breaks and nicks, and therefore GLOE-seq is much better suited for detection of such ends.

Genome-wide analysis of DNA processing events requires high-resolution methods that can detect changes at both 5′ and 3′ DNA ends. BLESS-type methods degrade or fill in 3′ ends to yield the location of matching 5′ ends, and our implementation of TrAEL-seq now provides a complementary method to map 3′ ends. We suggest that for dissecting mechanisms of DSB processing and repair, these methods will be most powerful when employed together. In addition to the TrAEL-seq protocol, we therefore also provide an implementation of BLESS/END-seq that utilises small numbers of cells and follows the same library construction procedure as TrAEL-seq, making processing of the same sample in parallel by both methods straightforward. Indeed, we have successfully performed TrAEL-seq and END-seq on two halves of the same agarose plug.

For general replication analysis, most existing methods profile either fork direction or origin timing, whereas acquisition of information on both parameters from the same samples would be very helpful. The recently described D-Nascent method can determine fork direction and origin timing, but only after cell synchronisation and label incorporation [87]. The ability of TrAEL-seq to obtain replication direction profiles from asynchronous unlabelled wild-type cells will allow easy integration with other methods under diverse growth conditions. For example, ethanol fixed cells collected for sort-seq [27] could also be profiled by TrAEL-seq to provide both replication timing and direction. However, some adjustments will be needed when combining TrAEL-seq with replication timing methods that involve labelling with deoxyuridine derivatives (e.g., REPLI-seq) as USER is employed in TrAEL-seq to elute libraries prior to amplification.

Overall, TrAEL-seq provides a unique addition to complement existing methods for genome-wide analysis of DNA replication and DNA damage. The relatively simple experimental protocol, high signal-to-noise ratio, and lack of requirement for treatment or purification of cells prior to harvest should render TrAEL-seq particularly suitable for a wide range of experimental systems.

Materials and methods

Yeast strains and culture

Strains used are listed in S1 Table. All media components were purchased from Formedium, and all media were filter sterilised. YP media were supplemented with the given carbon source from 20% filter-sterilised stock solutions. For growth to log phase, cells were inoculated in 4 ml media and grown for approximately 6 h at 30°C with shaking at 200 rpm before dilution at approximately 1:10,000 in 25 ml YPD (1:500 for YP raffinose or 1:2,000 for synthetic complete media), and growth was continued at 30°C 200 rpm for approximately 18 h until OD reached 0.4 to 0.7 (mid-log). Cells were centrifuged 1 min at 4,600 rpm, resuspended in 70% ethanol at 1 × 10⁷ cells/ml and stored at −70°C.

For meiosis, SK1 dmc1Δ diploid cells from a glycerol stock were patched overnight on YP 2% glycerol, then again for 7 h on YP 4% glucose, before inoculation in 4 ml YPD and growth for 24 h. Cells were then inoculated to OD 0.2 in 20 ml YP acetate for overnight growth to approximately 4 × 10⁷ cells/ml in a 100-ml flask at 30°C with shaking at 200 rpm. Meiosis was initiated by washing cells once with 20 ml SPO media (0.3% KOAc, 5 mg/L uracil, 5 mg/L histidine, 25 mg/L leucine, 12.5 mg/L tryptophan, 0.02% raffinose), then resuspending in 20 ml SPO media and incubating for 7 h at 30°C in a 100-ml flask with shaking at 250 rpm. Cells were harvested and fixed with 70% ethanol as above.

For G1 arrest, BY4741 wild-type cells were grown in 20 ml YPD at 30°C 200 rpm for approximately 18 h to 0.5 × 10⁷ cells/ml (mid-log), then α-factor was added to 5 μg/ml (from Zymo Y1001 stock diluted to 5 mg/ml in DMSO) and cells were maintained at 30°C 200 rpm for 1 h. Another aliquot of α-factor was added to 10 μg/ml total and cells were maintained at 30°C 200 rpm for 1 more hour. At this point, >90% of cells were shmoos and no small-budded cells were visible. Half the cells were harvested by centrifugation 1 min at 4,600 rpm and resuspended in 70% ethanol at 1 × 10⁷ cells/ml. The other half were centrifuged 1 min at 4,600 rpm, washed twice with YPD prewarmed to 30°C, then resuspended in 10 ml prewarmed YPD and transferred to a prewarmed 25 ml flask. Cells were maintained at 30°C 200 rpm until most cells showed small buds (approximately 50 min), then harvested as above. All cells were stored at −70°C.
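The α-factor additions are a simple C1V1 = C2V2 calculation. The sketch below, with a hypothetical helper name, works through the stated numbers for the 20 ml culture: since 1 mg/ml equals 1 μg/μl, micrograms needed divided by the stock concentration in mg/ml gives microlitres directly. It ignores the small volume increase caused by the additions themselves.

```python
def stock_volume_ul(target_ug_per_ml: float, culture_ml: float,
                    stock_mg_per_ml: float, current_ug_per_ml: float = 0.0) -> float:
    """Microlitres of stock needed to raise a culture from current to target concentration.

    Micrograms needed divided by mg/ml gives microlitres, because 1 mg/ml = 1 ug/ul.
    """
    needed_ug = (target_ug_per_ml - current_ug_per_ml) * culture_ml
    return needed_ug / stock_mg_per_ml


first = stock_volume_ul(5, 20, 5)       # 0 -> 5 ug/ml in 20 ml: 20.0 ul
second = stock_volume_ul(10, 20, 5, 5)  # 5 -> 10 ug/ml: another 20.0 ul
dmso_percent = (first + second) / (20 * 1000) * 100  # about 0.2% v/v DMSO
print(first, second, round(dmso_percent, 3))  # 20.0 20.0 0.2
```

The final DMSO concentration of about 0.2% v/v is low enough that the solvent is unlikely to be a concern, which is one reason to use a concentrated stock.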

hESC culture

Undifferentiated H9 hESCs were maintained on Vitronectin-coated plates (ThermoFisher Scientific A14700) in TeSR-E8 media (StemCell Technologies 05990). All hESCs were cultured in 5% O2, 5% CO2 at 37°C.

Agarose embedding of yeast cells

Cells in ethanol (1 to 3 × 10⁷ per plug) were pelleted in round bottom 2 ml tubes by centrifuging 30 s at 20,000g, washed once in 1 ml PFGE wash buffer (10 mM Tris HCl (pH 7.5), 50 mM EDTA) and resuspended in 60 μl of the same buffer with 1 μl lyticase (17 U/μl in 10 mM KPO4 pH 7, 50% glycerol, Merck >2,000 U/mg L2524). Samples were heated to 50°C for 1 to 10 min before addition of 40 μl molten CleanCut agarose (Bio-Rad 1703594), vortexing vigorously for 5 s before pipetting into a plug mould (Bio-Rad 1703713) and solidifying 15 to 30 min at 4°C. Each plug was transferred to a 2-ml tube containing 500 μl PFGE wash buffer with 10 μl 17 U/μl lyticase and incubated 1 h at 37°C. The solution was replaced with 500 μl PK buffer (100 mM EDTA (pH 8), 0.2% sodium deoxycholate, 1% sodium N-lauroyl sarcosine, 1 mg/ml Proteinase K) and incubated overnight at 50°C. Plugs were rinsed with 1 ml TE, then washed 3 times with 1 ml TE for 1 to 2 h at room temperature with rocking; 10 mM PMSF was added to the second and third washes from 100 mM stock (Merck 93482). Plugs were then digested 1 h at 37°C with 1 μl 1,000 U/ml RNase T1 (Thermo EN0541) in 200 μl TE. RNase A was not used as it binds strongly to single-stranded DNA [88]. Plugs were stored in 1 ml TE at 4°C and are stable for >1 year.

Agarose embedding of hESCs

Cells were detached using Accutase and counted, and 1 × 10⁶ cells were washed once in 5 ml L buffer (10 mM Tris HCl (pH 7.5), 100 mM EDTA, 20 mM NaCl) and resuspended in 60 μl L buffer in a 2-ml tube. Samples were heated to 50°C for 2 to 3 min before addition of 40 μl molten CleanCut agarose (Bio-Rad 1703594), vortexing vigorously for 5 s before pipetting into a plug mould (Bio-Rad 1703713), and solidifying 15 to 30 min at 4°C. Each plug was transferred to a 2-ml tube containing 500 μl digestion buffer (10 mM Tris HCl (pH 7.5), 100 mM EDTA, 20 mM NaCl, 1% sodium N-lauroyl sarcosine, 0.1 mg/ml Proteinase K) and incubated overnight at 50°C. Plugs were washed and RNase T1 treated as for yeast.

TrAEL-seq library preparation and sequencing

Please note that a detailed TrAEL-seq protocol is provided in S2 File, and up-to-date protocols are available from the Houseley lab website https://www.babraham.ac.uk/our-research/epigenetics/jon-houseley/protocols

Preparation of TrAEL-seq adaptor 1: DNA oligonucleotide was synthesised and PAGE purified by Sigma-Genosys (Merck, United Kingdom):

[Phos]NNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTUGCGCAGGCCATTGGCC[BtndT]GCGCUACACTCTTTCCCTACACGACGCT

This oligonucleotide was adenylated using the 5′ DNA adenylation kit (NEB, E2610S) as follows: 500 pmol DNA oligonucleotide, 5 μl 10× 5′ DNA adenylation reaction buffer, 5 μl 1 mM ATP, and 5 μl Mth RNA ligase in a total volume of 50 μl were incubated for 1 h at 65°C, then 5 min at 85°C. The reaction was extracted with phenol:chloroform (pH 8), then ethanol precipitated with 10 μl 3 M NaOAc, 1 μl GlycoBlue (Thermo AM9515), and 330 μl ethanol, and resuspended in 50 μl 0.1x TE.
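Adaptor 1 begins with 8 random bases (NNNNNNNN) immediately after its 5′ phosphate. Because the ligated molecule reads DNA, rA tail, N8, then the Illumina read 1 adaptor sequence, the N8 (as its reverse complement) should form the first bases of read 1, followed by T residues copied from the TdT rA tail. Treating those 8 bases as a unique molecular identifier (UMI) is an interpretation, and the cap of 3 trimmed T residues is an assumption; the sketch below, with a hypothetical function name, moves the UMI into the read ID and trims the Ts before alignment.

```python
from typing import Tuple


def move_umi_to_header(read_id: str, seq: str, qual: str,
                       umi_len: int = 8, max_t: int = 3) -> Tuple[str, str, str]:
    """Move the first `umi_len` bases of read 1 into the read ID, then trim up to
    `max_t` T residues that follow it (assumed to be the complement of the rA tail)."""
    umi = seq[:umi_len]
    seq, qual = seq[umi_len:], qual[umi_len:]
    t = 0
    while t < max_t and t < len(seq) and seq[t] == "T":
        t += 1
    return f"{read_id}:{umi}", seq[t:], qual[t:]


rid, s, q = move_umi_to_header("@read1", "ACGTACGTTTGGCCA", "IIIIIIIIIIIIIII")
print(rid, s, q)  # @read1:ACGTACGT GGCCA IIIII
```

One trade-off of this design is that a genuine genomic T immediately adjacent to the tail cannot be distinguished from the tail and is also trimmed, shifting the inferred 3′ end by up to `max_t` bases. Keeping the UMI in the read ID allows PCR duplicates to be collapsed afterwards by chromosome, position, strand, and UMI.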

Preparation of TrAEL-seq adaptor 2: DNA oligonucleotide was synthesised and PAGE purified by Sigma-Genosys (Merck):

[Phos]GATCGGAAGAGCACACGTCTGAACTCCAGTCUUUUGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T

The oligonucleotide was annealed before use: 20 μl 100 pmol/μl oligonucleotide and 20 μl 10x T4 DNA ligase buffer (NEB) in 200 μl final volume were incubated in a heating block at 95°C for 5 min, then the block was removed from the heat and left to cool to room temperature over approximately 2 h.
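Two quick checks clarify the adaptor preparations. First, the arms of adaptor 2 on either side of the UUUU loop are reverse complements of each other, so each oligonucleotide folds into a single hairpin adaptor and the oligonucleotide concentration equals the adaptor concentration. Second, reading the protocol's concentrations as pmol per μl and assuming complete recovery after precipitation, both preparations end at the 10 pmol/μl used in the ligation steps below. The variable names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Adaptor 2 stem arms either side of the UUUU loop, copied from the sequence above.
arm_5p = "GATCGGAAGAGCACACGTCTGAACTCCAGTC"
arm_3p = "GACTGGAGTTCAGACGTGTGCTCTTCCGATC"
is_hairpin = revcomp(arm_5p) == arm_3p  # True: the arms pair intramolecularly

# Working concentrations, assuming no loss during precipitation.
adaptor1_pmol_per_ul = 500 / 50        # 500 pmol adenylated, resuspended in 50 ul
adaptor2_pmol_per_ul = 20 * 100 / 200  # 20 ul of 100 pmol/ul in 200 ul
print(is_hairpin, adaptor1_pmol_per_ul, adaptor2_pmol_per_ul)  # True 10.0 10.0
```

Because the annealing is intramolecular, slow cooling mainly serves to favour hairpin formation over intermolecular duplexes between two oligonucleotides.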

Sample preparation: ½ an agarose plug was used for each library (cut with a razor blade), hereafter referred to as a plug for simplicity. All incubations were performed in 2 ml round bottomed tubes (plugs break easily in 1.5 ml tubes), or 15 ml tubes for high volume washes. For restriction enzyme digestion, a plug was equilibrated 30 min in 200 μl 1x CutSmart buffer (NEB), digested overnight at 37°C with 1 μl 20 U/μl NotI-HF (NEB R3189S) and 1 μl 10 U/μl PmeI (NEB R0560S) in 400 μl 1x CutSmart buffer, then 1 μl 20 U/μl SfiI (NEB R0123S) was added and incubation continued overnight at 50°C. The plug was rinsed with 1x TE before further processing.

Tailing and ligation: Plugs were equilibrated once in 100 μl 1x TdT buffer (NEB) for 30 min at room temperature, then incubated for 2 h at 37°C in 100 μl 1x TdT buffer containing 4 μl 10 mM ATP and 1 μl Terminal Transferase (NEB M0315L). Plugs were rinsed with 1 ml Tris buffer (10 mM Tris HCl (pH 8.0)), equilibrated in 100 μl 1x T4 RNA ligase buffer (NEB) containing 40 μl 50% PEG 8000 for 1 h at room temperature, then incubated overnight at 25°C in 100 μl 1x T4 RNA ligase buffer (NEB) containing 40 μl 50% PEG 8000, 1 μl 10 pmol/μl TrAEL-seq adaptor 1 and 1 μl T4 RNA ligase 2 truncated KQ (NEB M0373L). Plugs were then rinsed with 1 ml Tris buffer, transferred to 15 ml tubes, and washed 3 times in 10 ml Tris buffer with rocking at room temperature for 1 to 2 h each, then washed again overnight under the same conditions.

DNA processing: Plugs were equilibrated for 15 min with 1 ml agarase buffer (10 mM Bis-Tris-HCl, 1 mM EDTA (pH 6.5)), then the supernatant removed and 50 μl agarase buffer added. Plugs were melted for 20 min at 65°C, transferred for 5 min to a heating block preheated to 42°C, 1 μl β-agarase (NEB M0392S) was added and mixed by flicking without allowing sample to cool, and incubation continued at 42°C for 1 h. DNA was ethanol precipitated with 25 μl 10 M NH4OAc, 1 μl GlycoBlue, 330 μl of ethanol and resuspended in 10 μl 0.1x TE. A volume of 40 μl reaction mix containing 5 μl isothermal amplification buffer (NEB), 3 μl 100 mM MgSO4, 2 μl 10 mM dNTPs, and 1 μl Bst 2 WarmStart DNA polymerase (NEB M0538S) was added and sample incubated 30 min at 65°C before precipitation with 12.5 μl 10 M NH4OAc, 1 μl GlycoBlue, 160 μl ethanol and redissolving pellet in 130 μl 1x TE. The DNA was transferred to an AFA microTUBE (Covaris 520045) and fragmented in a Covaris E220 using duty factor 10, PIP 175, Cycles 200, Temp 11°C, then transferred to a 1.5-ml tube containing 8 μl prewashed Dynabeads MyOne streptavidin C1 beads (Thermo, 65001) resuspended in 300 μl 2x TN (10 mM Tris (pH 8), 2 M NaCl) along with 170 μl water (total volume 600 μl) and incubated 30 min at room temperature on a rotating wheel. Beads were washed once with 500 μl 5 mM Tris (pH 8), 0.5 mM EDTA, 1 M NaCl, 5 min on wheel and once with 500 μl 0.1x TE, 5 min on wheel before resuspension in 25 μl 0.1x TE.
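The streptavidin binding mix is assembled so that the salt concentration matches the bead wash buffer. A hypothetical mixing helper shows the arithmetic: 300 μl 2x TN (2 M NaCl), 130 μl sheared DNA and 170 μl water give 600 μl at 1 M NaCl, the same as the 1 M NaCl wash buffer. The sketch tracks NaCl only and ignores the Tris and EDTA contributed by the TE.

```python
from typing import Dict, List, Tuple


def final_concentrations(components: List[Tuple[float, Dict[str, float]]]) -> Tuple[float, Dict[str, float]]:
    """Mix (volume_ul, {solute: concentration}) components.

    Returns the total volume and the final concentration of each solute,
    in the same units as the inputs.
    """
    total = sum(v for v, _ in components)
    final: Dict[str, float] = {}
    for v, solutes in components:
        for name, conc in solutes.items():
            final[name] = final.get(name, 0.0) + conc * v / total
    return total, final


total_ul, conc = final_concentrations([
    (300, {"NaCl_M": 2.0}),  # 2x TN as listed in the protocol
    (130, {}),               # sheared DNA in 1x TE (no NaCl)
    (170, {}),               # water
])
print(total_ul, conc)  # 600 {'NaCl_M': 1.0}
```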

Library preparation: TrAEL-seq adaptor 2 was added using a modified NEBNext Ultra II DNA kit (NEB E7645S). First, 3.5 μl NEBNext Ultra II End Prep buffer, 1 μl 1 ng/μl sonicated salmon sperm DNA (used as a carrier), and 1.5 μl NEBNext Ultra II End Prep enzyme were added, and the reaction was incubated for 30 min at room temperature and then 30 min at 65°C. After cooling, 1.25 μl 10 pM/μl TrAEL-seq adaptor 2, 0.5 μl NEBNext ligation enhancer, and 15 μl NEBNext Ultra II ligation mix were added and the reaction was incubated for 30 min at room temperature. The reaction mix was removed and discarded, and the beads were rinsed with 500 μl wash buffer (5 mM Tris (pH 8), 0.5 mM EDTA, 1 M NaCl). They were then washed twice with 1 ml wash buffer for 10 min on the wheel at room temperature and once for 10 min with 1 ml 0.1x TE. Libraries were eluted from the beads with 11 μl 1x TE and 1.5 μl USER enzyme (NEB) for 15 min at 37°C, then again with 10.5 μl 1x TE and 1.5 μl USER enzyme (NEB) for 15 min at 37°C, and the 2 eluates were combined.

Library amplification: Amplification was performed with components of the NEBNext Ultra II DNA kit (NEB E7645S) and a NEBNext Multiplex Oligos set (e.g., NEB E7335S). An initial test amplification was used to determine the optimal cycle number for each library. For this test, 1.25 μl library was amplified in a 10 μl total volume containing 0.4 μl each of the NEBNext Universal primer and any NEBNext Index primer, plus 5 μl NEBNext Ultra II Q5 PCR master mix. Cycling program: 98°C 30 s, then 18 cycles of (98°C 10 s, 65°C 75 s), then 65°C 5 min. The test PCR was cleaned up with 8 μl AMPure XP beads (Beckman A63881) and eluted with 2.5 μl 0.1x TE, and 1 μl of the eluate was examined on a Bioanalyser high sensitivity DNA chip (Agilent 5067–4626). The ideal cycle number should bring the final library to a concentration of 1 to 3 nM. Note that at the same cycle number, the final library will already be 2 to 3 cycles more concentrated than the test. A volume of 21 μl of library was then amplified with 2 μl each of the NEBNext Universal and chosen Index primers and 25 μl NEBNext Ultra II Q5 PCR master mix, using the same program as above with the calculated cycle number. The amplified library was cleaned up with 40 μl AMPure XP beads (Beckman A63881) and eluted with 26 μl 0.1x TE. Then 25 μl of this eluate was purified again with 20 μl AMPure XP beads and eluted with 11 μl 0.1x TE. Final libraries were quality controlled and quantified by Bioanalyser (Agilent 5067–4626) and KAPA qPCR (Roche KK4835).
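The point that the final library is already a few cycles ahead of the test can be rationalised from the volumes in the protocol. The full-scale reaction has 16.8-fold more template (21 μl versus 1.25 μl). This is partly offset by elution into 11 μl rather than 2.5 μl, and by the 1 μl left behind between the two clean-ups. The sketch below turns this into a cycle-number estimate. It assumes ideal doubling and complete bead recovery. The function names and the 2 nM default target (the middle of the 1 to 3 nM range) are illustrative choices, not part of the published protocol. Under these assumptions, the volume ratios alone account for roughly 2 cycles; real losses and PCR efficiency will shift this figure.

```python
import math

# Volumes (ul) and cycle number taken from the protocol text
TEST_INPUT_UL, TEST_ELUTION_UL = 1.25, 2.5
FINAL_INPUT_UL, FIRST_ELUTION_UL, CARRIED_UL, FINAL_ELUTION_UL = 21.0, 26.0, 25.0, 11.0
TEST_CYCLES = 18


def concentration_gain() -> float:
    """Final/test library concentration ratio at equal cycle numbers.

    Assumes ideal doubling every cycle and complete recovery from each AMPure clean-up.
    """
    more_template = FINAL_INPUT_UL / TEST_INPUT_UL        # 16.8x more input
    carried_fraction = CARRIED_UL / FIRST_ELUTION_UL      # 25 of 26 ul taken forward
    elution_dilution = TEST_ELUTION_UL / FINAL_ELUTION_UL  # 11 ul vs 2.5 ul elution
    return more_template * carried_fraction * elution_dilution


def suggested_cycles(test_conc_nM: float, target_nM: float = 2.0) -> int:
    """Cycle number for the full-scale PCR aiming at target_nM (hypothetical helper)."""
    if test_conc_nM <= 0 or target_nM <= 0:
        raise ValueError("concentrations must be positive")
    predicted_nM = test_conc_nM * concentration_gain()  # final library at TEST_CYCLES
    # Under ideal doubling, each cycle removed halves the final concentration
    return max(1, round(TEST_CYCLES - math.log2(predicted_nM / target_nM)))


print(f"volume ratios alone: {math.log2(concentration_gain()):.2f} cycles")  # 1.88 cycles
print(suggested_cycles(10.0))  # test library at 10 nM -> 14
```

If the 18-cycle test is already approaching plateau, the ideal-doubling assumption breaks down. Treat the result as a starting point rather than a prescription.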

Libraries were sequenced either on an Illumina MiSeq as 50 bp Single Read or an Illumina NextSeq 500 as High Output 75 bp Single End by the Babraham Institute Next Generation Sequencing facility.

TrAEL-seq with barcoded adaptor for quantitative comparison

Two additional variants of TrAEL adaptor 1 were synthesised, preadenylated, and purified as for TrAEL adaptor 1 above.

Index 1: [Phos]GACTNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTU GCGCAGGCCATTGGCC [BtndT] GCGCUACACTCTTTCCCTACACGAC GCT[Phos]

Index 2: [Phos]AGTCNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTU GCGCAGGCCATTGGCC [BtndT] GCGCUACACTCTTTCCCTACACGAC GCT[Phos]

The 3′ phosphate on these adaptors was designed to prevent potential circularisation of the adaptor and is removed by the additional phosphatase treatment noted below. We do not think this modification made a substantial difference.

For preparation of libraries from G1-arrested and G1->S cells, whole agarose plugs were prepared as described above. Plugs were cut in two, and each half was tailed and ligated as normal, with the Index 1 or Index 2 adaptor substituted for TrAEL-seq adaptor 1. This gave 2 ligations per sample, one with index 1 and one with index 2. Plugs were then rinsed and washed in separate 15 ml tubes. Before incubation with agarase buffer, plugs from different conditions were pooled in pairs with opposite indexes, e.g., G1 (index 1) pooled with G1->S (index 2), and vice versa. Each pool was then processed using double the volume of reagents for the agarase treatment and the first round of ethanol precipitation, and resuspended in 10 μl 0.1x TE. Each pooled sample was incubated with 29 μl water, 3 μl 100 mM MgSO4, 5 μl isothermal amplification buffer, and 1 μl shrimp alkaline phosphatase (rSAP, M0371S) for 30 min at 37°C, followed by 10 min at 65°C. Then 2 μl 10 mM dNTPs and 1 μl Bst 2.0 WarmStart polymerase were added, and incubation continued at 65°C for 30 min. The rest of the protocol was performed as normal.

END-seq library preparation

Note: This protocol is based on the original described by Canela and colleagues [11], with one critical difference. The exonuclease-mediated blunting step designed for topoisomerase II ends did not work well on the 2 test substrates we used in yeast genomic DNA. Instead, the best results were obtained by blunting with Klenow for 2 h or overnight, which outperformed both T4 DNA polymerase and a commercial DNA blunting kit.

Preparation of END-seq adaptor 1: DNA oligonucleotide was synthesised and PAGE purified by Sigma-Genosys (Merck); sequence is as described by Canela and colleagues [11]:

[Phos]GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGUU[BtndT]U[BtndT]UUACACTCTTTCCCTACACGACGCTCTTCCGATC*T

Annealed as for TrAEL-seq adaptor 2 above.

Preparation of END-seq adaptor 2c: DNA oligonucleotide was synthesised and PAGE purified by Sigma-Genosys (Merck). The sequence was modified from that of Canela and colleagues [11] to prevent adaptor homodimers from being amplified:

[Phos]GATCGGAAGAGCTATTATTTAAATTTTAATTUGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T

Annealed as for TrAEL-seq adaptor 2 above.

Sample preparation: Half an agarose plug (cut with a razor blade) was used for each library; for simplicity, this is referred to below as a plug. All incubations were performed in 2-ml round-bottomed tubes, because plugs break easily in 1.5-ml tubes. High-volume washes were performed in 15-ml tubes. Restriction enzyme digestion was performed as described for TrAEL-seq.

Blunting and ligation: The plug was equilibrated for 1 h at room temperature in 100 μl NEBuffer 2 with 0.1 mM dNTPs, then blunted overnight at 37°C in 100 μl NEBuffer 2 with 0.1 mM dNTPs and 1 μl Klenow (NEB M0210S). After rinsing twice with 1 ml Tris buffer, the plug was transferred to a 15-ml tube and washed 3 times for 15 min each with 10 ml Tris buffer on a rocker at room temperature, then transferred to a new 2-ml tube. The plug was equilibrated with 100 μl CutSmart buffer containing 5 mM DTT and 1 mM dATP for 1 h at room temperature. It was then incubated for 2 h at 37°C in another 100 μl of the same buffer containing 1 μl Klenow exo- (NEB M0212S) and 1 μl T4 PNK (NEB M0201S). The plug was rinsed twice with 1 ml Tris buffer, washed once with 10 ml Tris buffer for 15 min as above, and returned to a 2-ml tube. The plug was equilibrated for 1 h at room temperature in 100 μl 1x Quick Ligation buffer (NEB B6058S) containing 2.7 μl END-seq adaptor 1. It was then incubated overnight at 25°C in another 100 μl of the same buffer containing 2.7 μl END-seq adaptor 1 and 1 μl high-concentration T4 DNA Ligase (NEB M0202M). After rinsing twice with 1 ml Tris buffer, the plug was transferred to a 15-ml tube and washed 3 times for 1 to 2 h each with 10 ml Tris buffer on a rocker at room temperature, then washed again overnight.

DNA purification and library construction: The plug was transferred to a 1.5-ml tube and equilibrated for 15 min with 1 ml agarase buffer (10 mM Bis-Tris-HCl, 1 mM EDTA (pH 6.5)), then the supernatant was removed and 50 μl agarase buffer added to the plug. The plug was melted for 20 min at 65°C and transferred for 5 min to a heating block preheated to 42°C. Then 1 μl β-agarase (NEB M0392S) was added and mixed by flicking without allowing the sample to cool, and incubation continued at 42°C for 1 h. DNA was ethanol precipitated with 25 μl 10 M NH4OAc, 1 μl GlycoBlue, and 330 μl ethanol, and resuspended in 130 μl 1x TE by incubation for 15 min at 65°C. From this point, samples were sonicated and purified, and libraries were constructed as for TrAEL-seq, except that END-seq adaptor 2c was substituted for TrAEL-seq adaptor 2.

In vitro TrAEL activity and qPCR assays

For in vitro assays, 0.5 μl 10 μM DNA oligonucleotide CGCGGTAATTCCAGCTCCAA was treated with or without 0.5 μl TdT in 20 μl 1x TdT buffer containing 0.8 μl 10 mM ATP for 30 min at 37°C. Reactions were purified by phenol:chloroform extraction and ethanol precipitation and resuspended in 5 μl 10 mM Tris (pH 8). This was ligated to 1 μl TrAEL-seq adaptor 1 in 20 μl 1x T4 RNA ligase buffer containing 8 μl 50% PEG 8000 and 1 μl T4 RNA ligase 2 truncated KQ overnight at 25°C. Reactions were resolved on a 15% PAGE/8 M urea gel and stained with SYBR Gold (Thermo S11494) as per manufacturer’s instructions.

Data analysis

Unique Molecular Identifier (UMI) deduplication and mapping: Scripts used for UMI handling, and more detailed information on the processing, are available at https://github.com/FelixKrueger/TrAEL-seq. Briefly, TrAEL-seq reads are expected to carry an 8-bp in-line barcode (UMI) at the 5′ end, followed by a variable number (1 to 3) of thymines (T). The read structure is therefore NNNNNNNN(T)nSEQUENCESPECIFIC, where NNNNNNNN is the UMI and (T)n is the poly(T). The script TrAELseq_preprocessing.py removes the first 8 bp (UMI) of a read and appends the UMI sequence to the end of the read ID. It then removes up to and including 3 Ts from the start of the sequence. Following this UMI and poly(T) preprocessing, reads underwent adapter and quality trimming using Trim Galore (v0.6.5; default parameters; https://github.com/FelixKrueger/TrimGalore). UMI-preprocessed and adapter-/quality-trimmed files were then aligned to the respective genome using Bowtie2 in local alignment mode (v2.4.1; option: --local; http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). Finally, the alignment result files were deduplicated using UmiBam (v0.2.0; https://github.com/FelixKrueger/Umi-Grinder), which deduplicates alignments based on mapping position, read orientation, and UMI sequence.
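The preprocessing and deduplication logic described above can be sketched in Python. This is an illustrative reimplementation, not the code of TrAELseq_preprocessing.py or UmiBam. The `<readID>:<UMI>` header format, the tuple-based `Alignment` record, and the rule of keeping the first alignment per key are all assumptions made for clarity. The 5′ position is computed from the aligned coordinates and ignores any soft-clipped bases produced by local alignment, which a production tool may handle differently.

```python
from typing import Iterable, Iterator, NamedTuple

UMI_LEN = 8
MAX_POLY_T = 3


def preprocess(header: str, seq: str, qual: str) -> tuple[str, str, str]:
    """Move the 8-nt UMI into the read ID and trim up to 3 leading Ts."""
    umi, seq, qual = seq[:UMI_LEN], seq[UMI_LEN:], qual[UMI_LEN:]
    n_t = 0
    while n_t < MAX_POLY_T and n_t < len(seq) and seq[n_t] == "T":
        n_t += 1
    read_id, *comment = header.split(" ", 1)
    # Hypothetical ID format "<readID>:<UMI>"; the real script's separator may differ
    new_header = f"{read_id}:{umi}" + (f" {comment[0]}" if comment else "")
    return new_header, seq[n_t:], qual[n_t:]


class Alignment(NamedTuple):
    chrom: str
    start: int   # 0-based leftmost aligned position
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    umi: str


def five_prime(aln: Alignment) -> int:
    """5' end of the read: leftmost base on +, rightmost base on -."""
    return aln.start if aln.strand == "+" else aln.end - 1


def deduplicate(alignments: Iterable[Alignment]) -> Iterator[Alignment]:
    """Keep the first alignment for each (chrom, 5' position, strand, UMI)."""
    seen = set()
    for aln in alignments:
        key = (aln.chrom, five_prime(aln), aln.strand, aln.umi)
        if key not in seen:
            seen.add(key)
            yield aln
```

Keying on the 5′ position rather than the leftmost coordinate means that reverse-strand reads ending at the same break collapse together even if they were trimmed to different lengths.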

For samples carrying sample-level barcodes, the read structure is NNNNNNNNBBBB(T)nSEQUENCESPECIFIC, where NNNNNNNN is the UMI, BBBB is the sample barcode (currently either AGTC or GACT), and (T)n is the poly(T). A script that handles the preprocessing of these libraries is available from the code repository (https://github.com/FelixKrueger/TrAEL-seq/blob/master/TrAELseq_preprocessing_UMIplusBarcode.py).
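For barcoded libraries, splitting each read into UMI, sample barcode, and insert extends the same idea. Reads from a pooled pair of conditions, such as the G1 and G1->S pools described above, can then be separated by barcode. The sketch below is hypothetical and is not the code of TrAELseq_preprocessing_UMIplusBarcode.py. It uses exact barcode matching only, and which in-read barcode corresponds to which adaptor should be checked against real data.

```python
from typing import Dict, List, Optional, Tuple

UMI_LEN = 8
BARCODE_LEN = 4
SAMPLE_BARCODES = ("AGTC", "GACT")  # the two barcodes listed in the text
MAX_POLY_T = 3


def split_barcoded_read(seq: str) -> Tuple[str, Optional[str], str]:
    """Split NNNNNNNNBBBB(T)n... into (UMI, barcode or None, remaining sequence)."""
    umi = seq[:UMI_LEN]
    barcode = seq[UMI_LEN:UMI_LEN + BARCODE_LEN]
    rest = seq[UMI_LEN + BARCODE_LEN:]
    n_t = 0
    while n_t < MAX_POLY_T and n_t < len(rest) and rest[n_t] == "T":
        n_t += 1
    # Exact matching only; tolerating mismatches would be a further design choice
    return umi, (barcode if barcode in SAMPLE_BARCODES else None), rest[n_t:]


def demultiplex(reads: List[str]) -> Dict[Optional[str], List[Tuple[str, str]]]:
    """Group (UMI, insert) pairs by sample barcode; unmatched reads go under None."""
    groups: Dict[Optional[str], List[Tuple[str, str]]] = {bc: [] for bc in SAMPLE_BARCODES}
    groups[None] = []
    for seq in reads:
        umi, barcode, insert = split_barcoded_read(seq)
        groups[barcode].append((umi, insert))
    return groups
```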

UMI-deduplicated mapped reads were imported into SeqMonk v1.47 (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/) and immediately truncated to 1 nucleotide at the 5′ end, representing the last nucleotide 5′ of the strand break. Reads were then summed in running windows or around features as described in the figure legends. Windows overlapping non-single-copy regions of the genome (rDNA, 2μ, mtDNA, CUP1, subtelomeric regions, Ty elements, and LTRs) were excluded, and total read counts across all included windows were normalised to be equal between samples. Scatter plots and average profile plots were generated in SeqMonk; for average profile plots, the data were exported and the plots redrawn in GraphPad Prism 8.
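The following is a minimal sketch of the 5′-end truncation, windowed counting, exclusion, and normalisation steps, written in plain Python rather than SeqMonk. It assumes 0-based, end-exclusive coordinates and non-overlapping windows (SeqMonk running windows may overlap). It also assumes that exclusion regions are supplied as (chrom, start, end) tuples and that each sample is scaled to a common total of one million. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand); 0-based, end-exclusive
Region = Tuple[str, int, int]     # (chrom, start, end); 0-based, end-exclusive


def five_prime_position(start: int, end: int, strand: str) -> int:
    """Truncate an alignment to its 5'-most nucleotide."""
    return start if strand == "+" else end - 1


def window_counts(reads: Iterable[Read], window: int,
                  excluded: Sequence[Region] = ()) -> Counter:
    """Count read 5' ends in non-overlapping windows, dropping windows that overlap excluded regions."""
    counts: Counter = Counter()
    for chrom, start, end, strand in reads:
        idx = five_prime_position(start, end, strand) // window
        w_start, w_end = idx * window, (idx + 1) * window
        if any(c == chrom and s < w_end and w_start < e for c, s, e in excluded):
            continue
        counts[(chrom, idx)] += 1
    return counts


def normalise(counts: Counter, target_total: float = 1e6) -> Dict[Tuple[str, int], float]:
    """Scale counts so that every sample sums to the same total (per million here, an assumption)."""
    total = sum(counts.values())
    return {k: v * target_total / total for k, v in counts.items()} if total else {}
```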

For read count quantification and read polarity plots, data were first imported into SeqMonk v1.47 and truncated to 1 nucleotide as described above. Reads (total or separate forward and reverse read counts) were quantitated in running windows as specified in the relevant figure legends before export for plotting using R v4.0.0 in RStudio using the tidyverse package [89,90]. For displaying read counts, values were plotted at the centre of the quantification window and displayed as a continuous line. For read polarity plots, read polarity values were calculated and plotted as either dots (individual samples) or as a continuous line (multiple sample display) for each quantification window using the formula read polarity = (R − F)/(R + F), where F and R relate to the total forward and reverse read counts respectively. The R code to generate these plots can also be found here: https://github.com/FelixKrueger/TrAEL-seq.
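The read polarity formula, and the placement of each value at its window centre, can be expressed directly in code. In this sketch, which uses hypothetical function names and made-up counts, polarity runs from −1 (forward reads only) to +1 (reverse reads only). Windows without reads are left undefined rather than set to zero.

```python
from typing import List, Optional, Tuple


def read_polarity(forward: int, reverse: int) -> Optional[float]:
    """Read polarity = (R - F) / (R + F); None for a window with no reads."""
    total = forward + reverse
    if total == 0:
        return None
    return (reverse - forward) / total


def polarity_track(windows: List[Tuple[int, int, int, int]]) -> List[Tuple[float, Optional[float]]]:
    """Map (start, end, F, R) windows to (window centre, polarity) points for plotting."""
    return [((start + end) / 2, read_polarity(f, r)) for start, end, f, r in windows]


# Hypothetical counts for three 1 kb windows and one empty window
track = polarity_track([(0, 1000, 80, 20), (1000, 2000, 50, 50),
                        (2000, 3000, 10, 90), (3000, 4000, 0, 0)])
print(track)  # [(500.0, -0.6), (1500.0, 0.0), (2500.0, 0.8), (3500.0, None)]
```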

A note on read polarity: As a consequence of experimental design, the Illumina sequencing read is the reverse complement of the 3′ extended DNA to which TrAEL adaptor 1 was ligated, and so the first nucleotide of the read is the reverse complement of the last nucleotide 5′ of the break site. To minimise potentially confusing strand inversions, we did not invert the reads during the analysis. In contrast, Sriramachandran and colleagues reversed the polarity of all reads in the analysis pipeline for GLOE-seq [40], which explains the differences in polarity between equivalent analyses in that study and this study. The relationships between the libraries and read mapping statistics are summarised in S2 Table.
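The strand bookkeeping in this note can be made explicit. The sketch below uses hypothetical helpers that are not part of the published pipeline. It assumes `read_first_base` is the first base remaining after UMI and poly(T) removal, and that the read's 5′ mapped coordinate is already known. It also shows that swapping forward and reverse counts simply negates (R − F)/(R + F). This is why polarities from a pipeline that inverts reads, as described for GLOE-seq, appear with the opposite sign.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def captured_end(read_first_base: str, five_prime_pos: int, read_strand: str) -> Tuple[str, int, str]:
    """Infer (3'-terminal nucleotide, position, strand) of the captured DNA strand.

    The read is the reverse complement of the captured strand, so the captured strand
    lies on the opposite genomic strand, and its 3'-terminal nucleotide is the complement
    of the read's first base, at the same genomic position.
    """
    captured_strand = "-" if read_strand == "+" else "+"
    return read_first_base.translate(_COMPLEMENT), five_prime_pos, captured_strand


def flip_polarity(polarity: Optional[float]) -> Optional[float]:
    """Swapping F and R in (R - F)/(R + F) negates the value."""
    return None if polarity is None else -polarity
```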

Supporting information

S1 Fig. TrAEL-seq library construction details.

(A) Example Bioanalyzer trace for the amplified library of NotI PmeI SfiI-digested yeast genomic DNA. A volume of 1 μl of the 10.5 μl final library was run on a DNA high sensitivity Bioanalyzer chip. The trace shows a complete absence of adaptor or primer dimers, which is only achieved after 2 successive AMPure purifications. This trace is typical for TrAEL-seq libraries. (B) Schematic of the TrAEL-seq read processing pathway. TrAEL-seq reads are the reverse complement of the original DNA end. The 8 nucleotide UMI is removed and stored, then up to 3 Ts are removed from the 5′ end of the read. Poor-quality reads and adaptor sequences are removed by Trim Galore, and the reads are mapped using Bowtie 2. Deduplication is performed by UMI grinder based on the UMI and the mapped start site. Finally, the reads are truncated to a single nucleotide representing the reverse complement of the terminal nucleotide of the original DNA strand. (C) Quantitation of DNA ends generated by SfiI digestion, categorised by the 3′ nucleotide or by the nucleotide adjacent to the 3′ nucleotide in TrAEL-seq data. Bars show mean and 1 SD. (D) Precision mapping of SfiI cleavage sites by TrAEL-seq and END-seq, as in Fig 1D. This graph represents the 10 SfiI sites that have 2 or more As at the 3′ end (GGCCNNAA|NGGCC): 5 ends with 2 As, 2 ends with 3 As, and 3 ends with 4 As. Mapped locations of 3′ ends were averaged across each category of site and expressed as a percentage of all 3′ ends mapped by each method to that category of site. (E) Scatter plot of log-transformed normalised read counts at all 3,907 Spo11 cleavage hotspots annotated by Mohibullah and Keeney [1], comparing 2 technical replicate TrAEL-seq libraries generated from the same sample of dmc1Δ cells. The 2 libraries were prepared approximately 6 months apart by 2 different researchers from cells stored in 70% ethanol at −70°C.
(F) Scatter plot of log-transformed normalised read counts at all 3,907 Spo11 cleavage hotspots annotated by Mohibullah and Keeney, comparing dmc1Δ TrAEL-seq with data for Spo11-associated oligonucleotides [14] (SRA accession: SRR1976210). Numerical data underlying this figure can be found in S1 Data. TrAEL-seq, Transferase-Activated End Ligation sequencing; UMI, unique molecular identifier.

(TIF)

S2 Fig. Additional data for detection of replication fork stalling by TrAEL-seq.

(A) Reproducibility of RFB detection between 2 technical replicates. The 2 libraries were prepared approximately 6 months apart by 2 different researchers from cells stored in 70% ethanol at −70°C. (B) Detection of RFB peaks, without nonreproducible background peaks, in 3 biological replicate TrAEL-seq libraries derived from wild-type cells. (C) Replication direction of centromeres, calculated from the cdc9-AID GLOE-seq data (SRA accession: SRX6436838). The percentage of reverse reads was determined in the regions −1000 to −500 bp and +500 to +1000 bp relative to the annotated centromere, and the average of these values was plotted. The region from −500 to +500 bp was excluded because replication fork stalling in this region obscures the replication direction. CEN2 is misleading because it is directly adjacent to a replication origin; see S1 File for profiles of individual centromeres. (D) Average TrAEL-seq profiles across tRNAs ±200 bp for 2 biological replicates of hESCs, each averaged from 2 technical replicates. Reads are separated by orientation on forward or reverse strands; all tRNAs are included. Read counts per million reads mapped were calculated in nonoverlapping 5 bp bins. (E) Average TrAEL-seq profiles across all centromeres ±1 kb for wild-type and rad52Δ cells. Read counts per million reads mapped were calculated in nonoverlapping 10 bp bins. (F) Average TrAEL-seq profiles across all tRNAs ±200 bp for wild-type and rad52Δ cells. Read counts per million reads mapped were calculated in nonoverlapping 5 bp bins. Numerical data underlying this figure can be found in S2 Data. hESC, human embryonic stem cell; RFB, replication fork barrier; TrAEL-seq, Transferase-Activated End Ligation sequencing.

(TIF)

S3 Fig. Additional data for replication fork directionality of TrAEL-seq data.

(A) Scatter plot showing the percentage of reverse reads relative to all reads in 1 kb genomic windows spaced every 1 kb, comparing TrAEL-seq data from wild-type cells with GLOE-seq data from Cdc9-depleted cells (SRA accession: SRX6436838). (B) Read polarity plots showing TrAEL-seq data for wild type, fob1Δ, and rad52Δ across a single rDNA repeat. The 35S rRNA gene transcribed by RNA polymerase I is shown as a thicker grey line and is transcribed right to left in this representation. Mature rRNA genes are shown in black; the RFB and the ARS are also annotated. The inset is the region containing the RFB sites shown in Fig 2B. (C) Scatter plot showing the percentage of reverse reads relative to all reads in 1 kb genomic windows spaced every 1 kb, comparing TrAEL-seq data from 2 technical replicates of wild-type cells. (D) Read polarity plot across chromosome V for TrAEL-seq datasets of wild type compared with the RNase H2 mutants rnh201Δ and rnh202Δ and the topoisomerase I mutant top1Δ. (E) Read polarity plot for chromosome V comparing END-seq and TrAEL-seq data generated from two halves of an agarose plug containing 10 million wild-type 3xCUP1 cells grown in synthetic complete glucose media. Note that the scale for the END-seq data is expanded because the bias in read polarity is much smaller in END-seq libraries. (F) Scatter plot showing the percentage of reverse reads relative to all reads in 250 kb genomic windows spaced every 10 kb, comparing TrAEL-seq data for 2 technical replicates generated from the same hESC sample. (G) Scatter plot showing the percentage of reverse reads relative to all reads in 250 kb genomic windows spaced every 10 kb, comparing TrAEL-seq data for 2 biological replicates of hESCs, each averaged from 2 technical replicates.
(H) Scatter plot showing the percentage of reverse reads relative to all reads in 250 kb genomic windows spaced every 10 kb, comparing TrAEL-seq data from hESCs (average of 2 technical replicates) with GLOE-seq data from LIG1-depleted HCT116 cells (average of SRA accessions: SRX7704535 and SRX7704534). Numerical data underlying this figure can be found in S3–S6 Data. ARS, autonomously replicating sequence; hESC, human embryonic stem cell; RFB, replication fork barrier; TrAEL-seq, Transferase-Activated End Ligation sequencing.

(TIF)

S4 Fig. Additional data for detection of environment-dependent replication differences.

(A) Scatter plot showing the percentage of reverse reads relative to all reads in 1 kb genomic windows spaced every 1 kb, comparing TrAEL-seq data from wild type and clb5Δ (left). An equivalent comparison between wild type and rnh201Δ (which has a wild-type replication profile) is shown for reference (right). (B) Plot of read count across the GAL locus on galactose induction for the dnl4Δ rad51Δ mutant, as in Fig 4B. (C) MA plots of read count changes across the genome on galactose induction for the dnl4Δ rad51Δ mutant, as in Fig 4C. (D) Read polarity plots showing the replication profile of the region surrounding the GAL locus with and without galactose induction. The green box marks the site at which the replication fork that passes through the GAL locus encounters the oncoming fork from ARS211. (E) Plot of average TrAEL-seq read density around the TSS in the highest 25% expressed genes oriented head-on with replication (as in Fig 4D). Data are shown for G1 and G1->S samples (Fig 3E); genes are averaged together within each sample, but the difference in average read count between samples is maintained. The nonreplicating G1 sample contains far fewer reads on average across TSS regions, and the peak upstream of the TSS is absent. Numerical data underlying this figure can be found in S7 Data. TrAEL-seq, Transferase-Activated End Ligation sequencing; TSS, transcriptional start site.

(TIF)

S5 Fig. Means by which reversed forks could resemble DSBs in Southern analysis.

All Southern blot analyses that have reported direct detection of DSBs at RFBs utilise a restriction digestion to separate the region of interest. For the yeast RFB, to our knowledge, the enzyme used has always been BglII, the cleavage sites for which lie 2.2 kb and 2.4 kb each side of the RFB. Forks that reverse past the BglII site would yield a BglII fragment the same size (2.2 kb) as a fork that is cleaved at the RFB. Only fragments that would hybridise to the probe (blue) are shown. DSB, double-strand break; RFB, replication fork barrier.

(TIF)

S1 Table. Yeast strains used in this study.

(XLSX)

S2 Table. List of all libraries produced during this work, including GEO accession and mapping statistics.

(XLSX)

S1 File. TrAEL-seq profiles at individual centromeres.

(PDF)

S2 File. Detailed TrAEL-seq protocol.

(DOC)

S1 Data. Underlying numerical data.

(XLSB)

S2 Data. Underlying numerical data.

(XLSB)

S3 Data. Underlying numerical data.

(XLSB)

S4 Data. Underlying numerical data.

(XLSB)

S5 Data. Underlying numerical data.

(XLSB)

S6 Data. Underlying numerical data.

(XLSB)

S7 Data. Underlying numerical data.

(XLSB)

S1 Raw Images. Raw gel image.

Note that not all lanes are presented in the manuscript.

(PDF)

Acknowledgments

We thank Paula Koko Gonzales and Nicole Forrester of the Babraham Institute Next Generation Sequencing facility for data generation, Scott Keeney for sharing unpublished data, Adele Marston and Aziz El Hage for yeast strains, Stephen Bevan for growing cells, and New England Biolabs technical support for helpful answers to a wide range of enzymology questions during the development of this method.

Abbreviations

ARS

autonomously replicating sequence

DSB

double-strand break

hESC

human embryonic stem cell

RFB

replication fork barrier

rRNA

ribosomal RNA

TdT

terminal deoxynucleotidyl transferase

TrAEL-seq

Transferase-Activated End Ligation sequencing

TSS

transcriptional start site

UMI

unique molecular identifier

Data Availability

All sequencing files are available from the GEO database (accession number GSE154811). Numerical data are in S1–S7 Data, and the raw gel image is in S1 Raw Images.

Funding Statement

JH was funded by the Wellcome Trust [110216], JH, PRG and FK by the BBSRC [BI Epigenetics ISP: BBS/E/B/000C0423], NK was funded by the MRC [iCASE studentship] and Artios Pharma. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Lam I, Keeney S. Mechanism and regulation of meiotic recombination initiation. Cold Spring Harb Perspect Biol. 2014;7(1):a016634. Epub 2014/10/18. doi: 10.1101/cshperspect.a016634
2. Chi X, Li Y, Qiu X. V(D)J recombination, somatic hypermutation and class switch recombination of immunoglobulins: mechanism and regulation. Immunology. 2020;160(3):233–47. Epub 2020/02/08. doi: 10.1111/imm.13176
3. Cannan WJ, Pederson DS. Mechanisms and Consequences of Double-Strand DNA Break Formation in Chromatin. J Cell Physiol. 2016;231(1):3–14. Epub 2015/06/05. doi: 10.1002/jcp.25048
4. Scully R, Panday A, Elango R, Willis NA. DNA double-strand break repair-pathway choice in somatic mammalian cells. Nat Rev Mol Cell Biol. 2019;20(11):698–714. Epub 2019/07/03. doi: 10.1038/s41580-019-0152-0
5. Chang HHY, Pannunzio NR, Adachi N, Lieber MR. Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol. 2017;18(8):495–506. Epub 2017/05/18. doi: 10.1038/nrm.2017.48
6. San Filippo J, Sung P, Klein H. Mechanism of eukaryotic homologous recombination. Annu Rev Biochem. 2008;77:229–57. Epub 2008/02/16. doi: 10.1146/annurev.biochem.77.061306.125255
7. Buhler C, Borde V, Lichten M. Mapping meiotic single-strand DNA reveals a new landscape of DNA double-strand breaks in Saccharomyces cerevisiae. PLoS Biol. 2007;5(12):e324. Epub 2007/12/14. doi: 10.1371/journal.pbio.0050324
8. Gerton JL, DeRisi J, Shroff R, Lichten M, Brown PO, Petes TD. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. PNAS. 2000;97(21):11383–90. Epub 2000/10/12. doi: 10.1073/pnas.97.21.11383
9. Borde V, Lin W, Novikov E, Petrini JH, Lichten M, Nicolas A. Association of Mre11p with double-strand break sites during yeast meiosis. Mol Cell. 2004;13(3):389–401. Epub 2004/02/18. doi: 10.1016/s1097-2765(04)00034-6
10. Crosetto N, Mitra A, Silva MJ, Bienko M, Dojer N, Wang Q, et al. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat Methods. 2013;10(4):361–5. Epub 2013/03/19. doi: 10.1038/nmeth.2408
11. Canela A, Sridharan S, Sciascia N, Tubbs A, Meltzer P, Sleckman BP, et al. DNA Breaks and End Resection Measured Genome-wide by End Sequencing. Mol Cell. 2016;63(5):898–911. Epub 2016/08/02. doi: 10.1016/j.molcel.2016.06.034
12. Lensing SV, Marsico G, Hansel-Hertsch R, Lam EY, Tannahill D, Balasubramanian S. DSBCapture: in situ capture and sequencing of DNA breaks. Nat Methods. 2016;13(10):855–7. Epub 2016/08/16. doi: 10.1038/nmeth.3960
13. Zhu Y, Biernacka A, Pardo B, Dojer N, Forey R, Skrzypczak M, et al. qDSB-Seq is a general method for genome-wide quantification of DNA double-strand breaks using sequencing. Nat Commun. 2019;10(1):2313. Epub 2019/05/28. doi: 10.1038/s41467-019-10332-8
14. Yan WX, Mirzazadeh R, Garnerone S, Scott D, Schneider MW, Kallas T, et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat Commun. 2017;8:15058. Epub 2017/05/13. doi: 10.1038/ncomms15058
15. Biernacka A, Zhu Y, Skrzypczak M, Forey R, Pardo B, Grzelak M, et al. i-BLESS is an ultra-sensitive method for detection of DNA double-strand breaks. Commun Biol. 2018;1:181. Epub 2018/11/06. doi: 10.1038/s42003-018-0165-9
16. Mimitou EP, Yamada S, Keeney S. A global view of meiotic double-strand break end resection. Science. 2017;355(6320):40–5. Epub 2017/01/07. doi: 10.1126/science.aak9704
17. Chakraborty A, Jenjaroenpun P, Li J, El Hilali S, McCulley A, Haarer B, et al. Replication Stress Induces Global Chromosome Breakage in the Fragile X Genome. Cell Rep. 2020;32(12):108179. Epub 2020/09/24. doi: 10.1016/j.celrep.2020.108179
18. Hoffman EA, McCulley A, Haarer B, Arnak R, Feng W. Break-seq reveals hydroxyurea-induced chromosome fragility as a result of unscheduled conflict between DNA replication and transcription. Genome Res. 2015;25(3):402–12. Epub 2015/01/23. doi: 10.1101/gr.180497.114
19. Gittens WH, Johnson DJ, Allison RM, Cooper TJ, Thomas H, Neale MJ. A nucleotide resolution map of Top2-linked DNA breaks in the yeast and human genome. Nat Commun. 2019;10(1):4846. Epub 2019/10/28. doi: 10.1038/s41467-019-12802-5
20. Gomez-Gonzalez B, Aguilera A. Transcription-mediated replication hindrance: a major driver of genome instability. Genes Dev. 2019;33(15–16):1008–26. Epub 2019/05/28. doi: 10.1101/gad.324517.119
21. Crossley MP, Bocek M, Cimprich KA. R-Loops as Cellular Regulators and Genomic Threats. Mol Cell. 2019;73(3):398–411. Epub 2019/02/09. doi: 10.1016/j.molcel.2019.01.024
22. Cortez D. Replication-Coupled DNA Repair. Mol Cell. 2019;74(5):866–76. Epub 2019/06/08. doi: 10.1016/j.molcel.2019.04.027
23. Powers KT, Washington MT. Eukaryotic translesion synthesis: Choosing the right tool for the job. DNA Repair (Amst). 2018;71:127–34. Epub 2018/09/04. doi: 10.1016/j.dnarep.2018.08.016
24. Zhao PA, Sasaki T, Gilbert DM. High-resolution Repli-Seq defines the temporal choreography of initiation, elongation and termination of replication in mammalian cells. Genome Biol. 2020;21(1):76. Epub 2020/03/27. doi: 10.1186/s13059-020-01983-8
25. Muller CA, Hawkins M, Retkute R, Malla S, Wilson R, Blythe MJ, et al. The dynamics of genome replication using deep sequencing. Nucleic Acids Res. 2014;42(1):e3. Epub 2013/10/04. doi: 10.1093/nar/gkt878
26. Marchal C, Sasaki T, Vera D, Wilson K, Sima J, Rivera-Mulia JC, et al. Genome-wide analysis of replication timing by next-generation sequencing with E/L Repli-seq. Nat Protoc. 2018;13(5):819–39. Epub 2018/03/31. doi: 10.1038/nprot.2017.148
27. Batrakou DG, Muller CA, Wilson RHC, Nieduszynski CA. DNA copy-number measurement of genome replication dynamics by high-throughput sequencing: the sort-seq, sync-seq and MFA-seq family. Nat Protoc. 2020;15(3):1255–84. Epub 2020/02/14. doi: 10.1038/s41596-019-0287-7
28. Takahashi S, Miura H, Shibata T, Nagao K, Okumura K, Ogata M, et al. Genome-wide stability of the DNA replication program in single mammalian cells. Nat Genet. 2019;51(3):529–40. Epub 2019/02/26. doi: 10.1038/s41588-019-0347-5
29. Dileep V, Gilbert DM. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. Nat Commun. 2018;9(1):427. Epub 2018/02/01. doi: 10.1038/s41467-017-02800-w
30. Mesner LD, Valsakumar V, Karnani N, Dutta A, Hamlin JL, Bekiranov S. Bubble-chip analysis of human origin distributions demonstrates on a genomic scale significant clustering into zones and significant association with transcription. Genome Res. 2011;21(3):377–89. Epub 2010/12/22. doi: 10.1101/gr.111328.110
31. Langley AR, Graf S, Smith JC, Krude T. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq). Nucleic Acids Res. 2016;44(21):10230–47. Epub 2016/09/03. doi: 10.1093/nar/gkw760
32. Cayrou C, Coulombe P, Vigneron A, Stanojcic S, Ganier O, Peiffer I, et al. Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome Res. 2011;21(9):1438–49. Epub 2011/07/14. doi: 10.1101/gr.121830.111
33. Cadoret JC, Meisch F, Hassan-Zadeh V, Luyten I, Guillet C, Duret L, et al. Genome-wide studies highlight indirect links between human replication origins and gene regulation. PNAS. 2008;105(41):15837–42. Epub 2008/10/08. doi: 10.1073/pnas.0805208105
34. Besnard E, Babled A, Lapasset L, Milhavet O, Parrinello H, Dantec C, et al. Unraveling cell type-specific and reprogrammable human replication origin signatures associated with G-quadruplex consensus motifs. Nat Struct Mol Biol. 2012;19(8):837–44. Epub 2012/07/04. doi: 10.1038/nsmb.2339
35. Smith DJ, Whitehouse I. Intrinsic coupling of lagging-strand synthesis to chromatin assembly. Nature. 2012;483(7390):434–8. Epub 2012/03/16. doi: 10.1038/nature10895
36. Petryk N, Kahli M, d’Aubenton-Carafa Y, Jaszczyszyn Y, Shen Y, Silvain M, et al. Replication landscape of the human genome. Nat Commun. 2016;7:10208. Epub 2016/01/12. doi: 10.1038/ncomms10208
37. Keszthelyi A, Daigaku Y, Ptasinska K, Miyabe I, Carr AM. Mapping ribonucleotides in genomic DNA and exploring replication dynamics by polymerase usage sequencing (Pu-seq). Nat Protoc. 2015;10(11):1786–801. Epub 2015/10/23. doi: 10.1038/nprot.2015.116
38. Cao H, Salazar-Garcia L, Gao F, Wahlestedt T, Wu CL, Han X, et al. Novel approach reveals genomic landscapes of single-strand DNA breaks with nucleotide resolution in human cells. Nat Commun. 2019;10(1):5799. Epub 2019/12/22. doi: 10.1038/s41467-019-13602-7
39. Cao B, Wu X, Zhou J, Wu H, Liu L, Zhang Q, et al. Nick-seq for single-nucleotide resolution genomic maps of DNA modifications and damage. Nucleic Acids Res. 2020;48(12):6715–25. Epub 2020/06/03. doi: 10.1093/nar/gkaa473
40. Sriramachandran AM, Petrosino G, Mendez-Lago M, Schafer AJ, Batista-Nascimento LS, Zilio N, et al. Genome-wide Nucleotide-Resolution Mapping of DNA Replication Patterns, Single-Strand Breaks, and Lesions by GLOE-Seq. Mol Cell. 2020;78(5):975–85 e7. Epub 2020/04/23. doi: 10.1016/j.molcel.2020.03.027
  • 41.Schmidt WM, Mueller MW. Controlled ribonucleotide tailing of cDNA ends (CRTC) by terminal deoxynucleotidyl transferase: a new approach in PCR-mediated analysis of mRNA sequences. Nucleic Acids Res. 1996;24(9):1789–91. Epub 1996/05/01. 10.1093/nar/24.9.1789 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Miura F, Shibata Y, Miura M, Sangatsuda Y, Hisano O, Araki H, et al. Highly efficient single-stranded DNA ligation technique improves low-input whole-genome bisulfite sequencing by post-bisulfite adaptor tagging. Nucleic Acids Res. 2019;47(15):e85. Epub 2019/05/23. 10.1093/nar/gkz435 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Neale MJ, Keeney S. Clarifying the mechanics of DNA strand exchange in meiotic recombination. Nature. 2006;442(7099):153–8. Epub 2006/07/14. 10.1038/nature04885 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Mimitou EP, Symington LS. DNA end resection—unraveling the tail. DNA Repair (Amst). 2011;10(3):344–8. Epub 2011/01/14. 10.1016/j.dnarep.2010.12.004 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Claeys Bouuaert C, Tischfield SE, Pu S, Mimitou EP, Arias-Palomo E, Berger JM, et al. Structural and functional characterization of the Spo11 core complex. Nat Struct Mol Biol. 2021;28(1):92–102. Epub 2021/01/06. 10.1038/s41594-020-00534-w . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Pan J, Sasaki M, Kniewel R, Murakami H, Blitzblau HG, Tischfield SE, et al. A hierarchical combination of factors shapes the genome-wide topography of yeast meiotic recombination initiation. Cell. 2011;144(5):719–31. Epub 2011/03/08. 10.1016/j.cell.2011.02.009 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Mohibullah N, Keeney S. Numerical and spatial patterning of yeast meiotic DNA breaks by Tel1. Genome Res. 2017;27(2):278–88. Epub 2016/12/08. 10.1101/gr.213587.116 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kobayashi T, Horiuchi T. A yeast gene product, Fob1 protein, required for both replication fork blocking and recombinational hotspot activities. Genes Cells. 1996;1(5):465–74. 10.1046/j.1365-2443.1996.d01-256.x . [DOI] [PubMed] [Google Scholar]
  • 49.Ward TR, Hoang ML, Prusty R, Lau CK, Keil RL, Fangman WL, et al. Ribosomal DNA replication fork barrier and HOT1 recombination hot spot: shared sequences but independent activities. Mol Cell Biol. 2000;20(13):4948–57. Epub 2000/06/10. 10.1128/mcb.20.13.4948-4957.2000 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Kobayashi T, Nomura M, Horiuchi T. Identification of DNA cis elements essential for expansion of ribosomal DNA repeats in Saccharomyces cerevisiae. Mol Cell Biol. 2001;21(1):136–47. 10.1128/MCB.21.1.136-147.2001 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Voelkel-Meiman K, Keil RL, Roeder GS. Recombination-stimulating sequences in yeast ribosomal DNA correspond to sequences regulating transcription by RNA polymerase I. Cell. 1987;48(6):1071–9. 10.1016/0092-8674(87)90714-8 . [DOI] [PubMed] [Google Scholar]
  • 52.Huang GS, Keil RL. Requirements for activity of the yeast mitotic recombination hotspot HOT1: RNA polymerase I and multiple cis-acting sequences. Genetics. 1995;141(3):845–55. Epub 1995/11/01. . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Burkhalter MD, Sogo JM. rDNA enhancer affects replication initiation and mitotic recombination: Fob1 mediates nucleolytic processing independently of replication. Mol Cell. 2004;15(3):409–21. Epub 2004/08/12. 10.1016/j.molcel.2004.06.024 . [DOI] [PubMed] [Google Scholar]
  • 54.Weitao T, Budd M, Hoopes LL, Campbell JL. Dna2 helicase/nuclease causes replicative fork stalling and double-strand breaks in the ribosomal DNA of Saccharomyces cerevisiae. J Biol Chem. 2003;278(25):22513–22. Epub 2003/04/11. 10.1074/jbc.M301610200 . [DOI] [PubMed] [Google Scholar]
  • 55.Gruber M, Wellinger RE, Sogo JM. Architecture of the replication fork stalled at the 3′ end of yeast ribosomal genes. Mol Cell Biol. 2000;20(15):5777–87. Epub 2000/07/13. 10.1128/mcb.20.15.5777-5787.2000 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Akamatsu Y, Kobayashi T. The Human RNA Polymerase I Transcription Terminator Complex Acts as a Replication Fork Barrier That Coordinates the Progress of Replication with rRNA Transcription Activity. Mol Cell Biol. 2015;35(10):1871–81. Epub 2015/03/18. 10.1128/MCB.01521-14 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Lopez-Estrano C, Schvartzman JB, Krimer DB, Hernandez P. Co-localization of polar replication fork barriers and rRNA transcription terminators in mouse rDNA. J Mol Biol. 1998;277(2):249–56. Epub 1998/06/06. doi: 10.1006/jmbi.1997.1607
  • 58. Gerber JK, Gogel E, Berger C, Wallisch M, Muller F, Grummt I, et al. Termination of mammalian rDNA replication: polar arrest of replication fork movement by transcription termination factor TTF-I. Cell. 1997;90(3):559–67. Epub 1997/08/08. doi: 10.1016/s0092-8674(00)80515-2
  • 59. Sanchez-Gorostiaga A, Lopez-Estrano C, Krimer DB, Schvartzman JB, Hernandez P. Transcription termination factor reb1p causes two replication fork barriers at its cognate sites in fission yeast ribosomal DNA in vivo. Mol Cell Biol. 2004;24(1):398–406. Epub 2003/12/16. doi: 10.1128/mcb.24.1.398-406.2004
  • 60. Lopez-Estrano C, Schvartzman JB, Krimer DB, Hernandez P. Characterization of the pea rDNA replication fork barrier: putative cis-acting and trans-acting factors. Plant Mol Biol. 1999;40(1):99–110. Epub 1999/07/08. doi: 10.1023/a:1026405311132
  • 61. Little RD, Platt TH, Schildkraut CL. Initiation and termination of DNA replication in human rRNA genes. Mol Cell Biol. 1993;13(10):6600–13. Epub 1993/10/01. doi: 10.1128/mcb.13.10.6600
  • 62. Lebofsky R, Bensimon A. DNA replication origin plasticity and perturbed fork progression in human inverted repeats. Mol Cell Biol. 2005;25(15):6789–97. Epub 2005/07/19. doi: 10.1128/MCB.25.15.6789-6797.2005
  • 63. Greenfeder SA, Newlon CS. Replication forks pause at yeast centromeres. Mol Cell Biol. 1992;12(9):4056–66. Epub 1992/09/01. doi: 10.1128/mcb.12.9.4056
  • 64. Deshpande AM, Newlon CS. DNA replication fork pause sites dependent on transcription. Science. 1996;272(5264):1030–3. Epub 1996/05/17. doi: 10.1126/science.272.5264.1030
  • 65. Ivessa AS, Lenzmeier BA, Bessler JB, Goudsouzian LK, Schnakenberg SL, Zakian VA. The Saccharomyces cerevisiae helicase Rrm3p facilitates replication past nonhistone protein-DNA complexes. Mol Cell. 2003;12(6):1525–36. Epub 2003/12/24. doi: 10.1016/s1097-2765(03)00456-8
  • 66. Osmundson JS, Kumar J, Yeung R, Smith DJ. Pif1-family helicases cooperatively suppress widespread replication-fork arrest at tRNA genes. Nat Struct Mol Biol. 2017;24(2):162–70. Epub 2016/12/20. doi: 10.1038/nsmb.3342
  • 67. Azvolinsky A, Dunaway S, Torres JZ, Bessler JB, Zakian VA. The S. cerevisiae Rrm3p DNA helicase moves with the replication fork and affects replication of all yeast chromosomes. Genes Dev. 2006;20(22):3104–16. Epub 2006/11/23. doi: 10.1101/gad.1478906
  • 68. Sasaki M, Kobayashi T. Ctf4 Prevents Genome Rearrangements by Suppressing DNA Double-Strand Break Formation and Its End Resection at Arrested Replication Forks. Mol Cell. 2017;66(4):533–45 e5. Epub 2017/05/20. doi: 10.1016/j.molcel.2017.04.020
  • 69. Jack CV, Cruz C, Hull RM, Keller MA, Ralser M, Houseley J. Regulation of ribosomal DNA amplification by the TOR pathway. PNAS. 2015;112(31):9674–9. doi: 10.1073/pnas.1505015112
  • 70. Houseley J, Tollervey D. Repeat expansion in the budding yeast ribosomal DNA can occur independently of the canonical homologous recombination machinery. Nucleic Acids Res. 2011;39(20):8778–91. Epub 2011/07/20. doi: 10.1093/nar/gkr589
  • 71. Nick McElhinny SA, Kumar D, Clark AB, Watt DL, Watts BE, Lundstrom EB, et al. Genome instability due to ribonucleotide incorporation into DNA. Nat Chem Biol. 2010;6(10):774–81. Epub 2010/08/24. doi: 10.1038/nchembio.424
  • 72. Kellner V, Luke B. Molecular and physiological consequences of faulty eukaryotic ribonucleotide excision repair. EMBO J. 2020;39(3):e102309. Epub 2019/12/14. doi: 10.15252/embj.2019102309
  • 73. Strumberg D, Pilon AA, Smith M, Hickey R, Malkas L, Pommier Y. Conversion of topoisomerase I cleavage complexes on the leading strand of ribosomal DNA into 5′-phosphorylated DNA double-strand breaks by replication runoff. Mol Cell Biol. 2000;20(11):3977–87. Epub 2000/05/11. doi: 10.1128/mcb.20.11.3977-3987.2000
  • 74. Donaldson AD, Raghuraman MK, Friedman KL, Cross FR, Brewer BJ, Fangman WL. CLB5-dependent activation of late replication origins in S. cerevisiae. Mol Cell. 1998;2(2):173–82. Epub 1998/09/12. doi: 10.1016/s1097-2765(00)80127-6
  • 75. Hull RM, Cruz C, Jack CV, Houseley J. Environmental change drives accelerated adaptation through stimulated copy number variation. PLoS Biol. 2017;15(6):e2001333. doi: 10.1371/journal.pbio.2001333
  • 76. Thomas BJ, Rothstein R. Elevated recombination rates in transcriptionally active DNA. Cell. 1989;56(4):619–30. doi: 10.1016/0092-8674(89)90584-9
  • 77. Hull RM, King M, Pizza G, Krueger F, Vergara X, Houseley J. Transcription-induced formation of extrachromosomal DNA during yeast ageing. PLoS Biol. 2019;17(12):e3000471. Epub 2019/12/04. doi: 10.1371/journal.pbio.3000471
  • 78. Szilard RK, Jacques PE, Laramee L, Cheng B, Galicia S, Bataille AR, et al. Systematic identification of fragile sites via genome-wide location analysis of gamma-H2AX. Nat Struct Mol Biol. 2010;17(3):299–305. Epub 2010/02/09. doi: 10.1038/nsmb.1754
  • 79. Azvolinsky A, Giresi PG, Lieb JD, Zakian VA. Highly transcribed RNA polymerase II genes are impediments to replication fork progression in Saccharomyces cerevisiae. Mol Cell. 2009;34(6):722–34. Epub 2009/06/30. doi: 10.1016/j.molcel.2009.05.022
  • 80. Forey R, Poveda A, Sharma S, Barthe A, Padioleau I, Renard C, et al. Mec1 Is Activated at the Onset of Normal S Phase by Low-dNTP Pools Impeding DNA Replication. Mol Cell. 2020;78(3):396–410 e4. Epub 2020/03/15. doi: 10.1016/j.molcel.2020.02.021
  • 81. Churchman LS, Weissman JS. Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature. 2011;469(7330):368–73. Epub 2011/01/21. doi: 10.1038/nature09652
  • 82. Rossi RL, Zinzalla V, Mastriani A, Vanoni M, Alberghina L. Subcellular localization of the cyclin dependent kinase inhibitor Sic1 is modulated by the carbon source in budding yeast. Cell Cycle. 2005;4(12):1798–807. Epub 2005/11/19. doi: 10.4161/cc.4.12.2189
  • 83. Chilkova O, Stenlund P, Isoz I, Stith CM, Grabowski P, Lundstrom EB, et al. The eukaryotic leading and lagging strand DNA polymerases are loaded onto primer-ends via separate mechanisms but have comparable processivity in the presence of PCNA. Nucleic Acids Res. 2007;35(19):6588–97. Epub 2007/10/02. doi: 10.1093/nar/gkm741
  • 84. Dehe PM, Gaillard PHL. Control of structure-specific endonucleases to maintain genome stability. Nat Rev Mol Cell Biol. 2017;18(5):315–30. Epub 2017/03/23. doi: 10.1038/nrm.2016.177
  • 85. Tubbs A, Sridharan S, van Wietmarschen N, Maman Y, Callen E, Stanlie A, et al. Dual Roles of Poly(dA:dT) Tracts in Replication Initiation and Fork Collapse. Cell. 2018;174(5):1127–42 e19. Epub 2018/08/07. doi: 10.1016/j.cell.2018.07.011
  • 86. Ait Saada A, Lambert SAE, Carr AM. Preserving replication fork integrity and competence via the homologous recombination pathway. DNA Repair (Amst). 2018;71:135–47. Epub 2018/09/18. doi: 10.1016/j.dnarep.2018.08.017
  • 87. Muller CA, Boemo MA, Spingardi P, Kessler BM, Kriaucionis S, Simpson JT, et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat Methods. 2019;16(5):429–36. Epub 2019/04/24. doi: 10.1038/s41592-019-0394-y
  • 88. Dona F, Houseley J. Unexpected DNA loss mediated by the DNA binding activity of ribonuclease A. PLoS ONE. 2014;9(12):e115008. doi: 10.1371/journal.pone.0115008
  • 89. Wickham H, Averick M, Bryan J, Chang W, McGowan LDA, François R, et al. Welcome to the Tidyverse. J Open Source Softw. 2019;4(43):1686. doi: 10.21105/joss.01686
  • 90. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. Available from: http://www.R-project.org/

Decision Letter 0

Roland G Roberts

4 Aug 2020

Dear Jon,

Thank you for submitting your manuscript entitled "Genome-wide analysis of DNA replication and DNA double strand breaks by TrAEL-seq" for consideration as a Methods and Resources by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Aug 06 2020 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli

Roland G Roberts, PhD,

Senior Editor

PLOS Biology

Decision Letter 1

Roland G Roberts

21 Sep 2020

Dear Jon,

Thank you very much for submitting your manuscript "Genome-wide analysis of DNA replication and DNA double strand breaks by TrAEL-seq" for consideration as a Methods and Resources paper at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by three independent reviewers.

You’ll see that all three reviewers are broadly positive about the study, but raise a number of overlapping concerns that will need to be addressed, some involving additional experimental work (e.g. both Reviewers #1 and #3 suggest running TrAEL-seq after alpha-factor arrest); there are also a range of textual and presentational requests.

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.

We expect to receive your revised manuscript within 3 months.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Best wishes,

Roli

Roland G Roberts, PhD,

Senior Editor,

rroberts@plos.org,

PLOS Biology

*****************************************************

REVIEWERS' COMMENTS:

Reviewer #1:

The authors introduce a method for the genome-wide detection of DNA 3' ends that they call TrAEL-seq. The initial experiments clearly demonstrate the sensitivity of the approach to detect free 3' ends generated by restriction enzyme cuts. The authors then go on to assess the pattern of meiotic double strand breaks, sites of replication fork stalling and the pattern of DNA replication. Intriguingly, and somewhat frustratingly, the authors don't have an explanation for why TrAEL-seq can be used to determine the direction of replication forks. Overall, TrAEL-seq looks to be a useful addition to the wide range of methods available to assess DNA ends and DNA replication.

I have a number of comments that I hope might help the authors improve their manuscript:

1. This manuscript is primarily a methods paper and therefore I think it is important that a number of additional controls are considered:

- for the second half of the manuscript the authors use TrAEL-seq to assess DNA replication. However, I didn't spot an experiment that shows that the signal the authors are detecting is DNA replication dependent. A simple TrAEL-seq experiment on arrested yeast cells (e.g. in alpha-factor) would provide clear evidence that the output reported is replication dependent. There are multiple places in the manuscript where the authors describe the signal and state that it is DNA replication dependent; however, without a non-replicating control I don't see how they can support these statements. Furthermore, there is likely to be some level of background (as there is for any genomics method) and a non-replicating control should allow the authors to assess this. This would be particularly important when the authors compare mutants or growth conditions (glucose vs. other carbon sources), since the fraction of S phase cells is very likely to be different and therefore background signal could be influencing the results.

- the authors use restriction digestion with SfiI, an enzyme that leaves a 3' overhang, to introduce the method. These were well thought through and well executed experiments. At various points through the manuscript the authors mention that TrAEL-seq has differing sensitivities to various 3' ends (e.g. figure 2A coloured dots). It would be useful for the authors to properly assess this by performing a TrAEL-seq experiment with a cocktail of various enzymes that leave different products (nicks, 5' overhangs, 3' overhangs, blunt ends).

- the authors undertake a comprehensive analysis of the bases detected at the 3' end of the SfiI digested material. I would like to see the authors undertake a comparable analysis to see if there is any preference for the base at or close to the detected 3' end when TrAEL-seq was applied to various genomic DNA substrates.

- how much does TrAEL-seq read density vary across the genome? In Fig. 3B (and similar figures) the authors present the strand bias, but not the total number of reads mapping to each region. This would be a useful analysis, particularly when considering what might be the source of the 3' ends detected by TrAEL-seq. In figure 4B the density on the reverse strand is shown and appears to have some periodicity; I'd encourage the authors to look further as to the nature and potential source of this periodicity.

2. The authors don't discuss the difference between wild-type and rad52-delta cells at centromeres (rad52-delta gives reduced peaks) and tRNA genes (rad52-delta seems identical to wild-type). Why is the signal at centromeres reduced in the rad52 mutant?

3. The authors speculate as to why TrAEL-seq has the ability to report on replication fork direction, but (as they clearly state) they don't have a confirmed molecular mechanism. As such, I think they should be more careful in their terminology in some places, particularly in referring to the y-axis of plots such as Fig 3B, D, F & G as "Replication Fork Direction". They are not measuring replication fork direction (and don't have an explanation for why what they are measuring correlates with replication fork direction) and therefore this shouldn't be used to label these axes. The authors should find an alternative description, such as "strand bias" for the manuscript text and figure labels.

4. Are the authors confident that RNase T1 (used to treat DNA in plugs) cannot digest adjacent to ribonucleotides incorporated into genomic DNA? If it can, this could offer an alternative explanation for the observed strand bias that allows the authors to infer replication fork direction. However, it would also require the authors to reassess the rnh201-null/rnh202-null experiments, since in this case they would not be reporting what the authors think they are.

5. What is the difference between the dnl4/rad51 double mutant and rad52-delta data?

6. In the results the authors suggest that "the replication fork is stalled by chromatin or proteins bound at the promoter" when the GAL genes are induced. The authors should test this hypothesis by looking in their other datasets at the promoters of constitutively highly expressed genes.

7. The authors have a highly plausible explanation for how 3' ends might report replication fork direction: "The mechanism responsible must frequently rearrange replication forks to make the leading strand 3' end accessible to TdT while the lagging strand 3' end remains largely inaccessible. We suggest that transient fork reversal would have this effect..." First, the use of the word "must" here seems too strong given how many unknowns remain (for example, see suggested control experiments above). Second, I suspect that this transient fork reversal is more likely to take place in the plug after Proteinase K treatment rather than in cells. The DNA is treated with Proteinase K at 50 °C which should allow for transient melting of the free 3' end (on the leading strand) and thus interconversion between the species drawn out in figure 5. Both the elevated temperature and the removal of proteins would seem to make this interconversion more likely than in cells.

Finally, the authors use green and red dots to distinguish certain features (fig 2A & 5), plus possibly green and orange violins (fig 4E). I'm red/green colour-blind and the colours used are indistinguishable to me. I don't know what the PLOS family of journals' policy is on the use of colours accessible to all, but I'd encourage the authors to change this. More information and advice is available here:

https://thenode.biologists.com/data-visualization-with-flying-colors/research/

Minor comments:

- introduction: "This can be a problem as end resection forms extended tracts of 3' extended single stranded DNA..." I think this could be more clearly phrased to avoid using the word "extended" twice.

- introduction: "However, these methods do not have the resolution to detect individual origins unless markedly different in timing, and a range of other more specialised approaches have been applied to study replication initiation..." This is not correct. In budding yeast (the organism used for most of the presented work) copy number approaches are more than sufficient to detect individual replication origins. Furthermore, copy number approaches are very much more straightforward than the approach presented in this paper.

- results: "This was ligated with ~10% efficiency to pre-adenylated TrAEL-seq adaptor 1 using truncated T4 RNA ligase 2 KQ, a ligase that is specific for 5' adenylated adaptors (Fig. 1B)". I did not understand what the authors were doing here; why was the adaptor pre-adenylated and why are they interested in a 5' adenylated adaptor when TrAEL-seq uses a 3' adenylated substrate?

- results: "...consistent with previous studies that reported both co-directional and head-on tRNA transcription can stall replication forks, at least in the absence of replicative helicases (Fig. 2E arrows) [65-67]." The authors have missed a pre-genomics study from Carol Newlon's group:

Deshpande, A. M., & Newlon, C. S. (1996). DNA replication fork pause sites dependent on transcription. Science (New York, NY), 272(5264), 1030-1033. http://doi.org/10.1126/science.272.5264.1030

This paper uses DNA 2D gels to detect replication fork pausing at tRNA genes in wild-type cells.

- results: "There should never be less 3' ends on the lagging strand..." less should be fewer.

- results: "ARS211" should be in italics.

- discussion "existing methods profile either fork direction or origin timing" - I think that recently developed single molecule nanopore methods, such as D-NAscent, can profile both fork direction and origin timing.

- fig 1A: it would be useful for the authors to label at least a couple of the free ends with 5' and 3'.

- fig 1B: which lanes contain substrate? What's the difference between lanes 4 & 5? Or 0 and 2?

- fig 1E: I think that a further 'zoom in' on one of the peaks would be a valuable addition.

- fig 2D: why have the authors presented a meta-analysis of the signal across the various classes of centromeres? It would be useful for the authors to show each individual centromere in the supplement.

- fig 3G: why have the authors reversed the y-axis for the human GLOE-seq data, but not the yeast?

- fig 4B: it would be valuable for the authors to show the forward strand reads in addition to the reverse strand reads.

- fig 4C: I'm not sure I fully understand how the data has been normalised for these two plots. What is responsible for the asymmetry? There seem to be many more locations where the read count is lower (e.g. below -50) than higher (e.g. above +50).

- fig S1A: a scale bar should be added.

- fig S1B: it is not clear what the x-axis scale is. Is this the difference in cycle number compared to the control?

- fig S2B: why is there such a large difference in peak height between the two conditions?

Reviewer #2:

In this manuscript, Kara and colleagues describe a novel NGS-based assay to map DNA double-strand breaks called Transferase-Activated End Ligation sequencing (TrAEL-seq). This method adds to the long list of assays (BLESS, BLISS, Break-seq, END-seq, i-BLESS, GLOE-seq…) that have been recently developed to map single- and double-strand DNA breaks and have revolutionized the fields of DNA repair and genome stability. The authors convincingly show that this method can efficiently capture single-stranded DNA 3' ends genome-wide in yeast and in human cells. In this respect, this method differs from BLESS and related assays (BLISS, END-seq, i-BLESS…) in that it maps the real cleavage site and does not require the degradation of the 3' overhang, which can be several kb long. Yet, TrAEL-seq is not just another method to map DSBs. Indeed, the authors show that TrAEL-seq also provides a very sensitive and accurate method to map replication fork direction (RFD) in unsynchronized populations of cells. The quality of RFD maps provided for yeast and human ES cells is very impressive. It seems to match the performance of more labor-intensive assays such as OK-seq without requiring the use of ligase mutants. Moreover, the authors show that TrAEL-seq can be used to map programmed pause sites at the rDNA of yeast and human cells and at tRNA genes with an unprecedented resolution. They propose that the ability of TrAEL-seq to map paused and moving forks is due to frequent fork reversal, exposing a 3' overhang on the leading strand that can be extended by TdT to capture it.

Beyond the validation of the assay, the manuscript provides convincing evidence that transcription does not induce replication fork pausing in budding yeast, which remains a controversial issue (see specific issues below). It also shows that changes in carbon source in the growth medium alter the position of replication termination sites, presumably by altering the timing of origin firing. Finally, it shows that the replication fork barrier is polar in hESCs, as is the case in yeast and unlike in human cancer cell lines.

Overall, the manuscript is very well written and the data are clear and convincing. The experimental pipeline is well described and the detailed protocol provided in supplementary materials will be very useful to those interested in implementing TrAEL-seq in their laboratories. Importantly, the authors also provide advice on how to combine TrAEL-seq with other methods and stress the fact that many of the existing DSB mapping assays are complementary and should be combined to embrace the complexity of DNA damage and repair processes. In conclusion, I have no doubt that this novel assay will be quickly adopted by a wide community, especially by those working on DNA replication and genomic instability. However, the following issues need to be addressed before publication.

Specific issues:

1. Page 1 and 2, the authors review the existing NGS-based assays to map DNA breaks but omitted Break-seq, an assay developed by the Feng lab (Hoffman et al, 2015, Genome Research 25, 402). Is there any reason for that? If not, it would be fair to mention this method together with the others.

2. Modifications of BLESS (page 2) have not only improved ligation efficiency, but also increased the signal-to-noise ratio. As discussed in ref 15, the main limitation of the original BLESS assay was very high noise caused by artifacts associated with formaldehyde fixation.

3. One of the most striking features of TrAEL-seq is its sensitivity for replication forks. The authors' explanation that limited fork reversal occurs very frequently even at unchallenged forks is plausible. Another likely possibility that is not discussed by the authors is that branch migration or partial unwinding of nascent DNA occurs after DNA extraction. Would it be possible to experimentally address this possibility?

4. Figure 4B shows that highly expressed PolII genes do not induce replication fork pausing, unlike tRNA and rRNA genes. This observation argues against earlier studies in budding yeast (e.g. Azvolinsky et al., 2009, Mol. Cell 34, 722) but is consistent with a more recent study showing that spontaneous replication stress in yeast is not caused by replication-transcription conflicts (Forey et al., 2020, Mol Cell 78, 396). This issue should be discussed in the manuscript.

5. Figure 4B also shows that breaks occur at the promoter of GAL7 and GAL10 genes upon galactose induction. Do these breaks depend on DNA replication or are they also detected in G1 cells?

6. Although the conclusion that ~2% of the yeast genome shows altered replication direction between glucose and non-glucose media is valid, the way Fig. 4E was built is unclear. This figure aims at comparing RFD in cells grown on different carbon sources "by filtering windows for those with a difference in RFD>0.4 between the two sets". However, several strains grown on glucose (fob1, rad52, rnh201, rnh202, clb5) were not tested on raffinose, and vice versa for dnl4 rad51, so it is not clear which two sets were compared to select windows with an RFD difference >0.4. For instance, what is clb5 grown on glucose compared to? If it is compared to wild-type cells on raffinose, it is misleading to assume that the difference observed is due to the carbon source. Since samples are not labeled in the figure, it gives the impression that the authors are comparing apples and pears. The authors should instead restrict the main figure to wild-type cells grown on glucose, raffinose and raffinose + galactose, and move the figure with the whole set of data to the supplementary material after clearly explaining how samples were compared to each other. Focusing the main figure on wild-type cells would also stress the fact that the main determinant of this RFD difference is the absence of glucose and not the induction of GAL genes.
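For illustration only, the window comparison described in point 6 can be sketched in Python. This is not the authors' pipeline: it assumes strand-specific read counts have already been binned into identical windows for two matched samples, and it uses the OK-seq-style strand ratio (reverse minus forward over total) as the RFD; the sign convention and the function names `rfd` and `changed_windows` are hypothetical.

```python
from typing import List, Optional, Sequence


def rfd(forward: Sequence[int], reverse: Sequence[int]) -> List[Optional[float]]:
    """Per-window strand ratio (reverse - forward) / total, in [-1, 1].

    Windows with no reads are returned as None rather than 0, so that
    uncovered windows are not mistaken for balanced fork direction.
    """
    values: List[Optional[float]] = []
    for f, r in zip(forward, reverse):
        total = f + r
        values.append((r - f) / total if total > 0 else None)
    return values


def changed_windows(
    rfd_a: Sequence[Optional[float]],
    rfd_b: Sequence[Optional[float]],
    threshold: float = 0.4,
) -> List[int]:
    """Indices of windows where |RFD_a - RFD_b| exceeds the threshold."""
    hits: List[int] = []
    for i, (a, b) in enumerate(zip(rfd_a, rfd_b)):
        if a is None or b is None:
            continue  # skip windows lacking coverage in either sample
        if abs(a - b) > threshold:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Toy counts for four windows in two matched samples (e.g. wild type on
    # glucose vs wild type on raffinose); values are invented for illustration.
    glucose = rfd(forward=[90, 50, 10, 0], reverse=[10, 50, 90, 0])
    raffinose = rfd(forward=[85, 20, 12, 5], reverse=[15, 80, 88, 5])
    idx = changed_windows(glucose, raffinose)
    print(idx)                      # → [1]
    print(len(idx) / len(glucose))  # → 0.25
```

The fraction of windows passing the filter is only interpretable as a carbon-source effect when both inputs come from the same strain, which is the reviewer's concern about pooling unmatched genotypes.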

Reviewer #3:

General comments:

1. The overall clarity of the paper could be improved by making it clear in Figure 1A that TrAEL-seq reads are the opposite polarity of the actual inferred 3' end. Throughout the paper, I was occasionally confused by this, until noticing in figure caption 1A that "TrAEL-seq reads map antisense to the cleaved strand, reading the complementary sequence starting from the first nucleotide before the cleavage site.". This description itself is confusing and would be clarified with a better and more complete figure in Figure 1A (and elsewhere).

This confusion arises again later when comparing to GLOE-seq data.

2. Abstract. Please revise extensively to make the text appropriate for a general non-specialist audience (e.g. PLoS Biology). The opening of the abstract requires some general information about DNA replication (what it is, what it is for, what the problems/questions are) that sets the scene. As written it jumps in too fast to a technical aspect of DNA replication fork stalling. Indeed, to a non-specialist it is not even clear the text is talking about *DNA* replication.

3. In general, please take time to split the text up more into smaller paragraphs with more headings, and make sure each experimental section is appropriately introduced. As currently written, it will confuse a lot of the PLoS Biology readership. Currently, too many things are left unstated. Please be explicit.

---

Page 8. Major comments:

1. The authors contend that they are able to map nascent replication forks. If this is correct, the authors should test it by preparing libraries from cells under alpha-factor arrest (or G2/M arrest). A more complex experimental test (that may not work) would be a release into S phase to demonstrate both that signals arise and that they move in the expected direction of replication.

2. The authors suggest that they are detecting transient instances of fork reversal. Whilst possible, it strikes me as highly unlikely that such events would be so frequent and so prevalent as to generate the data in Fig 3. I would like to suggest an alternative explanation: That they are detecting instances of fork collision with Top1 on the leading strand ahead of the fork. Please refer to: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC85758/

The above paper demonstrates that only Top1 CCs on the leading-strand template generate DSBs. Indeed, this model could explain the strand disparity observed by TrAEL-seq. (Note that the Top1-CC would prevent detection of signal with the opposing polarity, because the Top1-CC would prevent ligation.)

3. The authors do not appear to have normalised their data for relative sequencing depth; thus any comparisons between libraries/methods are questionable and misleading as presented. Please: A) Tabulate the read depth and mapping/filtering statistics etc. of all libraries. B) Present the consistency/correlation of repeat TrAEL-seq datasets to justify pooling. (I assume there are biological duplicates and that they were pooled in some way?)
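As an illustration of the replicate-concordance check requested in point 3B, the following Python sketch computes R² between two replicates. It assumes both libraries have already been counted in identical fixed-size genomic bins, and it applies a log(count + 1) transform so that a few very strong peaks (for example the rDNA) do not dominate the correlation; the transform and the names `pearson` and `replicate_r2` are assumptions, not anything specified in the manuscript.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def replicate_r2(rep1_bins: Sequence[int], rep2_bins: Sequence[int]) -> float:
    """R^2 between log(count + 1)-transformed bin counts of two replicates."""
    log1 = [math.log1p(c) for c in rep1_bins]
    log2 = [math.log1p(c) for c in rep2_bins]
    return pearson(log1, log2) ** 2


if __name__ == "__main__":
    # Toy counts for the same six bins in two hypothetical replicates.
    rep1 = [12, 40, 3, 250, 18, 7]
    rep2 = [10, 45, 5, 230, 20, 6]
    print(round(replicate_r2(rep1, rep2), 3))
```

Because Pearson correlation is invariant to scaling, this check does not by itself require depth normalisation, although the bins must be identical between libraries.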

---

Other specific comments:

Page 2. "DNA breaks cannot therefore be mapped post-resection by BLESS-type methods, which is problematic as DSB repair is often easiest to inhibit post-resection (such as in classic rad51Δ or rad52Δ mutants in yeast)."

This statement needs revising: This is absolutely not true (as written). DSBs can be (and have been!) mapped, but lack nucleotide precision of the original site.

Page 2. "Profiles yielded by BLESS-type methods can rarely be considered in isolation as replication has a dramatic influence on the distribution of DNA strand breaks in a cell;"

Is there a reference to back up this very strong ("dramatic influence") statement? Has BLESS (or other) been performed in strains arrested versus going through replication? If so, please reference, and/or amend the statement to make it clear that it is the authors' view that the distribution will be altered based on knowledge of fork-stalling/blocks etc.

Page 2. "Methods have also been developed to detect replication fork directionality through isolation and sequencing of Okazaki fragments (OK-seq) [32, 33]"

Since a fairly exhaustive discussion of general methods is being presented, I suggest also including Pu-Seq (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4789492).

Page 2. Please hyphenate "single-stranded DNA".

Page 3. First two paragraphs. The structure of the paper here is very confusing. It opens with a description/overview of the method (and mentions the use of T4 RNA ligase), but then in paragraph 2, "implementation", the text reads as if the RNA ligase has not yet been mentioned. This is very confusing. Please revise the text so that the order is logical and suitable for a broad readership. In general, many of the paragraphs/sentences could be greatly improved with simple leaders such as: "In order to do X, we did Y...".

Page 3. Is the efficiency of tailing and ligating to a (very) short 18 nt oligo a suitable way to measure the efficiency of tailing and ligating to rare ssDNA ends within large genomic DNA fragments? What is the estimated relative molarity of ends in the test vs real reaction?

MINOR: Fig 1 legend. Please don't refer to a ssDNA as 18 bp (base pairs) in length when there is no base pairing. It is 18 nucleotides (nt) in length.

Page 3. Please define ∆Ct in the main text. To maximise accessibility, please avoid all unnecessary jargon and abbreviations.

Page 3 "DNA was digested prior to ligation with the restriction enzyme SfiI, then ΔCt was calculated for qPCR reactions that detect TrAEL adaptor 1 ligated to an SfiI cleavage site in the ribosomal DNA (rDNA) (Fig. S1A)."

Confusing sentence structure, please revise to:

"DNA was digested with the restriction enzyme SfiI prior to ligation of TrAEL adaptor 1, then ΔCt was calculated for qPCR reactions that detect TrAEL adaptor 1 ligated to an SfiI cleavage site in the ribosomal DNA (rDNA) (Fig. S1A)."

Page 3. "Data from two experiments was compared with two libraries previously generated using an END-seq protocol..."

This is very unclear. What is meant by previously generated? Are these published data from somewhere? Or new by the lab? If this is the first time these data have been presented, why does the text state "Previously generated"? What point is being made by such a statement?

Minor. Figure 1B: there is no description of what the short vertical bars and dots indicate. (I presume they are means and raw data points.)

Page 3. The final paragraph is very technical and hard to follow. The figure legend for S1C does not help at all. In addition, presentation of a single bioanalyser trace in Fig S1D is unhelpful if the point is to demonstrate primer removal: the input lanes need to also be shown. (Note, I'm really not sure that this panel and the text is needed at all...but in its current form it is of very limited value.)

Was the genomic DNA digested to completion? Or was it partial? Where is the raw data supporting this?

Page 4. "Comparing TrAEL-seq data for SfiI digested genomic DNA to an END-seq library generated from equivalently digested material shows high concordance (Fig. 1C)."

To aid reader comprehension, the opening sentence of this section should first indicate that TrAEL-seq libraries were sequenced, then briefly describe how the data were processed (i.e. mapping, genome, tools used, and any filtering, etc.) before comparing with END-seq data. This is especially confusing (as currently written) since all prior data presentation concerned qPCR at specific test loci.

Page 4. What does "unambiguously" refer to here? How would the data look if the 70 SfiI sites had been detected "ambiguously"?

Page 4. Without further details about how mapping, trimming and/or filtering were performed (summarised clearly in the main text), interpretation of the nucleotide accuracy is questionable. For example, how many rA bases are assumed to have been added? Are all T bases trimmed prior to (or after) mapping?

Page 4. "We suggest that this overall mapping accuracy of >99% within ±1 nucleotide would be sufficient for almost all applications."

This estimation is only relevant for DSBs generated by SfiI. DSBs from other sources may form at loci with longer runs of genome-encoded adenosines, where the algorithm may not perform as well. (Actually, what is the algorithm that has been used? Why is it not explained in the main text?) This sentence is therefore misleading, as it implies that the algorithm accuracy (">99% within ±1 nucleotide") is true for "almost all applications". Please address this by estimating accuracy at sites with longer runs of genomic-encoded adenosines or by revising the interpretation/presentation of this accuracy estimation (and by explaining how this algorithm works).
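To make the suggested stratified accuracy estimate concrete, here is a hypothetical Python sketch. It assumes each read has already been assigned to a known cut site and reduced to an inferred 3' end coordinate, and that each site is annotated with the number of consecutive genome-encoded adenosines at the cut (residues that could be confused with added ones during trimming, as the reviewer suggests). The function name, data layout and ±1 nt default tolerance are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_a_run(
    reads: Iterable[Tuple[str, int]],   # (site_id, inferred 3' end position)
    sites: Dict[str, Tuple[int, int]],  # site_id -> (true position, A-run length)
    tolerance: int = 1,
) -> Dict[int, float]:
    """Fraction of reads within +/- tolerance nt of the true site, per A-run length."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for site_id, pos in reads:
        true_pos, a_run = sites[site_id]
        totals[a_run] += 1
        if abs(pos - true_pos) <= tolerance:
            hits[a_run] += 1
    # Report strata in order of increasing genomic A-run length.
    return {run: hits[run] / totals[run] for run in sorted(totals)}


if __name__ == "__main__":
    # Invented example: one site with no adjacent A residues, one with three.
    sites = {"site_A": (100, 0), "site_B": (500, 3)}
    reads = [("site_A", 100), ("site_A", 101), ("site_B", 500), ("site_B", 497)]
    print(accuracy_by_a_run(reads, sites))  # → {0: 1.0, 3: 0.5}
```

Reporting accuracy per A-run length, rather than as one pooled figure, would show whether the >99% estimate holds outside the sequence context of SfiI sites.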

Page 4. "A major strength of TrAEL-seq should be the ability to map DSB sites after resection," AND "This shows that TrAEL-seq accurately maps endogenous resected DSBs."

These sentences could be clarified to indicate that TrAEL-seq maps the original site of DSB formation, rather than the endpoint of resection.

Page 4. "Meiotic DSBs formed by Spo11 are processed by Sae2 amongst other factors prior to resection, after which strand invasion into a sister chromatid is mediated by Dmc1 [40, 41]."

In meiosis, homologous recombination occurs between homologous chromosomes, rather than sister chromatids. Please revise.

Page 4. "TrAEL-seq for resected DSBs in dmc1Δ cells 7 hours after induction of meiosis revealed a DSB pattern very similar to that observed for unresected DSBs in an sae2Δ mutant mapped by S1-seq (a BLESS variant specific for meiotic recombination) (Fig. 1E)."

Please cite the paper from which the sae2Δ S1-seq data originate.

Page 4 and Figure 1F. "Across all hotspots for Spo11 cleavage, quantitation of DSB usage frequency by TrAEL-seq correlated well with S1-seq (R²=0.86) (Fig. 1F, left), and to a similar extent with Spo11-associated oligonucleotide sequencing (R²=0.84) (Fig. 1F, right)"

These correlations, whilst useful, are rather crude comparisons. Since an advertised benefit of the method is nucleotide accuracy, a more detailed comparison showing how the nucleotide accuracy of S1-seq and TrAEL-seq compares within a zoomed-in strong hotspot region is expected.

Indeed, since much is being made of the single-nucleotide accuracy, the authors should compare their dmc1 TrAEL-seq to meiotic CC-seq data (Gittens et al 2019 https://www.nature.com/articles/s41467-019-12802-5), which, unlike Spo11-oligo seq, has no mapping ambiguity (no tailing is involved).

Additionally, an important point about sensitivity is missing here: How many of the 3901 hotspots were detected by TrAEL-seq? How does this compare to the number detected by S1-seq (and/or CC-seq datasets)?

Specifically, the Y-axis of Fig 1F is about 5-fold lower in TrAEL-seq. Assuming all reads are expressed as hits per million mapped reads (or equivalent? *Please add this information to all figures*), then this strongly suggests that the signal to noise of TrAEL-seq is significantly lower than S1-seq, which itself is lower than Spo11-oligo seq.

Page 5. "Two RFB sites are readily visible in wild type TrAEL-seq data as peaks of antisense reads relative to the direction of replication" AND "At centromeres replicated from one direction only, we observed an accumulation of reads antisense to the direction of replication just before the centromere, while forks in termination zones that can be replicated in either direction displayed both peaks (Fig. 2D)."

These sentences are unclear. The use of "sense" and "antisense" is typically in relation to transcription, yet replication is mentioned instead. This section could be clarified by referring to the orientation of the 3' DNA end itself, relative to the leading/lagging strand. E.g. add: "…, corresponding to 3' DNA ends on the leading strand".

More specifically, do peaks on the R-strand indicate nascent (genomic) 3' ends on the F or R strands? Please revise text to make sure this information is absolutely clear throughout the manuscript.

Page 5. "although TrAEL-seq data displays higher signal-to-noise ratios than GLOE-seq data with peaks that correspond more closely to known sites than qDSB-seq peaks (Fig. 2B) [13, 36]."

Where is the evidence for greater signal to noise in TrAEL-seq vs GLOE-seq? Perhaps the second replicate of GLOE-seq is worse...but what is the cause of this? Were TrAEL-seq replicates highly correlated? Where are these data demonstrating that fact (both genome-wide, and specifically at this locus?). Reproducibility of TrAEL-seq is a very important point that must be presented and commented upon.

Page 5. "It should be noted that the TrAEL-seq and the GLOE-seq datasets used for this analysis derived from asynchronous cells whereas the cells for qDSB-seq had been tightly synchronised in S phase, underlining the high sensitivity of TrAEL-seq for stalled replication forks."

Please revise. The fact that TrAEL-seq detects a signal in asynchronous cells says nothing, in and of itself, about its sensitivity compared to other assays (which is the impression the text is trying to give). Specifically, the synchronized qDSB-seq signal is at least 20-fold stronger. If the authors want to compare sensitivity, they should synchronise their cells to make the data comparable.

However...I am now additionally concerned that the authors have not normalised their data for total sequencing depth. This is essential if any comparisons are to be made between the various samples and sequencing techniques: All data need to be presented as reads per million mapped.

A table stating the read depths and mapping statistics of all individual libraries used here is also *essential*.
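The reads-per-million scaling requested here is RPM = raw count × 10⁶ / total mapped reads. A minimal Python sketch, assuming `mapped_reads` is the number of reads that passed mapping and filtering for each library (the function name is hypothetical), shows why raw counts alone can mislead: a 4-fold difference in raw peak height vanishes once depth is accounted for.

```python
from typing import List


def reads_per_million(bin_counts: List[int], mapped_reads: int) -> List[float]:
    """Scale raw per-bin read counts to reads per million mapped reads (RPM)."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be a positive integer")
    scale = 1_000_000 / mapped_reads
    return [count * scale for count in bin_counts]


if __name__ == "__main__":
    # Hypothetical example: the same peak bin in two libraries of different depth.
    deep = reads_per_million([200], mapped_reads=4_000_000)
    shallow = reads_per_million([50], mapped_reads=1_000_000)
    print(deep, shallow)  # → [50.0] [50.0]
```

RPM corrects only for total depth; it does not account for differences in the fraction of background reads between methods, so cross-method comparisons would still need care.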

Page 5. "Importantly, no difference was observed between the rad52Δ signal and the wild type, showing that these double stranded ends are not normally processed by the homologous recombination machinery (Fig. 2B compare wild type and rad52Δ panels)."

As stated above, this comparison is only valid if read depths have been normalised between samples. Please clarify and revise the interpretation if not.

In general, the logic of this section is very poorly introduced and explored. The authors make far too many implicit jumps in data interpretation without first introducing the concepts that they are testing (i.e. that if this signal is a reversed fork, one would expect Rad52 dependence). This needs radical revision to be acceptable text for a broad journal like PLoS Biology: in its current form, non-specialist (and even specialist) readers will be lost.

The fact that the peak is similar (assuming this is still true after normalisation) in the presence and absence of Rad52 needs more careful interpretation and clearer description. The jump to Holliday junction processing makes no sense without a slower, more detailed explanation of the logic.

Page 5. "To determine the applicability of TrAEL-seq to mammalian cells, we generated TrAEL-seq libraries from 0.5 million human embryonic stem cells (hESC) that were either undifferentiated or subjected to retinoic acid-induced differentiation."

What is the purpose of the retinoic acid-induced differentiation here? Do the authors expect to see a difference at the rDNA?

Furthermore, the data here are incredibly noisy and far from convincing. How many other genomic sites show similar peaks? What methods do the authors have to demonstrate that these signals are real and not noise?

What was the total read depth of these human samples?

How is mapping accuracy validated within the R repeats? Can the spike be an artefact stemming from copy number differences between the sample and reference? Or due to mis-mapping of reads to the repeat array? All these details are completely glossed over, leaving this reviewer far from convinced by the data presented.

Page 5 and Fig 2DE. The three colours are incredibly hard to distinguish. Why are they not labelled in the plot itself? (why only in the figure legend?). Is the reader meant to interpret anything from the differences (or not) between these mutants?

I also have concerns that only smoothed data (I assume that is what the wiggly lines are?) is plotted rather than nucleotide-resolution peaks of the raw signals.

Fig 2E. The signals at the tRNA are intriguing, but lead to a query: is there any possibility that nascent RNA can act as a 3' end for TrAEL-seq? From Fig 1A, I assume a 3' RNA end would efficiently ligate to the first adapter, and also be reverse-transcribed by the Bst2.0 polymerase. From the methods it appears that adapter 2 is 5' phosphorylated, and thus would be able to ligate to the new DNA 3' end generated by Bst2.0 (even though it would be a putative DNA/RNA hybrid molecule). This product would then be a substrate for PCR because a single DNA strand would have been created with adapter 1 on the left and adapter 2 on the right.

I would like the authors to respond to this and explain how they can exclude this as a molecule that they may be inadvertently detecting using TrAEL-seq. I recognise that the authors use RNase T1 in their DNA preparations, but I do not know if this enzyme cleaves the DNA/RNA hybrids present at sites of nascent transcription that I am suggesting may be a source of 3' RNA ends.

Separate from the comments above, where is the evidence that the peak in the tRNA gene is replication dependent (and thus a site of replication stalling)? The peak in the tRNA is similar in both local replication orientations. Could it instead be (if not a labelled RNA species) a site of increased DNA template breakage due to the high levels of transcription going on here?

Page 6. The section: "A replication signal of the same polarity was noted...yet up to 90% of TrAEL-seq reads emanate from the leading strand."

This section is incredibly hard to follow in relation to the preceding text and figures, as the authors now refer to the position of the 3' end (I think?) rather than the antisense TrAEL-seq read. I would suggest that throughout the paper/figures, the inferred position of the 3' end is exclusively referred to, for clarity. This is how the data were presented in the GLOE-seq paper, which will also aid comparison by the reader.

Please clearly summarise/conclude this section: Is TrAEL-seq detecting nascent 3' ends of the leading strand? If so, please state this clearly! If this is the case, why are nascent 3' ends on the lagging strand not also detected? (there should be many more of them too).

Please also define RFD in the legend.

In general, it seems inappropriate to refer to the plot in Fig 3B as an RFD plot (really it is just a strand ratio), since at this point it is not clear what signal TrAEL-seq is detecting.

Page 6. "suggesting that double stranded ends are also formed during normal replication".

Please consider that ssDNA/dsDNA junctions are likely to be mechanically fragile (even in agarose), and may preferentially break during processing generating a structure labelled by END-seq.

Page 6. "…although with the opposite polarity as expected (Fig. 3F and S3D) [36]."

Incorrect figure references. I think the authors are referring to Fig. 3G and S3E.

Page 7. "The TrAEL-seq profile of clb5Δ was very similar to wildtype across most of the genome, but certain origins were clearly absent or strongly repressed, resulting in extended tracts of DNA synthesis from adjacent origins (Fig. 4A, green arrows). This is as predicted for clb5Δ mutants and confirms that TrAEL-seq is indeed sensitive to changes in replication profile."

The clb5 data are convincing. Could the authors strengthen the manuscript further by adding a supplementary figure showing a global analysis of the differences between wild-type and clb5∆ RFD plots, relative to the differences between two samples not expected to have differing origin usage? E.g. two scatterplot analyses.

Page 7. "stalling but not in the expected location."

Please clarify what the expected location was...the reader will be completely lost at this point without a more careful, and slower, presentation of ideas, observations, and interpretations.

Fig 4B. This figure is not very convincing. Why are only R-strand reads presented? Please add an x-axis scale. What smoothing was applied to the data? (Was it a rectangle or a Hann window? Please specify.) What was the justification for such a smoothing window? Were the data normalised for relative read depth between the Raffinose and R+Galactose libraries? If the data in 4C are 100 bp bins

Page 8. "Together, these data show that replication profiling by TrAEL-seq is sufficiently sensitive to reveal subtle differences in fork direction and processivity."

Where is the evidence to indicate that what is detected by TrAEL-seq is "subtle"? By what benchmark are these subtle effects?

Page 8, discussion: "Here we have demonstrated that TrAEL-seq maps resected DSBs,"

Here and elsewhere in the text, I think it is important to clarify the text to make it clear that the TrAEL-seq technique maps the 3' end of resected DSBs. i.e. the presumed DSB end.

Figure 1. "Note that TrAEL-seq reads map antisense to the cleaved strand, reading the complementary sequence starting from the first nucleotide before the cleavage site."

This is unclear because the diagram depicts a DSB, where both strands are cleaved. Which is the "cleaved strand"? Could the figure be labelled more clearly to indicate the first sequenced base.

Decision Letter 2

Roland G Roberts

10 Feb 2021

Dear Jon,

Thank you for submitting your revised Methods and Resources entitled "Genome-wide analysis of DNA replication and DNA double strand breaks by TrAEL-seq" for publication in PLOS Biology. I've now obtained advice from two of the original reviewers and have discussed their comments with the Academic Editor. 

Based on the reviews, we will probably accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers. Please also make sure to address the following data and other policy-related requests.

IMPORTANT:

a) Please attend to the remaining requests from the reviewers.

b) I wonder if the title might be slightly easier to parse if you use "using" instead of "by." Or flip it round to give "TrAEL-seq is a method for genome-wide analysis of DNA replication and DNA double-strand breaks"...

c) Please address my Data Policy requests further down.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

-  a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

-  a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

-  a track-changes file indicating any changes that you have made to the manuscript. 

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information  

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Early Version*

Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods 

Please do not hesitate to contact me should you have any questions.

Best wishes,

Roli

Roland G Roberts, PhD,

Senior Editor,

rroberts@plos.org,

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797 

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication. 

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 1CDEF, 2BCDE, 3ABCDEFGH, 4ABCDEF, S1CDEF, S2ABCDEF, S3ABCDEFGH, S4ABCDE. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

 ------------------------------------------------------------------------

BLOT AND GEL REPORTING REQUIREMENTS:

For manuscripts submitted on or after 1st July 2019, we require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare and upload them now. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements 

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

[identifies himself as Conrad Nieduszynski]

I would like to commend the authors on the excellent work they have undertaken to address reviewer comments. The manuscript is much improved and I believe ready for publication.

Minor comment: "although TrAEL-seq data contains less additional peaks in this region than" should be "fewer" rather than "less".

In the revised manuscript the authors make the observation of phasing between TrAEL-seq signal and nucleosomes. This is precisely what I was anticipating seeing. Replication fork velocity is stochastic (just compare the length of tracks seen in multiple double pulse labelled combing/fibre studies), but the source of this variability is unknown. Multiple lines of evidence suggest this is sequence independent. Therefore, a possible explanation is that DNA replication frequently 'pauses', if only very briefly, as the fork pushes through nucleosomes (and as new Okazaki fragments are primed). I suggest that this would provide an explanation for the authors' observation on phasing. Following this up is far beyond the scope of this manuscript, but in the future should be possible with various mutants that alter the position of nucleosomes and/or the ability of the fork to replicate through nucleosomes.

Reviewer #3:

TRAEL-seq second round review

The authors have done an excellent job answering the extensive comments and suggestions that were provided during the review process. The rebuttal comments are measured and well explained. The updated text and figures have improved clarity. I thank the authors for the time spent providing additional analysis, comparison to other methodological techniques, and for providing a detailed breakdown of the datasets presented in the study (in particular to include statistics on mapping and de-duplication). My main concern with data presentation was clarity on which strand was being referred to, and normalisation of all data to reads per million mapped. Both changes have been incorporated. The authors additionally provide G1-arrested data, and even a very exciting attempt at a G1 > S release. Both sets demonstrate clearly the S-phase dependence of the mapped signals.

I have one minor comment/suggestion that I think will aid a non-specialist reader. Would it be possible to make the diagrams of the rDNA in Fig S3B and Fig 2B more consistent with one another? Specifically, how do the three RFBs referred to in Fig 3B (and the two strong TRAEL-seq peaks) relate to the single RFB (and dispersed TRAEL-seq peak profiles) in Fig S3B? Why do the data look so different?

Also, it is not stated, but is the orientation of the rDNA repeat drawn consistent with the orientation in the standard S. cerevisiae reference sequence? This is probably obvious to the authors, but since there are no chromosomal coordinates indicated in Fig 3B, it is not clear, and thus perhaps this information can be added to the legend.

Overall, this is an interesting and potentially very useful analysis method that expands the tools available to researchers in a new and sensitive way. The authors should be commended on its development, and on their manuscript.

Decision Letter 3

Roland G Roberts

17 Feb 2021

Dear Jon,

On behalf of my colleagues and the Academic Editor, Tanya Paull, I'm pleased to say that we can in principle offer to publish your Methods and Resources paper "Genome-wide analysis of DNA replication and DNA double strand breaks using TrAEL-seq" in PLOS Biology, provided you address any remaining formatting and reporting issues. These will be detailed in an email that will follow this letter and that you will usually receive within 2-3 business days, during which time no action is required from you. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have made the required changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Biology. 

Best wishes,

Roli 

Roland G Roberts, PhD 

Senior Editor 

PLOS Biology

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. TrAEL-seq library construction details.

    (A) Example Bioanalyzer trace for the amplified library of NotI/PmeI/SfiI-digested yeast genomic DNA. A 1 μl aliquot of the 10.5 μl final library was run on a DNA high sensitivity Bioanalyzer chip. The trace shows a complete absence of adaptor or primer dimers, which is achieved only after 2 successive AMPure purifications. This trace is typical for TrAEL-seq libraries. (B) Schematic of the TrAEL-seq read processing pathway. TrAEL-seq reads are the reverse complement of the original DNA end. The 8-nucleotide UMI is removed and stored, then up to 3 T’s are removed from the 5′ end of the read. Poor-quality reads and adaptor sequences are removed by TrimGalore, then reads are mapped using Bowtie 2. Deduplication is performed by UMI grinder based on the UMI and the mapped start site, and the reads are finally truncated to a single nucleotide representing the reverse complement of the terminal nucleotide of the original DNA strand. (C) Quantitation of DNA ends generated by SfiI digestion, categorised by the 3′ nucleotide or the nucleotide adjacent to the 3′ nucleotide in TrAEL-seq data. Bars show mean and 1 SD. (D) Precision mapping of SfiI cleavage sites by TrAEL-seq and END-seq, as in Fig 1D. This graph represents the 10 SfiI sites that have 2 or more As at the 3′ end (GGCCNNAA|NGGCC). This category comprises 5 ends with 2 As, 2 ends with 3 As, and 3 ends with 4 As. Mapped locations of 3′ ends were averaged across each category of site and expressed as a percentage of all 3′ ends mapped by each method to that category of site. (E) Scatter plot of log-transformed normalised read counts at all 3,907 Spo11 cleavage hotspots annotated by Mohibullah and Keeney [1], comparing 2 technical replicate TrAEL-seq libraries generated from the same sample of dmc1Δ cells. The 2 libraries were prepared approximately 6 months apart by 2 different researchers from cells stored in 70% ethanol at −70 °C. (F) Scatter plot of log-transformed normalised read counts at all 3,907 Spo11 cleavage hotspots annotated by Mohibullah and Keeney, comparing dmc1Δ TrAEL-seq with data for Spo11-associated oligonucleotides [14] (SRA accession: SRR1976210). Numerical data underlying this figure can be found in S1 Data. TrAEL-seq, Transferase-Activated End Ligation sequencing; UMI, unique molecular identifier.
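    The per-read steps in panel B (UMI removal, trimming of up to 3 leading T's, deduplication on UMI plus mapped start site, and truncation to a single nucleotide) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it skips quality/adaptor trimming (TrimGalore) and mapping (Bowtie 2), and the alignment tuple layout, 0-based half-open coordinates, and all function names are hypothetical. The single-nucleotide record is the read's 5′ mapped base, which sits at the same coordinate as the 3′ terminal nucleotide of the original strand (on the opposite strand, since reads are reverse complements).

```python
"""Hypothetical sketch of the read-processing steps summarised in S1 Fig B.

Assumptions (not from the authors' code): reads are plain strings,
alignments are simple tuples, coordinates are 0-based half-open.
"""
from typing import Iterable, List, Tuple

UMI_LEN = 8  # 8-nt UMI at the start of each read (from the legend)
MAX_T = 3    # at most 3 leading T's removed after the UMI (from the legend)

# Hypothetical minimal alignment record: (umi, chrom, strand, start, end)
Alignment = Tuple[str, str, str, int, int]


def split_umi_and_trim_t(seq: str) -> Tuple[str, str]:
    """Remove and return the UMI, then strip at most MAX_T leading T's."""
    umi, rest = seq[:UMI_LEN], seq[UMI_LEN:]
    n = 0
    while n < MAX_T and n < len(rest) and rest[n] == "T":
        n += 1
    return umi, rest[n:]


def five_prime_position(strand: str, start: int, end: int) -> int:
    """0-based coordinate of the read's 5' mapped base."""
    return start if strand == "+" else end - 1


def deduplicate_and_truncate(
    alignments: Iterable[Alignment],
) -> List[Tuple[str, str, int]]:
    """Keep one read per (UMI, chrom, strand, 5' position), reduced to a
    single-nucleotide record (chrom, strand, position)."""
    seen = set()
    ends = []
    for umi, chrom, strand, start, end in alignments:
        pos = five_prime_position(strand, start, end)
        key = (umi, chrom, strand, pos)
        if key in seen:
            continue  # duplicate: same UMI and same mapped start site
        seen.add(key)
        ends.append((chrom, strand, pos))
    return ends
```

    For example, `split_umi_and_trim_t("ACGTACGTTTTTGCA")` returns `("ACGTACGT", "TGCA")`: four T's follow the UMI but only three are removed. Keying deduplication on the 5′ position rather than the full alignment span means reads that differ only in length after trimming still collapse to one molecule.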

    (TIF)

    S2 Fig. Additional data for detection of replication fork stalling by TrAEL-seq.

    (A) Reproducibility of RFB detection between 2 technical replicates. The 2 libraries were prepared approximately 6 months apart by 2 different researchers from cells stored in 70% ethanol at −70 °C. (B) Detection of RFB peaks without nonreproducible background peaks in 3 biological replicate TrAEL-seq libraries derived from wild-type cells. (C) Replication direction of centromeres, calculated from the cdc9-AID GLOE-seq data (SRA accession: SRX6436838). The percentage of reverse reads was determined in the regions −1000 to −500 bp and +500 to +1000 bp relative to the annotated centromere, and the average of these values is plotted. The region from −500 to +500 bp was excluded because replication fork stalling in this region obscures the replication direction. The value for CEN2 is misleading because it lies directly adjacent to a replication origin; see S1 File for profiles of individual centromeres. (D) Average TrAEL-seq profiles across tRNAs ±200 bp for 2 biological replicates of hESCs, each averaged from 2 technical replicates. Reads are separated by orientation on forward or reverse strands; all tRNAs are included. Read counts per million reads mapped were calculated in nonoverlapping 5 bp bins. (E) Average TrAEL-seq profiles across all centromeres ±1 kb for wild-type and rad52Δ cells. Read counts per million reads mapped were calculated in nonoverlapping 10 bp bins. (F) Average TrAEL-seq profiles across all tRNAs ±200 bp for wild-type and rad52Δ cells. Read counts per million reads mapped were calculated in nonoverlapping 5 bp bins. Numerical data underlying this figure can be found in S2 Data. hESC, human embryonic stem cell; RFB, replication fork barrier; TrAEL-seq, Transferase-Activated End Ligation sequencing.
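    The two quantities used in this figure, the centromere replication-direction score in panel C and the per-strand read counts per million mapped in nonoverlapping bins in panels D to F, can be sketched as follows. This is a hypothetical illustration, not the authors' code: reads are assumed to be single-nucleotide `(position, strand)` records, windows are treated as half-open, and feature orientation (for example tRNA direction) is ignored for simplicity. A per-feature profile computed this way would then be averaged across features, once per strand.

```python
from typing import Iterable, List, Tuple

Read = Tuple[int, str]  # hypothetical (position, strand) single-nucleotide record


def percent_reverse(reads: Iterable[Read], lo: int, hi: int) -> float:
    """Percentage of reverse-strand reads among reads with lo <= pos < hi."""
    strands = [s for pos, s in reads if lo <= pos < hi]
    if not strands:
        return float("nan")  # no reads: direction undefined
    return 100.0 * sum(s == "-" for s in strands) / len(strands)


def centromere_direction(reads: List[Read], cen: int) -> float:
    """Mean % reverse reads in [cen-1000, cen-500) and [cen+500, cen+1000).

    The central -500..+500 bp is skipped, as in the legend, because fork
    stalling there obscures the replication direction.
    """
    left = percent_reverse(reads, cen - 1000, cen - 500)
    right = percent_reverse(reads, cen + 500, cen + 1000)
    return (left + right) / 2


def cpm_profile(positions: Iterable[int], center: int, flank: int,
                bin_size: int, total_mapped: int) -> List[float]:
    """Reads per million mapped in nonoverlapping bins across center +/- flank.

    Call once per strand (forward and reverse reads kept separate).
    """
    span = 2 * flank
    counts = [0] * (span // bin_size)
    for pos in positions:
        offset = pos - (center - flank)
        if 0 <= offset < span:
            counts[offset // bin_size] += 1
    return [c * 1e6 / total_mapped for c in counts]
```

    With `flank=200` and `bin_size=5` (the tRNA settings in panels D and F) each profile has 80 bins; the centromere profiles in panel E would use `flank=1000` and `bin_size=10`.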

    (TIF)

    S3 Fig. Additional data for replication fork directionality of TrAEL-seq data.

    (A) Scatter plot showing the percentage of reverse reads among all reads in 1 kb genomic windows spaced every 1 kb, comparing TrAEL-seq data from wild-type cells with GLOE-seq data from Cdc9-depleted cells (SRA accession: SRX6436838). (B) Read polarity plots showing TrAEL-seq data for wild type, fob1Δ, and rad52Δ across a single rDNA repeat. The 35S rRNA gene transcribed by RNA polymerase I is shown as a thicker grey line and is transcribed right to left in this representation. Mature rRNA genes are shown in black; the RFB and the ARS are also annotated. The inset shows the region containing the RFB sites that is shown in Fig 2B. (C) Scatter plot showing the percentage of reverse reads among all reads in 1 kb genomic windows spaced every 1 kb, comparing TrAEL-seq data from 2 technical replicates of wild-type cells. (D) Read polarity plot across chromosome V for TrAEL-seq datasets of wild type compared with the RNase H2 mutants rnh201Δ and rnh202Δ and the topoisomerase I mutant top1Δ. (E) Read polarity plot for chromosome V comparing END-seq and TrAEL-seq data generated from two halves of an agarose plug containing 10 million wild-type 3xCUP1 cells grown in synthetic complete glucose media. Note that the scale for the END-seq data is expanded, as the bias in read polarity is much smaller in END-seq libraries. (F) Scatter plot showing the percentage of reverse reads among all reads in 250 kb genomic windows spaced every 10 kb, comparing TrAEL-seq data for 2 technical replicates generated from the same hESC sample. (G) Scatter plot showing the percentage of reverse reads among all reads in 250 kb genomic windows spaced every 10 kb, comparing TrAEL-seq data for 2 biological replicates of hESCs, each averaged from 2 technical replicates. (H) Scatter plot showing the percentage of reverse reads among all reads in 250 kb genomic windows spaced every 10 kb, comparing TrAEL-seq data from hESCs (average of 2 technical replicates) with GLOE-seq data from LIG1-depleted HCT116 cells (average of SRA accessions: SRX7704535 and SRX7704534). Numerical data underlying this figure can be found in S3–S6 Data. ARS, autonomously replicating sequence; hESC, human embryonic stem cell; RFB, replication fork barrier; TrAEL-seq, Transferase-Activated End Ligation sequencing.
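    The windowed read-polarity values plotted in these panels (percentage of reverse reads in windows of a fixed size placed at a fixed step) can be sketched as below. This is a hypothetical illustration, not the authors' implementation: inputs are assumed to be sorted-or-unsorted lists of single-nucleotide read positions per strand on one chromosome, windows are half-open, and windows that would run past the chromosome end are simply dropped (an assumption; the legend does not say how edges were handled). Two datasets processed with the same windows can then be plotted against each other as in the scatter plots.

```python
import bisect
from typing import List, Sequence, Tuple


def sliding_percent_reverse(fwd_pos: Sequence[int], rev_pos: Sequence[int],
                            chrom_len: int, window: int,
                            step: int) -> List[Tuple[int, float]]:
    """Return (window_start, % reverse reads) for windows [start, start+window).

    Windows extending past chrom_len are skipped (assumption), and windows
    with no reads get NaN.
    """
    fwd = sorted(fwd_pos)
    rev = sorted(rev_pos)
    out = []
    for start in range(0, chrom_len - window + 1, step):
        end = start + window
        # Binary search counts reads in [start, end) on each strand.
        n_fwd = bisect.bisect_left(fwd, end) - bisect.bisect_left(fwd, start)
        n_rev = bisect.bisect_left(rev, end) - bisect.bisect_left(rev, start)
        total = n_fwd + n_rev
        pct = 100.0 * n_rev / total if total else float("nan")
        out.append((start, pct))
    return out
```

    The yeast comparisons correspond to `window=1000, step=1000` and the hESC comparisons to `window=250_000, step=10_000`; sorting once and using `bisect` keeps the cost per window logarithmic, which matters when 250 kb windows overlap heavily at a 10 kb step.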

    (TIF)

    S4 Fig. Additional data for detection of environment-dependent replication differences.

    (A) Scatter plot showing the percentage of reverse reads among all reads in 1 kb genomic windows spaced every 1 kb, comparing TrAEL-seq data from wild type and clb5Δ (left). An equivalent comparison between wild type and rnh201Δ (which has a wild-type replication profile) is shown for reference (right). (B) Plot of read count across the GAL locus on galactose induction for the dnl4Δ rad51Δ mutant, as in Fig 4B. (C) MA plots of changing read count across the genome on galactose induction for the dnl4Δ rad51Δ mutant, as in Fig 4C. (D) Read polarity plots showing the replication profile of the region surrounding the GAL locus with and without galactose induction. The green box shows the site at which the replication fork that passes through the GAL locus encounters the oncoming fork from ARS211. (E) Plot of average TrAEL-seq read density around the TSS in the highest 25% expressed genes oriented head-on with replication (as in Fig 4D). Data are shown for G1 and G1→S samples (Fig 3E); genes are averaged together within each sample, but the difference in average read count between samples is maintained. The nonreplicating G1 sample contains far fewer reads on average across TSS regions, and the peak upstream of the TSS is absent. Numerical data underlying this figure can be found in S7 Data. TrAEL-seq, Transferase-Activated End Ligation sequencing; TSS, transcriptional start site.
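    Panel C uses MA plots, which place each genomic region by its average abundance (A) and its log fold change between conditions (M). A generic sketch of how such coordinates can be computed from per-region read counts is shown below. It is not the authors' method: the regions, the normalisation to reads per million mapped, and the pseudocount of 1 (added so that regions with zero reads still have finite logarithms) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def ma_values(counts_a: Sequence[int], counts_b: Sequence[int],
              total_a: float, total_b: float,
              pseudocount: float = 1.0) -> List[Tuple[float, float]]:
    """Return (A, M) per region for condition B relative to condition A.

    Counts are normalised to reads per million mapped after adding a
    hypothetical pseudocount. M = log2(B / A); A = mean of log2(A), log2(B).
    """
    out = []
    for ca, cb in zip(counts_a, counts_b):
        a = (ca + pseudocount) * 1e6 / total_a
        b = (cb + pseudocount) * 1e6 / total_b
        m = math.log2(b / a)                        # log2 fold change
        avg = 0.5 * (math.log2(a) + math.log2(b))   # mean log2 abundance
        out.append((avg, m))
    return out
```

    A region with 3 reads without galactose and 7 with galactose (both libraries at 1 million mapped reads) gives A = 2.5 and M = 1.0, that is a 2-fold increase after the pseudocount. Regions with genuine changes on induction, such as the GAL locus, would be expected to stand out from the cloud centred on M = 0.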

    (TIF)

    S5 Fig. Means by which reversed forks could resemble DSBs in Southern analysis.

    All Southern blot analyses that have reported direct detection of DSBs at RFBs utilise a restriction digestion to separate the region of interest. For the yeast RFB, to our knowledge, the enzyme used has always been BglII, the cleavage sites for which lie 2.2 kb and 2.4 kb on either side of the RFB. Forks that reverse past the BglII site would yield a BglII fragment of the same size (2.2 kb) as a fork that is cleaved at the RFB. Only fragments that would hybridise to the probe (blue) are shown. DSB, double-strand break; RFB, replication fork barrier.

    (TIF)

    S1 Table. Yeast strains used in this study.

    (XLSX)

    S2 Table. List of all libraries produced during this work, including GEO accession and mapping statistics.

    (XLSX)

    S1 File. TrAEL-seq profiles at individual centromeres.

    (PDF)

    S2 File. Detailed TrAEL-seq protocol.

    (DOC)

    S1 Data. Underlying numerical data.

    (XLSB)

    S2 Data. Underlying numerical data.

    (XLSB)

    S3 Data. Underlying numerical data.

    (XLSB)

    S4 Data. Underlying numerical data.

    (XLSB)

    S5 Data. Underlying numerical data.

    (XLSB)

    S6 Data. Underlying numerical data.

    (XLSB)

    S7 Data. Underlying numerical data.

    (XLSB)

    S1 Raw Images. Raw gel image.

    Note that not all lanes are presented in the manuscript.

    (PDF)

    Attachment

    Submitted filename: Response to TrAEL reviewers.docx

    Attachment

    Submitted filename: Response to TrAEL reviewers 2.docx

    Data Availability Statement

    All sequencing files are available from the GEO database (accession number GSE154811). Numerical data are in S1–S7 Data and the raw gel image is in S1 Raw Images.

