Proceedings of the National Academy of Sciences of the United States of America. 2018 Feb 5;115(8):1919–1924. doi: 10.1073/pnas.1719907115

Three classes of recurrent DNA break clusters in brain progenitors identified by 3D proximity-based break joining assay

Pei-Chi Wei a,b,c,1, Cheng-Sheng Lee a,b,c,1, Zhou Du a,b,c, Bjoern Schwer a,b,c,2, Yuxiang Zhang a,b,c, Jennifer Kao a,b,c, Jeffrey Zurita a,b,c, Frederick W Alt a,b,c,3
PMCID: PMC5828622  PMID: 29432181

Significance

Human brain neuron genomes can differ from one another, giving rise to brain mosaicism. We developed a sensitive DNA break joining assay that uses “bait” DNA breaks introduced on different chromosomes to detect endogenous “prey” DNA breaks across the mouse brain progenitor cell genome. This approach revealed 27 recurrently breaking sites, many of which occur in long neural-specific genes associated with mental illnesses and cancer. We have exploited the finding that bait and prey DSBs join more frequently when they lie on the same chromosome to increase assay sensitivity. This approach confirms previously identified breaking neural genes and identifies new ones, often with the same intriguing characteristics. Our study offers potential insights into brain diversification and disease.

Keywords: nonhomologous end-joining, neural stem cells, replication stress, neurodevelopment, recurrent DNA break clusters

Abstract

We recently discovered 27 recurrent DNA double-strand break (DSB) clusters (RDCs) in mouse neural stem/progenitor cells (NSPCs). Most RDCs occurred across long, late-replicating RDC genes and were found only after mild inhibition of DNA replication. RDC genes share intriguing characteristics, including encoding surface proteins that organize brain architecture and neuronal junctions, and are genetically implicated in neuropsychiatric disorders and/or cancers. RDC identification relies on high-throughput genome-wide translocation sequencing (HTGTS), which maps recurrent DSBs based on their translocation to “bait” DSBs in specific chromosomal locations. Cellular heterogeneity in 3D genome organization allowed unequivocal identification of RDCs on 14 different chromosomes using HTGTS baits on three mouse chromosomes. Additional candidate RDCs were also implicated, however, suggesting that some RDCs were missed. To more completely identify RDCs, we exploited our finding that joining of two DSBs occurs more frequently if they lie on the same cis chromosome. Thus, we used CRISPR/Cas9 to introduce specific DSBs into each mouse chromosome in NSPCs that were used as bait for HTGTS libraries. This analysis confirmed all 27 previously identified RDCs and identified many new ones. NSPC RDCs fall into three groups based on length, organization, transcription level, and replication timing of genes within them. While mostly less robust, the largest group of newly defined RDCs share many intriguing characteristics with the original 27. Our findings also revealed RDCs in NSPCs in the absence of induced replication stress, and support the idea that the latter treatment augments an already active endogenous process.


Classical nonhomologous end-joining (C-NHEJ) is a major DNA double-strand break (DSB) repair pathway in somatic cells that was first implicated based on its requisite role in V(D)J recombination in developing lymphocytes (1, 2). Subsequently, we found that inactivation of XRCC4, a core C-NHEJ factor (3), specifically abrogates both lymphocyte and neuronal development due to unrepaired DSBs in progenitor cells (4). Similar findings have been reported for inactivation of DNA ligase 4 (5, 6), with which XRCC4 partners in C-NHEJ end-ligation (7). The unrepaired DSBs that cause blocked lymphocyte development in C-NHEJ–deficient mice are generated in antigen receptor genes by the RAG endonuclease, the protein complex that initiates V(D)J recombination (2). The nature of the DNA breaks that cause neuronal apoptosis in the absence of XRCC4 or DNA ligase 4 has remained unresolved, however. Nonetheless, these previous studies promoted speculation that specific DSBs might play a role in brain development or disease (8, 9). In this regard, more recent studies have highlighted the potential of genomic alterations to contribute to brain diversification and disease (10, 11). Somatic “brain only” mutations and genomic variations also have been implicated in neurodevelopmental and neuropsychiatric disorders (12). Over the past decade, we have developed and refined high-throughput genome-wide translocation sequencing (HTGTS) to identify recurrent endogenous DSBs (13–15). Application of the HTGTS approach recently allowed us to map a set of recurrently breaking genes in mouse neural stem/progenitor cells (NSPCs) (16).

HTGTS maps, at nucleotide resolution, genome-wide DSBs based on their ability to translocate to a “bait” DSB introduced at a specific chromosomal location (13–15). Bait DSBs can be either introduced ectopically by designer nucleases (14) or provided by endogenous DSBs, including RAG-initiated V(D)J recombination DSBs (17–19) or clusters of activation-induced cytidine deaminase-initiated DSBs in IgH locus switch (S) regions during class switch recombination (CSR) in mature B cells (20). The ability of HTGTS to identify recurrent DSBs across the genome relies on cellular heterogeneity in 3D genome organization (2); however, due to the increased potential for interaction, the joining frequency between two separate DSBs is greatly enhanced if the two lie on the same cis chromosome (2, 14, 17), and is enhanced even further if the two lie within the same topological or loop domain (2, 17, 21). Indeed, B lymphocytes exploit enhanced DSB joining within a topological domain to promote robust joining of activation-induced cytidine deaminase-initiated S region DSBs separated by 100s of kb to effect exon shuffling during CSR (20), and developing lymphocytes exploit joining within chromosomal loops to mediate physiological V(D)J recombination (18, 19).

To identify recurrent DSB clusters (RDCs) in the NSPC genome, we applied HTGTS to NSPCs from Xrcc4−/−p53−/− mice, as this background enhances HTGTS detection of genomic DSBs due to their persistence (2). HTGTS analyses from bait DSBs on three separate chromosomes revealed 27 RDCs found by at least two of the three chromosomal baits, along with many additional “candidate” RDCs found with only a bait from one chromosome (16). Notably, all 27 RDCs occurred within genes (“RDC genes”), and these genes shared an intriguing set of characteristics, including encoding surface proteins that organize brain architecture and neuronal junctions. Moreover, human counterparts of most mouse RDC genes had already been implicated genetically in neuropsychiatric disorders and cancer (16). RDC genes also tend to be very long, moderately transcribed, and late replicating. In the latter context, most RDCs appeared only after treatment of NSPCs with aphidicolin (APH) to create mild replication stress, and even those found spontaneously were enhanced as RDCs by APH treatment (16). The common transcription and replication characteristics of RDC genes suggest that, as has been proposed for related copy number variations (CNVs) (22), collisions between RNA and DNA polymerases might play a role in recurrent RDC DSBs (23, 24).

Our previous finding of numerous RDC candidates based on interaction with only one of our three baits suggested that we could have missed many RDCs due to lower interaction with two or more of the bait locations and/or because they are weaker RDCs. Thus, based on our finding that joining of two DSBs occurs much more frequently if they lie on the same cis chromosome (2, 14, 17), we used sgRNAs specific for each of 20 mouse chromosomes (the 19 autosomes and the X chromosome) to introduce bait DSBs for the generation of HTGTS libraries from control or APH-treated Xrcc4−/−p53−/− mouse NSPCs. This analysis robustly confirmed the 27 previously identified RDCs and conclusively identified a substantial number of new RDCs that shared a similar spectrum of intriguing characteristics with the initial 27.

Results

Use of HTGTS Bait DSBs to Identify RDCs on cis Chromosomes.

Our previous RDC identification studies used CRISPR/Cas9-induced HTGTS bait DSBs on mouse chromosomes 12, 15, and 16 (16) (Fig. 1A). To more completely identify RDCs by HTGTS, we exploited our finding that joining of two separate DSBs generally occurs much more frequently if the two lie on the same cis chromosome (2, 14, 17). Thus, for more complete coverage of the genome, we designed 17 additional sgRNAs to generate HTGTS bait DSBs on each of the remaining 16 mouse autosomes and the X chromosome (Fig. 1A and SI Appendix, Table S1). Because the rejoining of two resected DSB ends in close proximity to the bait break site is the most frequently detected event in HTGTS analyses, the sgRNAs were designed to target genomic sequences that were at least 5 Mb away from known RDC genes to avoid potentially confounding effects of resection events extending into adjacent RDC genes (16). In addition, to ensure that mapping from HTGTS bait DSBs was not influenced by repetitive sequences, we selected bait genomic locales that did not contain telomeric or simple repeat sequences. To maximize RDC DSB detection efficiency, we used Xrcc4−/−p53−/− primary mouse NSPCs for the current HTGTS experiments, as deficiency for XRCC4 facilitates DSB persistence and detection of translocations (2, 16).
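The spacing rule for bait sites can be pictured as a simple interval check. The sketch below is illustrative only: the function names, the tuple layout, and the example coordinates are assumptions made for demonstration and are not the authors' design code. Only the 5-Mb minimum distance comes from the text.

```python
# Illustrative sketch: accept a candidate bait site only if it lies at least
# 5 Mb from every known RDC gene on the same chromosome.
# All coordinates below are hypothetical.

MIN_DISTANCE_BP = 5_000_000  # 5-Mb threshold stated in the text


def distance_to_interval(pos, start, end):
    """Distance in bp from pos to the interval [start, end]; 0 if pos is inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def is_valid_bait(chrom, pos, rdc_genes, min_distance=MIN_DISTANCE_BP):
    """True if (chrom, pos) is at least min_distance from all RDC genes on chrom."""
    return all(
        distance_to_interval(pos, start, end) >= min_distance
        for gene_chrom, start, end in rdc_genes
        if gene_chrom == chrom
    )


# Hypothetical RDC gene intervals: (chromosome, start, end)
example_rdc_genes = [
    ("chr6", 76_000_000, 77_200_000),
    ("chr9", 13_000_000, 13_600_000),
]

print(is_valid_bait("chr6", 70_500_000, example_rdc_genes))  # 5.5 Mb upstream
print(is_valid_bait("chr6", 74_000_000, example_rdc_genes))  # 2 Mb upstream
```

A real design step would also screen for telomeric and simple repeat sequences, which this sketch does not attempt.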

Fig. 1.

Identification of NSPC RDCs by a proximal DSB joining approach. (A) Map illustrating 19 murine autosomes and X chromosome (gray hollow bars), the 20 HTGTS bait DSB locations (arrowheads), and the 113 RDC locations. Cen, centromere; Tel, telomere. Horizontal lollipop symbols mark the locations of RDCs in the murine NSPC genome. (B) Graph showing a total of 113 RDCs either identified previously by three HTGTS bait DSBs located at chromosomes 12, 15, and 16 (green dots) or newly identified by at least two of the 20 HTGTS bait DSBs (black dots) as indicated in A. The y-axis indicates the number of different HTGTS baits significantly joined to the DSBs in each RDC; the x-axis, the number of RDCs. The genes within the top six most frequently identified RDCs are listed in the orange box, and RDCs identified by more than seven chromosomal baits are listed in the blue box. The numbers of RDCs in the orange and blue boxes are indicated. The robustness scores for the RDCs are provided in Dataset S1 and SI Appendix, Fig. S3B.

We used the same general approach to identify RDCs as described previously (16) (SI Appendix, Fig. S1). The 17 Cas9:sgRNA constructs were introduced individually into Xrcc4−/−p53−/− NSPCs to experimentally induce a bait DSB on one specific chromosome at a time. In each case, NSPCs were treated with either APH to induce RDCs or diluted DMSO as the vehicle control. Specific HTGTS primers were designed for each bait site (SI Appendix, Table S2), and each individual Cas9:sgRNA HTGTS experiment was repeated at least three times and analyzed as described previously (16). A substantial proportion of HTGTS junctions in all experiments resulted from the rejoining of two resected ends of Cas9:sgRNA-induced bait DSBs, which were distributed mostly within 10 kb of the break site in both APH-treated and control cells (SI Appendix, Table S3). In addition, HTGTS junctions representing joining to low-level APH-induced or endogenous DSBs not associated with either bait site resections or RDCs were enhanced within the cis chromosome via proximity-based mechanisms for each independent chromosome bait site, as expected (14, 16) (SI Appendix, Table S3).

To identify APH-induced RDCs, we applied our RDC identification pipeline (16) (SI Appendix, Methods) to HTGTS libraries generated from DNA of the DMSO- or APH-treated Xrcc4−/−p53−/− NSPCs expressing various independent bait sgRNAs (16). For these analyses, we included independent Chr-X-sgRNA libraries created in both male and female NSPCs, and we also reanalyzed our previously created Chr-12-sgRNA-1, Chr-15-sgRNA-1, and Chr-16-sgRNA-2 libraries to achieve a whole-genome view based on the full set of libraries (16) (Fig. 1A). This RDC discovery analysis revealed a total of 113 RDCs (Fig. 1A, SI Appendix, Fig. S1, and Dataset S1). We also applied a multiple-comparison correction test and confirmed that all 113 RDCs were free from false-positive calls (SI Appendix, Methods and Dataset S1). These 113 RDCs included the previously described 27 RDC genes identified by at least two of the three chromosome 12, 15, and 16 HTGTS baits, along with 58 previously identified RDC candidates identified by only one of the three baits (16) (Dataset S1) and 28 additional RDCs. These 113 NSPC RDCs were distributed across all autosomes as well as the X chromosome (Fig. 1A and Dataset S1). In this regard, we found RDCs on chromosomes not previously identified as RDC-containing, including chromosomes 2, 3, 7, 11, 13, and X (16) (Fig. 1A). We did not assay a bait on the Y chromosome, since no cluster on it was identified by baits on any other chromosome; thus, even candidate RDCs on it would not qualify as RDCs based on our current criterion of being identified by at least two independent baits. As for previously identified RDC genes, DSBs in newly identified RDC genes tended to map across the length of the RDC gene transcription units (Figs. 2 and 3; data available in the GEO database, accession no. GSE106822).
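The criterion that an RDC must be identified by at least two independent baits can be expressed as a small tally over per-bait cluster calls. This is a minimal sketch under stated assumptions: the input format (pairs of cluster name and bait chromosome) and all names are hypothetical, and the statistical calling done by the actual SICER- and MACS-based pipeline is not reproduced here.

```python
from collections import defaultdict

MIN_BAITS = 2  # criterion from the text: at least two independent baits


def count_baits_per_cluster(calls):
    """Map each cluster to the number of distinct baits that detected it.

    calls: iterable of (cluster_name, bait_chromosome) pairs, one pair per
    significant call in a bait library (hypothetical input format).
    """
    baits = defaultdict(set)
    for cluster, bait in calls:
        baits[cluster].add(bait)
    return {cluster: len(found_by) for cluster, found_by in baits.items()}


def qualifying_rdcs(calls, min_baits=MIN_BAITS):
    """Return the sorted names of clusters detected by at least min_baits baits."""
    counts = count_baits_per_cluster(calls)
    return sorted(name for name, n in counts.items() if n >= min_baits)


# Hypothetical calls: cluster_A is seen from two baits, cluster_B from one.
example_calls = [
    ("cluster_A", "chr12"),
    ("cluster_A", "chr15"),
    ("cluster_A", "chr12"),  # repeat call from the same bait counts once
    ("cluster_B", "chr6"),
]

print(qualifying_rdcs(example_calls))

# The composition of the final set reported in the text is self-consistent:
assert 27 + 58 + 28 == 113
```

Counting distinct baits rather than raw calls keeps replicate libraries from the same bait from inflating the tally.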

Fig. 2.

Joining frequency of genome-wide HTGTS bait DSBs to strong and weak RDC DSBs. (A, Upper) An HTGTS bait DSB (orange box) induced by Cas9:sgRNA (Chr-12-sgRNA-1, black arrowhead) joining to the prey DSBs (blue box) in the Npas3 RDC ∼40 Mb downstream of the bait DSB on chromosome 12. Cen, centromere; Tel, telomere. The green arrowhead indicates the HTGTS primer; dashed line/arrows indicate joining possibilities between the bait DSB and RDC DSBs. (A, Lower) The HTGTS prey junctions (black bars) distributed across the Npas3 gene and its surrounding genomic area on chromosome 12 in the APH-treated (+) or control (−) Xrcc4−/−p53−/− NSPCs. Yellow rectangles indicate overall RDC locations; RefGene (blue track) indicates the gene location. A total of 17,701 randomly selected HTGTS prey junctions from APH-treated or control experiments were plotted. (B, Upper) The joining of transchromosomal Cas9:sgRNA-induced HTGTS bait DSBs to the Npas3 RDC DSBs on chromosome 12. (B, Lower) The HTGTS prey junctions distributed across the Npas3 gene and its surrounding area. Each panel represents an independent experiment using a bait on the indicated chromosome. The locations of RDCs identified by each chromosomal bait are indicated with yellow boxes and generally represent a subset of the longest RDCs identified. (C, Upper) The joining of the HTGTS bait DSB induced by Cas9:sgRNA (Chr-12-sgRNA-1) to the Dgkb RDC DSBs ∼15 Mb downstream of the bait DSB. (C, Lower) Graph showing HTGTS prey junctions distributed across and surrounding the Dgkb gene. (D, Upper) Graph illustrating the joining of a transchromosomal Cas9:sgRNA-induced HTGTS bait DSB to the Dgkb RDC DSBs on chromosome 12. (D, Lower) HTGTS prey junctions distributed in and around the Dgkb gene. Panels are organized as described for B. (Scale bars: 1 Mb.) *Panels generated using previously published HTGTS datasets (16). Dataset S1 presents the MACS-based adjusted P values of RDCs.

Fig. 3.

Proximal intrachromosomal HTGTS bait DSBs facilitate spontaneous RDC identification. (A, Upper) The joining of an HTGTS bait DSB to Ctnna2 RDC DSBs located ∼5 Mb downstream of the bait DSB on chromosome 6. The figure is organized as described in Fig. 2A. (A, Lower) The HTGTS junction distribution across the Ctnna2 gene and its surrounding area in Xrcc4−/−p53−/− NSPCs. RDC areas are shaded in yellow. Ctnna2 RDC HTGTS junctions are significantly enriched in both the DMSO-treated control (P = 4.5 × 10^−2) and APH-treated experiments (P = 7.0 × 10^−75). Transcription activity of the Ctnna2 gene and its surrounding genomic DNA by GRO-seq is shown in the centromeric-to-telomeric direction (blue) and the telomeric-to-centromeric direction (red). The scale indicates normalized GRO-seq counts [reads per kilobase per million (RPKM)]. (B) Joining of HTGTS bait DSBs to Maml2/Mtmr2 RDC DSBs located ∼12 Mb upstream on chromosome 9. The panel is organized as described for A. Maml2/Mtmr2 RDCs are significant in the control (P = 4.5 × 10^−2) and APH-treated experiments (P = 2.17 × 10^−23).

RDC DSBs Translocate to Recurrent Bait DSBs Genome-Wide.

Neuronal PAS domain protein 3 (Npas3) and limbic system-associated membrane protein (Lsamp) were the first identified RDC genes, because they qualified as RDCs even in NSPCs not treated with APH (16). Notably, Npas3 and Lsamp were also the genes most frequently detected by different members of the set of chromosome-specific baits (Figs. 1B and 2 A and B). Indeed, both genes were detected as RDCs from bait DSBs on 15 different chromosomes (Fig. 1B; Npas3 complete example shown in Fig. 2 A and B). For Lsamp, this number does not include chromosome 16, on which it lies, because the bait DSB was too close to Lsamp (∼600 kb upstream) to be called as a separate RDC by the SICER program. At the other extreme, the least frequently detected RDCs, including the previously identified diacylglycerol kinase beta (Dgkb) and oxidation resistance 1 (Oxr1) genes, were identified only by baits from one other chromosome besides their host chromosomes (Fig. 1B, Dgkb example shown in Fig. 2 C and D). Neurexin 1 (Nrxn1) and pleiotrophin (Ptn) also were robustly identified, being found by 13 different bait chromosome DSBs, as were nuclear factor I/A (Nfia) and catenin alpha 2 (Ctnna2), which were identified by 11 chromosomal bait DSBs (Dataset S1 and Fig. 1B). In the majority of cases that could be examined, RDCs were detected most robustly from HTGTS bait DSBs on the same chromosome; representative examples for Npas3, Dgkb, and mastermind-like transcriptional coactivator 2 (Maml2)/myotubularin-related protein 2 (Mtmr2) are shown in Figs. 2 and 3B and SI Appendix, Fig. S2A, respectively.

The six most frequently detected RDCs (found by 11 or more chromosome-specific baits) were identified in our previous studies with baits from only three chromosomes (16) (Fig. 1B, orange box). On the other hand, newly confirmed RDCs included one of three RDCs detected by 10 different chromosome baits (glutamate receptor, ionotropic, delta 2; Grid2), one of three detected with nine different chromosome baits (Maml2/Mtmr2), one of four found with eight chromosome baits (transcription factor 4; Tcf4), and five of 11 detected with seven chromosome baits [autism-susceptibility candidate 2 (Auts2), Neuroligin 2 (Nlgn2), low-density lipoprotein-related protein 1B (Lrp1b), semaphorin 6D (Sema6d), and Quaking/Parkin] (Fig. 1B, blue box). Overall, it is notable that 19 of the known 27 RDC genes were found with at least seven different baits, while only eight of the 86 newly detected RDCs were found in this group (Fig. 1B, blue and orange boxes). Indeed, more than one-half of the newly identified RDCs were found with only their host chromosome bait and one or two others (Fig. 1B).

Identification of Additional Spontaneous RDCs by the Intrachromosomal Bait Approach.

The use of baits on each chromosome to detect RDCs in cis greatly enhanced the detection efficiency of weaker RDCs. To assay for additional “spontaneous” RDCs that are detectable in the absence of APH treatment, we analyzed the HTGTS libraries of DMSO-treated Xrcc4−/−p53−/− NSPCs through a SICER-based approach designed to detect RDCs in the nontreated NSPC genome (16). In most cases, genomic regions detected by this method were those that contained the sgRNA off-target (OT) sites (SI Appendix, Table S4). Nevertheless, we found two genomic regions that did not contain sgRNA OT sites and were significantly enriched with HTGTS junctions (Fig. 3). A SICER-called cluster in HTGTS libraries that used the chromosome 6 bait occurred within the Ctnna2 gene on chromosome 6, which we previously found as an RDC gene only after APH treatment (16) (Fig. 3A). In this analysis, the chromosome 6 bait DSB was introduced ∼5 Mb upstream of Ctnna2. The other spontaneous RDC detected was within the Maml2/Mtmr2 gene cluster (Fig. 3B). The power of the chromosome-specific bait approach is also evident from the finding that Maml2/Mtmr2 barely reached significance after APH treatment in libraries from chromosome 4, 12, 13, 18, and 19 baits (SI Appendix, Fig. S2A). Finally, we note that, as for Lsamp and Npas3 (16), the HTGTS junction density across the Ctnna2 and Maml2/Mtmr2 RDC genes was further enhanced after APH treatment (Fig. 3 A and B).

Characteristics of Newly Identified RDCs.

All of the previously identified 27 RDCs were contained within a single gene, which in most cases was very long (16). Notably, all of the newly identified RDCs were within genes or gene clusters (Dataset S1 and Fig. 4A). We also tested RDC robustness based on a new RDC robustness test (SI Appendix, Methods). This analysis revealed that, while we identified many new RDCs, most of the previously reported RDCs that were discovered with just three separate bait DSBs (16) are among the most robust RDCs (SI Appendix, Fig. S3). We classified newly identified RDCs into three groups according to RDC DSB distribution. Group 1 RDCs occur within one, usually long, gene; this category has 76 members, including most of our previously identified RDCs (Fig. 4A and Dataset S1; examples shown in Figs. 2 and 3A). While group 1 RDCs overall show a wide range in robustness, approximately 80% of the most robust RDCs are in group 1 (SI Appendix, Fig. S3 B and C). Group 2 RDCs contain multiple genes, with at least one gene >80 kb long; this category contains 34 members with varying degrees of robustness, including five that fall into the most robust RDC category (Figs. 3B and 4A; SI Appendix, Figs. S2 A and B, and S3B and Table S5; and Dataset S1). Group 3 RDCs comprise clusters of multiple small (<20 kb) genes (examples in SI Appendix, Fig. S2 C and D); this category contains three members, one of which is robust, that together include 24 genes and two long noncoding RNA (lncRNA) sequences (Fig. 4A; SI Appendix, Figs. S2D and S3 and Table S5; and Dataset S1).
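The three-group scheme can be summarized as a rule on the lengths of the genes inside each RDC. The following is only a sketch of the stated definitions: the function name and the input format are assumptions, and the handling of multi-gene clusters that fit neither the group 2 nor the group 3 definition (returned as None here) is a choice made for illustration, since the text does not describe that case.

```python
GROUP2_MIN_LONG_GENE_BP = 80_000   # group 2: at least one gene >80 kb
GROUP3_MAX_SMALL_GENE_BP = 20_000  # group 3: multiple small (<20 kb) genes


def classify_rdc(gene_lengths_bp):
    """Assign an RDC to group 1, 2, or 3 from the lengths of its genes.

    gene_lengths_bp: list of gene lengths in bp (hypothetical input format).
    Returns None for multi-gene clusters matching neither multi-gene rule.
    """
    if not gene_lengths_bp:
        raise ValueError("an RDC must contain at least one gene")
    if len(gene_lengths_bp) == 1:
        return 1  # RDC lies within a single gene
    if any(length > GROUP2_MIN_LONG_GENE_BP for length in gene_lengths_bp):
        return 2  # multiple genes, at least one longer than 80 kb
    if all(length < GROUP3_MAX_SMALL_GENE_BP for length in gene_lengths_bp):
        return 3  # cluster of multiple small genes
    return None


# Hypothetical examples
print(classify_rdc([1_100_000]))            # single long gene
print(classify_rdc([250_000, 15_000]))      # multi-gene, one gene >80 kb
print(classify_rdc([8_000, 12_000, 5_000])) # cluster of small genes

# Group sizes reported in the text add up to the full set of RDCs:
assert 76 + 34 + 3 == 113
```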

Fig. 4.

Characteristics and functional classification of murine NSPC RDCs and their relevance to diseases. (A) Venn diagram of the indicated classes among the 113 RDCs, including 27 previously identified RDC genes (blue); 76 group 1 RDCs (pink), including 25 previously identified RDC genes; 34 group 2 RDCs (gray), including two previously identified RDC genes; and three group 3 RDCs (yellow). Additional group 1, 2, and 3 examples are shown in Figs. 2 and 3 and SI Appendix, Fig. S2. For viewing additional RDC junction distributions, all datasets are available in the GEO database (accession no. GSE106822). (B and C) Length (B) and transcription rate determined by GRO-seq (RPKM) (C) of the 27 RDC genes (blue), genes in newly identified group 1 RDCs (pink), genes >80 kb in newly identified group 2 RDCs (gray), and all genes in group 3 RDCs (orange). The number of genes analyzed in each group is indicated. Whiskers indicate minimum and maximum values; the bottom and top edges of the boxplots correspond to the 25th and 75th percentiles, respectively; and the horizontal line indicates median values. *P < 0.05; ****P < 0.0001 (Mann–Whitney U test); n.s., P ≥ 0.05. (D–F) Timing of replication of the newly identified group 1 RDC genes (D), group 2 RDCs (E), and group 3 RDCs (F). Average and SEM are shown. Details are provided in Materials and Methods and SI Appendix, Materials and Methods. Green, early; blue, late. The corresponding locations of each indicated RDC are provided in Dataset S1. (G and H) Venn diagram of the indicated gene function (G) and link to diseases (H) among the 51 newly identified group 1 RDCs. Details are provided in SI Appendix, Table S5.

The lengths of group 1 and 2 RDCs are comparable (mean, 1.07 ± 0.09 and 1.10 ± 0.13 Mb, respectively), while group 3 RDCs are much shorter (mean, 0.22 ± 0.05 Mb). The genes within the new group 1 RDCs are very long, comparable in length to most of the RDC genes described previously (16) (Fig. 4B). However, the genes in the group 2 and group 3 RDCs are significantly shorter than those in group 1 RDCs (Fig. 4B). We also extracted transcription rate information for newly identified RDCs from our existing Xrcc4−/−p53−/− NSPC GRO-seq data (16). We found that the overall transcription rates of group 1 and 2 RDC genes are not significantly different, but the transcription rate of group 3 RDCs is significantly greater than that of group 1 (Fig. 4C). Based on existing murine neural progenitor replication timing data (25), the majority of group 1 RDCs replicate late (Fig. 4D), whereas the majority of both group 2 and group 3 RDCs replicate early (Fig. 4 E and F).

To further compare the newly identified RDC genes with the 27 previously identified RDC genes, we performed gene function and disease association analyses based on the published literature (PubMed, OMIM at the National Center for Biotechnology Information website). Ten of the 51 newly identified group 1 RDCs (19.6%) harbor genes that encode cell membrane proteins that are adhesion molecules, 18 (35.3%) harbor genes implicated in synaptic functions, and 23 (45.1%) harbor genes implicated in neurogenesis (Fig. 4G and SI Appendix, Table S5). In addition, nearly all of the genes within the newly identified group 1 RDCs have been linked to diseases in mice, humans, or both, including neuropsychiatric and developmental disorders (36 of 51; 70.6%) and cancer (35 of 51; 68.6%) (Fig. 4H and SI Appendix, Table S5). Compared with the genes in group 1 RDCs, the genes in group 2 RDCs showed fewer neuronal functional correlations (16.9% implicated in neurogenesis, 22.5% implicated in synaptic function or neuronal plasticity, and 4.2% implicated as adhesion molecules), and also were less closely associated with neuropsychiatric disorders (39.4%) or cancer (38.0%) (SI Appendix, Table S5). Finally, only one of 24 small genes (Mef2d) and one of two lncRNA loci found within the group 3 clusters (Malat1) have been implicated in synaptic function (7.7%; SI Appendix, Table S5), and only two genes and two lncRNA loci (Mef2d and Cct3 and Malat1 and Neat1, respectively; SI Appendix, Table S5) have been associated with cancer. Indeed, the majority of the genes within group 3 RDCs function in general cellular processes (SI Appendix, Table S5).
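The group 1 and group 3 percentages quoted above follow directly from the stated counts. The short check below simply recomputes them; the helper name is arbitrary, and the denominator of 26 for group 3 is an assumption that the 24 small genes and two lncRNA loci are pooled.

```python
def percent(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Newly identified group 1 RDCs (n = 51), counts taken from the text
group1_total = 51
assert percent(10, group1_total) == 19.6  # adhesion molecules
assert percent(18, group1_total) == 35.3  # synaptic functions
assert percent(23, group1_total) == 45.1  # neurogenesis
assert percent(36, group1_total) == 70.6  # neuropsychiatric/developmental disorders
assert percent(35, group1_total) == 68.6  # cancer

# Group 3: one small gene plus one lncRNA locus out of 24 + 2 loci (assumed pooling)
assert percent(1 + 1, 24 + 2) == 7.7

print("all quoted percentages reproduce from the stated counts")
```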

In summary, the new group 1 RDCs harbor genes that generally share most of the characteristics of the previously discovered RDC genes (16), while a much smaller fraction of genes in group 2 RDCs share common characteristics with group 1 RDC genes, including frequent association with neuropsychiatric diseases and/or cancer. Genes in group 3 RDCs appear to be a functionally quite distinct class.

Discussion

By generating HTGTS libraries from control or APH-treated mouse Xrcc4−/−p53−/− NSPCs in which bait DSBs were introduced separately on 20 different mouse chromosomes, we confirmed 27 previously identified RDCs and identified 86 new ones. All but eight RDCs were most robustly identified by bait DSBs on their host chromosome. The exceptions were RDCs lying too close to the bait DSB (Lsamp, Prkg1, Magi2, Macrod2, Auts2, and RDC-106) or too close to a separate robust RDC (Zbtb20 and RDC-072) to allow unequivocal identification by our strict pipeline. According to our RDC robustness estimation (Fig. 3B), 19 of the 31 most robust RDCs (which all had an RDC robustness score >50) were those that we previously identified using bait DSBs on just three chromosomes (Fig. 1B and SI Appendix, Fig. S3B). These findings strongly support the idea that cellular heterogeneity in genomic 3D proximity greatly facilitates the joining of HTGTS bait DSBs to most other classes of robust recurrent endogenous DSBs or recurrent DSB clusters genome-wide (2). However, while our new approach using baits on all chromosomes allows us to detect many new lower-level RDCs, including different classes of RDCs, it is possible that some RDCs are less prone to translocate to recurrent DSBs even in distant domains on the same chromosome, due to organizational or mechanistic features of their DSBs or their repair (18). In this regard, it may be possible to further reveal such putative additional RDCs by introducing bait DSBs within the topologic domain in which they lie or by using endogenous RDC DSB clusters within a domain of candidate RDCs as bait, as we have shown for V(D)J recombination and CSR DSBs (18–20).

Based on our RDC robustness test (SI Appendix, Fig. S3), two of the six most robust RDCs, the Ctnna2 and Maml2/Mtmr2 genes, contain RDCs that are identifiable in the absence of replication stress, similar to the most robust Lsamp and Npas3 RDC genes (16) (SI Appendix, Fig. S3). These findings support the possibility that at least some RDCs have an intrinsic fragility augmented by induced replication stress (16). In this regard, replication stress induced by APH or hydroxyurea leads to CNVs, which in some cases correspond to known common fragile sites that often contain very large transcribed and late-replicating gene units (22, 26). In many cases, these late-replicating gene units correspond to the previously described RDCs (22). Based on our new dataset, such CNVs were found in 33 of 76 group 1 RDCs, 10 of 34 group 2 RDCs, and one of three group 3 RDCs in APH-treated mouse embryonic stem cells in which most of these overlapping RDC genes were actively transcribed (22). Similarly, CNVs also corresponded to 21 of the 76 group 1 and nine of the 34 group 2 murine NSPC RDC human orthologs in human fibroblasts treated with APH or hydroxyurea (22). Taken together, the foregoing findings suggest mechanistic overlaps between NSPC RDCs and CNVs occurring in other cell types in which particular RDC genes are transcribed. For group 1 RDCs and related CNVs, late S-phase transcription/replication collisions or the entry into mitotic phase with collapsed replication forks could lead to DSBs or other mechanisms of fragility (26). On the other hand, group 2 and group 3 RDCs harbor genes that are shorter and mostly early replicating and have higher transcription rates, as exemplified by the robust group 2 RDC gene Ptn (16). More than 600 early replicating fragile sites (ERFSs) have been found to occur in response to replication stress induced by hydroxyurea in B lymphocytes and are also proposed to result from conflicts between transcription and replication (27). We found that 13 of 76 group 1, eight of 34 group 2, and one of three group 3 APH-treated NSPC RDCs overlap with the B lymphocyte ERFSs. More studies are needed to determine potential relationships between certain NSPC RDCs and ERFSs.

Many newly identified group 1 RDCs also lie within long genes encoding proteins that regulate synaptic function/cell adhesion and/or have been associated with neuropsychiatric disorders or cancer (Fig. 4 G and H). The frequency of DSBs that can be captured via translocation across the robust Lsamp RDC in NSPCs has been estimated to approach the same order of magnitude as that of IgH switch region breaks in activated B lymphocytes (16). Because group 1 RDC genes typically have long introns interspersed with small exons, most of their RDC DSBs occur within introns (16) (Figs. 2 and 3). For RDC genes lying within a specific topologic domain, two RDC gene DSBs would mostly be either rejoined or joined to other DSBs within different introns of the same RDC gene, the latter of which could functionally alter encoded proteins (28). In this regard, while many RDC genes are thought to produce numerous protein isoforms via differential RNA processing (29, 30), such plasticity conceivably could be augmented or “hardwired” by intragenic rearrangements (28). In this context, 30 RDCs, including nine previously described RDCs (16), lie in CNV regions found in single human neurons (10) (SI Appendix, Fig. S4). However, due to the large size of currently characterized human neuron CNVs (∼5–15 Mb), the significance of this potential overlap awaits a higher-resolution map of single neuron genomic sequences. Regardless of any developmental role, RDC gene breakage and joining still might contribute to genetic variations associated with neuropsychiatric diseases and cancer (9, 16). Finally, to extend our findings to the potential impact of RDC DSBs on neural development and/or neural diseases, it will be necessary to assay for RDC formation during neural differentiation and to assess potential impacts of endogenous or induced replication stresses on RDC formation in vivo.

Materials and Methods

Primary NSPC Isolation, Culture, and HTGTS Bait DSB Induction.

Primary Xrcc4−/−p53−/− NSPCs were prepared as described previously (16, 31). All related animal procedures were performed under protocol 14–10-2790R approved by the Institutional Animal Care and Use Committee of Boston Children’s Hospital. Details are provided in SI Appendix, Materials and Methods.

HTGTS.

Libraries were prepared as described previously (15, 16) and sequenced (Illumina MiSeq). Reads from demultiplexed FASTQ files were aligned to the genome build mm9/NCBI37 through Bowtie2, and processed through the HTGTS pipeline (15). In each library, only unique junctions were preserved for the RDC identification. SI Appendix, Table S3 lists the number of junctions for each experiment.
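The unique-junction filtering step can be illustrated with a minimal sketch. It assumes, for illustration only, that a junction is identified by its chromosome, coordinate, and strand; the `Junction` type and `unique_junctions` function are hypothetical names, and the duplicate criteria used by the actual HTGTS pipeline (15) may differ.

```python
from typing import Iterable, List, NamedTuple


class Junction(NamedTuple):
    """A prey junction; this key (chrom, position, strand) is an assumption."""
    chrom: str
    position: int
    strand: str


def unique_junctions(junctions: Iterable[Junction]) -> List[Junction]:
    """Keep the first occurrence of each junction within one library."""
    seen = set()
    kept = []
    for junction in junctions:
        if junction not in seen:
            seen.add(junction)
            kept.append(junction)
    return kept


# Toy library with one duplicated junction (made-up coordinates).
library = [
    Junction("chr16", 41_500_000, "+"),
    Junction("chr16", 41_500_000, "+"),  # duplicate of the first entry
    Junction("chr16", 41_500_000, "-"),  # same site, opposite strand: kept
    Junction("chr12", 114_000_000, "+"),
]
print(len(unique_junctions(library)))
```

Filtering within each library, rather than across libraries, means that the same junction recovered in independent experiments still counts once per experiment.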

RDC Identification.

A SICER-based, unbiased, genome-wide method and a MACS-based method were both applied to identify APH-induced and spontaneous RDCs as described previously (16) and now further modified. A new method for evaluating relative RDC robustness was used as well. Details are provided in SI Appendix, Materials and Methods.

Replication Timing Analysis.

The median replication timing ratio of genomic regions was analyzed using Repli-chip datasets from murine 46C, TT2, and D3 ES cell-derived neural progenitor cells (25) by a custom Python script as described previously (16).
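A minimal sketch of such a calculation follows. It is not the script used in this study: the probe representation (chromosome, position, log2 early/late ratio), the function name `median_timing`, and the toy values are all assumptions made for illustration.

```python
from statistics import median
from typing import Iterable, Optional, Tuple

# Assumed probe format: (chromosome, position, log2 early/late ratio).
Probe = Tuple[str, int, float]


def median_timing(probes: Iterable[Probe], chrom: str,
                  start: int, end: int) -> Optional[float]:
    """Median timing ratio of probes inside the half-open region [start, end).

    Returns None when no probe falls in the region.
    """
    values = [ratio for c, pos, ratio in probes
              if c == chrom and start <= pos < end]
    return median(values) if values else None


# Toy probes (made-up values); in Repli-chip data, negative log2 early/late
# ratios are conventionally read as later replication.
probes = [
    ("chr16", 100, -1.2),
    ("chr16", 200, -0.8),
    ("chr16", 300, -1.0),
    ("chr16", 900, 0.5),   # outside the queried region
    ("chr12", 150, 0.9),   # different chromosome
]
print(median_timing(probes, "chr16", 0, 500))
```

The median is used rather than the mean so that a few outlying probes within a long gene do not dominate the regional value.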

Supplementary Material

Supplementary File
pnas.1719907115.sapp.pdf (805.5KB, pdf)
Supplementary File
pnas.1719907115.sd01.xlsx (67.7KB, xlsx)

Acknowledgments

We thank members of the F.W.A. laboratory for stimulating discussions. This work in the F.W.A. laboratory was supported by the Boston Children’s Hospital Department of Medicine, the Harvard Brain Initiative Collaborative Seed Fund, and the Howard Hughes Medical Institute. P.-C.W. is supported by the Charles A. King Trust Postdoctoral Research Fellowship Program, Bank of America, co-trustees. C.-S.L. is supported by a Cancer Research Institute Irvington Postdoctoral Fellowship.

Footnotes

The authors declare no conflict of interest.

Data deposition: Sequencing data have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo (accession no. GSE106822).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1719907115/-/DCSupplemental.

References

1. Taccioli GE, et al. Impairment of V(D)J recombination in double-strand break repair mutants. Science. 1993;260:207–210. doi: 10.1126/science.8469973.
2. Alt FW, Zhang Y, Meng FL, Guo C, Schwer B. Mechanisms of programmed DNA lesions and genomic instability in the immune system. Cell. 2013;152:417–429. doi: 10.1016/j.cell.2013.01.007.
3. Li Z, et al. The XRCC4 gene encodes a novel protein involved in DNA double-strand break repair and V(D)J recombination. Cell. 1995;83:1079–1089. doi: 10.1016/0092-8674(95)90135-3.
4. Gao Y, et al. A critical role for DNA end-joining proteins in both lymphogenesis and neurogenesis. Cell. 1998;95:891–902. doi: 10.1016/s0092-8674(00)81714-6.
5. Barnes DE, Stamp G, Rosewell I, Denzel A, Lindahl T. Targeted disruption of the gene encoding DNA ligase IV leads to lethality in embryonic mice. Curr Biol. 1998;8:1395–1398. doi: 10.1016/s0960-9822(98)00021-9.
6. Frank KM, et al. DNA ligase IV deficiency in mice leads to defective neurogenesis and embryonic lethality via the p53 pathway. Mol Cell. 2000;5:993–1002. doi: 10.1016/s1097-2765(00)80264-6.
7. Lieber MR. The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway. Annu Rev Biochem. 2010;79:181–211. doi: 10.1146/annurev.biochem.052308.093131.
8. Gilmore EC, Nowakowski RS, Caviness VS, Jr, Herrup K. Cell birth, cell death, cell diversity and DNA breaks: How do they all fit together? Trends Neurosci. 2000;23:100–105. doi: 10.1016/s0166-2236(99)01503-9.
9. Weissman IL, Gage FH. A mechanism for somatic brain mosaicism. Cell. 2016;164:593–595. doi: 10.1016/j.cell.2016.01.048.
10. McConnell MJ, et al. Mosaic copy number variation in human neurons. Science. 2013;342:632–637. doi: 10.1126/science.1243472.
11. Poduri A, Evrony GD, Cai X, Walsh CA. Somatic mutation, genomic variation, and neurological disease. Science. 2013;341:1237758. doi: 10.1126/science.1237758.
12. McConnell MJ, et al.; Brain Somatic Mosaicism Network. Intersection of diverse neuronal genomes and neuropsychiatric disease: The brain somatic mosaicism network. Science. 2017;356:eaal1641. doi: 10.1126/science.aal1641.
13. Chiarle R, et al. Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells. Cell. 2011;147:107–119. doi: 10.1016/j.cell.2011.07.049.
14. Frock RL, et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat Biotechnol. 2015;33:179–186. doi: 10.1038/nbt.3101.
15. Hu J, et al. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing. Nat Protoc. 2016;11:853–871. doi: 10.1038/nprot.2016.043.
16. Wei PC, et al. Long neural genes harbor recurrent DNA break clusters in neural stem/progenitor cells. Cell. 2016;164:644–655. doi: 10.1016/j.cell.2015.12.039.
17. Zhang Y, et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell. 2012;148:908–921. doi: 10.1016/j.cell.2012.02.002.
18. Hu J, et al. Chromosomal loop domains direct the recombination of antigen receptor genes. Cell. 2015;163:947–959. doi: 10.1016/j.cell.2015.10.016.
19. Zhao L, et al. Orientation-specific RAG activity in chromosomal loop domains contributes to Tcrd V(D)J recombination during T cell development. J Exp Med. 2016;213:1921–1936. doi: 10.1084/jem.20160670.
20. Dong J, et al. Orientation-specific joining of AID-initiated DNA breaks promotes antibody class switching. Nature. 2015;525:134–139. doi: 10.1038/nature14970.
21. Zarrin AA, et al. Antibody class switching mediated by yeast endonuclease-generated DNA breaks. Science. 2007;315:377–381. doi: 10.1126/science.1136386.
22. Wilson TE, et al. Large transcription units unify copy number variants and common fragile sites arising under replication stress. Genome Res. 2015;25:189–200. doi: 10.1101/gr.177121.114.
23. Helmrich A, Ballarino M, Tora L. Collisions between replication and transcription complexes cause common fragile site instability at the longest human genes. Mol Cell. 2011;44:966–977. doi: 10.1016/j.molcel.2011.10.013.
24. Aguilera A, García-Muse T. Causes of genome instability. Annu Rev Genet. 2013;47:1–32. doi: 10.1146/annurev-genet-111212-133232.
25. Hiratani I, et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 2008;6:e245. doi: 10.1371/journal.pbio.0060245.
26. Glover TW, Wilson TE, Arlt MF. Fragile sites in cancer: More than meets the eye. Nat Rev Cancer. 2017;17:489–501. doi: 10.1038/nrc.2017.52.
27. Barlow JH, et al. Identification of early replicating fragile sites that contribute to genome instability. Cell. 2013;152:620–632. doi: 10.1016/j.cell.2013.01.006.
28. Alt FW, Wei PC, Schwer B. Recurrently breaking genes in neural progenitors: Potential roles of DNA breaks in neuronal function, degeneration and cancer. In: Jaenisch R, Zhang F, Gage F, editors. Genome Editing in Neurosciences. Springer; Basel: 2017. pp. 63–72.
29. Schreiner D, et al. Targeted combinatorial alternative splicing generates brain region-specific repertoires of neurexins. Neuron. 2014;84:386–398. doi: 10.1016/j.neuron.2014.09.011.
30. Treutlein B, Gokce O, Quake SR, Südhof TC. Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing. Proc Natl Acad Sci USA. 2014;111:E1291–E1299. doi: 10.1073/pnas.1403244111.
31. Schwer B, et al. Transcription-associated processes cause DNA double-strand breaks and translocations in neural stem/progenitor cells. Proc Natl Acad Sci USA. 2016;113:2258–2263. doi: 10.1073/pnas.1525564113.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
