Abstract
Erroneous repair of DNA double-strand breaks by homologous recombination (HR) leads to loss of heterozygosity (LOH). Analysing 22 392 and 74 415 LOH events in 363 glioblastoma and 513 ovarian cancer samples, respectively, and using three different metrics, we report that LOH selectively occurs in early replicating regions; this pattern differs from the trends for point mutations and somatic deletions, which are biased toward late replicating regions. Our results are independent of BRCA1 and BRCA2 mutation status. The LOH events are significantly clustered near RNA polII-bound transcription start sites, consistent with the reports that slow replication near paused RNA polII might initiate HR-mediated repair. The frequency of LOH events is higher in the chromosomes with shorter inter-homolog distance inside the nucleus. We propose that during early replication, HR-mediated rescue of replication near paused RNA polII using homologous chromosomes as template leads to LOH. The difference in the preference for replication timing between different classes of genomic alterations in cancer genomes also provokes a testable hypothesis that replicating cells show changing preference between various DNA repair pathways, which have different levels of efficiency and fidelity, as the replication progresses.
INTRODUCTION
Loss of heterozygosity (LOH) is a common class of genomic alterations observed in cancer genomes, which occurs due to heterozygous deletion of one allele, or duplication of a maternal or paternal chromosome or chromosomal region and concurrent loss of the other allele; the latter is known as copy neutral LOH or uniparental disomy. Copy neutral LOH events arise via homologous recombination (HR)—a DNA double-strand break repair pathway (1). HR is active during and shortly after DNA replication—when sister chromatids and homologous chromosomes are easily available (2). DNA replication is spatially segregated such that some genomic regions are replicated early and others later during S phase (3). It was recently demonstrated that local DNA replication timing (RT) affects the patterns of point mutations (4–6), somatic copy number alterations (4,7,8) and rearrangements (9) in cancer and normal genomes—late replicating regions accumulate more mutations than early replicating regions (10). These findings prompt the question of whether LOH events, which are primarily replication-dependent phenomena, also show distinct patterns in the context of DNA RT.
Here, integrating genomic alteration data for 597 glioblastoma (GBM) (11) and 591 ovarian cystadenoma (12) samples from The Cancer Genome Atlas (TCGA), and DNA RT data for multiple cell types (3), we survey the RT pattern of the genomic regions affected by LOH events, and discuss the findings in the context of the temporal expression pattern of the genes involved in the HR- and non-homologous end-joining (NHEJ)-mediated repair. We then compare and contrast the RT preference for LOH events with that for point mutations and somatic copy number alterations in cancer genomes. We further analyse the findings in the context of factors that are known to contribute to replication stress during early replication, and also the nuclear localization of homologous chromosome pairs. Finally, we conclude by discussing our findings in light of erroneous HR-mediated repair during early replication.
MATERIALS AND METHODS
We mapped all data sets to human reference genome version hg18. Various genomic and epigenomic features were downloaded from the UCSC genome browser (13) as appropriate.
DNA RT data set
We obtained RT data measured using a massively parallel sequencing-based technique across multiple human cell types from Hansen et al. (3). In this study, the RT of different genomic regions was categorized as ‘constant early’, ‘constant mid’, ‘constant late’ and ‘variable across cell types’. Some regions had no RT assigned because of coverage, mappability and other technical issues. We focused on genomic regions that had constant early and constant late RT across several human cell types throughout this article. Constant early and constant late RT regions covered 585.13 and 521.14 Mb of the genome, respectively. The remaining regions are termed as ‘other_RT’ regions.
LOH and other genomic alterations data sets
We have obtained genomic data for 597 GBM (11) and 591 ovarian cystadenoma samples (12) from TCGA. LOH status for the GBM and ovarian cancer samples was analysed using Illumina HumanHapMap550K and Human1MDuo microarrays, respectively, and processed by the Hudson Alpha Institute for Biotechnology using published protocols (11,12). The somatic copy number alteration data for the same samples were obtained from TCGA (11,12). We excluded the samples with potential systematic biases, and also the LOH events that were likely to occur via heterozygous deletion (Supplementary Module SM1), using our previously published approach (14). Our final data set had 22 392 and 74 415 LOH events in 363 GBM and 513 ovarian cancer samples, respectively.
Analytical approach and estimation of statistical significance
We used Bedtools (15) for calculating overlap between two genomic features (e.g. LOH and early replicating regions; getOverlap function) and for estimating intersection between multiple features (multiIntersectBed function). Some genomic regions did not have any RT assigned because of mappability, coverage and other technical issues. Hence, some LOH end points often had no RT assigned, although the genomic regions in their proximity did. To maximize biologically relevant overlap between the data sets, we considered a window of 1 kb centred on each LOH end point and assigned the RT of that window as the RT of the end point.
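We calculated (i) the observed (or expected) proportion of LOH end points in early RT regions as:
$$P_{\text{end points}} = \frac{N_{\text{end points in early RT}}}{N_{\text{end points in early RT}} + N_{\text{end points in late RT}}}$$
(ii) the observed (or expected) proportion of LOH events with both end points in early RT regions as:
$$P_{\text{both end points}} = \frac{N_{\text{LOH events with both end points in early RT}}}{N_{\text{LOH events with both end points in early or late RT}}}$$
and (iii) the observed (or expected) proportion of the length of LOH events in early RT regions as:
$$P_{\text{length}} = \frac{L_{\text{LOH in early RT}}}{L_{\text{LOH in early RT}} + L_{\text{LOH in late RT}}}$$
where $N$ denotes a count and $L$ the total length, in base pairs, of the LOH-affected regions in the given RT class.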
We found that excluding the LOH end points and the stretches of LOH-affected genomic regions that reside in other_RT regions provides a more meaningful interpretation of the observed preference for early (or late) RT regions, and of its statistical significance, than analyses that include other_RT regions.
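The three metrics can be illustrated with a minimal Python sketch. It is not the Bedtools workflow used in the study: it assumes a single chromosome, non-overlapping half-open intervals within each RT class, and a hypothetical tie rule in which an end point whose 1 kb window touches both classes takes the class with the larger overlap. It also assumes that the second metric is taken over events with both end points in either constant early or constant late regions. All coordinates are made up for illustration.

```python
WINDOW = 1000  # 1 kb window centred on each LOH end point, as in the text


def overlap_bp(regions, start, end):
    """Bases of the half-open interval [start, end) covered by `regions`.

    Assumes the intervals in `regions` do not overlap each other.
    """
    return sum(max(0, min(end, e) - max(start, s)) for s, e in regions)


def endpoint_rt(pos, early, late):
    """Label an end point from its window; returns None for other_RT.

    Assumption: if the window touches both classes, the larger overlap wins.
    """
    lo, hi = pos - WINDOW // 2, pos + WINDOW // 2
    e, l = overlap_bp(early, lo, hi), overlap_bp(late, lo, hi)
    if e == 0 and l == 0:
        return None
    return "early" if e >= l else "late"


def metrics(loh_events, early, late):
    """Return the three proportions; other_RT is excluded from each.

    Assumes at least one end point and one event fall in early or late RT.
    """
    labels = [(endpoint_rt(s, early, late), endpoint_rt(e, early, late))
              for s, e in loh_events]
    # (i) end points in early RT among end points in early or late RT
    ends = [x for pair in labels for x in pair if x is not None]
    prop_endpoints = ends.count("early") / len(ends)
    # (ii) events with both end points early among events with both assigned
    both = [pair for pair in labels if None not in pair]
    prop_both = sum(pair == ("early", "early") for pair in both) / len(both)
    # (iii) LOH length in early RT over LOH length in early or late RT
    early_len = sum(overlap_bp(early, s, e) for s, e in loh_events)
    late_len = sum(overlap_bp(late, s, e) for s, e in loh_events)
    prop_length = early_len / (early_len + late_len)
    return prop_endpoints, prop_both, prop_length


# Hypothetical toy data (coordinates in bp on one chromosome)
early = [(0, 10000)]
late = [(20000, 30000)]
loh_events = [(1000, 5000), (8000, 22000), (25000, 28000), (15000, 16000)]

result = metrics(loh_events, early, late)
print(result)
```

In the toy data the last event lies entirely in an unassigned region, so it contributes to none of the three proportions.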
We estimated statistical significance of the observed overlaps between LOH and RT patterns using permutation analysis with 10 000 iterations. It was shown that permutation allows preservation of higher-order genomic structures, and hence provides a more realistic P-value compared with other statistical tests. During the permutation analysis, we performed genome-wide shuffling using the shuffleBed function of Bedtools (15) with default seed and other parameters, keeping the length of the LOH events unchanged. We also used two alternative permutation strategies: shuffleBed with the -chrom option to permute the LOH events within their respective chromosomes, and shuffleBed with the -chrom and -excl options to permute the LOH events within their respective chromosomes after excluding selected (e.g. centromeric) regions.
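The logic of the length-preserving permutation test can be sketched in plain Python. This is a stand-in for shuffleBed rather than a reproduction of it: the single linear "genome", the function names, the reduced iteration count and the add-one correction in the empirical P-value are all assumptions made for illustration.

```python
import random


def overlap_bp(regions, start, end):
    """Bases of the half-open interval [start, end) covered by `regions`."""
    return sum(max(0, min(end, e) - max(start, s)) for s, e in regions)


def early_fraction(events, early, late):
    """Proportion of LOH length in early RT, ignoring unassigned regions."""
    e = sum(overlap_bp(early, s, t) for s, t in events)
    l = sum(overlap_bp(late, s, t) for s, t in events)
    return e / (e + l) if (e + l) else 0.0


def shuffle_events(events, genome_len, rng):
    """Place each event at a uniform random position, keeping its length."""
    shuffled = []
    for s, t in events:
        length = t - s
        new_start = rng.randrange(0, genome_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_p(events, early, late, genome_len, n_iter=1000, seed=0):
    """One-sided empirical P-value for enrichment in early RT regions."""
    rng = random.Random(seed)
    observed = early_fraction(events, early, late)
    hits = sum(
        early_fraction(shuffle_events(events, genome_len, rng), early, late)
        >= observed
        for _ in range(n_iter)
    )
    # Add-one correction (an assumption here) avoids reporting P = 0
    return observed, (hits + 1) / (n_iter + 1)


# Hypothetical toy genome: 100 kb, with one early and one late RT block
early = [(0, 20000)]
late = [(50000, 100000)]
events = [(i * 900, i * 900 + 500) for i in range(20)]  # all inside early RT

observed, p_value = permutation_p(events, early, late, genome_len=100000)
print(observed, p_value)
```

With `n_iter` iterations the smallest attainable P-value under this correction is 1/(n_iter + 1), which is why a bound such as P < 1 × 10^-3 requires at least 1000 iterations.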
Cell cycle-related gene expression
We obtained data on dynamic expression patterns of the genes during the cell cycle from multiple independent experiments in baker’s yeast (16–19) and human cell lines (20) as deposited in Cyclebase 2.0 (21). Peak time, periodicity and regulation of these genes were calculated using methods proposed by Gauthier et al. (22), and archived in the database. In brief, the P(per) was defined (22) as the chance of observing as great a periodicity by random shuffling of the individual time-point values of the expression profile. First, a Fourier score was obtained for each gene profile. Next, simulated profiles were generated from random shuffling of the data within the original profile 1 million times. The relative proportion of simulated profiles whose Fourier scores were greater than or equal to the gene’s true Fourier score was reported as the P(per). Due to the normalization techniques used by Gauthier et al. (22), P(per) can take values >1. A small P(per) indicated a highly periodic pattern of expression. If the expression data for a given gene were available from multiple experiments, the P(per) from individual experiments were multiplied to generate the final P(per).
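$$P(\mathrm{per}) = \prod_{i=1}^{n} P(\mathrm{per})_i$$
where $n$ is the number of experiments with expression data for the gene.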
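The shuffling procedure behind P(per) can be sketched as follows. The Fourier score used here (the magnitude of the sine and cosine projections at the cell cycle period) is an assumed form, the normalization applied by Gauthier et al. is omitted, and the iteration count is reduced from 1 million to keep the example fast; the function names and the toy profile are hypothetical.

```python
import math
import random


def fourier_score(times, values, period):
    """Magnitude of the profile's projection onto one cell cycle period."""
    w = 2 * math.pi / period
    s = sum(math.sin(w * t) * v for t, v in zip(times, values))
    c = sum(math.cos(w * t) * v for t, v in zip(times, values))
    return math.hypot(s, c)


def p_per(times, values, period, n_iter=2000, seed=0):
    """Fraction of shuffled profiles scoring at least as high as the real one."""
    rng = random.Random(seed)
    observed = fourier_score(times, values, period)
    shuffled = list(values)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # permute values across the time points
        if fourier_score(times, shuffled, period) >= observed:
            hits += 1
    return hits / n_iter


def combined_p_per(p_values):
    """Multiply per-experiment values, as described in the text."""
    return math.prod(p_values)


# Hypothetical profile: two cycles of a perfectly periodic gene
times = list(range(0, 200, 10))
values = [math.sin(2 * math.pi * t / 100) for t in times]

p_single = p_per(times, values, period=100)
p_final = combined_p_per([0.5, 0.1])
print(p_single, p_final)
```

Because this sketch omits the normalization step, its values stay within [0, 1]; the normalized values archived in the database can exceed 1, as noted above.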
The P(reg) was defined (22) as an estimate of the magnitude of variance between experiments. First, for a given gene the standard deviation was obtained for the log-ratio profile. Then simulated profiles were created from the global distribution for 1 million iterations. The proportion of shuffled profiles whose standard deviations were greater than or equal to the gene’s standard deviation was calculated, and normalized to create the final P(reg). Due to the normalization techniques used by Gauthier et al. (22), P(reg) can take values >1. A small P-value for regulation indicated low variance and a strongly regulated gene.
Peak time was calculated as a percentage, with both 0 and 100 representing the M/G1 transition phase during the cell cycle. To compute a peak time for a single gene across all available experiments, a sine wave was fitted to the combined expression profile, and the time scale was ‘shifted’ such that time was represented as a fraction of the cell cycle. In those cases where the expression pattern lacked periodicity at the cell cycle time scale, or the expression pattern between experiments was inconsistent, the peak time was reported as ‘Uncertain’.
Genomic and epigenomic features associated with replication stress
We analysed 76 common fragile sites (23), early replicating fragile sites (24), human genes obtained from Ensembl v54 (25), transcription start sites as in Ensembl v54 (25), and the sites of RNA polII occupancy in GM12878, HUVEC, HeLa and K562 cell lines (13,26,27). Because transcription start sites are a single base-pair wide, we considered a window of ±5 kb while testing for overlap in both observed and expected cases. The regions marked as ‘standard peaks’ (StdPk.NarrowPk track from the ENCODE/Stanford/Yale/USC/Harvard group) were chosen as the sites of RNA polII occupancy in the four ENCODE cell lines (26,27).
Distance between homologous chromosomes
We obtained the data on the distance between homologous chromosomes in the EJ-30 human epithelial cancer cell line from Heride et al. (28). In brief, the authors used fluorescence in situ hybridization using advanced microscopy and image analysis tools to analyse in 3D the radial positions of 10 chromosomes (chr1, chr4, chr8, chr10, chr14, chr16, chr17, chr18, chr19 and chr21). Most of the chromosomes occupied specific nuclear positions in the genome and had small variance in inter-homolog distance (28). The nuclear localization and inter-homologous distance estimated in this study were comparable with those estimated in other human cell types (28,29).
RESULTS
Data sets analysed
We used RT data measured using a massively parallel sequencing-based technique across multiple human cell types (3). Some genomic regions replicated early (or late) irrespective of cell types (noted as constant early or constant late RT regions, respectively), whereas others had variable patterns. Throughout this article we focused on the genomic regions that were classified as constant early RT (total length 585.13 Mb) and constant late RT (total length 521.14 Mb).
We obtained the LOH data as available for 597 GBM (11) and 591 ovarian cancer (12) samples from TCGA. We performed extensive quality control steps, excluding the samples with potential systematic biases (e.g. batch effects, low signal to noise ratio), and also the LOH events that were likely to occur via heterozygous deletion (see Methods and Supplementary Module SM1). Our final data set had 22 392 and 74 415 copy neutral LOH events in 363 GBM and 513 ovarian cancer samples, respectively.
Genomic regions affected by LOH events are replicated predominantly early
HR-mediated repair can initiate near one end point of LOH events and proceed unidirectionally to the other end point, or start somewhere between and proceed bidirectionally up to the two end points of the LOH events. To investigate DNA RT patterns of the LOH events after considering these possibilities, we adopted three metrics, analysing DNA RT patterns—(i) at the LOH end points, (ii) over the length of the LOH events and (iii) focusing on only the small (<10 kb) LOH events, which are likely to have the same RT throughout the length.
First metric
To study DNA RT patterns at the LOH end points, we overlaid RT data and the LOH end points from TCGA ovarian cancer samples (12) on the human reference genome (Figure 1A), and found that 40 189 and 21 621 LOH end points occurred in constant early and constant late RT regions, respectively. There were, on average, 0.134 LOH end points per megabase (Mb) per sample in the constant early RT regions, and 0.081 LOH end points per Mb per sample in constant late RT regions in the filtered ovarian cancer data set. We compared the proportion of LOH end points in early (or late) RT regions with that expected by chance using permutation analysis (see Methods for details), and found that the observed preference for LOH end points to occur in the early RT regions was significantly higher compared with that expected by chance (permutation test; P-value <1 × 10−3; Figure 1B).
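The per-megabase densities quoted above can be checked with a few lines of Python, assuming they were normalised by the total length of the constant early (585.13 Mb) or constant late (521.14 Mb) regions and by the 513 filtered ovarian cancer samples. Under that assumption the reported values are reproduced; the function name is illustrative.

```python
def endpoints_per_mb_per_sample(n_endpoints, region_mb, n_samples):
    """LOH end-point density, normalised by region length and sample count."""
    return n_endpoints / (region_mb * n_samples)


# Counts and lengths as reported in the text for the ovarian cancer data set
early_density = endpoints_per_mb_per_sample(40189, 585.13, 513)
late_density = endpoints_per_mb_per_sample(21621, 521.14, 513)

print(round(early_density, 3), round(late_density, 3))  # 0.134 0.081
```

The ratio of the two densities is about 1.66, that is, roughly two-thirds more LOH end points per Mb in constant early than in constant late regions.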
Figure 1.
(A) A schematic representation showing patterns of DNA RT at the LOH end points. Comparisons between the observed proportions of the (B) ovarian cancer and (C) GBM LOH end points in early RT regions (dashed vertical bar) with that expected when the LOHs are shuffled across the genome. (D) Proportion of the individual ovarian cancer and GBM samples, where observed proportion of LOH end points in early RT regions is higher than that in the shuffled distribution. (E) A schematic representation showing overlap between genomic regions affected by LOH events and early RT regions. Comparisons between the observed proportions of the length of the (F) ovarian cancer and (G) GBM LOH events covered by early RT regions (dashed vertical bar) with that expected when the LOHs are shuffled throughout the genome. (H) Proportion of the individual ovarian cancer and GBM samples, where observed proportion of the length of LOH events in early RT regions is higher than that in the shuffled distribution. We also obtained consistent results using alternative permutation approaches, as described in the Supplementary Module SM3.
We then repeated the analyses for TCGA GBM samples (11) and found that there were, on average, 0.055 LOH end points per Mb per sample in the constant early RT regions and 0.040 LOH end points per Mb per sample in constant late RT regions in the filtered data set. Once again, a permutation analysis revealed that in GBM samples, LOH end points also preferentially occurred in early RT regions (permutation test; P-value <1 × 10−3; Figure 1C).
To examine whether the aggregated patterns are biased by a small number of outlier samples, we repeated the analyses for individual samples. Although the small number of LOH events in individual samples made the trends noisier, we found similar patterns for a majority of the GBM and ovarian cancer samples (Figure 1D), indicating that our aggregated results were not driven by a few outlier samples.
Next, we calculated how often both the end points of LOH events in TCGA ovarian cancer (12) and GBM (11) samples resided in similar (i.e. both end points in constant early or constant late) or different (i.e. one end point in constant early and the other end point in constant late) RT regions (Methods). We found that the observed proportion of LOH events with early RT at both end points was significantly higher compared with that expected by chance (permutation test; P-value <1 × 10−3) for both the ovarian cancer and GBM data sets, and that our aggregated results were not biased by outlier samples (Supplementary Module SM2). Taken together, our findings suggest that LOH end points preferentially occurred in early RT regions.
Second metric
To study RT patterns over the length of the LOH events, we calculated the proportion of the length of the genomic region affected by LOH events that replicated early and the proportion that replicated late during the S phase (see Methods, Figure 1E). We found that a larger proportion of the genomic regions affected by LOH events replicated early than replicated late, that the trend was statistically significant compared with that expected by chance (permutation test; P-value <1 × 10−2, Figure 1F–G), and that our aggregated results were not due to certain outlier samples (Figure 1H).
Third metric
We then focused on the small (<10 kb) LOH events, the majority of which are likely to have the same RT across their length. For both ovarian cancer and GBM samples, using analytical approaches similar to those described previously, we found that the small LOH events were also significantly likely to have early RT at their end points and also over their length, compared with that expected by chance (permutation test; P-value <1 × 10−3).
Finally, we carried out extensive control calculations to account for potential caveats. We performed additional permutation analysis by: (i) randomizing the LOH events only within the same chromosomes, (ii) after excluding centromere regions and (iii) grouping the LOH events as those <1, 1–5 and >5 Mb in size, and in each case found consistent results for both the cancer data sets (Supplementary Module SM3). We found similar results irrespective of the germ line and somatic mutation status at the BRCA1 and BRCA2 loci (Supplementary Module SM3). DNA RT is correlated with many genomic and epigenomic features. Integrating chromatin (26), cytogenetic banding patterns (30) and GC content (13) data, we found that our results are consistent even after controlling for these potential covariates (Supplementary Module SM3). Integrating long-range interaction and repeat element data, we found that the two end points of LOH events frequently harbor similar repeat classes, and also are in proximity of each other in the 3D nucleus (Supplementary Module SM3); these attributes might facilitate co-operative HR-mediated repair within the same replication factory, but further studies are warranted. Taken together, our findings suggest that LOH events preferentially occur in early RT regions, and the results are similar across different cancer types, and robust toward the choice of data sets and statistical approaches.
LOH end points have different RT preferences compared with other types of genomic alteration
Different classes of genomic alterations, e.g. point mutations, somatic copy number alterations and LOH arise because of erroneous repair of DNA lesions by various DNA repair pathways. Recently, it was reported that local DNA RT also affects the patterns of point mutations (4–6) and copy number alterations (4,7,8)—point mutations are enriched in late replicating regions, and end points of somatic copy number alterations, especially deletions, occur at a high frequency in late replicating regions (10). Here we reported that, in contrast, the LOH end points selectively occur in early replicating regions in multiple cancer types (Figure 2A). The difference in RT patterns between these distinct classes of genomic alterations led us to ask whether the DNA repair pathways, especially the HR pathway that mediates LOH events (1), also show systematic changes in expression during different phases of the cell cycle.
Figure 2.
(A) Point mutations and copy number alteration (especially deletion) end points are prevalent in late RT regions, but LOH end points are more common in early replicating regions. Temporal patterns of expression of (B) RAD54B, (C) RAD54L, (D) RBBP8, (E) RAD54 and (F) RAD51 during cell cycle in yeast and human cell lines, derived from multiple independent experiments. Measure of periodicity P(per), variance between experiments P(reg) and peak time of expression are listed for each gene, as obtained from Cyclebase 2.0 (21). High periodicity and tight regulation are indicated by small values of P(per) and P(reg), respectively.
HR pathway genes are active during early replication
We surveyed the temporal pattern of expression of the genes involved in the canonical DNA double-strand repair pathways during the cell cycle in yeast and humans. We obtained data on the dynamic expression pattern of the genes in the HR (i.e. RAD50, RAD51, RAD52, RAD54, BRCA2, XRCC2, XRCC3, NBN, MRE11, MUS81, GEN1, SHFM1, RBBP8) and NHEJ pathway (i.e. KU70, KU80, LIG4, HYRC, XRCC4) from multiple independent experiments (16–20) as deposited in Cyclebase 2.0 (21). We found that mRNA expression of RAD51 and RAD54, which are important for initiation of HR-mediated repair, was high during early replication (G1-S phase) and decreased rapidly afterward (S-G2 phase); the pattern was consistent across independent experiments in both humans and yeast, and showed significant periodicity and low variance (Figure 2B–F; P(per) <1 × 10−5; see Methods for periodicity and variance calculation). Expression of other genes in the HR pathway, or those involved in the NHEJ pathway, did not show a distinct cell cycle-specific pattern (Supplementary Module SM4). Although we could not examine protein-level expression and post-transcriptional modifications on these genes, the observed findings are consistent with the model that HR-mediated repair is active even during early stages of DNA replication. This is in agreement with the report by Kadyk and Hartwell (1992) that HR-mediated DNA repair using homologous chromosomes leading to LOH can occur during the G1 stage of the cell cycle (31). It prompted us to investigate whether certain types of replication stress might trigger HR-mediated repair during early replication using homologous chromosomes in the nucleus.
LOH events overlap with sites of high RNA polII occupancy
We next investigated whether certain genomic features, which are commonly associated with replicative stress, also significantly overlap with LOH events (Figure 3A). Common fragile sites are frequent sources of genomic instability (23). We first analysed whether the LOH events significantly overlap with the 76 well-characterized common fragile sites (CFS) (23) and the newly reported early replicating fragile sites (ERFS) (24), which are implicated in somatic copy number alterations and translocations in lymphoma subtypes, respectively. Interestingly, we did not observe any significant overlap between short (<10 kb; the third metric) LOH events and either class of fragile sites (CFS or ERFS; permutation test; P-value >5 × 10−2; Figure 3B). The results were similar using the other two metrics as well.
Figure 3.
(A) Summary of different genomic features analysed in the context of LOH. (B) Comparisons of the observed (gray vertical line) extent of overlap between these features with short (<10 kb) LOH events in GBM and ovarian cancer samples, with that expected (light gray bars) when the LOHs are shuffled across the genome.
Even though transcription and replication are meant to be spatially segregated, collision of replication fork with paused RNA polII is another key cause of replicative stress (32). Combining transcription start sites and RNA polII occupancy data from multiple ENCODE cell lines, we found that the sites of high RNA polII occupancy significantly overlapped with transcription start sites (permutation test; P-value <1 × 10−4), which is consistent with the reports that promoter-proximal RNA polII pausing is common (33). Integrating LOH data from GBM and ovarian cancer samples, and using the third metric (LOH of size <10 kb), we found that both the transcription start sites of genes and the sites of RNA polII occupancy significantly overlapped with the short (<10 kb) LOH events (permutation test; P-value <1 × 10−3; Figure 3B). We did not have spatial resolution in the data sets to test whether RNA polII paused at pre-initiation complex or in early elongation contributed to this pattern. Nevertheless, a vast majority of these genes were expressed in the tumor samples and also in matched normal controls (12). Although the sites of RNA polII occupancy, present in two or more ENCODE cell lines, accounted for <1.5% of the early replicating regions, they overlapped with more than 10% of the short (<10 kb) LOH events in ovarian cancer samples. We found similar evidence for preferential overlap between the sites of RNA polII occupancy and LOH events using the other two metrics as well (Supplementary Module SM5), and even after adjusting for potential covariates such as GC content, gene density and size of the LOH events (permutation test; P-value <5 × 10−2). Even though we could not test every possible covariate or analyse S-phase RNA polII occupancy data from the same samples, the consistency of our findings across multiple data sets and analytical approaches hints that these issues are unlikely to bias our conclusions.
Interpreting our findings in the context of recent reports (24,33–36), it is tempting to suggest that during early S phase, replicative stress in the vicinity of RNA polII (37), which is trapped as pre-initiation complex or paused in early elongation (38), can invoke HR-mediated repair.
LOH frequency inversely correlates with inter-homologous chromosome distance
Before and during early replication (G1-S phase), homologous chromosomes frequently contribute templates for HR-mediated rescue of replication (31). Eukaryotic chromosomes occupy distinct nuclear territories, such that some pairs of homologous chromosomes (e.g. chr19) are closer to each other than other pairs (e.g. chr4; Figure 4A). Despite cell type-specific variation in nuclear organization, some chromosome pairs have shorter distance than others in the nucleus across different cell types (28,29). We investigated whether the relative frequency of LOH events differed between human chromosomes and whether relative proximity of homologous chromosomes correlated with this pattern. Indeed, overlaying inter-chromosomal distance data (28), we found that the relative frequency of LOH events per chromosome had significantly (Pearson correlation test; P-value <5 × 10−2) inverse correlation with inter-homolog distance in the nucleus (Figure 4B–C) for both GBM and ovarian cancer. Our findings were generally robust toward variation in inter-homologous chromosome distance (Supplementary Module SM6). We also obtained similar results using 3D fluorescence in situ hybridization-based inter-chromosome distance data for human fibroblast (29) (Supplementary Module SM6). It is likely that during early replication, when sister chromatids are forming, proximity of homologous chromosome copies is a key factor affecting HR-mediated repair leading to gene conversion and LOH.
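The correlation analysis can be sketched in Python as below. The chromosome values are hypothetical placeholders, not the measurements of Heride et al. or the TCGA counts, and the sketch computes only the Pearson coefficient; the accompanying P-value would additionally require a t distribution, which is omitted here.

```python
import math


def loh_frequency(n_events, n_samples, chrom_length_bp):
    """Relative LOH frequency per sample per bp for one chromosome."""
    return n_events / (n_samples * chrom_length_bp)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical inter-homolog distances (arbitrary units) and LOH frequencies
distance = [2.0, 3.0, 4.0, 5.0]
frequency = [8.0, 6.0, 4.0, 2.0]

r = pearson_r(distance, frequency)
print(r)
```

A negative coefficient corresponds to the pattern described above: chromosomes whose homologs lie closer together show a higher relative frequency of LOH events.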
Figure 4.
(A) The relative frequency of LOH events (per sample, per bp) in different chromosomes for the TCGA ovarian cancer and GBM samples. (B) Different chromosomes have different inter-homolog distance inside the nucleus. Scatterplot showing distribution of inter-homolog distance of human chromosomes against the relative frequency of LOH events per chromosome in the (C) ovarian cancer and (D) GBM samples adjusted by chromosome lengths.
DISCUSSION
Taken together, we have demonstrated that (i) LOH events preferentially occur in early RT regions, which is consistent with the temporal patterns of expression of HR pathway genes, (ii) RT preference for LOH events contrasts that for point mutations and somatic copy number alterations, (iii) LOH events significantly overlap with sites of high RNA polII occupancy near transcription start sites and (iv) the relative frequency of LOH events in human chromosomes inversely correlates with the distance between homologous chromosomes in the nucleus. The preference for early RT was observed irrespective of the size of the LOH events, and mutation status of BRCA1 and BRCA2.
RNA polII pausing at pre-initiation complex or in early elongation is widespread in metazoans including humans (33,38). Paused RNA polII is known to interfere with the advancing replisome, contributing to replication stress (37) and R-loop formation (39), and to induce HR-mediated rescue (35). Although such repair is predominantly done using the sister chromatid as a template, the homologous chromosome copy may also be used, albeit at a lower frequency (31). The relative location of the homologous sequence, derived from sister chromatid or homologous chromosome, is suspected to influence the choice of template and the efficiency of HR-mediated repair (31). In light of these observations, our findings are consistent with a model that during early replication when sister chromatids are forming, HR-mediated rescue of replication forks near paused RNA polII using homologous chromosomes leads to LOH events in cancer genomes.
We also note potential caveats of the analysis. First, we prefer to take a conservative stance while inferring causality from correlation. Because data on chromatin, long-range interactions and temporal expression of the HR and NHEJ pathway genes were not derived from the same samples, we cautiously interpreted the findings. Second, we acknowledge that RNA polII pausing could be one of the factors that contribute to replicative stress leading to HR-mediated rescue (35,36), and many of these factors can be inter-related; thus, a more comprehensive survey is required to estimate their effects during early replication. Third, we were unable to consider intra-tumor heterogeneity, tissue-specific variation in RT and post-transcriptional modifications on the DNA repair pathway genes during cell cycle in our analysis. Nevertheless, our results are consistent across different tumor types, robust against the choice of data sets, size classes of LOH events, statistical approaches and potential covariates. Moreover, they are in agreement with current literature regarding the sources of replicative stress and HR-mediated repair. So, we anticipate that these issues are unlikely to bias our conclusions. Nevertheless, independent validation of our findings would establish the conclusions firmly.
Our findings highlight an important distinction between LOH and other classes of genomic alterations such as point mutations and somatic copy number alterations. Point mutations (4–6,40,41) and somatic copy number alterations (particularly deletions) (4,7,8) frequently occur in late RT regions. In contrast, we found that LOH events preferentially occur in early RT regions. In the early RT regions, which are also enriched in protein-coding genes (3), LOH-mediated gene conversion can potentially replace wild-type alleles with recessive deleterious alleles, leading to increased risk of manifestation of recessive deleterious traits, complicating the resulting phenotype in the affected individuals. Damage-induced hypermutability and error-prone repair of such regions could lead to further genetic changes (42,43). Furthermore, the difference in RT preference between different classes of genomic alterations also provokes a testable hypothesis whether replicating cells show any changing preference between various DNA repair pathways, which have different levels of efficiency and fidelity (1), as the replication progresses.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online. Supplementary Modules M1–M6.
FUNDING
University of Colorado School of Medicine; American Cancer Society [ACS-IRG 57-001-53]; National Cancer Institute Physical Sciences Oncology Center initiative [U54-CA143798]. Funding for open access charge: University of Colorado School of Medicine.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
The authors thank Robert Sclafani, David Schwartz, Nancy Maizels, James DeGregori, Kornelia Polyak and the anonymous reviewers for insightful discussions and critical comments.
REFERENCES
- 1. Chapman JR, Taylor MR, Boulton SJ. Playing the end game: DNA double-strand break repair pathway choice. Mol. Cell. 2012;47:497–510. doi: 10.1016/j.molcel.2012.07.029.
- 2. Mao Z, Bozzella M, Seluanov A, Gorbunova V. DNA repair by nonhomologous end joining and homologous recombination during cell cycle in human cells. Cell Cycle. 2008;7:2902–2906. doi: 10.4161/cc.7.18.6679.
- 3. Hansen RS, Thomas S, Sandstrom R, Canfield TK, Thurman RE, Weaver M, Dorschner MO, Gartler SM, Stamatoyannopoulos JA. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc. Natl Acad. Sci. USA. 2010;107:139–144. doi: 10.1073/pnas.0912402107.
- 4. Koren A, Polak P, Nemesh J, Michaelson JJ, Sebat J, Sunyaev SR, McCarroll SA. Differential relationship of DNA replication timing to different forms of human mutation and variation. Am. J. Hum. Genet. 2012;91:1033–1040. doi: 10.1016/j.ajhg.2012.10.018.
- 5. Liu L, De S, Michor F. DNA replication timing and higher-order nuclear organization determine patterns of single nucleotide substitutions in cancer genomes. Nat. Commun. 2013;4:1502. doi: 10.1038/ncomms2502.
- 6. Woo YH, Li WH. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nat. Commun. 2012;3:1004. doi: 10.1038/ncomms1982.
- 7. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–525. doi: 10.1038/nature11404.
- 8. De S, Michor F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 2011;29:1103–1108. doi: 10.1038/nbt.2030.
- 9. Drier Y, Lawrence MS, Carter SL, Stewart C, Gabriel SB, Lander ES, Meyerson M, Beroukhim R, Getz G. Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability. Genome Res. 2013;23:228–235. doi: 10.1101/gr.141382.112.
- 10. Donley N, Thayer MJ. DNA replication timing, genome stability and cancer: late and/or delayed DNA replication timing is associated with increased genomic instability. Semin. Cancer Biol. 2013;23:80–89. doi: 10.1016/j.semcancer.2013.01.001.
- 11. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. doi: 10.1038/nature07385.
- 12. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166.
- 13. Karolchik D, Hinrichs AS, Kent WJ. The UCSC Genome Browser. Curr. Protoc. Bioinformatics. 2012;Chapter 1:Unit 1.4.
- 14. Pedersen BS, Konstantinopoulos PA, Spillman MA, De S. Copy neutral loss of heterozygosity is more frequent in older ovarian cancer patients. Genes Chromosomes Cancer. 2013;52:740–746. doi: 10.1002/gcc.22075.
- 15. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 16. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell. 1998;2:65–73. doi: 10.1016/s1097-2765(00)80114-8.
- 17. de Lichtenberg U, Wernersson R, Jensen TS, Nielsen HB, Fausboll A, Schmidt P, Hansen FB, Knudsen S, Brunak S. New weakly expressed cell cycle-regulated genes in yeast. Yeast. 2005;22:1191–1201. doi: 10.1002/yea.1302.
- 18. Pramila T, Wu W, Miles S, Noble WS, Breeden LL. The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev. 2006;20:2266–2278. doi: 10.1101/gad.1450606.
- 19. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell. 1998;9:3273–3297. doi: 10.1091/mbc.9.12.3273.
- 20. Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou CM, Hurt MM, Brown PO, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell. 2002;13:1977–2000. doi: 10.1091/mbc.02-02-0030.
- 21. Gauthier NP, Jensen LJ, Wernersson R, Brunak S, Jensen TS. Cyclebase.org: version 2.0, an updated comprehensive, multi-species repository of cell cycle experiments and derived analysis results. Nucleic Acids Res. 2010;38:D699–D702. doi: 10.1093/nar/gkp1044.
- 22. Gauthier NP, Larsen ME, Wernersson R, de Lichtenberg U, Jensen LJ, Brunak S, Jensen TS. Cyclebase.org—a comprehensive multi-organism online database of cell-cycle experiments. Nucleic Acids Res. 2008;36:D854–D859. doi: 10.1093/nar/gkm729.
- 23. Durkin SG, Glover TW. Chromosome fragile sites. Annu. Rev. Genet. 2007;41:169–192. doi: 10.1146/annurev.genet.41.042007.165900.
- 24. Barlow JH, Faryabi RB, Callen E, Wong N, Malhowski A, Chen HT, Gutierrez-Cruz G, Sun HW, McKinnon P, Wright G, et al. Identification of early replicating fragile sites that contribute to genome instability. Cell. 2013;152:620–632. doi: 10.1016/j.cell.2013.01.006.
- 25. Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, et al. Ensembl 2013. Nucleic Acids Res. 2013;41:D48–D55. doi: 10.1093/nar/gks1236.
- 26. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, et al. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 27. Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, Heitner SG, et al. ENCODE Data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013;41:D56–D63. doi: 10.1093/nar/gks1172.
- 28. Heride C, Ricoul M, Kieu K, von Hase J, Guillemot V, Cremer C, Dubrana K, Sabatier L. Distance between homologous chromosomes results from chromosome positioning constraints. J. Cell Sci. 2010;123:4063–4075. doi: 10.1242/jcs.066498.
- 29. Bolzer A, Kreth G, Solovei I, Koehler D, Saracoglu K, Fauth C, Muller S, Eils R, Cremer C, Speicher MR, et al. Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS Biol. 2005;3:e157. doi: 10.1371/journal.pbio.0030157.
- 30. Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum. Mol. Genet. 2003;12:1037–1044. doi: 10.1093/hmg/ddg113.
- 31. Kadyk LC, Hartwell LH. Sister chromatids are preferred over homologs as substrates for recombinational repair in Saccharomyces cerevisiae. Genetics. 1992;132:387–402. doi: 10.1093/genetics/132.2.387.
- 32. Mortusewicz O, Herr P, Helleday T. Early replication fragile sites: where replication-transcription collisions cause genetic instability. EMBO J. 2013;32:493–495. doi: 10.1038/emboj.2013.20.
- 33. Wu JQ, Snyder M. RNA polymerase II stalling: loading at the start prepares genes for a sprint. Genome Biol. 2008;9:220. doi: 10.1186/gb-2008-9-5-220.
- 34. Saleh-Gohari N, Bryant HE, Schultz N, Parker KM, Cassel TN, Helleday T. Spontaneous homologous recombination is induced by collapsed replication forks that are caused by endogenous DNA single-strand breaks. Mol. Cell. Biol. 2005;25:7158–7169. doi: 10.1128/MCB.25.16.7158-7169.2005.
- 35. Michel B, Flores MJ, Viguera E, Grompone G, Seigneur M, Bidnenko V. Rescue of arrested replication forks by homologous recombination. Proc. Natl Acad. Sci. USA. 2001;98:8181–8188. doi: 10.1073/pnas.111008798.
- 36. Iraqui I, Chekkal Y, Jmari N, Pietrobon V, Freon K, Costes A, Lambert SA. Recovery of arrested replication forks by homologous recombination is error-prone. PLoS Genet. 2012;8:e1002976. doi: 10.1371/journal.pgen.1002976.
- 37. Bermejo R, Lai MS, Foiani M. Preventing replication stress to maintain genome stability: resolving conflicts between replication and transcription. Mol. Cell. 2012;45:710–718. doi: 10.1016/j.molcel.2012.03.001.
- 38. Adelman K, Lis JT. Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet. 2012;13:720–731. doi: 10.1038/nrg3293.
- 39. Aguilera A, Garcia-Muse T. R loops: from transcription byproducts to threats to genome stability. Mol. Cell. 2012;46:115–124. doi: 10.1016/j.molcel.2012.04.009.
- 40. Schuster-Bockler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488:504–507. doi: 10.1038/nature11273.
- 41. Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, Sunyaev SR. Human mutation rate associated with DNA replication timing. Nat. Genet. 2009;41:393–395. doi: 10.1038/ng.363.
- 42. Burch LH, Yang Y, Sterling JF, Roberts SA, Chao FG, Xu H, Zhang L, Walsh J, Resnick MA, Mieczkowski PA, et al. Damage-induced localized hypermutability. Cell Cycle. 2011;10:1073–1085. doi: 10.4161/cc.10.7.15319.
- 43. De S, Babu MM. A time-invariant principle of genome evolution. Proc. Natl Acad. Sci. USA. 2010;107:13004–13009. doi: 10.1073/pnas.0914454107.