The Arabidopsis genome replicates in two noninteracting compartments during early/mid and late S phase.
Abstract
Eukaryotes use a temporally regulated process, known as the replication timing program, to ensure that their genomes are fully and accurately duplicated during S phase. Replication timing programs are predictive of genomic features and activity and are considered to be functional readouts of chromatin organization. Although replication timing programs have been described for yeast and animal systems, much less is known about the temporal regulation of plant DNA replication or its relationship to genome sequence and chromatin structure. We used the thymidine analog, 5-ethynyl-2′-deoxyuridine, in combination with flow sorting and Repli-Seq to describe, at high resolution, the genome-wide replication timing program for Arabidopsis (Arabidopsis thaliana) Col-0 suspension cells. We identified genomic regions that replicate predominantly during early, mid, and late S phase, and correlated these regions with genomic features and with data for chromatin state, accessibility, and long-distance interaction. Arabidopsis chromosome arms tend to replicate early while pericentromeric regions replicate late. Early and mid-replicating regions are gene-rich and predominantly euchromatic, while late regions are rich in transposable elements and primarily heterochromatic. However, the distribution of chromatin states across the different times is complex, with each replication time corresponding to a mixture of states. Early and mid-replicating sequences interact with each other and not with late sequences, but early regions are more accessible than mid regions. The replication timing program in Arabidopsis reflects a bipartite genomic organization with early/mid-replicating regions and late regions forming separate, noninteracting compartments. The temporal order of DNA replication within the early/mid compartment may be modulated largely by chromatin accessibility.
In each cell cycle, a cell must produce two identical copies of its genome during S phase. Most of our knowledge about genome replication in higher eukaryotes comes from studies in animals. These studies have indicated that replication is a temporally ordered process (Gilbert, 2010) that occurs in large domains of coordinate replication (replication domains) with multiple origins firing in concert during S phase (MacAlpine et al., 2004; Desprat et al., 2009; Schwaiger et al., 2009; Farkash-Amar and Simon, 2010). The replication timing programs of several metazoan genomes have been characterized (Schübeler et al., 2002; Woodfine et al., 2004; Hiratani et al., 2008; Schwaiger et al., 2009; Hansen et al., 2010). These studies revealed that early replicating chromatin is rich in genes, transcriptionally active, and contains euchromatic histone modifications (Schübeler et al., 2002; Woodfine et al., 2004; Hiratani and Gilbert, 2009; Hansen et al., 2010; Eaton et al., 2011; Lubelsky et al., 2014). Conversely, late replicating chromatin is enriched for heterochromatin and repetitive elements (Gilbert, 2002; Woodfine et al., 2004). Early and late replication domains correlate strongly with the “open” and “closed” compartments identified by chromatin conformation capture experiments (Ryba et al., 2010; Yaffe et al., 2010; Pope et al., 2014). These compartments, which are megabases in size, differ widely with respect to nuclease accessibility, gene density, transcriptional activity, and epigenetic marks (Lieberman-Aiden et al., 2009; Sexton et al., 2012). Hence, metazoan replication timing programs are predictive of important genomic features and can be considered functional readouts of chromatin organization (Rivera-Mulia et al., 2015).
Much less is known about how DNA replication occurs temporally and spatially across plant genomes. Although the DNA replication machinery and many aspects of chromatin biology are conserved between plants and animals, there are significant differences, such as the absence in plants of lamins and geminin (Shultz et al., 2007; Thorpe and Charpentier, 2017), which play key roles in chromatin organization and origin function in metazoans. In addition, fundamental processes such as transcriptional regulation have been shown to differ between plants and animals (Meyerowitz, 2002; Hetzel et al., 2016). There is also evidence that the spatiotemporal distribution of replicating DNA differs between plant nuclei and metazoan cells (Bass et al., 2015). Hence, we cannot assume that DNA replication programs in plants mirror those in animals (Savadel and Bass, 2017).
Arabidopsis (Arabidopsis thaliana) is an important plant model system because of its small genome, which has been fully sequenced and is well annotated, and the broad range of genomic resources (Arabidopsis Genome Initiative, 2000; Provart et al., 2016). There are genome-wide data available for Arabidopsis chromatin accessibility, histone modifications, and chromatin interactions. Because of these resources, Arabidopsis is an ideal system for examining DNA replication programs in plants.
Our group previously published a description of the replication timing program for Arabidopsis chromosome 4 (Lee et al., 2010). In that study, Arabidopsis suspension cells were pulse-labeled with 5-bromo-2′-deoxyuridine (BrdU) for 1 h followed by nuclei separation based on DNA content using flow cytometry. Replication was examined in three nuclei populations corresponding to early, mid, and late S phase, using a 1-kb tiling microarray platform. While both the spatial resolution and labeling pulse length were comparable to similar studies with metazoans (Schübeler et al., 2002; Hiratani et al., 2008), no major differences were observed between the early and mid S-phase replication profiles for Arabidopsis. This finding led us to conclude that, unlike in animals, the order of origin activation in Arabidopsis in early and mid S phase is stochastic, and replication of euchromatin does not follow a strict temporal pattern. In contrast to the Arabidopsis chromosome 4 replication timing profiles, we recently observed differences between the early and mid S-phase profiles during replication of the maize genome (Wear et al., 2017). To address these conflicting results, we reexamined the Arabidopsis replication program, focusing more closely on sequences replicating in early and mid S phase. In the process, we adapted our flow cytometry strategy and the Repli-Seq methodology to better distinguish between early and mid S replication. We generated a high-resolution replication timing map for the entire Arabidopsis genome, and correlated the replication program with chromatin state, accessibility, and interaction data.
RESULTS
Improving Resolution of the Replication Timing Protocol
We examined several factors that might improve our ability to resolve differences in replication timing. These included the analysis platform used to detect newly synthesized DNA, the thymidine analog used to pulse label nascent DNA, the length of the labeling period, and the flow cytometry strategy for separating nuclei in different stages of S phase.
Initially, we sought to improve our ability to distinguish sequences replicating in early versus mid S phase by using a more advanced NimbleGen microarray platform with shorter, more closely spaced probes to better resolve replicating DNA sequences. In this experiment, we used the same protocol as our previous study, with the exception of the array platform. The replication timing profiles generated using the NimbleGen arrays show more fine structure than those obtained from the tiling arrays (Supplemental Fig. S1). However, the overall replication profiles are very similar for the two array platforms with early and mid S-phase signals showing very high correlations on both platforms (Supplemental Fig. S2). Thus, we concluded that probe resolution was not a major factor in our ability to distinguish early and mid S-phase replication.
We then focused on reducing the labeling time and obtaining better separation of early and mid-replicating nuclei (Fig. 1, A and B; Bass et al., 2014; Wear et al., 2016). Arabidopsis cultured cells were pulse labeled with the thymidine analog, 5-ethynyl-2′-deoxyuridine (EdU) for 10 min. After formaldehyde fixation and nuclei isolation, the incorporated EdU was conjugated with Alexa Fluor 488 (AF488) azide using “Click” chemistry (Salic and Mitchison, 2008). Nuclei were then stained with DAPI and fractionated by flow cytometry using a two-color sort strategy based on EdU incorporation (AF488) and DNA content (DAPI). EdU-labeled nuclei were fractionated into early, mid and late S-phase populations (Fig. 1B, top), while nonreplicating G1 and G2 nuclei were excluded based on the absence of EdU. The S-phase gates were assigned by dividing the EdU arc into five equal sections based on DNA content, with the first, third, and fifth sections defined as early, mid, and late S phase. This resulted in narrower, better separated sorting gates than in our previous experiments, reducing the range of total DNA content within each S-phase fraction and minimizing cross contamination between fractions. Reanalysis of a sample from each fraction by flow cytometry showed minimal overlap (<5%) between nuclei populations (Fig. 1B, bottom; Supplemental Fig. S3). Sorted, unlabeled G1 nuclei were retained as a source of nonreplicating reference DNA. Newly replicated sequences were enriched from each population of nuclei by shearing isolated DNA followed by immunoprecipitation with an anti-AF488 antibody. Immunoprecipitated fractions were used for library construction and sequenced using an Illumina HiSeq 2000 platform. Experiments were performed in triplicate, with the three biological replicates generated at different times.
Figure 1.
Experimental pipeline for Arabidopsis replication timing experiment. A, Workflow from EdU labeling of Arabidopsis suspension cells through sequencing. B, Fluorescence-activated sorting of S-phase nuclei populations and reanalysis. Top, A bivariate plot showing EdU incorporation into Arabidopsis nuclei as a function of DNA content. Intervals of fluorescence intensity (sorting gates) used to select nuclei in early, mid, and late S phase are shown by the blue, green, and red rectangles, respectively. The black rectangle shows unlabeled G1 nuclei. DNA from the G1 population was sequenced to provide a reference with uniform 2C DNA content. Bottom, The overlaid histograms of the reanalyzed nuclei from each of the selected populations, to assess the degree of cross-contamination between fractions. C, Distribution of raw read densities for each sequencing library in a representative 1-Mb region of chromosome 1. To check for consistency, read densities were scored individually for each biological replicate and visualized with IGV without further normalization. DNA libraries from independent biological replicates show essentially identical profiles. (All scales are 0–498 reads.)
Repli-Seq Data Analysis
Sequencing reactions generated between 22 and 43 million 101-bp paired-end reads, resulting in very high coverage (15–31×) of the Arabidopsis genome (Supplemental Table S1). Reads were mapped to the Arabidopsis reference genome (TAIR10), and their distribution visualized with Integrative Genomics Viewer (IGV; Thorvaldsdóttir et al., 2013). Each sorted fraction showed a distinctive replication pattern that was consistent across biological replicates (Fig. 1C). Replication signals were detected across the entire Arabidopsis genome in samples from all three portions of the S phase, although with very low intensity in some regions (Fig. 2B; Supplemental Fig. S4). The EdU labeling patterns were distinct for each time and highly similar between bioreplicates, suggesting that most of the low intensity signal is not background.
Figure 2.
Distribution of replication intensity on chromosome 1 in early, mid, and late S phase. A, Schematic representation of chromosome 1. The TE-rich pericentromere region is shaded black with the centromere position marked by a yellow circle. B, Visualization of the density of sequencing reads on chromosome 1 for cells in early (blue), mid (green), and late (red) S phase. Relative intensities of replication for each time define the predominant RT for different regions, as further illustrated in the small panels below the main profile. (Scale 0–4.15 normalized signal ratio in the main panel, 0–4.0 in the small panels.) C, Replication intensity profile of a representative 1-Mb region of chromosome 1. Some sequences replicate mainly in mid S phase. To illustrate the difference in replication intensity between early and mid S phase, the profiles for the two time points were overlaid using IGV. The overlap is indicated with dark green. Regions of alternating early and mid-replication are marked by the blue and green line above the overlay. The lower track shows the replication intensity in late S phase. (Scale 0–2.17 normalized signal ratio.) D, Plots showing the fraction of total replication as a function of relative distance from the centromere for early, mid, and late S phase. The two arms of each chromosome were considered separately, and replication activity was aggregated in intervals corresponding to 10% of the arm length. The relative contribution of the sequences within each interval to the total replication of each arm was expressed as the percentage of total replication activity for that chromosome arm in that portion of S phase. For each 10% interval, the colored dots indicate a chromosome arm, and the black circles mark the mean value of relative contribution to replication of that interval in all arms.
We computed Spearman correlation coefficients within biological replicates and across S-phase samples (Supplemental Fig. S5). The biological replicates had high correlation values (early r2 = 0.97, mid r2 = 0.92–0.95, late r2 = 0.88–0.90), demonstrating the reproducibility of the read distributions within portions of S phase. Correlation coefficients were lower across the different portions of S phase, indicating that each portion is associated with a unique read distribution. Importantly, the read distributions of the mid S-phase samples were clearly different from the early samples (r2 = −0.02–0.11).
Replication profiles were created using the Repliscan pipeline (Zynda et al., 2017). Read counts were averaged over nonoverlapping 1-kb bins, and the total number of reads per sample (sequencing depth) was normalized to 1× genome coverage using the RPGC method (Ramírez et al., 2016). Given their high correlations, the biological replicates were combined, and sequencing depth was normalized again prior to further analysis. To account for local variation in sequenceability, the normalized read densities were divided by the corresponding densities in the nonreplicating G1 reference DNA (Supplemental Fig. S6). Additional low-amplitude variations were removed using a level-3 Haar wavelet transform (Percival and Walden, 2000) to produce smoothed, normalized read density profiles for early, mid, and late S phase.
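The depth normalization and G1 correction described above can be illustrated with a minimal sketch. This is not the Repliscan implementation: the function names and read counts are hypothetical, normalization to 1× coverage is approximated by scaling each profile to a genome-wide mean of 1, and the wavelet smoothing step is omitted.

```python
def scale_to_1x(counts):
    """Scale per-bin read counts so that the mean across bins is 1.0.

    Assumption: this stands in for normalization to 1x genome coverage.
    """
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]


def ratio_to_reference(sample, reference):
    """Divide a sample profile by the G1 reference, bin by bin.

    Bins with no reference signal cannot be corrected and are set to None.
    """
    return [s / r if r > 0 else None for s, r in zip(sample, reference)]


# Hypothetical read counts in four 1-kb bins.
early_counts = [120, 80, 40, 0]
g1_counts = [50, 50, 25, 0]

early = scale_to_1x(early_counts)
g1 = scale_to_1x(g1_counts)
early_ratio = ratio_to_reference(early, g1)
print(early_ratio)
```

Values above 1 in the resulting profile indicate bins that are enriched in the S-phase sample relative to the nonreplicating reference.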
We chose not to represent the data as a “log ratio,” as is often done in replication timing studies (Lee et al., 2010; Ryba et al., 2011; Pope et al., 2012), because low-intensity replication activity transformed to a “log ratio” would have resulted in a negative number (Supplemental Fig. S6). This creates problems for downstream analyses both computationally and conceptually. Moreover, the ability of log ratio plots to compress extreme values is not necessary here, because Repli-Seq profiles cover a limited range of values.
Distribution of Replication Activity within Chromosomes
As illustrated for Arabidopsis chromosome 1 (Fig. 2B, top; Supplemental Fig. S7), visualization of the replication activity at the whole chromosome scale shows a temporal pattern along the chromosome. Early replication intensity is stronger in the distal arms and decreases progressively toward the centromeres. Conversely, late replication is concentrated near the centromere. In contrast, replication during mid S phase is more evenly distributed. The trend is consistent across all five chromosomes, although the early replication signal is less intense in the short arms of the acrocentric chromosomes 2 and 4 (Supplemental Fig. S7).
Visualization on a smaller scale confirmed that the distal arms replicate mainly in early S and centromeric regions replicate in late S. It also revealed that the proximal arms tend to replicate predominantly in mid S phase, further supporting the trend described above (Fig. 2B, bottom). We quantified the fraction of replication at each time as a function of the distance from the centromere for all ten Arabidopsis chromosome arms. Because the chromosome arms vary in length, each arm was partitioned into ten equal size bins, and the fraction of total replication in each bin was determined at each time (Fig. 2D). When the results were plotted as a function of relative distance from the centromere, it was clear that early replication increases as the distance from the centromere increases (Fig. 2D, left). In contrast, nearly one-half of late replication occurs in the three bins closest to the centromere (Fig. 2D, right). Mid S replication is more uniformly distributed and clearly different from early replication (Fig. 2D, middle).
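The per-arm aggregation can be sketched as follows. This is an illustrative example only: the function name and the toy signal are hypothetical, and the input is assumed to be a list of per-bin replication signals for one arm, ordered from the centromere outward.

```python
def arm_fractions(signal, n_intervals=10):
    """Percent of an arm's total replication signal in each equal-length interval.

    signal: per-bin values for one chromosome arm, centromere first.
    """
    total = sum(signal)
    n = len(signal)
    fractions = []
    for i in range(n_intervals):
        start = i * n // n_intervals
        end = (i + 1) * n // n_intervals
        fractions.append(100.0 * sum(signal[start:end]) / total)
    return fractions


# Toy early S-phase signal that increases with distance from the centromere.
toy_arm = list(range(1, 21))
fractions = arm_fractions(toy_arm)
print([round(f, 1) for f in fractions])
```

Expressing each interval as a percentage of the arm total makes arms of different lengths directly comparable, which is why the intervals are relative rather than fixed in size.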
Early and mid S phase also have distinct features when examined on a fine scale. The differences were especially evident in regions where replication intensities were similar for both time points. Overlaying early and mid-replication profiles in those regions often produced a pattern of alternate early and mid local maxima (Fig. 2C, alternating blue and green line, top), suggestive of replication activity spreading over time from early replicating regions to surrounding mid replicating sequences.
Segmentation Analysis
To facilitate more detailed analysis, we partitioned the genome into segments with similar replication times (RTs) using the Repliscan pipeline (Zynda et al., 2017). This method allows for the possibility that replication of a given locus occurs in more than one time window. Our data showed that no sequence replicated exclusively in a single time window (Fig. 2B; Supplemental Fig. S4). Hence, for a given sequence, we will refer to the “prevalent” time of replication, in which the replication signal is stronger than at the other times. The Repliscan pipeline uses a two-step process to assign a prevalent RT to a 1-kb bin based on its replication intensity in early, mid, and late S phase. First, 1-kb bins were classified either as replicating or nonreplicating based on a threshold established for each chromosome, and only bins with replication intensity above the threshold were used for segmentation analysis. Second, replication signals for each 1-kb bin were divided by the maximum value for that bin, scaling the largest value to 1 and all others between 0 and 1. The bin was then labeled as replicating predominantly at the time with a normalized signal >0.5. If the bin contained one or more signals within 50% of the highest signal, they were included in the classification. Adjacent bins with the same RT were merged into larger segments. With this approach, we identified segments replicating predominantly in early S phase (E), in both early and mid S phase (EM), only in mid S phase (M), in both mid and late S phase (ML), only in late S phase (L), in early and late S phase (EL), and at all three times (EML; Fig. 3, A and B). Regions with replication signal below the threshold in all time points were not classified or included in our statistical analyses.
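The classification and merging rules described above can be sketched in a few lines. This is a simplified illustration, not the Repliscan code: the function names, the threshold value, and the toy signals are hypothetical, and a bin is assumed to count as replicating when its strongest signal exceeds the threshold.

```python
from itertools import groupby


def classify_bin(early, mid, late, threshold):
    """Return the RT class of one bin (e.g. 'E', 'EM', 'ML'), or None if unclassified."""
    signals = {"E": early, "M": mid, "L": late}
    peak = max(signals.values())
    if peak <= threshold:
        return None  # treated as nonreplicating (assumed rule)
    # Keep every time whose signal, scaled to the bin maximum, exceeds 0.5.
    return "".join(t for t in ("E", "M", "L") if signals[t] / peak > 0.5)


def merge_segments(labels, bin_size=1000):
    """Merge adjacent bins with the same class into (start, end, class) segments."""
    segments = []
    pos = 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label is not None:
            segments.append((pos * bin_size, (pos + n) * bin_size, label))
        pos += n
    return segments


# Toy (early, mid, late) signals for six consecutive 1-kb bins.
bins = [(2.0, 0.5, 0.1), (1.8, 0.6, 0.1), (1.0, 0.8, 0.1),
        (0.1, 0.1, 0.1), (0.1, 0.3, 1.2), (0.1, 0.2, 1.5)]
labels = [classify_bin(e, m, l, threshold=0.2) for e, m, l in bins]
segments = merge_segments(labels)
print(segments)
```

Because every time whose scaled signal exceeds 0.5 is retained, the seven classes (E, EM, M, ML, L, EL, EML) arise naturally from a single rule rather than from separate tests.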
Figure 3.
Genomic segments with coordinated predominant RT. A, The color code of the RT segmentation classes in Figures 3 to 7. The genome was partitioned into segments with coordinated prevalent RTs (see “Materials and Methods”). Segments were color-coded as E (early, blue), EM (early and mid, turquoise), M (mid, green), ML (mid and late, yellow), L (late, red), EL (early and late, magenta), and EML (early, mid, and late, orange). B, A karyotype view of the Arabidopsis genome showing the partitioning into replication timing segments. Chromosomes are aligned at their centromeres, as indicated with the arrows. C, Partition of total genome replication into seven RT segment classes. The percent of the genome included in each RT class is shown. D, Size distribution of replication timing segments. The size distributions of segments in each RT class are shown as box plots. The whiskers extend to 1.5 times the interquartile range. The sample size for each group is indicated above each box plot.
The cumulative genomic coverage of each RT class is shown in Figure 3C. A single prevalent time of replication was identified for more than one-half of the genome (31% E + 20% M + 7% L = 58%), while most of the rest of the genome was evenly split between EM (21%) and ML (20%). The EL and EML segment classes together constituted about 1% of the genome, and 2.5% of the genome could not be classified. Given the clear separation of the sorting gates used to generate the early, mid, and late populations (Fig. 1B, top), it is noteworthy that 41% of the Arabidopsis genome replicates in the intermediate EM and ML classes. The timing heterogeneity may reflect the presence of subpopulations of cells with related but distinct replication programs and/or allelic heterogeneity that may have arisen during prolonged cell culture (Wang and Wang, 2012). The low coverage of L segments relative to E and M is also noteworthy, because the width (range of DNA content) of the three sorting gates was equivalent (Fig. 1B, top).
The distribution of replication timing segments is similar for the five Arabidopsis chromosomes with the exception of the short arms of chromosomes 2 and 4, which have very few early segments (Fig. 3B). The distal portions of longer chromosome arms are covered with large E segments (>50–100 kb) interspersed with small EM and M segments (<50 kb), while the pericentromeric regions contain mainly ML and L segments interspersed with short M and a few EM segments (Figs. 3B and 4A). This organization mirrors the large-scale distributions of replication activity of the three time points in Figure 2B and Supplemental Figure S7, and suggests that the majority of cells complete replication of the distal arms by mid S phase. Conversely, most pericentromeric replication occurs from mid S phase through late S phase. Together, the ML and L segments account for 27% of the genome, consistent with the estimated content of heterochromatin in the Arabidopsis genome (Roudier et al., 2011; Sequeira-Mendes et al., 2014; Wang et al., 2015). The proximal portions of the chromosome arms contain mostly smaller EM, M, and ML segments.
Figure 4.
Distribution of genomic features in replication timing segments. A, Comparison of replication timing with gene and TE profiles. Coverages of genes (magenta) and TEs (brown) on chromosome 1 were determined using Araport11 annotation and expressed as percentage of bases in 1-kb nonoverlapping bins. The data were smoothed using a 50-kb window and overlaid with IGV tracks. The distribution of replication timing segments along chromosome 1 is below. B, Distribution of genomic features in RT classes. Coverage of each type of feature in the RT classes is expressed as the percent of total coverage for that feature. The percentages indicate the cumulative genome coverage of each feature. The total exceeds 100% because features can be located on both DNA strands. The arrows indicate how RT coverage for each feature departs from the expected value based on relative genome coverage (χ2 adjusted residuals; Supplemental Table S2). Upward arrows are enriched (upper tertile), downward arrows are depleted (bottom tertile), and the circles are no difference (central tertile). C, Density of genes, pseudogenes, and TEs in each RT class. The number of genomic features overlapping segments of each RT class was scored and normalized to the total genomic coverage of that feature. D, Fraction of RT classes overlapping with genomic features. The overlap between replication timing segments and genes, TEs, and unannotated regions is expressed as the percentage of total coverage for segments for each RT class. Box plots show the distribution of relative overlap between the genomic features and replication timing segments. The unannotated portion of the genome was obtained by subtracting all the annotated features from the genome. E, Replication time for four groups of repetitive sequences. Raw reads from early, mid, and late S phase were aligned (BLAST E-value < 1e-8) to telomeric-related (TEL), centromeric-related (CEN), 45S, and 5S ribosomal DNA consensus sequences. The number of reads matching the query sequences was normalized to number of G1 reads for each sequence. The error bars indicate the sd of the three biological replicates for each S-phase population.
We also analyzed the number and size distribution of the replication timing segments (Fig. 3D; Table I). E, ML, and L segments on average are larger than EM and M segments. The size differences reflect the situation described above in which small EM and M segments are interspersed between larger E and ML segments or ML and L segments (Figs. 3B and 4A). There were more EM and M segments and fewer E and ML segments (Fig. 3D). We found only 201 L segments, possibly because coverage of centromeric regions is reduced relative to the rest of the genome, owing to the limited mappability of centromeric repeats. Our analysis methods also detected several EL and EML segments, but due to their small size and low frequency, we did not include them in subsequent analyses (Fig. 3D).
Table I. Size and number of RT classes.
RT Class | Mean Size (bp) | Median Size (bp) | IQR (bp) | Segment Number |
---|---|---|---|---|
E | 59,430 | 42,000 | 64,000 | 617 |
EM | 18,810 | 14,000 | 18,000 | 1,300 |
M | 21,650 | 17,000 | 20,000 | 1,068 |
ML | 43,440 | 26,000 | 46,250 | 532 |
L | 45,030 | 20,000 | 44,000 | 201 |
EL | 8,404 | 8,000 | 9,000 | 37 |
EML | 6,553 | 5,000 | 7,000 | 121 |
IQR, interquartile range.
Replication Time and Genomic Features
To explore the relationship between replication timing and major genomic features, we queried the Repli-Seq data using Araport11 genome annotations (Cheng et al., 2017). A visual comparison of the RT segmentation data with genes and transposable elements (Fig. 4A; Supplemental Fig. S9) showed that the gene-rich chromosome arms replicate in E, EM, and M while the TE-rich pericentromeric region replicates in ML and L, as described above and reported previously (Lee et al., 2010). To obtain a more detailed picture, we computed the cumulative overlaps of genes, pseudogenes, TEs, and unannotated sequences with RT classes for the entire Arabidopsis genome. The overlaps were expressed as a percent of total genomic coverage for a given feature to adjust for abundance differences (Fig. 4B). This analysis gave results similar to those from the visual inspection of chromosome 1, with segment coverage of genes highest in E and EM and TEs highest in ML and L.
To assess whether the distributions of the genomic features across the RT classes are statistically different from the distribution across the whole genome, we built a contingency table with the absolute overlaps expressed as the number of 1-kb bins (Table II; Supplemental Table S2) and applied a χ2 test for homogeneity. Differences in the overlaps showed high statistical significance (P value < 2.2E-16, χ2 = 25,561, df = 12). However, when analyzing a large population (n = 116,063), small differences between observed and expected values almost always generate a statistically significant P value (Sullivan and Feinn, 2012). For this reason, we estimated the “effect size” of the test, defined as the “magnitude of association between categorical variables” (Kotrlik et al., 2011), by calculating Cramér's V statistic. The Cramér's V value for our data was 0.27, within the 0.2 to 0.4 range for a “moderate association” (Rea and Parker, 2005), indicating a nonrandom distribution of genomic features in the RT classes.
Table II. Overlap of genomic features and RT classes.
RT Class | Genes (kb) | Pseudogenes (kb) | TEs (kb) | Unannotated (kb) | Genome (kb) |
---|---|---|---|---|---|
E | 20,354 | 113 | 2,352 | 13,850 | 36,669 |
EM | 13,645 | 156 | 2,572 | 8,078 | 24,452 |
M | 13,896 | 222 | 3,145 | 5,857 | 23,120 |
ML | 7,803 | 373 | 9,024 | 5,912 | 23,112 |
L | 731 | 130 | 5,784 | 2,039 | 8,685 |
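As a worked check, the χ2 statistic and Cramér's V can be recomputed from the four feature columns of Table II. The sketch below uses only the Python standard library and is not the authors' analysis code; the variable names are illustrative, and the table total obtained by summing the rounded entries may differ slightly from the n reported in the text.

```python
import math

# Feature overlaps from Table II (kb): genes, pseudogenes, TEs, unannotated.
table = {
    "E":  [20354, 113, 2352, 13850],
    "EM": [13645, 156, 2572, 8078],
    "M":  [13896, 222, 3145, 5857],
    "ML": [7803, 373, 9024, 5912],
    "L":  [731, 130, 5784, 2039],
}

rows = list(table.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for i, row in enumerate(rows):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

dof = (len(rows) - 1) * (len(col_totals) - 1)

# Cramer's V rescales chi-square by sample size and table dimensions.
cramers_v = math.sqrt(chi2 / (n * min(len(rows) - 1, len(col_totals) - 1)))

print(f"chi2 = {chi2:.0f}, df = {dof}, Cramer's V = {cramers_v:.2f}")
```

Because V divides χ2 by n, it is insensitive to the very large number of bins that drives the P value toward zero, which is the reason an effect size is reported alongside the test.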
Next, we identified which genomic features and replication timing segments overlapped either more or less than expected by examining the sign and value of the χ2 adjusted residuals (Agresti, 2007). We split the adjusted residuals into tertiles and classified the relevant combinations as overrepresented (highest tertile), underrepresented (lowest tertile), or similar to expected (central tertile). The arrows and dots in Figure 4B indicate the assigned category.
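The adjusted residuals and their tertile classification can be sketched as follows, using the feature overlaps from Table II. This is an illustration under stated assumptions, not the authors' code: the adjusted residual is taken as (observed − expected) / sqrt(expected × (1 − row share) × (1 − column share)), and the tertiles are assumed to be a rank-based split of all 20 residuals, which may not match the exact procedure used for Figure 4B.

```python
import math

classes = ["E", "EM", "M", "ML", "L"]
features = ["genes", "pseudogenes", "TEs", "unannotated"]
observed = [
    [20354, 113, 2352, 13850],
    [13645, 156, 2572, 8078],
    [13896, 222, 3145, 5857],
    [7803, 373, 9024, 5912],
    [731, 130, 5784, 2039],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Adjusted residual for every RT class / feature combination.
residuals = {}
for i, rt in enumerate(classes):
    for j, feat in enumerate(features):
        expected = row_totals[i] * col_totals[j] / n
        scale = math.sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
        residuals[(rt, feat)] = (observed[i][j] - expected) / scale

# Rank-based tertiles (assumed): lowest third "under", highest third "over".
ranked = sorted(residuals, key=residuals.get)
third = len(ranked) // 3
category = {}
for rank, key in enumerate(ranked):
    if rank < third:
        category[key] = "under"
    elif rank >= len(ranked) - third:
        category[key] = "over"
    else:
        category[key] = "similar"

for key in ranked:
    print(key, round(residuals[key], 1), category[key])
```

The sign of each residual gives the direction of the departure from expectation, and its magnitude indicates how strongly that cell contributes to the overall χ2 value.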
The statistical analysis confirmed that genes are overrepresented in E, EM, and M and underrepresented in ML and L segments (Fig. 4B). Pseudogenes are enriched in ML segments. This may be due in part to the association of “processed pseudogenes,” the products of retrotransposition events (Zheng et al., 2007), with TE-rich pericentromeric regions that replicate in ML (Fig. 4A). Unannotated regions overlap more with E and less with M and ML segments relative to the total genome. The enrichment of unannotated regions in E segments may reflect the fact that the distances between genes in the distal arms are generally much longer than the spaces between TEs or between genes and TEs in the pericentromeres (Fig. 4A). It is worth noting that depletion of unannotated regions in L segments is not statistically significant and, instead, is most likely due to poor annotation of the centromeric regions.
We then determined the number of protein coding genes, pseudogenes, and TE genes in each RT class (Fig. 4C). To control for differences between RT segment coverage, the counts were normalized over the genomic coverage for each RT class and expressed as the number of elements per Mb. The densities of protein coding genes in E (287/Mb), EM (308/Mb), and M (291/Mb) are very similar, then drop in ML (173/Mb) and L (58/Mb) segments. Conversely, TE genes are very sparse in E (4/Mb), EM (10/Mb), and M (18/Mb) but densely packed in ML (79/Mb) and L (155/Mb) segments. The density of pseudogenes across the RT classes is low due to their low number in the Arabidopsis genome.
We also computed the fraction of each segment covered by genes, TEs, and unannotated sequences and generated box plots showing the range of coverage within each RT class (Fig. 4D). Consistent with the other analyses, E, EM, and M segments are gene-rich (left) and depleted in TEs (center), while ML and L segments are TE-rich and have lower gene content. The unannotated region content of different RT classes is more uniform (right), with slightly higher content in E and EM.
Together, our results indicated that the genomic features associated with the M segments are more similar to those in E and EM segments than in ML and L segments. This was true even though the M segments replicate at a distinct stage of S phase and are more likely to be located in the proximal regions of the chromosome arms, while the E and EM segments are predominantly located in the distal regions.
The above analyses only used sequence tags that mapped uniquely to the Arabidopsis genome and, as such, did not address replication timing of repetitive sequences. To analyze replication timing of repeats, we queried all the reads after initial processing with TEL, CEN, 45S, and 5S repeat sequences from the Plant Repeat Databases (Ouyang and Buell, 2004). Arabidopsis telomeric sequences consist of 2- to 5-kb stretches of 5′-CCCTAAA-3′ repeat units (TEL; Richards and Ausubel, 1988), while centromeres and pericentromeres contain about 20,000 copies of a 180-bp satellite repeat (CEN) in long arrays extending for several megabases (Lermontova et al., 2015). The 570 to 750 copies per haploid genome of 45S rRNA genes (45S rDNA) form two 4-Mb arrays in nucleolar organizing regions located at the ends of the short arms of chromosomes 2 and 4 (Copenhaver and Pikaard, 1996; Havlová et al., 2016). The pericentromeres of chromosomes 3, 4, and 5 also contain heterogeneous arrays including about 1,000 copies of the 5S rDNA (Vaillant et al., 2007).
For each S-phase data set, we computed the fraction of reads aligning to each repeat consensus and normalized it to the fraction of reads in the G1 control that aligned to the same consensus (Fig. 4E). The resulting ratio is a measure of enrichment or depletion of a given repeat in reads from early, mid, or late S phase. CEN sequences are strongly enriched in late S phase and depleted in early and mid, in agreement with the late replication timing of the centromeres (Fig. 2B; Supplemental Fig. S7). TEL sequences replicate preferentially in early and mid S phase, but replication activity is also detectable in late S phase. The lack of a single predominant RT is likely due to asynchrony between telomeres. In human cells, the telomere replication program is chromosome-specific and influenced by sequences in subtelomeric regions (Arnoult et al., 2010). Replication of both 5S and 45S rDNA occurs primarily in late S phase, consistent with sequestration and silencing of most 5S and 45S rDNA gene copies by repressive heterochromatin (Layat et al., 2012). However, some 5S and 45S rDNA genes are transcriptionally active and packaged into permissive euchromatin (Douet and Tourmente, 2007; Hamperl et al., 2013; Dvořáčková et al., 2017). These active fractions may be the source of the 5S and 45S rDNA reads in the early and mid S-phase data sets.
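The enrichment ratio described above can be written as a short calculation. This is a sketch under the assumption that read counts per repeat consensus and total read counts are available for each sample; the function name and the counts shown are hypothetical.

```python
def repeat_enrichment(sample_repeat_reads, sample_total_reads,
                      g1_repeat_reads, g1_total_reads):
    """Ratio of the repeat read fraction in an S-phase sample to that in G1.

    Values above 1 indicate enrichment; values below 1 indicate depletion.
    """
    sample_fraction = sample_repeat_reads / sample_total_reads
    g1_fraction = g1_repeat_reads / g1_total_reads
    return sample_fraction / g1_fraction

# Hypothetical read counts, for illustration only.
late_cen = repeat_enrichment(4_000, 1_000_000, 2_000, 1_000_000)
early_cen = repeat_enrichment(500, 1_000_000, 2_000, 1_000_000)
```

Dividing by the G1 fraction corrects for the copy number of each repeat, so the ratio reflects replication activity rather than repeat abundance.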
Replication Time and Chromatin States
Chromatin structure influences the replication program (Hiratani et al., 2008; Schwaiger et al., 2009; Picard et al., 2014), with early replication associated with euchromatin and late replication associated with heterochromatin (Ding and MacAlpine, 2011). Some combinations of epigenetic marks occur together more frequently than others (Kharchenko et al., 2011; Roudier et al., 2011; Sequeira-Mendes and Gutierrez, 2016). These combinations define chromatin states (CS) that describe the local chromatin environment more accurately than the traditional binary classification and may correlate better with replication timing programs.
Arabidopsis chromatin has been classified into 6 different states (CS) using 16 epigenetic marks by Wang et al. (2015). We chose this classification, because it is biologically compatible with the large size of replication timing segments compared to other functional regions like transcription units. The classification described two euchromatic states (CS1 and CS5), two heterochromatic states (CS6 and CS3), and two intermediate states (CS2 and CS4). Chromatin void of any of the 16 histone marks was defined as “unclear” or CS0.
We used these CS to examine the relationship between chromatin structure and replication timing. First, we calculated the overlap between each CS and RT class (Fig. 5A). Applying the same procedure as for genomic features, we built a contingency table (Supplemental Table S3) and performed a χ2 test (P value < 2.2E-16, χ2 = 44,932, df = 24). The associated Cramer's V statistic is equal to 0.31, indicating a nonrandom distribution of CS in RT classes. The adjusted residuals for each combination of RT class and CS were classified in three tertiles indicated by the black arrows and dots in Figure 5A.
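For readers unfamiliar with these statistics, the sketch below computes the χ2 statistic, its degrees of freedom, Cramer's V, and the adjusted residuals for a contingency table in plain Python. The toy 2 × 2 table is hypothetical; the actual analysis used a table of 5 RT classes by 7 CS, for which (5 − 1) × (7 − 1) gives the 24 degrees of freedom reported above. The P value is omitted because it requires a χ2 distribution function.

```python
import math

def chi_square_summary(table):
    """Return (chi2, df, Cramer's V, adjusted residuals) for a contingency table."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i in range(n_rows):
        residual_row = []
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            diff = table[i][j] - expected
            chi2 += diff ** 2 / expected
            # Adjusted residual: deviation scaled by its estimated standard error.
            scale = math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            residual_row.append(diff / scale)
        residuals.append(residual_row)
    df = (n_rows - 1) * (n_cols - 1)
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, df, cramers_v, residuals

# Hypothetical 2 x 2 table, for illustration only.
chi2, df, cramers_v, residuals = chi_square_summary([[30, 10], [10, 30]])
```

Positive adjusted residuals mark combinations that are over-represented relative to expectation, and negative values mark depleted combinations.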
Figure 5.
Distribution of CS over replication timing segments. A, Overlap between CS and RT classes. The absolute overlap between RT classes and CS, which are defined as genomic intervals sharing the same combinations of epigenetic marks. The chromatin classifications are based on Wang et al. (2015). The arrows under the graph indicate how RT overlap with each CS departs from the expected value based on total genome coverage (χ2 adjusted residuals; Supplemental Table S4). Upward arrows are enriched, downward arrows are depleted, and the circles indicate no difference. B, Distribution of RT classes in CS. RT class coverage in each CS is expressed as a percentage of total coverage in that CS. The whole genome is reported as reference. C, Distribution of CS in RT classes. CS coverage in each RT class is expressed as a percentage of total coverage of that class. D, Correlation matrix between RT classes based on their distributions of CS. The correlation matrix shows the pairwise similarity between RT classes based on their relative CS coverage in C. Spearman correlation coefficients in each cell were used to draw a heat map (legend on the right).
Inspection of the overlap between CS and RT classes revealed that the heterochromatic CS6 and CS3 are more abundantly represented in late replicating regions. However, there is no simple relationship between CS and the replication timing segments (Fig. 5A). All of the CS except for CS6 and CS3 include readily discernible amounts of DNA replicating in each portion of S phase except for late. There is no clear difference in the distribution of RT classes for the euchromatic states, CS1 and CS5, and the intermediate states, CS2 and CS4. While there are small differences in the amount of early replication associated with CS1, CS5, CS2, and CS4, none of these nonheterochromatic states display a strong preference for any particular RT (c.f. the % RT class coverage in Figure 5B; Table III).
Table III. Overlap of chromatin states with RT classes.
RT Class | CS1 | CS5 | CS2 | CS4 | CS0 | CS6 | CS3 |
---|---|---|---|---|---|---|---|
E | 43% | 34% | 33% | 38% | 29% | 7% | 2% |
EM | 24% | 23% | 25% | 24% | 22% | 9% | 3% |
M | 22% | 28% | 26% | 18% | 16% | 13% | 6% |
ML | 11% | 14% | 16% | 17% | 23% | 48% | 31% |
L | 1% | 1% | 1% | 3% | 10% | 24% | 58% |
Each RT class also contains multiple CS (Fig. 5C). The most striking differences in CS content are found between the three “early to mid” RT classes and the ML and L classes. E, EM, and M have substantial amounts of CS1, CS5, CS2, and CS4, while the L is primarily the heterochromatic states, CS6 and CS3. The ML class includes a similar amount of CS6 but is greatly reduced for CS3, which is characterized by the canonical heterochromatin marks H3K27me1 and H3K9me2 (Luo et al., 2013). Instead, the ML class has a large fraction of CS4 and smaller amounts of CS1, CS5, and CS2, and appears transitional between the early to mid RT classes and the L class. This idea is supported by the pairwise Spearman correlation coefficients in the similarity matrix (Fig. 5D) showing that the chromatin composition of E, EM, and M are similar, while L has a distinctive heterochromatic signature and ML is in between.
Replication Timing and Chromatin Accessibility
Replication timing also correlates with chromatin accessibility (Farkash-Amar and Simon, 2010; Hansen et al., 2010; Yaffe et al., 2010; Takebayashi et al., 2012). In plants, open chromatin has been associated with higher gene density and higher levels of transcription (Zhang et al., 2012a; Vera et al., 2014), but these studies did not examine the relationship between chromatin accessibility and replication timing. Hence, we compared our replication timing data with the genome-wide mapping of 34,254 DNase I hypersensitive sites (DHS) by Sullivan et al. (2014). We calculated the number of DHS per kilobase for each replication timing segment and plotted the distribution of the DHS densities for each RT class (Fig. 6A). The number of DHS/kb progressively decreases from E to L segments. Interestingly, only E and EM show a median DHS density above the genome average (0.28 DHS/kb). Only about 25% of M segments contain more DHS than the average, while 25% of ML and 50% of L segments do not contain any DHS.
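The DHS density of a segment can be sketched as below, assuming each DHS is reduced to a single coordinate (for example, its midpoint) and that segments are half-open intervals. These representation choices and the function name are assumptions made for illustration; only the genome-average value of 0.28 DHS/kb is taken from the text.

```python
from bisect import bisect_left

GENOME_AVERAGE_DHS_PER_KB = 0.28  # genome-wide average stated in the text

def dhs_density_per_kb(segment_start, segment_end, sorted_dhs_positions):
    """Number of DHS per kb within the half-open segment [start, end)."""
    first = bisect_left(sorted_dhs_positions, segment_start)
    last = bisect_left(sorted_dhs_positions, segment_end)
    return (last - first) / ((segment_end - segment_start) / 1000)

# Hypothetical DHS coordinates on one chromosome, for illustration only.
dhs_positions = [500, 1_500, 2_500, 9_000]
dense_segment = dhs_density_per_kb(0, 5_000, dhs_positions)
sparse_segment = dhs_density_per_kb(5_000, 10_000, dhs_positions)
```

Comparing each segment's density with the genome average gives the kind of above-average or below-average call described for the RT classes.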
Figure 6.
Chromatin accessibility and replication timing. A, Density of DHS in segments of different RT classes. The box plot shows distributions of DHS densities in segments of each RT class. The whiskers extend to 1.5 times the interquartile range. The dashed line indicates the average DHS density of the genome (0.28 DHS/kb). B, DHS density in an early replicating region of chromosome 1. C, DHS density in a late replicating region of chromosome 1. For each region, DHS density profiles and local maxima of replication intensity in early, mid, and late S phase are compared. The DHS density profile was smoothed with a 5-kb sliding window, plotted as grayscale heat map (range 0–265), and overlaid with the replication timing profiles for early, mid, and late S phase.
To gain further insight into the relationship between DHS density and replication timing, regions of high DHS density were compared with regions showing high local replication activity in early, mid, or late S (Fig. 6B). There is an association between DHS site density and local maxima for replication in early S. In contrast, mid replication activity tends to decline around the regions of highest DHS density. There are many fewer DHS sites in centromeric and pericentromeric regions (Fig. 6C), and the DHS sites that are present in these regions do not overlap with local maxima of late replication. Instead, the peaks of DHS density in these regions are often associated with small peaks of early replication interspersed among the much stronger regions of late replication (Fig. 6C).
The DHS analysis indicated that an open chromatin structure is associated with early replication activity, whereas chromatin replicating in mid S phase, although still classified as euchromatic, is less accessible. This behavior suggests a sequential model for euchromatin replication, starting in regions that can be accessed readily by the replication machinery and then spreading to less accessible regions. In contrast, late replication activity appears unaffected by short-range variations in DHS density, raising the possibility that a different mechanism regulates replication timing within heterochromatin, possibly involving long-range, subnuclear topology similar to what has been suggested for larger genomes (Pope et al., 2014).
Replication Timing and Long-Range Chromosome Interactions
Chromosome conformation capture (Hi-C) techniques, which characterize long-distance interactions and reveal large-scale spatial patterns of chromatin, have uncovered two distinct subnuclear compartments in animals (Lieberman-Aiden et al., 2009; Hou et al., 2012; Zhang et al., 2012b). These compartments, which differ widely in nuclease accessibility, gene density, transcriptional activity, and epigenetic marks, correlate with early and late replicating domains that span 0.1 to 2 Mb (Ryba et al., 2010, 2011).
Hi-C analysis of the Arabidopsis genome has indicated that its spatial organization is much simpler. Arabidopsis telomeres interact more frequently with other telomeres and with the distal regions of their adjacent chromosome arms, while pericentromeres interact with the adjacent proximal regions of their chromosome arms as well as with other pericentromeres (Feng et al., 2014; Grob et al., 2014). This bipartite configuration recalls the overall distribution of replication activity in early, mid, and late S phase (Fig. 4A). To examine the relationship between three-dimensional proximity and replication timing patterns, we compared the RT classes to the chromosome conformation capture data sets described by Liu et al. (2016). We chose this data set because its reproducibility was established by an earlier study (Wang et al., 2015).
We aligned the Hi-C reads to the TAIR10 reference genome and identified significant interactions (P value < 0.001) at 100-kb resolution. To focus attention on long range interactions, we imposed a minimum 1-Mb separation between interacting loci because of the strong bias toward local interactions (Dekker et al., 2002; Lieberman-Aiden et al., 2009). We also did not consider interchromosomal interactions, because the in-solution ligation method used to generate this data set is known to inflate the number of trans interactions (Nagano et al., 2015). Finally, we excluded sequences within 1 Mb of telomeres, because telomeres tend to interact with very high frequency compared to the rest of the genome (Supplemental Fig. S10; Feng et al., 2014).
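The filtering criteria above can be expressed as a single predicate. This is a sketch rather than the pipeline used in the study: the record layout, the use of bin start coordinates to measure separation, and the chromosome length in the example are assumptions, while the thresholds (P < 0.001, 100-kb bins, 1-Mb minimum separation, 1-Mb telomere margin, intrachromosomal only) come from the text.

```python
from typing import NamedTuple

class Interaction(NamedTuple):
    chrom_a: str
    start_a: int   # start coordinate of the first 100-kb bin
    chrom_b: str
    start_b: int   # start coordinate of the second 100-kb bin
    p_value: float

BIN_SIZE = 100_000
MIN_SEPARATION = 1_000_000
TELOMERE_MARGIN = 1_000_000
P_CUTOFF = 0.001

def keep_interaction(ix, chrom_lengths):
    """Apply the long-range, intrachromosomal, non-telomeric filters."""
    if ix.chrom_a != ix.chrom_b:
        return False  # interchromosomal interactions are not considered
    if ix.p_value >= P_CUTOFF:
        return False  # not significant
    if abs(ix.start_a - ix.start_b) < MIN_SEPARATION:
        return False  # too close; dominated by local interactions
    length = chrom_lengths[ix.chrom_a]
    for start in (ix.start_a, ix.start_b):
        if start < TELOMERE_MARGIN or start + BIN_SIZE > length - TELOMERE_MARGIN:
            return False  # bin overlaps the 1-Mb region next to a telomere
    return True

# Hypothetical chromosome length, for illustration only.
lengths = {"Chr1": 30_000_000, "Chr2": 20_000_000}
kept = keep_interaction(Interaction("Chr1", 5_000_000, "Chr1", 8_000_000, 1e-4), lengths)
```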
Significant Hi-C interactions and associated RT classes are shown for Arabidopsis chromosome 1 in Figure 7A. Three main groups of interactions are apparent, namely interactions within the pericentromere (Mb 13.5–16.5), within each chromosome arm, and between the distal parts of the two arms. This pattern agrees well with the large-scale pattern of early-replicating arms and late-replicating pericentromeres (Figs. 3B and 4A). Interestingly, while pericentromeric sequences mainly interact between themselves, the distal arms contact other early replicating regions on both chromosome arms. All chromosomes show a similar organization (Supplemental Fig. S11), except the short arms of the acrocentric chromosomes 2 and 4. These results suggested that sequences in spatial proximity within the nucleus tend to replicate at the same time during S phase, irrespective of their map positions along the chromosome.
Figure 7.
Chromatin long-range interactions and replication timing. A, Significant interactions identified by Hi-C analysis at 100-kb resolution (Liu et al., 2016) within chromosome 1 (P < 0.001). Replication timing segments are displayed in the outer circle, with the coordinates on the outside in Mb. The black lines connecting two 100-kb bins indicate their physical proximity in three-dimensional space. B, RT class composition of regions interacting with bins containing E, EM, M, ML, or L segments. Similar results were obtained when the bins in each interacting pair were switched (Supplemental Fig. S12). C, RT class composition of regions interacting with bins containing E, EM, M, ML, or L segments normalized for the total coverage of each group. The cumulative overlap of all the groups together is shown as reference. D, The correlation matrix shows the pairwise similarity of RT class coverage of bins interacting with bins containing E, EM, M, ML, or L segments (Supplemental Table S5). Spearman correlation coefficients in each cell were used to draw a heat map (legend on the right).
We then analyzed the pairs of interacting bins identified by Hi-C to determine the interaction profile for each RT class. The resolution of our replication data is much higher than the Hi-C data, so each Hi-C bin can contain multiple RT classes. To address this, we analyzed separately all the interacting pairs of Hi-C bins in which the first bin included a given RT class. Next, we summarized the RT segment classes in the second bin in the pair (Fig. 7B). Some pairs were assigned multiple times corresponding to each RT segment class included in the first bin. We performed the analysis in both directions with similar results, confirming that the choice of the first and second bins in each interacting pair did not influence the outcome (Supplemental Fig. S12). The E, EM, and M segment classes have nearly identical interaction profiles, with a slight increase of ML and L segments in bins interacting with EM and M segments relative to E. The L segments interact preferentially with ML and L bins, while the ML segments interact with all RT classes. The ML and L groups are smaller than the E, EM, and M groups due to the reduced genomic coverage of these classes (Fig. 3C). To account for this disparity, we expressed the interaction profiles as percent of total coverage for each interaction group (Fig. 7C; Table IV). The cumulative interaction profile of all the groups taken together is also shown for reference.
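The tallying procedure described above can be sketched as follows. The data layout is an assumption: each Hi-C bin is represented by a dictionary giving the base pairs covered by each RT class it contains, and all bin names and coverage values are hypothetical. As in the text, a pair is counted once for every RT class present in its first bin.

```python
from collections import defaultdict

def interaction_profiles(pairs, bin_rt_coverage):
    """Sum partner-bin RT coverage for every RT class found in the first bin."""
    profiles = defaultdict(lambda: defaultdict(int))
    for first_bin, second_bin in pairs:
        for rt_class, first_bp in bin_rt_coverage[first_bin].items():
            if first_bp <= 0:
                continue  # class not actually present in the first bin
            for partner_class, partner_bp in bin_rt_coverage[second_bin].items():
                profiles[rt_class][partner_class] += partner_bp
    return profiles

def as_percent(profile):
    """Express one interaction profile as percent of its total coverage."""
    total = sum(profile.values())
    return {rt_class: 100 * bp / total for rt_class, bp in profile.items()}

# Hypothetical bins and interacting pairs, for illustration only.
bins = {
    "bin1": {"E": 60_000, "EM": 40_000},
    "bin2": {"E": 100_000},
    "bin3": {"L": 100_000},
}
profiles = interaction_profiles([("bin1", "bin2"), ("bin1", "bin3")], bins)
e_profile = as_percent(profiles["E"])
```

Converting to percentages corresponds to the normalization used to compare groups of different sizes.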
Table IV. Percent of RT classes of genomic bins establishing significant long-range interactions.
Interacting Segments | Bins with E | Bins with EM | Bins with M | Bins with ML | Bins with L | All Bins |
---|---|---|---|---|---|---|
Interactions with E segments | 42% | 39% | 36% | 21% | 8% | 35% |
Interactions with EM segments | 27% | 26% | 25% | 19% | 9% | 24% |
Interactions with M segments | 20% | 21% | 21% | 21% | 11% | 20% |
Interactions with ML segments | 10% | 13% | 16% | 29% | 41% | 15% |
Interactions with L segments | 1% | 2% | 2% | 10% | 31% | 3% |
We calculated a Spearman correlation matrix for the overlap of the RT classes with interacting partners of each group (Supplemental Table S5) and plotted the results as a heat map (Fig. 7D). The interaction profiles of the E, EM, and M groups are strongly correlated, while the L group has a distinct and opposite interaction profile. The interactions of the ML group are intermediate, reinforcing the transitional nature of this RT class. The two interaction clusters related to replication timing, the E/EM/M cluster and the L cluster, correlate with the large-scale organization of chromosomes into early replicating arms and late replicating pericentromeres.
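To illustrate the rank correlation reported for Figure 7D, the sketch below computes Spearman coefficients between interaction profiles in plain Python. As an assumption, it uses the rounded percentages from Table IV as a stand-in for the values in Supplemental Table S5, so the results approximate, rather than reproduce, the published matrix.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Columns of Table IV (percent interactions with E, EM, M, ML, L segments).
table_iv = {
    "E": [42, 27, 20, 10, 1],
    "EM": [39, 26, 21, 13, 2],
    "M": [36, 25, 21, 16, 2],
    "ML": [21, 19, 21, 29, 10],
    "L": [8, 9, 11, 41, 31],
}
matrix = {a: {b: spearman(table_iv[a], table_iv[b]) for b in table_iv} for a in table_iv}
```

With these rounded values, the E and EM profiles have identical rank order (coefficient 1), whereas E and L are strongly anticorrelated (coefficient −0.9), in line with the two clusters described above.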
DISCUSSION
The Genome-Wide Arabidopsis Replication Program at High Resolution
We used a new high-resolution strategy to characterize the replication timing program of Arabidopsis suspension cells at the whole genome level. Nearly 60% of the Arabidopsis genome was classified as replicating principally in early, mid, or late S phase. Unlike our earlier study (Lee et al., 2010), clear differences were observed between the sequence populations replicating in early and mid S phase. However, 41% of the genome showed strong replication activity in more than one portion of S phase, indicative of heterogeneity in replication timing.
Several factors contributed to the increased resolution of our new strategy. Potentially most important, we shortened the labeling time from 1 h to 10 min after determining that the duration of S phase is only 1.5 to 1.9 h for our Arabidopsis cultured cells (Mickelson-Young et al., 2016). We also reduced the widths of the sorting gates and increased the distance between them to minimize cross contamination between nuclei in early, mid, and late S phase (Fig. 1). Finally, EdU conjugation to AF488 allowed us to use a two-way sorting strategy to resolve replicating from nonreplicating nuclei and reduce contamination of EdU-labeled DNA by unlabeled DNA in the immunoprecipitates.
The increased resolution is apparent in maps of the raw sequencing reads, which show distinct replication profiles for early, mid, and late S phase across three highly reproducible biological replicates (Fig. 1C; Supplemental Fig. S5). Although the narrow sorting gates captured only about 50% of the S-phase nuclei, the entire Arabidopsis genome was represented in the read profiles. This may reflect heterogeneity in RT among genome sequences and/or technical limitations associated with the sensitivity of the flow cytometer, as demonstrated by a study in human cells that used six sorting gates (Hansen et al., 2010).
The increased resolution is also evident in visual comparisons between the replication profiles for Arabidopsis chromosome 4 generated using a 1-h BrdU pulse versus the 10-min EdU pulse (Supplemental Fig. S13). The profiles for early S phase are very similar, but there are major differences in the mid and late S profiles obtained using the two protocols. These differences correspond to regions that overlap between early and mid or mid and late in the BrdU profiles. The overlap between adjacent time points most likely reflects the inclusion of regions that incorporated BrdU as cells moved from earlier to later S phase during the 1-h pulse, which represents ∼50% of the length of S phase in the Arabidopsis cultured cells (Mickelson-Young et al., 2016). Notably, there is less overlap between the profiles generated using a 10-min EdU pulse, indicating that the Arabidopsis replication timing program is less stochastic than proposed previously (Lee et al., 2010).
We presented the EdU replication profiles separately for each time point, rather than assign a unique RT to each locus based on the ratio between early and late, as is often described in the literature (Hiratani et al., 2008; Schwaiger et al., 2009; Gilbert, 2010). By doing so, we highlighted the fact that some sequences replicate with high intensity in more than one portion of S phase (Fig. 2B; Supplemental Fig. S4). This almost always happens in consecutive time points, like early-mid or mid-late S phase. Because of the short pulse length, wide separation between the gates, and sharp separation between populations of sorted nuclei, the heterogeneity is unlikely to be a technical artifact. Given that a sequence can replicate only once in a single cell, this heterogeneity is most likely due to variation between cells in the suspension culture. However, differences between alleles at a locus, often generated in cell cultures by somaclonal variation (Wang and Wang, 2012), may also contribute to the observed heterogeneity.
Segmentation Analysis
To reduce the complexity of the data and assign RTs to regions across the Arabidopsis genome, we used the Repliscan pipeline (Zynda et al., 2017) to assign a predominant RT based on the relative intensity of normalized signal in all three time points. This analysis allowed us to score replication that occurs in more than one time window at a given locus, better representing heterogeneous replication.
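The idea of scoring replication in more than one time window can be illustrated with a toy rule. This is not the Repliscan algorithm; it is a deliberately simplified sketch in which a locus is labeled with every time point whose normalized signal reaches a chosen fraction of the strongest signal at that locus. The threshold of 0.5, the function name, and the "NS" label for loci without signal are all hypothetical.

```python
def classify_locus(early, mid, late, min_fraction=0.5):
    """Toy classifier: combine labels of all time points near the locus maximum."""
    signals = {"E": early, "M": mid, "L": late}
    peak = max(signals.values())
    if peak <= 0:
        return "NS"  # hypothetical label for loci with no replication signal
    return "".join(
        label for label, value in signals.items() if value >= min_fraction * peak
    )

# Hypothetical normalized signals, for illustration only.
labels = [
    classify_locus(10, 2, 1),   # single predominant time
    classify_locus(8, 7, 1),    # two adjacent times
    classify_locus(1, 6, 9),    # two adjacent times
    classify_locus(9, 1, 8),    # two nonadjacent times
]
```

A rule of this kind yields both single-time labels (E, M, L) and combined labels (EM, ML, EL, EML), which is the property the segmentation relies on to represent heterogeneous replication.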
The segmentation analysis assigned a single prevalent RT (E, M, or L) to more than one-half of the Arabidopsis genome, with the rest divided between EM and ML. Only 1% of the genome was not assigned to a single time or two adjacent times, underscoring the robustness of the segmentation analysis. The shorter labeling time and placement of gates to minimize overlap and emphasize mid-replicating sequences (Fig. 1) led to significant differences in segmentation from our earlier analysis of Arabidopsis chromosome 4 (Lee et al., 2010; Supplemental Fig. S8). In the current study, 17% of chromosome 4 was classified as EM compared to 37% in the previous study. Concomitantly, sequences classified as E increased to 26% from <1% and as M to 22% from 4%. Coverage of L was reduced to 9% from 44%, with most of the late replicating segments located in a few megabases near the centromere. This reduction may reflect the narrower late gate and a shift in its placement to improve resolution. However, sequences classified as ML increased to 23% from 6% and included regions previously regarded as late replicating (Fig. 1B), consistent with the shorter labeling time increasing resolution. EL and EML declined from 8% to 1%. The sizes of segments identified in this study (Fig. 3D) are comparable to the putative replicons described for Arabidopsis chromosome 4 (Lee et al., 2010) and some animal systems (MacAlpine et al., 2004; Lebofsky et al., 2006; Schwaiger et al., 2009). However, our analysis did not uncover evidence of the larger replication domains that have been described in mammals (Hiratani et al., 2008; Ryba et al., 2010).
The Replication Program and Genome Organization
All five Arabidopsis chromosomes showed the same general pattern of replication timing (Fig. 2B; Supplemental Fig. S7). At a macroscopic level, the distal portions of the chromosome arms replicate earlier than proximal regions, while pericentromeric and centromeric regions replicate last. The short arms of chromosomes 2 and 4 are exceptions, because they replicate mainly in M and ML, perhaps because of their proximity to pericentromeric regions. This organization agrees generally with the biphasic model of replication that we proposed previously for Arabidopsis (Lee et al., 2010). Analysis of RT classes in relation to genomic features (Fig. 4D) suggested that E, EM, and M segments are predominantly euchromatic, and ML and L segments are primarily heterochromatic. However, the distribution of CS across the RT classes is more complex, with each RT class including multiple CS and each chromatin state including several RT classes. This diversity suggests that, particularly in the portion of the genome classically regarded as euchromatic, replication timing may be determined to a large extent by factors that are independent of local CS or by epigenetic features not included in the chromatin state analysis.
Replication timing data are thought to integrate transcriptional, epigenetic, and spatial information across the genome (Hiratani and Gilbert, 2009), and their inclusion in modeling can inform chromatin state assignments. Wang et al. (2015) classified CS2 and CS4 as intermediate between euchromatin and heterochromatin. These assignments were based in part on the lack of transcription of CS2 and CS4 and no enrichment for histone marks associated with active transcription. However, the large amount of CS2 and CS4 in the E, EM, and M RT classes indicates that major fractions of these states are in an open, accessible conformation characteristic of euchromatin. Thus, CS2 and CS4 may include nontranscribed euchromatin that replicates with transcribed euchromatin (CS1 and CS5) during early to mid S phase. This idea is supported by the near absence of CS2 and only a small fraction of CS4 replicating with heterochromatin in late S phase. ML segments, which include both euchromatic (CS1, CS5, CS2, and CS4) and heterochromatic (CS6 and CS3) CS, represent a transition from replicating euchromatin to replicating heterochromatin.
Comparison of replication timing and chromosome conformation data showed that E, EM, and M segments interact with each other with equal frequency within and between the arms of a chromosome; L segments interact predominantly with Hi-C bins located in the pericentromeres that encompass ML and L RT classes; and ML segments interact with all RT classes (Fig. 7). This pattern of interaction is consistent with the Arabidopsis genome consisting of two main genomic compartments: one that replicates during early to mid S phase and another that replicates in late S phase. This bipartite chromosomal architecture is reminiscent of the “open” and “closed” compartments identified in the human genome (Lieberman-Aiden et al., 2009). The two compartments have distinctive epigenomic and expression features and correlate with RT (Hansen et al., 1996; Ryba et al., 2010). It has been proposed that because of the compact nature of the Arabidopsis genome and differences in chromatin organization between plants and metazoans, the pericentromeric regions and chromosome arms may correspond functionally to the closed and open compartments in mammalian genomes (Grob et al., 2014; Feng et al., 2014).
The data sets used for chromatin state and the long-range interaction studies (and the DHS data discussed below) were generated from Arabidopsis seedlings. During plant development, actively proliferating cells are localized primarily to meristematic regions and primordia and include all cell cycle stages. As a consequence, only a small fraction of the cells used to create the seedling data sets were in S phase. For this reason, future studies that use chromatin data from mitotic cells may uncover relationships between replication timing and chromatin that were not apparent in the comparisons here.
Nature of Mid S-Phase Replication
Replication in mid S phase may reflect spreading from regions that initiate during early S phase and/or initiation and elongation events specific to mid S phase. In our data, the distributions of read densities are sharply different in early and mid S phase, with early reads displaying high local maxima separated by deep troughs, while mid reads are more evenly distributed with smaller peaks and dips. These profiles are consistent with models postulating firing of low efficiency origins during mid S phase (Guilbaud et al., 2011), as well as with other models involving replication of regions lacking origins by unidirectional fork progression (Desprat et al., 2009; Ryba et al., 2010). Both these mechanisms can be incorporated into a model in which origins are not distributed uniformly across a genome (Rhind, 2014; Kaykov et al., 2016) and compete for replication factors (Mantiero et al., 2011), with the likelihood of replication initiating in a given region depending primarily on its origin density.
According to the above models, early replicating regions of the Arabidopsis genome would have more origins and origin clusters, and mid-replicating regions would have fewer, more dispersed origins but would not differ dramatically with respect to sequence composition or global chromatin features. The only genome-wide study describing putative origin sequences in Arabidopsis is biased for early replication due to the use of Suc starvation to arrest cells in G1 before release into BrdU in the presence of hydroxyurea to deplete nucleotide pools (Costas et al., 2011), and thus cannot provide insight into whether origins are enriched in early versus mid-replicating regions. However, our model is supported by the observation that Arabidopsis sequences replicating in early or mid S phase overlap similar genomic features (Fig. 4D) and display similar chromatin state (Fig. 5, C and D) and chromatin interaction profiles (Fig. 7, B–D). At the same time, these regions differ in their sensitivity to DNase I digestion, with early regions, but not mid regions, enriched for DHS (Fig. 6, B and C). Local maxima in early regions are DHS rich, while local maxima in mid regions are DHS depleted, suggesting that early replication is associated with a higher degree of chromatin accessibility than mid replication. In this context, it is interesting that the replication program of the human genome can be accurately simulated by a model in which an initiation probability landscape is determined by the locations of DHS (Gindin et al., 2014).
Comparison to the Maize Replication Timing Program
We recently characterized replication timing in maize (Zea mays) root tips labeled with EdU (Wear et al., 2017). The global distribution of the replication timing signals in maize and Arabidopsis is similar, with chromosome arms replicating earlier and pericentromeric and centromeric regions replicating later. As in Arabidopsis, replication in maize is distributed across the RT classes. However, there are more early replicating regions and fewer late regions in Arabidopsis than in maize. This difference likely reflects the very different genic and nongenic (TEs and noncoding sequences) content of the two genomes (Arabidopsis: 51% genic and 49% nongenic; maize: 8% genic and 92% nongenic), with genic sequences tending to replicate earlier. In addition, there are more dispersed blocks of ML and L replicating DNA in maize chromosome arms, which are typically organized into genic regions separated by TE clusters. Maize TEs (81% of the genome) are very abundant in all RT classes, with those closer to genes replicating earlier. In contrast, Arabidopsis TEs (20% of the genome) are located primarily in pericentromeric regions and enriched in ML and L classes.
There are other important similarities between the Arabidopsis and maize replication timing programs. Strikingly, the sizes of the RT segments are similar even though the maize genome is ∼20-fold larger than Arabidopsis. Some loci show heterogeneity with respect to replication timing in both plant species. Moreover, early replicating regions are more accessible than mid replicating regions. This comparison underscores the role of genome structure in replication timing and highlights common features that are independent of genome organization.
CONCLUSION
We developed a high-resolution approach to study the replication program of eukaryotic genomes and applied it to the model plant Arabidopsis, extending our previous analysis of chromosome 4 (Lee et al., 2010) to the entire genome. Our results confirmed the basic observation that euchromatin replicates during early and mid S phase and heterochromatin replicates in late S phase, similar to most other eukaryotes (Hiratani et al., 2008; Schwaiger et al., 2009; Ryba et al., 2010). However, in this study, we better resolved the early and mid-replication patterns within euchromatin. Although very similar in their association with most genomic features and chromatin marks, early and mid-replicating sequences differ strikingly in chromatin accessibility as measured by DHS density. This finding is of particular interest in connection with a recent model proposing that origin accessibility to replication factors is one of the primary determinants of replication programs (Rhind, 2014). The model, which integrates sequential activation of origins with stochastic firing, efficiently predicted the human replication program (Gindin et al., 2014).
MATERIALS AND METHODS
Arabidopsis Cell Culture and Nuclei Isolation
The Arabidopsis (Arabidopsis thaliana) cell line Col-0 was maintained as described by Lee et al. (2010). Labeling followed the 7-d split protocol, in which 25 mL of fresh medium and 25 mL of a 7-d culture are mixed and grown for 16 h. At 16 h, the cells were labeled with 10 μM EdU (Life Technologies) for 10 min. Labeling was terminated by fixing the cells in 1% paraformaldehyde with gentle agitation for 10 min, followed by quenching the formaldehyde with 0.125 M Gly. Fixed cells were filtered through two layers of Miracloth mesh and transferred to 1× PBS. They were washed in PBS three times and snap frozen in liquid nitrogen. Cells from eight cultures were combined for each of three biological replicates.
Nuclei were isolated as described previously (Lee et al., 2010; Wear et al., 2016) with the addition of a Percoll gradient step. The frozen cell pellet was ground at 4°C in 40 mL of cell lysis buffer (15 mM Tris-HCl, pH 7.5, 2 mM EDTA, 80 mM KCl, 20 mM NaCl, 15 mM β-mercaptoethanol, and 0.1% Triton X-100) using a commercial blender. The ground cell suspension was incubated for 5 min at 4°C, filtered through two layers of Miracloth, and centrifuged at 400g for 5 min at 4°C. Nuclei were enriched using a Percoll step gradient as described by Folta and Kaufman (2006) with minor modifications. The nuclei pellet was resuspended in 25 mL of extraction buffer (2 M hexylene glycol, 20 mM PIPES-KOH, pH 7.0, 10 mM MgCl2, 5 mM β-mercaptoethanol) and centrifuged at 1,500g over a discontinuous density gradient (30% and 80% [v/v] Percoll in gradient buffer: 0.5 M hexylene glycol, 10 mM MgCl2, 5 mM PIPES-KOH, pH 7.0, 5 mM β-mercaptoethanol, and 1% [w/v] Triton X-100) for 30 min at 4°C. The nuclei recovered from the 30:80% Percoll interface were resuspended in 15 mL of gradient buffer and centrifuged at 1,500g over a cushion of 30% Percoll (v/v) in gradient buffer for 10 min at 4°C.
After washing the nuclei pellet in modified cell lysis buffer (15 mM Tris-HCl, pH 7.5, 2 mM EDTA, 80 mM KCl, 20 mM NaCl, and 0.1% Triton X-100), the incorporated EdU was conjugated with Alexa Fluor 488 (AF488) using a Click-iT EdU Alexa Fluor 488 Imaging kit (Life Technologies) as described previously (Wear et al., 2016). Finally, the nuclei were resuspended in the original cell lysis buffer containing 2 μg/mL DAPI and filtered through a CellTrics 20-μm nylon mesh filter (Partec) just before flow cytometry and sorting.
Flow Cytometry and Sorting
An InFlux flow cytometer (BD Biosciences) equipped with UV (355 nm) and blue (488 nm) lasers was used to sort nuclei by DNA content (DAPI fluorescence) and EdU incorporation (fluorescence of the conjugated AF488). Events were triggered on forward-angle light scatter, and data were collected using 90° side scatter and 460/50-nm and 530/40-nm band-pass filters (Bass et al., 2014; Wear et al., 2016). Plots of side scatter versus 460/50 nm (DAPI) were used to set analysis and sorting gates that excluded cellular debris.
Substage gates were used to sort labeled nuclei into pools representing early, mid, and late S phase as well as unlabeled nuclei in G1 phase as a source of nonreplicating reference DNA. The sorting gates were separated from each other to minimize overlap between the sorted populations (Fig. 1B). For each biological replicate, between 90,000 and 160,000 nuclei for each S phase fraction and 1 million unlabeled G1 nuclei were collected in tubes containing STE buffer (100 mM NaCl, 10 mM Tris-HCl, pH 7.5, and 1 mM EDTA). A small sample of nuclei (∼12,000–16,000) was also sorted from each gate into cell lysis buffer augmented with 2 μg/mL DAPI and reanalyzed to determine the sort purity (Supplemental Fig. S3). Flow cytometry data were analyzed using FlowJo software (Tree Star).
Genomic DNA Extraction and Immunoprecipitation of EdU/AF488-Labeled DNA
Genomic DNA was extracted as described previously (Lee et al., 2010) with minor modifications. After overnight incubation with proteinase K, the samples were incubated with RNase A (50 μg/mL) for 1 h at 37°C prior to addition of PMSF (0.7 mg/mL). The DNA was extracted once with phenol/chloroform/isoamyl alcohol (25:24:1) and twice with chloroform, and precipitated with 0.6 volumes of ice-cold isopropanol overnight at –20°C. The DNA was pelleted by centrifugation, washed twice with 1 mL of 70% ethanol, and resuspended in 130 μL of IP dilution buffer (167 mM NaCl, 16.7 mM Tris-HCl, pH 8, 1.2 mM EDTA, and 1.1% [v/v] Triton X-100). A Covaris S220 ultrasonicator was used to shear the DNA to an average size of 300 bp (parameters: intensity 5, duty cycle 10%, cycles per burst 200, treatment time 180 s).
After shearing, 370 μL of IP dilution buffer (Gendrel et al., 2005) was added, and the sheared DNA solution was precleared by gentle agitation in 20 μL of magnetic protein G beads (Dynabeads; Life Technologies) preequilibrated with IP dilution buffer at 4°C for 1 h. The beads were removed with a magnet and newly synthesized DNA was immunoprecipitated by incubating with a 1:200 dilution of anti-Alexa Fluor 488 antibody (Molecular Probes; #A-11094) at 4°C overnight. The DNA-antibody complex was captured with 25 μL of preequilibrated protein G beads at 4°C for 2 h, followed by washing the beads as described by Gendrel et al. (2005). Bound DNA was eluted from the beads in 250 μL of elution buffer (1% [w/v] SDS and 100 mM sodium bicarbonate) at 65°C for 15 min, transferring the supernatant to a new tube and repeating the elution for a final volume of 500 μL. Eluted DNA was purified with the QIAquick PCR purification kit (Qiagen) according to the manufacturer’s directions. To maximize DNA recovery, prewarmed (50°C) TE was used for the elution step.
Library Construction, Sequencing, and Analysis of Repli-Seq Data
Immunoprecipitated DNA was used to construct sequencing libraries with the NEXTflex Illumina ChIP-Seq Library Prep Kit (Bioo Scientific) using the ultra-low input protocol. After adapter ligation, the libraries were amplified with 18 cycles of PCR with the Expand High FidelityPLUS PCR System (Roche). For each experiment, individual samples were bar-coded and pooled. The libraries were sequenced on an Illumina HiSeq 2000 platform.
Raw sequencing data were processed using Trim Galore! (v0.3.7) to remove 3′ universal adapters from the paired reads, trim 5′ ends with fastq quality scores <20, and remove trimmed reads shorter than 40 bp. The quality controlled reads were then aligned to the Arabidopsis TAIR10 genome with BWA mem (v0.7.4) using default parameters (Li, 2013). After alignment, reads with multiple alignments were discarded using samtools 1.3 (Li et al., 2009). For mapping statistics and total sequence coverage, see Supplemental Table S1.
Data were then analyzed as described by Zynda et al. (2017). The scripts can be found at https://github.com/zyndagj/repliscan. Read densities were scored in 1-kb bins across the genome and normalized using sequence depth scaling (Ramírez et al., 2016). The correlation between biological replicates was assessed using multiBigwigSummary and plotted as a heat map using plotCorrelation in the deepTools 2.0 suite (Ramírez et al., 2016). Replicates were highly correlated (Supplemental Fig. S5).
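The binning and depth-scaling step can be illustrated with a minimal sketch. It is not the repliscan or deepTools implementation: the function names, the toy read positions, and the target depth of 1,000,000 reads are assumptions chosen for illustration, and scaling every sample to a common total is only one common form of sequence depth scaling.

```python
def bin_counts(read_starts, chrom_length, bin_size=1000):
    """Count reads whose 0-based start position falls in each fixed-width bin."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def depth_scale(counts, target_depth=1_000_000):
    """Rescale bin counts so that each sample sums to the same total depth.

    target_depth is a hypothetical value; any common total works.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    factor = target_depth / total
    return [c * factor for c in counts]


# Toy example: six read starts on a hypothetical 5-kb chromosome.
early = bin_counts([10, 950, 1200, 1800, 4100, 4500], chrom_length=5000)
scaled = depth_scale(early)
```

After scaling, samples sequenced to different depths can be compared bin by bin because each sums to the same total.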
Biological replicates were aggregated by taking the median value in each 1-kb bin. Bins with coverage in the upper and lower 0.1% tails of a calculated normal distribution were removed. Values for each of the S-phase samples were divided by the value for the nonreplicating G1 reference in the corresponding bin to normalize for sequencing bias. To reduce noise, Haar wavelet smoothing was performed using the software package wavelets from Percival and Walden (2000). The Haar wavelet method was chosen because, unlike kernel smoothing methods, it reduces differential noise without spreading peak boundaries.
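As a hedged sketch of the aggregation, normalization, and smoothing steps described above, the code below takes the per-bin median across replicates, divides by the G1 reference, and applies a simplified Haar smoothing. The smoothing is an assumption: it keeps only the Haar approximation at a fixed level, which is equivalent to replacing each block of 2^levels bins by its mean, whereas the published analysis used the wavelets package and may treat coefficients differently. All function names and toy values are hypothetical.

```python
from statistics import median


def aggregate_replicates(replicates):
    """Median across biological replicates, computed bin by bin."""
    return [median(values) for values in zip(*replicates)]


def ratio_to_g1(s_phase, g1):
    """Divide each S-phase bin by the matching G1 bin.

    Bins with no G1 signal are returned as None and must be filtered
    out before smoothing.
    """
    return [s / g if g > 0 else None for s, g in zip(s_phase, g1)]


def haar_approximation(signal, levels=1):
    """Keep only the Haar approximation, discarding `levels` levels of detail.

    This is the same as replacing each block of 2**levels consecutive
    values by the block mean; a short trailing block uses its own mean.
    """
    block = 2 ** levels
    smoothed = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


# Toy example: three replicates of an early S-phase sample over four bins.
early_reps = [[4.0, 6.0, 2.0, 0.0], [5.0, 7.0, 3.0, 1.0], [6.0, 5.0, 1.0, 2.0]]
g1 = [2.0, 2.0, 1.0, 1.0]
merged = aggregate_replicates(early_reps)
ratios = ratio_to_g1(merged, g1)
smoothed = haar_approximation(ratios, levels=1)
```

Because the Haar basis is piecewise constant, a sharp step between blocks is preserved rather than blurred, which is the property the text contrasts with kernel smoothing.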
Classifying Predominant Replication Time
The method used to assign a predominant time of replication to each 1-kb bin across the genome is described by Zynda et al. (2017). Each bin was classified as replicating at a given time point if its normalized replication intensity was above a chromosome-specific threshold value, as calculated by the following procedure. Total coverage, defined as the fraction of the chromosome with a signal greater than the threshold in at least one RT window, was computed as a continuous function of the threshold value using a cubic spline interpolation across the replication values. The first derivative of the coverage function was then calculated using the central difference formula to show the rate of coverage change.
Starting from the point with the highest rate of coverage change (maximum first derivative), the threshold was lowered until the first derivative of the coverage versus threshold curve effectively flattened out. Below this point, any additional signals were uninformative because those regions had already been classified as replicating in other time points. The predominant RT for a given 1-kb bin was then assigned by considering the relative amounts of total replication signal in early, mid, and late S phase. For each 1-kb bin, the three signals were divided by the maximum value, scaling the largest value to 1 and the others between 0 and 1. The bin was labeled as the combination of times with a normalized signal above 0.5. This strategy allowed single prevalent time and combinatorial time classifications to be assigned to a given 1-kb bin. Bins were classified as undetermined if none of the signals in any of the three time samples reached the threshold value.
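The labeling rule for a single bin can be sketched as follows. The function name, the example threshold of 0.8, and the toy signal values are hypothetical. The sketch also assumes that signals below the chromosome-specific threshold are set to zero before scaling, which is one reading of how the two rules combine; the repliscan code (Zynda et al., 2017) is the reference implementation.

```python
def classify_bin(early, mid, late, threshold, cutoff=0.5):
    """Assign a predominant replication time label to one 1-kb bin."""
    signals = {"E": early, "M": mid, "L": late}
    # Assumption: a signal below the threshold counts as no replication.
    kept = {t: (v if v >= threshold else 0.0) for t, v in signals.items()}
    peak = max(kept.values())
    if peak <= 0:
        # No time sample reached the threshold.
        return "undetermined"
    # Scale the largest signal to 1 and keep every time above the cutoff.
    return "".join(t for t, v in kept.items() if v / peak > cutoff)


# Toy examples with a hypothetical threshold of 0.8.
label_single = classify_bin(2.0, 0.9, 0.3, threshold=0.8)
label_combined = classify_bin(2.0, 1.2, 0.3, threshold=0.8)
label_late = classify_bin(0.1, 1.0, 1.5, threshold=0.8)
label_none = classify_bin(0.2, 0.3, 0.1, threshold=0.8)
```

Under these assumptions, a bin whose mid signal is at least half of its early signal receives the combined label EM, whereas a bin dominated by one time receives a single-letter label.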
Replication Intensity and Relative Distance from the Centromere
Centromere positions in each chromosome were identified with the bedtools 2.25.0 genomecov utility (Quinlan and Hall, 2010) as 1-kb bins with the maximum coverage of 180-bp repeats (Nagaki et al., 2003). Using normalized replication intensity in early, mid, and late S phase, the percent of total replication occurring in bins representing successive 10% portions of a given chromosome arm was calculated with a custom R script (R Development Core Team, 2016). Replication within each interval, expressed as percentage of total replication activity for that chromosome arm in that portion of S phase, was plotted as a function of the relative distance from the centromere (Fig. 2D) using the R package ggplot2 (Wickham, 2009).
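The per-arm calculation was done with a custom R script that is not reproduced here. The Python sketch below only illustrates the arithmetic under stated assumptions: bins are ordered from the centromere outward along one arm, each bin is assigned to one of ten equal portions by its index, and the function name and toy intensities are hypothetical.

```python
def percent_by_arm_portion(intensities, n_portions=10):
    """Percentage of an arm's total replication signal in each portion.

    intensities: normalized replication intensity per bin for one S-phase
    fraction, ordered from the centromere to the end of the arm.
    """
    total = sum(intensities)
    n = len(intensities)
    if total == 0 or n == 0:
        return [0.0] * n_portions
    sums = [0.0] * n_portions
    for i, value in enumerate(intensities):
        # Map bin index i to its portion (0 = closest to the centromere).
        portion = min(i * n_portions // n, n_portions - 1)
        sums[portion] += value
    return [100.0 * s / total for s in sums]


# Toy arm of 20 bins: weak signal near the centromere, stronger distally.
percents = percent_by_arm_portion([1] * 10 + [3] * 10)
```

The returned values sum to 100 for each S-phase fraction, so profiles from arms of different lengths can be plotted on the same relative-distance axis.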
Association of Replication Timing with Genomic Features and Repeat Sequences
Genomic annotations of genes, pseudogenes, and TEs were obtained from the Araport11 database (TAIR10_GFF3_genes_transposons.AIP.gff.gz at https://www.araport.org/downloads/TAIR10_genome_release/annotation). Unannotated regions were defined as the difference between the genome and all the annotated features. For viewing in IGV 2.3.60 and comparison with Repli-Seq data, the coverage of genes and TEs was defined as the percentage of bases in a specified portion of the genome that overlap with that feature. Gene and TE coverage was scored in 1-kb bins with bedtools v2.25.0 genomecov and map utilities. For visualization in IGV, the data were smoothed using a 50-kb moving average with the R package zoo (Zeileis and Grothendieck, 2005). A custom script is available upon request.
Associations of genomic features with RT segmentation classes were computed with bedtools v2.25.0 intersect, and their statistical significance was assessed with a χ2 test. The adjusted residuals (Agresti, 2007) were then used to measure the relative contribution of each combination of genomic feature and RT class to the overall association.
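For readers unfamiliar with adjusted residuals, the sketch below computes the χ2 statistic and the adjusted (standardized) residuals for a contingency table, using the standard definition in which each residual is the observed minus expected count divided by its estimated standard error. The function name and the 2 × 2 toy counts are hypothetical; the tables in this study hold overlaps between genomic features and RT classes.

```python
from math import sqrt


def adjusted_residuals(table):
    """Return (chi2, residuals) for a contingency table given as row lists."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Standard error used for the adjusted residual.
            se = sqrt(expected
                      * (1 - row_totals[i] / n)
                      * (1 - col_totals[j] / n))
            res_row.append((observed - expected) / se)
        residuals.append(res_row)
    return chi2, residuals


# Toy table: rows are two features, columns are two RT classes.
chi2, residuals = adjusted_residuals([[30, 10], [10, 30]])
```

Cells with large positive residuals are overrepresented relative to independence and cells with large negative residuals are underrepresented, which is how the residuals point to the feature and RT class combinations that drive a significant test.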
Telomere-related, centromere-related, 45S, and 5S ribosomal DNA sequences were obtained from Plant Repeat Databases (Ouyang and Buell, 2004). The replication timing of each group of repeats was assessed as described by Gent et al. (2014). Reads from individual biological replicates of G1, early, mid, and late S phase samples were aligned to consensus sequences for each group using BLAST software (parameter “-e 1e-8”; Camacho et al., 2009). For each sample and biological replicate, the number of reads that aligned to each repeat family was normalized to the total number of reads present in the sample. Finally, the relative abundance of each family in the early, mid or late reads was normalized to the relative abundance of the same family in the G1 reference.
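The two normalization steps for repeat families amount to a ratio of ratios, sketched below for a single repeat family and a single biological replicate. The function name, the sample labels, and all read counts are hypothetical and are not data from this study.

```python
def enrichment_vs_g1(aligned, totals, reference="G1"):
    """Relative abundance of a repeat family in each sample versus G1.

    aligned: {sample: reads aligned to the repeat consensus}
    totals:  {sample: total reads in that sample}
    """
    # Step 1: normalize aligned reads to the total reads of each sample.
    fraction = {s: aligned[s] / totals[s] for s in aligned}
    # Step 2: normalize each S-phase fraction to the G1 reference.
    return {s: fraction[s] / fraction[reference]
            for s in fraction if s != reference}


# Toy counts for one repeat family.
aligned = {"G1": 2000, "early": 2000, "mid": 2000, "late": 5000}
totals = {"G1": 1_000_000, "early": 2_000_000,
          "mid": 1_000_000, "late": 1_000_000}
enrichment = enrichment_vs_g1(aligned, totals)
```

A value above 1 indicates that the family is overrepresented in that S-phase fraction relative to nonreplicating DNA, and a value below 1 indicates underrepresentation.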
Association of Replication Timing with Chromatin States, DNase I Hypersensitivity Sites, and Chromosome Conformation
Repli-Seq data were compared with the chromatin state data set produced by Wang et al. (2015). The overlaps in bp between each chromatin state and the five major RT segment classes were calculated using bedtools v2.25.0 intersect and plotted as absolute and relative coverage. Statistical significance was assessed with a χ2 test. We used the χ2 adjusted residuals (Agresti, 2007) to identify which RT classes were most different from the expected value in each chromatin state group of features, compared to the genome. The absolute coverage of each chromatin state in each RT class was used to compute the Spearman correlation coefficient between RT classes using the function cor in R and subsequently plotted as a heat map with the package corrplot (Wei and Simko, 2016).
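The correlation between RT classes was computed with the R function cor. The Python sketch below is only an illustration of what a Spearman coefficient measures: it ranks the coverage values of two RT classes across chromatin states and computes the Pearson correlation of the ranks. The function names and the coverage vectors are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy coverage (arbitrary units) of five chromatin states in three RT classes.
cov_E = [50, 40, 30, 5, 1]
cov_EM = [45, 42, 20, 8, 2]
cov_L = [2, 5, 10, 40, 60]
rho_E_EM = spearman(cov_E, cov_EM)
rho_E_L = spearman(cov_E, cov_L)
```

In this toy case the early and early/mid classes rank the chromatin states identically while the late class ranks them in the opposite order, so the two coefficients fall at the extremes of the possible range.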
To compare replication timing profiles with DHSs, we used the data set GEO accession PRJNA231710 described by Sullivan et al. (2014). The density of DHSs in each RT class (Fig. 6A) was determined using data from control experiments. The number of DNase cleavages from signal files (accessions GSM1289359 and GSM1289363) was averaged at 1-kb steps across the genome and smoothed using a 5-kb moving average. The resulting DHS density distribution was plotted as a heat map and overlaid with the early, mid, and late replication intensity signals. The DNaseI read density files (Col-0.7d_Seedling.NA.NA.DS19992.signal.bw and Col-0.7d_Seedling.NA.NA.DS21094.signal.bw) were downloaded from http://plantregulome.org (Sullivan et al., 2014). The DNaseI hypersensitive peak files (Col-0.7d_Seedling.NA.NA.DS19992.peaks.bed.gz and Col-0.7d_Seedling.NA.NA.DS21094.peaks.bed.gz) also were downloaded from http://plantregulome.org.
We used the data set (accession no. SRR2626429) described in Liu et al. (2016) for chromosome conformation analysis. Sequencing reads were aligned to the TAIR10 reference genome, and experimental artifacts, such as circularized fragments, PCR duplicates, religated adjacent sequences, and wrong size fragments, were removed using HiCUP with the default parameters (Wingett et al., 2015). Significant interactions, defined as pairs of loci that have a greater number of Hi-C reads than expected by chance (P < 0.001), were identified at 100-kb resolution using HOMER (Heinz et al., 2010) and visualized using the CIRCOS tool (Krzywinski et al., 2009) together with the genome segmentation in RT classes. Within each interacting pair of 100-kb bins, we randomized the first and second bins and split the interactions into groups based on the content of RT classes in the first bin. The absolute and relative overlaps of the second bins with RT classes were computed with bedtools v2.25.0 intersect. A Pearson correlation matrix was computed using the function cor in R and subsequently plotted as a heat map with the package corrplot (Wei and Simko, 2016).
Accession Numbers
Repli-Seq data from this study are in the NCBI Sequence Read Archive (SRA) under the umbrella accession number PRJNA330547. The SRA numbers are: G1 SAMN05417671, Early SAMN05417674, Mid SAMN05417672, and Late SAMN05417673. Processed data files (E_ratio_3.smooth.bedgraph; M_ratio_3.smooth.bedgraph; L_ratio_3.smooth.bedgraph; ratio_segmentation.gff3) are available from the CyVerse (previously iPlant Collaborative; Merchant et al., 2016) Data Store. The Nimblegen microarray data for Arabidopsis chromosome 4 replication timing are at Gene Expression Omnibus under accession number GSE103321. The tiling microarray data for Arabidopsis chromosome 4 replication timing can be found at ArrayExpress under accession number E-GEOD-30433.
Supplemental Data
The following supplemental materials are available.
Supplemental Table S1. Statistics for sequenced libraries.
Supplemental Table S2. Adjusted residuals for χ2 test on contingency Table II describing the overlaps between genomic features and RT classes.
Supplemental Table S3. Overlap between CS and RT classes.
Supplemental Table S4. Adjusted residuals relative to the χ2 test on the contingency Table III describing the overlaps between CS and RT classes.
Supplemental Table S5. Coverage of RT classes of genomic bins establishing significant long-range interactions.
Supplemental Figure S1. Replication timing profiles generated using tiling and Nimblegen arrays are similar.
Supplemental Figure S2. Spearman correlation matrix for tiling and Nimblegen array platforms.
Supplemental Figure S3. Sorting gates and reanalysis of sorted fractions.
Supplemental Figure S4. Distribution of read density for each sequencing library in representative 1-Mb regions of Arabidopsis chromosomes 1, 3, and 5.
Supplemental Figure S5. Spearman correlation matrix of read densities of sequenced samples.
Supplemental Figure S6. Comparison of linear ratio versus log2 ratio.
Supplemental Figure S7. Large-scale distribution of read density on the five Arabidopsis chromosomes.
Supplemental Figure S8. Comparison of the distribution of RT classes on Arabidopsis chromosome 4.
Supplemental Figure S9. Replication timing and genomic features.
Supplemental Figure S10. Hi-C background models generated with HOMER for Arabidopsis chromosome 1.
Supplemental Figure S11. Replication timing and chromosome conformation.
Supplemental Figure S12. Replication timing classes of genomic bins establishing significant interactions.
Supplemental Figure S13. Comparison of replication timing profiles using Nimblegen array and Repli-Seq.
Acknowledgments
We thank Leigh Mickelson-Young for her assistance in maintaining the Arabidopsis cell cultures and with the flow sorting.
Footnotes
This work was supported by the Plant Genome Research Program of the National Science Foundation (grant IOS-1025830 to L.H.-B., W.F.T., R.A.M., and M.W.V.).
L.C., R.A.M., M.W.V., W.F.T., and L.H.-B. conceived the experiments; L.C., A.M.B., E.W., E.E.W., C.L., and T.-J.L. performed the experiments; L.C., P.P., G.J.Z., J.S., M.W.V., W.F.T., and L.H.-B. analyzed Repli-Seq data; L.C., W.F.T., and L.H.-B. wrote the manuscript with contributions from all authors; all authors read and approved the final manuscript.
References
- Agresti A. (2007) An Introduction to Categorical Data Analysis, Ed 2. Wiley-Interscience, Hoboken, NJ [Google Scholar]
- Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815 [DOI] [PubMed] [Google Scholar]
- Arnoult N, Schluth-Bolard C, Letessier A, Drascovic I, Bouarich-Bourimi R, Campisi J, Kim SH, Boussouar A, Ottaviani A, Magdinier F, et al. (2010) Replication timing of human telomeres is chromosome arm-specific, influenced by subtelomeric structures and connected to nuclear localization. PLoS Genet 6: e1000920. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bass HW, Hoffman GG, Lee T-J, Wear EE, Joseph SR, Allen GC, Hanley-Bowdoin L, Thompson WF (2015) Defining multiple, distinct, and shared spatiotemporal patterns of DNA replication and endoreduplication from 3D image analysis of developing maize (Zea mays L.) root tip nuclei. Plant Mol Biol 89: 339–351 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bass HW, Wear EE, Lee TJ, Hoffman GG, Gumber HK, Allen GC, Thompson WF, Hanley-Bowdoin L (2014) A maize root tip system to study DNA replication programmes in somatic and endocycling nuclei during plant development. J Exp Bot 65: 2747–2756 [DOI] [PubMed] [Google Scholar]
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng C-Y, Krishnakumar V, Chan AP, Thibaud-Nissen F, Schobel S, Town CD (2017) Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J 89: 789–804 [DOI] [PubMed] [Google Scholar]
- Copenhaver GP, Pikaard CS (1996) RFLP and physical mapping with an rDNA-specific endonuclease reveals that nucleolus organizer regions of Arabidopsis thaliana adjoin the telomeres on chromosomes 2 and 4. Plant J 9: 259–272 [DOI] [PubMed] [Google Scholar]
- Costas C, de la Paz Sanchez M, Stroud H, Yu Y, Oliveros JC, Feng S, Benguria A, López-Vidriero I, Zhang X, Solano R, et al. (2011) Genome-wide mapping of Arabidopsis thaliana origins of DNA replication and their associated epigenetic marks. Nat Struct Mol Biol 18: 395–400 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dekker J, Rippe K, Dekker M, Kleckner N (2002) Capturing chromosome conformation. Science 295: 1306–1311 [DOI] [PubMed] [Google Scholar]
- Desprat R, Thierry-Mieg D, Lailler N, Lajugie J, Schildkraut C, Thierry-Mieg J, Bouhassira EE (2009) Predictable dynamic program of timing of DNA replication in human cells. Genome Res 19: 2288–2299 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ding Q, MacAlpine DM (2011) Defining the replication program through the chromatin landscape. Crit Rev Biochem Mol Biol 46: 165–179 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Douet J, Tourmente S (2007) Transcription of the 5S rRNA heterochromatic genes is epigenetically controlled in Arabidopsis thaliana and Xenopus laevis. Heredity (Edinb) 99: 5–13 [DOI] [PubMed] [Google Scholar]
- Dvořáčková M, Raposo B, Matula P, Fuchs J, Schubert V, Peška V, Desvoyes B, Gutierrez C, Fajkus J (2017) Replication of ribosomal DNA in Arabidopsis occurs both inside and outside the nucleolus during S phase progression. J Cell Sci 131: jcs.202416. [DOI] [PubMed] [Google Scholar]
- Eaton ML, Prinz JA, MacAlpine HK, Tretyakov G, Kharchenko PV, MacAlpine DM (2011) Chromatin signatures of the Drosophila replication program. Genome Res 21: 164–174 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Farkash-Amar S, Simon I (2010) Genome-wide analysis of the replication program in mammals. Chromosome Res 18: 115–125 [DOI] [PubMed] [Google Scholar]
- Feng S, Cokus SJ, Schubert V, Zhai J, Pellegrini M, Jacobsen SE (2014) Genome-wide Hi-C analyses in wild-type and mutants reveal high-resolution chromatin interactions in Arabidopsis. Mol Cell 55: 694–707 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Folta KM, Kaufman LS (2006) Isolation of Arabidopsis nuclei and measurement of gene transcription rates using nuclear run-on assays. Nat Protoc 1: 3094–3100 [DOI] [PubMed] [Google Scholar]
- Gendrel A-V, Lippman Z, Martienssen R, Colot V (2005) Profiling histone modification patterns in plants using genomic tiling microarrays. Nat Methods 2: 213–218 [DOI] [PubMed] [Google Scholar]
- Gent JI, Madzima TF, Bader R, Kent MR, Zhang X, Stam M, McGinnis KM, Dawe RK (2014) Accessible DNA and relative depletion of H3K9me2 at maize loci undergoing RNA-directed DNA methylation. Plant Cell 26: 4903–4917 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gilbert DM. (2002) Replication timing and transcriptional control: beyond cause and effect. Curr Opin Cell Biol 14: 377–383 [DOI] [PubMed] [Google Scholar]
- Gilbert DM. (2010) Evaluating genome-scale approaches to eukaryotic DNA replication. Nat Rev Genet 11: 673–684 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gindin Y, Valenzuela MS, Aladjem MI, Meltzer PS, Bilke S (2014) A chromatin structure-based model accurately predicts DNA replication timing in human cells. Mol Syst Biol 10: 722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grob S, Schmid MW, Grossniklaus U (2014) Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol Cell 55: 678–693 [DOI] [PubMed] [Google Scholar]
- Guilbaud G, Rappailles A, Baker A, Chen C-L, Arneodo A, Goldar A, d’Aubenton-Carafa Y, Thermes C, Audit B, Hyrien O (2011) Evidence for sequential and increasing activation of replication origins along replication timing gradients in the human genome. PLOS Comput Biol 7: e1002322. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hamperl S, Wittner M, Babl V, Perez-Fernandez J, Tschochner H, Griesenbeck J (2013) Chromatin states at ribosomal DNA loci. Biochim Biophys Acta 1829: 405–417 [DOI] [PubMed] [Google Scholar]
- Hansen RS, Canfield TK, Fjeld AD, Gartler SM (1996) Role of late replication timing in the silencing of X-linked genes. Hum Mol Genet 5: 1345–1353 [DOI] [PubMed] [Google Scholar]
- Hansen RS, Thomas S, Sandstrom R, Canfield TK, Thurman RE, Weaver M, Dorschner MO, Gartler SM, Stamatoyannopoulos JA (2010) Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci USA 107: 139–144 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Havlová K, Dvořáčková M, Peiro R, Abia D, Mozgová I, Vansáčová L, Gutierrez C, Fajkus J (2016) Variation of 45S rDNA intergenic spacers in Arabidopsis thaliana. Plant Mol Biol 92: 457–471 [DOI] [PubMed] [Google Scholar]
- Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38: 576–589 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hetzel J, Duttke SH, Benner C, Chory J (2016) Nascent RNA sequencing reveals distinct features in plant transcription. Proc Natl Acad Sci USA 113: 12316–12321 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hiratani I, Gilbert DM (2009) Replication timing as an epigenetic mark. Epigenetics 4: 93–97 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hiratani I, Ryba T, Itoh M, Yokochi T, Schwaiger M, Chang CW, Lyou Y, Townes TM, Schübeler D, Gilbert DM (2008) Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol 6: e245. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hou C, Li L, Qin ZS, Corces VG (2012) Gene density, transcription, and insulators contribute to the partition of the Drosophila genome into physical domains. Mol Cell 48: 471–484 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaykov A, Taillefumier T, Bensimon A, Nurse P (2016) Molecular combing of single DNA molecules on the 10 megabase scale. Sci Rep 6: 19636. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, et al. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471: 480–485 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kotrlik JW, Williams HA, Jabor MK (2011) Reporting and interpreting effect size in quantitative agricultural education research. J Agric Educ 52: 132–142 [Google Scholar]
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19: 1639–1645 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Layat E, Sáez-Vásquez J, Tourmente S (2012) Regulation of Pol I-transcribed 45S rDNA and Pol III-transcribed 5S rDNA in Arabidopsis. Plant Cell Physiol 53: 267–276 [DOI] [PubMed] [Google Scholar]
- Lebofsky R, Heilig R, Sonnleitner M, Weissenbach J, Bensimon A (2006) DNA replication origin interference increases the spacing between initiation events in human cells. Mol Biol Cell 17: 5337–5345 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee T-J, Pascuzzi PE, Settlage SB, Shultz RW, Tanurdzic M, Rabinowicz PD, Menges M, Zheng P, Main D, Murray JAH, et al. (2010) Arabidopsis thaliana chromosome 4 replicates in two phases that correlate with chromatin state. PLoS Genet 6: e1000982. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lermontova I, Sandmann M, Mascher M, Schmit A-C, Chabouté M-E (2015) Centromeric chromatin and its dynamics in plants. Plant J 83: 4–17 [DOI] [PubMed] [Google Scholar]
- Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: 1303.3997
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289–293 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu C, Wang C, Wang G, Becker C, Zaidem M, Weigel D (2016) Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution. Genome Res 26: 1057–1068 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lubelsky Y, Prinz JA, DeNapoli L, Li Y, Belsky JA, MacAlpine DM (2014) DNA replication and transcription programs respond to the same chromatin cues. Genome Res 24: 1102–1114 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Luo C, Sidote DJ, Zhang Y, Kerstetter RA, Michael TP, Lam E (2013) Integrative analysis of chromatin states in Arabidopsis identified potential regulatory mechanisms for natural antisense transcript production. Plant J 73: 77–90 [DOI] [PubMed] [Google Scholar]
- MacAlpine DM, Rodríguez HK, Bell SP (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18: 3094–3105 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mantiero D, Mackenzie A, Donaldson A, Zegerman P (2011) Limiting replication initiation factors execute the temporal programme of origin firing in budding yeast. EMBO J 30: 4805–4814 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Merchant N, Lyons E, Goff S, Vaughn M, Ware D, Micklos D, Antin P (2016) The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol 14: e1002342. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meyerowitz EM. (2002) Plants compared to animals: the broadest comparative study of development. Science 295: 1482–1485 [DOI] [PubMed] [Google Scholar]
- Mickelson-Young L, Wear E, Mulvaney P, Lee T-J, Szymanski ES, Allen G, Hanley-Bowdoin L, Thompson W (2016) A flow cytometric method for estimating S-phase duration in plants. J Exp Bot 67: 6077–6087 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nagaki K, Talbert PB, Zhong CX, Dawe RK, Henikoff S, Jiang J (2003) Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163: 1221–1225 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nagano T, Várnai C, Schoenfelder S, Javierre B-M, Wingett SW, Fraser P (2015) Comparison of Hi-C results using in-solution versus in-nucleus ligation. Genome Biol 16: 175. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ouyang S, Buell CR (2004) The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants. Nucleic Acids Res 32: D360–D363 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Percival DB, Walden AT (2000) Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge, UK
- Picard F, Cadoret J-C, Audit B, Arneodo A, Alberti A, Battail C, Duret L, Prioleau M-N (2014) The spatiotemporal program of DNA replication is associated with specific combinations of chromatin marks in human cells. PLoS Genet 10: e1004282
- Pope BD, Chandra T, Buckley Q, Hoare M, Ryba T, Wiseman FK, Kuta A, Wilson MD, Odom DT, Gilbert DM (2012) Replication-timing boundaries facilitate cell-type and species-specific regulation of a rearranged human chromosome in mouse. Hum Mol Genet 21: 4162–4170
- Pope BD, Ryba T, Dileep V, Yue F, Wu W, Denas O, Vera DL, Wang Y, Hansen RS, Canfield TK, et al. (2014) Topologically associating domains are stable units of replication-timing regulation. Nature 515: 402–405
- Provart NJ, Alonso J, Assmann SM, Bergmann D, Brady SM, Brkljacic J, Browse J, Chapple C, Colot V, Cutler S, et al. (2016) 50 years of Arabidopsis research: highlights and future directions. New Phytol 209: 921–944
- Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842
- R Development Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna
- Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, Manke T (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44: W160–W165
- Rea LM, Parker RA (2005) Designing and Conducting Survey Research: A Comprehensive Guide, Ed 3. Jossey-Bass, San Francisco, CA
- Rhind N. (2014) The three most important things about origins: location, location, location. Mol Syst Biol 10: 723
- Richards EJ, Ausubel FM (1988) Isolation of a higher eukaryotic telomere from Arabidopsis thaliana. Cell 53: 127–136
- Rivera-Mulia JC, Buckley Q, Sasaki T, Zimmerman J, Didier RA, Nazor K, Loring JF, Lian Z, Weissman S, Robins AJ, et al. (2015) Dynamic changes in replication timing and gene expression during lineage specification of human pluripotent stem cells. Genome Res 25: 1091–1103
- Roudier F, Ahmed I, Bérard C, Sarazin A, Mary-Huard T, Cortijo S, Bouyer D, Caillieux E, Duvernois-Berthet E, Al-Shikhley L, et al. (2011) Integrative epigenomic mapping defines four main chromatin states in Arabidopsis. EMBO J 30: 1928–1938
- Ryba T, Battaglia D, Pope BD, Hiratani I, Gilbert DM (2011) Genome-scale analysis of replication timing: from bench to bioinformatics. Nat Protoc 6: 870–895
- Ryba T, Hiratani I, Lu J, Itoh M, Kulik M, Zhang J, Schulz TC, Robins AJ, Dalton S, Gilbert DM (2010) Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res 20: 761–770
- Salic A, Mitchison TJ (2008) A chemical method for fast and sensitive detection of DNA synthesis in vivo. Proc Natl Acad Sci USA 105: 2415–2420
- Savadel SD, Bass HW (2017) Take a look at plant DNA replication: recent insights and new questions. Plant Signal Behav 12: e1311437
- Schübeler D, Scalzo D, Kooperberg C, van Steensel B, Delrow J, Groudine M (2002) Genome-wide DNA replication profile for Drosophila melanogaster: a link between transcription and replication timing. Nat Genet 32: 438–442
- Schwaiger M, Stadler MB, Bell O, Kohler H, Oakeley EJ, Schübeler D (2009) Chromatin state marks cell-type- and gender-specific replication of the Drosophila genome. Genes Dev 23: 589–601
- Sequeira-Mendes J, Aragüez I, Peiró R, Mendez-Giraldez R, Zhang X, Jacobsen SE, Bastolla U, Gutierrez C (2014) The functional topography of the Arabidopsis genome is organized in a reduced number of linear motifs of chromatin states. Plant Cell 26: 2351–2366
- Sequeira-Mendes J, Gutierrez C (2016) Genome architecture: from linear organisation of chromatin to the 3D assembly in the nucleus. Chromosoma 125: 455–469
- Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G (2012) Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148: 458–472
- Shultz RW, Tatineni VM, Hanley-Bowdoin L, Thompson WF (2007) Genome-wide analysis of the core DNA replication machinery in the higher plants Arabidopsis and rice. Plant Physiol 144: 1697–1714
- Sullivan AM, Arsovski AA, Lempe J, Bubb KL, Weirauch MT, Sabo PJ, Sandstrom R, Thurman RE, Neph S, Reynolds AP, et al. (2014) Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Reports 8: 2015–2030
- Sullivan GM, Feinn R (2012) Using effect size-or why the P value is not enough. J Grad Med Educ 4: 279–282
- Takebayashi S, Ryba T, Gilbert DM (2012) Developmental control of replication timing defines a new breed of chromosomal domains with a novel mechanism of chromatin unfolding. Nucleus 3: 500–507
- Thorpe SD, Charpentier M (2017) Highlight on the dynamic organization of the nucleus. Nucleus 8: 2–10
- Thorvaldsdóttir H, Robinson JT, Mesirov JP (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14: 178–192
- Vaillant I, Tutois S, Cuvillier C, Schubert I, Tourmente S (2007) Regulation of Arabidopsis thaliana 5S rRNA genes. Plant Cell Physiol 48: 745–752
- Vera DL, Madzima TF, Labonne JD, Alam MP, Hoffman GG, Girimurugan SB, Zhang J, McGinnis KM, Dennis JH, Bass HW (2014) Differential nuclease sensitivity profiling of chromatin reveals biochemical footprints coupled to gene expression and functional DNA elements in maize. Plant Cell 26: 3883–3893
- Wang C, Liu C, Roqueiro D, Grimm D, Schwab R, Becker C, Lanz C, Weigel D (2015) Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res 25: 246–256
- Wang Q-M, Wang L (2012) An evolutionary view of plant tissue culture: somaclonal variation and selection. Plant Cell Rep 31: 1535–1547
- Wear E, Concia L, Brooks A, Markham E, Lee T-J, Allen G, Thompson W, Hanley-Bowdoin L (2016) Isolation of plant nuclei at defined cell cycle stages using EdU labeling and flow cytometry. In Caillaud M-C, ed, Plant Cell Division. Springer, New York, pp 69–86
- Wear EE, Song JZ, Zynda GJ, LeBlanc C, Lee T-J, Mickelson-Young L, Concia L, Mulvaney P, Szymanski ES, Allen GC, et al. (2017) Genomic analysis of the DNA replication timing program during mitotic S phase in maize (Zea mays) root tips. Plant Cell
- Wei T, Simko V (2016) corrplot: Visualization of a Correlation Matrix. R package version 0.77. http://cran.r-project.org/package=corrplot
- Wickham H. (2009) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York
- Wingett S, Ewels P, Furlan-Magaril M, Nagano T, Schoenfelder S, Fraser P, Andrews S (2015) HiCUP: pipeline for mapping and processing Hi-C data. F1000 Res 4: 1310
- Woodfine K, Fiegler H, Beare DM, Collins JE, McCann OT, Young BD, Debernardi S, Mott R, Dunham I, Carter NP (2004) Replication timing of the human genome. Hum Mol Genet 13: 191–202
- Yaffe E, Farkash-Amar S, Polten A, Yakhini Z, Tanay A, Simon I (2010) Comparative analysis of DNA replication timing reveals conserved large-scale chromosomal architecture. PLoS Genet 6: e1001011
- Zeileis A, Grothendieck G (2005) zoo: S3 infrastructure for regular and irregular time series. arXiv: math/0505527
- Zhang W, Wu Y, Schnable JC, Zeng Z, Freeling M, Crawford GE, Jiang J (2012a) High-resolution mapping of open chromatin in the rice genome. Genome Res 22: 151–162
- Zhang Y, McCord RP, Ho Y-J, Lajoie BR, Hildebrand DG, Simon AC, Becker MS, Alt FW, Dekker J (2012b) Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148: 908–921
- Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, et al. (2007) Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 17: 839–851
- Zynda GJ, Song J, Concia L, Wear EE, Hanley-Bowdoin L, Thompson WF, Vaughn MW (2017) Repliscan: a tool for classifying replication timing regions. BMC Bioinformatics 18: 362