PLOS ONE. 2012 Mar 7;7(3):e32053. doi: 10.1371/journal.pone.0032053

Modeling Inhomogeneous DNA Replication Kinetics

Michel G Gauthier 1, Paolo Norio 2,3, John Bechhoefer 1,*
Editor: Jerome Mathe 4
PMCID: PMC3296702  PMID: 22412853

Abstract

In eukaryotic organisms, DNA replication is initiated at a series of chromosomal locations called origins, where replication forks are assembled and then proceed bidirectionally to replicate the genome. The distribution and firing rate of these origins, in conjunction with the velocity at which forks progress, dictate the program of the replication process. Previous attempts at modeling DNA replication in eukaryotes have focused on cases where the firing rate and the velocity of replication forks are homogeneous, or uniform, across the genome. However, it is now known that there are large variations in origin activity along the genome, and variations in fork velocities can also take place. Here, we generalize previous approaches to modeling replication to allow for arbitrary spatial variation of initiation rates and fork velocities. We derive rate equations for left- and right-moving forks and for replication probability over time that can be solved numerically to obtain the mean-field replication program. This method accurately reproduces the results of DNA replication simulations. We also successfully adapted our approach to the inverse problem of fitting measurements of DNA replication performed on single DNA molecules. Since such measurements are performed on a specified portion of the genome, the examined DNA molecules may be replicated by forks that originate either within the studied molecule or outside of it. This problem was solved by using an effective flux of incoming replication forks at the model boundaries to represent the origin activity outside the studied region. Using this approach, we show that reliable inferences can be made about the replication of specific portions of the genome even if the amount of data that can be obtained from single-molecule experiments is generally limited.

Introduction

Cells must accurately duplicate their DNA content at every cell cycle. Depending on the organism, the process of DNA replication can initiate at one or multiple sites called origins of replication. The DNA is copied by a pair of oppositely moving replication forks that emerge from each origin. These forks actively replicate the genome away from the origin until they encounter another replication fork. DNA replication can thus be modeled as a series of nucleations, growth (perhaps including fork stalls and rescues [1], [2]), and coalescences occurring in an asynchronous parallel way until the whole genome is copied [3], [4] (Fig. 1).

Figure 1. Space-time representation of the replication kinetics.

The left-hand side shows the original (solid lines) and newly synthesized (dashed lines) DNA while replication forks (triangles) are moving. In this example, the forks originate from two origins (circles) that are initiated at times Inline graphic and Inline graphic. The forks move at a constant speed until they coalesce with another fork (diamond at Inline graphic) or reach the ends of the molecule of length $L$ (around Inline graphic and Inline graphic). The right-hand side presents the space-time replication fraction $f(x,t)$, where $x$ is the position along the genome, of the same replication cycle. Orange and blue areas represent unreplicated ($f = 0$) and replicated DNA ($f = 1$), respectively.

The complexity of the replication process traces back to the observation that the initiation program can be inhomogeneous in both space and time (see [5]–[11] for examples). Spatially inhomogeneous replication firing can be caused by a variety of factors such as an inhomogeneous distribution of pre-replication complexes or their uneven activation during the S phase. This is believed to be caused by factors such as the primary sequence of DNA, the presence of transcription factor binding sites, the chromatin organization of the DNA template and by gene expression [5], [12], [13]. The variability of origin initiation times, on the other hand, can result from the stochastic recruitment of replication initiation factors and the level of checkpoint activity [14]–[16]. As a consequence of such stochastic initiation, replication origins can also be passively replicated by forks coming from neighboring origins. In summary, modeling DNA replication is challenging because the probability of initiation of an origin varies along the genome, the moment at which an origin fires is stochastic, and origins do not systematically fire at each cell cycle.

DNA replication modeling is also challenged by the lack of direct observations. Experimental techniques using immunofluorescent labels to observe the DNA synthesis provide only snapshots of the replication kinetics [17]. The modeling approach presented in this paper can be used to reveal the detailed replication program responsible for producing these snapshots (initiation rates, fork speeds, stalling events, etc).

Over the last decade, our group has developed an analytic approach to modeling DNA replication kinetics [3], [4], [18]–[25]. The approach is based on a formalism inspired by the Kolmogorov-Johnson-Mehl-Avrami (KJMA) theory of phase-transition kinetics in one spatial dimension [26]–[30]. In general, this approach has assumed that there was no significant spatial variation along the genome in the parameters characterizing replication. (The exception is Ref. [18], in which we looked at replication in budding yeast, where origins have fixed locations. Reference [18] turns out to be somewhat different from the present case, where origin initiation occurs in extended zones that then show variation along the genome.) In particular, we assumed that origin initiation rates and the rate of DNA synthesis (fork progression velocities) were spatially uniform. Temporal variations, however, were included, and their effects can be important [18], [20], [22]–[24]. Because our approach gives analytical solutions for the evolution of experimentally measurable quantities such as replication progress, fork densities, domain densities, and the like, it is particularly well suited for fitting to experimental data [18], [20]. This offers an advantage compared to other approaches based on lengthy Monte Carlo simulations [31]–[35] because it requires far less computational power to fit experimental data.

In this paper, we generalize our analytic approach to the case where initiation rates and fork velocities may vary in both space and time. We derive simple rate equations that can be solved numerically to obtain the mean-field space-time replication kinetics. We find the average fork densities in both directions, everywhere along the genome and at any moment during the synthesis (S) phase of the cell cycle. This technique can be used to analyze experimental data from molecular combing [36], [37] and microarrays [38]–[40]. In addition, since our approach allows us to determine quantities involving DNA replication initiation, progression and termination (e.g., coalescence probability profiles, replication time distributions, etc.), it is particularly suitable for fitting results obtained from experiments based on the single-molecule analysis of replicated DNA (SMARD), where molecules at all stages of DNA replication are considered and the steady-state distribution of replication forks can be determined within a specific portion of the genome [7]. On the other hand, the mean-field approach assumes that the cell-to-cell variations in parameters relevant to replication are small. It also does not give the statistical variation expected from an analysis of a finite number of cells, even when all cells are identical. Both of these limitations can be addressed by Monte Carlo simulations, which should be seen as complementary to the present approach.

Methods

Simulating DNA replication

Although the goal of this paper is to be able to calculate the average replication kinetics without recourse to numerical simulations, we shall use simulations here to test our model solutions and, more extensively, to test our fitting procedures. As illustrated in Fig. 1, we model DNA replication using a series of origins from which a pair of replication forks emerge to bidirectionally duplicate the DNA. These forks move away from the initiation site until they coalesce with another fork or reach the end of the molecule. At this level of description, only the rate at which forks are initiated, $I(x,t)$, and their propagation speed, $v(x,t)$, are needed in order to simulate the process of DNA replication. We previously used a Monte Carlo simulation to study the case in which origin initiation rates and fork progression are spatially homogeneous along the genome, i.e., Inline graphic and Inline graphic. Such processes are described in detail in Ref. [3]. However, experimental observations indicate that initiation rates can vary in both space and time along the genome and that the speed of replication forks is not necessarily uniform. Hence, the Monte Carlo simulations must be modified to model these inhomogeneous factors. In addition, since in mammalian cells initiation events frequently appear scattered across large genomic regions (rather than being limited to precise DNA sequences), we included in our simulations the presence of initiation zones. We chose Gaussian profiles for the zones. Although the form of such zones is not clear experimentally, our formalism can work with zones that have an arbitrary initiation-rate profile along the genome ($x$-axis).
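The simulation described above can be illustrated with a minimal lattice sketch. It is not the authors' Monte Carlo code: it assumes a circular genome cut into bins of width `dx`, a constant fork speed `v`, and a time step `dt = dx / v` so that every fork advances exactly one bin per step. The names `simulate_cycle`, `replication_fraction`, and `rate` are hypothetical, and the per-step firing probability `rate * dx * dt` is assumed to be much smaller than 1.

```python
import random

def simulate_cycle(rate, n_bins, dx, v, rng, max_steps=100_000):
    """Simulate one replication cycle on a circular lattice (illustrative sketch).

    rate(x, t): hypothetical initiation rate in initiations/kb/sec.
    Returns the replication time of every bin.
    """
    dt = dx / v                      # a fork advances exactly one bin per step
    rep_time = [None] * n_bins       # None marks unreplicated DNA
    t = 0.0
    for _ in range(max_steps):
        if all(r is not None for r in rep_time):
            return rep_time
        t += dt
        newly = set()
        for i in range(n_bins):
            if rep_time[i] is not None:
                continue
            left, right = rep_time[i - 1], rep_time[(i + 1) % n_bins]
            if left is not None or right is not None:
                newly.add(i)         # a fork from a neighboring domain arrives
            elif rng.random() < rate(i * dx, t) * dx * dt:
                newly.add(i)         # a new origin fires in unreplicated DNA
        for i in newly:              # apply all updates after scanning the old state
            rep_time[i] = t
    raise RuntimeError("replication did not finish within max_steps")

def replication_fraction(cycles, t):
    """Fraction of simulated cycles in which each bin is replicated by time t."""
    n = len(cycles)
    return [sum(c[i] <= t for c in cycles) / n for i in range(len(cycles[0]))]
```

Averaging `replication_fraction` over many cycles gives a noisy estimate of the replication fraction; a single cycle gives values of 0 or 1 only.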

As a test case for our new model, we simulated the replication of a genomic region of 1000 kb containing two Gaussian initiation zones of similar size (50 kb), as indicated in Fig. 2. The zones are assumed to contain origins that fire at different times during the S phase and are therefore referred to as “early” and “late.” In Fig. 2, the early zone is centered at 200 kb and is active at all times. The late zone, located at 800 kb, is active only for $t > 5000$ sec. The late zone is also assumed to be 10 times more efficient than the early one (Inline graphic initiations/kb/sec at the peak of the early zone). The initiation rate $I(x,t)$ indicates the average number of initiations that occur at $(x,t)$ per length of unreplicated DNA per unit of time. This definition is motivated by the observation that each portion of the genome replicates only once per S phase. For this specific example, we set the fork velocity profile, $v(x,t)$, to a constant value of 0.04 kb/sec. The simulation parameters chosen here are typical of replication in somatic cells in mammalian organisms [41]. Our method can easily include variable velocities, including fork blocks due to DNA damage. (Our approach allows both $I$ and $v$ to be space-time dependent, but the results of our test case are easier to interpret when there is only one inhomogeneous contribution to the replication kinetics.) However, experimental results indicate that the effects of the inhomogeneity of $I$ are much more important than the effects of the inhomogeneity of $v$ (see below and Demczuk et al., unpublished). For simplicity, we used periodic boundary conditions (PBCs) for the fork propagation in our simulations (forks reaching a boundary are re-inserted at the other boundary). The system is therefore formally equivalent to a circular chromosome (e.g., as in bacteria). Of course, whole-chromosome simulations of eukaryotic chromosomes would not use periodic boundary conditions and would take into account the specific (low) initiation rates found in telomeres [38].
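The two-zone test profile can be written as a short function. This is a sketch under stated assumptions: the 50 kb zone "size" is interpreted as the Gaussian standard deviation, and `peak_early` is a placeholder value, since the peak rate used in the paper is not recoverable from this text. The function name `initiation_rate` is hypothetical.

```python
import math

def initiation_rate(x, t, peak_early=1e-5, sigma=50.0, t_on=5000.0):
    """Two Gaussian initiation zones, in initiations/kb/sec (illustrative sketch).

    x is in kb and t in sec. peak_early is a placeholder, not the paper's value;
    sigma = 50 kb assumes the quoted zone size is a standard deviation.
    """
    # Early zone: centered at 200 kb and active at all times.
    early = peak_early * math.exp(-0.5 * ((x - 200.0) / sigma) ** 2)
    # Late zone: centered at 800 kb, 10 times more efficient, active for t > t_on.
    late = 0.0
    if t > t_on:
        late = 10.0 * peak_early * math.exp(-0.5 * ((x - 800.0) / sigma) ** 2)
    return early + late
```

A function of this form can be passed wherever a space-time initiation rate is required, for example as the rate callable of a simulation or of a numerical solver.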

Figure 2. Initiation profile $I(x,t)$ used to produce the results presented in Fig. 3.

The left-hand side is a density plot of the initiation rate, while the right-hand side shows $I(x,t)$ at various time points. (For clarity, each curve is offset by Inline graphic/kb/sec from the previous one.) The initiation pattern is composed of two Gaussian initiation zones at 200 and 800 kb. The first, or “early,” zone is constant throughout time, while the second, more efficient, “late” zone is turned on at 5000 sec.

Simulation results for our model system for 1 and 1000 cell cycles are presented in Figs. 3 a and 3 b, respectively. Each figure is divided into five parts. Part I shows the replication fraction $f(x,t)$. For the one-cycle simulation, the value of $f(x,t)$ is either 0 or 1 for unreplicated and replicated DNA. For a finite number of simulations, as in Fig. 3 b–I, the value of $f(x,t)$ is the average value observed throughout the ensemble of simulated cycles (Inline graphic). The fraction $f(x,t)$ thus gives the probability that a specific section of the genome located at $x$ is replicated at time $t$. Parts II and III present the left- and right-moving replication forks. Only the trajectories of the forks are displayed for the case of one simulated cycle, while the average observed fork densities are reported for the case of many cycles. We refer to the fork densities presented in Figs. 3 b–II and 3 b–III as $\rho_\pm(x,t)$, where the $\pm$ sign represents the right- and left-moving forks, respectively. These densities equal the number of forks moving in a specific direction per kb at $(x,t)$. Finally, parts IV and V show where and when initiations and coalescences occur. In b–IV and b–V, these events are represented using probability density functions, $\phi_i(x,t)$ and $\phi_c(x,t)$, for initiations and coalescences, respectively. These densities are expressed in units of 1/kb/sec and are normalized so that $\int\!\!\int \phi_{i,c}(x,t)\,dx\,dt = 1$.

Figure 3. Comparison between one simulated replication cycle (a), 1000 simulation cycles (b), and our rate-equation solution (c).

In graph (I), the color scale goes from 0 (orange) to 1 (blue); in graphs (II) and (III), it goes from 0 (white) to 0.01/kb (black); in graphs (IV) and (V), it goes from 0 (white) to 1.5Inline graphic/kb/sec (black). In all cases, we used the initiation function $I(x,t)$ presented in Fig. 2 and the fork velocity $v = 0.04$ kb/sec. The genome size is 1000 kb, with periodic boundary conditions. Column I compares the replication fraction $f(x,t)$ in the three cases. The dashed lines in b–I and c–I show the 10%, 50% and 90% contour curves. Columns II and III present the fork densities $\rho_\pm(x,t)$. Fork densities are expressed in forks/kb in (b) and (c), while only trajectories are shown for the single cell cycle in (a). Columns IV and V present the space-time probability density functions of observing an initiation, $\phi_i(x,t)$, or a coalescence, $\phi_c(x,t)$, respectively. Part (a) shows where and when initiations and coalescences from one cycle occurred, while parts (b) and (c) represent probability densities in 1/kb/sec.

Our simulations give detailed information about the replication process and its statistics. Typical quantities of interest that we study include the distribution of whole-genome replication times, the average replication times of different regions, and the average number of initiations and coalescences (as well as their space-time distributions). However, while simulations based on a known scenario can reproduce any experimentally obtainable statistic, the calculation times are long. To fit unknown parameters to a set of experimental data would require large computational resources. This difficulty motivates the analytic methods presented in the next section. Although they use numerical methods to solve differential equations, they are orders of magnitude faster than simulation-based approaches.

Rate-equation approach

As mentioned above, we have developed a theoretical approach that can be substituted for numerical simulations in order to speed up the analysis of a given replication scenario when one is interested in the average replication kinetics. As we will show, integrating our rate-equations system also involves numerical steps, but our approach is still considerably faster than simulation-based models. Moreover, our method directly gives the mean-field kinetics of replicating DNA. This solution is equivalent to the simulation results in the limit where an infinite number of simulations is performed. (Compare Fig. 3 b to Fig. 3 c). In this sense, our technique provides the exact average replication program but does not give information about the cell-to-cell variability of the process, which can be obtained from simulations. Simulations are thus complementary to the present mean-field calculation method.

In this section, we introduce an analytical formalism to model the creation, propagation and annihilation of replication forks during DNA synthesis. To proceed, we derive a set of coupled differential equations that describe the change in the replication fork populations as a function of both the position along the genome ($x$) and the time since the beginning of S phase ($t$). As before, we define $f(x,t)$ to be the probability that a given position of the genome $x$ is replicated at time $t$, while $\rho_\pm(x,t)$ represents the right- and left-moving fork densities.

Modeling the replication kinetics using rate-equations

We describe the space-time evolution of the average replication fraction $f(x,t)$ as well as both average fork densities $\rho_\pm(x,t)$, assuming that the creation and propagation dynamics of the forks are inhomogeneous, i.e., that the initiation function $I(x,t)$ and the fork speeds $v_\pm(x,t)$ can vary in space and time. (Again, the $\pm$ signs refer to the direction of propagation of the forks.)

The first equation of our set gives the rate of change of the probability that a given location, $x$, is replicated at time $t$,

$$\frac{\partial f(x,t)}{\partial t} = v_+(x,t)\,\rho_+(x,t) + v_-(x,t)\,\rho_-(x,t), \qquad (1)$$

which is simply given by the product of local fork densities times the rate at which a given fork synthesizes DNA.

The rate of change of the fork densities can be expressed in the form of a “transport” equation,

$$\frac{\partial \rho_\pm(x,t)}{\partial t} \pm \frac{\partial}{\partial x}\left[v_\pm(x,t)\,\rho_\pm(x,t)\right] = I(x,t)\left[1 - f(x,t)\right] - \frac{\left[v_+(x,t) + v_-(x,t)\right]\rho_+(x,t)\,\rho_-(x,t)}{1 - f(x,t)}, \qquad (2)$$

with a “source” and a “sink” term on the right-hand side. The source term, $I(x,t)\left[1 - f(x,t)\right]$, represents the initiation of new forks at a rate $I(x,t)$ rescaled by the probability that the genome is not already replicated at that position, $1 - f(x,t)$. The sink term represents the annihilation rate of forks as they coalesce with oppositely moving forks. The coalescence rate is proportional to the two local populations of forks and the relative speed at which these forks are merging. That rate must be normalized by the probability of not being replicated, $1 - f(x,t)$. The $\pm$ sign on the left-hand side of Eq. 2 arises because both left- and right-moving forks are assigned positive velocities. An expression similar to Eq. 2 has been used to model the growth of crystal lamella [42].

Given a replication scenario for $I(x,t)$ and $v_\pm(x,t)$, Eqs. 1 and 2 can be numerically integrated to obtain $f(x,t)$ and $\rho_\pm(x,t)$. We solved our set of equations for the same conditions as used for the simulations presented in Fig. 3 b (i.e., $I(x,t)$ given by Fig. 2 and $v = 0.04$ kb/sec). We explicitly integrated our equations using Inline graphic kb and Inline graphic sec (we need Inline graphic to adequately solve this system). We also used PBCs to solve our equations, which means we used $f(0,t) = f(L,t)$ and $\rho_\pm(0,t) = \rho_\pm(L,t)$ for all $t$. Using $f(x,0) = 0$ and $\rho_\pm(x,0) = 0$ for all $x$ as initial conditions, the solution presented in Fig. 3 c agrees with the simulation results of Fig. 3 b within statistical limits. Parts I to III are directly obtained from the solution of our three rate equations. The densities of parts IV and V are, on the other hand, proportional to the source and sink terms of Eq. 2, respectively. Hence,
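The explicit integration described in this paragraph can be sketched as follows. The scheme is an assumption, not the authors' code: a first-order upwind discretization with periodic boundaries, a constant fork speed `v` for both directions (so the relative speed in the sink term is `2 * v`), and the stability requirement `v * dt <= dx` that such a scheme needs. The source and sink terms follow the verbal description of Eq. 2 given in the text. The clamps on the densities and on the unreplicated fraction are pragmatic guards added for this illustration, and the function name `solve_rate_equations` is hypothetical.

```python
import math

def solve_rate_equations(rate, n_bins, dx, v, dt, t_end):
    """Explicit upwind integration of the rate equations with PBCs (sketch).

    rate(x, t): hypothetical callable giving the initiation rate I(x, t).
    Returns (f, rho_plus, rho_minus, n_init, n_coal) at time t_end.
    """
    if v * dt > dx:
        raise ValueError("choose dt so that v * dt <= dx")
    f = [0.0] * n_bins               # replication fraction
    rp = [0.0] * n_bins              # right-moving fork density
    rm = [0.0] * n_bins              # left-moving fork density
    n_init = n_coal = 0.0            # accumulated initiations and coalescences
    for k in range(int(round(t_end / dt))):
        t = k * dt
        new_f, new_rp, new_rm = f[:], rp[:], rm[:]
        for i in range(n_bins):
            unrep = max(1.0 - f[i], 1e-12)                    # guard against 1/0
            source = rate(i * dx, t) * unrep                  # new forks
            sink = 2.0 * v * rp[i] * rm[i] / unrep            # coalescences
            adv_p = -v * (rp[i] - rp[i - 1]) / dx             # upwind, moving right
            adv_m = v * (rm[(i + 1) % n_bins] - rm[i]) / dx   # upwind, moving left
            new_rp[i] = max(rp[i] + dt * (adv_p + source - sink), 0.0)
            new_rm[i] = max(rm[i] + dt * (adv_m + source - sink), 0.0)
            new_f[i] = min(f[i] + dt * v * (rp[i] + rm[i]), 1.0)
            n_init += source * dx * dt
            n_coal += sink * dx * dt
        f, rp, rm = new_f, new_rp, new_rm
    return f, rp, rm, n_init, n_coal
```

For a spatially uniform, time-independent rate $I_0$, the result can be checked against the known homogeneous solution $f(t) = 1 - e^{-I_0 v t^2}$, and the accumulated numbers of initiations and coalescences should agree once replication is essentially complete.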

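$$\phi_i(x,t) = \frac{I(x,t)\left[1 - f(x,t)\right]}{N_i}, \qquad (3)$$

where

$$N_i = \int_0^\infty \! dt \int_0^L \! dx \; I(x,t)\left[1 - f(x,t)\right] \qquad (4)$$

is the average number of initiations per replication cycle. Similarly,

$$\phi_c(x,t) = \frac{\left[v_+(x,t) + v_-(x,t)\right]\rho_+(x,t)\,\rho_-(x,t)}{N_c\left[1 - f(x,t)\right]}, \qquad (5)$$

where the average number of coalescences per cycle is given by

$$N_c = \int_0^\infty \! dt \int_0^L \! dx \; \frac{\left[v_+(x,t) + v_-(x,t)\right]\rho_+(x,t)\,\rho_-(x,t)}{1 - f(x,t)}. \qquad (6)$$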

The results of our numerical integrations are Inline graphic and Inline graphic. For a finite-size system with periodic boundary conditions such as the model presented in this paper, we must have $N_i = N_c$. The 0.2% difference between our calculation results is simply due to round-off errors. Our calculation also matches the average number of Inline graphic initiations observed during our 1000 simulations. Finally, note that our model can also be solved using non-periodic boundary conditions in order to study replication of linear DNA. In such a case, the numbers of initiations $N_i$ and coalescences $N_c$ are still given by Eqs. 4 and 6, but we expect Inline graphic.

Start-time distributions

The stochasticity of the replication process modeled here implies that the start and end of S phase (defined by the first origin initiation and the last fork coalescence) occur at different times in each cycle. As illustrated in Fig. 3 a–I, the simulation starts at $t = 0$, but the actual duplication of the DNA does not start before Inline graphic sec. In other words, there is a distribution of replication start times (marked by the first initiation) and also a distribution of end times (marked by the last coalescence).

Our rate equations can be used to calculate the probability that replication has started, $P_s(t)$, as a function of time, which corresponds to the probability that at least one initiation occurs during the time interval $[0, t]$. We calculate this probability in terms of a related quantity, $n_s(t)$, which is the number of initiations that are expected to happen in $[0, t]$, assuming that there were no initiations prior to $t$. We write

$$n_s(t) = \int_0^t \! dt' \int_0^L \! dx \; I(x,t'), \qquad (7)$$

where $L$ is the genome length. Consequently, the probability that at least one initiation occurred prior to time $t$ is given by

$$P_s(t) = 1 - e^{-n_s(t)}, \qquad (8)$$

and the replication starting time distribution is simply given by $\phi_s(t) = dP_s(t)/dt$. Figure 4 compares the calculated starting time distribution with simulation results.
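The starting probability can be evaluated numerically for any initiation profile. The sketch below assumes that initiations on a still-unreplicated genome form a Poisson process, so that the probability of at least one event is one minus the exponential of the expected count; the double integral is approximated with a midpoint rule. The function name `start_probability` and the grid sizes `n_x` and `n_t` are illustrative choices.

```python
import math

def start_probability(rate, length, t, n_x=200, n_t=200):
    """Probability that at least one initiation has occurred in [0, t] (sketch).

    rate(x, t): hypothetical callable giving I(x, t) in initiations/kb/sec.
    length: genome length in kb. Uses a midpoint rule on an n_x by n_t grid.
    """
    dx, dt = length / n_x, t / n_t
    n_s = 0.0                                 # expected number of initiations
    for j in range(n_t):
        tj = (j + 0.5) * dt
        for i in range(n_x):
            n_s += rate((i + 0.5) * dx, tj) * dx * dt
    return 1.0 - math.exp(-n_s)               # Poisson: P(at least one event)
```

For a uniform rate $I_0$ on a genome of length $L$, the expected count is $I_0 L t$, which gives a closed form to compare against.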

Figure 4. Replication starting and ending time density functions, $\phi_s(t)$ and $\phi_e(t)$, for our model system.

Symbols were obtained from simulations, while solid lines were calculated from the solution of our rate equations.

Equation 8 is valid for any molecule of length $L$, whether periodic or non-periodic boundary conditions are considered. However, Eq. 8 must be modified if one is studying a finite-size fragment that is part of a larger molecule. A fragment thus corresponds to a finite-length linear molecule without PBCs but with a flux of forks at its boundaries (these forks were previously initiated elsewhere outside the fragment region). In order to calculate the starting probability of such a fragment, we first define the notion of directional replication fractions, $f_\pm(x,t)$, which are the probabilities that the location $x$ has been replicated by a right- or a left-moving fork. These replication fractions are obtained from

$$\frac{\partial f_\pm(x,t)}{\partial t} = v_\pm(x,t)\,\rho_\pm(x,t) \qquad (9)$$

and can be calculated as a by-product of the numerical integration of Eq. 1. Thus, for a fragment that begins at $x_-$ and ends at $x_+$, the probability that replication has started is given by

$$P_s(t) = 1 - \left[1 - f_+(x_-,t)\right]\left[1 - f_-(x_+,t)\right] e^{-n_s(t)}, \qquad (10)$$

where

$$n_s(t) = \int_0^t \! dt' \int_{x_-}^{x_+} \! dx \; I(x,t'). \qquad (11)$$

Equation 10 says that the probability that replication has not started along a given molecule is the product of the probability that no initiation already occurred within the molecule times the probability that no fork came across the molecule boundaries.
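The product form described for Eq. 10 can be written as a small helper. This is a sketch that treats the three events (no initiation inside the fragment, no right-moving fork entering at the left boundary, no left-moving fork entering at the right boundary) as independent, which is an assumption of this illustration in the spirit of the mean-field description. The function and argument names are hypothetical.

```python
import math

def fragment_start_probability(n_s, f_plus_left, f_minus_right):
    """Probability that replication of a fragment has started (sketch).

    n_s: expected number of initiations inside the fragment up to time t.
    f_plus_left: probability that the left boundary was replicated by a
        right-moving fork, i.e. a fork entering from outside.
    f_minus_right: probability that the right boundary was replicated by a
        left-moving fork entering from outside.
    """
    p_no_init = math.exp(-n_s)                                   # no internal firing
    p_no_entry = (1.0 - f_plus_left) * (1.0 - f_minus_right)     # no incoming fork
    return 1.0 - p_no_init * p_no_entry
```

With both boundary fractions set to zero, the expression reduces to the isolated-molecule result of Eq. 8.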

End-time distributions

Another useful quantity is the probability that replication has ended at time $t$, $P_e(t)$. This quantity is of great interest because it tells us when the replication is over. It could therefore be used to study the duplication time of the genome as a function of the replication scenarios. In general, we cannot derive an analytical solution of our nonlinear rate-equations system and, consequently, we cannot derive a formula for $P_e(t)$ as we did for the starting-time distribution. Nonetheless, we can use our knowledge of the replication fork density to estimate $P_e(t)$ as the probability that there is no fork along the genome. For a periodic system, where the number of right-moving forks is always equal to the number of left-moving forks, we have

graphic file with name pone.0032053.e118.jpg (12)

where we have assumed the number of forks at time $t$ to be given by a Poisson distribution (an equivalent estimate for $P_e(t)$ in a system with PBCs is obtained by replacing $\rho_+(x,t)$ by $\rho_-(x,t)$ in Eq. 12). The tilde notation used in Eq. 12 denotes the fact that $\tilde{P}_e(t)$ is an approximation of $P_e(t)$. The density is normalized by the probability of being replicated. Figure 4 also compares the end-time distribution function, $\phi_e(t)$, with simulation results. Note that we can replace Inline graphic by Inline graphic in Eq. 12 to get an estimate of Inline graphic. We expect the approximation that fork distributions are Poisson to be accurate at the beginning and end of S phase, where the number of forks is small, but to be less so in mid-S phase, where there are more forks. For the model explored here, the maximum difference between the calculated and the simulated values of the ending probability is Inline graphic. In Supporting Information S1, we solve exactly the case of a uniform initiation profile and show that the error of our approximation of $P_e(t)$, when compared to the exact solution, decreases as the number of initiations increases (i.e., as Inline graphic increases).

In the case of a non-periodic system, the lack of forks that move in one direction over a certain range Inline graphic does not imply that the whole range is replicated. Therefore, we must modify Eq. 12 so as to obtain the end-time distribution of finite-size systems without periodic boundary conditions (e.g., finite-length linear DNA or a section of a larger molecule). The probability that a DNA fragment located between $x_-$ and $x_+$ is fully replicated is given by

graphic file with name pone.0032053.e132.jpg (13)

Equation 13 asserts that the replication of a molecule without PBCs has finished if no right-moving forks are observed and if the left boundary is replicated (or vice versa for left-moving forks). As mentioned above, an equivalent estimate for $P_e(t)$ without PBCs is obtained if we substitute the pre-factor $f(x_-,t)$ by $f(x_+,t)$ and use $\rho_-(x,t)$ instead of $\rho_+(x,t)$.

Boundary fork injection

The previous sections presented how our model can be used to study replication of molecules with and without PBCs. In deriving Eqs. 10 and 13, we also showed how to calculate the probability that a sub-section of the modeled system has started or finished replicating. We now show how to adapt the boundary conditions so that they act as sources of forks, in order to account for initiations that occur outside the modeled DNA segment. These forks mimic initiations occurring outside Inline graphic. The simplest case would be to have a source term that is equivalent to a semi-infinite region where the initiation rate and the fork velocity are constant. In such a case, the density of forks at the boundaries is simply

graphic file with name pone.0032053.e134.jpg (14)
graphic file with name pone.0032053.e135.jpg (15)

where Inline graphic are the constant initiation rates outside the modeled regions (Inline graphic for right-moving forks coming from the Inline graphic region and Inline graphic for left-moving forks initiated at Inline graphic). The derivation for this boundary condition is presented in Supporting Information S1 and Figure S1.

Stochastic fork progression

Our calculation method can also be adapted to model the impact of DNA damage on replication kinetics. Even in normal, healthy cells, there are a large number of DNA “defects” where forks slow, or even stop. Such damage usually affects only one of the two DNA strands. These single-strand lesions are characterized by base oxidation caused by reactive oxygen species or by base misincorporation due to a copying error during DNA replication. In more serious but rarer cases, defects involve both DNA strands. Examples of such double-strand defects include DNA crosslinking induced by ionizing radiation or double-strand breaks that result from a failed repair of single-strand damage. Double-strand damage is more dangerous because its repair can lead to rearrangements of the genome and even contribute to the development of cancer [43]. Depending on its density and on the repair mechanisms involved, DNA damage can have a strong impact on the replication kinetics. The slowdown or stalling of forks at defects gives origins that would otherwise have been passively replicated more time to fire [44]. Also, fork stalls trigger local and global checkpoint signals that can affect the progression of forks and the firing rate of new origins elsewhere along the genome [11].

If replication speed changes predictably along the genome, one can simply define an appropriate velocity profile Inline graphic. However, fork progression can also be affected in a more stochastic way in the presence of DNA damage. When they encounter such defects, replication forks are stalled for a given period of time until repaired. The repair time depends on the nature of the defects and can either be finite or infinite (i.e., not repaired during the current S phase). In the infinite-repair-time case, the replication of the DNA on the other side of the defect must come from the oppositely moving fork. Such a stochastic blocking/unblocking mechanism can be added to our mathematical framework by modifying our expression for Inline graphic to

graphic file with name pone.0032053.e143.jpg (16)

where the five terms on the right hand side are

  1. the initiation rate of new forks, as we had in Eq. 2 ;

  2. the stall rate of the moving forks, assuming that the average spacing between defects is given by Inline graphic;

  3. the repair rate of the stalled forks, denoted Inline graphic, with the average repair time given by Inline graphic;

  4. the coalescence rate between moving forks, per Eq. 2 ;

  5. the coalescence rate of moving forks that collide with stalled forks.

The densities of stalled forks can be obtained by adding two differential equations to our set. These new equations are used to describe the rate of change of the densities of forks that are stalled at DNA lesions as

graphic file with name pone.0032053.e147.jpg (17)

where the three terms represent stall, repair, and coalescence rates. There is no Inline graphic term on the left-hand side of Eq. 17 because stalled forks are assumed to be fixed in space. A simplified version of this fork-stall model, neglecting spatial inhomogeneity, was the subject of a previous publication [19].

Results

Analyzing experimental data

An obvious application of our analysis would be to reproduce results from experiments based on microarrays [38]. Microarrays provide genome-wide average replication profiles as a function of time (derived from the overall molecule replication fraction), which ideally correspond to the replication fraction $f(x,t)$ obtained from our rate equations (see [39], [40] for examples). Of course, real microarray experiments are not ideal, and issues such as the spatial resolution of the array or the cell-cycle asynchrony of populations should be kept in mind when analyzing the data. In a future contribution, we shall discuss how to reproduce such time-course results. Here, we demonstrate the versatility of our modeling technique by adapting it to the study of a more subtle type of data that has recently been obtained via single molecule analysis of replicated DNA (SMARD), a method developed by Norio et al. [7]. The modeling and fitting procedures presented in this paper were used to analyze a large SMARD data set obtained from mouse bone marrow cells (Demczuk et al., unpublished). One feature of such experiments is that the data are obtained from an asynchronous population of cells (i.e., the starting time of each cell in the population is random, drawn from a uniform distribution). Unlike microarrays, SMARD also allows one to determine the steady-state distribution of replication forks, as well as the location of initiation events and fork collisions (in addition to the temporal order of replication for a specific portion of the genome). This additional information can be used to determine more precisely the level of origin activity across the genomic region analyzed. We shall need to adapt our model to make predictions for such a case.

Simulating a SMARD data set

The goal of the current section is to adapt our calculation approach to the analysis of an actual experimental setup, the SMARD experiment. The first step towards such a goal is to be able to simulate the data collected during this experiment.

The SMARD procedure is presented in detail in Ref. [7]. Here, we give a brief summary. In a population of asynchronously growing cells, one supplements the normal nucleotides used to synthesize DNA with two different types of halogenated nucleotides that are then conjugated to fluorescent antibodies. For convenience, we shall refer to them as red and green labels. (The first label is red; the second is green.) Since cells replicate asynchronously, the labeling switch can occur at any time relative to the cell cycle for a particular cell. (In particular, the switch will often occur when the cell is not in S phase.) Figure 5 depicts the labeling procedure when the transition happens during the replication process. Part (a) compares the labeling timeline with the replication space-time diagram, while part (b) shows the DNA molecule one would observe after such labeling. As shown in Fig. 5 b, the positions where the label changes indicate the locations of the replication forks at the switching time (depicted by arrows). Then, if we know the labeling sequence (red followed by green in this case), we can distinguish left- from right-moving forks (forks move from red to green zones).
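The labeling rule just described can be sketched in a few lines. This is a minimal illustration, not the simulation code used for the figures: the function names, the discretization into bins, and the toy replication times are all assumptions made for the example. It labels each bin by comparing its replication time with the switch time, then reads fork directions from the red-to-green boundaries.

```python
def label_molecule(replication_times, switch_time):
    """Label each bin 'R' if it replicated before the label switch, else 'G'."""
    return ["R" if t < switch_time else "G" for t in replication_times]


def find_forks(labels):
    """Return (boundary_index, direction) for each red/green boundary.

    A fork travels from red (earlier) DNA into green (later) DNA, so a
    red-to-green step read from left to right marks a right-moving fork.
    """
    forks = []
    for i in range(len(labels) - 1):
        if labels[i] == "R" and labels[i + 1] == "G":
            forks.append((i, "right"))
        elif labels[i] == "G" and labels[i + 1] == "R":
            forks.append((i, "left"))
    return forks


# Toy linear molecule (hypothetical): one origin at bin 4 fires at t = 0,
# and forks advance one bin per time unit.
replication_times = [abs(i - 4) for i in range(10)]
labels = label_molecule(replication_times, switch_time=2.5)
forks = find_forks(labels)
print("".join(labels))
print(forks)
```

In this toy case the red block surrounds the origin, with a left-moving fork on its left edge and a right-moving fork on its right edge.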

Figure 5. SMARD labeling procedure.

(a) Example of a replication space-time profile and the corresponding SMARD labeling procedure. As before, blue sections indicate replicated DNA while orange sections represent unreplicated DNA. Circles denote fired origins, while diamonds indicate coalescences of replication forks. Periodic boundary conditions were used (circular genome). The dashed line at time Inline graphic sec indicates the end of the first labeling period (red) and the beginning of the second (green) one. Arrows indicate the fork propagation directions at the labeling transition time. The labeling timeline on the right side and the solid line on the space-time profile illustrate the labeling process to produce the molecule example presented in (b). (b) Example of a molecule extracted from the simulation presented in (a). Red sections were replicated during the red pulse (before Inline graphic sec), while green sections were replicated later. To obtain a two-color molecule, the label transition time must occur after the first initiation and before the last coalescence.

In practice, the red- and green-labeling periods are preceded by normal periods of non-fluorescent nucleotide synthesis. If each of these labeling periods is significantly longer than the duplication time of the analyzed molecules, then every molecule that is examined will show one or two types of nucleotide (but never three). All replicated molecules are collected, but only the ones that are fully labeled with fluorescent markers are kept for analysis (fully red, fully green, or red-green molecules).

The molecule-selection procedure described above (replication simulation followed by random molecule selection) can be repeated to collect a distribution of molecules. Figure 6 a shows an example of Inline graphic red-green labeled molecules collected during a simulation of our model system (Fig. 2) using the protocol of the SMARD experiment. We simulated more molecules but kept only the ones with both labels. The red-green molecules in Fig. 6 a are organized according to their red-label content. Note that a simple visual inspection of Fig. 6 a is sufficient to obtain a general sense about the position and relative efficiency of the replication origins located in the region.

Figure 6. Simulation of SMARD experiment with comparison to rate-equation estimates.

(a) Labeled molecules collected from simulations of the SMARD procedure, using the model system of Fig. 2. Each line corresponds to a molecule such as the example presented in Fig. 5 b. Molecules were organized according to their red-label content. Only molecules that were fully substituted with fluorescent nucleotides were considered for the analysis. (b) Red-green content Inline graphic of the molecules from (a) as a function of the position Inline graphic along the genome (circles). A value of one (zero) means that all the molecules are red (green) labeled at a given position. The solid line was calculated using our rate equations for Inline graphic (see Eq. 23). Red-green content was determined by averaging over 5 kb bins; for clarity, only one value in ten is shown. (c) Left- and right-moving fork densities Inline graphic observed in the molecules presented in (a) as a function of the position Inline graphic along the genome (triangles). The fork density is defined as the number of forks per unit length at a given position (using 50 kb bins, 10 times larger than the simulation bin size). The solid line is derived from the rate equations for Inline graphic (see Eq. 24). Gray arrows in the background show the locations of initiation zones (i.e., from left to right, the intersections of increasing right-moving fork densities with decreasing left-moving fork densities). (d) Autocorrelation function of average red-green content, computed from the pool of molecules presented in (a). Since we used periodic boundary conditions, the maximum displacement is Inline graphic.

Data analysis

Figures 6 b and 6 c present three statistical “profiles” that are functions of the genome position but averaged over all the simulated molecules shown in Fig. 6 a: the local red-green ratio and the densities of replication forks in both directions. Quantities are averaged over all samples because typical experimental data sets are small (10 to 100 red-green molecules, Demczuk et al., unpublished). As we shall see in the next sections, we can adapt our approach to reproduce such average quantities without having to do simulations.

Figure 6 b shows the red-green content, Inline graphic, as a function of the genome position averaged over all the molecules collected in Fig. 6 a. This quantity is always between one (all red) and zero (all green) and is given by

graphic file with name pone.0032053.e161.jpg (18)

where Inline graphic is the number of samples collected and Inline graphic is the label value (1 for red and 0 for green) of sample Inline graphic at the position Inline graphic. Figure 6 b clearly shows that the positions, widths and amplitudes of the red-green content function peaks correlate with the initiation zones in Fig. 2. To a first approximation, a maximum of Inline graphic corresponds to an initiation zone, while its numerical value reflects the zone efficiency. We verified that an increase of the initiation zone width also correlates with an increase of the corresponding red-green peak width (not shown).
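The averaging described for Eq. 18 amounts to a per-position mean over the collected molecules. The sketch below assumes each molecule is stored as a list of label values per bin (1 for red, 0 for green); the function name and the four toy molecules are hypothetical.

```python
def red_green_content(molecules):
    """Fraction of molecules that are red at each position (1 = all red)."""
    n_samples = len(molecules)
    n_bins = len(molecules[0])
    return [sum(m[x] for m in molecules) / n_samples for x in range(n_bins)]


# Four toy red-green molecules of six bins each (1 = red, 0 = green).
molecules = [
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
profile = red_green_content(molecules)
print(profile)
```

The bin that is red in every molecule (the third one here) plays the role of an efficient initiation zone: it is the region most often replicated before the label switch.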

Another measurement that can be extracted from SMARD experiments is the position of forks along the genome. Figure 6 c shows the fork densities Inline graphic as a function of the genome position (again, the Inline graphic sign refers to right- and left-moving forks, respectively). Since the fork density is defined as the number of forks per kb, it is, in the context of the SMARD experiment, given by

graphic file with name pone.0032053.e169.jpg (19)

where the local fork density Inline graphic is the number of forks observed in sample Inline graphic in a bin of size Inline graphic, divided by Inline graphic. Again, the fork densities shown in Fig. 6 c were obtained from all the molecules presented in Fig. 6 a. These figures also show that the two fork densities can be used to characterize the initiation zones. For example, the position of an initiation zone approximately corresponds to the intersection of a decreasing left-moving fork density with an increasing right-moving fork density. Intuitively, this observation results from the fact that initiation zones are regions from which both types of forks emerge, leading to the observed positive and negative gradients of right- and left-moving fork densities across the zones. In other words, a right-moving fork is more likely to survive (not coalesce) as it moves across the zone (and vice versa for left forks). The converse situation, decreasing right-moving fork density and increasing left-moving fork density, characterizes termination zones, which are regions where coalescences are more likely to happen. Of course, since there are few forks per molecule, the fluctuations in fork densities are higher than the fluctuations in red-green content.
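The binned fork density described for Eq. 19 can be sketched as follows. The function name, the data layout (one list of fork positions per molecule, for one fork direction), and the toy numbers are assumptions made for illustration.

```python
def fork_density(fork_positions_per_molecule, genome_length_kb, bin_size_kb):
    """Average number of forks per kb in each bin, over all molecules."""
    n_bins = int(genome_length_kb // bin_size_kb)
    counts = [0] * n_bins
    for positions in fork_positions_per_molecule:
        for x in positions:
            # Clamp so that a fork exactly at the right end falls in the last bin.
            counts[min(int(x // bin_size_kb), n_bins - 1)] += 1
    n_samples = len(fork_positions_per_molecule)
    # Forks per bin, divided by bin size, averaged over the samples.
    return [c / (n_samples * bin_size_kb) for c in counts]


# Right-moving fork positions (kb) in three toy molecules of a 200 kb region.
right_forks = [[120.0], [130.0, 10.0], []]
density = fork_density(right_forks, genome_length_kb=200, bin_size_kb=50)
print(density)
```

The same function would be applied separately to left-moving forks. With only a handful of forks per molecule, most bins contain very few counts, which is why fork-density profiles are noisier than red-green content profiles.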

Estimating SMARD-like data from rate-equations results

Solving the rate equations (Eqs. 1 and 2) does not directly lead to quantities that we can compare to data obtained from SMARD experiments. The quantities Inline graphic and Inline graphic are not simple time averages of Inline graphic and Inline graphic. In the SMARD experiment, one collects only molecules with red and green labels, which means that all of them come from DNA that was replicated during the two labeling periods. For example, that means that fragments can only be collected between Inline graphic sec and Inline graphic sec in the case illustrated in Fig. 5. However, the Inline graphic profile obtained from our rate equations corresponds to the average of an infinite number of space-time replication events similar to the one shown in Fig. 5 but it includes information collected at all times from Inline graphic to Inline graphic. Consequently, the information prior to the first initiations and after the last coalescences that is incorporated in our rate-equation solution must be taken out to model the SMARD results. Fortunately, we can use our knowledge of the probabilities Inline graphic and Inline graphic to estimate Inline graphic and Inline graphic.

In order to convert our calculated mean-field profile Inline graphic to SMARD-like red-green content function Inline graphic, we first recall that Inline graphic is the average of an infinite number of single replication events similar to the one depicted in Fig. 3 a–I (Inline graphic is 0 or 1 in Fig. 3 a–I, while it is a continuous number between 0 and 1 in Fig. 3 b–I and in 3 c–I). The replication fraction profile in Fig. 3 b–I is given by

graphic file with name pone.0032053.e191.jpg (20)

where Inline graphic is a single-event replication profile (as in Fig. 3 a–I), and Inline graphic is the number of events (or simulations). The solution to the rate equations corresponds to Inline graphic. Equation 20 can be re-expressed as

graphic file with name pone.0032053.e195.jpg (21)

where Inline graphic is the replication fraction averaged over the whole molecule. The terms with Inline graphic represent molecules collected at time Inline graphic that have not begun to replicate. They are not included in the sum in Eq. 21 , since they each contribute 0. The terms with Inline graphic represent molecules collected at time Inline graphic that have completely replicated. Their average just gives the probability that replication has ended by that time, Inline graphic.

Assuming the population of cells to be perfectly asynchronous, we can collect molecules at any time Inline graphic, as long as replication has started, but not ended, at time Inline graphic. Consequently, our estimate of the red-green content function Inline graphic from the rate-equation solution is given by

graphic file with name pone.0032053.e205.jpg (22)

where the number of samples Inline graphic is given by the number of replication events Inline graphic times the integral of the probability that DNA is actually being replicated at time Inline graphic (i.e., the probability that replication has started multiplied by the probability that it has not finished). Using Eq. 21 , we can rewrite the red-green content function in a form that can be evaluated in terms of the rate-equation solution:

graphic file with name pone.0032053.e209.jpg (23)

Note that the term Inline graphic corrects for fully replicated molecules that are included in the calculation of Inline graphic but not in Inline graphic. (No correction is needed for completely unreplicated molecules since their Inline graphic-value is zero.) We use Eq. 23 and the solution to the rate equations to plot the solid line in Fig. 6 b.
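The conversion described above can be sketched numerically. The formula below is one reading of the verbal description, not a transcription of Eq. 23: the time integral of the replication fraction minus the ending probability, divided by the time integral of the probability that replication has started multiplied by the probability that it has not ended. All names (`f_xt`, `p_start`, `p_end`) and the toy replication program are assumptions.

```python
def trapezoid(values, dt):
    """Trapezoidal rule for equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def smard_red_green(f_xt, p_start, p_end, dt):
    """Estimate SMARD red-green content from a mean-field solution.

    f_xt[x][k] : replication fraction at position x and time k * dt
    p_start[k] : probability that replication of the molecule has started
    p_end[k]   : probability that replication of the molecule has ended
    """
    # Normalization: time during which a molecule is only partly replicated.
    norm = trapezoid([ps * (1.0 - pe) for ps, pe in zip(p_start, p_end)], dt)
    # Subtracting p_end removes fully replicated molecules from the average.
    return [
        trapezoid([f - pe for f, pe in zip(row, p_end)], dt) / norm
        for row in f_xt
    ]


# Toy check (hypothetical numbers): one origin at x = 5 fires at t = 0 on a
# molecule spanning x = 0..10, and forks move one unit of length per unit time.
dt = 0.01
steps = 601                                  # times 0 .. 6
t_rep = [abs(x - 5) for x in range(11)]      # replication time of each position
f_xt = [[1.0 if k * dt >= tr else 0.0 for k in range(steps)] for tr in t_rep]
p_start = [1.0] * steps                      # replication starts at t = 0
p_end = [1.0 if k * dt >= 5 else 0.0 for k in range(steps)]
profile = smard_red_green(f_xt, p_start, p_end, dt)
print([round(p, 2) for p in profile])
```

In this deterministic toy case the result is a triangular profile that falls linearly from one at the origin to zero at the ends, which is the fraction of uniformly distributed switch times that fall after a given position has replicated.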

Similarly, the average fork density in the SMARD experiment Inline graphic is given by

graphic file with name pone.0032053.e215.jpg (24)

After substituting the rate-equation solution into Eq. 24 , we plot the solid lines in Fig. 6 c. In contrast with Eq. 23 , no correction for fully replicated molecules is needed in Eq. 24 since fully replicated molecules have no forks (Inline graphic).

Figures 6 b and c compare our calculated estimates of Inline graphic and Inline graphic to simulation results. These figures demonstrate that Eqs. 23 and 24 can be used to accurately reproduce the simulated profiles obtained from data sets of experimentally typical size. Consequently, our model can be used to fit SMARD data in order to infer the initiation and fork velocity profiles.

One last issue that needs to be addressed is that the data points obtained from a single SMARD experiment are correlated. We can see this in Fig. 6 d, which plots Inline graphic, the autocorrelation function, as a function of Inline graphic. This means that the probability of being replicated at Inline graphic is not independent of the probability of being replicated at Inline graphic. As a consequence, the weights given to each point in a fit must take into account that errors in neighboring bins are likely to be similar.
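An autocorrelation of this kind can be sketched for a profile on a circular genome. The normalization used here (subtracting the mean and dividing by the variance, so that the value at zero displacement is one) is an assumption, as is the function name; the normalization used for Fig. 6 d may differ.

```python
def circular_autocorrelation(profile):
    """Normalized autocorrelation of a profile with periodic boundaries.

    Returns values for displacements 0 .. len(profile) // 2, the largest
    distinct displacement on a circle.
    """
    n = len(profile)
    mean = sum(profile) / n
    dev = [p - mean for p in profile]
    var = sum(d * d for d in dev) / n
    return [
        sum(dev[i] * dev[(i + lag) % n] for i in range(n)) / (n * var)
        for lag in range(n // 2 + 1)
    ]


# Toy alternating profile: neighboring bins are perfectly anticorrelated.
acf = circular_autocorrelation([1.0, 0.0, 1.0, 0.0])
print(acf)
```

A slowly decaying autocorrelation means that neighboring bins carry largely redundant information, which is what motivates the decorrelation step used for fitting.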

Fitting to correlated data

Standard least-squares fitting programs assume that the statistical errors in each data point in the fit are independent. However, we have just argued that our errors show significant correlations. In order to make valid inferences about issues such as the goodness of fit, we need to take these correlations into account. To do this using standard curve-fitting routines, we linearly transform the data set to diagonalize the covariance matrix (see [45] for example). Such decorrelated data are then independent, which means that standard statistical tests (e.g., the chi-square statistic) can be used to measure the quality of a fit. Moreover, as we shall see, the diagonalization can be done in a way that evenly weights all decorrelated data (i.e., the weights can be set equal to one). Equal weights are optimal numerically for curve fitting.

Let the experimental data be expressed as a one-dimensional vector Inline graphic that comprises the red-green profile and the fork densities (or any other information we can extract from both the data and our rate-equation solution). The covariance matrix Inline graphic of the data set Inline graphic is then given by

graphic file with name pone.0032053.e226.jpg (25)

where Inline graphic represents an ensemble average over many repetitions of the experiment. The decorrelation procedure requires a matrix Inline graphic that changes coordinates in the data space so that Inline graphic, where the matrix Inline graphic is diagonal. We say that Inline graphic is a decorrelation matrix because the covariance matrix of the decorrelated data, denoted Inline graphic, is given by the diagonal matrix Inline graphic. Given a covariance matrix Inline graphic, many different valid decorrelation matrices can be found, as long as Inline graphic is diagonal.

We can restrict the choices of decorrelation matrices by adding the constraint that all the decorrelated data points should have equal weight. This means that the diagonal matrix Inline graphic can be scaled to equal the identity matrix, which implies that the decorrelation matrix Inline graphic satisfies Inline graphic. One way to obtain such a factorization of the covariance matrix is to perform a Cholesky decomposition of Inline graphic such that [46]

graphic file with name pone.0032053.e240.jpg (26)

where Inline graphic is a lower triangular matrix. The Cholesky decomposition can be performed on the covariance matrix because Inline graphic is, by definition, symmetric and positive definite. Consequently, the Cholesky matrix Inline graphic converts correlated data into evenly weighted decorrelated data (with all weights set to unity). Then, the following iterative procedure can be used to find the best fit of the data set:

  1. Choose an initial replication scenario (initiation rate and velocity profile) that approximately reproduces the observed data Inline graphic. In order to perform a fit, the scenario must be expressed using a finite number of parameters.

  2. Solve the rate equations using the current replication scenario. Estimate the data set Inline graphic, consisting of the red-green and fork-density profiles.

  3. Perform Inline graphic simulations based on the current replication scenario. Each simulation should collect the same number of fully labeled molecules as were collected during the real experiment. Analyze each simulation in the way real molecules were treated, and record the series of simulated data vectors Inline graphic, where the index Inline graphic.

  4. Calculate the covariance matrix of the simulated data, Inline graphic. In practice, if the number of simulation runs is not large enough, the estimated covariance matrix may not be positive definite, as required to perform a Cholesky decomposition. In that case, one can instead parametrize the correlations (e.g., by exponential decays) and fit any unknown parameters to simulation data. The form of the parametrized covariance matrix, denoted Inline graphic, can be chosen to ensure that Inline graphic is positive definite.

  5. Calculate the Cholesky decomposition matrix, Inline graphic, of the parametrized covariance matrix such that Inline graphic [46].

  6. Decorrelate the observed data Inline graphic using the Cholesky matrix. The decorrelated data, denoted Inline graphic, are given by Inline graphic.

  7. Fit the decorrelated data Inline graphic with the decorrelated solution of our rate equations, Inline graphic. The fit searches for the replication scenario that minimizes the difference between the decorrelated data vectors Inline graphic and Inline graphic (where the weights of all data-set components are equal and set to unity). The correlated fit solution is given by Inline graphic.

  8. Repeat, starting from Step 2, using the latest fit result as the current replication scenario, until the solution converges.
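Steps 5 to 7 above can be sketched with a small pure-Python implementation. The sketch assumes the convention that the covariance matrix factors as a lower triangular matrix times its transpose, and that data are decorrelated by applying the inverse of that triangular factor; the convention in Eq. 26 may be written differently. The function names and the 2 × 2 covariance matrix are hypothetical.

```python
import math


def cholesky(cov):
    """Lower triangular factor `low` such that low * low^T equals cov.

    math.sqrt raises ValueError if cov is not positive definite.
    """
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def decorrelate(low, y):
    """Solve low * z = y by forward substitution (z has unit covariance)."""
    z = []
    for i, row in enumerate(low):
        z.append((y[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def chi_square(low, data, model):
    """Sum of squared decorrelated residuals, all with unit weight."""
    resid = decorrelate(low, [d - m for d, m in zip(data, model)])
    return sum(r * r for r in resid)


cov = [[4.0, 2.0], [2.0, 3.0]]   # toy covariance of two correlated data points
low = cholesky(cov)
chi2 = chi_square(low, data=[1.0, 2.0], model=[0.0, 0.0])
print(low, chi2)
```

Because the decorrelated residuals have unit variance and no correlation, the resulting sum of squares equals the generalized chi-square built from the inverse covariance matrix, so a standard unweighted least-squares routine can be used for the fit.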

Fit example

We now apply the correlated data fitting procedure described above to a real SMARD data set. The data we use here and all the experimental details related to their collection can be found in Demczuk et al., unpublished. In that study, the SMARD technique was used to study DNA replication in mouse bone marrow pro-B cells at different developmental stages. The study was performed on four adjacent restriction fragments that cover about Inline graphic Mb of the genome. Because the fragments come from a much longer genome, we did not use periodic boundary conditions but instead modeled explicitly the injection of outside forks into the studied region.

In Fig. 7, we present global fits to six different fragments (from Demczuk et al., unpublished). The term “global” here means that all the fragments are simultaneously fit by a common, or global, set of parameters. Fragments 1 to 4 cover the studied region in unrearranged normal pro-B cells (left side of Fig. 7). The last two fragments (Inline graphic and Inline graphic) come from a clonal population of cells containing a genomic rearrangement within fragment 3 (right side of Fig. 7). The rearrangement of fragment 3 into Inline graphic consists of a genomic deletion of approximately 65 kb (located at 68 kb from the right end of fragment 3, see dashed lines in Fig. 7).

Figure 7. SMARD analysis of DNA replication in mouse bone marrow pro-B cells.

The left side presents the data collected from four fragments covering a Inline graphic Mb region in normal cells. The right side shows data obtained from clone cells where the genome sequence was rearranged (65 kb was deleted from the genome). The deletion is located between the two dashed lines on the left side graphs. Only the equivalent of fragments 3 and 4 from normal cells was studied in the clonal population. Symbols represent experimental data while solid lines refer to the solution of our rate-equation system. (a) Red-green content Inline graphic obtained from Eqs. 18 (symbols) and 23 (solid lines). (b, c) Left- and right-moving fork densities Inline graphic given by Eqs. 19 (symbols) and 24 (solid lines). (d) Best fit result for the initiation rate Inline graphic (solid lines) and boundary fork injection rates (symbols) used to solve our rate equations. The best-fit fork velocities we obtained were Inline graphic kb/sec and Inline graphic kb/sec for normal and clonal cell populations, respectively. Error bars in (a, b, c) were obtained from simulations of the best-fit replication scenario.

In fitting the experimental data, we made the following assumptions about the replication scenario:

  1. Based on the normal cell red-green content profile (left side of Fig. 7 a), we assumed that two initiation zones are present (around 250 kb and 1150 kb). Each zone has three parameters that describe the position, width, and initiation rate of the zone. Another parameter defines a constant background of initiation (this parameter was added because low levels of initiations were observed outside the initiation regions). Finally, two other parameters describe fork injection rates at the boundaries of the modeled region (see filled symbols in Fig. 7 d).

  2. For practical reasons, we assumed that the shape for the initiation zones was a rounded box, such as the ones shown in Fig. 7 d. As we see in Fig. 7 , the red-green content profile is not too sensitive to the precise shape of the initiation zones (e.g., the red-green content maxima have smoother edges than their corresponding boxy initiation zones).

  3. We also assumed that the initiation profile does not change with time during the S phase. Time-dependent profiles were considered but did not significantly affect the fit (unpublished observation).

  4. Data sets from unrearranged and rearranged alleles were assumed to have the same initiation rates except within fragments 3/Inline graphic. The linear red-green content profiles and the corresponding fork densities of fragments Inline graphic and Inline graphic indicate that these fragments are almost always replicated by left-moving forks coming from the right side of fragment Inline graphic. We thus assumed that the initiation profile of the deleted allele is the same as that of the undeleted allele except for the absence of the second initiation zone located within the deleted region (compare fragments 3 and Inline graphic in Fig. 7 d).

  5. We assumed a constant velocity throughout the four fragments. However, the experimental results presented in Demczuk et al., unpublished, indicate that forks propagated at different speeds in these two experiments (probably because of differences in the growth rate of the cultured cells in the two experiments). Therefore, we used two fork speed parameters, one for fragments 1 to 4 and another one for fragments Inline graphic and Inline graphic.

The hypothetical replication scenario described above comprises 11 free parameters that can be adjusted by a fitting routine (6 for the two initiation zones, 1 for the background initiation rate, 2 for forks coming from outside the modeled region, and 2 for velocities in both cell types). Using that hypothesis, we followed the fitting procedure described in the section “Fitting to correlated data” to perform a global fit of the SMARD data collected from the six fragments. The fitted Inline graphic and Inline graphic profiles are shown as solid lines in Fig. 7 a–c. The best-fit results are illustrated in Fig. 7 d as an initiation-rate curve. Note that our rate-equation system has to be solved two times for a given set of parameters (with and without the second initiation zone for normal and clone cells, respectively).
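A parametrized initiation profile of the kind assumed above can be sketched as a constant background plus smooth box-shaped zones. The functional form of the "rounded box" is not specified in the text, so the product of two tanh edges used here is an assumption, as are the fixed edge sharpness, the function names, and all numerical values other than the approximate zone positions (250 kb and 1150 kb).

```python
import math


def rounded_box(x, center, width, edge):
    """Smooth box of unit height, close to 1 inside center +/- width / 2."""
    left = 0.5 * (1.0 + math.tanh((x - (center - width / 2)) / edge))
    right = 0.5 * (1.0 - math.tanh((x - (center + width / 2)) / edge))
    return left * right


def initiation_rate(x, zones, background, edge=10.0):
    """Background rate plus one rounded box per (center, width, rate) zone."""
    return background + sum(
        rate * rounded_box(x, center, width, edge)
        for (center, width, rate) in zones
    )


# Two zones, each with position, width, and rate (hypothetical values, in kb
# and arbitrary rate units), plus a constant background: 7 of the 11 parameters.
zones = [(250.0, 100.0, 1.0), (1150.0, 60.0, 2.0)]
background = 0.1
normal = [initiation_rate(x, zones, background) for x in (250.0, 700.0, 1150.0)]
# Clonal cells: same profile without the second zone, as in assumption 4.
clonal = [initiation_rate(x, zones[:1], background) for x in (250.0, 700.0, 1150.0)]
print(normal, clonal)
```

Keeping the edge sharpness fixed leaves three free parameters per zone, consistent with the parameter count above. The remaining four parameters (two boundary fork injection rates and two fork velocities) would enter through the rate equations and their boundary conditions rather than through this profile.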

Because the actual replication program is unknown (determining it was the aim of the experiment), the fit cannot be checked directly against it. However, SMARD provides information that was not used for the fit. Hence, it is possible to verify that the results of the fit are consistent with this additional information. First, the fitted fork velocities we obtained are Inline graphic kb/sec and Inline graphic kb/sec (both Inline graphic kb/sec) for the normal and clonal data sets, respectively. The corresponding experimental values are Inline graphic kb/sec and Inline graphic kb/sec (Demczuk et al., unpublished). Considering the small sample sizes used to obtain these fork velocities (from 11 to 57 fully labeled molecules only, depending on the fragment, Demczuk et al., unpublished), we evaluated from simulations the statistical errors for the measured fork velocities (Inline graphic). (Experimentally, the fork velocity within a fragment is calculated as Inline graphic, where Inline graphic is the fragment length, n_f the average number of forks observed per fragment, and t_rep the replication time of the fragment. The replication time is given by Inline graphic, where Inline graphic is the number of fully red (or green) labeled fragments while n_rg is the number of fully labeled fragments that have incorporated both labels (Demczuk et al., unpublished).) Thus, our fitted values agree well with the experimental ones. Second, the position of the second initiation zone, [1.11 Mb, 1.17 Mb] (Inline graphic Mb), is almost completely located within the genomic deletion region of fragment 3, which is found between [1.12 Mb, 1.18 Mb]. (Remember that we did not use the deletion location to restrict the second initiation zone position while fitting.)

Our fit result has a reduced chi-square statistic of Inline graphic with 694 degrees of freedom. This high Inline graphic value is due to the simplistic initiation function we used. For example, a more complicated initiation function could be used to obtain a better fit of the red-green content profiles (e.g., we could use a higher initiation rate at the right side of fragment 2 or a different shape for the zone in fragment 3). Nevertheless, we believe that the simple replication scenario used here captures the most important features of the data set. Moreover, when we use the fit result to perform simulations of the SMARD experiment, we obtain statistics about the initiation/coalescence events and the replication time of each fragment that agree with the experimental values (Demczuk et al., unpublished).

Discussion

Over the years, various experimental approaches have been used to measure the absolute and relative efficiencies of origin firing in eukaryotic cells. However, the efficiency of origin firing does not encapsulate all the information required to understand how DNA origins of replication are regulated. Since eukaryotic genomes contain large numbers of origins, understanding their regulation requires a quantitative analysis of the dynamics of origin firing along the genome and across S phase. Achieving this goal requires comprehensive data sets about DNA replication across large genomic regions, as well as mathematical procedures for the analysis of complex data sets.

In this manuscript, we present a new set of rate equations that can be used to calculate the firing rate of DNA origins of replication using multiple sets of data (temporal order of replication, fork density, replication time). Our mathematical procedure is versatile and allows the analysis of complex data sets obtained using various experimental approaches (SMARD, microarrays, etc.). This is possible because our model follows the spatial and temporal evolution of several replication factors. In contrast, previous procedures have mostly relied on the analysis of individual parameters of DNA replication that can be modeled with limited detail (e.g., timing of replication). The main advantage of this technique is that the rate-equation solution corresponds to the exact mean-field replication program. Our approach thus provides more precise information about average replication kinetics than Monte Carlo simulations. It is faster, too. As discussed previously, simulation remains the appropriate technique for estimating statistical fluctuations of replication-related quantities. Since average replication kinetics is often the only information obtainable from experiments, our model is, in many practical cases, sufficient to reproduce experimental data. For these reasons, our mathematical procedure makes it possible to perform a faster, and more thorough, analysis of the process of DNA replication initiation and of its regulation in complex eukaryotes.

Although our procedure can be used to analyze data sets obtained with different experimental approaches, we validated it using results of recent SMARD experiments performed across a 1.4 Mb region that spans the mouse immunoglobulin heavy chain locus (Demczuk et al., unpublished). We chose these experiments because, besides providing the data sets used in all the calculations, SMARD provided us with additional information that could be directly compared with the predictions of the procedure (e.g., the location of initiation events and fork collisions, the number of molecules containing such events, and the average number of events per molecule). The close match between calculated and experimental data sets indicates that our procedure can be used to make valuable inferences about various aspects of DNA replication in eukaryotes, with the calculations taking only modest computer resources. The usefulness of our model was illustrated by the series of fits of SMARD data we performed in Demczuk et al., unpublished.

In Demczuk et al., unpublished, the methods presented here implied that origin firing within the mouse Igh locus is compatible with the stochastic firing of origins throughout S phase, with a rate that varies along the locus. The Igh locus is divided into domains of similar firing rates, and the rate of firing within these domains is developmentally regulated. These observations contrast notably with results obtained in budding yeast, where the rate of firing varies from origin to origin and coordination in origin activity has not been observed [18]. Moreover, this approach allowed us to study various aspects of the developmental regulation of origin activity during B cell development.

In summary, the mathematical procedure described in this study has already provided new insights on the regulation of DNA replication initiation in mammalian cells and makes possible the study of additional phenomena such as replication time in the presence of fork velocities that depend on genome location or the impact of a correlation between initiation rates and fork density. Our method is thus a natural starting point for investigating checkpoint mechanisms where, for example, the cell regulates the local or global replication activity in response to various intra- or extracellular feedback signals.

Supporting Information

Figure S1

Space-time diagram of replication with inhomogeneous fork speeds. The space-time point Inline graphic is replicated by an initiation that occurred within the shaded area (e.g., initiation A). By contrast, initiation B will replicate the location Inline graphic but only at a time Inline graphic. The inset defines symbols that refer to different portions of the shaded area. Note that Inline graphic.

(TIF)

Supporting Information S1

Ending probability (homogeneous case). Modeling fork injection at boundaries.

(PDF)

Acknowledgments

We thank N. Rhind and S. Jun for their careful reading of our manuscript.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: Michel G. Gauthier and John Bechhoefer were supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and by the Human Frontier Science Program (HFSP). Paolo Norio was supported by National Institutes of Health grant R01GM080606. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Kaufmann WK. The human intra-S checkpoint response to UVC-induced DNA damage. Carcinogenesis. 2010;31:751–65. doi: 10.1093/carcin/bgp230.
  • 2. Branzei D, Foiani M. The DNA damage response during DNA replication. Curr Opin Cell Biol. 2005;17:568–75. doi: 10.1016/j.ceb.2005.09.003.
  • 3. Jun S, Zhang H, Bechhoefer J. Nucleation and growth in one dimension. I. The generalized Kolmogorov-Johnson-Mehl-Avrami model. Phys Rev E. 2005;71:011908. doi: 10.1103/PhysRevE.71.011908.
  • 4. Jun S, Bechhoefer J. Nucleation and growth in one dimension. II. Application to DNA replication kinetics. Phys Rev E. 2005;71:011909. doi: 10.1103/PhysRevE.71.011909.
  • 5. Norio P, Kosiyatrakul S, Yang Q, Guan Z, Brown NM, et al. Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol Cell. 2005;20:575–587. doi: 10.1016/j.molcel.2005.10.029.
  • 6. Norio P, Schildkraut CL. Plasticity of DNA replication initiation in Epstein-Barr virus episomes. PLoS Biol. 2004;2:816–833. doi: 10.1371/journal.pbio.0020152.
  • 7. Norio P, Schildkraut CL. Visualization of DNA replication on individual Epstein-Barr virus episomes. Science. 2001;294:2361–2364. doi: 10.1126/science.1064603.
  • 8. Katsuno Y, Suzuki A, Sugimura K, Okumura K, Zineldeen DH, et al. Cyclin A-Cdk1 regulates the origin firing program in mammalian cells. P Natl Acad Sci USA. 2009;106:3184–9. doi: 10.1073/pnas.0809350106.
  • 9. Courbet S, Gay S, Arnoult N, Wronka G, Anglana M, et al. Replication fork movement sets chromatin loop size and origin choice in mammalian cells. Nature. 2008;455:557–60. doi: 10.1038/nature07233.
  • 10. Huvet M, Nicolay S, Touchon M, Audit B, d'Aubenton Carafa Y, et al. Human gene organization driven by the coordination of replication and transcription. Genome Research. 2007;17:1278–85. doi: 10.1101/gr.6533407.
  • 11. Herrick J, Bensimon A. Global regulation of genome duplication in eukaryotes: an overview from the epifluorescence microscope. Chromosoma. 2008;117:243–60. doi: 10.1007/s00412-007-0145-1.
  • 12. Lebofsky R, Heilig R, Sonnleitner M, Weissenbach J, Bensimon A. DNA replication origin interference increases the spacing between initiation events in human cells. Mol Biol Cell. 2006;17:5337–45. doi: 10.1091/mbc.E06-04-0298.
  • 13. Nieduszynski CA, Blow JJ, Donaldson AD. The requirement of yeast replication origins for pre-replication complex proteins is modulated by transcription. Nucleic Acids Res. 2005;33:2410–20. doi: 10.1093/nar/gki539.
  • 14. Alexandrow MG, Hamlin JL. Chromatin decondensation in S-phase involves recruitment of Cdk2 by Cdc45 and histone H1 phosphorylation. J Cell Biol. 2005;168:875–86. doi: 10.1083/jcb.200409055.
  • 15. Marheineke K, Hyrien O. Aphidicolin triggers a block to replication origin firing in Xenopus egg extracts. J Biol Chem. 2001;276:17092–100. doi: 10.1074/jbc.M100271200.
  • 16. Marheineke K, Hyrien O. Control of replication origin density and firing time in Xenopus egg extracts: role of a caffeine-sensitive, ATR-dependent checkpoint. J Biol Chem. 2004;279:28071–81. doi: 10.1074/jbc.M401574200.
  • 17. Patel PK, Arcangioli B, Baker SP, Bensimon A, Rhind N. DNA replication origins fire stochastically in fission yeast. Mol Biol Cell. 2006;17:308–16. doi: 10.1091/mbc.E05-07-0657.
  • 18. Yang SCH, Rhind N, Bechhoefer J. Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing. Molecular Systems Biology. 2010;6:404. doi: 10.1038/msb.2010.61.
  • 19. Gauthier MG, Herrick J, Bechhoefer J. Defects and DNA replication. Phys Rev Lett. 2010;104:218104. doi: 10.1103/PhysRevLett.104.218104.
  • 20. Gauthier MG, Bechhoefer J. Control of DNA replication by anomalous reaction-diffusion kinetics. Phys Rev Lett. 2009;102:158104. doi: 10.1103/PhysRevLett.102.158104.
  • 21. Yang SCH, Gauthier MG, Bechhoefer J. Computational methods to study kinetics of DNA replication. In: DNA Replication: Methods and Protocols. Humana Press; 2009. chapter 32, pp. 555–574.
  • 22. Bechhoefer J, Marshall B. How Xenopus laevis replicates DNA reliably even though its origins of replication are located and initiated stochastically. Phys Rev Lett. 2007;98:098105. doi: 10.1103/PhysRevLett.98.098105.
  • 23. Zhang H, Bechhoefer J. Reconstructing DNA replication kinetics from small DNA fragments. Phys Rev E. 2006;73:051903. doi: 10.1103/PhysRevE.73.051903.
  • 24. Yang SCH, Bechhoefer J. How Xenopus laevis embryos replicate reliably: Investigating the random-completion problem. Phys Rev E. 2008;78:041917. doi: 10.1103/PhysRevE.78.041917.
  • 25. Herrick J, Jun S, Bechhoefer J, Bensimon A. Kinetic model of DNA replication in eukaryotic organisms. J Mol Biol. 2002;320:741–750. doi: 10.1016/s0022-2836(02)00522-3.
  • 26. Kolmogorov A. A statistical theory for the recrystallization of metals. Bull Acad Sc USSR, Phys Ser 1. 1937;1:335.
  • 27. Johnson WA, Mehl RF. Reaction kinetics in processes of nucleation and growth. Trans AIME. 1939;135:416.
  • 28. Avrami M. Kinetics of phase change. I General theory. J Chem Phys. 1939;7:1103.
  • 29. Avrami M. Kinetics of phase change. II Transformation-time relations for random distribution of nuclei. J Chem Phys. 1940;8:212.
  • 30. Avrami M. Granulation, phase change, and microstructure - Kinetics of phase change. III. J Chem Phys. 1941;9:177.
  • 31. Blow JJ, Ge XQ. A model for DNA replication showing how dormant origins safeguard against replication fork failure. EMBO Rep. 2009;10:406–412. doi: 10.1038/embor.2009.5.
  • 32. Lygeros J, Koutroumpas K, Dimopoulos S, Legouras I, Kouretas P, et al. Stochastic hybrid modeling of DNA replication across a complete genome. P Natl Acad Sci USA. 2008;105:12295–300. doi: 10.1073/pnas.0805549105.
  • 33. Goldar A, Labit H, Marheineke K, Hyrien O. A dynamic stochastic model for DNA replication initiation in early embryos. PLoS ONE. 2008;3:e2919. doi: 10.1371/journal.pone.0002919.
  • 34. Spiesser TW, Klipp E, Barberis M. A model for the spatiotemporal organization of DNA replication in Saccharomyces cerevisiae. Mol Genet Genomics. 2009;282:25–35. doi: 10.1007/s00438-009-0443-9.
  • 35. de Moura APS, Retkute R, Hawkins M, Nieduszynski CA. Mathematical modelling of whole chromosome replication. Nucleic Acids Res. 2010;38:5623–33. doi: 10.1093/nar/gkq343.
  • 36. Bensimon A, Simon A, Chiffaudel A, Croquette V, Heslot F, et al. Alignment and sensitive detection of DNA by a moving interface. Science. 1994;265:2096–8. doi: 10.1126/science.7522347.
  • 37. Herrick J, Bensimon A. Imaging of single DNA molecule: applications to high-resolution genomic studies. Chromosome Res. 1999;7:409–23. doi: 10.1023/a:1009276210892.
  • 38. Raghuraman MK, Winzeler EA, Collingwood D, Hunt S, Wodicka L, et al. Replication dynamics of the yeast genome. Science. 2001;294:115–21. doi: 10.1126/science.294.5540.115.
  • 39. Feng W, Collingwood D, Boeck ME, Fox LA, Alvino GM, et al. Genomic mapping of single-stranded DNA in hydroxyurea-challenged yeasts identifies origins of replication. Nat Cell Biol. 2006;8:148–55. doi: 10.1038/ncb1358.
  • 40. Heichinger C, Penkett CJ, Bähler J, Nurse P. Genome-wide characterization of fission yeast DNA replication origins. EMBO J. 2006;25:5171–9. doi: 10.1038/sj.emboj.7601390.
  • 41. Conti C, Saccà B, Herrick J, Lalou C, Pommier Y, et al. Replication fork velocities at adjacent replication origins are coordinately modified during DNA replication in human cells. Mol Biol Cell. 2007;18:3059–3067. doi: 10.1091/mbc.E06-08-0689.
  • 42. Frank FC. Nucleation-controlled growth on a one-dimensional growth of finite length. J Cryst Growth. 1974;22:233–236.
  • 43. Vilenchik MM, Knudson AG. Endogenous DNA double-strand breaks: production, fidelity of repair, and induction of cancer. P Natl Acad Sci USA. 2003;100:12871–6. doi: 10.1073/pnas.2135498100.
  • 44. Woodward A, Göhler T, Luciani M, Oehlmann M, Ge X, et al. Excess Mcm2–7 license dormant origins of replication that can be used under conditions of replicative stress. J Cell Biol. 2006;173:673. doi: 10.1083/jcb.200602108.
  • 45. Tellinghuisen J. On the least-squares fitting of correlated data: Removing the correlation. Journal of Molecular Spectroscopy. 1994;165:255–264.
  • 46. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes: The Art of Scientific Computing, Third Edition. New York: Cambridge University Press; 2007.



Articles from PLoS ONE are provided here courtesy of PLOS
