Skip to main content
UKPMC Funders Author Manuscripts logoLink to UKPMC Funders Author Manuscripts
. Author manuscript; available in PMC: 2013 Jun 4.
Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2012 Sep 17;86(3 0 1):031916. doi: 10.1103/PhysRevE.86.031916

Mathematical modeling of genome replication

Renata Retkute 1, Conrad A Nieduszynski 1, Alessandro de Moura 2
PMCID: PMC3671344  EMSID: EMS52998  PMID: 23030953

Abstract

Eukaryotic DNA replication is initiated from multiple sites on the chromosome, but little is known about the global and local regulation of replication. We present a mathematical model for the spatial dynamics of DNA replication, which offers insight into the kinetics of replication in different types of organisms. Most biological experiments involve average quantities over large cell populations (typically >107 cells) and therefore can mask the cell-to-cell variability present in the system. Although the model is formulated in terms of a population of cells, using mathematical analysis we show that one can obtain signatures of stochasticity in individual cells from averaged quantities. This work generalizes the result by Retkute et al. [Phys. Rev. Lett. 107, 068103 (2011)] to a broader set of parameter regimes.

I. INTRODUCTION

DNA replication, the process during which cells’ genetic information is duplicated, is one of the most fundamental processes in biology. Eukaryotic cells regulate the replication of their genomes in a highly complex manner: it is vital that chromosomal replication be completed before cell division takes place, in order to pass full and accurate genetic information to the daughter cells.

Cell cycle progression in eukaryotic organisms consists of four morphologically distinct phases: a gap phase (G1) during which a cell grows, followed by the DNA synthesis (S) phase, when the cell’s genome is duplicated, a second gap phase (G2), and the mitosis (M) phase, when the cell divides into two [1]. The duration of each of these phases varies from organism to organism, but in all eukaryotes chromosome replication occurs during the S phase and is initiated at specific locations on the chromosome called replication origins. When an origin is activated, two replication forks are formed, which travel on the chromosome in opposite directions. The number of origins varies depending on species and cell type; for example, most bacterial genomes are replicated from a single origin. The size of eukaryotic genomes necessitates the use of multiple origins to ensure timely and complete replication prior to cell division. The use of multiple origins requires tight regulation of origin activity to ensure that sufficiently many origins are activated, but that no origin is activated more than once in a single round of genome replication. This is achieved by the mechanism of licensing of replication origins. This consists of binding of a series of specific protein complexes at origin sites in the DNA, followed by the loading of pairs of Mcm2-7 molecules. If in a given cell licensing of a certain origin is not completed by the time the S phase starts, that origin is unable to function [2]. Also regulated are the number of origins that are activated in a given S phase and timing of the replication of specific regions on a chromosome [3].

There has been much interest recently in the mathematical modeling of DNA replication [415]. Two different modeling approaches have been used: simulations to capture the replication dynamics at a single cell level [812], and probabilistic models that characterize the dynamics of replication at a population level [5,7,1315]. Although valuable insights have been gained from previous works on mathematical modeling, they ignore the possibility that origins can fail to license, and we will show that this has a crucial effect on the system’s dynamics [4,5]. In addition, most of the existing models are numerical.

In this work, we analyze an analytical model of eukaryotic DNA replication which fully takes into account the stochastic nature of both the licensing process and origin activation [5]. We start by formulating a general model for organisms with multiple chromosomes and multiple origins; then we analyze in detail the kinetics of replication for an idealized chromosome with two origins, where the most important features of the replication dynamics can be studied in its simplest nontrivial case. We focus on quantities which are of biological interest: replication profiles, average number of active origins, replicon sizes, and others; and we derive explicit analytical expressions for them.

Genome replication has been comprehensively studied in the model organism Saccharomyces cerevisiae (brewer’s yeast). High-throughput experiments have allowed the measurement of replication times as a function of chromosomal position for the whole genome [16]. These methods yield average replication times at distinct chromosomal positions, over large cell populations (typically >107 cells), and therefore can mask the cell-to-cell variability present in the system [17]. To date single cell and single molecule studies are not able to measure the kinetics of whole genome replication [17,18]. The low abundance of the molecules involved in triggering origin activation strongly suggests that origins have stochastic activation times [19], with origins having different relative activation probabilities [20].

Replication origins in eukaryotic organisms must successfully complete the process of licensing in order to be able to replicate during the S phase. Licensing involves a number of orchestrated binding events between origin sites and molecules, some of which are present in low numbers in the cell. Once the S phase starts, no further licensing is allowed, so each origin has a relatively short time window in which to complete licensing. This suggests that a given origin will not manage to complete licensing in every cell within a population before the onset of the S phase, due to inhomogeneities in the abundance of key molecules caused by stochasticity in their expression. We anticipate that different origins will have different licensing efficiencies, due to known differences in affinities to the various species involved in licensing [21], as well as stochasticity in the chemical dynamics of licensing. This leads us to the concept of origin competence [4], which is the probability that a given origin in a population will finish the licensing process before the S phase, and is thus eligible for originating replication forks. The competence of an origin is hard to measure directly, although plasmid replication efficiency experiments suggest that many origins in yeast have low competence, making the concept relevant to the study of the dynamics of replication. The concept of competence introduces an additional parameter to each origin in a chromosome, which has the technical disadvantage that it makes parameter estimation from data somewhat problematic, as discussed in [4]. Despite this, we feel there are compelling biological reasons to take this effect into account and investigate its consequence for DNA replication dynamics; this is one of the main goals of this work.

We note that this idea of origins with less than 100% competence is compatible with the hypothesis that origins have a probability distribution of having different numbers of Mcm2-7 molecules, which is proposed in [20] to determine their activation times. But any mechanism generating a stochastic distribution of Mcm2-7 numbers in an origin will have a nonzero probability of not having a pair of Mcm2-7 molecules in any particular origin; this corresponds exactly to the origin having failed to license, and therefore having a competence below 100%.

We notice that the dynamics of DNA replication has many similarities with the process of nucleation, and some models of DNA replication [7,14] are closely related to Kolmogorov’s classical model of nucleation [22]. The model we propose here can be regarded as an inhomogeneous model of nucleation with quenched disorder, where nucleation starts at specific sites. Inhomogeneous models of nucleation have been studied in the context of statistical physics and have relevance to surface science and other areas [23,24].

II. THE MODEL

In our model we consider a chromosome with N origins, where each origin i is defined by the following parameters: its chromosomal position xi; the probability qi that the origin achieves licensing (in a given cell within a population), and is thus capable of activating, i.e., competence; and the activation time probability distribution pi(t), which is the probability density of origin i activating and starting bidirectional replication forks at time t [4]. Since an origin may not be competent in every cell within the population, in general qi < 1, and pi satisfies

+pi(t)dt=qi.

The fundamental quantity from which all statistical properties of this system can be calculated is the probability density P(x,t), defined such that P(x,t)dt is the probability that chromosomal position x is replicated between times t and t + dt. If only origin i were present, P would be given by P(x,t) = pi(t – |xxi|/ν), where ν is the fork velocity, which we assume to be a constant.

In the presence of all N origins, the calculation of P(x,t) is complicated by the fact that position x can be replicated by forks originated from any of the origins [4,14]. Let us assume that position x is replicated between times t and t + dt by a fork from origin i. This requires that (i) origin i activated at time t – |xxi|/ν, so that the fork arrives at x at time t; and (ii) all other origins ji either have not activated or they have activated but their forks would arrive at x later than t. This allows us to account for passive replication—i.e., an origin is inactive due to replication by a fork that originated at another origin The probability density for event (i) is

pi(x,t)=pi(txxiv), (1)

and the probability for event (ii) is

Qi(x,t)=jiMj(x,t), (2)

where Mi is the probability that a fork from origin i arrives later than t, or fails to activate:

Mi(x,t)=si+t+pi(x,τ)dτ, (3)

where si = 1 – qi is the probability of origin i not being competent. Therefore, the probability density Pi(x,t) that position x is replicated by origin i at time t is

Pi(x,t)=pi(x,t)Qi(x,t). (4)

Finally, the probability density that position x is replicated at time t, irrespective of which origin the fork started from, is [5]

P(x,t)=i=1NPi(x,t). (5)

Using Eq. (5), expressions for various quantities of biological interest can be found. We notice here that similar expressions have been derived before for the case where all the origins are 100% competent [14].

III. GENERAL RESULTS FOR THE MODEL

Now we will derive expressions for important quantities characterizing the replication dynamics, which are of great interest to biologists working in this area: the mean replication time; the efficiency of origins, that is, in what fraction of cells a given origin has activated in a round of replication; the fraction f of total DNA replicated (or the relative DNA content) at any given time in the S phase; the rate of origin initiation in a population; the number of replication forks and active origins as a function of time; the probability of fork termination and how it depends on the chromosomal position; and the interorigin distance distribution between active origins. All these quantities are derived below from the fundamental expression Eq. (5).

A. Mean replication time

One of the most important quantities in the area of DNA replication biology is the replication profile, which is the mean replication time T(x) at the chromosomal position x; here the average is over a large population of cells, which is the typical situation in experiments. There are a number of techniques for obtaining whole-genome replication profiles, such as the density transfer method or by measuring relative DNA copy number using microarray [16,25] or next generation sequencing [26,27] techniques. In the biological literature, the term “replication profile” has a number of different but related meanings: mean replication time, mean percentage replicated, mean copy number at a chromosomal position; or the so-called S:G1 ratio, where cells in the G1 and S phases of the cell cycle are sorted and numbers of cells in both phases are compared for each sequence [2733]. All of those quantities are closely related to the mean replication time T(x), and they contain essentially the same information. Therefore in this paper the term “replication profile” will mean T(x).

From Eq. (5), we can write the mean replication time as

T(x)=11i=1nsi+tP(x,t)dt, (6)

where the normalization factor 1i=1nsi=+P(x,t)dt is the that at least one of the origins probability will activate. In Eq. (6) we are thus excluding the situation where all origins simultaneously fail from the definition of the average; the probability of this happening is very remote in real cells. The probabilities will be defined in this way in all the remaining expressions in this paper.

The replication profile ([T(x) curves] has been measured in a number of organisms. However, caution is required when interpreting T(x) curves. In some of the biological literature T(x) curves are used to directly infer origin parameters [16]. For example, it is widely accepted that the values of T at xi are the average activation times of origins. However, Eq. (6) shows that T(x) is determined collectively by all origins [4,5], which suggests that simple interpretations of T(x) are likely to be misleading. This issue will be discussed in more detail in Sec. VII.

B. Fraction of DNA replicated

The fraction of cells in a population which have position x on the chromosome replicated by time t can be calculated by integrating the probability P(x,t) with respect to t [5]:

m(x,t)=11i=1nsitP(x,τ)dτ. (7)

The higher the value of the fraction replicated, the earlier the chromosome position replicates on average during the S phase [16].

Replication dynamics has also been studied by measuring the copy number change by differentiating between replicated and nonreplicated stages of the cell cycle [25,27]. The copy number C(x,t) of the position on the genome at a particular time t is given by

C(x,t)=m(x,t)+1. (8)

A twofold increase in copy number is required as the cell progresses from the G1 to the G2 phase.

A relationship between mean copy number C(x) and mean replication time T(x) for the S phase which starts at time T1 and finishes at time T2:

C(x)=T2T(x)T2T1+1; (9)

for details see Appendix A.

The replicated fraction f(t) of a genome of length L is given by the average over the whole length of the chromosome of the probability that a piece of DNA has replicated by time t:

f(t)=1L0Lm(x,t)dx. (10)

As the population of cells enters into the S phase and replication starts, the chromosomal content is equal to one copy of the genome, and the replicated fraction f(t) is 0. As replication progresses, f(t) increases and towards the end of the S phase the DNA content doubles, ensuring a full complement of DNA for each daughter cell. By measuring the fraction of replicated DNA as a function of time, it is possible to determine the rate of total DNA synthesis during the cell cycle [34].

Figure 1(a) shows the simulated replicated fraction for a virtual chromosome with a length 100 kilobase pairs (kb) with ten origins periodically positioned along a chromosome. In this example, each origin has a Gaussian activation distribution with a mean of 15 min and a standard deviation of 5 min. We see that the fraction of DNA replicated as a function of time has a sigmoidal shape.

FIG. 1.

FIG. 1

Results for a virtual chromosome of length 100 kb with ten origins periodically positioned along the chromosome. Origin activation follows a Gaussian distribution with a mean of 15 min, a standard deviation of 5 min, and a fork velocity 1.5 kb/min. (a) Fraction of replicated DNA f(t), (b) rate of origin initiation g(t), and (c) initiation rate I(t).

C. Rate of origin initiation

The probability density of a single isolated origin i activating and starting bidirectional replication forks at time t is pi(t) (Sec. II). Any given origin i can activate only if forks from neighbor origins ji arrive at position xi later than time t, or fail to activate, and this probability is given by jiMj(xi,t), with Mj defined by Eq. (3). Then the rate of origin initiation g(t) is defined as the sum over all origins of the probability densities of the origins activating at a time t:

g(t)=11i=1nsii=1Npi(t)jiMj(xi,t). (11)

The importance of this quantity lies in the fact that in a large population of cells, the number of origins activating in a small time window (t,t + dt) is proportional to g(t); hence the term “rate” used above.

Population-averaged measurements of replication initiations per time unit per unit length of unreplicated DNA have been obtained for a number of organisms, including S. cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, Xenopus laevis, and Homo sapiens [34,35]. All profiles have been observed to possess a strikingly similar shape: increasing during the first half of the S phase and then decreasing before the end of the S phase.

The initiation rate, i.e., the rate of origin initiations per time unit per unit length of unreplicated DNA, is obtained by dividing Eq. (11) by the chromosome length L and by the fraction of DNA still unreplicated at time t [obtained from Eq. (10)]:

I(t)=g(t)L[1f(t)]. (12)

To test if the model reproduces the behavior of the initiation rate observed in the experiments described in [35], we have applied Eqs. (11) and (12) to the virtual chromosome described in Sec. III B. We consider a virtual chromosome with a length of 100 kb replicated from ten origins periodically positioned along the chromosome; each origin has a Gaussian activation distribution function with a mean of 15 min and a standard deviation of 5 min. The general shape of the curve predicted for the initiation rate per time unit per unreplicated unit of DNA [I(t); Fig. 1(c)] is in good agreement with the experimental results [35]. In the case we have analyzed, the initiation rate I(t) increases during the first 4/5 of the S phase and then declines for the last 1/5 of the S phase. This late decrease is due to the rate of the origin initiation function g(t) reaching its maximum just after mid-S-phase (at 14 min). The end of the S phase is the time at which the fraction of DNA replicated, f(t), reaches 1 (at 25 min).

D. Origin efficiency and average number of active origins

The efficiency of an origin i is the fraction of cells in a population that actually activated that origin in any given S phase [36]. In any given cell, not all origins will be activated during the S phase; for example, in some mammalian cells most origins are used in less than 10% of cell cycles [26].

The efficiency of origin i is the probability that origin i activates at any time, and is therefore given by the integral of the probability that the ith origin initiates between times t and t + dt multiplied by the probability that none of the other origins has replicated position xi before time t [4,14]:

ei=11i=1Nsipi(t)jiMj(xi,t)dt. (13)

It has been observed in experiments that the efficiency of origins depends heavily on the distance and timing of activation of neighboring origins [37]. This is seen directly from Eq. (13), where the term jiMj(xi,t) encodes the influence of other origins on the chromosome.

Another quantity describing the collective dynamics of DNA replication is the average number nO of active origins, that is, on average how many origins have activated in a cell within a population. The presence of multiple origins creates redundancy and the average number of activated origins will be less than the total number of origins present on a chromosome.

The average number of active origins is readily obtained from the origin initiation rate given by Eq. (11):

no=g(t)dt. (14)

Applying Eq. (11) to Eq. (14) and using Eq. (13), we can relate the average number of active origins to origin efficiencies:

no=i=1nei (15)

Figure 2 shows the probability distribution for a number of activated origins on S. cerevisae chromosome VI, based on origin efficiencies determined experimentally [19] and calculations based on normal activation distribution and parameter values from [4]. The average number of active origins nO on chromosome VI, calculated using Eq. (15) based on experimental data and parameter estimation, gave 3.8 and 3.6 origins per cell cycle, respectively.

FIG. 2.

FIG. 2

Probability distribution for a number of activated origins on S. cerevisae chromosome VI: experimental results from [19] (black full circles) and calculations based on parameter values from [4] (gray empty circles).

E. Describing the dynamics of replication forks

As replication progresses through the S phase, replication forks originate at activated origins, and disappear when two forks moving in opposite directions in the chromosome collide. The average number of forks moving at some time t is a quantity that can yield valuable insights into the replication dynamics. If there were only a single fork moving in the chromosome, the rate df(t)/dt of replication would be simply ν, the fork velocity; nf forks thus correspond to a replication rate of nfν, which leads to the expression

nf(t)=1vdf(t)dt,

where f(t) is the fraction of replicated DNA at time t. This expression was also derived for DNA replication in X. laevis [38]. By applying Eqs. (7) and (10) in above equation we have

nf(t)=1v(1i=1nsi)0LP(x,t)dx, (16)

where L is the length of the chromosome.

The direction of replication fork movement can be analyzed using a distribution for the proportion of left moving forks at each position x:

nleft(x)=11i=1nsii=1NI(x<xi)pi(x,t)×jiMj(x,t)dt, (17)

where I(X) is the indicator function, which takes a value equal to 1 if the condition X is satisfied and 0 otherwise.

Figure 3(a) shows the probability distribution for the proportion of leftward traveling forks at position x for S. cerevisae (chromosome VI), computed using Eq. (17), with origin parameters estimated in [4]. All regions except for the start and the end of the chromosome are replicated by both leftward and rightward moving forks.

FIG. 3.

FIG. 3

Probability distribution for (a) the proportion of left moving forks, (b) fork termination at position x on S. cerevisae chromosome VI based on Eq. (16), and (c) histogram of fork termination sites based on 106 Monte Carlo simulations. Parameter values from [4].

The difference between the proportion of right and left moving forks gives the average fork polarity fpl(x). The average fork polarity is the product of the derivative of the mean replication time and the replication fork velocity [15]:

fpl(x)=vT(x).

Taking into account that the sum of the proportion of left and right moving forks is equal to 1, this gives the relationship between the proportion of left moving forks and mean replication time:

nleft(x)=1vT(x)2.

When two forks collide, they both terminate and replisome proteins are released from the DNA molecule. In Escherichia coli termination sites are regulated and fixed, but in eukaryotes termination is less well defined than in bacteria, and specific sites are rare [39].

To find the probability density of termination events at position x and at time t, we first consider any given pair of origins i and j with i < j—so their positions satisfy xi < xj. For the forks originating at i and j to terminate exactly at position x at a time t, the following conditions must be met: (a) x must lie between xi and xj; (b) i and j must have activated at times txxiv and txxjv, respectively; and (c) the position x must not be replicated by any other origin k, with ki,kj, at time t. The total probability density of fork termination at position x and time t is the sum of the termination probabilities for all possible pairs (i,j) satisfying xixxj; since i < j, it is enough to consider indices in the sum given by i = 1,…,N and j = i + 1,…,N – 1. Finally, the probability (density) that forks will terminate at x is found by integrating over all times, yielding the expression

Pft(x)=1Ni=1Nj=i+1N1I(xixxj)×pi(x,t)pj(x,t)ki,kjMk(x,t)dt, (18)

where N is a normalization constant.

Figure 3(b) shows the probability distribution of fork termination at position x for S. cerevisae (chromosome VI), computed using Eq. (18), with origins activating according to a normal distribution with mean and variance values estimated in [4]. For a comparison, we have plotted a histogram of fork termination sites based on 106 Monte Carlo simulations [Fig. 3(c)], the solution based on Eq. (18) agrees with the simulation results within statistical limits.

F. Distribution of distances between active origins

The distribution of distances between active origins (between origins that have actually activated) in a population of cells is an important quantity for biology. Its importance is related to the fact that the stochastic nature of origin activation can result in large distances between active origins, with the consequence that sections of the genome would take too long to replicate; this can result in portions of the genome remaining unreplicated when cells enter the M phase [40] or causing instability in fragile sites [41].

For organisms such as yeast where the replication origins have fixed positions in any given chromosome, the inter-active-origin distance can assume only a discrete set of values. For example, about 120 early inter-active-origin distances in fission yeast and between 250 and 300 in budding yeast [42] have been observed to have a typical spacing of between 30 and 100 kb [43]. In normal human primary keratinocyte cells, the mean value for inter-active-origin distances is about 124 kb [44].

The distance yij between origins i and j is equal to yij = |xixj|, where xi is the position of origin i and xj is the position of origin j. Our goal is to obtain an expression for the probability distribution of distances between active origins P(y) for the interorigin distance y. Consider first a given origin pair (i,j), with i < j, then the probability density that this particular pair activates in one replication round, and no other origin activates in between them—this event contributes to P with yij. The probability density that origin i activated at time t and neighboring origins on the left either have not activated or they have activated but their forks would arrive at xi later than t, is given by pi(t) pi(t)k=1i1Mk(xi,t). The probability density that origin j has activated at time t and neighboring origins on the right either have not activated or they have activated but their forks would arrive at xj later than t; this is given by pj(t)k=j+1NMk(xj,t). Since yij = yji it is sufficient to consider origin pairs with indices i and j with ranges given by i = 1,…,N and j = i + 1,…,N – 1. Further, if i and j are such that ji > 1, i.e., these origins are not adjacent, we need to ensure that all origins k lying between them (i < k < j) will not activate at all (otherwise there would be an origin activating between i and j, and this would split the interval [xi,xj] into two); this probability is equal to k=i+1j1Mk(xk,t). The probability density function for inter-active-origin distances is given by the integral over all times of the product of these probabilities:

P(y)=1Ni=1Nj=i+1N1I(y=xixj)×pi(t)pj(t)k=1i1Mk(xi,t)k=j+1NMk(xj,t)×k=i+1j1Mk(xk,t)dt, (19)

where N is a normalization constant. The probability density function P takes nonzero values for y equal to distances separating any two origins on the chromosome.

Figures 4(a) and 4(b) show the probability distribution of distances between active origins for S. cerevisae (chromosome VI), computed using Eq. (19) and a histogram for 105 Monte Carlo simulation, with normal activation distribution and parameter values estimated in [4]. Based on this distribution, we calculate that the mean distance between active origins is 55 kb. Whole-genome analysis has shown that the average distance between active replication origins in the S. cerevisiae genome is approximately 58 kb [45]. We note that the distribution of inter-active-origin distances has a long tail; ~4% of cells have an inter-active-origin distance >130 kb. Similar & large inter-active-origin distances have been observed experimentally [45,46]. To prevent under-replication (or delay to cell cycle progression) these large inter-active-origin distances much be replicated prior to cell division. S. cerevisiae cells have about 60 min available to complete DNA replication before cell division starts, i.e. two forks converging at 2 kb/min can potentially replicate 240 kb during that time; therefore under-replication is unlikely according to our model.

FIG. 4.

FIG. 4

Probability distribution of distances between active origins on S. cerevisae chromosome VI: (a) as a solution of Eq. (19), and (b) histogram for 105 Monte Carlo simulation. Parameter values from [4].

IV. PARAMETER REGIMES

We want to use the general theory presented above to study replication dynamics in a simple setting. From now on we focus on the case of a hypothetical linear chromosome with just two origins. We define the chromosomal coordinates so that one of the origins has position x1 = 0; the other origin has position x2 = D. We assume that each origin can activate within a time window Δti with uniform probability; we will argue later that our conclusions are largely independent of the precise shape of the probability distribution. We select the time axis in such a way that the average activation time of the first origin is 0. The other origin has an average activation time τ, and we assume without loss of generality that τ ≥ 0. Thus the activation time distributions are

pi(t)=qiΔtiift[tiavΔti2,tiav+Δti2], (20)

where i = 1,2, t1av=0 and t2av=τ, and p1 and p2 are set to zero outside the stated intervals.

Using Eqs. (6) and (20), we canwrite analytical expressions for the probability density P(x,t) and other quantities of interest. From Eqs. (20) and (5), P(x,t) vanishes outside the intervals I1=[I1l,I1u] and I2=[I2l,I2u] given by

Iil=xixv+tiavΔti2;Iiu=xixv+tiav+Δti2.

A. States of replication

There are six possible states of the replication at each position x on the chromosome, defined by the order in which forks from the two origins can replicate x. These are summarized in Fig. 5, where each position has a probability to be replicated at any time t by origin 1, origin 2, or by either origin depending on the distance from each origin and other parameters. The different possibilities are denoted by “states” A to F, as defined in Fig. 5.

FIG. 5.

FIG. 5

A particular position along a chromosome in different cells can be replicated by forks from origin 1 (light gray), origin 2 (dark gray), or either origin (black). The order in which this happens (indicated in A–F above) as time progresses from left to right defines the possible states of replication.

State A corresponds to the situation where forks from origin 1 arrive at x before any fork coming from origin 2; if the competence q1 of origin 1 is equal to 1, then this position is completely replicated by the first origin. For q1 < 1, x is replicated by origin 2 only when origin 1 fails to activate. The replication probability density P(x,t) is given in this case by

P(x,t)={q1Δt1iftI1,s1q2Δt2iftI2,0otherwise.} (21)

In similar manner, state B corresponds to forks from origin 2 arriving at x before forks from origin 1:

P(x,t)={q2Δt2iftI2,s2q1Δt1iftI1,0otherwise.} (22)

For the states C and D, there is an overlapping time window, when forks originating at either of the origins can get to x first. States E and F show that the time window where x is replicated by forks originating at one of the origins is contained in the time window of the other fork; this latter state is possible only if Δt1 ≠ Δt2.

If we denote by [tl,tu] the overlapping time window in cases C–F, then the replication probability densities for these four cases are given by the expression below, with i and j having different meanings for each state: C (i = 1,j = 2), D (i = 2,j = 1), E (i = j = 1), and F (i = j = 2):

P(x,t)={piify[Iil,tl],p1(tI2up2dt+s2)+p2(tI1up1dt+s1)ift[tl,tu],pjift[tu,Iju],0otherwise.} (23)

Expressions for tl and tu for the different states are given in the table below:

State tl tu

C Dxv+τΔt22 xv+Δt12
D xvΔt12 Dxv+τ+Δt22
E Dxv+τΔt22 Dxv+τ+Δt22
F xvΔt12 xv+Δt12

From the above table we can see that the state changes as position is changed along the chromosome. Based on these states it is possible to classify the replication dynamics into a number of parameter regimes. Figure 6 shows such regimes for Δt1 = Δt2 = Δt and Δt1 ≠ Δt2. In the first case [Fig. 6(a)], the number of parameters is reduced and we can look at the full dependencies between τ, Δt, D, and ν. When Δt1 ≠ Δt2, we will look at regimes for a few values of τ, Δt1, and Δt2 [Figs. 6(b)–6(d)].

FIG. 6.

FIG. 6

Plots showing parameter regime diagrams based on the spatiotemporal dynamics of replication: gray areas indicate where the activation time distribution of origin 1 or 2 has nonzero values, and dark gray regions are where distributions for the two origins overlap. (a) Regimes for Δt1 = Δt2. (b) Some of the regimes for Δt1 ≠ Δt2. (c) Regimes for variable Δt1 (fixed value of Δt2D/ν) and Δt2 (fixed value of Δt1D/ν) for τ < D/ν. (d) Regimes for variable Δt1 (fixed value of Δt2D/ν) and Δt2 (fixed value of Δt1D/ν) for τ > D/ν. Parameter values for (c) and (d) where replication dynamics undergoes changes are given in Appendix C.

B. Regimes for Δt1 = Δt2

In total there are five regimes (depending on the relative values of τ, Δt, D, and ν) that differ in the dynamics of how the chromosome is replicated [shown in Fig. 6(a)]. For regime R1 only state A occurs for the whole chromosome.

Most relevant to biological systems is regime R2, since this is the case for many pairs of origins in real chromosomes. In this regime, the condition τ + Δt < D/ν is satisfied, meaning that origins activate at a similar time and the variations in the activation time Δt are small enough that a fork from one origin can replicate the other origin only if that origin is not competent. Regime R2 is formed from states A, B, C, and D. Changes occur at positions x=12[D+v(τΔt)]] (from A to C), x=12(D+vτ) (from C to D), and x=12[D+v(τ+Δt)] (from D to B).

Regime R2 describes the situation where two origins are sufficiently apart that they are not passively replicated by each other. It is quite common, however, to find two origins located close to each other in a chromosome, such that the condition τ + Δt < D/ν is violated. In this case, regimes other than R2 describe the replication dynamics of the system.

In regime R3, in some cells origin 2 actives just before forks from origin 1 arrive at position x = x2 = D and in some cells both origins compete to replicate positions on the right of x=12[D+v(τΔt)]. At this point the behavior changes from state A to C.

For regime R4 state A occurs up to position x=12[D+v(τΔt)], then it changes to state C. In a similar way, regime R5 changes from state C to state D at position x=12(D+vτ). More details on the dynamics of replication for different sets of parameters are shown in Fig. 6(a).

For a fixed value of Δt < D/ν, as the difference in average activation time between origins, τ, increases, the replication dynamics undergoes the following changes:

Change τ Change τ Change τ

R2 → R3 DvΔt R3 → R4 Dv R4 → R1 Dv+Δt

For any fixed τ < D/ν, increasing the width Δt of the activation time distribution leads to change of the regime from R2 to R3 at Dvτ and from R3 to R5 at Dv+τ.

A discussion on regimes for Δt1 ≠ Δt2 is given in Appendix C.

V. DYNAMICS OF REPLICATION FORKS

The proportion of left and right moving forks will depend on the properties of the origins and fork velocity, as indicated by Eq. (17). The analytical expression for the proportion of left moving forks, nleft, for regime R2, is given by Eq. (C1) in Appendix C. In regime R2 the proportion of left moving forks takes a sigmoidal shape. Figures 7(a), 7(c), and 7(e) show the effect of varying parameters on proportion of left moving forks in regime R2: as a function of competence q2 [Fig. 7(a)], as a function of time τ [Fig. 7(c)], and as a function of the width of the activation time distribution Δt1 = Δt2 = Δt [Fig. 7(e)]. Parameter values are q1 = q2 = 1, τ = 0, Δt1 = 5, Δ2 = 5, and ν = 1, if not otherwise stated. The consequences of differences in parameter values can be clearly distinguished: with decreasing competence, the curve is pushed down; increasing τ shifts the curve closer to the second origin; decreasing Δt increases the gradient of the sigmoidal function.

FIG. 7.

FIG. 7

The effect of varying parameters on the proportion of left moving forks and fork termination: (a),(b) as a function of competence q2, (c),(d) as a function of time τ, (e),(f) as a function of width of activation Δt1 = Δt2 = Δt, and (g),(h) as a function of width of activation Δt2. (i) Experimentally measured proportion of left moving forks (gray) [47] on the S. cerevisiae chromosome VII (700–780 kb) with a curve based on Eq. (17) (black) and (j) prediction of the fork termination position probability distribution based on Eq. (D1).

We have also looked at other regimes—in Fig. 7(g) the parameter values Δt2 = 10 and Δt2 = 15 correspond to R7, while Δt2 = 20 and Δt2 = 25 correspond to R9; here q1 = q2 = 1, τ = 0 and Δt1 = 5. Unlike regime R2, these regimes passive replication of one (R7) or both (R9) origins. It can be seen that for these regimes the proportion of left moving forks between the two origins becomes linear with respect to position on the chromosome.

Recently, high-resolution analysis of Okazaki fragment synthesis was performed in S. cerevisiae [47]; this allows determination of the lagging strand proportion and hence the proportion of left moving forks. Figure 7(i) shows the experimentally determined proportion of left moving forks for a part of chromosome VII from 700 to 780 kb (gray lines) with a curve based on Eq. (17) (black) [48]. This demonstrates the close agreement between our model prediction and an independent experimental data set.

VI. RESULTS FOR THE FORK TERMINATION POSITION PROBABILITY DISTRIBUTION

Although the forks all start at the same locations (the origins), they meet each other and terminate at different locations in each cell, because of the stochastic activation times. Only forks traveling in opposite directions between the two origins can collide. This condition limits positions of fork termination to the interval [0; D] (excluding the two ends of the chromosome). (If one of origins is activated much later in time, i.e., the dynamics is in regime R1, only one origin activates and therefore there are no termination events between the two origins.) The fork termination position distribution will have nonzero values in the areas of the space-time diagram where the probability of that position being replicated by forks from both origins is nonzero. An analytical expression for the fork termination position probability distribution function (PDF) is given in Appendix D.

Figures 7(b), 7(d), 7(f), and 7(h) illustrate the effect of varying the parameters q2, τ, Δt, and Δt2. For τ = 0, the maximum of the PDF is at position x=D2 and takes the value p=q1q2(Δt1+Δt2)2Δt1Δt2(1s1s2). If τ is increased, the maximum moves to the right by vτ2. The area under the PDF is (Δt1+Δt2)2q1q2v8Δt1Δt2(1s1s2) for values τ<DvΔt1+Δt22; for increasing τ it starts decreasing until zero is reached at τ=2D+v(Δt1+Δt2)2v.

The biological literature commonly suggests that termination sites are disperse, but there is limited evidence for discrete termination sites [39]. Our model indicates that termination sites have a continuous distribution. Equation (D1) can help identify the most probable positions of fork termination sites on the chromosome. Figure 7(j) shows the prediction of the fork termination position probability distribution for a part of chromosome VII from 700 to 780 kb.

VII. SIGNATURES OF STOCHASTICITY IN REPLICATION PROFILES

In this section we will discuss how information about the stochasticity of the activation times of origins—the width of the activation window—can be obtained from the mean replication time as a function of chromosomal position, T(x). Because the parameter space in the generic case is very big, from now on we will consider only the case Δt1 = Δt2 = Δt.

An analytical expression for T(x) for the case where the condition τ + Δt < D/ν is satisfied has been derived [5]. This corresponds to regime R2 and is the case for many pairs of adjacent origins in real chromosomes. This is valid when the variations in the activation time Δt are small enough that a fork from one origin can replicate the other origin only if that origin is not competent.

A plot of T(x) for various sets of parameters is shown in Fig. 8. We see that T(x) has discontinuous derivatives at the origin locations. This is due to forks originating only at the origin locations; there is a discontinuous change in the proportion of left propagating compared to right propagating forks as one crosses an origin site. At the origins, the mean replication times are [5]

T(x1)=T(0)=q2s1(Dv+τ)(1s1s2);T(x2)=T(D)=(q1s2Dv+q2τ)(1s1s2). (24)

It is commonly assumed in the replication literature that T(x) has a minimum at an origin, and that the value of this minimum directly gives the average activation time for the origin. However, Eq. (24) shows that this is not the case and, in fact, T(xi)tiav: the mean replication time at an origin location is equal to or greater than the origin’s average activation time. Only when an origin has qi = 1 can T(xi)=tiav, because if an origin fails to activate in a given cell, the DNA at the origin location will not be replicated until a fork from another origin arrives. This means that Ti is higher for origins that are more likely to fail, as seen directly in Fig. 8(a). Another important conclusion is that even when both origins have the same average activation time (τ = 0), generally we have T(x1) ≠ T(x2), also shown in [5]. This is again due to the possibility of origins not activating. Therefore, the origin with the lower minimum of T(x) does not necessarily activate earlier than the other origin: minima of T(x) cannot be used to draw conclusions on the relative activation times of the corresponding origins, as previously assumed [9,16]. The equation for mean replication time shows that in general T(x) at any point depends collectively on the parameters of all origins. However, if an origin is highly competent, early activating, and isolated from other origins, T(x) at that origin’s position will be close to the origin’s average activation time.

FIG. 8.

FIG. 8

Replication time curves for (a) differing values of competence q1; (b) different widths of the activation time window; (c) different values of τ, with position y = x + ντ/2.

The expression for mean replication time, Eq. (24), challenges the assumption that there is a one-to-one correspondence between replication origins and minima of T(x): minima correspond to origin locations, but there may be origins for which there is no peak. The slope of T near the first origin (for x > 0) is [5]

T(x)=q1q2s1v(1s1s2). (25)

This expression shows that the slope is a function of the competences qi of both origins as well as the fork velocity ν. For the origin at x = 0 to be a minimum of T(x), we must have T′ > 0 for x > 0, from which we get the condition

q1>q21+q2. (26)

In a similar way, the slope of T(x) near the second origin (for x < D) is

T(x)=q1s2q2v(1s1s2), (27)

which yields the condition

q2>q11+q1. (28)

This shows that if an origin has low competence compared to its neighbor, it may not be a minimum of T(x), as illustrated in Fig. 8(a). This phenomenon has been observed in experimental data [16]. Note that if q1 > 1/2, this condition is always satisfied and a minimum is guaranteed for this two-origin system and this regime. Figure 9 shows values for q1 and q2, for which Eqs. (26) and (28) are satisfied, i.e., both origins are local minima in the replication time curve. In addition, Eqs. (25) and (27) show that the fork velocity is not given by the slope of T(x), an assumption widely used in the literature [16].

FIG. 9.

FIG. 9

Gray area indicates the sets of competencies q1 and q2 for which both origins are local minima in the replication time curve.

In contrast to the discontinuity of T′(x) at the origin locations, the plots of replication time curves show that the local maximum of T(x) between two origins is a smooth curve. The reason is that in different cells in a population forks meet each other and terminate at different locations on the DNA, as has been shown in Sec. VI. This suggests that the shape of the maximum of T(x), when it exists, could be used to infer information about the width of the activation window. We expect that sharp maxima should correspond to forks meeting within a narrow time window; conversely, a broad maximum corresponds to a high time window. This can be seen in Fig. 8(b), where T(x) is plotted for various values of Δt.

In order to investigate this more quantitatively, we use the modulus of the second derivative of T(x) at the maximum to measure how broad the maximum is, because low values of |T′′(x)| correspond to broad peaks. At the maximum of T(x) [5],

T(x)=4q1q211q11q2v2Δt(1s1s2), (29)

where

x={D+vτ2forq1=q2;D+vτ2+vΔt2(11q11q2)forq1>q2;D+vτ2vΔt2(11q11q2)forq1<q2.}

Thus |T′′(x*)| is inversely proportional to Δt. Notice also that |T′′(x*)| does not depend on τ, which means it is independent of the origins’ average activation times. Figure 8(c) shows replication time curves for different values of τ, where the position along the chromosome has been shifted by y=x+vτ2 and replication times are shifted downwards by τ2. This figure clearly shows that the shape of T(x) near maximum is independent of τ.

Expression (29) can be used to calculate Δt from an experimental replication time profile T(x), if the origin competences and the fork velocity are known. This is a very useful result because it allows the determination of a quantity characterizing stochastic properties of the system, Δt, from T(x), which is defined by a population average. This is valuable because experiments to directly measure Δt are technically difficult [17,18]. We note that this does not require assuming that all cells in the population are synchronized, since in each individual cell in an asynchronous population, the statistics of the relative activation times of origins remains unaltered [4].

VIII. GAUSSIAN AND SKEWED ORIGIN ACTIVATION DISTRIBUTIONS

Although Eq. (29) was obtained using a simple uniform distribution for pi(t), we expect it to be a good approximation for any single-peaked distribution function pi(t), since Eq. (29) involves only the second moment (the variance) of the distribution, and the replication dynamics are mostly determined by the average activation time and the width of the activation distribution. So we expect two single-peaked distributions with the same tav and Δt to have very similar T′′(x).

To test this assumption, we used Eq. (6) to numerically compute T(x) for pairs of origins with Gaussian activation time distribution and with a skewed distribution leading to a sigmoidal cumulative activation time distribution, described by a Hill’s-type function

F(t)=qln3t12(t12t)ln3κ1κt2[1+(t12t)ln3κ]2, (30)

where

κ=ln(tw+4t122+tw22t12), (31)

as used in Ref. [14].

Figure 10 shows good agreement of the mean replication time and proportion of left moving forks for the three distributions mentioned above. Choosing parameters for which all these distributions have the same mean and variance, we find that in all cases T′′(x) never differs between distributions by more than 10%. This means that Eq. (29) is a very accurate prediction of Δt, regardless of the detailed shape of pi(t).

FIG. 10.

FIG. 10

Plots showing mean replication time (a) and proportion of left moving forks (b) for three different distributions: Uniform distribution (tiav=15ti = 8), Gaussian distribution (μi 15,σi = 2.31), and skewed distribution given by Eq. (30) (t1/2i = 15.35,twi = 2.77), i = 1,2. For all three cases q1 = 0.9,q2 = 0.7,ν = 1.

IX. APPLICATION TO EXPERIMENTAL DATA

We expect Eq. (29) to be a reliable prediction for isolated pairs of origins whose competences are not too low, so that there are well-defined peaks at the origin positions; this ensures that most forks traveling in the region between the two origins come from those origins, which is what is required for Eq. (29) to hold. We also note that there are organisms with far fewer replication origins than S. cerevisiae, for which the two-origin assumption involves little or no approximation; in particular, many archaea have only two or three origins, and high-throughput methods to study their replication dynamics are available [4951].

We will apply the above model to data from S. cerevisiae (brewer’s yeast), which has 16 chromosomes and about 300 origins [52]. A rough criterion for a pair of origins to be considered “isolated” is if the distance between them is smaller than the distance between either origin and its immediate neighbor. More precisely, let (xi,xi+1) be a pair of origins in a chromosome, and di,i+1 be the distance between the origin at xi and the origin at xi+1. Then our criterion for the pair (xi,xi+1) to be isolated is that it satisfies both di,i+1 < di–1,i and di,i+1 < di+1,i+2. We have used the DNA replication origin database [52] to compute how many pairs of S. cerevisiae organism satisfy this criterion, and we found that 35% do. Using the stochastic simulation procedure described in Ref. [14], we found that of the origin pairs which are isolated, in 40% of those the Δt’s of the two origins are within 2 min of each other (meaning 14% of the origins overall). If we relax the definition of “close” to a 3 min difference, this fraction increases to 60% (or 21% overall).

We looked at experimental data [16] for S. cerevisiae chromosome IV (the region containing origins ARS420 and ARS422), shown in Fig. 11. The smoothness of the curve—ignoring the fluctuations caused by experimental noise—is direct evidence for stochastic origin activation, in agreement with other results [17,18]. We fitted a parabola through the data points and from this determined the value of |T′′|. Using Eq. (29) we estimate the values of Δt as 15 min [53]. This value is in agreement with the limited number of single cell measurements that have been made at other S. cerevisiae origins [18].

FIG. 11.

FIG. 11

Replication time curves [16] for S. cerevisiae chromosome IV (620–715 kb) with a fitted parabola

X. DISCUSSION AND CONCLUSIONS

In this paper we studied the properties of a mathematical model of chromosome replication that describes genome duplication dynamics in eukaryotic cells. In the first part of the study, we formulated a general model for replication, which takes into account the fact that origins activate stochastically, and also accounts for the possibility that origins in individual cells may not activate at all, because they may have failed to license. We have derived a probability distribution of replication as a function of position and time, and from this we have found analytical expressions for many quantities of interest such as mean replication time, the average number of active origins, and others.

In the second part of the paper, the dynamics of replication was investigated in detail in an idealized scenario for a chromosome with two origins of replication and the activation probability distribution given by a uniform distribution. Despite being idealized, this simple scenario encapsulates much of the essential behavior found in more complex cases; and although organisms vary enormously in the size, shape, and distribution of genomes, the case of the chromosome with just two origins is of biological interest: recently an E. coli mutant with two identical functional replication origins has been constructed [54]. We were able to derive an analytical expression for the mean replication time and to extract hence many important properties of the dynamics of DNA replication. Furthermore, we have proposed a method to determine the width of the activation time probability distribution from experimentally measured population-averaged data. Our results compare favorably with experimental measurements in S. cerevisae.

ACKNOWLEDGMENTS

We thank M. Hawkins for valuable discussions. This work has been supported through the Biotechnology and Biological Sciences Research Council (Grants No. BB/E023754/1, No. BB/G001596/1, and No. BB-G010722).

APPENDIX A. RELATIONSHIP BETWEEN MEAN REPLICATION TIME AND MEAN COPY NUMBER

For a eukaryotic cell with a fixed S phase, the mean replication time can be calculated:

T(x)=1NT1T2tP(x,t)dt, (A1)

where N=1i=1nsi, T1 is the time when replication starts, and T2 is the time when replication finishes.

The relationship between copy number C(x,t) and fraction replicated m(x,t) is given by Eq. (8). Now, the fraction replicated before time t for cells where replication started at T1 is given by

m(x,t)=1NT1tP(x,τ)dτ. (A2)

The time average of the function m(x,t) is

m(x)=1N1T2T1T1T2dtT1tP(x,τ)dτ=1N1T2T1T1T2dττT2P(x,τ)dt=1N1T2T1T1T2P(x,τ)dττT2dt=1N1T2T1T1T2P(x,τ)(T2τ)dτ=1N1T2T1(T1T2P(x,τ)T2dτT1T2τP(x,τ)dτ)=1N1T2T1(T2T1T2P(x,τ)dτT2T1tP(x,t)dt).

Taking into account Eq. (A1) and the fact that the first integral in the parentheses equals 1i=1nsi=N, we get

m(x)=T2T(x)T2T1 (A3)

and

C(x)=T2T(x)T2T1+1. (A4)

APPENDIX B. REGIMES FOR Δt1 ≠ Δt2

Some of the regimes for the case Δt1 ≠ Δt2 are shown in Fig. 6(b). Only in this case can the states E and F (Fig. 5) be observed. Also, regimes R2, R3, and R5 have either state E occurring between positions x=14[2D+v(2τΔt1+Δt2)] and x=14[2D+v(2τ+Δt1Δt2)] for Δt1>Δt2, or state F between positions x=14[2D+v(2τ+Δt1Δt2)] and x=14[2D+v(2τΔt1+Δt2)] for Δt1 < Δt2. If Δt1 → Δt2, the length of this interval decreases until states E and F disappear and position x=12(D+vτ) then separates states C and D.

For regime R6, state A occurs up to position x=14[2D+v(2τΔt1Δt2)]; then it changes to state C. At position x=14[2D+v(2τΔt1+Δt2)], state C changes to the state E. In a similar way, for regime R7 state A also occurs up to position x=14[2D+v(2τΔt1Δt2)]; and then it changes to state C. At position x=14[2D+v(2τΔt1+Δt2)], state C changes to the state F.

Other types of regime characteristic for the case Δt1 ≠ Δt2 are regime R8t1 ≫ Δt2), where only state E is possible; and regime R9t2 ≫ Δt1), where only state F occurs.

For a fixed value of Δt1Dv the regime dynamics as Δt2 changes are shown in the upper panels of Figs. 6(c) and 6(d). The transitions in the dynamics are given in the table below:

τ<Dv τ<Dv
Change Δt1 Change Δt1

R2 → R3 2(DvτΔt22) R1 → R4 2(τDvΔt22)
R3 → R5 2(Dvτ+Δt22) R4 → R6 2(τDv+Δt22)
R5 → R8 2(τ+Dv+Δt22) R6 → R8 2(Dv+τ+Δt22)

For a fixed value of Δt2Dv the regime dynamics for variable Δt1 is shown in the lower panels of Figs. 6(c) and 6(d). The transitions are

τ<Dv τ<Dv
Change Δt2 Change Δt2

R2 → R3 2(DvτΔt12) R1 → R4 2(τDvΔt12)
R3 → R5 2(Dvτ+Δt12) R4 → R7 2(τDv+Δt12)
R5 → R9 2(Dv+τ+Δt12) R7 → R9 2(Dv+τ+Δt12)

In the extreme case where τ>Dv+Δt12+Δt22 the only possible regime is R1.

APPENDIX C. PROPORTION OF LEFT MOVING FORKS IN REGIME R2

This is given by

nleft(x)={1ifx<0;q2q1q2q1+q2q1q2ifx[0;D+v(τΔt)2);q2{D2q1Δt2(2+q1)v22Dq1(Δtv+2xvτ)2Δtq1v(2x+vτ)+q1(2x+vτ)2}2Δt2{q1(q21)q2}v2ifx[D+v(τΔt)2;D+vτ2);q2{D2q1+Δt2(2+q1)v2+2Δtq1v(2x+vτ)+q1(2x+vτ)2+2Dq1(2x+vτ+vΔt)}2Δt2{q1(q21)q2}v2ifx[D+vτ2;D+v(τ+Δt)2);q2q1+q2q1q2ifx[D+v(τ+Δt)2);D);0ifxD.} (C1)

APPENDIX D. PROBABILITY OF THE FORK TERMINATION POSITION DISTRIBUTION

This is given by

Pft(x)={0ifτ>2D+v(Δt1+Δt2)2v;0ifx<0;0ifx>D;q1q2[4x2D+v(2τ+Δt1+Δt2)]2Δt1Δt2v(1s1s2)ifx[2D+v(2τΔt1Δt2)4;D+vτ2];q1q2[4x+2D+v(2τ+Δt1+Δt2)]2Δt1Δt2v(1s1s2)ifx[D+vτ2;2D+v(2τ+Δt1+Δt2)4].0otherwise.} (D1)

APPENDIX E. SENSITIVITY ANALYSIS OF AN EXPRESSION FOR THE SECOND DERIVATIVE OF THE MEAN REPLICATION TIME

We rewrite Eq. (29) as G=Ag, where A=4q1q211q11q2v2(1s1s2), and g = Δt. Then the change in the value of G with respect to differences in the parameter g is

ΔG=A(1g1g+Δg)=AΔgg(g+Δg)

For the parameter g in the range (7,18) min [14], a difference of 2 min results in a difference of up to 2.5% in ΔGA.

Normalization errors and noisy data result in errors in the calculated mean replication time [T(x)]. We have used a simple Monte Carlo simulation technique to roughly estimate how the Δt predicted from formula (29) is affected by errors in the replication time profile T(x). For this simulation we added Gaussian-distributed noise of mean 0 and standard deviation 5 min to each simulated replication profile of a two-origin chromosome, thus mimicking the effects of experimental error (we expect 5 min to be a reasonable estimate of the error for the experiments we use). We then numerically calculate the second derivative of a fitted parabola through the resulting mean replication time curve, and use formula (29) to calculate the “predicted” Δt*, which we can compare with the actual value Δt used to generate T(x) in the first place. We then repeat this procedure 100 times, generating a distribution of Δt*’s, and we do this for a few values of Δt to see how the error affects origins with different variabilities in activation time.

Errors in the estimation of Δt are greatest for origins with low variation in the activation time, as is to be expected. For origins with Δt ≈ 10 min, which is the value we estimated in the example discussed in the paper, the error is around 10%, which shows that formula (29) is quite robust to experimental errors.

Footnotes

PACS number(s): 87.14.gk, 87.10.Ca, 87.10.Mn

References

RESOURCES