. 2024 Oct 29;86(12):142. doi: 10.1007/s11538-024-01372-3

Accumulation of Oncogenic Mutations During Progression from Healthy Tissue to Cancer

Ruibo Zhang 1, Ivana Bozic 1,2
PMCID: PMC11522190  PMID: 39472320

Abstract

Cancers are typically fueled by sequential accumulation of driver mutations in a previously healthy cell. Some of these mutations, such as inactivation of the first copy of a tumor suppressor gene, can be neutral, and some, like those resulting in activation of oncogenes, may provide cells with a selective growth advantage. We study a multi-type branching process that starts with healthy tissue in homeostasis and models accumulation of neutral and advantageous mutations on the way to cancer. We provide results regarding the sizes of premalignant populations and the waiting times to the first cell with a particular combination of mutations, including the waiting time to malignancy. Finally, we apply our results to two specific biological settings: initiation of colorectal cancer and age incidence of chronic myeloid leukemia. Our model allows for any order of neutral and advantageous mutations and can be applied to other evolutionary settings.

Keywords: Cancer initiation, Driver mutations, Cancer incidence, Branching process

Introduction

Cancer is a genetic disease fueled by accumulation of driver mutations which confer a selective growth advantage to tumor cells (Vogelstein and Kinzler 2004). For solid cancers, typically more than one driver mutation is required for the development of malignancy, while a single genetic alteration may be sufficient to cause certain types of leukemia (Vogelstein et al. 2013). With the emergence of advanced sequencing technology, specific driver genes, including oncogenes, tumor suppressor genes and DNA repair genes, have been found to be responsible for carcinogenesis. For example, tumor suppressor genes APC, TP53 and oncogene KRAS are the most commonly mutated driver genes in colorectal cancer (Morin et al. 1997; Fearon 2011; Tomasetti et al. 2015), and fusion gene BCR-ABL is found to cause chronic myeloid leukemia (Deininger et al. 2000).

Some of the key questions in cancer research involve uncovering the identities, the number, the order and the effects of specific driver mutations on tumorigenesis. To facilitate mathematical quantification of the carcinogenic process, stochastic models can be used to model the accumulation of driver mutations, in particular population sizes and arrival time distributions for premalignant and malignant subpopulations. This approach goes back to the multi-stage theory of Armitage and Doll (1954), in which the shape of a cancer age incidence curve is shown to be associated with the required number of driver mutations. More recently, branching processes have been employed to investigate the age incidence of cancer (Meza et al. 2008; Paterson et al. 2020; Wang et al. 2022), cancer relapse and treatment response (Komarova and Wodarz 2005; Bozic et al. 2013; Foo et al. 2014; Avanzini and Antal 2019), and cancer heterogeneity (Durrett et al. 2011).

In the context of cancer initiation, the onset of the process occurs in healthy tissue, when a previously healthy cell receives the first oncogenic alteration. The process proceeds through abnormal growth of the altered subpopulation, acquisition of subsequent driver mutations and further waves of clonal expansion. Previous works that studied accumulation of driver mutations on the way to cancer focused on modeling evolution in exponentially growing populations (Durrett and Moseley 2010; Bozic et al. 2010; Nicholson et al. 2023). These works analyze a process that starts with a single cell that already has selective growth advantage, and model the evolution arising from this single activated cell.

In this paper, we study a process in which the large initial cell population is in homeostasis, capturing the population dynamics both before and during the exponential growth stage. In our model, any sequence of neutral or advantageous genetic alterations can occur and eventually lead to malignancy. Building upon Durrett and Moseley (2010) and Nicholson et al. (2023), we give explicit formulas for population size and arrival time distributions given the order, mutation rates and fitness increments of the driver genes along a specific mutational pathway. Our results are applicable to other multi-hit models that involve the evolution of an initially non-growing population.

Model

Inspiration for our model comes from initiation of colorectal cancer, which is thought to require inactivation of two tumor suppressor genes and activation of one oncogene (Vogelstein et al. 2013; Tomasetti et al. 2015; Paterson et al. 2020). Tumor suppressor genes, such as APC and TP53, are the most commonly mutated genes in colorectal cancer, and require inactivation (through genetic alterations) of both alleles to act as cancer driver genes. Oncogenes, such as KRAS or BRAF, which are also commonly mutated in colorectal cancer, require a single activating mutation in one allele of the gene in question. In other words, initiation of colorectal cancer requires five genetic alterations (two each in two tumor suppressor genes and one in an oncogene). If the first of the five alterations is activation of an oncogene, the crypts carrying that mutation can already exhibit selective growth advantage compared to neighboring crypts, as their rate of crypt fission (division) is significantly increased (Snippert et al. 2014). However, if the first alterations are in tumor suppressor genes, the first one to three alterations may not immediately lead to selective advantage (Paterson et al. 2020). This is because inactivation of a single allele of a tumor suppressor gene typically does not provide selective growth advantage to crypts. Furthermore, some driver genes, such as TP53, do not provide selective growth advantage when they are the initial driver alteration, but may lead to abnormal growth if another mutation is subsequently obtained (Paterson et al. 2020).

We study a multi-type branching process generalization of the process above, that starts with a large wild-type population at homeostasis, corresponding to healthy tissue (type 0). As colorectal crypts in homeostasis rarely divide or die (Nicholson et al. 2018), we set the division and death rates of the initial population to 0. In the model, we allow for a number of further oncogenic alterations that initially do not provide selective growth advantage, which occur with distinct constant rates per crypt. After a sufficient number of neutral alterations, the next alteration leads to selective growth advantage in the form of increased division rate. This corresponds, for example, to the inactivation of the second allele of tumor suppressor gene APC. After that, subsequent oncogenic mutations, which may be initially neutral, or provide additional selective growth advantage, can accrue. Once a sufficient number of mutations is collected, the crypt becomes malignant. The model can be summarized by the following diagram:

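$$
\underbrace{\underset{\text{Healthy}}{N_0(t)} \xrightarrow{u_0} N_1(t) \xrightarrow{u_1} \cdots \rightarrow N_k(t)}_{\text{types with zero growth rate}}
\xrightarrow{u_k}
\underbrace{\underset{\lambda_{k+1}}{N_{k+1}(t)} \xrightarrow{u_{k+1}} \cdots \rightarrow \underset{\lambda_q}{N_q(t)} \xrightarrow{u_q} \underset{\text{Malignant}}{N_{q+1}(t)}}_{\text{types with positive growth rate}}
\tag{1}
$$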

More formally, we study a continuous-time branching process with $q+2$ types forming a linear evolutionary pathway from type 0 (the healthy type) to type $q+1$ (the malignant type). We denote the population size of type $i$ at time $t$ by $N_i(t)$. The process is started at time 0 with a large healthy population, $N_0(0)=N$. In general, population sizes of individual types may change due to three events: division, death, and mutation. Type $i$ cells (or crypts) divide into two daughter cells (crypts) of the same type at rate $b_i$, die at rate $d_i$, and mutate into type $i+1$ cells at rate $u_i$. We define $\lambda_i := b_i - d_i$ to be the net growth rate of type $i$.
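The three event types described above (division, death, mutation) can be simulated directly with a Gillespie-style algorithm. The following is a minimal sketch, not the authors' simulation code: the function name `simulate_pathway` and its arguments are hypothetical, and the per-event loop is only practical for small populations and short times.

```python
import random

def simulate_pathway(n_healthy, b, d, u, t_max, seed=None):
    """Gillespie simulation of a linear multi-type branching process.

    b[i], d[i], u[i]: per-cell division, death and mutation rates of type i
    (u of the last type is ignored). Returns the final counts per type and
    the first arrival time of each type (None if it never appeared).
    """
    rng = random.Random(seed)
    n_types = len(b)
    counts = [n_healthy] + [0] * (n_types - 1)
    arrival = [0.0] + [None] * (n_types - 1)
    t = 0.0
    while True:
        # Event rates, three per type: division, death, mutation to i + 1.
        rates = []
        for i in range(n_types):
            mut = u[i] if i < n_types - 1 else 0.0
            rates += [b[i] * counts[i], d[i] * counts[i], mut * counts[i]]
        total = sum(rates)
        if total <= 0.0:
            break  # no further event is possible
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick one event with probability proportional to its rate.
        x = rng.random() * total
        acc, chosen = 0.0, None
        for idx, rate in enumerate(rates):
            if rate > 0.0:
                chosen = idx
                acc += rate
                if x < acc:
                    break
        i, kind = divmod(chosen, 3)
        if kind == 0:      # division: one extra cell of type i
            counts[i] += 1
        elif kind == 1:    # death
            counts[i] -= 1
        else:              # mutation: the cell changes type i -> i + 1
            counts[i] -= 1
            counts[i + 1] += 1
            if arrival[i + 1] is None:
                arrival[i + 1] = t
    return counts, arrival
```

For example, `simulate_pathway(100, [0, 0.5], [0, 0], [0.01, 0.0], 5.0, seed=2)` follows 100 homeostatic cells that mutate at rate 0.01 into a type dividing at rate 0.5, and returns the counts at time 5 together with the arrival time of the first mutant.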

Fig. 1.

a Model illustration. Our model concerns an evolutionary process that starts with a large healthy population (blue circles). In this example, the first oncogenic alteration (yellow) does not provide selective growth advantage. The subsequent genetic alteration (orange) results in growth advantage. Orange cells divide at a higher rate, breaking the homeostasis while still not being considered cancerous. After another genetic alteration takes place, the malignant type (red) emerges. b Comparison with prior publications. Prior work mainly focuses on the cases in which the initial type has a positive growth rate. In contrast, we allow the first several types to have zero growth rates, representing cells still in homeostasis (Color figure online)

In the model, the initial type is at homeostasis, with net growth rate $\lambda_0 = b_0 = d_0 = 0$. We also assume that the first $k>0$ mutations that accumulate in the process are neutral, leading to no change in division or death rates. The next mutation provides selective growth advantage, leading to positive net growth rate of the $(k+1)$-st type, $\lambda_{k+1}>0$. Subsequent mutations may be advantageous or neutral. The main quantity of interest in the model is the waiting time to the first type $i+1$ cell (crypt)

$$\tau_{i+1} = \inf\{t \ge 0 \mid N_{i+1}(t) > 0\}.$$

Our model is related to previous models from Durrett and Moseley (2010), Nicholson and Antal (2019), and Nicholson et al. (2023). In particular, Durrett and Moseley (2010) considered a model for clonal expansion, in which the branching process starts with a single cell with a positive growth rate and subsequent net growth rates are strictly increasing. Nicholson and Antal (2019) studied the evolution of drug resistance. Their model focused on a branching process in which the first type has the largest positive net growth rate. Recently, Nicholson et al. (2023) considered a branching process model that allows an arbitrary sequence of growth rates following the initial supercritical type. In contrast, this paper focuses on the scenario in which the initial type(s) have a zero growth rate, corresponding to homeostatic tissue. For comparison, in Fig. 1b we have listed prior publications that consider a model similar to this work but focus on different parameter regimes.

Results

In this section, we provide analytic results to estimate population sizes and arrival times in the branching process model described in the previous section, and compare them with exact computer simulations of the process. For simplicity, we only discuss the case when mutations are advantageous or neutral and there is no cell death. The scenarios that allow deleterious mutations and cell death are discussed in A.4. We also present two possible applications of the model: initiation of colorectal cancer and incidence of chronic myeloid leukemia.

Population Sizes

Individual cells in the model (1) evolve independently. Therefore, the population can be stratified into $N$ independent lineages, each of which consists of cells descended from a single original healthy (type 0) cell. Consequently, the population size of a neutral type $l$, $1 \le l \le k$, counts the number of healthy cells that have evolved to type $l$ but have not yet changed to type $l+1$. In particular, at any fixed time $t$, the population of type $l$ is distributed as $\mathrm{Binomial}(N, p(t))$, with

$$p(t) \approx \left(\prod_{i=0}^{l-1} u_i\right) \frac{t^l}{l!} \tag{2}$$

being the time-dependent success probability (for derivation, see A.3). This estimated success probability has the same form as the rate of incidence in Armitage and Doll (1954) with a single unit initial population (see Durrett and Moseley (2010), equation (1)). It follows that the expectation of $N_l(t)$ reads

$$E[N_l(t)] = N p(t) \approx N \left(\prod_{i=0}^{l-1} u_i\right) \frac{t^l}{l!}.$$

In the small mutation rate regime, $p(t)$ is a small number, which causes the variance to have a magnitude similar to the mean value:

$$\mathrm{Var}(N_l(t)) = N p(t)\,(1 - p(t)) \approx E[N_l(t)].$$

Therefore, the populations of neutral types approximately grow as a power function (Fig. 2).
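As a quick numerical illustration of the power-law growth of neutral types, the sketch below evaluates the small-mutation-rate approximation of the success probability and the resulting binomial mean and variance. The function names are hypothetical, and the parameter values in the usage note are those listed in the caption of Fig. 2.

```python
import math

def neutral_success_probability(u, l, t):
    """Approximate p(t) = (u_0 * ... * u_{l-1}) * t**l / l! for neutral type l."""
    return math.prod(u[:l]) * t**l / math.factorial(l)

def neutral_mean_and_variance(n_healthy, u, l, t):
    """Mean and variance of the Binomial(N, p(t)) population of neutral type l."""
    p = neutral_success_probability(u, l, t)
    return n_healthy * p, n_healthy * p * (1.0 - p)
```

With $N = 10^4$ and $u_0 = 2\times10^{-4}$, `neutral_mean_and_variance(10**4, [2e-4, 8e-3], 1, 20.0)` gives a mean of about 40 type 1 cells at $t = 20$, with a variance almost equal to the mean, as expected when $p(t)$ is small.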

Fig. 2.

Population sizes in a multi-type branching process. We consider a 5-type branching process with two initial neutral types (with zero growth rate) and three advantageous types (with positive growth rate). Panels a and b display two different realizations of the process. Solid lines represent computer simulations, and dashed lines represent asymptotic behaviors. Type 1 (light yellow) population grows linearly; type 2 (orange), type 3 (red), and type 4 (black) populations grow exponentially at large times. Parameter values: $u_0 = 2\times10^{-4}$, $u_1 = 8\times10^{-3}$, $u_2 = 5\times10^{-3}$, $u_3 = 6\times10^{-3}$, $\lambda_2 = 0.7$, $\lambda_3 = 1.0$, $\lambda_4 = 1.0$, $N = 10^4$, $t \in [0, 20]$

Following Durrett and Moseley (2010), Nicholson and Antal (2019), and Nicholson et al. (2023), we approximate the population sizes of advantageous types (i.e. types with positive growth rate) in a parameter regime of large times and small mutation rates. For $i \ge 1$, we have shown that

$$N_{k+i}(t) \approx t^{\,r_{k+i}-1}\, e^{\lambda_{k+i} t}\, W_{k+i} \tag{3}$$

where $r_{k+i} = \#\{j = 1, 2, \ldots, i \mid \lambda_{k+j} = \lambda_{k+i}\}$ is a constant, and $W_{k+i}$ is a random variable. This approximation separates the stochasticity and the time dependence: the population can be decomposed into a product of a time-dependent deterministic function $t^{\,r_{k+i}-1} e^{\lambda_{k+i} t}$ and a time-independent random variable $W_{k+i}$.

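The random variable $W_{k+i}$ can be characterized using its Laplace transform:

$$\mathcal{L}_{W_{k+i}}(\theta) \approx \left[1 + \frac{\prod_{l=0}^{k} u_l}{\lambda_{k+1}^{k+1}}\, \mathrm{PolyLog}\!\left(k+1,\, -(\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\right)\right]^{N}. \tag{4}$$

Here $\mathrm{PolyLog}(m, z)$ denotes the polylogarithm (DLMF 2022, 25.12.10). The constants $\omega_{k+i}$ can be computed iteratively, with $\omega_{k+1} = 1$, and

$$
\omega_{k+i+1} =
\begin{cases}
\dfrac{u_{k+i}}{r_{k+i}}\, \omega_{k+i}, & \lambda_{k+i} = \lambda_{k+i+1}, \\[2ex]
\left[c_{k+i}\, u_{k+i}\, \left(\log u_{k+i}^{-1}\right)^{r_{k+i}-1} \omega_{k+i}\right]^{\lambda_{k+i+1}/\lambda_{k+i}}, & \lambda_{k+i} < \lambda_{k+i+1},
\end{cases}
$$

for $i \ge 1$. Finally, we have

$$c_{k+i} = \frac{\pi}{\lambda_{k+i+1}\, \lambda_{k+i}^{\,r_{k+i}-1}} \left[\sin\!\left(\pi \frac{\lambda_{k+i}}{\lambda_{k+i+1}}\right)\right]^{-1}.$$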
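The iterative scheme for the constants $\omega_{k+i}$ can be sketched in a few lines. This sketch assumes a particular reading of the recursion: for equal growth rates $\omega_{k+i+1} = (u_{k+i}/r_{k+i})\,\omega_{k+i}$, and for increasing growth rates the whole product $c_{k+i}\,u_{k+i}\,(\log u_{k+i}^{-1})^{r_{k+i}-1}\,\omega_{k+i}$ is raised to the power $\lambda_{k+i+1}/\lambda_{k+i}$, with $c_{k+i} = \pi / [\lambda_{k+i+1}\lambda_{k+i}^{r_{k+i}-1}\sin(\pi\lambda_{k+i}/\lambda_{k+i+1})]$. The function name and argument layout are hypothetical.

```python
import math

def omega_sequence(lams, us):
    """Constants omega for the advantageous types k+1, k+2, ...

    lams[j] is the growth rate of type k+1+j and us[j] the mutation rate
    out of type k+1+j. Returns [omega_{k+1}, omega_{k+2}, ...], one entry
    per growth rate in lams. Growth rates must be non-decreasing.
    """
    omegas = [1.0]  # omega_{k+1} = 1
    for j in range(len(lams) - 1):
        lam, lam_next, u = lams[j], lams[j + 1], us[j]
        # r: number of types so far that share the current growth rate.
        r = sum(1 for x in lams[: j + 1] if x == lam)
        if lam_next == lam:
            omegas.append(u / r * omegas[-1])
        elif lam_next > lam:
            c = math.pi / (
                lam_next * lam ** (r - 1) * math.sin(math.pi * lam / lam_next)
            )
            inner = c * u * math.log(1.0 / u) ** (r - 1) * omegas[-1]
            omegas.append(inner ** (lam_next / lam))
        else:
            raise ValueError("growth rates must be non-decreasing")
    return omegas
```

For the parameter values of Fig. 2 ($k = 1$), `omega_sequence([0.7, 1.0, 1.0], [5e-3, 6e-3])` returns $\omega_2$, $\omega_3$ and $\omega_4$; the step from type 3 to type 4 uses the equal-growth-rate branch.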

We show that approximations (3) and (4) are in excellent agreement with exact computer simulations of the process in Figs. 2 and 7. In particular, in Fig. 2 we depict two realizations of the same branching process. The realizations in panels a and b share the same asymptotic growth rates. Specifically, the growth rates (the slopes of the dashed lines) for types 2, 3, and 4 are characterized by $\lambda_2$, $\lambda_3$, and $\lambda_4$ ($k=1$ and $i=1$, 2, or 3 in Eq. (3)), respectively. However, the intercepts of the two dashed lines (for each of the types 2-4) differ because the limiting random variables $W_2$, $W_3$ and $W_4$ (Eq. (3)) have non-identical values in the two realizations.

Fig. 7.

Random amplitude. Laplace transforms of $W_2$ and $W_3$ obtained from Eq. (4) and computer simulations of the process described in Fig. 2. Solid lines depict the simulated Laplace transforms, which are obtained by computing the Laplace transform of scaled populations at $t=25$. Dashed lines show formula (4). Parameter values: $u_0 = 2\times10^{-4}$, $u_1 = 8\times10^{-3}$, $u_2 = 5\times10^{-3}$, $u_3 = 6\times10^{-3}$, $\lambda_2 = 0.7$, $\lambda_3 = 1.0$, $\lambda_4 = 1.0$, $N = 10^4$, $t \in [0, 20]$. Number of realizations in computer simulation: 1000

Arrival Times

Before the arrival of the first advantageous type $k+1$, the total population of the branching process stays fixed. The only possible event for any cell is to change its type into the subsequent type. For a single cell, each alteration requires an exponential waiting time. In a population of cells, the waiting time for a specific type is the minimum time for individual cells to reach that type. This results in the following waiting time distribution for type $1 \le l \le k+1$:

$$P(\tau_l > t) \approx \left[1 - \left(\prod_{i=0}^{l-1} u_i\right) \frac{t^l}{l!}\right]^{N} \tag{5}$$
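The waiting time distribution for neutral types and for the first advantageous type is simple enough to evaluate directly. The sketch below implements the approximation $P(\tau_l > t) \approx [1 - (\prod_{i<l} u_i)\, t^l / l!]^N$; the function name is hypothetical, and the single-cell probability is capped at 1 because the approximation is only meaningful while it is small.

```python
import math

def survival_neutral(t, n_healthy, u, l):
    """P(tau_l > t): probability that no type l cell has appeared by time t.

    u = [u_0, u_1, ...] are the mutation rates, and l <= k + 1 is the target
    type (reached through neutral steps only).
    """
    p = math.prod(u[:l]) * t**l / math.factorial(l)
    p = min(p, 1.0)  # the small-rate approximation breaks down beyond this
    return (1.0 - p) ** n_healthy
```

With the parameter values of Fig. 2, `survival_neutral(1.0, 10**4, [2e-4, 8e-3], 1)` is close to $e^{-2}$, because $N u_0 t = 2$ at $t = 1$.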

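The arrival time of a type that appears after the homeostasis has been partially broken (i.e. whose growth rate is positive) can be split into two segments: (i) the time from the beginning of the process to the arrival of the first advantageous cell, and (ii) the time from the first advantageous cell to the first target type cell. Adapting results from Nicholson et al. (2023), we find an estimate of (ii). Then, treating (i) as a time delay of (ii), we make use of the hypo-exponential distribution to obtain the following approximate formula for the waiting time distribution for type $k+i+1$, $i \ge 1$:

$$P(\tau_{k+i+1} > t) \approx \left[1 + \frac{\prod_{l=0}^{k} u_l}{\lambda_{k+1}^{k+1}}\, \mathrm{PolyLog}\!\left(k+1,\, -\exp\!\left(\lambda_{k+1}\left(t - t_{1/2}^{(k+i+1)}\right)\right)\right)\right]^{N}. \tag{6}$$

The shape of the waiting time curve is largely determined by $\lambda_{k+1}$, the growth rate of the first advantageous type. The product $\prod_{l=0}^{k} u_l$ characterizes the amount of time delayed in (i), and $t_{1/2}^{(k+i+1)}$ represents the median evolution time from a single type $k+1$ cell to the first type $k+i+1$ cell. The value of $t_{1/2}^{(k+i+1)}$ can be derived by the following iterative scheme, which depends on whether the $(k+i)$-th alteration is neutral or advantageous:

$$
t_{1/2}^{(k+2)} = \frac{1}{\lambda_{k+1}} \log\frac{\lambda_{k+1}}{u_{k+1}}, \qquad
t_{1/2}^{(k+i+1)} = t_{1/2}^{(k+i)} +
\begin{cases}
\dfrac{1}{\lambda_{k+i}} \log\dfrac{\lambda_{k+i}}{u_{k+i}} + \dfrac{1}{\lambda_{k+i}} \log\dfrac{r_{k+i-1}\, [\log(u_{k+i-1}^{-1})]^{r_{k+i-1}-1}}{[\log(u_{k+i}^{-1})]^{r_{k+i}-1}}, & \lambda_{k+i} = \lambda_{k+i-1}, \\[2ex]
\dfrac{1}{\lambda_{k+i}} \log\dfrac{\lambda_{k+i}}{u_{k+i}} - \dfrac{1}{\lambda_{k+i-1}} \log\dfrac{\frac{\lambda_{k+i-1}}{\lambda_{k+i}}\pi}{\sin\!\left(\frac{\lambda_{k+i-1}}{\lambda_{k+i}}\pi\right)}, & \lambda_{k+i} > \lambda_{k+i-1}.
\end{cases}
$$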

For derivation of results (5) and (6), see A.3.
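The waiting time formula for advantageous types combines a recursion for the median times with a polylogarithm. The following sketch shows one way to evaluate it with the Python standard library only. It rests on several assumptions: the recursion for the median times is read as the two-case scheme stated in the text, the polylogarithm is computed from the integral representation $-\mathrm{Li}_s(-z) = \frac{1}{\Gamma(s)}\int_0^\infty \frac{x^{s-1}}{e^{x}/z + 1}\,dx$ using Simpson's rule (a numerical shortcut chosen here, not something taken from the paper), and all function names are hypothetical.

```python
import math

def neg_polylog_neg(s, z, steps=4000):
    """Return -Li_s(-z) for integer s >= 1 and real z >= 0."""
    if z <= 0.0:
        return 0.0
    if s == 1:
        return math.log1p(z)
    if z < 1e-6:
        return z - z * z / 2**s  # leading terms of the power series
    log_z = math.log(z)
    upper = max(log_z, 0.0) + 40.0  # integrand decays like exp(-x) past log z
    h = upper / steps

    def f(x):
        return x ** (s - 1) / (math.exp(x - log_z) + 1.0)

    total = f(0.0) + f(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3.0 / math.gamma(s)

def median_times(lams, us):
    """Median times from one type k+1 cell to types k+2, k+3, ...

    lams[j] and us[j] are the growth rate of, and mutation rate out of,
    type k+1+j. Growth rates must be non-decreasing.
    """
    t_half = [math.log(lams[0] / us[0]) / lams[0]]
    for j in range(1, len(lams)):
        lam, lam_prev, u, u_prev = lams[j], lams[j - 1], us[j], us[j - 1]
        step = math.log(lam / u) / lam
        if lam == lam_prev:
            r_prev = sum(1 for x in lams[:j] if x == lam_prev)
            r = r_prev + 1
            ratio = (r_prev * math.log(1.0 / u_prev) ** (r_prev - 1)
                     / math.log(1.0 / u) ** (r - 1))
            step += math.log(ratio) / lam
        elif lam > lam_prev:
            a = lam_prev / lam
            step -= math.log(a * math.pi / math.sin(a * math.pi)) / lam_prev
        else:
            raise ValueError("growth rates must be non-decreasing")
        t_half.append(t_half[-1] + step)
    return t_half

def survival_advantageous(t, n_healthy, neutral_us, lam_first, t_half):
    """P(tau > t) for a type arising after the first advantageous type.

    neutral_us = [u_0, ..., u_k]; lam_first is the growth rate of type k+1;
    t_half is the median time from one type k+1 cell to the target type.
    """
    s = len(neutral_us)  # k + 1
    coef = math.prod(neutral_us) / lam_first**s
    z = math.exp(lam_first * (t - t_half))
    base = 1.0 - coef * neg_polylog_neg(s, z)
    return max(base, 0.0) ** n_healthy
```

For the process of Fig. 2, `median_times([0.7, 1.0], [5e-3, 6e-3])` gives the median times to types 3 and 4 from a single type 2 cell, and `survival_advantageous(20.0, 10**4, [2e-4, 8e-3], 0.7, t_half)` then evaluates the probability that the target type has not yet appeared at $t = 20$.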

We show that approximations (5) and (6) are in good agreement with exact computer simulations of the process in Fig. 3.

Fig. 3.

Comparison of analytic results and computer simulations for the waiting time distribution of neutral and advantageous types. Solid lines depict cumulative distribution functions for waiting times to types 1 through 4 in the model described in Fig. 2. Points denote probabilities obtained from computer simulations of the process, with bars showing the 95% confidence interval. Yellow line and orange line show approximation (5); red line and black line show approximation (6). Parameter values: $u_0 = 2\times10^{-4}$, $u_1 = 8\times10^{-3}$, $u_2 = 5\times10^{-3}$, $u_3 = 6\times10^{-3}$, $\lambda_2 = 0.7$, $\lambda_3 = 1.0$, $\lambda_4 = 1.0$, $N = 10^4$, $t \in [0, 25]$. Number of realizations in computer simulation: 1000

Application: Colorectal Cancer Initiation

Colorectal cancer (CRC) is the end result of a process in which healthy tissue accumulates sequential oncogenic alterations. Multiple driver genes are identified to contribute to this cancerous transformation, but the effect of mutational order of the driver genes on cancer initiation time is not fully understood. Recent work (Paterson et al. 2020) developed a multi-type branching process model to study CRC initiation through acquisition of three common driver genes, tumor suppressors APC and TP53, and the KRAS oncogene. Both alleles of a tumor suppressor gene need to be inactivated for it to function as a driver gene, while only one mutant allele is sufficient for the activation of an oncogene. It follows that CRC initiation involves five sequential genetic alterations. In the model, these genetic alterations may take place through either loss of heterozygosity (LOH) or mutation in any order and at constant rates (Table 1). Zhang et al. (2023) recently studied the waiting time distributions along a single mutational pathway in the order of APC inactivation, KRAS activation, and TP53 inactivation.

Table 1.

CRC driver genes and corresponding parameter values

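| Gene event | Alteration | Rate (per year) | Fitness advantage of the gene event (per year) |
| --- | --- | --- | --- |
| APC inactivation | LOH | $2.86\times10^{-4}$ | 0.20 |
| APC inactivation | Mutation | $1.06\times10^{-5}$ | 0.20 |
| KRAS activation | Mutation | $9.00\times10^{-7}$ | 0.07 |
| TP53 inactivation | LOH | $1.36\times10^{-4}$ | 0 |
| TP53 inactivation | Mutation | $4.56\times10^{-7}$ | 0 |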

We consider a model of colorectal cancer that starts with wild type crypts. Colonic crypts are basic functional units found in the epithelium of the colon. Within a single crypt, cells rapidly renew and migrate upward. New mutations that appear in individual cells of the crypt are either lost quickly or fixate in the crypt (Campbell et al. 1996), which enables us to focus on crypts as units of selection (Paterson et al. 2020). The number of crypts in the human colon is approximately $10^7$-$10^8$ (Tomasetti et al. 2015; Potten et al. 2003; Paterson et al. 2020). Along each individual pathway, population sizes and waiting time distributions can be estimated using the formulas derived in this paper. To demonstrate this, we select two different mutational pathways to CRC and compare our waiting time approximations and the exact computer simulation of the process (Fig. 4). In the first pathway, wild type colonic crypts undergo APC inactivation, KRAS activation, and TP53 inactivation consecutively (Fig. 4a). In the second pathway, APC inactivation is followed by TP53 inactivation, and KRAS activation (Fig. 4b).

Fig. 4.

CRC waiting times. Comparison of analytic results and computer simulations for the waiting time distributions of types 3, 4 and 5. Points denote probabilities obtained from computer simulations of the process, with bars showing the 95% confidence interval. Solid lines depict cumulative distribution functions for waiting times obtained from equation (6). In panel a, the mutational order is APC inactivation, KRAS activation, and TP53 inactivation. In panel b the mutational order is APC inactivation, TP53 inactivation, and KRAS activation. Parameter values: $N = 10^8$ crypts. Mutation rates and selective growth advantages are listed in Table 1. Number of realizations in computer simulation: $5\times10^5$

The two numerical verifications (Figs. 3 and 4) indicate that the analytic results and the exact waiting time distributions are in good agreement. In particular, for types with no selective growth advantage (i.e. types before the first supercritical type), only small mutation rate approximations have been carried out. This results in good agreement for both early times and later times (e.g. $\tau_1$ in Fig. 3 and $\tau_2$ in Fig. 4). However, for supercritical types, one can still observe discrepancies between the approximations and the computer simulations at early times and for higher types. The error at early times comes from the fact that the approximation for the waiting time distribution for type $k+1$ relies on the large time limit for the population size of type $k$, which is less accurate for early times.

For the error observed for higher types, we note that the approximations for population sizes are performed in an iterative way. In particular, to obtain an approximate population size for type i+1, one uses the approximation for population size of type i. Therefore, the approximation error accumulates through the iterations, and the approximation becomes less accurate as the types increase.

In Appendix C, we introduce an approach to improve the approximation by employing a more precise estimate of the population size of the first type with a positive growth rate (type $k+1$). Specifically, we employ a nonhomogeneous Poisson approximation such that the distribution of the waiting time to type $k+i$ is related to an integral of the population size of type $k+1$ over time. Typically, to evaluate the integral, the population of type $k+1$ over time is approximated as a product of an exponential function of time and a time-independent random variable. We have found linear terms in addition to the exponential term such that the population size can be more accurately estimated at early times. Calculating the integral with these linear terms improves the approximation of the distribution function of the waiting time for type $k+i$ (Fig. 9).

Fig. 9.

CRC waiting times. Comparison of the original approximations, Eqs. (5) and (6), with the improved approximations, Eqs. (C28) and (C29). Points denote probabilities obtained from computer simulations of the process, with bars showing the 95% confidence interval. Solid lines depict cumulative distribution functions for waiting times obtained from Eq. (6) ($\tau_3$ through $\tau_5$). Dashed lines depict the approximation obtained from Eqs. (C28) ($\tau_3$) or (C29) ($\tau_4$ and $\tau_5$). In panel a, the mutational order is APC inactivation, KRAS activation, and TP53 inactivation. In panel b the mutational order is APC inactivation, TP53 inactivation, and KRAS activation. Parameter values: $N = 10^8$ crypts. Mutation rates and selective growth advantages are listed in Table 1. Number of realizations in computer simulation: $5\times10^5$

Application: Incidence of Chronic Myeloid Leukemia

Chronic myeloid leukemia (CML) is an uncommon type of cancer that is thought to arise in hematopoietic stem cells. The fusion oncogene BCR-ABL has been identified as initiating CML carcinogenesis (Deininger et al. 2000). Michor et al. (2006) established a single-hit model that characterizes the malignant transformation of healthy hematopoietic stem cells. In the model, a Moran process is employed to describe the underlying stem cell dynamics. The process starts with a fixed number of healthy stem cells. At each division, a cell is randomly picked and replaced by a newly produced cell, which can carry an oncogenic mutation with some probability. The mutant cell has a selective growth advantage compared to healthy stem cells, leading to clonal expansion of the mutant population. It is assumed that the detection rate of CML is proportional to the population of mutant cells. Michor et al. (2006) derive the detection probability explicitly, fit their model to CML prevalence data and conclude that BCR-ABL alone might be sufficient to initiate CML.

Here, we find that a single-hit branching process model can also recover the CML age-prevalence curve. To this end, we consider a three-type branching process in which type 0 cells are healthy hematopoietic stem cells, type 1 cells are mutant stem cells with activated BCR-ABL, and type 2 corresponds to CML that has been detected. We assume that healthy stem cells (type 0) are at homeostasis and have zero growth rate, and that mutant stem cells (type 1) have a positive growth rate $\lambda_1 > 0$. In our model, the probability of CML detection at time $t$ can be characterized by the type 2 waiting time distribution $P(\tau_2 \le t)$. We use an improved approximation equation as an estimate for $P(\tau_2 \le t)$:

$$P(\tau_2 \le t) \approx \frac{N u_0 u_1}{\lambda_1^2} \left(e^{\lambda_1 t} - \lambda_1 t - 1\right). \tag{7}$$

For the derivation of (7), see Section C in the Appendix. Curve fitting in log space is performed using the CML age-prevalence data to identify parameter values, including the number of healthy hematopoietic stem cells $N$, the production rate for BCR-ABL mutants $u_0$, the CML detection rate $u_1$, and the growth rate of mutant cells $\lambda_1$ (Fig. 5). In the approximation, $N$, $u_0$, and $u_1$ appear together only in the form $N u_0 u_1$. Consequently, only their product can be identified; it is of the order of $10^{-7}$. The growth rate of mutant cells is identified to be $0.0406 \pm 0.0027$ per year.
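The detection probability $P(\tau_2 \le t) \approx \frac{N u_0 u_1}{\lambda_1^2}(e^{\lambda_1 t} - \lambda_1 t - 1)$ can be evaluated with the fitted values reported in the caption of Fig. 5. The sketch below is illustrative only: the function name is hypothetical, and the fitted product $N u_0 u_1$ is passed as a single argument because the individual factors are not identifiable.

```python
import math

def cml_detection_probability(t_years, n_u0_u1, lam1):
    """Approximate P(tau_2 <= t) for the three-type CML model."""
    x = lam1 * t_years
    # expm1(x) - x equals exp(x) - x - 1 but is more accurate for small x.
    return n_u0_u1 / lam1**2 * (math.expm1(x) - x)

# Fitted central values from the caption of Fig. 5.
prob_at_80 = cml_detection_probability(80.0, 1.32e-7, 0.0406)
```

With these values the cumulative detection probability at age 80 is of the order of $10^{-3}$.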

Fig. 5.

CML prevalence. Comparison of the cumulative probability distribution of CML detection (prevalence) from SEER data (Table 1 in Michor et al. (2006)) and equation (7). Parameter values with 95% confidence interval: $N u_0 u_1 = (1.32 \pm 0.1)\times10^{-7}$, $\lambda_1 = 0.0406 \pm 0.0027$ per year

Recently, the number of hematopoietic stem cells was estimated to be in the range of 50,000-200,000 using deep sequencing and phylogenetic inference (Lee-Six et al. 2018). Mitchell et al. (2022) used a similar approach and inferred that the hematopoietic stem cell population is in the range of 20,000-200,000. These works suggest that the number of healthy hematopoietic stem cells is of the order of $10^4$-$10^5$. Thus, we obtain an estimate for the product $u_0 u_1$, which is on the order of $10^{-11}$-$10^{-12}$.
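The order-of-magnitude estimate for $u_0 u_1$ follows from dividing the fitted product $N u_0 u_1$ by the assumed range of $N$. A minimal sketch of this arithmetic, with a hypothetical helper name and the central fitted value from Fig. 5:

```python
def mutation_product_range(n_u0_u1, n_low, n_high):
    """Range of u0*u1 implied by a fitted N*u0*u1 and a range for N."""
    return n_u0_u1 / n_high, n_u0_u1 / n_low

# N between 1e4 and 1e5 stem cells, fitted N*u0*u1 = 1.32e-7.
low, high = mutation_product_range(1.32e-7, 1e4, 1e5)
```

Here `low` corresponds to the largest stem cell number and `high` to the smallest, so the two values bracket the range quoted in the text.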

In Fig. 5, we show that Eq. (7) is in good agreement with the age-prevalence curve of CML.

Discussion

In this work, we study a multi-type branching process that starts with a large cellular population in homeostasis, and models accumulation of neutral and advantageous mutations on the way to malignancy. We derive approximations for population size and arrival time distributions for initial types with no phenotypic changes compared to healthy tissue, as well as for later types that grow abnormally. Applications to modeling the initiation of colorectal cancer and age-prevalence of chronic myeloid leukemia demonstrate the applicability of our results. Besides cancer evolution, our results are also applicable to other biological phenomena that involve a transformation of a non-growing population through sequential genetic or phenotypic alterations.

We note that the approximations presented here assume that mutation rates are much smaller than growth rates of advantageous types. In particular, for the approximations to be valid, the initial mutation rates have to be small enough compared with the first positive growth rate so that the subsequent mutation occurs when the population of the first advantageous type grows exponentially. This assumption is most likely to be violated when there is a large influx into the first type with a positive growth rate from the previous type, resulting in polynomial population growth when a subsequent mutation occurs.

Acknowledgements

This work is supported by the National Science Foundation Grant DMS-2045166.

Appendix A Methods

In this section, we provide technical details and derive the results presented in the main text. We start by listing assumptions regarding the parameter values that underlie the mathematical proofs. Some of them can be relaxed, as discussed in A.4. We assume that genetic alterations occur at distinct rates:

Assumption 1

All the mutation rates are mutually different, i.e. $u_i \ne u_j$ for $i \ne j$.

We also assume that the mutation rates of genetic alterations are much lower than the growth rates of advantageous mutants. This results in

Assumption 2

(Small Mutation Rates) For all $0 \le i \le q$ and $k+1 \le j \le q$, $u_i \ll \lambda_j$. The relation $\ll$ is in the sense that when taking any $u_i \to 0$, $\lambda_j$ for any $j$ is unaffected.

As we are mainly concerned with neutral and advantageous mutations, we have

Assumption 3

For all $k+1 \le i \le q-1$, $\lambda_i \le \lambda_{i+1}$.

Lastly, for simplicity, we initially assume the death rates are zero, i.e.

Assumption 4

For all $k+1 \le i \le q$, $d_i = 0$.

The last two assumptions (3, 4) are not necessary, and we will discuss the case when these two assumptions do not hold in A.4.

We build upon the work by Nicholson et al. (2023), which provides long-time approximations for population sizes and waiting times in a branching process model with a surviving supercritical initial type. To state the procedure of developing results in this paper, it is necessary to introduce two sub-processes of the main model. Let $e_i \in \mathbb{R}^{q+2}$ be the vector with the $(i+1)$th coordinate being 1 and all other coordinates being 0, representing the case when the process initially consists of only a single type $i$ cell. As a Markovian process, the model (1), $\{\mathbf{N}(t)\}_{t \ge 0}$, is induced by the initial distribution $\mathbf{N}(0) = N e_0$, i.e. $N$ initial healthy cells. We will consider two sub-processes of the main model: (i) a process that starts with a single healthy cell, i.e. a process $(N_0(t), N_1(t), \ldots, N_{q+1}(t))$, $t \ge 0$, induced by the initial distribution $e_0$, and (ii) a process that starts with a single type $k+1$ cell, i.e. a process of the same form induced by the initial distribution $e_{k+1}$.

An outline for obtaining the approximations in this paper includes three steps. First, we employ the results of Nicholson et al. (2023) to approximate population sizes and waiting time distributions of sub-process (ii), which starts with a single type $k+1$ cell. Next, by using the fact that sub-process (i), which starts with a single healthy cell, is essentially a delayed version of sub-process (ii), we establish approximations of population sizes and arrival times for sub-process (i). Finally, we move from the process with a single initial cell to the model with $N$ initial cells and establish our main results by utilizing the branching property.

A.1 Properties of the Model Initiated by a Single Cell with Positive Growth Rate

For the process started by a single type $k+1$ cell (sub-process (ii)), approximations for the arrival times and population sizes from type $k+1$ to type p are exhaustively discussed in Nicholson et al. (2023). To apply their results, we introduce the following new notation:

  1. $r_{k+i} := \#\{j = 1, \ldots, i : \lambda_{k+j} = \lambda_{k+i}\}$: Number of times $\lambda_{k+i}$ has been attained over types $k+1, \ldots, k+i$.

  2. $\tau_{k+i}$: Arrival time until the first type $k+i$ cell, i.e. $\tau_{k+i} = \inf\{t \ge 0 \mid N_{k+i}(t) > 0\}$.

  3. $t_{1/2}^{(k+i)}$: Median arrival time of type $k+i$ in the process started by a single type $k+1$ cell. In other words, $P\left(\tau_{k+i} > t_{1/2}^{(k+i)}\right) = 1/2$.

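Under our assumptions (1-4), Nicholson et al. (2023) show that there exists (see A.1.1) an approximation of the process started by a single type $k+1$ cell, denoted by $\{\mathbf{Z}(t) = (Z_{k+1}(t), \ldots, Z_{q+1}(t))\}_{t \ge 0}$, such that $Z_{k+1}(t) = e^{\lambda_{k+1} t} V_{k+1}$. Additionally, for $i \ge 2$,

$$\lim_{u_{k+1} \to 0} \cdots \lim_{u_{k+i-1} \to 0} \lim_{t \to \infty} F_{i-1}(u_{k+1}, \ldots, u_{k+i-1})\, t^{-(r_{k+i}-1)}\, e^{-\lambda_{k+i} t}\, Z_{k+i}(t) = V_{k+i},$$

where $F_{i-1}$ is a known function of the $i-1$ mutation rates and $V_{k+i}$ is a Mittag-Leffler distributed random variable. The Laplace transform of $V_{k+i}$ reads

$$\mathcal{L}_{V_{k+i}}(\theta) := E\left[e^{-\theta V_{k+i}}\right] = \frac{1}{1 + (\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}},$$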

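with $\omega_{k+i}$ being a constant that depends on the parameters. For computing the value of $\omega_{k+i}$, see the recursive formulation after Eq. (4). The Laplace transforms of $V_{k+1}$ and $V_{k+i}$ are connected through

$$\mathcal{L}_{V_{k+i}}(\theta) = \mathcal{L}_{V_{k+1}}\!\left((\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\right), \tag{A1}$$

with

$$\mathcal{L}_{V_{k+1}}(\theta) = \frac{1}{1+\theta}. \tag{A2}$$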
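As a consistency check, the two displayed relations can be combined in one line. The block below is a worked substitution, not part of the original derivation: inserting the Laplace transform of $V_{k+1}$ into the relation between the transforms recovers the Mittag-Leffler form stated earlier, and the transform $1/(1+\theta)$ itself is that of an exponential random variable with unit mean.

```latex
\mathcal{L}_{V_{k+i}}(\theta)
  = \mathcal{L}_{V_{k+1}}\!\left((\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\right)
  = \frac{1}{1+(\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}},
\qquad
\mathcal{L}_{V_{k+1}}(\theta)
  = \int_0^\infty e^{-\theta v}\, e^{-v}\, dv
  = \frac{1}{1+\theta}
\;\Longrightarrow\; V_{k+1} \sim \mathrm{Exp}(1).
```

For $\lambda_{k+i} = \lambda_{k+1}$ the exponent equals 1, so $V_{k+i}$ is again exponential, with mean $\omega_{k+i}$.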

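Importantly, the recursive relationship does not depend on the distribution of $V_{k+1}$ due to the construction of $\{\mathbf{Z}(t)\}_{t \ge 0}$ (see A.1.1). This large-time small-mutation-rate limit leads to the following population size approximation for type $k+i$:

$$Z_{k+i}(t) \approx t^{\,r_{k+i}-1}\, e^{\lambda_{k+i} t}\, V_{k+i}. \tag{A3}$$

It follows that, in the process started by a single type $k+1$ cell, $N_{k+i}(t) \approx t^{\,r_{k+i}-1} e^{\lambda_{k+i} t} V_{k+i}$ as well.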

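From this approximation, Nicholson et al. (2023) find that the arrival time of type $k+i+1$, $i\ge 1$, can be estimated by

$$P(\tau_{k+i+1}>t)\approx\mathcal{L}_{V_{k+1}}\Big(e^{\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})}\Big), \tag{A4}$$

where $t_{1/2}^{(k+i+1)}$ is the median of $\tau_{k+i+1}$. This median can be expressed in terms of $\omega_{k+i}$ as

$$t_{1/2}^{(k+i+1)}=\frac{1}{\lambda_{k+i}}\log\frac{\lambda_{k+i}}{\omega_{k+i}\,u_{k+i}\,\big[\lambda_{k+1}^{-1}\log(u_{k+1}^{-1})\big]^{r_{k+i}-1}}.$$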

Alternatively, a recursive formulation of t1/2(k+i+1) is presented in the main text (see Eq. (6)).
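As a short worked step, substituting the explicit transform (A2) into (A4) shows that the approximate survival function of the arrival time is logistic in $t$, and that it equals $1/2$ at the median, as the definition of $t_{1/2}^{(k+i+1)}$ requires. The block below only combines (A2) and (A4); it introduces no new assumptions.

```latex
\begin{aligned}
P(\tau_{k+i+1}>t)
  &\approx \mathcal{L}_{V_{k+1}}\!\Big(e^{\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})}\Big)
   = \frac{1}{1+e^{\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})}},\\[4pt]
P\big(\tau_{k+i+1}>t_{1/2}^{(k+i+1)}\big)
  &\approx \frac{1}{1+e^{0}} = \frac{1}{2}.
\end{aligned}
```

In this form the spread of the arrival time around its median is governed by $\lambda_{k+1}$ alone, while the mutation rates enter only through the location $t_{1/2}^{(k+i+1)}$.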

We note that there is a subtle difference between the model in Nicholson et al. (2023) and our model regarding the mutation events. In our model, a mutation from type $i$ to type $i+1$ causes the population of type $i$ to decrease by one, i.e. $(i)\to(i+1)$, while in Nicholson et al. (2023), a mutation event occurs during an asymmetric division in which the type $i$ population is not changed, i.e. $(i)\to(i)+(i+1)$. Because this difference only exists for “growing” types, whose net growth rates are assumed to be much greater than the mutation rates, the model of Nicholson et al. (2023) can be treated as a good approximation for types $k+1$ to $q+1$ in our model.

A.1.1 Construction of $\{\mathbf{Z}(t)\}_{t\ge 0}$

The general strategy of constructing the approximation $\{\mathbf{Z}(t)\}_{t\ge 0}$ was first introduced by Durrett and Moseley (2010) and recently studied by Nicholson et al. (2023), who also give a rigorous mathematical description. Briefly, in the construction we use a random variable $V_i$ to establish a two-type stochastic process $(A_i(t):=f(t)V_i,\ Z_{i+1}(t))$, where $Z_{i+1}(t)$ counts a Poisson process with intensity $u_iA_i(t)$ for each single realization of $A_i(t)$. $V_i$ is found by taking a large time limit of $f(t)^{-1}Z_i(t)$, so that $A_i(t)$ is a good approximation of $Z_i(t)$. After that, $V_{i+1}$ is found as a large time limiting random variable of $Z_{i+1}(t)$, and one constructs $(A_{i+1}(t),Z_{i+2}(t))$ by the same methodology. Nicholson et al. (2023) established results that incorporate small mutation rate limits in the approximations. In our case, the construction starts at type $k+1$, where $Z_{k+1}(t):=e^{\lambda_{k+1}t}V_{k+1}$ is imposed. Importantly, Nicholson et al. (2023) indicate that the population dynamics of $\{\mathbf{Z}(t)\}_{t\ge 0}$, in a small transition rate limit, is fully induced by the initial type, i.e. $Z_{k+1}(t)$. Furthermore, in the approximate model, the probability distribution of the waiting time to type $k+i+1$ (i.e. $\tau_{k+i+1}$) can be expressed by $\mathcal{L}_{V_{k+1}}(\theta)$ directly (Eq. (A4)).

A.2 Properties of the Model that Starts with a Single Cell with a Zero Growth Rate

Now we move our focus to the process $\{\mathbf{N}(t)\}_{t\ge 0}$ that starts with a single type 0 cell. Recall that the growth rates for type 0 through type $k$ are zero. Consequently, once the initial cell has collected the first $k+1$ mutations and become a type $k+1$ cell, this process evolves as the process of Section A.1, which we write here as $\{\tilde{\mathbf{N}}(t)\}_{t\ge 0}$. Let $\tau_{k+1}$ be the arrival time of type $k+1$ in $\{\mathbf{N}(t)\}_{t\ge 0}$. We find that for $i\ge 1$

$$N_{k+i}(t)=\begin{cases}\tilde N_{k+i}(t-\tau_{k+1}) & \text{if } t\ge\tau_{k+1},\\ 0 & \text{otherwise},\end{cases}$$

and for $1\le l\le k$

$$N_l(t)=\mathbb{1}(\tau_l\le t<\tau_{l+1}). \tag{A5}$$

Due to the above expression of $N_l(t)$, for any $1\le l\le k$, the population size follows a Bernoulli distribution with a time-dependent parameter

$$p_l(t)=P(\tau_l\le t<\tau_{l+1}).$$

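Thus, to understand the properties of $\mathbf{N}$, we need to obtain the distribution of $\tau_{l+1}$, $1\le l\le k$. Since each mutation among the first $k+1$ neutral mutations takes an exponentially distributed time to occur, we have

$$\tau_{l+1}=\sum_{i=0}^{l}\mathrm{Exp}(u_i),\quad\text{for } 1\le l\le k,$$

where the $\mathrm{Exp}(u_i)$ denote independent exponentially distributed random variables with density $u_ie^{-u_it}$. From our Assumption 1, $u_i\ne u_j$ whenever $i\ne j$. Thus, $\tau_{l+1}$ follows a hypoexponential distribution with density

$$p_{\tau_{l+1}}(x)=\sum_{i=0}^{l}\prod_{j=0,j\ne i}^{l}\frac{u_j}{u_j-u_i}\,u_ie^{-u_ix}.$$

Using this density function, we find that

$$P(t<\tau_l)=\sum_{i=0}^{l-1}\prod_{j=0,j\ne i}^{l-1}\frac{u_j}{u_j-u_i}\,e^{-u_it},\qquad P(t\ge\tau_{l+1})=1-\sum_{i=0}^{l}\prod_{j=0,j\ne i}^{l}\frac{u_j}{u_j-u_i}\,e^{-u_it}.$$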

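Thus, it follows that the Bernoulli parameter $p(t)$ for the neutral population is given by

$$\begin{aligned}
p(t)&=1-P(t<\tau_l)-P(t\ge\tau_{l+1})\\
&=\sum_{i=0}^{l}\prod_{j=0,j\ne i}^{l}\frac{u_j}{u_j-u_i}\,e^{-u_it}-\sum_{i=0}^{l-1}\prod_{j=0,j\ne i}^{l-1}\frac{u_j}{u_j-u_i}\,e^{-u_it}\\
&=\prod_{i=0}^{l-1}u_i\,\sum_{i=0}^{l}\frac{1}{\prod_{j=0,j\ne i}^{l}(u_j-u_i)}\,e^{-u_it}\\
&=\prod_{i=0}^{l-1}u_i\,\frac{t^l}{l!}+o\big(\mathbf{u}_{l-1}^{\,l+1}\big)\quad\text{as } \mathbf{u}_{l-1}\to 0.
\end{aligned} \tag{A6}$$

In the last equation, we have used a Taylor expansion in $\mathbf{u}_{l-1}:=(u_0,u_1,\dots,u_{l-1})$ at $0$.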
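To make the size of the approximation error in (A6) concrete, the following minimal sketch evaluates the exact third form of (A6) and its leading-order term for a single neutral type. The function names are illustrative and not part of the paper; the rates and time are taken from the Fig. 6 caption ($u_0=2\times10^{-4}$, $u_1=8\times10^{-3}$, $t=20$), and the rates are assumed to be distinct.

```python
import math


def bernoulli_parameter_exact(u, l, t):
    """Exact p_l(t) = P(tau_l <= t < tau_{l+1}), third form of Eq. (A6).

    u holds the distinct rates u_0, ..., u_l (at least l + 1 entries).
    """
    prefactor = math.prod(u[:l])  # u_0 * ... * u_{l-1}
    total = 0.0
    for i in range(l + 1):
        denom = math.prod(u[j] - u[i] for j in range(l + 1) if j != i)
        total += math.exp(-u[i] * t) / denom
    return prefactor * total


def bernoulli_parameter_approx(u, l, t):
    """Leading-order term of Eq. (A6): (u_0 * ... * u_{l-1}) * t**l / l!."""
    return math.prod(u[:l]) * t**l / math.factorial(l)


# Rates and time from the Fig. 6 caption; type 1 is the single neutral type.
u = [2e-4, 8e-3]
t = 20.0
p_exact = bernoulli_parameter_exact(u, 1, t)
p_approx = bernoulli_parameter_approx(u, 1, t)
print(f"exact p_1(t)  = {p_exact:.6e}")
print(f"approx p_1(t) = {p_approx:.6e}")
```

For these values the leading-order term overestimates the exact probability by several percent, because $u_1t=0.16$ is not negligible; the gap shrinks as the rates or the time horizon decrease.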

Next, we want to identify the large-time small-mutation-rate population behavior for $N_{k+i}(t)$, $i\ge 1$. To do that, we first observe that $Z_{k+1}(t-\tau_{k+1})$ admits a large-time small-mutation-rate limit.

Lemma 1

For the process $\{Z_{k+1}(t-\tau_{k+1})\}_{t\ge 0}$, the following large-time small-mutation-rate limit exists almost surely:

$$\lim_{t\to\infty}e^{-\lambda_{k+1}t}Z_{k+1}(t-\tau_{k+1})=V_{k+1}e^{-\lambda_{k+1}\tau_{k+1}}.$$

Proof

Notice that $P(\tau_{k+1}<\infty)=1$. Thus, for each realization $\omega$, we have that

$$\lim_{t\to\infty}e^{-\lambda_{k+1}t}Z_{k+1}\big(t-\tau_{k+1}(\omega),\omega\big)=e^{-\lambda_{k+1}\tau_{k+1}(\omega)}\lim_{t\to\infty}e^{-\lambda_{k+1}(t-\tau_{k+1}(\omega))}Z_{k+1}\big(t-\tau_{k+1}(\omega),\omega\big)=V_{k+1}(\omega)e^{-\lambda_{k+1}\tau_{k+1}(\omega)}.$$

This shows that a large-time small-mutation-rate limit still exists after $Z_{k+1}(t)$ has been shifted by a waiting time $\tau_{k+1}$.

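We denote the above new limiting random variable by $X_{k+1}:=V_{k+1}e^{-\lambda_{k+1}\tau_{k+1}}$. Using $\mathcal{L}_{V_{k+1}}(\theta)$ (Eq. (A2)), we find the Laplace transform of $X_{k+1}$

$$E[\exp(-X_{k+1}\theta)]=\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\;{}_2F_1\Big(1,\frac{u_i}{\lambda_{k+1}},1+\frac{u_i}{\lambda_{k+1}},-\theta\Big). \tag{A7}$$

${}_2F_1(a,b,c,z)$ denotes the Gaussian hypergeometric function (DLMF 2022, 15.1.1). Next, let $Z_{k+1}(t)=e^{\lambda_{k+1}t}X_{k+1}$ be the approximation of $N_{k+1}(t)$. We construct an auxiliary process $\{\mathbf{Z}(t)=(Z_{k+1}(t),Z_{k+2}(t),\dots,Z_{k+q}(t))\}_{t\ge 0}$ following the procedure described in A.1.1. As an approximation of $(N_{k+1}(t),N_{k+2}(t),\dots,N_{k+q}(t))$, the auxiliary process $\{\mathbf{Z}(t)\}_{t\ge 0}$ suggests that

$$N_{k+i}(t)\approx t^{r_{k+i}-1}e^{\lambda_{k+i}t}X_{k+i},$$

where

$$\mathcal{L}_{X_{k+i}}(\theta)=\mathcal{L}_{X_{k+1}}\big((\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\big)=\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\;{}_2F_1\Big(1,\frac{u_i}{\lambda_{k+1}},1+\frac{u_i}{\lambda_{k+1}},-(\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\Big).$$

The constant $\omega_{k+i}$ is given by the recursive relationship after Eq. (4). In addition, the $\{\mathbf{Z}(t)\}_{t\ge 0}$ construction also guarantees that

$$P(\tau_{k+i+1}>t)\approx\mathcal{L}_{X_{k+1}}\Big(e^{\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})}\Big). \tag{A8}$$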

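Taking advantage of (A8), we get

$$P(\tau_{k+i+1}>t)\approx\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\;{}_2F_1\Big(1,\frac{u_i}{\lambda_{k+1}},1+\frac{u_i}{\lambda_{k+1}},-\exp\big(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})\big)\Big). \tag{A9}$$

We find that (A9) can be simplified in the small $u_0,\dots,u_k$ regime, leading to

$$P(\tau_{k+i+1}>t)\approx 1+\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\,\mathrm{PolyLog}\Big(k+1,-\exp\big(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})\big)\Big). \tag{A10}$$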

The derivation of Eq. (A10) is provided in Section B.
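The agreement between (A9) and (A10) can be checked numerically. The sketch below is a hedged illustration, not the authors' code: it evaluates both expressions with plain power series, which restricts it to $|z|<1$ where $z=-\exp(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)}))$, i.e. to times before the median. The rates $u_0=10^{-3}$, $u_1=2\times10^{-3}$, the growth rate $\lambda_{k+1}=1$ and the value $z=-0.5$ are arbitrary illustrative choices, and all function names are hypothetical.

```python
import math


def hyp2f1_special(b, z, terms=400):
    """2F1(1, b; 1 + b; z) = 1 + sum_{n>=1} b / (b + n) * z**n, for |z| < 1."""
    return 1.0 + sum(b / (b + n) * z**n for n in range(1, terms))


def polylog(m, z, terms=400):
    """PolyLog(m, z) = sum_{n>=1} z**n / n**m, for |z| < 1."""
    return sum(z**n / n**m for n in range(1, terms))


def survival_a9(u, lam, z):
    """Right-hand side of Eq. (A9) for distinct neutral rates u_0, ..., u_k."""
    total = 0.0
    for i, ui in enumerate(u):
        weight = math.prod(uj / (uj - ui) for j, uj in enumerate(u) if j != i)
        total += weight * hyp2f1_special(ui / lam, z)
    return total


def survival_a10(u, lam, z):
    """Right-hand side of Eq. (A10): 1 + prod(u) / lam**(k+1) * PolyLog(k+1, z)."""
    k = len(u) - 1
    return 1.0 + math.prod(u) / lam ** (k + 1) * polylog(k + 1, z)


# Illustrative (assumed) values: two neutral steps, z = -exp(lam * (t - t_half)).
u = [1e-3, 2e-3]
lam = 1.0
z = -0.5
a9 = survival_a9(u, lam, z)
a10 = survival_a10(u, lam, z)
print(f"1 - (A9)  = {1.0 - a9:.6e}")
print(f"1 - (A10) = {1.0 - a10:.6e}")
```

The two survival probabilities differ only at the next order in the mutation rates, so their deviations from 1 agree to within a fraction of a percent for these values. For times past the median a library implementation of the hypergeometric and polylogarithm functions with analytic continuation would be needed instead of the truncated series.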

A.3 Properties of the Model that Starts with a Large Non-growing Population

Finally, we consider the population dynamics of the process $\{\mathbf{N}(t)\}_{t\ge 0}$ that starts with $N$ type 0 cells. By the branching property, it is related to the single-cell process of Section A.2 through:

$$\mathbf{N}(t)=\sum_{n=1}^{N}\mathbf{N}^{(n)}(t), \tag{A11}$$

where $\{\{\mathbf{N}^{(1)}(t)\}_{t\ge 0},\dots,\{\mathbf{N}^{(N)}(t)\}_{t\ge 0}\}$ is a collection of independent processes, each distributed as the process of Section A.2 that starts with a single type 0 cell. Equality (A11) reflects that, in a multitype branching process model, each individual in the initial population evolves independently.

For types with zero growth rates, we find that for any $1\le l\le k$,

$$N_l(t)=\sum_{n=1}^{N}N_l^{(n)}(t)=\sum_{n=1}^{N}\mathbb{1}\big(\tau_l^{(n)}\le t<\tau_{l+1}^{(n)}\big),$$

where $\{N_l^{(n)}(t)\}_{t\ge 0}$ are i.i.d. copies of the type $l$ population in the single-cell process of Section A.2, and $\tau_l^{(n)}$ are i.i.d. copies of the corresponding arrival time $\tau_l$. This expression indicates that the population of type $l$ (with zero growth rate) at time $t$ follows a Binomial$(N,p(t))$ distribution. Fig. 6 shows that this result is in good agreement with exact computer simulations of the process.

Fig. 6 Population size distribution of a non-growing population. In the model described in Fig. 2, type 1 does not have a selective growth advantage. The simulated cumulative distribution (CDF) of the type 1 population at $t=20$ is presented by the blue bars. Binomial CDFs with exact theoretical success probability $p(t)$ (A6) and approximate $p(t)$ (2) are presented in solid lines. Parameter values: $u_0=2\times10^{-4}$, $u_1=8\times10^{-3}$, $u_2=5\times10^{-3}$, $u_3=6\times10^{-3}$, $\lambda_2=0.7$, $\lambda_3=1.0$, $\lambda_4=1.0$, $N=10^4$, $t\in[0,20]$. Number of realizations in computer simulation: 1000
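The Binomial$(N,p(t))$ claim for a neutral type can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden stand-in for the authors' Gillespie simulation, which is not available here: it tracks only the neutral steps, so each initial cell becomes type 1 after an Exp($u_0$) time and leaves type 1 after a further Exp($u_1$) time. Parameter values follow the Fig. 6 caption; the number of runs (100), the seed, and the function name are illustrative choices.

```python
import math
import random


def count_type1(n_cells, u0, u1, t, rng):
    """Number of initial cells that are type 1 at time t (neutral steps only)."""
    count = 0
    for _ in range(n_cells):
        tau1 = rng.expovariate(u0)  # time of the 0 -> 1 mutation
        # The cell is type 1 at time t if it mutated once but not twice.
        if tau1 <= t and tau1 + rng.expovariate(u1) > t:
            count += 1
    return count


rng = random.Random(0)
n_cells, u0, u1, t = 10**4, 2e-4, 8e-3, 20.0
runs = 100
counts = [count_type1(n_cells, u0, u1, t, rng) for _ in range(runs)]

# Exact Bernoulli parameter p_1(t) from Eq. (A6) with l = 1.
p_exact = u0 / (u1 - u0) * (math.exp(-u0 * t) - math.exp(-u1 * t))
mean_count = sum(counts) / runs
print(f"simulated mean count = {mean_count:.2f}")
print(f"binomial mean N*p(t) = {n_cells * p_exact:.2f}")
```

The simulated mean should sit close to $Np(t)$, with run-to-run scatter of order $\sqrt{Np(t)(1-p(t))}$ for an individual count.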

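The arrival time of type $l$ can be treated as the minimum of the type $l$ arrival times among the processes that each start with a single cell, i.e.

$$\tau_l=\min_{n=1,2,\dots,N}\tau_l^{(n)}.$$

It follows that $P(\tau_l>t)=P\big(\tau_l^{(1)}>t\big)^N$. Since each $\tau_l^{(n)}$ is hypoexponentially distributed, we find

$$P(\tau_l>t)=\Bigg(\sum_{i=0}^{l-1}\prod_{j=0,j\ne i}^{l-1}\frac{u_j}{u_j-u_i}\,e^{-u_it}\Bigg)^{N}=\Bigg(1-\prod_{i=0}^{l-1}u_i\,\frac{t^l}{l!}+o\big(\mathbf{u}_{l-1}^{\,l+1}\big)\Bigg)^{N}.$$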

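For types with positive growth rates, relationship (A11) indicates that

$$N_{k+i}(t)\approx t^{r_{k+i}-1}e^{\lambda_{k+i}t}W_{k+i},\quad i\ge 1,$$

where $W_{k+i}=\sum_{n=1}^{N}X_{k+i}^{(n)}$ and $X_{k+i}^{(1)},\dots,X_{k+i}^{(N)}$ are i.i.d. copies of $X_{k+i}$. The Laplace transform of $W_{k+i}$ is found to be

$$\begin{aligned}
\mathcal{L}_{W_{k+i}}(\theta)&=E[\exp(-\theta W_{k+i})]=E\Bigg[\prod_{n=1}^{N}\exp\big(-\theta X_{k+i}^{(n)}\big)\Bigg]=E[\exp(-\theta X_{k+i})]^{N}\\
&=\Bigg(\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\;{}_2F_1\Big(1,\frac{u_i}{\lambda_{k+1}},1+\frac{u_i}{\lambda_{k+1}},-(\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\Big)\Bigg)^{N}.
\end{aligned}$$

We find an approximate version of $\mathcal{L}_{W_{k+i}}(\theta)$ in the small mutation rate parameter regime

$$\mathcal{L}_{W_{k+i}}(\theta)\approx\Bigg(1+\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\,\mathrm{PolyLog}\Big(k+1,-(\omega_{k+i}\theta)^{\lambda_{k+1}/\lambda_{k+i}}\Big)\Bigg)^{N}. \tag{A12}$$

For the arrival time of post-advantageous types, we use the relationship (A8) and the simplification (A10) to get the distribution of $\tau_{k+i+1}$

$$P(\tau_{k+i+1}>t)\approx\Bigg(1+\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\,\mathrm{PolyLog}\Big(k+1,-\exp\big(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})\big)\Big)\Bigg)^{N}. \tag{A13}$$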

A.4 Allowing Death and Fitness Decreasing Events

Following Nicholson et al. (2023), we allow deleterious mutations and positive death rates in the model (1) after the first advantageous mutation, as long as the first type with non-zero growth rate is supercritical. Below, we list the results that reflect this relaxation of the parameter regime. We first introduce the following notation:

  1. $\delta_{k+i}:=\max_{j=0,1,\dots,i}\{\lambda_{k+j}\}$: Running-max fitness.

  2. $s_{k+i}:=\#\{j=1,\dots,i:\lambda_{k+j}=\delta_{k+i}\}$: Number of times $\delta_{k+i}$ has been attained over types $k+1,\dots,k+i$.

Note that when Assumption 3 holds, type $k+i$ always has the largest growth rate among all types up to $k+i$. Thus, $r_{k+i}=s_{k+i}$ and $\lambda_{k+i}=\delta_{k+i}$.

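The waiting time distribution of type $k+i+1$ can be approximated in a form similar to Eq. (6):

$$P(\tau_{k+i+1}>t)\approx\Bigg(1+\frac{\prod_{I=0}^{k}u_I}{\lambda_{k+1}^{k+1}}\,\mathrm{PolyLog}\Big(k+1,-\exp\big(\lambda_{k+1}(t-\tilde t_{1/2}^{(k+i+1)})\big)\Big)\Bigg)^{N}. \tag{A14}$$

The only difference between (6) and (A14) is that the median time has been changed to $\tilde t_{1/2}^{(k+i+1)}$. Here, we have (see displays (2) and (5) in Nicholson et al. (2023))

$$\tilde t_{1/2}^{(k+i+1)}=\frac{1}{\delta_{k+i}}\log\frac{\delta_{k+i}}{\tilde\omega_{k+i}\,u_{k+i}\,\big[\delta_{k+1}^{-1}\log(u_{k+1}^{-1})\big]^{s_{k+i}-1}},$$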

with ω~k+i satisfying ω~k+1=bk+i/λk+i and

ω~k+i+1=uk+1δk+i-λk+i+1ω~k+iδk+i>λk+i+1uk+isk+iω~k+iδk+i=λk+i+1c~k+iuk+i(loguk+i-1)sk+i-1ω~k+iλk+i+1/δk+iδk+i<λk+i+1

for i1. Finally, we have

c~k+i=πbk+i+1λk+i+1δk+iλk+i+1bk+i+1δk+isk+i-1sinπδk+iλk+i+1-1.

Appendix B Derivation of the Simplified Distribution Function

In this section, we derive Eq. (A10), an approximation of Eq. (A9), using Taylor expansion of hypergeometric functions.

B.1 The Leading Order Term of the Distribution Function

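The main goal is to find the leading order term of

$$P(\tau_{k+i+1}>t)=\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\times{}_2F_1\Big(1,\frac{u_i}{\lambda_{k+1}},1+\frac{u_i}{\lambda_{k+1}},-\exp\big(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})\big)\Big) \tag{B15}$$

as $\mathbf{u}_k:=(u_0,u_1,\dots,u_k)\to 0$. Let

$$G(u):={}_2F_1\Big(1,\frac{u}{\lambda_{k+1}},1+\frac{u}{\lambda_{k+1}},-\exp\big(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})\big)\Big).$$

Then (B15) can be written as

$$\prod_{i=0}^{k}u_i\,\sum_{i=0}^{k}\frac{G(u_i)}{u_i\prod_{j=0,j\ne i}^{k}(u_j-u_i)}.$$

Now, consider the Taylor expansion of $G$ at $u=0$:

$$G(u)=\sum_{m=0}^{k+1}\frac{G^{(m)}(0)}{m!}u^m+o(u^{k+1}).$$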

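Thus, we have that

$$\sum_{i=0}^{k}\frac{G(u_i)}{u_i\prod_{j=0,j\ne i}^{k}(u_j-u_i)}=\sum_{m=0}^{k+1}\frac{(-1)^mG^{(m)}(0)}{m!}\sum_{i=0}^{k}\frac{(-u_i)^m}{u_i\prod_{j=0,j\ne i}^{k}(u_j-u_i)}+o\Bigg(\sum_{i=0}^{k}u_i^{k+1}\Bigg).$$

Then, by the fact that (see equation (7) in Supplement File (1) of Bozic et al. (2013))

$$\sum_{i=1}^{k}\frac{(-\alpha_i)^s}{\alpha_i\prod_{1\le j\le k,\,j\ne i}(\alpha_j-\alpha_i)}=\begin{cases}\dfrac{1}{\prod_{i=1}^{k}\alpha_i} & s=0,\\[6pt] 0 & 1\le s\le k-1,\\[2pt] -1 & s=k,\end{cases}$$

applied to the $k+1$ rates $u_0,\dots,u_k$ (so that only the terms $m=0$ and $m=k+1$ survive), and using $G(0)=1$, we have

$$P(\tau_{k+i+1}>t)=1+(-1)^{k+2}\frac{G^{(k+1)}(0)}{(k+1)!}\prod_{i=0}^{k}u_i+o\big(\mathbf{u}_k^{\,k+1}\big).$$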

In the next section, we will give an explicit expression of G(k+1)(0).

B.2 Computing the Partial Derivatives of the Hypergeometric Function at a Particular Point

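Here we discuss the derivatives of the function $G$ at $0$. Let

$$g(u)={}_2F_1(1,u,1+u,z),\qquad h(u)=\frac{u}{\lambda_{k+1}},\qquad z=-\exp\big(\lambda_{k+1}(t-t_{1/2}^{(k+i+1)})\big).$$

It follows that

$$G(u)=g(h(u)),\qquad\frac{dh}{du}=\frac{1}{\lambda_{k+1}}.$$

Thus, we have

$$\frac{d^m}{du^m}G(u)=\frac{d^{m-1}}{du^{m-1}}\frac{dG}{du}=\frac{d^{m-1}}{du^{m-1}}\Big[\frac{1}{\lambda_{k+1}}g'(h(u))\Big]=\frac{1}{\lambda_{k+1}}\frac{d^{m-1}}{du^{m-1}}g'(h(u))=\frac{1}{\lambda_{k+1}^{m}}g^{(m)}(h(u)).$$

Next, the derivative of $g$ is given by the following lemma.

Lemma 2

Let ${}_2F_1(a,b,c,z)$ be the hypergeometric function and define

$$g(u)={}_2F_1(1,u,1+u,z).$$

Then we have

$$g^{(m)}(u)\big|_{u=0}=(-1)^{m+1}m!\,\mathrm{PolyLog}(m,z),$$

where $\mathrm{PolyLog}(m,z)$ denotes the polylogarithm (DLMF 2022, 25.12.10).

With this lemma, we immediately see that

$$\frac{d^m}{du^m}G(u)\Big|_{u=0}=\frac{1}{\lambda_{k+1}^{m}}g^{(m)}(h(u))\Big|_{u=0}=\frac{1}{\lambda_{k+1}^{m}}(-1)^{m+1}m!\,\mathrm{PolyLog}(m,z).$$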
Proof

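Following Ancarani and Gasaneo (2009), to find the $m$th derivative of $g$, we define $F(u,z):={}_2F_1(1,u,1+u,z)$ and apply the hypergeometric differential equation. The hypergeometric function satisfies the following second-order ODE:

$$\Big[z(1-z)\frac{d^2}{dz^2}+\big(c-(a+b+1)z\big)\frac{d}{dz}-ab\Big]\,{}_2F_1(a,b,c,z)=0.$$

Thus, for $F$, we have that

$$\Big[z(1-z)\frac{d^2}{dz^2}+\big(1+u-(2+u)z\big)\frac{d}{dz}-u\Big]F=0. \tag{B16}$$

Next, since $F$ is analytic in $u$, we can take a derivative with respect to $u$ on both sides, which results in

$$\Big[z(1-z)\frac{d^2}{dz^2}+\big(1+u-(2+u)z\big)\frac{d}{dz}-u\Big]\frac{\partial F}{\partial u}=-(1-z)\frac{dF}{dz}+F.$$

Then, by the formula (DLMF 2022, 15.5.21)

$$c(1-z)\frac{d}{dz}\,{}_2F_1(a,b,c,z)=(c-a)(c-b)\,{}_2F_1(a,b,c+1,z)+c(a+b-c)\,{}_2F_1(a,b,c,z),$$

evaluated at $a=1$, $b=u$, $c=1+u$ (so that $(c-a)(c-b)=u$ and $a+b-c=0$), we get that

$$(1-z)\frac{d}{dz}\,{}_2F_1(1,u,1+u,z)=\frac{u}{1+u}\,{}_2F_1(1,u,2+u,z).$$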

Let $f_m:=\dfrac{\partial^m}{\partial u^m}F\Big|_{u=0}=g^{(m)}\big|_{u=0}$. If we look at the derivative at $u=0$ on both sides, we get

$$\Big[z(1-z)\frac{d^2}{dz^2}+(1-2z)\frac{d}{dz}\Big]f_1=1,\qquad f_1(0)=0,$$

where we have used the fact that ${}_2F_1(1,0,1,z)\equiv 1$. The above equation is a second order linear equation, but by letting $v(z)=f_1'(z)$, the ODE reduces to a first order equation. Hence, we get a unique solution

$$f_1(z)=-\log(1-z).$$

For a general $m$, the ODE for $f_m$ is

$$\Big[z(1-z)\frac{d^2}{dz^2}+(1-2z)\frac{d}{dz}\Big]f_m=-m(1-z)\frac{df_{m-1}}{dz}+mf_{m-1},\qquad f_m(0)=0. \tag{B17}$$

This can be obtained by differentiating both sides of (B16) $m$ times with respect to $u$. Let us denote

$$p_m(z)=\mathrm{PolyLog}(m,z).$$

We use induction to show that

$$f_m(z)=g^{(m)}(u)\big|_{u=0}=(-1)^{m+1}m!\,p_m(z)$$

satisfies (B17). For the base case $m=1$, we have proved above that

$$f_1(z)=-\log(1-z)=p_1(z).$$

Next, we move to the induction step. Suppose that our claim holds true for $m$, that is

$$f_m(z)=(-1)^{m+1}m!\,p_m(z)$$

solves (B17). Then, the $(m+1)$-st equation reads

$$\Big[z(1-z)\frac{d^2}{dz^2}+(1-2z)\frac{d}{dz}\Big]f_{m+1}=-(m+1)(1-z)\frac{df_m}{dz}+(m+1)f_m,\qquad f_{m+1}(0)=0.$$

From the derivative formula of the polylogarithm (see (18) at Polylogarithm), we have that

$$\frac{d}{dz}p_{m+1}(z)=\frac{1}{z}p_m(z),\qquad\frac{d^2}{dz^2}p_{m+1}(z)=-\frac{1}{z^2}p_m(z)+\frac{1}{z}\frac{d}{dz}p_m(z).$$

It follows that

$$\Big[z(1-z)\frac{d^2}{dz^2}+(1-2z)\frac{d}{dz}\Big]p_{m+1}(z)=(1-z)\frac{d}{dz}p_m(z)-p_m(z).$$

Thus, we see that

$$\Big[z(1-z)\frac{d^2}{dz^2}+(1-2z)\frac{d}{dz}\Big]\big(-(m+1)p_{m+1}(z)\big)=-(m+1)(1-z)\frac{d}{dz}p_m(z)+(m+1)p_m(z).$$

Multiplying both sides by $(-1)^{m+1}m!$, we have

$$\Big[z(1-z)\frac{d^2}{dz^2}+(1-2z)\frac{d}{dz}\Big]\big((-1)^{m+2}(m+1)!\,p_{m+1}(z)\big)=-(m+1)(1-z)\frac{d}{dz}f_m(z)+(m+1)f_m(z).$$

This shows that $f_{m+1}(z)=(-1)^{m+2}(m+1)!\,p_{m+1}(z)$ and finishes the induction. Finally, we conclude that

$$f_m(z)=(-1)^{m+1}m!\,\mathrm{PolyLog}(m,z).$$
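Lemma 2 can be sanity-checked numerically without any special-function library. The sketch below is an illustration under stated assumptions rather than part of the proof: it uses the power series ${}_2F_1(1,u,1+u,z)=1+\sum_{n\ge1}\frac{u}{u+n}z^n$, valid for $|z|<1$, and compares it with the second-order Taylor polynomial in $u$ whose coefficients $g^{(m)}(0)/m!=(-1)^{m+1}\mathrm{PolyLog}(m,z)$ come from the lemma. The values $z=-0.5$ and $u=10^{-3}$ and the function names are arbitrary choices.

```python
import math


def hyp2f1_special(u, z, terms=400):
    """2F1(1, u; 1 + u; z) via its power series, valid for |z| < 1 and u >= 0."""
    return 1.0 + sum(u / (u + n) * z**n for n in range(1, terms))


def polylog(m, z, terms=400):
    """PolyLog(m, z) = sum_{n>=1} z**n / n**m, valid for |z| < 1."""
    return sum(z**n / n**m for n in range(1, terms))


z = -0.5
h = 1e-3

g_h = hyp2f1_special(h, z)
# Taylor polynomial from Lemma 2: g(0) = 1, g'(0) = PolyLog(1, z),
# g''(0) / 2! = -PolyLog(2, z).
taylor = 1.0 + h * polylog(1, z) - h**2 * polylog(2, z)

print(f"g(h)             = {g_h:.12f}")
print(f"Taylor (order 2) = {taylor:.12f}")
print(f"difference       = {g_h - taylor:.3e}")
```

The remaining difference is of order $u^3$, consistent with the next term $u^3\,\mathrm{PolyLog}(3,z)$ predicted by the lemma.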

Appendix C Improving the Accuracy of Approximations for the Waiting Time Distributions

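An improvement of the main results can be achieved when the population size of type $k+1$ can be better estimated. Recall the density function of the waiting time to the first type $k+1$ cell when starting from a single wild type cell (see Section A.2):

$$p_{\tau_{k+1}}(x)=\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\,u_ie^{-u_ix}.$$

Hence the expectation for the size of the type $k+1$ population (when starting from a single wild type cell) is given by

$$E[N_{k+1}(t)]=\int_0^t e^{\lambda_{k+1}(t-x)}\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\,u_ie^{-u_ix}\,dx=\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\,\frac{u_i}{\lambda_{k+1}+u_i}\big(e^{\lambda_{k+1}t}-e^{-u_it}\big). \tag{C18}$$

We performed a small mutation rate approximation of Eq. (C18) with respect to $u_0,u_1,\dots,u_k$ and found that

$$E[N_{k+1}(t)]\approx\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\Bigg(e^{\lambda_{k+1}t}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}t^{n}}{n!}\Bigg).$$

Thus, a corresponding approximation for the expected population size of type $k+1$ when there are $N$ initial cells is

$$E[N_{k+1}(t)]\approx N\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\Bigg(e^{\lambda_{k+1}t}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}t^{n}}{n!}\Bigg). \tag{C19}$$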
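The quality of the small-mutation-rate approximation (C19) relative to the exact single-cell expression (C18) can be checked directly. The following is a minimal sketch with hypothetical function names; the rates $u_0=10^{-4}$, $u_1=2\times10^{-4}$, the growth rate $\lambda_{k+1}=1$ and the time $t=10$ are arbitrary illustrative values, not parameters from the paper.

```python
import math


def expected_size_exact(u, lam, t):
    """Right-hand side of Eq. (C18): expected type k+1 size from one wild type cell."""
    total = 0.0
    for i, ui in enumerate(u):
        weight = math.prod(uj / (uj - ui) for j, uj in enumerate(u) if j != i)
        total += weight * ui / (lam + ui) * (math.exp(lam * t) - math.exp(-ui * t))
    return total


def expected_size_approx(u, lam, t, n_cells=1):
    """Eq. (C19): N * prod(u) / lam**(k+1) * (exp(lam t) - sum_{n<=k} (lam t)**n / n!)."""
    k = len(u) - 1
    partial_exp = sum((lam * t) ** n / math.factorial(n) for n in range(k + 1))
    return n_cells * math.prod(u) / lam ** (k + 1) * (math.exp(lam * t) - partial_exp)


# Illustrative (assumed) values with two neutral steps, i.e. k = 1.
u = [1e-4, 2e-4]
lam, t = 1.0, 10.0
exact = expected_size_exact(u, lam, t)
approx = expected_size_approx(u, lam, t)
print(f"exact  (C18) = {exact:.6e}")
print(f"approx (C19) = {approx:.6e}")
print(f"relative error = {abs(exact - approx) / exact:.2e}")
```

Note that (C18) sums terms of alternating sign that nearly cancel, so for much smaller or nearly equal rates a direct floating-point evaluation loses precision; in that regime the approximate form is the numerically safer one.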

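Note that the limiting random variable of type $k+1$ admits (see (4) for its approximate Laplace transform)

$$E[W_{k+1}]=-\frac{\partial}{\partial\theta}\mathcal{L}_{W_{k+1}}(\theta)\Big|_{\theta=0}\approx-\frac{\partial}{\partial\theta}\Bigg(1+\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\,\mathrm{PolyLog}(k+1,-\theta)\Bigg)^{N}\Bigg|_{\theta=0}=N\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}.$$

This motivates us to estimate the population size $N_{k+1}(t)$ by

$$N_{k+1}(t)\approx\Bigg(e^{\lambda_{k+1}t}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}t^{n}}{n!}\Bigg)W_{k+1}. \tag{C20}$$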

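Let $p_{m,n}(t)$ represent the probability of having at least one type $n$ cell by time $t$ in a branching process that starts with a single type $m$ cell. The distribution of the waiting time to type $k+i$ can be approximated by

$$\begin{aligned}
P(\tau_{k+i}>t)&=E\Bigg[E\Bigg[\exp\Bigg(-\int_0^t u_{k+1}N_{k+1}(s)\,p_{k+2,k+i}(t-s)\,ds\Bigg)\,\Bigg|\,N_{k+1}(s),\,s\le t\Bigg]\Bigg]\\
&\approx E\Bigg[E\Bigg[\exp\Bigg(-\int_0^t u_{k+1}\Bigg(e^{\lambda_{k+1}s}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}s^{n}}{n!}\Bigg)W_{k+1}\,p_{k+2,k+i}(t-s)\,ds\Bigg)\,\Bigg|\,W_{k+1}\Bigg]\Bigg]\\
&=\mathcal{L}_{W_{k+1}}\Bigg(u_{k+1}\int_0^t\Bigg(e^{\lambda_{k+1}s}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}s^{n}}{n!}\Bigg)p_{k+2,k+i}(t-s)\,ds\Bigg).
\end{aligned} \tag{C21}$$

Note that $p_{m,n}(t)$ coincides with the cumulative distribution of the waiting time to the first type $n$ cell when the process initially has a single type $m$ cell:

$$p_{m,n}(t)=P(\tau_{m,n}\le t):=P\big(N_n(t)>0\mid N_m(0)=1,\ N_l(0)=0\ \text{for } l\ne m\big).$$

In the case that $m=n$, $p_{m,n}(t)\equiv 1$ is well defined. Adapting the estimation of $p_{k+2,k+i}(t)$ from the construction of the first auxiliary process gives us

$$p_{k+2,k+i}(t)\approx\begin{cases}1 & \text{if } i=2,\\ 1-\Big(1+\exp\big(\lambda_{k+2}(t-t_{1/2}^{(k+2,k+i)})\big)\Big)^{-1} & \text{otherwise}.\end{cases} \tag{C22}$$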

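Plugging (C22) into (C21) gives us the improved formula for approximating the waiting time to type $k+i$. When $i=2$, we obtain an explicit expression:

$$P(\tau_{k+2}\le t)\approx 1-\Bigg(1+\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\,\mathrm{PolyLog}\Bigg(k+1,-\frac{u_{k+1}}{\lambda_{k+1}}\Bigg(e^{\lambda_{k+1}t}-\sum_{n=0}^{k+1}\frac{(\lambda_{k+1}t)^{n}}{n!}\Bigg)\Bigg)\Bigg)^{N}. \tag{C23}$$

In general, for $i>2$, there is no explicit solution for the integral

$$I_{k,i}:=\int_0^t\Bigg(e^{\lambda_{k+1}s}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}s^{n}}{n!}\Bigg)\Bigg(1-\frac{1}{1+\exp\big(\lambda_{k+2}(t-s-t_{1/2}^{(k+2,k+i)})\big)}\Bigg)ds. \tag{C24}$$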

However, for small k, the integral (C24) can be evaluated explicitly. In this case, the approximation for the distribution function of τk+i for any ki is available through expression (C21).

C.1 Explicit Approximations for an Evolutionary Pathway with a Single Neutral Type

We have introduced an approach in Section C for improving the approximations of the waiting time distributions through a more precise estimate of the population size of type $k+1$ cells. In particular, when there is only one neutral type, $k=0$, the population size of type 1 cells can be approximated by (see (C20))

$$N_1(t)\approx(e^{\lambda_1t}-1)W_1.$$

Next, Eq. (C23) gives us an approximation of the distribution function of $\tau_2$ ($k=0$, $i=2$)

$$P(\tau_2\le t)\approx 1-\Bigg(1-\frac{u_0}{\lambda_1}\log\Big(1+\frac{u_1}{\lambda_1}\big(e^{\lambda_1t}-\lambda_1t-1\big)\Big)\Bigg)^{N}, \tag{C25}$$

where we have used the fact that $\mathrm{PolyLog}(1,z)=-\log(1-z)$. The expression can be further simplified since $u_0$ and $u_1$ are small. Using the approximations $\log(1+x)\approx x$ and $(1-x)^N\approx 1-Nx$ when $x$ is small, we obtain

$$\begin{aligned}
P(\tau_2\le t)&\approx 1-\Bigg(1-\frac{u_0}{\lambda_1}\log\Big(1+\frac{u_1}{\lambda_1}\big(e^{\lambda_1t}-\lambda_1t-1\big)\Big)\Bigg)^{N}\\
&\approx 1-\Bigg(1-\frac{u_0u_1}{\lambda_1^2}\big(e^{\lambda_1t}-\lambda_1t-1\big)\Bigg)^{N}\\
&\approx\frac{Nu_0u_1}{\lambda_1^2}\big(e^{\lambda_1t}-\lambda_1t-1\big).
\end{aligned} \tag{C26}$$

Expression (C26) is used for fitting the CML prevalence curve in the main text (see Fig. 5). In the curve fitting, the two identifiable quantities are the product $Nu_0u_1$ and the growth rate $\lambda_1$; the parameters $N$, $u_0$, and $u_1$ are not individually identifiable. The approximation (C26) (or (7) in the main text) is in good agreement with the exact computer simulations (see Fig. 8).
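The chain of simplifications from (C25) to (C26), and the identifiability remark, can be illustrated numerically. The sketch below uses the parameter values quoted in the Fig. 8 caption ($N=10^5$, $u_0=10^{-6}$, $u_1=1.32\times10^{-6}$, $\lambda_1=0.0406$ per year); the evaluation age of 80 years and the function names are illustrative assumptions.

```python
import math


def cdf_c25(t, n_cells, u0, u1, lam):
    """Eq. (C25): P(tau_2 <= t) before the small-rate simplifications."""
    growth = math.exp(lam * t) - lam * t - 1.0
    per_cell = (u0 / lam) * math.log1p((u1 / lam) * growth)
    # 1 - (1 - x)**N, written with log1p/expm1 to avoid cancellation.
    return -math.expm1(n_cells * math.log1p(-per_cell))


def cdf_c26(t, n_cells, u0, u1, lam):
    """Eq. (C26): N * u0 * u1 / lam**2 * (exp(lam t) - lam t - 1)."""
    return n_cells * u0 * u1 / lam**2 * (math.exp(lam * t) - lam * t - 1.0)


# Parameter values from the Fig. 8 caption; t = 80 years is an assumed example.
n_cells, u0, u1, lam = 1e5, 1e-6, 1.32e-6, 0.0406
t = 80.0
full = cdf_c25(t, n_cells, u0, u1, lam)
simple = cdf_c26(t, n_cells, u0, u1, lam)
print(f"(C25) = {full:.6e}")
print(f"(C26) = {simple:.6e}")

# Only the product N * u0 * u1 enters (C26): rescaling N and u0 changes nothing.
rescaled = cdf_c26(t, 10 * n_cells, u0 / 10, u1, lam)
print(f"(C26), N x10 and u0 /10 = {rescaled:.6e}")
```

Because $\log(1+x)<x$ and $1-e^{-y}<y$ for positive arguments, (C26) always lies slightly above (C25); for these values the two agree to within a fraction of a percent.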

Fig. 8 Comparison of Eq. (C26) (or (7) in the main text) with exact Gillespie computer simulations. Parameter values for the computer simulations: $N=10^5$, $u_0=1\times10^{-6}$, $u_1=1.32\times10^{-6}$, $\lambda_1=0.0406$ per year. Number of realizations in computer simulation: $5\times10^5$

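Next, to obtain the distribution for $\tau_i$, $k=0$, $i>2$, we evaluate the integral (C24). The result reads

$$\begin{aligned}
I_{0,i}=\frac{1}{\lambda_1}\Bigg[&e^{\lambda_1t}\,{}_2F_1\Big(1,\frac{\lambda_1}{\lambda_2},1+\frac{\lambda_1}{\lambda_2},-e^{\lambda_2t_{1/2}^{(2,i)}}\Big)-{}_2F_1\Big(1,\frac{\lambda_1}{\lambda_2},1+\frac{\lambda_1}{\lambda_2},-e^{-\lambda_2(t-t_{1/2}^{(2,i)})}\Big)\\
&-\lambda_1t+\frac{\lambda_1}{\lambda_2}\log\frac{1+e^{\lambda_2t_{1/2}^{(2,i)}}}{1+e^{-\lambda_2(t-t_{1/2}^{(2,i)})}}\Bigg].
\end{aligned}$$

Hence, for $k=0$, $i>2$, the improved waiting time distribution approximation is given by

$$\begin{aligned}
P(\tau_i\le t)\approx 1-\Bigg(1-\frac{u_0}{\lambda_1}\log\Bigg[1+\frac{u_1}{\lambda_1}\Bigg(&e^{\lambda_1t}\,{}_2F_1\Big(1,\frac{\lambda_1}{\lambda_2},1+\frac{\lambda_1}{\lambda_2},-e^{\lambda_2t_{1/2}^{(2,i)}}\Big)-{}_2F_1\Big(1,\frac{\lambda_1}{\lambda_2},1+\frac{\lambda_1}{\lambda_2},-e^{-\lambda_2(t-t_{1/2}^{(2,i)})}\Big)\\
&-\lambda_1t+\frac{\lambda_1}{\lambda_2}\log\frac{1+e^{\lambda_2t_{1/2}^{(2,i)}}}{1+e^{-\lambda_2(t-t_{1/2}^{(2,i)})}}\Bigg)\Bigg]\Bigg)^{N}.
\end{aligned} \tag{C27}$$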

C.2 Explicit Approximations for an Evolutionary Pathway with Two Neutral Types

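When there are two neutral types in our model, i.e. $k=1$, the improved approach for approximating waiting time distributions (see Section C) implies that (see (C20))

$$N_2(t)\approx(e^{\lambda_2t}-\lambda_2t-1)W_2.$$

When $k=1$, $i=2$, Eq. (C23) gives the approximation for the cumulative distribution function of $\tau_3$:

$$P(\tau_3\le t)\approx 1-\Bigg(1+\frac{u_0u_1}{\lambda_2^2}\,\mathrm{PolyLog}\Bigg(2,-\frac{u_2}{\lambda_2}\Bigg(e^{\lambda_2t}-\sum_{n=0}^{2}\frac{(\lambda_2t)^{n}}{n!}\Bigg)\Bigg)\Bigg)^{N}. \tag{C28}$$

To approximate the distribution function of each waiting time $\tau_{k+i}$ for $i>2$, we first evaluate the integral $I_{1,i}$ (defined by (C24)). We find that

$$\begin{aligned}
I_{1,i}=\frac{1}{\lambda_2}\Bigg[&e^{\lambda_2t}\,{}_2F_1\Big(1,\frac{\lambda_2}{\lambda_3},1+\frac{\lambda_2}{\lambda_3},-e^{\lambda_3t_{1/2}^{(3,1+i)}}\Big)-{}_2F_1\Big(1,\frac{\lambda_2}{\lambda_3},1+\frac{\lambda_2}{\lambda_3},-e^{-\lambda_3(t-t_{1/2}^{(3,1+i)})}\Big)\\
&-\frac{1}{2}\lambda_2^2t^2+\frac{\lambda_2^2}{\lambda_3^2}\Big(\mathrm{PolyLog}\big(2,-e^{\lambda_3t_{1/2}^{(3,1+i)}}\big)-\mathrm{PolyLog}\big(2,-e^{-\lambda_3(t-t_{1/2}^{(3,1+i)})}\big)\Big)\\
&+\frac{\lambda_2^2}{\lambda_3}\,t\log\Big(1+e^{\lambda_3t_{1/2}^{(3,1+i)}}\Big)-\lambda_2t+\frac{\lambda_2}{\lambda_3}\log\frac{1+e^{\lambda_3t_{1/2}^{(3,1+i)}}}{1+e^{-\lambda_3(t-t_{1/2}^{(3,1+i)})}}\Bigg].
\end{aligned}$$

Next, plugging the integral $I_{1,i}$ into the expression (C21) gives us the distribution of each $\tau_{k+i}$ where $k=1$, $i>2$. Specifically, the improved waiting time distribution approximation of $\tau_{1+i}$ is given by

$$\begin{aligned}
P(\tau_{1+i}\le t)\approx 1-\Bigg(1+\frac{u_0u_1}{\lambda_2^2}\,\mathrm{PolyLog}\Bigg(2,-\frac{u_2}{\lambda_2}\Bigg[&e^{\lambda_2t}\,{}_2F_1\Big(1,\frac{\lambda_2}{\lambda_3},1+\frac{\lambda_2}{\lambda_3},-e^{\lambda_3t_{1/2}^{(3,1+i)}}\Big)-{}_2F_1\Big(1,\frac{\lambda_2}{\lambda_3},1+\frac{\lambda_2}{\lambda_3},-e^{-\lambda_3(t-t_{1/2}^{(3,1+i)})}\Big)\\
&+\frac{\lambda_2^2}{\lambda_3}\,t\log\Big(1+e^{\lambda_3t_{1/2}^{(3,1+i)}}\Big)-\frac{1}{2}\lambda_2^2t^2\\
&+\frac{\lambda_2^2}{\lambda_3^2}\Big(\mathrm{PolyLog}\big(2,-e^{\lambda_3t_{1/2}^{(3,1+i)}}\big)-\mathrm{PolyLog}\big(2,-e^{-\lambda_3(t-t_{1/2}^{(3,1+i)})}\big)\Big)\\
&-\lambda_2t+\frac{\lambda_2}{\lambda_3}\log\frac{1+e^{\lambda_3t_{1/2}^{(3,1+i)}}}{1+e^{-\lambda_3(t-t_{1/2}^{(3,1+i)})}}\Bigg]\Bigg)\Bigg)^{N}.
\end{aligned} \tag{C29}$$

The improved approximations (C28) and (C29) are in good agreement with exact computer simulations (Fig. 9).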

C.3 Derivation of Approximate Expectation for Type k + 1

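The goal in this section is to perform the approximation

$$E:=E[N_{k+1}(t)]=\sum_{i=0}^{k}\prod_{j=0,j\ne i}^{k}\frac{u_j}{u_j-u_i}\,\frac{u_i}{\lambda_{k+1}+u_i}\big(e^{\lambda_{k+1}t}-e^{-u_it}\big)\approx\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\Bigg(e^{\lambda_{k+1}t}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}t^{n}}{n!}\Bigg). \tag{C30}$$

Let

$$G(u)=\frac{u}{\lambda_{k+1}+u}\big(e^{\lambda_{k+1}t}-e^{-ut}\big).$$

Then, the target expression can be rewritten as

$$E=\prod_{i=0}^{k}u_i\,\sum_{i=0}^{k}\frac{G(u_i)}{u_i\prod_{j=0,j\ne i}^{k}(u_j-u_i)}.$$

Next, following the same derivation as in Section B.1 (here $G(0)=0$, so the zeroth-order term vanishes), we get that

$$E\approx(-1)^{k+2}\frac{G^{(k+1)}(0)}{(k+1)!}\prod_{i=0}^{k}u_i \tag{C31}$$

as $u_0,\dots,u_k\to 0$. Lastly, since

$$\frac{\partial^{k+1}}{\partial u^{k+1}}\,\frac{u}{\lambda_{k+1}+u}\Bigg|_{u=0}=\frac{(-1)^{k}(k+1)!}{\lambda_{k+1}^{k+1}}$$

and

$$\frac{\partial^{k+1}}{\partial u^{k+1}}\,\frac{ue^{-ut}}{\lambda_{k+1}+u}\Bigg|_{u=0}=\frac{(-1)^{k}(k+1)!}{\lambda_{k+1}^{k+1}}\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}t^{n}}{n!},$$

we obtain that

$$E[N_{k+1}(t)]\approx\frac{\prod_{i=0}^{k}u_i}{\lambda_{k+1}^{k+1}}\Bigg(e^{\lambda_{k+1}t}-\sum_{n=0}^{k}\frac{\lambda_{k+1}^{n}t^{n}}{n!}\Bigg).$$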

Code availability

For access to Gillespie simulation code, please contact the authors.

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Ancarani LU, Gasaneo G (2009) Derivatives of any order of the Gaussian hypergeometric function 2F1(a, b, c; z) with respect to the parameters a, b and c. J Phys A Math Theor 42(39):395208. 10.1088/1751-8113/42/39/395208
  2. Armitage P, Doll R (1954) The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 8(1):1–12. 10.1038/bjc.1954.1
  3. Avanzini S, Antal T (2019) Cancer recurrence times from a branching process model. PLoS Comput Biol 15(11):e1007423. 10.1371/journal.pcbi.1007423
  4. Bozic I, Antal T, Ohtsuki H, Carter H, Kim D, Chen S, Karchin R, Kinzler KW, Vogelstein B, Nowak MA (2010) Accumulation of driver and passenger mutations during tumor progression. Proc Natl Acad Sci USA 107(43):18545–18550. 10.1073/pnas.1010978107
  5. Bozic I, Reiter JG, Allen B, Antal T, Chatterjee K, Shah P, Moon YS, Yaqubie A, Kelly N, Le DT, Lipson EJ, Chapman PB, Diaz LA Jr, Vogelstein B, Nowak MA (2013) Evolutionary dynamics of cancer in response to targeted combination therapy. Elife 2:e00747. 10.7554/eLife.00747
  6. Campbell F, Williams GT, Appleton MA, Dixon MF, Harris M, Williams ED (1996) Post-irradiation somatic mutation and clonal stabilisation time in the human colon. Gut 39
  7. Deininger MW, Goldman JM, Melo JV (2000) The molecular biology of chronic myeloid leukemia. Blood 96(10):3343–3356. 10.1182/blood.V96.10.3343
  8. DLMF (2022) NIST Digital Library of Mathematical Functions. Release 1.1.8 of 2022-12-15, http://dlmf.nist.gov/
  9. Durrett R, Foo J, Leder K, Mayberry J, Michor F (2011) Intratumor heterogeneity in evolutionary models of tumor progression. Genetics 188(2):461–477. 10.1534/genetics.110.125724
  10. Durrett R, Moseley S (2010) Evolution of resistance and progression to disease during clonal expansion of cancer. Theor Popul Biol 77(1):42–48. 10.1016/j.tpb.2009.10.008
  11. Fearon ER (2011) Molecular genetics of colorectal cancer. Annu Rev Pathol 6:479–507. 10.1146/annurev-pathol-011110-130235
  12. Foo J, Leder K, Zhu J (2014) Escape times for branching processes with random mutational fitness effects. Stoch Process Their Appl 124(11):3661–3697. 10.1016/j.spa.2014.06.003
  13. Komarova NL, Wodarz D (2005) Drug resistance in cancer: principles of emergence and prevention. Proc Natl Acad Sci USA 102(27):9714–9719. 10.1073/pnas.0501870102
  14. Lee-Six H, Øbro NF, Shepherd MS, Grossmann S, Dawson K, Belmonte M, Osborne RJ, Huntly BJ, Martincorena I, Anderson E, O’Neill L, Stratton MR, Laurenti E, Green AR, Kent DG, Campbell PJ (2018) Population dynamics of normal human blood inferred from somatic mutations. Nature 561
  15. Meza R, Jeon J, Moolgavkar SH, Luebeck EG (2008) Age-specific incidence of cancer: phases, transitions, and biological implications. Proc Natl Acad Sci USA 105(42):16284–16289. 10.1073/pnas.0801151105
  16. Michor F, Iwasa Y, Nowak MA (2006) The age incidence of chronic myeloid leukemia can be explained by a one-mutation model. Proc Natl Acad Sci USA 103(40):14931–14934. 10.1073/pnas.0607006103
  17. Mitchell E, Chapman MS, Williams N, Dawson KJ, Mende N, Calderbank EF, Jung H, Mitchell T, Coorens TH, Spencer DH, Machado H, Lee-Six H, Davies M, Hayler D, Fabre MA, Mahbubani K, Abascal F, Cagan A, Vassiliou GS, Baxter J, Martincorena I, Stratton MR, Kent DG, Chatterjee K, Parsy KS, Green AR, Nangalia J, Laurenti E, Campbell PJ (2022) Clonal dynamics of haematopoiesis across the human lifespan. Nature 606
  18. Morin PJ, Sparks AB, Korinek V, Barker N, Clevers H, Vogelstein B, Kinzler KW (1997) Activation of β-catenin-Tcf signaling in colon cancer by mutations in β-catenin or APC. Science 275(5307):1787–1790. 10.1126/science.275.5307.1787
  19. Nicholson AM, Olpe C, Hoyle A, Thorsen AS, Rus T, Colombé M, Brunton-Sim R, Kemp R, Marks K, Quirke P et al (2018) Fixation and spread of somatic mutations in adult human colonic epithelium. Cell Stem Cell 22(6):909–918. 10.1016/j.stem.2018.04.020
  20. Nicholson MD, Antal T (2019) Competing evolutionary paths in growing populations with applications to multidrug resistance. PLoS Comput Biol 15:1–25. 10.1371/journal.pcbi.1006866
  21. Nicholson MD, Cheek D, Antal T (2023) Sequential mutations in exponentially growing populations. PLoS Comput Biol 19(7):1–32. 10.1371/journal.pcbi.1011289
  22. Paterson C, Clevers H, Bozic I (2020) Mathematical model of colorectal cancer initiation. Proc Natl Acad Sci USA 117(34):20681–20688. 10.1073/pnas.2003771117
  23. Potten CS, Booth C, Hargreaves D (2003) The small intestine as a model for evaluating adult tissue stem cell drug targets. Cell Prolif. 10.1046/j.1365-2184.2003.00264.x
  24. Snippert HJ, Schepers AG, Van Es JH, Simons BD, Clevers H (2014) Biased competition between Lgr5 intestinal stem cells driven by oncogenic mutation induces clonal expansion. EMBO Rep 15(1):62–69. 10.1002/embr.201337799
  25. Tomasetti C, Marchionni L, Nowak MA, Parmigiani G, Vogelstein B (2015) Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc Natl Acad Sci USA 112(1):118–123. 10.1073/pnas.1421839112
  26. Vogelstein B, Kinzler KW (2004) Cancer genes and the pathways they control. Nat Med 10(8):789–799. 10.1038/nm1087
  27. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW (2013) Cancer genome landscapes. Science 339(6127):1546–1558. 10.1126/science.1235122
  28. Wang Y, Boland CR, Goel A, Wodarz D, Komarova NL (2022) Aspirin’s effect on kinetic parameters of cells contributes to its role in reducing incidence of advanced colorectal adenomas, shown by a multiscale computational study. Elife 11:e71953. 10.7554/eLife.71953
  29. Zhang R, Ukogu OA, Bozic I (2023) Waiting times in a branching process model of colorectal cancer initiation. Theor Popul Biol 151:44–63. 10.1016/j.tpb.2023.04.001

Articles from Bulletin of Mathematical Biology are provided here courtesy of Springer
