Significance
Cancer evolution cannot be observed directly in patients, and new methodologies are needed for obtaining a quantitative understanding of this obscure process. We develop and analyze a stochastic model of malignant transformation in the colon that precisely quantifies the process of colorectal carcinogenesis in patients through loss of tumor suppressors APC and TP53 and gain of the KRAS oncogene. Our study employs experimentally measured mutation rates in the colon and growth advantages provided by driver mutations. We calculate the probability of a colorectal malignancy, the sizes of premalignant lesions, and the order of acquisition of driver mutations during colorectal tumor evolution. Our model can be used as a starting point in evaluating early detection and treatment efforts.
Keywords: cancer evolution, driver mutations, stochastic model
Abstract
Quantifying evolutionary dynamics of cancer initiation and progression can provide insights into more effective strategies of early detection and treatment. Here we develop a mathematical model of colorectal cancer initiation through inactivation of two tumor suppressor genes and activation of one oncogene, accounting for the well-known path to colorectal cancer through loss of tumor suppressors APC and TP53 and gain of the KRAS oncogene. In the model, we allow mutations to occur in any order, leading to a complex network of premalignant mutational genotypes on the way to colorectal cancer. We parameterize the model using experimentally measured parameter values, many of them only recently available, and compare its predictions to epidemiological data on colorectal cancer incidence. We find that the reported lifetime risk of colorectal cancer can be recovered using a mathematical model of colorectal cancer initiation together with experimentally measured mutation rates in colorectal tissues and proliferation rates of premalignant lesions. We demonstrate that the order of driver events in colorectal cancer is determined primarily by the fitness effects that they provide, rather than their mutation rates. Our results imply that there may not be significant immune suppression of untreated benign and malignant colorectal lesions.
Worldwide efforts to sequence cancer genomes over the past decade have provided insight into the number and type of genetic alterations implicated in tumor progression (1). Recent work suggests that the acquisition of just three driver alterations (2), often inactivation of two tumor suppressor genes (TSGs) and activation of one oncogene (1, 3), is sufficient to produce an invasive phenotype. In the case of colorectal cancer (CRC), the most significant driver events are inactivation of the TSGs APC and TP53 and activation of the KRAS oncogene (4). Activation of an oncogene requires a mutation of a single copy of the gene; inactivation of a TSG requires two genetic alterations (inactivation of both copies) (5), leading to approximately five mutational stages on the way to CRC.
Early work on multistage modeling of cancer incidence aimed to determine the number of mutations (stages) needed for cancer initiation by fitting multistage models to cancer incidence data (6, 7). Notably, Armitage and Doll (6) studied sequential accumulation of neutral mutations in a tissue and reported that approximately six stages were needed for CRC initiation. Knudson (7) used mathematical modeling and incidence data to conclude that two hits are needed for initiation of retinoblastoma; this finding was subsequently confirmed by the discovery of the Retinoblastoma (Rb) TSG. Subsequent work studied two-stage models allowing for clonal expansions (8, 9) or n-stage models where mutations accumulate in a predetermined (linear) order, typically allowing the last premalignant stage to provide proliferative advantage to cells, with the goal of explaining reported cancer incidence and estimating mutation and proliferation rates from cancer incidence data (10, 11). However, even though some driver mutations tend to be acquired before others on the way to cancer, this order is frequently violated (12–14).
In the context of CRC, Luebeck, Moolgavkar, and coworkers studied the two- to five-stage multistage clonal expansion (MSCE) model, which includes a clonal expansion of cells in the final premalignant stage (10, 11), and showed that it provides a better fit to CRC incidence data compared to Armitage and Doll’s neutral model (11). The MSCE model was later expanded to include the clonal expansion of malignant cells, allowing for modeling of the size distribution of the malignant population (15). Recently, Lang et al. studied cancer incidence and size distributions of premalignant (adenoma) and malignant cells in a two-stage branching process (16) model of CRC initiation and fit their results to data on size distribution of adenomas and carcinomas to obtain estimates for rates of development of CRC (17).
In contrast to previous approaches, here we use a genomically informed stochastic model in which we allow mutations to occur in any order, leading to a biologically realistic complex network of possible evolutionary pathways that lead to cancer. Our approach is based on experimentally testable hypotheses about molecular mechanisms and gives predicted incidence and timing of mutations as outputs. We parameterize the model using recent experimental estimates of in vivo mutation rates in the colon (18) and selective advantages of driver mutations of interest (19, 20) and show that reported lifetime CRC risk can be recapitulated by assuming that APC inactivation and KRAS activation provide small proliferative advantages to colorectal crypts, which is consistent with current experimental measures. We use the model to provide estimates of the sizes of premalignant lesions and mutation order in the earliest stages of colorectal tumorigenesis.
Results
Model and Parameters.
We develop and analyze a stochastic mathematical model of CRC initiation through acquisition of three driver events: inactivation of tumor suppressors APC and TP53 and activation of the KRAS oncogene. In this model, normal (wild-type) tissue consists initially of N colorectal crypts. Inactivation of a TSG requires inactivation of both alleles (TSG−/−). KRAS, an oncogene, can be activated through mutation of a single copy of the gene (KRAS+). We do not impose any particular order in which these genetic events are acquired. This leads to a complex network of available five-step pathways from an initially wild-type colorectal crypt to the one that has collected all three driver events (Fig. 1A).
Fig. 1.
(A) Schematic of CRC initiation. A healthy (wild-type) crypt is in the lower left corner, and a fully malignant crypt at the top right. (B) (Top) Transition rates from APC wild-type genotype (0,*,*) to fully inactivated APC through LOH and mutation (or vice versa) (3,*,*) or double mutation (4,*,*). (Bottom) Transition rate from wild-type KRAS (*,*,0) to activated KRAS (*,*,1).
Each allele of a TSG can be inactivated through either mutation or loss of the allele via a loss of heterozygosity (LOH) event. Loss of both alleles is not allowed, as this is usually fatal for the cell. rG is the rate of mutation of a gene G, and rLOH is the rate of loss of an allele of a TSG (Materials and Methods). In our model, the wild-type crypt genotype is denoted by (0,0,0), corresponding to no driver alterations in any of the three genes. An APC wild-type crypt (APC+/+), whose genotype is denoted (0,*,*), can inactivate a single copy of the APC gene (APC+/−) through LOH, which occurs with rate rLOH, leading to genotype (1,0,0), or through mutation, which occurs with rate rAPC, leading to genotype (2,0,0).
If the first copy of APC was inactivated through LOH, the second copy can only be inactivated through mutation. rAPC is the rate of driver mutation of the wild-type APC gene, which has two alleles available for an inactivating mutation. In an APC+/− gene, there is only one wild-type allele available for an inactivating mutation, so mutation from genotype (1,*,*) occurs with rate rAPC/2, leading to genotype (3,*,*). If the first copy of APC was inactivated through mutation, the second copy can be inactivated through LOH [with rate rLOH/2, leading to genotype (3,*,*)], or through a second mutation [with rate rAPC/2, leading to genotype (4,*,*)]. The possible driver events in the APC gene and their corresponding rates are summarized in Fig. 1B. Genetic alterations in TP53 occur in a similar manner as described above for APC (with mutation rate rTP53 instead of rAPC). For example, we denote by (0,2,0) a crypt that has mutated a single copy of TP53. A KRAS wild-type crypt has genotype (*,*,0) and can gain an activating KRAS mutation (KRAS+) with rate rKRAS, leading to genotype (*,*,1). The assumptions described above lead to 50 possible genotypes (5 · 5 · 2), including the four malignant genotypes (3, 3, 1), (4, 3, 1), (3, 4, 1), and (4, 4, 1). These transitions and their corresponding rates are summarized in Fig. 1B.
We can view the model of CRC initiation through the acquisition of the three driver events as the process of movement of crypts from the initial wild-type genotype (0,0,0) through a “space” consisting of different possible premalignant genotypes and ending in one of the malignant genotypes. In addition, we allow crypts that have accumulated certain driver alterations to undergo fission and increase in number. Crypt fission involves the division of a single crypt into two daughters through a bifurcation of a parental crypt (21). While normal crypts divide only very rarely in adult colonic tissue (20), crypt fission drives the expansion of mutant crypts in premalignant colorectal lesions (adenomas) (21). We assume that crypt fission occurs stochastically and follows a pure birth process (22) with some fixed birth (i.e., division) rate that depends on the crypt genotype.
Many of the parameter values for this model have only recently become available. Mutation rates are gene-specific: They are the product of the number of positions in the gene that will result in (in)activation if mutated (driver positions) and the point mutation rate. The number of driver positions in the specific gene can be estimated from genetics knowledge and cancer mutation databases (23, 24) (see Materials and Methods for details). Intestinal stem cells reside at the bottom of intestinal crypts, with each crypt containing five to eight stem cells (25). It has recently become possible to measure the rate at which mutations accumulate in normal tissues, using organoids developed from single healthy stem cells from the colon, small intestine, and liver (18). Blokzijl et al. (18) reported that individual stem cells of the colon accumulate about 40 mutations per year across the entire genome, leading to a rate of u ∼1.25*10−8 mutations per year per base pair. As colonic stem cells divide on average every 5 d (26), this measured mutation rate corresponds to mutation rate of ∼1.7 * 10−10 per base pair per stem cell division, in line with estimates of normal mutation rate per cell division in human cells (27). New mutations that appear in a single crypt will either be lost or will fixate in the crypt (28), suggesting that the measured rate of accumulation of mutations applies at the level of a single crypt.
There are approximately n = 107 to 108 crypts in the human colon (26, 29). It has been shown that APC inactivation and KRAS activation lead to clonal expansion of crypts carrying one of these alterations (30, 31). Division rates of APC−/− and KRAS+ crypts have been recently measured to be b1 = 0.2/y and b2 = 0.07/y, respectively (19, 20). In contrast, TP53 has been shown not to provide any fitness advantage on its own in normal conditions (32). As normal crypts divide only very rarely (20), we set division rates of all genotypes without inactivated APC or activated KRAS to 0. In the following, we will consider three scenarios: 1) all drivers are neutral, including APC and KRAS, and division rates of all genotypes are set to 0; 2) APC provides growth advantage: division rate of APC−/− crypts is set to b1 = 0.2/y, genotypes without APC inactivated have division rates set to 0; and 3) APC and KRAS provide growth advantage: division rate of APC−/− crypts is b1 = 0.2/y, division rate of KRAS+ crypts is b2 = 0.07/y, division rate of double-mutant APC−/−KRAS+ crypts is b12 = b1 + b2 = 0.27/y, and all other genotypes have division rates set to 0. Setting the division rate of double-mutant APC−/−KRAS+ crypts to b12 = b1 + b2 amounts to assuming that there are no epistatic interactions between APC and KRAS: While this is an area of active research, the size and direction of any such effect, if it exists, is not presently known with confidence for these two genes (33). Using the three scenarios, we will demonstrate that neutral accumulation of driver mutations is not sufficient to produce reported lifetime risk of CRC.
Time of Appearance of the First Malignant Crypt.
We are interested in the time of appearance of the first crypt that has collected all three driver events—the first malignant crypt. Approximately 15% of CRCs harbor all three driver events in question (APC, TP53, and KRAS) (34). The lifetime risk of colorectal adenocarcinoma (35) is ∼4.2%. Thus, the lifetime risk of developing CRC containing these three driver events is ∼0.6%. We will compare our model predictions to this incidence rate.
We analyze the model through computer simulations and analytic calculation. First, we study a model in which driver events are neutral until all three drivers are collected, that is, we set crypt division rates of all genotypes to 0. We show that the neutral assumption leads to incidence rates that are many orders of magnitude lower than what is reported. In the case of neutral accumulation of driver mutations, the probability that a malignant crypt exists by time t is given by (SI Appendix)
| [1] | 
In Fig. 2, we plot the probability of a malignant crypt as a function of time obtained using Eq. 1, assuming n = 108 crypts in the colon (29). If all mutations are neutral, probability of a malignant crypt at age 80 y (26) is less than 1 in 100 million—six orders of magnitude lower compared to the ∼1% of the population that develop CRC through acquisition of these three driver events. The assumption that all three driver mutations are neutral is thus not consistent with reported incidence.
Fig. 2.
Comparison of Eqs. 1–3, computer simulations of the process of CRC initiation, and lifetime risk of CRC containing mutations in APC, KRAS, and TP53; 8*106 runs of the computer simulation were performed for each setting: when APC provides advantage and KRAS is neutral, compared to Eq. 2, and when both APC and KRAS provide advantage, compared with Eq. 3.
In agreement with this analysis, it has been shown that APC inactivation in colorectal stem cells does lead to a fitness advantage, and that stem cells that sustain APC loss will go on to produce an adenoma (polyp), which can be up to a centimeter in diameter or larger (30). This will dramatically increase the chances for subsequent driver mutations within one of the cells of the APC−/−-mutant clonal polyp.
We generalize Eq. 1, which was derived under assumption of no crypt fission, to the case when APC inactivation provides crypts with proliferative advantage in the form of division rate b1 (SI Appendix). All other genotypes (without inactivated APC) have division rates set to 0. In that case, probability of a fully malignant crypt being present by time t is given by
| [2] | 
In Fig. 2, we plot the probability of a malignant crypt as a function of time obtained from Eq. 2 and compare it to computer simulations of the process. Eq. 2 agrees well with probability of malignancy obtained from simulations (Fig. 2). Assuming that APC inactivation provides proliferative advantage to crypts leads to the probability of a malignant crypt at age 80 y, P(80) = 3.4*10−5, which is still approximately three orders of magnitude lower than reported incidence.
We note that Eq. 2 does not take into account the increased chance that an APC-mutated stem cell will take over a crypt. It was measured that a stem cell with a single inactivated copy of the gene (APC+/−) has a 2.1 times higher chance to take over a crypt compared to a wild-type stem cell, and that a stem cell with both copies inactivated has a 2.8 times increased chance to take over a APC+/− crypt (32, 36). Thus the probability from Eq. 2 should be multiplied by c1 = 2.1*2.8 = 5.88 to obtain the corrected probability of a malignant crypt at age 80 y, Pc(80) = 2*10−4, still 30-fold lower than reported incidence.
Finally, we study a model in which APC and KRAS, but not TP53, provide selective growth advantage to crypts, as TP53 mutation does not seem to provide selective advantage on its own in normal conditions (32) (i.e., without the presence of colitis). KRAS has been reported to provide growth advantage to colorectal stem cells and crypts, typically leading to small clonal expansions of KRAS-mutated crypts (31). In this final scenario, we set division rate of APC−/− crypts to b1 = 0.2/y, division rate of KRAS+ crypts to b2 = 0.07/y, division rate of double-mutant APC−/−KRAS+ crypts to b12 = b1+b2 = 0.27/y, and all other genotypes have division rates set to 0.
We derive the approximate formula for the probability that a malignant crypt is present by age t (SI Appendix):
| [3] | 
Here c = c1 * c2 = 5.88 * 3.6 = 21.2 is the correction to the formula to account for increased chance of fixation of APC and KRAS mutated stem cells in a crypt (32, 36). We plot Eq. 3 as a function of age t and compare it to computer simulations of the mathematical model of CRC initiation in Fig. 2. Eq. 3 is in good agreement with computer simulations of the process until age 60 y but starts to overestimate the probability of malignancy at later times. We note that the chance that a malignant crypt is present at age 80 y from computer simulations of our model is 0.5%, very close to the lifetime risk of CRC through the APC/TP53/KRAS pathway of 0.6% (Fig. 2). The comparison remains similar if we use cumulative CRC risk until age 85 y from the SEER database (35), 0.15*5% = 0.75%, which allows for 5 y between the first malignant cell and cancer detection (15).
In Fig. 2, we used n = 108 for the number of colorectal crypts (29). As the estimates for the number of crypts vary between 4*107 (26) and 108 (29), we explored which degree of epistasis would explain the lifetime CRC risk, assuming that n = 4*107. Using computer simulations of our model, the closest fit to the reported lifetime risk is obtained by assuming that the division rate of double APC−/−KRAS+ mutants is b12 = 0.31/y, which is ∼15% higher than the growth rate of double APC−/−KRAS+ mutants without epistasis (0.27/y).
Number of Mutated Crypts.
We next use our model to calculate the number of mutated colonic crypts as a function of time. The formulas for expected numbers of crypts that have inactivated a single tumor suppressor or activated the KRAS oncogene are given in SI Appendix and plotted in Fig. 3A. KRAS activation requires only a single hit; initially, the number of KRAS-mutated crypts is much higher than crypts with inactivated APC or TP53. However, due to the much faster fission rate of APC-inactivated crypts, their number surpasses the number of KRAS-mutated crypts by middle age. The number of crypts with only TP53 inactivated remains low throughout the human lifetime, due to a lack of growth advantage provided by TP53 alone (32) (Fig. 3A).
Fig. 3.
Expected number of (A) single- and (B) double-mutant colorectal crypts as a function of time. Formulas for expected numbers of mutated crypts used to produce the plots are given in SI Appendix.
Formulas for the expected numbers of double-mutant colonic crypts as a function of age are given in SI Appendix and plotted in Fig. 3B. APC−/−KRAS+ crypts far outnumber the other two types of doubly mutated crypts: By age 60 y, the expected number of APC−/−KRAS+ crypts is ∼1,000, whereas the numbers of APC−/−TP53−/− and TP53−/−KRAS+ crypts stay below ∼10 and below 1, respectively, throughout the human lifespan (Fig. 3B).
Order of Mutations.
The primary outstanding question that we aim to address is in what order these three functional mutations (APC, TP53, and KRAS) typically appear over the course of cancer initiation. To answer this, we must differentiate between two relevant questions: 1) In which order do driver mutations arise in a typical colorectal crypt, and 2) what is the order of mutations that was experienced by the first malignant crypt? A typical colorectal crypt will not have any of the driver events in question: 99.998% of N initially healthy crypts will not have activated KRAS or deactivated APC or TP53 by age 80 y. Of the small number of N initial crypts that have collected a driver event, the most likely first event in a typical crypt is the activation of KRAS, as it only requires a single mutation (hit). This starts to answer question 1: A large majority of crypts will be functionally normal, and the most likely first driver accumulated by a typical colorectal crypt is KRAS.
We are however, more interested in question 2, the order of driver events that will actually produce a malignant crypt during a human lifespan. To answer this question, we developed a new Monte Carlo simulation, related to the previous simulation we used to measure cancer incidence. In the new simulation, instead of simply moving cells between different sites in the “space” of possible genotypes, all possible orders of the five mutational steps were tracked (SI Appendix). Cells could arrive at one of the four possible cancerous genotypes (corresponding to double mutation, or mutation and LOH of the two tumor suppressors) by 270 different possible paths (Fig. 4 A, Left). The probability for each of these paths to be taken by a crypt that became malignant by age 80 y was then measured, and that path was classified into one of the six possible sequences of the three driver events (Fig. 4 A, Right).
Fig. 4.
Order of acquisition of driver mutations in the mathematical model of CRC initiation. (A) Three example five-step paths (out of 270 possible paths) that a crypt can take in order to become malignant. We classify each path that leads to malignancy within 80 y in our computer simulation into one of the six possible orderings of the three driver mutations. (B) Likelihood of driver mutation order on the path to CRC, obtained from computer simulation of our mathematical model. Note that APC loss as the first event is followed by KRAS gain in the large majority of cases, and TP53 loss in the small minority of situations. KRAS gain as the first event is virtually always followed by APC loss, with TP53 loss as the last event.
The result of this mutational order analysis is shown in Fig. 4B. The majority of the first driver events in a path that ended in malignancy are in fact the inactivation of APC, accounting for 67.9% of first events. All of the remainder (32.1%) were activation of KRAS, as TP53 was never observed to be lost first. The majority of second driver events were gain of KRAS (63.8%), all of which followed from APC being lost first. The second most common second driver event (32.1%) was the inactivation of APC: In a small minority of cases (4.1%), TP53 was lost second. The last driver event in an outstanding majority of cases (95.9%) was the loss of TP53.
It is remarkable that the loss of APC was so often the first mutational event to have occurred on the path to malignancy, given that in a typical colorectal crypt KRAS gain is the first driver event. Our explanation for this is that the loss of APC carries the largest fitness advantage of any event in our model, so cells arriving at a cancerous genotype will tend to come from lineages in which APC is lost early. The loss of TP53 was overwhelmingly the last event to occur out of the three. We attribute this to the lack of a fitness advantage ascribed to TP53 loss, and the consequent small number of crypts which carry TP53−/− as one of the first driver events (Fig. 3 A and B).
The order of mutations in our mathematical model is strikingly similar to the well-known order of mutations on the path to CRC that was first described by Fearon and Vogelstein (37). In line with our results, KRAS mutations were found to be present with high frequency in the earliest preneoplastic lesions in the colon and absent in morphologically normal crypt areas (38). However, while KRAS mutations are very common in small nondysplastic lesions, such lesions have limited potential to progress to larger tumors and ultimately cancer (39).
In contrast, lesions with APC inactivation as the first event have a much stronger capacity to progress (39). APC was found in the earliest neoplastic lesions that can be examined (39) and is believed to precede both KRAS and TP53 in the adenoma–carcinoma sequence of CRC (37). Progression of APC-mutated adenomas usually occurs through subsequent acquisition of KRAS and other genes (39, 40). KRAS mutations were found in ∼10% of colorectal adenomas smaller than 1 cm and in ∼40% of adenomas larger than 1 cm (41), thus most likely to appear after APC was inactivated and started an adenoma, but before malignant transformation. As in our model, evidence points to inactivation of TP53 being a relatively late event in colorectal tumor progression (12, 37, 42), associated with the transition from benign adenoma to malignant carcinoma (43).
In SI Appendix, we derive an approximate analytic result for the relative probability of each individual path to malignancy (Fig. 4 A, Left). Using the formula, we find that there are four most likely paths to malignancy, each accounting for ∼10% of the conditional probability to reach a malignant state by age 80 y. The relative likelihood of taking one of these four paths, Xi, to a malignant state x by time t is given by
Here b1 is the division rate of APC−/− crypts, b2 is the division rate of KRAS+ crypts, and b12 is the division rate of double-mutant APC−/−KRAS+ crypts. The four most likely paths to malignancy all have two copies of APC inactivated first, followed by activation of KRAS, followed by inactivation of two copies of TP53. The four comes from the possible ordering of inactivation of two copies of TSGs (mutation followed by LOH, or vice versa). We note that these four most likely paths are exactly the paths along which the driver mutations are accumulated in exact order of highest to lowest growth advantage.
A striking finding from our analytic results on path probabilities (see SI Appendix for details) is that the relative probabilities of each path to malignancy are independent of transition rates between genotypes (i.e., driver mutation rates and rate of LOH) and only depend on the division rates of intermediate genotypes visited by a particular path. We have also performed lineage-tracking computer simulations to explore whether relative probabilities of paths to malignancy are independent of transition (mutation) rates in the process and find that relative path probabilities remain approximately the same across different values of transition rates (SI Appendix, Fig. S3).
Discussion
In this paper we develop and analyze a stochastic model of CRC initiation through a biologically realistic complex network of possible mutational genotypes on the path to malignancy. In this model, we calculate the probability that a colorectal crypt has collected three driver events of interest (APC, KRAS, and TP53) and became malignant as a function of age. We find that the reported lifetime risk of CRC can be recovered using a mathematical model of CRC initiation together with experimentally measured mutation rates in colorectal tissues and proliferation rates of premalignant lesions. This result has several important implications. First, it sheds light on the important mutation–selection debate in cancer. In our model, selection acts through increasing the division rate of crypts with driver mutations above the base rate for crypts with no such drivers, enabling a slow but (on average) exponential increase in the number of crypts with drivers. Our results demonstrate that neither increased mutation rates nor large proliferation rates of premalignant lesions are necessary to explain observed cancer incidence rates. Our findings are in agreement with recent results from Wojdyla et al. (44), who also find that neither increased mutation rates nor large growth advantages are needed to explain incidence of single-driver malignancy secondary myelodysplastic syndrome. Another intriguing implication of our findings is that the immune system may not be exerting significant pressure on the premalignant and malignant lesions during tumor evolution, as the reported lifetime risk of colorectal malignancy (which includes those cancers that escaped immune surveillance and were detected in the clinic) is not lower than the probability of developing CRC predicted by the model. Our model includes all premalignant lesions and cancers with the three drivers in question that were expected to appear throughout the human lifespan. Had there been a significant immune suppression of these tumors, we would expect the prevalence of detected cancers to be significantly lower than the prevalence predicted by our model. This finding is in agreement with recent reports that there may not be significant immune selection pressure during natural tumor evolution (45).
Our model provides a precise quantitative understanding of the process of premalignant and malignant transformation in the human colon and can be used as a starting point in evaluating early detection and treatment efforts. In particular, our results imply that, even though individual KRAS+ lesions are less likely to progress to malignancy compared to APC−/− lesions, KRAS mutation may be the initiating driver event in as many as a third of CRCs containing this alteration; thus, efforts to target KRAS (46) may prove particularly important in a significant fraction of patients diagnosed with CRC harboring a KRAS mutation.
While we compare our model predictions with the lifetime risk of CRC from the SEER database (35), we are not able to compare our model predictions to age-specific CRC incidence, as the SEER database does not include information on age-specific incidence of CRC with our three drivers of interest (APC, TP53, and KRAS). A secondary issue is that other factors such as hereditary predispositions and familial syndromes are expected to have a significant effect on the shape of the incidence curve. For example, it is thought that up to one-third of CRCs exhibit increased inherited familial risk, due to several well-defined familial syndromes as well as a number of less penetrant, but possibly more frequent, susceptibility genes (47). These inherited conditions include those with germline inactivation of one copy of the APC gene (which results in one less required step on the way to CRC), germline mutations in mismatch repair genes (which lead to an increased mutation rate), or mutations in many other (possibly unknown) genes with lesser known modes of action (47).
Many previous studies of (colorectal) cancer initiation employ models in which mutations accumulate in a predetermined order, typically with one to three initial neutral steps needed for appearance of a benign lesion (adenoma), followed by a clonal expansion of the benign lesion, and a single mutation leading from adenoma to carcinoma (10, 11, 15, 17). The advantages of such approaches are that they can provide simple unified models that can serve as a description of the process of cancer initiation in the entire at risk population and can be fit to cancer incidence curves to extract parameters of the model. The drawbacks of such models are that they do not typically ascribe specific molecular events to the steps in the model, do not allow for mutations to occur in any order, and do not allow for multiple clonal expansions during the premalignant phase. In contrast, our model allows for a complex network of intermediate premalignant genotypes, all possible orders in which mutations can accumulate, and multiple clonal expansions prior to malignancy; it also ascribes specific molecular events to the steps on the path to malignancy and can thus be parameterized with experimentally measured mutation and expansion rates. However, our detailed model can only describe 15% of all CRC cases (i.e., those that occur through acquisition of APC, TP53, and KRAS driver mutations) and cannot be fit to full cancer incidence curves using currently available epidemiologic data.
Several previous models of CRC initiation study dynamics of individual cells in colorectal crypts (48). We circumvent the need to analyze the accumulation of mutations in the stem cell populations of colonic crypts by using the recently measured rate of accumulation of mutations in normal tissues (18). New mutations are either lost or fixate in the crypt (28), which allows us to focus on crypts as the level of selection. Research suggest clonal stabilization time (i.e., time for a newly introduced mutation to be lost or fixate in the crypt) in mice crypts is ∼1 mo and in human colon ∼1 y (28, 49). As the reported times apply to neutral mutations, it is expected that driver mutations that we are studying fixate even faster, on the order of months if not less, and thus the measured rate of accumulation of mutations in colonic stem cells can also be applied at the level of individual crypts.
Healthy cells are chromosomally stable, but most CRCs are chromosomally unstable. Thus, at some point during tumor evolution cells acquire chromosomal instability (CIN). Several early studies suggested that CIN is an early event in colorectal tumorigenesis, even preceding APC inactivation (50, 51). However, more recent studies suggest that extensive aneuploidy such as seen in the presence of CIN occurs after inactivation of TP53 (52, 53) or possibly inactivation of both APC and TP53 (54). Since TP53 inactivation usually follows APC inactivation and KRAS activation (37, 42), it is likely that acquisition of our three driver events of interest precedes significant CIN.
We use the measured accumulation rate of mutations in normal tissues, which includes contribution from the mutations that arise from unavoidable errors associated with DNA replication, as well as any environmental factors (55). Once the relative contributions to the overall mutation rate from these two sources are known, our model can be used to quantify the relative proportions of sporadic CRCs that arise predominantly due to replicative errors (“bad luck”) (26) or environmental factors (56).
We find that only a small fraction of human colorectal crypts will carry one of the driver mutations we study (fully inactivated APC or TP53, or activated KRAS). Our results do not contradict the recent report of probable driver mutations in around 1% of normal colorectal crypts in middle-aged individuals (57). Lee-Six et al. (57) examined ∼1,000 colorectal crypts from 42 individuals and examined the crypts for mutations in 90 known CRC genes. None of the crypts carried a driver mutation in KRAS, and no crypts carried two inactivating alterations of APC or TP53, implying that the frequency of such alterations is <0.1%.
Here we assume that point mutation rate does not increase during early stages of premalignant colorectal neoplasia. As more detailed experimental measures become available, our framework could in the future be extended to account for the potential increase in mutation rates of premalignant mutational genotypes on the road to malignancy. Of note, as the predicted probability of CRC from our model is of the same order of magnitude as the reported lifetime risk of CRC, we do not expect mutation rates to be significantly increased prior to malignant transformation (58).
Materials and Methods
Inactivation Rates for TSGs.
A single copy of a wild-type TSG can be inactivated through either mutation or loss of the allele via an LOH event. Inactivating mutation rate per two wild-type copies of a TSG is rTSG, where TSG can be either APC or TP53. Rate of LOH per two wild-type copies of a TSG is rLOH. Thus, if the crypt is TSG wild type (TSG+/+), it can inactivate the first copy of the tumor suppressor through either mutation or LOH with a joint rate of rTSG + rLOH. If one copy of a TSG is already inactivated (TSG+/−) through mutation, the second copy can be lost by either mutation or LOH, with the rate (rTSG + rLOH)/2. If, on the other hand, the first copy was lost through an LOH event, the second copy can only be lost through mutation, with rate rTSG/2. In other words, we are implicitly assuming that the loss of both copies through LOH is fatal for the cells.
Gene-Specific Driver Mutation Rates.
We estimate mutation rates of individual driver genes D, rD, by multiplying the base pair mutation rate u with the number of positions in the gene that will lead to (in)activation if mutated, nD. In other words, rD = nD*u. We previously estimated the number of driver positions in the TSG APC (23). A copy of an APC gene can be inactivated by a genetic alteration that results in a stop codon. There are 8,529 protein-coding bases encoding 2,843 codons in the APC gene. Of these 2,843 codons, there are 1,045 codons in which a base substitution resulting in a stop codon could occur. Only one base per each of the 1,045 codons could mutate to produce a stop codon (e.g., GAG could only mutate to TAG to produce a stop codon). Similarly, typically only one of the three possible substitutions at that base could result in a stop codon. Therefore, there are 1,045/3 = 348 bases “available” for creating stop codons, that is, 348 driver positions in the APC gene. Stop codons can also result from small insertions or deletions, and their abundance compared to single base substitutions has been estimated to be 0.74:1 (23). The total number of driver positions in APC is therefore nAPC = 604 (174% of 348 nonsense driver positions).
There are 1,179 protein-coding bases in the TP53 gene, encoding 393 codons. Out of the 393 codons, 110 codons can be mutated via a single base substitution to produce a stop codon. In these 110 codons, only one out of three bases can be mutated to produce a stop codon. At 95 of these 110 bases, only one of the three possible substitutions can produce a stop codon. At 15 out of the 110 bases, two out of the three possible substitutions can produce a stop codon. Thus the total number of driver positions in the TP53 gene is 1.74*(110 + 15)/3 = 73. Finally, there are approximately nKRAS ∼20 positions in the KRAS gene that lead to its activation if mutated (59).
Rate of LOH.
The rate of LOH in the absence of CIN can be estimated from mutational data from CRCs with microsatellite instability (MIN), since MIN and CIN are mutually exclusive (60, 61). Huang et al. (62) report that the ratio of MIN cancers with two inactivating mutations in APC to those with one inactivating mutation and one LOH event is 1:7. Let us denote by k the ratio between the rate of LOH and the rate of inactivating mutation in MIN cancers. The fraction of cases where the first event is LOH is k/(k + 1), and the fraction of cases where first event is mutation is 1/(k + 1). The fraction of cases where first event is a mutation and second event is LOH is 1/(k + 1) k/(k + 1), and the fraction of cases where both events are mutations is 1/(k + 1) 1/(k + 1). It follows that the ratio of cases where one event is LOH to those where both events are mutations is k2 + 2k. From k2 + 2k = 7 we obtain that k ∼1.8. Thus, the rate of LOH at the APC locus in MIN cancers is 1.8 times higher compared to the rate of inactivating mutation. CRCs with MIN have ∼10 times increased point mutation rate compared to non-MIN cancers (63, 64), so we conclude that the rate of LOH is ∼18 times higher compared to the normal rate of inactivating mutation in APC: rLOH = 18* 604*1.25*10−8 = 1.36*10−4 per year. We assume that rates of LOH at APC and TP53 are approximately the same.
Supplementary Material
Acknowledgments
This work was supported in part by the University of Washington Transitional Support Program.
Footnotes
The authors declare no competing interest.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2003771117/-/DCSupplemental.
Data Availability.
Simulation code is available at https://github.com/chaypaterson/CRC_Initiation.
References
- 1.Vogelstein B. et al., Cancer genome landscapes. Science 339, 1546–1558 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 2.Tomasetti C., Marchionni L., Nowak M. A., Parmigiani G., Vogelstein B., Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc. Natl. Acad. Sci. U.S.A. 112, 118–123 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 3.Reiter J. G., Iacobuzio-Donahue C. A., Pancreatic cancer: Pancreatic carcinogenesis - several small steps or one giant leap? Nat. Rev. Gastroenterol. Hepatol. 14, 7–8 (2016). [DOI] [PubMed] [Google Scholar]
 - 4.Fearon E. R., Molecular genetics of colorectal cancer. Annu. Rev. Pathol. 6, 479–507 (2011). [DOI] [PubMed] [Google Scholar]
 - 5.Vogelstein B., Kinzler K. W., Cancer genes and the pathways they control. Nat. Med. 10, 789–799 (2004). [DOI] [PubMed] [Google Scholar]
 - 6.Armitage P., Doll R., The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer 8, 1–12 (1954). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 7.Knudson A. G., Jr., Mutation and cancer: Statistical study of retinoblastoma. Proc. Natl. Acad. Sci. U.S.A. 68, 820–823 (1971). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 8.Moolgavkar S. H., Venzon D. J., Two-event model for carcinogenesis: Incidence curves for childhood and adult tumors. Math. Biosci. 47, 55–77 (1979). [Google Scholar]
 - 9.Moolgavkar S. H., Knudson A. G. Jr., Mutation and cancer: A model for human carcinogenesis. J. Natl. Cancer Inst. 66, 1037–1052 (1981). [DOI] [PubMed] [Google Scholar]
 - 10.Luebeck E. G., Moolgavkar S. H., Multistage carcinogenesis and the incidence of colorectal cancer. Proc. Natl. Acad. Sci. U.S.A. 99, 15095–15100 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 11.Meza R., Jeon J., Moolgavkar S. H., Luebeck E. G., Age-specific incidence of cancer: Phases, transitions, and biological implications. Proc. Natl. Acad. Sci. U.S.A. 105, 16284–16289 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 12.Attolini C. S.-O. et al., A mathematical framework to determine the temporal sequence of somatic genetic events in cancer. Proc. Natl. Acad. Sci. U.S.A. 107, 17604–17609 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 13.Landau D. A. et al., Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 14.Gerstung M., Eriksson N., Lin J., Vogelstein B., Beerenwinkel N., The temporal order of genetic and pathway alterations in tumorigenesis. PLoS One 6, e27136 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 15.Luebeck E. G., Curtius K., Jeon J., Hazelton W. D., Impact of tumor progression on cancer incidence curves. Cancer Res. 73, 1086–1096 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 16.Axelrod D., Kimmel M., Branching Processes in Biology, (Springer-Verlag, 2015). [Google Scholar]
 - 17.Lang B. M., Kuipers J., Misselwitz B., Beerenwinkel N., Predicting colorectal cancer risk from adenoma detection via a two-type branching process model. PLOS Comput. Biol. 16, e1007552 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 18.Blokzijl F. et al., Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 19.Baker A. M. et al., Quantification of crypt and stem cell evolution in the normal and neoplastic human colon. Cell Rep. 8, 940–947 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 20.Nicholson A. M. et al., Fixation and spread of somatic mutations in adult human colonic epithelium. Cell Stem Cell 22, 909–918.e8 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 21.Langlands A. J. et al., Paneth cell-rich regions separated by a cluster of Lgr5+ cells initiate crypt fission in the intestinal stem cell niche. PLoS Biol. 14, e1002491 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 22.Bailey N. T. J., Elements of Stochastic Processes with Applications to the Natural Sciences, (Wiley Classics Library, 1990). [Google Scholar]
 - 23.Bozic I. et al., Accumulation of driver and passenger mutations during tumor progression. Proc. Natl. Acad. Sci. U.S.A. 107, 18545–18550 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 24.COSMIC , Catalogue of Somatic Mutations in Cancer, www.sanger.ac.uk/genetics/CGP/cosmic/. Accessed 21 November 2019.
 - 25.Kozar S. et al., Continuous clonal labeling reveals small numbers of functional stem cells in intestinal crypts and adenomas. Cell Stem Cell 13, 626–633 (2013). [DOI] [PubMed] [Google Scholar]
 - 26.Tomasetti C., Vogelstein B., Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 347, 78–81 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 27.Tomasetti C., Bozic I., The (not so) immortal strand hypothesis. Stem Cell Res. (Amst.) 14, 238–241 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 28.Campbell F. et al., Post-irradiation somatic mutation and clonal stabilisation time in the human colon. Gut 39, 569–573 (1996). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 29.Potten C. S., Booth C., Hargreaves D., The small intestine as a model for evaluating adult tissue stem cell drug targets. Cell Prolif. 36, 115–129 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 30.Lamlum H. et al., APC mutations are sufficient for the growth of early colorectal adenomas. Proc. Natl. Acad. Sci. U.S.A. 97, 2225–2228 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 31.Snippert H. J., Schepers A. G., van Es J. H., Simons B. D., Clevers H., Biased competition between Lgr5 intestinal stem cells driven by oncogenic mutation induces clonal expansion. EMBO Rep. 15, 62–69 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 32.Vermeulen L. et al., Defining stem cell dynamics in models of intestinal tumor initiation. Science 342, 995–998 (2013). [DOI] [PubMed] [Google Scholar]
 - 33.van de Haar J. et al., Identifying epistasis in cancer genomes: A delicate affair. Cell 177, 1375–1383 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 34.Lawrence M. S. et al., Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 35.National Cancer Institute Surveilance, Epidemiology, and End Results Program, Cancer Stat Facts: Colorectal Cancer. https://seer.cancer.gov/statfacts/html/colorect.html. Accessed 16 December 2019.
 - 36.Bozic I., Nowak M. A., Cancer. Unwanted evolution. Science 342, 938–939 (2013). [DOI] [PubMed] [Google Scholar]
 - 37.Fearon E. R., Vogelstein B., A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990). [DOI] [PubMed] [Google Scholar]
 - 38.Pretlow T. P., Brasitus T. A., Fulton N. C., Cheyer C., Kaplan E. L., K-ras mutations in putative preneoplastic lesions in human colon. J. Natl. Cancer Inst. 85, 2004–2007 (1993). [DOI] [PubMed] [Google Scholar]
 - 39.Jen J. et al., Molecular determinants of dysplasia in colorectal lesions. Cancer Res. 54, 5523–5526 (1994). [PubMed] [Google Scholar]
 - 40.Shibata D., Schaeffer J., Li Z.-H., Capella G., Perucho M., Genetic heterogeneity of the c-K-ras locus in colorectal adenomas but not in adenocarcinomas. J. Natl. Cancer Inst. 85, 1058–1063 (1993). [DOI] [PubMed] [Google Scholar]
 - 41.Vogelstein B. et al., Genetic alterations during colorectal-tumor development. N. Engl. J. Med. 319, 525–532 (1988). [DOI] [PubMed] [Google Scholar]
 - 42.Gerstung M. et al.; PCAWG Evolution & Heterogeneity Working Group; PCAWG Consortium , The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 43.Cho K. R., Vogelstein B., Genetic alterations in the adenoma–Carcinoma sequence. Cancer 70 (suppl. 6), 1727–1731 (1992). [DOI] [PubMed] [Google Scholar]
 - 44.Wojdyla T. et al., Mutation, drift and selection in single-driver hematologic malignancy: Example of secondary myelodysplastic syndrome following treatment of inherited neutropenia. PLOS Comput. Biol. 15, e1006664 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 45.Van den Eynden J., Jiménez-Sánchez A., Miller M. L., Larsson E., Lack of detectable neoantigen depletion signals in the untreated cancer genome. Nat. Genet. 51, 1741–1748 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 46.Janes M. R. et al., Targeting KRAS mutant cancers with a covalent G12C-specific inhibitor. Cell 172, 578–589.e17 (2018). [DOI] [PubMed] [Google Scholar]
 - 47.Jasperson K. W., Tuohy T. M., Neklason D. W., Burt R. W., Hereditary and familial colon cancer. Gastroenterology 138, 2044–2058 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 48.van Leeuwen I. M. M., Byrne H. M., Jensen O. E., King J. R., Crypt dynamics and colorectal cancer: Advances in mathematical modelling. Cell Prolif. 39, 157–181 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 49.Yatabe Y., Tavaré S., Shibata D., Investigating stem cells in human colon by using methylation patterns. Proc. Natl. Acad. Sci. U.S.A. 98, 10839–10844 (2001). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 50.Shih I. M. et al., Evidence that genetic instability occurs at an early stage of colorectal tumorigenesis. Cancer Res. 61, 818–822 (2001). [PubMed] [Google Scholar]
 - 51.Nowak M. A. et al., The role of chromosomal instability in tumor initiation. Proc. Natl. Acad. Sci. U.S.A. 99, 16226–16231 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 52.Soto M. et al., p53 prohibits propagation of chromosome segregation errors that produce structural aneuploidies. Cell Rep. 19, 2423–2431 (2017). [DOI] [PubMed] [Google Scholar]
 - 53.Bolhaqueiro A. C. F. et al., Ongoing chromosomal instability and karyotype evolution in human colorectal cancer organoids. Nat. Genet. 51, 824–834 (2019). [DOI] [PubMed] [Google Scholar]
 - 54.Drost J. et al., Sequential cancer mutations in cultured human intestinal stem cells. Nature 521, 43–47 (2015). [DOI] [PubMed] [Google Scholar]
 - 55.Tomasetti C., Li L., Vogelstein B., Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 355, 1330–1334 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 56.Wu S., Powers S., Zhu W., Hannun Y. A., Substantial contribution of extrinsic risk factors to cancer development. Nature 529, 43–47 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 57.Lee-Six H. et al., The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019). [DOI] [PubMed] [Google Scholar]
 - 58.Roerink S. F. et al., Intra-tumour diversification in colorectal cancer at the single-cell level. Nature 556, 457–462 (2018). [DOI] [PubMed] [Google Scholar]
 - 59.Serebriiskii I. G. et al., Comprehensive characterization of RAS mutations in colon and rectal cancers in old and young patients. Nat. Commun. 10, 3722 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 60.Rajagopalan H., Nowak M. A., Vogelstein B., Lengauer C., The significance of unstable chromosomes in colorectal cancer. Nat. Rev. Cancer 3, 695–701 (2003). [DOI] [PubMed] [Google Scholar]
 - 61.de la Chapelle A., Hampel H., Clinical relevance of microsatellite instability in colorectal cancer. J. Clin. Oncol. 28, 3380–3387 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 62.Huang J. et al., APC mutations in colorectal tumors with mismatch repair deficiency. Proc. Natl. Acad. Sci. U.S.A. 93, 9049–9054 (1996). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 63.Cancer Genome Atlas Network , Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 64.Timmermann B. et al., Somatic mutation profiles of MSI and MSS colorectal cancer identified by whole exome next generation sequencing and bioinformatics analysis. PLoS One 5, e15661 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
 
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Simulation code is available at https://github.com/chaypaterson/CRC_Initiation.




