PLOS Computational Biology. 2021 May 18;17(5):e1008970. doi: 10.1371/journal.pcbi.1008970

Mathematical modeling of multiple pathways in colorectal carcinogenesis using dynamical systems with Kronecker structure

Saskia Haupt 1,2,*, Alexander Zeilmann 3, Aysel Ahadova 4, Hendrik Bläker 5, Magnus von Knebel Doeberitz 4, Matthias Kloor 4, Vincent Heuveline 1,2
Editor: Jing Chen 6
PMCID: PMC8162698  PMID: 34003820

Abstract

Like many other types of cancer, colorectal cancer (CRC) develops through multiple pathways of carcinogenesis. This is also true for colorectal carcinogenesis in Lynch syndrome (LS), the most common inherited CRC syndrome. However, a comprehensive understanding of the distribution of these pathways of carcinogenesis, which would allow for tailored clinical treatment and even prevention, is still lacking. We propose a linear dynamical system that models the evolution of different pathways of colorectal carcinogenesis based on the involved driver mutations. The model consists of different components accounting for independent and dependent mutational processes. We define driver gene mutation graphs and combine them using the Cartesian graph product. This leads to matrix components built from the Kronecker sum and product of the adjacency matrices of the gene mutation graphs, enabling a thorough mathematical analysis and medical interpretation. Using the Kronecker structure, we developed a mathematical model, which we applied to the three pathways of colorectal carcinogenesis in LS as an example. Besides a pathogenic germline variant in one of the DNA mismatch repair (MMR) genes, driver mutations in APC, CTNNB1, KRAS and TP53 are considered. As examples, we incorporate mutational dependencies such as increased point mutation rates after MMR deficiency and, based on recent experimental data, biallelic somatic CTNNB1 mutations as common drivers of LS-associated CRCs. With this model and parameter choice, we obtained simulation results that are in concordance with clinical observations. These include the evolution of MMR-deficient crypts as early precursors in LS carcinogenesis and the influence of variants in MMR genes thereon. The proportions of MMR-deficient and MMR-proficient APC-inactivated crypts, a first measure of the distribution among the pathways in LS-associated colorectal carcinogenesis, are compatible with clinical observations.
The approach provides a modular framework for modeling multiple pathways of carcinogenesis, yielding promising results in concordance with clinical observations in LS CRCs.

Author summary

Cancer is a disease caused by alterations of the genome. These alterations can affect any component of the genome, but only some of them change the functioning of the cell. As there are several of these so-called driver mutations, they can occur in different orders. It is currently assumed that the order of driver mutations is linked to the course of cancer and thus to clinical treatment and even prevention. However, cells with a driver mutation, which carry a risk of growing into a tumor, remain clinically invisible for a long time. This means that early carcinogenesis is a hidden process. Mathematical models allow testing related medical hypotheses to obtain a better understanding of the underlying biological processes. We propose a mathematical model for different molecular pathways of carcinogenesis based on a linear dynamical system. The model uses the Kronecker structure, a specific matrix structure that allows for a thorough mathematical analysis and medical interpretation. It consists of multiple components to account for independent and dependent mutational processes. In the presented work, we focus on cancer development in the colon. However, modifications of the model could be applied to other organs.

1 Introduction

Cancer is the second leading cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018, with colorectal cancer (CRC) being the second most common cause of cancer death [1]. Still, adequate treatment and, in particular, prevention strategies are lacking in many cases, as it is difficult to investigate the process of cancer development, called carcinogenesis, right from its beginning.

In this work, we present a mathematical model of colorectal carcinogenesis. It takes into account the multiple pathway nature of carcinogenesis (Fig 1A) reflecting different types of CRC based on molecular parameters with individual needs for prevention and treatment [2].

Fig 1. From the medical hypothesis via the modeling approach to the mathematical structure.


The medical hypothesis of multiple pathways of carcinogenesis is widely known for various types of cancer. (A) We present a model for this phenomenon using the example of Lynch syndrome, the most common inherited CRC syndrome, with specific key driver events in the MMR genes, CTNNB1, APC, KRAS and TP53. (B) This current medical understanding of carcinogenesis is translated into a mathematical model using a specific dynamical system, which can be represented by a graph structure: each vertex represents a genotypic state, and the edges correspond to the transition probabilities between those states. Starting with all colonic crypts in the state in which all genes are wild-type except for a single MMR germline variant due to Lynch syndrome, we are interested in the distribution of the crypts over the graph at different ages of the patient in order to estimate the number of crypts in specific states, e.g., adenomatous or cancerous states. (C) The underlying matrix of the dynamical system makes use of the Kronecker sum and product. It is a sparse upper triangular matrix, reflecting the assumption that mutations cannot be reverted. This allows fast numerical solution using the matrix exponential. Each nonzero entry of the matrix represents a connection between genotypic states in the graph.

The mathematical model makes use of a dynamical system with a specific matrix structure using Kronecker products and sums (Fig 1C) in order to systematically describe the mutational events of individual genes (Fig 1B). These mutational events can be independent of or dependent on other mutations, which accounts for different types of mutations and for currently available data.

To exemplify this approach, we build the model for Lynch syndrome, the most common inherited CRC syndrome with an estimated population frequency of 1 in 180 [3]. Lynch syndrome is associated with an inherited mismatch repair (MMR) gene variant [4]. CRCs that develop in the context of Lynch syndrome are mostly MMR-deficient and exhibit microsatellite instability (MSI) [5].

In addition to Lynch syndrome colorectal carcinogenesis, we modify the ansatz to model the sporadic counterpart of Lynch syndrome, often called Lynch-like cancers [6], as well as the classical adenoma-carcinoma sequence first described by Vogelstein and Kinzler [7] for microsatellite-stable (MSS) CRCs. Further, we apply the model to another hereditary CRC syndrome, familial adenomatous polyposis (FAP) [8].

1.1 Organization

To make this paper self-contained, we explain the medical background in Section 1.2. Section 2 presents related work and our contribution in this context. The mathematical model is presented in Section 3.1; it is based on different components: the first model component implements independent mutational processes, and the other components model known mutational dependencies. Section 3.2 presents modifications for non-Lynch scenarios and for cancer in organs other than the colon. Section 4 demonstrates a selection of the results that can be obtained with the model and its modifications. Finally, we conclude in Section 5 by discussing the assumptions of the model and their implications. For the mathematical background, we refer to S1 Appendix.

1.2 Medical background

Cancer is a disease caused by alterations of the genome, the carrier of genetic information [9, 10]. Precisely defining these changes, which are required to transform a normal cell of the human body into a malignant cancer cell, is a crucial step towards understanding the development of cancer.

Multiple pathways of carcinogenesis. In the early stages of cancer research, it was unknown whether the development of cancer, a process called carcinogenesis, was a purely chaotic process of random mutations. However, in 1959, Nowell and Hungerford [11] observed a specific recurrent alteration across different cancers of the same type. This suggested the existence of at least a certain degree of order in the chaos.

In the following decade, evidence emerged that a single mutation is normally insufficient to drive a cell into malignancy, because cells possess multiple control mechanisms that protect the organism from the uncontrolled growth of single cells. Accordingly, Vogelstein, Fearon and Kinzler [7, 12] established a stepwise hypothesis of cancer formation in the colon, postulating that several mutations are required for the development of cancer cells. This adenoma-carcinoma hypothesis describes the formation of certain precancerous lesions and their progression into a manifest cancer. It implies that adenomas are the precursor lesions of most colorectal cancers and describes typical molecular events associated with progression to cancer. The stepwise hypothesis has subsequently been validated in many independent studies for many different cancer types. Currently, it is assumed that at least three mutation events are required to transform a normal cell into a cancer cell; this is called the three-strikes hypothesis [13]. Accordingly, for the present modeling approach, cancer is defined as a state in which alterations of at least three key signaling pathways or respective genes are present in one crypt (see also Section 4).

Mutations occur throughout the whole genome, and we distinguish two broad classes: so-called point mutations affect only a single nucleotide, while loss of heterozygosity (LOH) refers to the loss of a region in one copy of the diploid genome, which can result in the deletion of whole genes.

If mutations strike protein-coding regions, two main scenarios that can favor uncontrolled cell growth are seen. Somatic mutations can either directly activate oncogenes, which physiologically promote appropriate cell growth and proliferation, for example through conformational changes or impaired self-inactivation (typically referred to in the literature as gain-of-function mutations), or they can damage or destroy tumor suppressor genes, which physiologically limit cell growth and proliferation (typically referred to in the literature as loss-of-function mutations).

Among all possible mutations, these coding mutations need to be identified, as they may have a functional impact on the cell. This includes the identification of oncogenes and tumor suppressor genes, but many more relevant mutations remain to be identified. Moreover, only certain combinations of these mutations ultimately lead to cancer, possibly because some mutations have a growth-repressing effect and lead to cell death. Further, cancer can be controlled by non-cell-autonomous mechanisms such as immune surveillance, which is especially important for the presented example of Lynch syndrome [14]. Current data also raise the possibility that the immune system may remove not only precursor lesions but also infiltrating cancers, as described for Lynch syndrome-associated cancers [15].

Different combinations of key mutations result in several distinct pathways to be distinguished by the involved genes and the ordering thereof. An important goal in cancer research is to investigate which of these pathways can arise in human carcinogenesis. Here, Lynch syndrome colorectal carcinogenesis is a prime example with three currently hypothesized main pathways of carcinogenesis [16] (Fig 1A) which will be explained in more detail in the next paragraph.

Lynch syndrome-associated colorectal carcinogenesis. Individuals with Lynch syndrome are predisposed to developing certain malignancies, with a substantially higher lifetime risk than the general population. The most common Lynch syndrome manifestations are CRC (50% [17] compared to 6% in the general population) and endometrial cancer (40–60% compared to 2.6% in women without Lynch syndrome) [4, 18]. Further, affected individuals have an increased lifetime risk for many other types of cancer, such as cancers of the stomach, small bowel, brain, skin, pancreas, biliary tract, ovary (only for women) and upper urinary tract [19].

Lynch syndrome carriers have an inherited pathogenic variant in one allele of one of the MMR genes MLH1, MSH2, MSH6 or PMS2 [20], passed down in the family from parent to child. Upon a second somatic hit inactivating the remaining allele, MMR deficiency manifests in the affected cell [21]. DNA replication errors, especially those occurring at repetitive sequences (microsatellites, i.e., short tandemly repeated sequence motifs), can then no longer be corrected by the mismatch repair system. MMR deficiency thus leads to microsatellite instability.

MMR deficiency can be an initiating or a secondary event in Lynch syndrome carcinogenesis. This is reflected by the hypothesis of three pathways responsible for colorectal carcinogenesis in Lynch syndrome [22] (see Fig 1): One pathway of carcinogenesis starts with adenoma formation, then MMR deficiency and cancer outgrowth; the second is initiated by MMR deficiency, then adenoma formation and cancer outgrowth; and the third shows MMR deficiency as initiating event and invasive cancer growth.

The relative proportion of the different pathways of carcinogenesis and the contribution of certain molecular events are an open question with clinical implications: Ahadova et al. [16] showed that the molecular pathways of carcinogenesis are linked to different mutational processes, e.g., CTNNB1-mutant colorectal carcinomas are associated with immediate invasive growth, following the third pathway presented above. Recent independent studies (analyzed in [23]) demonstrated that a substantial proportion of Lynch syndrome individuals develop CRC despite regular colonoscopy, and that there is no difference in CRC incidence or stage at detection between different colonoscopy surveillance intervals in Lynch syndrome [24]. This emphasizes the need for improved cancer prevention for Lynch syndrome individuals, depending on the molecular footprints of carcinogenesis. Further, there are MMR gene-dependent differences regarding the risk of colorectal adenomas and carcinomas and regarding somatic mutations in patients with Lynch syndrome [25], which supports the need for adjusting surveillance guidelines based on the affected MMR gene.

As a special case of CRC, Lynch syndrome-associated colorectal cancer is widely believed to originate in colonic crypts [26]. These are found in the epithelium of the colon and consist of different cell types [27], among them stem cells located at the crypt base. Stem cells are important for tissue renewal due to their unlimited proliferative potential, but they are also prone to mutations. If a cell in a crypt becomes mutated, this mutation has to spread within the crypt until the whole crypt is mutated and can be measured with current techniques, a process called fixation or monoclonal conversion [28]. Modeling this process and analyzing the role of colonic stem cells at the crypt base is important for understanding intra-crypt dynamics. We are currently working on these aspects, with first results in [29]. However, in the present model, we focus on the evolution of genetic states of crypts as a whole and compare the modeling results with currently available biological and epidemiological data.

2 Related work

First attempts to build mathematical models in cancer research were made in the middle of the 20th century. Armitage and Doll [30, 31] proposed and analyzed one of the first multistage models of carcinogenesis, which are based on the hypothesis that there are multiple subsequent steps before a cancer is formed. The model was extended in the following years [32, 33]. Among the first to consider a model of multiple pathways of carcinogenesis were Tan et al. [34, 35]. These are based on the hypothesis that there are several possible ways in which cancer can develop.

With the increasing medical knowledge about cancer development, it became more and more evident that a single model describing the whole process of carcinogenesis from the genomic, over the cell, up to the tissue, organ and organism-level is too complex to build. Nowadays, there exist different types of models describing individual aspects of carcinogenesis (in an unordered list of example publications):

  • ⊳ Modeling healthy tissue formation, such as the evolution of colonic crypts [36–38],

  • ⊳ detecting driver genes [39–42],

  • ⊳ estimating the most likely temporal order of key mutations [13, 43],

  • ⊳ modeling the cancer-immune system interaction, including neoantigen presentation [44–46],

  • ⊳ predicting effects of intervention strategies on tumor growth and patient survival, such as the effect of screening on adenoma risk [47].

From a mathematical point of view, the modeling makes use of different approaches, such as ordinary differential equations [48, 49], partial differential equations [50], stochastic processes [51, 52], graph theory [53–55], and statistics [56, 57].

For hereditary CRCs in particular, Komarova et al. [48, 58] proposed a model based on ordinary differential equations for the occurrence and ordering of key events during carcinogenesis, which was adapted to sporadic carcinogenesis. In particular, it addresses the question of whether, and to what extent, genetic instability is an early event in carcinogenesis.

A recent paper by Paterson et al. [59] presents a model for quantifying the evolutionary dynamics of CRC initiation and progression based on the occurrence of key driver mutations. The individual mutation graphs of APC, KRAS and TP53 in our model correspond to those in [59], treating APC and TP53 as classical tumor suppressor genes and KRAS as a classical oncogene in CRC. In addition, the general approach of calculating gene-specific numbers of driver positions, as well as the assumption that APC and KRAS mutations, but not TP53 mutations, provide a fitness advantage, are in concordance with [59]. The latter assumption is based on several independent studies [28, 37, 60].

2.1 Contribution

We provide a general mathematical framework that can describe arbitrarily complex pathways and arbitrary numbers of pathways and mutations, because the chosen Kronecker structure enables a modular construction and an analytic, computationally efficient solution. We use Lynch syndrome carcinogenesis to illustrate the flexibility of the model. Naturally, specific assumptions may vary for other types of cancer. We also illustrate model modifications for FAP, Lynch-like and classical colorectal carcinogenesis.

Instead of focusing on modeling APC inactivation and MMR deficiency as in [48], we choose a more general approach for combining mutations in different genes. Compared to [59], we take into account different modes of cancer evolution beside the classical adenoma-carcinoma sequence of colorectal carcinogenesis, including hereditary forms like Lynch syndrome and familial adenomatous polyposis (FAP). Further, recent data show that in Lynch syndrome-associated CRCs, biallelic mutations of CTNNB1 seem to be required to mediate an oncogenic driver effect [61, 62], which we included in the definition of the gene mutation graphs.

While the approach in [59] is a hybrid of linear ordinary differential equations (ODEs) and a stochastic branching process, we use a system of ODEs to model the evolution of all genotypic states, which greatly simplifies the computational solution. This goes hand in hand with the fact that all formulas in our model are mathematically exact, without any approximations, which in turn allows for an analytical solution of the ODEs using the matrix exponential.
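As a minimal sketch of this point, assuming a hypothetical three-state chain (wild type, one hit, two hits) with placeholder rates rather than the model's parameters, the following Python snippet compares the closed-form solution x(t) = expm(tM) x_0 with a numerical ODE integration in SciPy. The rate matrix R is written in the adjacency convention (rows are source states) and transposed for the column-vector ODE.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

# Hypothetical 3-state chain: wild type -> one hit -> two hits.
# Placeholder rates per year, NOT the model's parameters.
r1, r2 = 0.02, 0.01
# Adjacency convention (row = source state); upper triangular because
# mutations are irreversible. Diagonal = outflow, so the total is conserved.
R = np.array([
    [-r1, r1, 0.0],
    [0.0, -r2, r2],
    [0.0, 0.0, 0.0],
])
M = R.T                        # column-vector convention: x'(t) = M x(t)
x0 = np.array([1.0, 0.0, 0.0])
t_end = 50.0

# Closed-form solution via the matrix exponential
x_exact = expm(t_end * M) @ x0

# Numerical integration of the same linear ODE, for comparison
sol = solve_ivp(lambda t, x: M @ x, (0.0, t_end), x0, rtol=1e-10, atol=1e-12)
x_numeric = sol.y[:, -1]

print(x_exact)
print(x_numeric)
```

For a linear system with a constant matrix, the exponential gives the state at any single time point in one step, whereas a numerical integrator has to march through all intermediate times.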

Further, the model consists of different components for modeling independent and dependent mutational processes taking into account currently available clinical observations and biomedical data.

Finally, our approach makes it easy to include new medical insights while preserving the other properties of the model, such as the closed-form integration of the involved differential equations. This includes the possibility of multiple cancerous genotypic states reflecting the real-world heterogeneity of cancer, the consideration of multiple driver genes, and the use of different initial values and parameter combinations for modeling other carcinogenesis processes.

3 Methods

3.1 Modeling Lynch syndrome carcinogenesis

In this section, we introduce our model for colorectal carcinogenesis in Lynch syndrome. The model consists of a dynamical system given in the form of a linear ordinary differential equation which is constructed with the help of adjacency matrices describing the joint process of mutations in several genes, including mutations independent of and depending on other mutations. All mutations are assumed to be present in the whole crypt. Mutations which occur in one cell but are washed out as they reach the top of the crypt and undergo apoptosis are not considered in the model.

The system matrix is built additively to implement independent and dependent mutational processes. The matrix A for the independent processes is based on three main assumptions, which lead to the Kronecker sum in a natural way: 1) all combinations of mutations in the considered genes are possible and there are no additional genotypic states, 2) no two mutations in different genes occur at exactly the same point in time, and 3) the mutational processes are independent of each other (see also Section 2 in S1 Appendix).

The model components representing dependent mutations are constructed in a similar way using the Kronecker structure, but here we drop assumptions 2 and 3. This allows for modeling dependent mutations and simultaneous mutations (see model components B, C, D, E and F).
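The difference between the two Kronecker operations can be illustrated with two hypothetical one-hit genes. In this sketch, the Kronecker sum P ⊕ Q = P ⊗ I + I ⊗ Q (the adjacency matrix of the Cartesian product graph) only connects joint states in which one gene changes at a time, whereas the Kronecker product connects states in which both genes change simultaneously. The snippet also checks the identity exp(P ⊕ Q) = exp(P) ⊗ exp(Q), one way to see why independent processes combine via the Kronecker sum. All matrices and rates are illustrative placeholders, not the model's components B to F.

```python
import numpy as np
from scipy.linalg import expm

def kron_sum(P, Q):
    """Kronecker sum P ⊕ Q = P ⊗ I + I ⊗ Q."""
    return np.kron(P, np.eye(Q.shape[0])) + np.kron(np.eye(P.shape[0]), Q)

# Hypothetical one-hit gene graph: a single edge wild type -> mutated
G = np.array([[0.0, 1.0],
              [0.0, 0.0]])
# Joint states in Kronecker order: (wt,wt), (wt,mut), (mut,wt), (mut,mut)
S = kron_sum(G, G)   # Cartesian product graph: one gene changes at a time
K = np.kron(G, G)    # tensor product graph: both genes change together

print(S)  # edges (wt,wt)->(wt,mut), (wt,wt)->(mut,wt), each on to (mut,mut)
print(K)  # single edge (wt,wt)->(mut,mut): a simultaneous double mutation

# Independent processes: exp(P ⊕ Q) = exp(P) ⊗ exp(Q) (placeholder rates)
P = np.array([[-0.3, 0.3], [0.0, 0.0]])
Q = np.array([[-0.5, 0.5], [0.0, 0.0]])
print(np.allclose(expm(kron_sum(P, Q)), np.kron(expm(P), expm(Q))))
```

The identity holds because P ⊗ I and I ⊗ Q commute; once a dependent term is added, the exponential no longer factorizes into per-gene parts.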

3.1.1 Gene mutation graphs

In the case of colorectal carcinogenesis in Lynch syndrome, the MMR gene mutations are associated with the increased lifetime cancer risk of Lynch syndrome individuals. Besides the MMR genes, we consider four additional possible driver genes, namely APC, KRAS, CTNNB1 and TP53, which are typical representatives of the oncogenes and tumor suppressor genes affected in the corresponding pathways of Lynch syndrome-associated carcinogenesis.

Each of these genes can have one of the following mutation statuses:

  • State ∅: In this state, none of the alleles has a point mutation or is affected by an LOH event.

  • States m and mm: These states describe one allele being hit by a point mutation (where the other one is not mutated) and point mutations on both alleles.

  • States l and ll: Similarly, these states describe one (respectively two) allele(s) being affected by an LOH event.

  • State ml: One of the alleles has obtained a point mutation and in the other one, an LOH event occurred. We do not differentiate which allele has which mutation and in which order they happened.

We assume that the status ll in CTNNB1, APC and TP53 damages a cell so severely that it directly leads to cell death [59]. Thus, there is no crypt in which all cells are in that state. As we model the evolution of genotypic states of crypts, we do not consider the status ll for CTNNB1, APC and TP53.

As our example is Lynch syndrome carcinogenesis, all cells and hence, also all crypts have a single germline variant in the respective MMR gene and there is no ∅ status for MMR.

Further, APC and TP53 are tumor suppressor genes, meaning that both alleles have to be mutated for inactivation; this two-hit hypothesis dates back to Knudson in 1971 [63]. In particular, we neglect a possible dominant-negative effect of APC and TP53 mutations, which would make a single hit sufficient for inactivation [64].

In contrast, KRAS is an oncogene, for which one activating mutation is sufficient. In Lynch syndrome-associated CRC, biallelic mutations of CTNNB1 seem to be required to mediate an oncogenic driver effect [61, 62].

All these assumptions lead to the vertex sets

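$$V_{\mathrm{MMR}} = \{m, l, mm, ml, ll\}, \tag{1}$$
$$V_{\mathrm{CTNNB1}} = \{\emptyset, m, l, mm, ml\}, \tag{2}$$
$$V_{\mathrm{APC}} = \{\emptyset, m, l, mm, ml\}, \tag{3}$$
$$V_{\mathrm{KRAS}} = \{\emptyset, m\}, \tag{4}$$
$$V_{\mathrm{TP53}} = \{\emptyset, m, l, mm, ml\}. \tag{5}$$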

Using these vertex sets, we construct gene mutation graphs in which we connect mutation statuses that differ by exactly one mutation. This means we assume that only one mutation happens at any specific point in time.

Further, we assume that once a mutation has happened, it cannot be reversed by another mutation. Because of this, the mutation graphs are directed acyclic graphs, and their adjacency matrices can be written as triangular matrices.
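A minimal sketch of how the gene mutation graphs could be generated from the vertex sets (1)–(5): an edge connects two statuses if they differ by exactly one additional point mutation (m) or LOH event (l), and statuses not contained in a vertex set (e.g. ll for CTNNB1, APC and TP53) are skipped. The helper names are hypothetical, and self-loops and edge weights are omitted. With the vertex orders of (1)–(5), each resulting adjacency matrix is strictly upper triangular.

```python
import numpy as np

# Vertex sets of Eqs (1)-(5), in the order listed there; "∅" is the wild type.
VERTICES = {
    "MMR": ["m", "l", "mm", "ml", "ll"],
    "CTNNB1": ["∅", "m", "l", "mm", "ml"],
    "APC": ["∅", "m", "l", "mm", "ml"],
    "KRAS": ["∅", "m"],
    "TP53": ["∅", "m", "l", "mm", "ml"],
}

def canonical(hits):
    """Allele order is not tracked, so 'lm' and 'ml' are the same status."""
    return "ml" if sorted(hits) == ["l", "m"] else hits

def successors(status):
    """Statuses reachable by one more point mutation (m) or LOH event (l)."""
    hits = "" if status == "∅" else status
    if len(hits) == 2:  # both alleles already hit
        return []
    return [canonical(hits + h) for h in "ml"]

def adjacency(gene):
    """0/1 adjacency matrix of a gene mutation graph (no self-loops, no weights)."""
    verts = VERTICES[gene]
    index = {v: i for i, v in enumerate(verts)}
    adj = np.zeros((len(verts), len(verts)), dtype=int)
    for v in verts:
        for w in successors(v):
            if w in index:  # skip statuses excluded from the vertex set
                adj[index[v], index[w]] = 1
    return adj

for gene in VERTICES:
    adj = adjacency(gene)
    print(gene, int(adj.sum()), "strictly upper triangular:", not np.tril(adj).any())
```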

The resulting graphs are illustrated in Fig 2. This figure also displays the edge weights of the gene mutation graphs, i.e., the likelihood of transitioning from one mutation status to another. The choice of the edge weights is explained in the following sections.

Fig 2. Gene mutation graphs for independent mutational processes.


These graphs represent the possible mutation status, i.e., which mutations the alleles of the gene can have accumulated, as vertices ∅, m, l, mm, ll and ml. They are given for (A) MMR gene mutations, (B) APC mutations, (C) KRAS mutations, (D) CTNNB1 mutations, and (E) TP53 mutations. The edges connecting different vertices represent mutations, whereas self-loops, i.e., edges that connect a vertex with itself, describe no mutation occurring at the current point in time. The edges are labeled by the amount of change which happens at each point in time. Note that in the colon, biallelic mutations of CTNNB1 seem to be required to mediate an oncogenic driver effect [61, 62], leading to a gene mutation graph similar to that of APC and TP53.

3.1.2 Point mutations

To model the likelihood p_pt(gene) that a crypt is affected by point mutations in a specific gene, we make the following configurable assumptions for the example of Lynch syndrome colorectal carcinogenesis. For other types of cancer, or once new medical insights are gathered, they can and should be adapted.

  • ⊳ We would like to model the evolution of crypts over years. Many measurements and estimates are given in days. Thus, we use the factor 365 to convert the measurements per day to measurements per year.

  • ⊳ In each cell division, n_pt = 1.2 point mutations accumulate according to measurements in [65], where we assume that a cell division takes one day [27].

  • ⊳ The point mutations are uniformly distributed over the base pairs of the entire genome.

  • ⊳ Each crypt is estimated [37] to consist of approximately 1.7⋅10^3 to 2.5⋅10^3 cells, of which only approximately 75% can divide. Thus, we use n_cells = 1500 as an approximation of the number of dividing cells per crypt.

  • ⊳ There are n_bp,genome = 3.2⋅10^9 base pairs (bp) in the genome.

  • ⊳ Only point mutations that occur in hotspots of the genes are relevant for cancer development. Hotspots are regions of a gene that give rise to a phenotypic change if mutated. The size of the hotspots, n_hs(gene), is gene-dependent and is explained below.

  • ⊳ Not all point mutations that appear in a crypt take over the entire crypt [28]. We model this by a gene-dependent fixation affinity f(gene), i.e., the tendency of a cell with a mutation in that gene to take over the whole crypt.

  • ⊳ We assume that the alleles are independent of each other, i.e., a mutation in one allele does not influence the mutation probability of the other allele. Thus, the likelihood p_pt(gene) is twice as large if there is no mutated allele (n_mut(gene) = 0) as in the state where one allele is already mutated (n_mut(gene) = 1).

These assumptions lead to the following formula for the likelihood p_pt(gene):

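$$p_{\mathrm{pt}}(\mathrm{gene}) = 365 \cdot n_{\mathrm{pt}} \cdot n_{\mathrm{cells}} \cdot \frac{n_{\mathrm{hs}}(\mathrm{gene})}{n_{\mathrm{bp,genome}}} \cdot f(\mathrm{gene}) \cdot \left(1 - \tfrac{1}{2}\, n_{\mathrm{mut}}(\mathrm{gene})\right). \tag{6}$$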

Regarding the hotspots, we assume for MLH1, MSH2 and TP53 that the whole coding sequence is susceptible to inactivating point mutations, where we use the reference sequence database at NCBI for coding sequence lengths [66]. For APC, we use mutation data from the publicly available DFCI database using the cBioPortal website [67, 68]. We make use of data from about 4000 CRC samples to identify approximately 2400 hotspots.

For the present parameter choice, we assume for CTNNB1 that only 5 mutations in codon 45 are relevant, according to [16]. Further, for KRAS, we assume 7 relevant mutations [22]. In summary, we obtain the numbers for n_hs given in Table 1.

Table 1. Estimates for n_hs.
gene n_hs
MLH1 2,270
MSH2 2,800
CTNNB1 5
APC 2,400
KRAS 7
TP53 1,180

The given estimates are used for the computation of the point mutation rates for the individual genes. Those are based on the following data from the literature: MLH1, MSH2 and TP53: [66]; CTNNB1: [16]; APC: [67, 68]; KRAS: [22].
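As a worked example of Eq (6), the sketch below evaluates p_pt(gene) with the parameters listed above and the hotspot sizes from Table 1. The fixation affinities f(gene) are not specified in this section, so the code uses the placeholder f = 1 (every mutation fixes). The resulting values (e.g. about 0.466 per crypt and year for MLH1) are therefore upper bounds for illustration only, not the rates used in the model.

```python
# Parameters from the assumptions listed above and hotspot sizes from Table 1
N_PT = 1.2            # point mutations per cell division (one division per day)
N_CELLS = 1500        # dividing cells per crypt
N_BP_GENOME = 3.2e9   # base pairs in the genome
N_HS = {"MLH1": 2270, "MSH2": 2800, "CTNNB1": 5,
        "APC": 2400, "KRAS": 7, "TP53": 1180}

def p_pt(gene, n_mut=0, f=1.0):
    """Point mutation likelihood per crypt and year following Eq (6).

    f stands for the fixation affinity f(gene); its values are not given
    here, so f=1.0 is a PLACEHOLDER (every mutation fixes).
    """
    return (365 * N_PT * N_CELLS * N_HS[gene] / N_BP_GENOME
            * f * (1 - 0.5 * n_mut))

for gene in N_HS:
    # second column: one allele already mutated, i.e. half the likelihood
    print(f"{gene:7s} {p_pt(gene):.3e} {p_pt(gene, n_mut=1):.3e}")
```

Since all factors except n_hs(gene) and f(gene) are shared, the ratios between genes are fixed by Table 1 until gene-specific fixation affinities are supplied.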

3.1.3 LOH events

We assume that all detectable LOH events are large enough to inactivate an affected gene. In other words, we assume that if LOH affects a certain gene, then an exon will be lost and the gene is therefore inactivated. As a consequence, the probability p_LOH(gene) of LOH for a given gene is proportional to its length, denoted by n_bp(gene).

The probability that a relevant LOH event for a specific gene with n_mut(gene) ∈ {0, 1, 2} already mutated alleles and a length of n_bp(gene) bp is present in the whole crypt is given by

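$$p_{\mathrm{LOH}}(\mathrm{gene}) = 365 \cdot n_{\mathrm{cells}} \cdot \left(1 - \tfrac{1}{2}\, n_{\mathrm{mut}}(\mathrm{gene})\right) \cdot \alpha \cdot n_{\mathrm{bp}}(\mathrm{gene}) \cdot f(\mathrm{gene}), \tag{7}$$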

where α ∈ ℝ_{>0} is a parameter to be estimated, independent of the considered gene.

The available data for MLH1 suggest that inactivation is twice as likely to occur due to LOH as due to point mutations [69]. Thus, we assume

$$p_{\mathrm{LOH}}(\mathrm{MLH1}) = 2\, p_{\mathrm{pt}}(\mathrm{MLH1}). \tag{8}$$

Together with (6) and (7), we get

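$$\alpha = 2\, \frac{n_{\mathrm{hs}}(\mathrm{MLH1})}{n_{\mathrm{bp}}(\mathrm{MLH1})} \cdot \frac{n_{\mathrm{pt}}}{n_{\mathrm{bp,genome}}}. \tag{9}$$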

To determine α and p_LOH, we again use the reference sequence database at NCBI for the lengths of the individual genes [66], given in Table 2.

Table 2. Estimates for n_bp.
gene n_bp (bp)
MLH1 57,500
MSH2 80,000
CTNNB1 41,000
APC 139,000
TP53 19,200

These estimates for n_bp are needed to compute the LOH rates of the individual genes. They are based on the reference sequence database at NCBI [66].
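The calibration in Eq (9) can be checked numerically. The sketch below, again assuming the placeholder f(gene) = 1, computes α from the MLH1 values in Tables 1 and 2 (about 2.96⋅10^-11) and confirms that Eq (7) then reproduces p_LOH(MLH1) = 2 p_pt(MLH1) as required by Eq (8). Because α is gene independent, the LOH likelihoods of the other genes simply scale with n_bp(gene).

```python
N_PT, N_CELLS, N_BP_GENOME = 1.2, 1500, 3.2e9
N_HS_MLH1 = 2270   # Table 1
N_BP = {"MLH1": 57500, "MSH2": 80000, "CTNNB1": 41000,
        "APC": 139000, "TP53": 19200}   # Table 2

# Eq (9): alpha is fixed by the MLH1 calibration p_LOH(MLH1) = 2 p_pt(MLH1)
ALPHA = 2 * N_HS_MLH1 / N_BP["MLH1"] * N_PT / N_BP_GENOME

def p_loh(gene, n_mut=0, f=1.0):
    """Eq (7); f=1.0 is a PLACEHOLDER for the fixation affinity f(gene)."""
    return 365 * N_CELLS * (1 - 0.5 * n_mut) * ALPHA * N_BP[gene] * f

def p_pt_mlh1(n_mut=0, f=1.0):
    """Eq (6) restricted to MLH1, used only to check the calibration."""
    return 365 * N_PT * N_CELLS * N_HS_MLH1 / N_BP_GENOME * f * (1 - 0.5 * n_mut)

print(ALPHA)
print(p_loh("MLH1") / p_pt_mlh1())   # ratio of 2, cf. Eq (8)
for gene in N_BP:
    print(gene, p_loh(gene))
```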

3.1.4 Fitness advantages and clonal expansion

The model allows fitness changes b(gene) to be introduced for individual mutation statuses of a gene. As we model the evolution of mutations at the crypt level, this corresponds to the clonal expansion of crypts carrying one of the considered mutations. A fitness advantage corresponds to b(gene) > 0 and a disadvantage to b(gene) < 0. In terms of graphs, this corresponds to a self-loop at the respective genotypic state node with a weight equal to the fitness change. We assume that MMR deficiency leads to a fitness disadvantage [70], i.e., b(MMR) < 0, and that APC inactivation and KRAS activation lead to a fitness advantage, i.e., b(APC) > 0 and b(KRAS) > 0, in concordance with current measurements [28, 71].

In other words, the proliferation and disappearance of certain genotypic states are jointly modeled by the self-loops in the graph. This greatly reduces the number of probability parameters that need to be determined, accounting for the fact that there are currently not enough prospective data available to estimate or learn all parameters. However, once enough data are available, an additional state for dead or disappearing lesions can be introduced. We describe the corresponding formulas in S1 Appendix.
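To illustrate how a self-loop weight acts as clonal expansion, consider a hypothetical two-state KRAS system with activation rate p and fitness advantage b > 0 (both placeholder values). The sketch assumes that the diagonal entry of ∅ carries its outflow -p while the self-loop of m carries b; this diagonal convention is our assumption for illustration, not necessarily the one used in the model. The matrix-exponential solution then matches the closed form x_m(t) = p x_0 (e^{bt} - e^{-pt}) / (b + p), and the total number of crypts grows because b > 0.

```python
import numpy as np
from scipy.linalg import expm

p = 1e-3      # HYPOTHETICAL KRAS activation rate per crypt and year
b = 0.2       # HYPOTHETICAL fitness advantage b(KRAS) > 0 per year
x0 = 1000.0   # HYPOTHETICAL initial number of wild-type crypts

# Adjacency convention over the KRAS statuses (∅, m). ASSUMPTION: the
# diagonal entry of ∅ is its outflow -p, the self-loop of m carries b.
R = np.array([[-p, p],
              [0.0, b]])
M = R.T       # transpose for the column-vector ODE x'(t) = M x(t)

t = 30.0
x = expm(t * M) @ np.array([x0, 0.0])

# Closed form of this two-state system, for comparison
x_wt = x0 * np.exp(-p * t)
x_mut = p * x0 * (np.exp(b * t) - np.exp(-p * t)) / (b + p)
print(x, [x_wt, x_mut])
```

With b < 0 instead (as assumed for MMR deficiency), the same construction makes the corresponding crypt population shrink.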

3.1.5 A model for carcinogenesis

Our mathematical model of multiple pathways in Lynch syndrome carcinogenesis is given by a system of linear ordinary differential equations

$$\dot{x}(t) = (A + B + C + D + E + F)\, x(t), \quad x(0) = x_0. \quad (10)$$

In the system matrix, the additive component A implements the independent mutational processes. The components B, C, D, E and F implement all mutational dependencies supported by available data. The construction of the individual matrices is described in the following paragraphs.

We briefly explain how the model (10) is solved. The system matrix has 1250 = 5⋅5⋅5⋅2⋅5 rows and columns, one for each possible genotype: five states each for MMR, CTNNB1, APC and TP53, and two for KRAS. Despite this size, it is very sparse, as illustrated in Fig 3A.

Fig 3. Sparse matrix structure.


(A) The system matrix (A + B + C + D + E + F) of the linear model is a very sparse matrix, i.e., only a few entries are nonzero. These nonzero entries are colored red in the plot, which also illustrates the fact that (A + B + C + D + E + F) is an upper triangular matrix. (B) The sparsity structure of the matrix expm(A + B + C + D + E + F), which is reminiscent of a Sierpiński fractal, is due to the individual matrices being the Kronecker product and sum of matrices. The two plots also illustrate nicely how modeling sparse local interactions in the matrix (A + B + C + D + E + F) can have a more global effect in expm(A + B + C + D + E + F).

The transpose of the matrix is merely due to different notation conventions for adjacency matrices and differential equations.

We assume that the Lynch syndrome individuals have no mutations at birth except for an MMR germline variant due to a point mutation (90–95% of individuals) or due to an LOH event (5–10% of individuals) [72]. We differentiate these two groups of individuals by using different initial values for the differential equation. The initial value x0 for the first group of individuals is

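$$x_0 = n_{\mathrm{crypts}}\, e_m \otimes \underbrace{e_\emptyset \otimes e_\emptyset \otimes e_\emptyset \otimes e_\emptyset}_{\text{no mutations in CTNNB1, APC, KRAS and TP53}}, \quad (11)$$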

where $n_{\mathrm{crypts}} = 9.95 \cdot 10^6$ is the estimated [73] number of crypts in the colon. Here $e_m$ denotes the unit vector that is zero everywhere except for a 1 at the entry corresponding to the state m. Equivalently, this initial value is the vector with the entry $n_{\mathrm{crypts}}$ at the position of the genotype (m, ∅, ∅, ∅, ∅) and zeros everywhere else.

Accordingly, the initial value for the second group of individuals is given by

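$$x_0 = n_{\mathrm{crypts}}\, e_l \otimes \underbrace{e_\emptyset \otimes e_\emptyset \otimes e_\emptyset \otimes e_\emptyset}_{\text{no mutations in CTNNB1, APC, KRAS and TP53}}. \quad (12)$$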
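The initial values (11) and (12) can be assembled as Kronecker products of unit vectors. The state orderings used below are assumptions: MMR in the order (m, l, mm, ml, ll), and (∅, m, l, mm, ml) for CTNNB1, APC and TP53. They are chosen to be consistent with $B_{\mathrm{MMR}} = \mathrm{diag}(0,0,1,1,1)$ and the APC-/- selector diag(0, 0, 0, 1, 1) used later. The helper names are hypothetical.

```python
import numpy as np

N_CRYPTS = 9.95e6
# ASSUMED state orderings (Kronecker order MMR, CTNNB1, APC, KRAS, TP53)
MMR_STATES = ["m", "l", "mm", "ml", "ll"]
GENE5_STATES = ["∅", "m", "l", "mm", "ml"]   # CTNNB1, APC, TP53
KRAS_STATES = ["∅", "m"]


def unit(states, s):
    """Unit vector e_s for state s of a single gene."""
    e = np.zeros(len(states))
    e[states.index(s)] = 1.0
    return e


def initial_value(mmr_state):
    """Eqs (11)/(12): n_crypts * e_mmr ⊗ e_∅ ⊗ e_∅ ⊗ e_∅ ⊗ e_∅."""
    x0 = unit(MMR_STATES, mmr_state)
    for states in (GENE5_STATES, GENE5_STATES, KRAS_STATES, GENE5_STATES):
        x0 = np.kron(x0, unit(states, "∅"))
    return N_CRYPTS * x0


x0_pt = initial_value("m")    # germline point mutation, Eq (11)
x0_loh = initial_value("l")   # germline LOH, Eq (12)
print(x0_pt.size, np.flatnonzero(x0_pt), np.flatnonzero(x0_loh))
```

With this ordering, the genotype (l, ∅, ∅, ∅, ∅) sits at flat index 1 ⋅ (5 ⋅ 5 ⋅ 2 ⋅ 5) = 250.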

As stated in Eq (S1–8) in S1 Appendix, the exact solution of the differential equation is given by x(t) = expm(t(A + B + C + D + E + F))x0. We illustrate the sparsity structure of the matrix exponential in Fig 3B.
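Because the system matrix is sparse, $\mathrm{expm}(tM)\,x_0$ can be evaluated without forming the dense matrix exponential. SciPy's `scipy.sparse.linalg.expm_multiply` computes this action directly. The sketch below compares it with a dense reference. The matrix is a random stand-in with the same size and triangular structure, not the model's actual (A + B + C + D + E + F), and this is not necessarily how the authors solved the system.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import expm
from scipy.sparse.linalg import expm_multiply

n = 1250
rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for the system matrix: random strictly upper triangular
# adjacency (edges only from lower to higher state index) with small weights.
rows = rng.integers(0, n, 3000)
cols = rng.integers(0, n, 3000)
keep = rows < cols
adj = sp.coo_matrix((rng.uniform(0, 1e-3, keep.sum()), (rows[keep], cols[keep])),
                    shape=(n, n))
M = adj.T.tocsc()        # adjacency convention -> ODE convention

x0 = np.zeros(n)
x0[0] = 9.95e6           # all crypts start in the first genotype

t = 70.0
x_sparse = expm_multiply(t * M, x0)       # action of expm(tM) on x0, sparse
x_dense = expm(t * M.toarray()) @ x0      # dense reference for this check only
print(np.allclose(x_sparse, x_dense, rtol=1e-7, atol=1e-4))
```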

Model component for independent mutations. We now explain how the matrix A for independent mutational processes is built. We first define the gene mutation graphs for the different genes (Fig 2), with adjacency matrices A_MMR, A_CTNNB1, A_APC, A_KRAS and A_TP53. We then combine them using the Cartesian graph product, as explained in Section 2 in S1 Appendix. Accordingly, the adjacency matrix of the combined model is given by the Kronecker sum of the adjacency matrices of the individual genes

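$$A = A_{\mathrm{MMR}} \oplus A_{\mathrm{CTNNB1}} \oplus A_{\mathrm{APC}} \oplus A_{\mathrm{KRAS}} \oplus A_{\mathrm{TP53}}. \quad (13)$$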
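The Kronecker sum in Eq (13) can be computed as $X \oplus Y = X \otimes I + I \otimes Y$ and folded over the five genes. The per-gene matrices below are hypothetical, fully connected, strictly upper triangular placeholders, not the graphs of Fig 2. Each term of the sum changes only one gene's state, so the supports of the five terms are disjoint. The number of nonzeros is therefore $\sum_g \mathrm{nnz}(A_g) \cdot 1250 / n_g$.

```python
from functools import reduce

import numpy as np


def kron_sum(X, Y):
    """Kronecker sum X ⊕ Y = X ⊗ I + I ⊗ Y (adjacency of the Cartesian graph product)."""
    return np.kron(X, np.eye(Y.shape[0])) + np.kron(np.eye(X.shape[0]), Y)


rng = np.random.default_rng(1)
# HYPOTHETICAL per-gene adjacency matrices in the order MMR, CTNNB1, APC, KRAS, TP53:
# strictly upper triangular with placeholder weights.
sizes = [5, 5, 5, 2, 5]
gene_mats = [np.triu(rng.uniform(0, 1e-3, (n, n)), k=1) for n in sizes]

# Eq (13); the Kronecker sum is associative, so a left fold is fine.
A = reduce(kron_sum, gene_mats)
print(A.shape, np.count_nonzero(A))
```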

When only considering independent mutational processes, the model (10) reduces to

$$\dot{x}(t) = A\, x(t), \quad x(0) = x_0, \quad (14)$$

where the solution can be rewritten in the following way (see Eq (S1–1) in S1 Appendix)

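$$x(t) = \big(\mathrm{expm}(tA_{\mathrm{MMR}})\, e_m\big) \otimes \big(\mathrm{expm}(tA_{\mathrm{CTNNB1}})\, e_\emptyset\big) \otimes \big(\mathrm{expm}(tA_{\mathrm{APC}})\, e_\emptyset\big) \otimes \big(\mathrm{expm}(tA_{\mathrm{KRAS}})\, e_\emptyset\big) \otimes \big(\mathrm{expm}(tA_{\mathrm{TP53}})\, e_\emptyset\big)\, n_{\mathrm{crypts}} \quad (15)$$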

for the first group of individuals (11). This reduces the computational cost considerably: only five small matrices (of size 5×5 or 2×2) have to be exponentiated instead of one 1250×1250 matrix.
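Eq (15) rests on the identity $\mathrm{expm}(X \oplus Y) = \mathrm{expm}(X) \otimes \mathrm{expm}(Y)$. This identity holds because $X \otimes I$ and $I \otimes Y$ commute. The sketch checks Eq (15) numerically against the full 1250×1250 exponential, using hypothetical per-gene matrices in ODE orientation (lower triangular). All genes start in their first state, which corresponds to (m, ∅, ∅, ∅, ∅) under the ordering assumed earlier.

```python
from functools import reduce

import numpy as np
from scipy.linalg import expm


def kron_sum(X, Y):
    return np.kron(X, np.eye(Y.shape[0])) + np.kron(np.eye(X.shape[0]), Y)


rng = np.random.default_rng(2)
sizes = [5, 5, 5, 2, 5]   # MMR, CTNNB1, APC, KRAS, TP53
# HYPOTHETICAL per-gene matrices in ODE orientation (flow to higher state indices)
gene_mats = [np.tril(rng.uniform(0, 1e-3, (n, n)), k=-1) for n in sizes]
start = [0, 0, 0, 0, 0]   # index of the initial state of each gene
n_crypts, t = 9.95e6, 70.0

# Full route: one 1250 x 1250 matrix exponential
A = reduce(kron_sum, gene_mats)
x0 = n_crypts * reduce(np.kron, [np.eye(n)[s] for n, s in zip(sizes, start)])
x_full = expm(t * A) @ x0

# Eq (15): five small exponentials; expm(t A_g) e_s is column s of expm(t A_g)
x_fact = n_crypts * reduce(np.kron, [expm(t * Ag)[:, s] for Ag, s in zip(gene_mats, start)])
print(np.allclose(x_full, x_fact, rtol=1e-7, atol=1e-4))
```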

The model components for mutational dependencies. The first model component, given by the matrix A, implements all mutational processes that are independent of each other. These are processes for which data indicate independence, or for which no medical insight suggests otherwise. However, mutations change the functional behavior of a cell, so specific mutations affect the probability of certain other mutations. In other words, some mutations are mutually exclusive, while others increase the probability of mutations in other genes [74].

Instead of changing the adjacency matrix A, we add the adjacency matrices for the dependent mutational processes to the independent one. This allows us to study the effects of the different mutational processes individually and makes it possible to include further dependencies when additional data are available in the future.

For the approach presented here, we assume and model the following molecular and biological mechanisms:

  • Matrix B: increased point mutation rate of APC after MMR deficiency,

  • Matrix C: positive association of CTNNB1 and MLH1 alterations,

  • Matrix D: increased LOH rate after APC inactivation,

  • Matrix E: mutual enhancement of effects C and D,

  • Matrix F: increased mutation rate of KRAS after MMR deficiency.

In the following paragraphs, we explain all considered mutational dependencies in detail.

Increased point mutation rate of APC after MMR deficiency. MMR deficiency leads to an increased mutation rate, especially in microsatellites [20]. Among others, this is true for the point mutation rate of APC. Thus, we assume that the point mutation rate of APC is increased by a factor β + 1 if the crypt has an MMR-deficient state. This is assumed to be independent of the state of the other genes.

As we do not want to change the matrix A, we introduce an additional matrix B. That is, instead of multiplying single entries of A by β + 1, we add to A a matrix B whose corresponding entries are those of A multiplied by β.

We define the matrix B by

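$$B = B_{\mathrm{MMR}} \otimes B_{\mathrm{CTNNB1}} \otimes B_{\mathrm{APC}} \otimes B_{\mathrm{KRAS}} \otimes B_{\mathrm{TP53}}, \quad (16)$$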

where BAPC is the adjacency matrix of the gene mutation graph in Fig 4 and

$$B_{\mathrm{MMR}} = \mathrm{diag}(0, 0, 1, 1, 1), \quad B_{\mathrm{CTNNB1}} = I_5 = B_{\mathrm{TP53}}, \quad B_{\mathrm{KRAS}} = I_2. \quad (17)$$

Here, $\mathrm{diag}(d_1, d_2, \ldots, d_n) \in \mathbb{R}^{n \times n}$ denotes a diagonal matrix with entries $d_i$, $i \in \{1, 2, \ldots, n\}$, on its diagonal.

Fig 4. Gene mutation graph of APC for increasing the point mutation rate of APC after MMR deficiency.


The definition (16) of the matrix B yields the desired result of increasing the point mutation rate of APC after MMR deficiency. This can be explained intuitively: We only want to increase the point mutation rate after MMR deficiency, meaning that the MMR state is mm, ml or ll, leading to the matrix BMMR. Further, this influence of MMR on APC is independent of the other genes, meaning that it should hold for all states of the other genes. Thus, we choose the respective identity matrices for KRAS, CTNNB1 and TP53 and connect all matrices via the Kronecker product, instead of the Kronecker sum as in the matrix A.
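The effect of Eq (16) can be checked numerically. In an MMR-deficient crypt, the APC point mutation rate in A + B becomes (β + 1) times the base rate. In an MMR-proficient crypt, it is unchanged. The APC graph below is a hypothetical stand-in for Fig 4, and `p_pt` is a placeholder rate; only the APC part of A is built, which is enough for this check. The same construction with KRAS in place of APC and ζ in place of β would give the matrix F described below.

```python
from functools import reduce

import numpy as np


def kron_sum(X, Y):
    return np.kron(X, np.eye(Y.shape[0])) + np.kron(np.eye(X.shape[0]), Y)


beta = 1e3        # β, Table 3
p_pt = 1e-7       # HYPOTHETICAL APC point mutation rate
# HYPOTHETICAL APC point-mutation edges, states (∅, m, l, mm, ml); adj[i, j] = edge i -> j
A_APC = np.zeros((5, 5))
A_APC[0, 1] = A_APC[1, 3] = A_APC[2, 4] = p_pt   # ∅->m, m->mm, l->ml
B_APC = beta * A_APC                             # same edges, weights multiplied by β

I2, I5, Z2, Z5 = np.eye(2), np.eye(5), np.zeros((2, 2)), np.zeros((5, 5))
B_MMR = np.diag([0, 0, 1, 1, 1])                 # MMR-deficient states mm, ml, ll, Eq (17)

A = reduce(kron_sum, [Z5, Z5, A_APC, Z2, Z5])    # APC part of A only
B = reduce(np.kron, [B_MMR, I5, B_APC, I2, I5])  # Eq (16)

SHAPE = (5, 5, 5, 2, 5)                          # MMR, CTNNB1, APC, KRAS, TP53


def idx(*genotype):
    """Flat index of a genotype in the Kronecker ordering (hypothetical helper)."""
    return np.ravel_multi_index(genotype, SHAPE)


# (ml, ∅, ∅, ∅, ∅) -> (ml, ∅, m, ∅, ∅): MMR-deficient crypt
factor_def = (A + B)[idx(3, 0, 0, 0, 0), idx(3, 0, 1, 0, 0)] / p_pt
# (m, ∅, ∅, ∅, ∅) -> (m, ∅, m, ∅, ∅): MMR-proficient crypt
factor_prof = (A + B)[idx(0, 0, 0, 0, 0), idx(0, 0, 1, 0, 0)] / p_pt
print(factor_def, factor_prof)   # ≈ β + 1 and 1
```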

Positive association of CTNNB1 and MLH1 alterations. According to [25], somatic CTNNB1 mutations are significantly more frequent in MLH1-associated cancers than in CRCs associated with the other MMR genes. For illustration purposes, we assume that inactivation of MLH1 and inactivation of CTNNB1 are triggered by non-independent events. We model this dependency with an occurrence rate $r_{\mathrm{effLOH}}$, which we set to $r_{\mathrm{effLOH}} = 0.9$, and introduce an additional matrix C. This matrix is based on a combined gene mutation graph for MLH1 and CTNNB1, which is connected with the remaining genes via the Kronecker product. This is possible because MLH1 and CTNNB1 are adjacent in the chosen ordering of the genes.

The matrix $C \in \mathbb{R}^{1250 \times 1250}$ is given by

$$C = C_{\mathrm{MLH1,CTNNB1}} \otimes C_{\mathrm{APC}} \otimes C_{\mathrm{KRAS}} \otimes C_{\mathrm{TP53}}, \quad (18)$$

where CAPC = CTP53 = I5 and CKRAS = I2. The matrix CMLH1,CTNNB1 is the adjacency matrix corresponding to the combined gene mutation graph for MLH1 and CTNNB1. We explain in the following how this combined gene mutation graph is built and illustrate it in Fig 5.

Fig 5. Model component for the positive association of MLH1 and CTNNB1.


Part of the combined gene mutation graph for CTNNB1 and MLH1 of the matrix C. The gene mutation graphs for the other possible gene states MLH1 ∈ {l, ll}, CTNNB1 ∈ {m, ml} are defined in an analogous way.

Let _ denote an arbitrary state of the corresponding gene. Instead of multiplying the edge weight pLOH(MMR)/2 of the edge (m, ∅, _, _, _) → (ml, ∅, _, _, _) by (1 − reffLOH) in the original matrix A, we add a matrix C with a corresponding edge weight −reffLOH pLOH(MMR)/2. The following edges are added to the matrix C with the same weight:

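$$(l, \emptyset, \_, \_, \_) \to (ll, \emptyset, \_, \_, \_), \quad (19)$$
$$(m, m, \_, \_, \_) \to (ml, m, \_, \_, \_), \quad (20)$$
$$(l, m, \_, \_, \_) \to (ll, m, \_, \_, \_). \quad (21)$$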

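Furthermore, we insert the following new edges with edge weight $r_{\mathrm{effLOH}}\, p_{\mathrm{LOH}}(\mathrm{MLH1})/2$, so that the weight removed from the single LOH edges above is redirected to the joint events: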

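$$(m, \emptyset, \_, \_, \_) \to (ml, l, \_, \_, \_), \quad (22)$$
$$(l, \emptyset, \_, \_, \_) \to (ll, l, \_, \_, \_), \quad (23)$$
$$(m, m, \_, \_, \_) \to (ml, ml, \_, \_, \_), \quad (24)$$
$$(l, m, \_, \_, \_) \to (ll, ml, \_, \_, \_). \quad (25)$$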

All other entries of C are zero, leading to a sparse matrix with only 400 non-zero entries.
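The count of 400 nonzero entries follows from the structure of Eq (18). The combined MLH1/CTNNB1 graph has 8 edges: the first edge in the text, Eqs (19)–(21), and Eqs (22)–(25). Each edge is repeated once for every combination of APC, KRAS and TP53 states, i.e., 5 ⋅ 2 ⋅ 5 = 50 times. The sketch makes three assumptions: the state orderings MLH1 (m, l, mm, ml, ll) and CTNNB1 (∅, m, l, mm, ml), a placeholder value for $p_{\mathrm{LOH}}(\mathrm{MLH1})$, and positive weights on the joint edges (22)–(25). Under the last assumption, each source state loses as much weight as it gains, so every row of C sums to zero.

```python
import numpy as np

r_eff = 0.9             # r_effLOH, Table 3
p_loh_mlh1 = 1e-6       # HYPOTHETICAL value of p_LOH(MLH1)
w = r_eff * p_loh_mlh1 / 2

# ASSUMED state orderings of the combined 25-state MLH1 x CTNNB1 graph
MLH1 = {"m": 0, "l": 1, "mm": 2, "ml": 3, "ll": 4}
CTN = {"∅": 0, "m": 1, "l": 2, "mm": 3, "ml": 4}


def k(mlh1, ctnnb1):
    """Row-major index in the combined graph, matching the np.kron ordering."""
    return MLH1[mlh1] * 5 + CTN[ctnnb1]


C_mc = np.zeros((25, 25))   # adjacency convention: C_mc[source, target]
# Single MLH1 LOH edges, weakened by -w: first edge in the text and Eqs (19)-(21)
for (a, c), (b, d) in [(("m", "∅"), ("ml", "∅")), (("l", "∅"), ("ll", "∅")),
                       (("m", "m"), ("ml", "m")), (("l", "m"), ("ll", "m"))]:
    C_mc[k(a, c), k(b, d)] = -w
# Joint MLH1/CTNNB1 LOH edges, Eqs (22)-(25); ASSUMPTION: weight +w
for (a, c), (b, d) in [(("m", "∅"), ("ml", "l")), (("l", "∅"), ("ll", "l")),
                       (("m", "m"), ("ml", "ml")), (("l", "m"), ("ll", "ml"))]:
    C_mc[k(a, c), k(b, d)] = w

# Eq (18): C = C_MLH1,CTNNB1 ⊗ I5 (APC) ⊗ I2 (KRAS) ⊗ I5 (TP53)
C = np.kron(np.kron(np.kron(C_mc, np.eye(5)), np.eye(2)), np.eye(5))
print(C.shape, np.count_nonzero(C))
```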

Increased LOH rate after APC inactivation. The following model component deals with the increased LOH rate in APC-inactivated crypts, which is assumed to be the case in many cancers [52]. In the following, we denote by APC-/- those crypts in which APC is inactivated, i.e., whose APC state is mm or ml.

As further LOH events can occur for MMR, CTNNB1 and TP53 in APC-/- crypts, we have to introduce individual matrices for each effect leading to the matrix D = D1 + D2 + D3, where

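$$D_1 = D_{\mathrm{MMR}} \otimes I_5 \otimes \mathrm{diag}(0, 0, 0, 1, 1) \otimes I_2 \otimes I_5, \quad (26)$$
$$D_2 = I_5 \otimes D_{\mathrm{CTNNB1}} \otimes \mathrm{diag}(0, 0, 0, 1, 1) \otimes I_2 \otimes I_5, \quad (27)$$
$$D_3 = I_5 \otimes I_5 \otimes \mathrm{diag}(0, 0, 0, 1, 1) \otimes I_2 \otimes D_{\mathrm{TP53}}. \quad (28)$$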

Analogous to the model component B, we define a gene mutation graph of MMR, CTNNB1 and TP53 with parameter δ such that the LOH rate is increased by a factor δ + 1. This is illustrated in Fig 6 for CTNNB1 and TP53, where the gene mutation graph for MMR is defined analogously.
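The factor diag(0, 0, 0, 1, 1) in Eqs (26)–(28) restricts D to APC-/- crypts. Every nonzero entry of D starts from a genotype with APC state mm or ml and leaves the APC state unchanged. The sketch checks this with hypothetical LOH edges for MMR (m → ml, l → ll) and for CTNNB1/TP53 (∅ → l, m → ml), all carrying a placeholder weight δ ⋅ `p_loh`. The actual graphs are those of Fig 6.

```python
from functools import reduce

import numpy as np

delta = 1e2     # δ, Table 3
p_loh = 1e-6    # HYPOTHETICAL LOH rate, identical for all genes in this sketch

# HYPOTHETICAL extra LOH edges (adjacency convention: [source, target])
D_MMR = np.zeros((5, 5))
D_MMR[0, 3] = D_MMR[1, 4] = delta * p_loh     # MMR: m->ml, l->ll
D_GENE = np.zeros((5, 5))
D_GENE[0, 2] = D_GENE[1, 4] = delta * p_loh   # CTNNB1/TP53: ∅->l, m->ml

I2, I5 = np.eye(2), np.eye(5)
SEL = np.diag([0, 0, 0, 1, 1])                # APC-/- selector (mm, ml)

D1 = reduce(np.kron, [D_MMR, I5, SEL, I2, I5])    # Eq (26)
D2 = reduce(np.kron, [I5, D_GENE, SEL, I2, I5])   # Eq (27)
D3 = reduce(np.kron, [I5, I5, SEL, I2, D_GENE])   # Eq (28)
D = D1 + D2 + D3

# APC coordinate of the source and target genotype of every nonzero entry
rows, cols = np.nonzero(D)
apc_src = np.unravel_index(rows, (5, 5, 5, 2, 5))[2]
apc_dst = np.unravel_index(cols, (5, 5, 5, 2, 5))[2]
print(np.count_nonzero(D), set(apc_src.tolist()), bool(np.all(apc_src == apc_dst)))
```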

Fig 6. Model component for increasing the LOH rate of MMR, CTNNB1 and TP53 by a factor δ + 1 after APC inactivation.


Gene mutation graph for both genes, CTNNB1 and TP53, of the component D. The gene mutation graph for MMR is defined in an analogous way.

Mutual enhancement of effects C and D. APC inactivation increases the LOH rate of other genes, including MLH1, which is modeled by the matrix D. Further, there is a positive association of MLH1 and CTNNB1 alterations, which we model in the same way as an LOH event in the matrix C. We use these two effects to demonstrate how a mutual enhancement of two effects can be modeled, described by an additional matrix E. As for the matrix C, we build the combined adjacency matrix for MLH1 and CTNNB1 and combine it with the other genes via the Kronecker product, i.e.,

$$E = E_{\mathrm{MLH1,CTNNB1}} \otimes \mathrm{diag}(0, 0, 0, 1, 1) \otimes I_2 \otimes I_5, \quad (29)$$

where again, the ordering is essential to enable an efficient implementation.

This enhancement only affects the APC-/- crypts, thus we use diag(0, 0, 0, 1, 1) for the APC matrix. Analogous to Fig 5, we illustrate parts of the gene mutation graph for the combination of MLH1 and CTNNB1 after APC inactivation in Fig 7.

Fig 7. Model component for the mutual enhancement of two dependencies by a factor $\delta\, r_{\mathrm{effLOH}}$.


Part of the gene mutation graph for CTNNB1 and MLH1 after APC inactivation considered by the component E. The gene mutation graphs for the other possible gene states MLH1 ∈ {l, ll}, CTNNB1 ∈ {m, ml}, APC ∈ {ml} are defined in an analogous way.

Increased mutation rate of KRAS after MMR deficiency. KRAS is an oncogene that is activated by a single point mutation, mainly at codon 12 or 13. Codon 13 mutations are known to be associated with and enriched in MMR-deficient cancers, as these mutations are more likely to occur under the influence of MMR deficiency [22]. We account for this association by increasing the KRAS mutation rate after MMR deficiency by a factor ζ + 1. For this, the matrix F is defined analogously to the matrix B, with the corresponding matrix entries multiplied by ζ. The gene mutation graph of KRAS is given in Fig 8.

Fig 8. Model component for increasing the mutation rate of KRAS after MMR deficiency.


Gene mutation graph of KRAS for the matrix F with the KRAS mutation rate increased by a factor ζ.

3.2 Modifications to the model

In Section 3.1, we introduced a mathematical modeling approach for colorectal carcinogenesis using the example of Lynch syndrome. In this section, we present modifications of the model that handle other forms of colorectal carcinogenesis, such as Lynch-like and MSS carcinogenesis, as well as colorectal carcinogenesis in FAP individuals.

For example, this can be done by changing the initial values of the model to differentiate between sporadic and hereditary cases or to consider germline variants in different genes, e.g., MMR in Lynch syndrome and APC in FAP.

Further, we can include other mutation states of genes that are already in the model, for instance the wild-type state of the MMR gene for the Lynch-like and sporadic MSI cases. We can also adapt specific parameters to account for specific carcinogenesis mechanisms, as we do for FAP later in this section.

Finally, we describe the potential for modifications to account for cancer evolution in other organs.

3.2.1 Non-Lynch and FAP

Lynch-like and Lynch syndrome carcinogenesis. The main difference between Lynch-like and Lynch syndrome carcinogenesis is the absence or presence of a monoallelic MMR germline variant as a first hit at birth. In Lynch syndrome carcinogenesis, all body cells, including those constituting colonic crypts, already carry a monoallelic variant in one of the MMR genes. In Lynch-like carcinogenesis, all cells start with wild-type MMR genes. Both forms of MSI carcinogenesis can be represented by introducing the additional vertex ∅ in $V_{\mathrm{MMR}} = \{\emptyset, m, l, mm, ml, ll\}$, with the point mutation and LOH rates described in Sections 3.1.2 and 3.1.3. The initial value $x_0$ is then zero except for one entry, which is set to $n_{\mathrm{crypts}}$. In the hereditary case, this is the entry for (m, ∅, ∅, ∅, ∅) or (l, ∅, ∅, ∅, ∅). In the sporadic case, it is the entry for (∅, ∅, ∅, ∅, ∅).

MSS carcinogenesis. It is possible to model the evolution of MSS CRCs with the proposed model by not including MMR genes in the vertex set. Due to the absence of MMR in the model, CTNNB1 mutations are much less frequent. The classical adenoma-carcinoma model including APC, KRAS and TP53 is the dominant pathway of carcinogenesis.

FAP carcinogenesis. Another application of the model is the evolution of CRCs in another hereditary syndrome, namely FAP. Those individuals have a single germline variant in APC, which is known to be a point mutation in almost all cases [75, 76]. Thus, the dynamical system starts with all crypts in the state (∅, ∅, m, ∅, ∅).

As reported in [77], the germline variants are not uniformly distributed among the base pairs of the APC gene. Instead, they are concentrated at specific codons, so we change the number of hotspot base pairs in the FAP case. According to [78], the classical FAP case is associated with germline variants in codons 1250–1464, which leads to the assumption $n_{\mathrm{hs}} = 600$ in our model for FAP simulations. Thus, by changing the parameters of the model, we are able to model other cases of colorectal carcinogenesis.

The regions of common germline variants described above are also associated with the highest numbers of polyps (more than 5,000) in FAP individuals [78]. With an estimated diameter of 4.8 mm per polyp [79] and 0.09 mm per crypt [80], this would result in about $10^7$ crypts in a polypous state. Thus, our model simulations should also reflect that the number of polyps, assumed to consist of APC-/- crypts, is much higher than in the sporadic case.
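The order of magnitude of $10^7$ crypts can be reproduced as follows. The sketch assumes that polyps and crypts are treated as discs, so the number of crypts per polyp scales with the ratio of their areas, $(4.8/0.09)^2$. The source does not state this geometric assumption.

```python
import math

n_polyps = 5_000      # "more than 5,000" polyps [78]
d_polyp_mm = 4.8      # estimated polyp diameter [79]
d_crypt_mm = 0.09     # estimated crypt diameter [80]

# ASSUMPTION: disc geometry, so crypts per polyp = (d_polyp / d_crypt)^2
crypts_per_polyp = (d_polyp_mm / d_crypt_mm) ** 2
total = n_polyps * crypts_per_polyp
print(round(crypts_per_polyp), f"{total:.2e}", round(math.log10(total)))
```

This gives roughly 2,800 crypts per polyp and about $1.4 \cdot 10^7$ crypts overall, consistent with the order of magnitude stated above.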

3.2.2 Cancer in other organs

In general, the model can be modified to describe carcinogenesis not only in the colon but also in other organs. For this, the incorporated genes have to be changed. The definitions of point mutations and LOH events also have to be adapted to account for different cell structures. The application to other organs will be considered in future work.

4 Results

We present the results of modeling the evolution of human colorectal crypts in a typical Lynch syndrome patient over the course of 70 years. The model starts with a germline variant in MMR in all crypts at birth and yields the temporal evolution of the crypt distribution among all genotypic states, where we only show the results for MLH1 and MSH2, as those are related to the highest CRC incidence in Lynch syndrome [25].

4.1 Evolution of crypts with specific genotypic states

Using Eq (S1–15) in S1 Appendix, we extracted and combined different genotypic states from the overall distribution. We did so for MMR-deficient crypts as well as for more advanced states, which we refer to as adenomatous and cancerous states. They are defined as follows:

  • MMR-deficient: MMR-deficient; CTNNB1, APC, KRAS, TP53 intact, i.e., (mm, ∅, ∅, ∅, ∅) + (ml, ∅, ∅, ∅, ∅) + (ll, ∅, ∅, ∅, ∅)

  • State 1: MMR-proficient or MMR-deficient, CTNNB1 activated; APC inactivated; KRAS and TP53 intact (called early adenomatous)

  • State 2: MMR-proficient or MMR-deficient, CTNNB1 activated; APC inactivated; KRAS activated; TP53 intact (called late adenomatous)

  • State 3: MMR-proficient or MMR-deficient, CTNNB1 activated; APC and TP53 inactivated; KRAS activated (called cancerous)
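Combined states like these can be read off the flat state vector by reshaping it to the per-gene shape (5, 5, 5, 2, 5). NumPy's default row-major ordering matches the ordering produced by Kronecker products. The sketch extracts the MMR-deficient group defined above. The index conventions (MMR in the order m, l, mm, ml, ll, and ∅ at index 0 for the other genes) and the toy state vector are assumptions for illustration.

```python
import numpy as np

SIZES = (5, 5, 5, 2, 5)      # MMR, CTNNB1, APC, KRAS, TP53, in Kronecker order
MMR_DEFICIENT = [2, 3, 4]    # ASSUMED indices of mm, ml, ll


def mmr_deficient_only(x):
    """(mm,∅,∅,∅,∅) + (ml,∅,∅,∅,∅) + (ll,∅,∅,∅,∅) from a flat state vector x."""
    X = np.asarray(x).reshape(SIZES)   # row-major reshape matches np.kron ordering
    return X[MMR_DEFICIENT, 0, 0, 0, 0].sum()


# Toy check with a HYPOTHETICAL state vector: place known counts, read them back
x = np.zeros(1250)
for mmr in MMR_DEFICIENT:
    x[np.ravel_multi_index((mmr, 0, 0, 0, 0), SIZES)] = 10.0
x[np.ravel_multi_index((2, 0, 3, 0, 0), SIZES)] = 99.0   # APC-/-: not in this group
print(mmr_deficient_only(x))
```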

The parameters are set such that the number of MMR-deficient crypts is quantitatively comparable to the clinical data presented in [80]. We show the results for MLH1 and MSH2 in Fig 9. The impact of the parameters on the simulation results is discussed in Section 4.4. Procedures for parameter learning and sensitivity analysis are planned for a more mathematically focused follow-up work.

Fig 9. Number of MMR-deficient crypts over the life of a typical Lynch syndrome patient for MLH1 and MSH2.


The parameters in the model are set in such a way that the simulation results are in concordance with published data [80]. In our model, differences among genes are due to differences in coding region and gene lengths as well as the magnitude of the effects of the dependent mutational processes.

Further, the results for early and advanced adenomatous and cancerous states are given in Fig 10 for a typical Lynch syndrome patient with a germline variant in MLH1. Importantly, we can analyze, e.g., the relative contribution of MMR-deficient and MMR-proficient adenomatous and cancerous states. With the chosen parameter combination, this relative contribution changes between the advanced adenomatous and the cancerous states; we elaborate on these contributions in Section 4.3. It is also possible to compare the evolution of these states with respect to the contributions of APC and CTNNB1. Note that some of the parameters are chosen without any bio-molecular data at hand. Some of the absolute crypt numbers presented here may therefore not match the real numbers once these become measurable. As more data on mutation rates or on the evolution of crypt numbers become available, the model parameters can be adapted to improve the agreement of the model output with clinical observations.

Fig 10. Number of crypts over time in a typical MLH1 carrier in combined states, like early adenomatous, advanced adenomatous and cancerous states as defined in the text for the given parameter set.


Because of the model components accounting for different genetic dependencies, the proportions of MMR-deficient and MMR-proficient crypts, as well as the contributions of APC and CTNNB1, change across the different states. Due to the lack of suitable medical data, parameter learning was not performed rigorously. As soon as data become available, this can be done using different mathematical techniques.

4.2 Influences of variants in MMR genes

The model is able to compare the carcinogenesis process for the different MMR genes in order to examine gene-specific differences. This in particular includes the questions of whether and how the distribution of crypts in various states changes when considering different MMR genes. More generally, the distribution among the different pathways of Lynch syndrome carcinogenesis may vary among the MMR genes. As the different pathways of carcinogenesis need different treatment and surveillance strategies, it is essential for Lynch syndrome-related clinical guidelines to examine the gene-specific associations with the pathways of carcinogenesis, as depicted in [25].

An early example is given in Fig 9 showing the differences among MMR-deficient crypt foci which are the first detectable precursor lesions of the Lynch syndrome carcinogenesis pathways 2 and 3 illustrated in Fig 1. Differences among the MMR genes are reported for adenoma and carcinoma incidences of Lynch syndrome individuals [25]. In the model, the differences are due to differences in the properties of the MMR genes, such as coding region and gene lengths, and due to the fact that dependent mutational processes influence the evolution of the crypts differently. As soon as there are more data available on bio-molecular mechanisms or there are further pathogenic variant hypotheses to be tested, these differences can be made even more explicit by introducing additional model components. This will be the subject of future work.

4.3 Distribution among the carcinogenesis pathways

We analyzed the proportion of MMR-proficient and MMR-deficient crypts in various states to determine the proportion in which MMR deficiency occurred as an initial event in carcinogenesis of Lynch syndrome carriers. The results are shown in Fig 11 and are similar to the currently available data [22] with a slight underestimation of MMR-deficient APC-/- crypts compared to MMR-proficient ones.

Fig 11. Proportion of MMR-proficient and MMR-deficient crypts in a typical MLH1 carrier in different states corresponding to the states in the classical adenoma-carcinoma sequence by Vogelstein [7].


Among the APC-/- crypts (left), the number of MMR-deficient crypts is up to 20% higher than the number of MMR-proficient ones. This difference largely increases with the subsequent KRAS activation (KRAS+) (middle) and TP53 inactivation (TP53-/-) (right) leading to the fact that almost all crypts in the last state, corresponding to a cancerous state, are MMR-deficient. These simulation results are in concordance with available data with a slight underestimation of MMR-deficient APC-/- crypts [22].

In general, for independent mutational processes, the distributions in Fig 11 would be the same for all three states, as there are no influences between the different genes. In our model, the dependencies become visible as variations of the distribution across the subsequent states. From APC-/- to APC-/- and KRAS-activated crypts, the difference between the proportions of MMR-proficient and MMR-deficient crypts greatly increases with the given parameter setting. As a result, almost all APC-/-, KRAS-activated crypts are MMR-deficient. Since more of the APC-/- crypts are MMR-deficient, this suggests that MMR deficiency is often the initial event in Lynch syndrome carcinogenesis.
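The statement about independent processes can be checked directly. If only the matrix A acts, the solution is a Kronecker product of per-gene solutions (Eq (15)). The share of MMR-deficient crypts is then identical in every subgroup defined by the other genes. The sketch uses hypothetical per-gene matrices and compares two subgroups: all APC-/- crypts, and APC-/- crypts with activated KRAS. The index conventions are the same assumptions as before.

```python
from functools import reduce

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
sizes = (5, 5, 5, 2, 5)   # MMR, CTNNB1, APC, KRAS, TP53
# HYPOTHETICAL independent-process matrices (ODE orientation, lower triangular)
mats = [np.tril(rng.uniform(0, 1e-2, (n, n)), k=-1) for n in sizes]
t = 70.0

# Independent processes only: x(t) is a Kronecker product of per-gene solutions
x = 9.95e6 * reduce(np.kron, [expm(t * M)[:, 0] for M in mats])
X = x.reshape(sizes)


def mmr_deficient_share(sub):
    """Share of MMR-deficient crypts (assumed indices 2-4: mm, ml, ll) in a sub-array."""
    per_mmr = sub.reshape(5, -1).sum(axis=1)
    return per_mmr[2:].sum() / per_mmr.sum()


share_apc = mmr_deficient_share(X[:, :, 3:5, :, :])          # APC-/- (mm or ml)
share_apc_kras = mmr_deficient_share(X[:, :, 3:5, 1:2, :])   # APC-/- and KRAS activated
print(share_apc, share_apc_kras)   # equal without dependency components
```

Adding dependency components such as B or F breaks this product structure, so the shares can then differ between the subgroups, as in Fig 11.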

Further, the proportions do not change if TP53 inactivation happens because currently, there is no such effect incorporated in our model for, e.g., increasing the mutation rate of TP53 after MMR deficiency or after KRAS activation.

4.4 Analysis of parameter contributions

The results were obtained with the set of parameters given in Table 3. We analyzed the influences of the parameters on the simulation results. First, the number of point mutations npt, the number of cells ncells, and the number of crypts ncrypts determine the absolute values of the analyzed numbers.

Table 3. Parameter setting for the shown results.

Parameter       Value
n_crypts        9.95⋅10^6
n_cells         1.5⋅10^3
n_bp,genome     3.2⋅10^9
n_pt            1.2
b(MMR)          −0.01
b(CTNNB1)       0.0
b(APC)          0.10
b(KRAS)         0.01
b(TP53)         0.0
f(MMR)          2.3⋅10^−6
f(CTNNB1)       1.2⋅10^−3
f(APC)          8.3⋅10^−7
f(KRAS)         2.5⋅10^−8
f(TP53)         1.2⋅10^−5
r_effLOH        0.9
β               10^3
δ               10^2
ζ               10^2

Further, the relation of the hotspot length and the gene length determines the relative frequency of point mutations and LOH events for the individual genes, which can be changed by including mutational dependencies for specific genotypic states. Here, the magnitude of the parameters reffLOH, β, δ, and ζ determines how large the contribution of the individual dependency is.

The parameters b(gene) affect the slope of the crypt evolution curve. In our case, b(MMR) < 0 means that MMR deficiency is disadvantageous for crypt survival, which leads to fewer additional MMR-deficient crypts with increasing age (Fig 9).

In contrast, APC inactivation is modeled as an advantage for the crypts such that b(APC) > 0 leads to more additional APC-inactivated crypts with increasing age.

Furthermore, the relation of the fixation affinities f(gene) for different genes seems to influence the ordering of the mutations. A larger value of f(gene) leads to a faster fixation in this gene and thus to an earlier event in carcinogenesis (Fig 11).

However, there is still uncertainty in the data about the fitness advantages and disadvantages of individual genetic changes as well as on the fixation affinities of mutations. General information on mutational dependencies and how they affect the phenotype of the cells is crucial to include further bio-molecular mechanisms.

4.5 Non-Lynch and FAP

We compared different types of colorectal carcinogenesis by changing the initial values of the dynamical system or by adapting other parameters.

First, we compared the number of MMR-deficient crypts in Lynch-like and Lynch syndrome individuals, as illustrated in Fig 12. This number is much larger in Lynch syndrome individuals than in Lynch-like individuals, in line with [80].

Fig 12. Comparison of MMR-deficient crypts in Lynch-like and Lynch syndrome individuals.


The number of MMR-deficient crypts is significantly higher in Lynch syndrome individuals compared to Lynch-like individuals, which matches the findings in [80].

This is because, in Lynch syndrome, a germline variant in one allele of the MMR gene is already present. A single additional somatic event therefore suffices for MMR deficiency, which can thus be acquired earlier in life.

Further, we compared the APC-/- crypt evolution of a typical FAP patient with a sporadic case without a germline variant in APC for all crypts. We used the parameter setting given in Table 3, except for nhs(APC) = 600. We changed the number of hotspot base pairs in the FAP case due to the fact that the germline variants are not equally distributed among the base pairs of the APC gene, as described in Section 3.2.1.

With the given parameter set, our model simulations yield between 10^4 and 10^5 APC-/- crypts, which is below the estimates calculated from the literature (see Section 3.2.1). The time evolution of the number of crypts is shown in Fig 13. In the future, age-dependent data and further measurements are needed to adapt the parameters accordingly.

Fig 13. Comparison of APC-/- crypts in the sporadic case and in FAP individuals, where we changed the initial value of the dynamical system as well as nhs(APC) = 600 for FAP.


Our simulation results yield numbers below estimates found in the literature [78–80]. With improved measurements, future work will adapt the parameters accordingly.

5 Discussion

We presented a mathematical model for the multiple pathways of colorectal carcinogenesis based on a dynamical system with Kronecker structure, which models the number of colorectal crypts being present in different genotypic states.

The modeling approach consists of different model components for independent and dependent mutational processes. Although the Cancer Dependency Map [81] provides a great resource and extensive information about gene dependencies, data for specific medical contexts are scarce. Thus, the first model component is neutral and starts with the artificial assumption of complete independence. The process of adaptation to known dependencies is illustrated in our example of Lynch syndrome carcinogenesis.

Mathematically, the independence is represented by building mutation graphs for all genes individually and combining them using the Cartesian graph product. This means that the matrix of the corresponding model component can be obtained by combining the adjacency matrices using the Kronecker sum. The use of the Cartesian graph product is based on three assumptions: 1) the genotypic states in the combined graph are exactly the combination of the mutation status of the individual genes. This is a natural choice and not a limitation of the model. If there were additional genotypic states which should be considered, then they would be included in the individual genes already. 2) There is only one mutation at any point in time. However, simultaneous mutations can be included explicitly in the model. This is for example already done in the case of MLH1 and CTNNB1. 3) The mutations considered in this model component are independent of each other. This is true for those mutations with data suggesting independence or due to lack of data indicating dependency. However, if there are data suggesting any dependency, this is considered in other model components.

The model includes further components representing specific correlations and dependencies of genetic events which are chosen in concordance with existing medical hypotheses and data. The corresponding matrices again have a Kronecker structure. Further, all matrices are combined in an additive way which eases the analysis of the individual effects on the overall model solution. In addition, if further medical hypotheses and data are available, it is straightforward to include further mutation dependencies in the model.

As an example, we focused on the evolution of key genotypic states occurring in Lynch syndrome, the most common inherited CRC syndrome: alterations in the MMR genes (with focus on MLH1 and MSH2), CTNNB1, APC, KRAS and TP53. There might be other driver mutations in Lynch syndrome-associated colorectal carcinogenesis for which empirical data are scarce; these mutations are currently not covered in our Lynch syndrome example. Due to the general structure of the model, other driver mutations could be considered in the future.

In order to apply the modeling approach to Lynch syndrome carcinogenesis, we assume gene-dependent mutation and LOH event rates meaning that the mutation rate of a gene is proportional to the length of the gene and the total number of mutations occurring in a cell during cell division. As there are multiple cells within a crypt each having an individual cell cycle, it takes some time until the mutation is present in the whole crypt, a process called fixation. Further, a mutation could be washed out of the crypt, if it is not advantageous enough for fixation to occur. Thus, we assume that the mutation rate of a gene in a crypt also depends on a fixation tendency of the specific genetic event. The edge weights in the graph representation correspond to the mutation rates between those genotypic states of crypts, where the mutation rates are computed based on the described assumptions.

By this choice of parameters, we were able to obtain simulation results which are in concordance with clinical observations. This includes the number of crypts in a specific genotypic state, like MMR-deficient crypts which are early precursors in Lynch syndrome carcinogenesis [80]. Further, we analyzed the influence of variants in different MMR genes, here for MLH1 and MSH2 as an example, leading to differences in numbers of crypts in specific states. This was recently observed in clinical data [25] suggesting adaptation of Lynch syndrome surveillance guidelines based on MMR gene variants. Here, rigorous analysis of the impact of MMR gene variants, considering also other MMR genes, and other molecular differences is subject of future work.

We are fully aware that our simulation results depend on specific a priori assumptions. Moreover, our model is deterministic, so options for assessing robustness are limited and mainly based on parameter variations. Developing stochastic modeling approaches is therefore desirable to reflect natural cancer evolution more faithfully, including random events and the spontaneous disappearance of precancerous and potentially even cancerous lesions.
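The following sketch illustrates what robustness assessment by parameter variation can look like for a deterministic linear model. It varies one rate of a toy three-state chain (no hit, one hit, two hits) and reports how the two-hit fraction responds. The chain, the baseline rates and the function names are hypothetical. The solution uses the matrix exponential from `scipy.linalg.expm`.

```python
import numpy as np
from scipy.linalg import expm


def chain_generator(r1: float, r2: float) -> np.ndarray:
    """Generator of the toy chain 0 -> 1 -> 2 in column convention (dx/dt = A @ x).

    A[j, i] is the rate from state i to state j. Each column sums to zero,
    so the total number of crypts is conserved.
    """
    return np.array([
        [-r1, 0.0, 0.0],
        [r1, -r2, 0.0],
        [0.0, r2, 0.0],
    ])


def two_hit_fraction(r1: float, r2: float, t: float) -> float:
    """Fraction of crypts in the two-hit state at time t, starting all in state 0."""
    x0 = np.array([1.0, 0.0, 0.0])
    return float((expm(chain_generator(r1, r2) * t) @ x0)[2])


# One-at-a-time variation of r1 around a hypothetical baseline (rates per year).
BASE_R1, BASE_R2, T = 0.02, 0.02, 50.0
sweep = {f: two_hit_fraction(f * BASE_R1, BASE_R2, T) for f in (0.5, 1.0, 1.5)}
for factor, frac in sweep.items():
    print(f"r1 x {factor}: two-hit fraction = {frac:.4f}")
```

When both rates are equal to r, the two-hit fraction has the closed form 1 - exp(-r t)(1 + r t). This provides a quick sanity check of the numerical solution.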

We analyzed the proportion of MMR-deficient and MMR-proficient crypts showing APC inactivation as a first indicator of the distribution among the three currently hypothesized pathways of carcinogenesis in individuals with Lynch syndrome. This proportion showed good concordance with current clinical observations [22]. Future studies will analyze and model this aspect more systematically.

The model can easily be adapted to other types of carcinogenesis, such as sporadic MMR-deficient cancers, Lynch-like MMR-deficient cancers, other hereditary CRCs like FAP, and microsatellite-stable CRCs.

It is important to note that the modeling approach itself is independent of the specific parameter values. Different assumptions about the mutation rates of individual genes can therefore be used where appropriate for another carcinogenesis scenario. Likewise, alternative assumptions about Lynch syndrome carcinogenesis, e.g., the inclusion of the ll states or dominant-negative effects, can be accounted for by adapting parameter values.

In principle, the model structure can be applied to other organs by modifying the definitions of the mutation probabilities according to the underlying cell structure and by incorporating different genes with appropriate predominant genetic effects. This will be the subject of further investigation. In the presented example, the model components are based on individual genes and gene-specific aspects. That is, we consider genes individually rather than treating their signaling pathways as entities. In general, however, the model components can also represent signaling pathways and the influence of their alterations.

In summary, we model carcinogenesis on the basis of the number of crypts present in specific genotypic states. These states can be aligned to clinically defined stages such as early adenoma. We are, however, fully aware that the congruence between clinical and molecular definitions will be limited, both by the dynamics of cancer evolution and by the limited availability of comprehensive data. Data limitations also concern overdiagnosis and disappearing lesions. From a mathematical point of view, it is straightforward to include spontaneous disappearance of lesions in the modeling approach, as shown in the manuscript. However, there are currently not enough prospective data to estimate or learn the necessary parameters, e.g., the probability of spontaneous crypt loss for each mutation status. For this reason, we chose a simpler model that jointly captures proliferation and disappearance through the self-loops in the graph, which greatly reduces the number of parameters to be determined. Once more molecular data covering all possibly relevant genes are available, comparing the model with these data will allow the parameters that cannot yet be measured to be learned. In this context, we emphasize that the “linear model” used in the present approach refers only to the mathematical framework of linear differential equations. It does not describe the evolutionary process, which we regard as a parallel, competitive process of mutational events, persistence and regression of lesions.
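The linear-ODE framework and the self-loops can be illustrated with two toy two-state gene graphs combined by the Kronecker sum, which gives the matrix of their Cartesian graph product. This sketch makes several assumptions. It uses a column convention (dx/dt = A x, where entry A[j, i] is the rate from state i to state j). It places a hypothetical net growth rate of mutated crypts on the self-loop, and it uses placeholder parameters. The sketch covers only independent gene components, not the mutational dependencies of the full model. For such independent components the solution factorises because the two terms of the Kronecker sum commute: exp(A ⊕ B) = exp(A) ⊗ exp(B). The last lines compute a toy analogue of the proportion of MMR-deficient crypts among APC-mutated crypts discussed above, with "mut" in the MMR gene standing in for MMR deficiency as a simplification.

```python
import numpy as np
from scipy.linalg import expm


def gene_matrix(mu: float, s: float) -> np.ndarray:
    """Two-state gene graph wt -> mut in column convention (dx/dt = M @ x).

    mu: hypothetical mutation rate, the edge weight wt -> mut
    s:  hypothetical net growth rate of mutated crypts, the self-loop weight
    """
    return np.array([[-mu, 0.0],
                     [mu, s]])


def kron_sum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Kronecker sum a (+) b = a (x) I + I (x) b, the matrix of the Cartesian product graph."""
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)


# Placeholder rates per year, not the values used in the paper.
M_mmr = gene_matrix(mu=1e-3, s=0.0)
M_apc = gene_matrix(mu=5e-3, s=0.05)
A = kron_sum(M_mmr, M_apc)  # state order (MMR, APC): wt/wt, wt/mut, mut/wt, mut/mut

x0 = np.array([1e7, 0.0, 0.0, 0.0])  # hypothetical number of normal crypts at t = 0
t = 40.0
x_t = expm(A * t) @ x0  # solution of the linear system dx/dt = A @ x

# Independent genes: the two Kronecker-sum terms commute, so the flow factorises.
x_factored = np.kron(expm(M_mmr * t), expm(M_apc * t)) @ x0
print(np.round(x_t, 1), np.allclose(x_t, x_factored))

# Share of APC-mutated crypts that are also MMR-mutated (toy stand-in for MMR deficiency).
share_mmr_deficient = x_t[3] / (x_t[1] + x_t[3])
print(f"{share_mmr_deficient:.3f}")
```

In this sketch the growth terms of the two genes simply add on the diagonal of the combined state. Dependent effects, such as an increased point mutation rate after MMR deficiency, would require additional Kronecker-product terms beyond this independent part.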

Furthermore, the modular structure of the model allows further states, e.g., death or disappearance states, to be included in a natural way. The same applies to external factors, such as effects of the microenvironment or the role of the immune system. Because the model is flexible with respect to mutational events and their consequences, it can also encode specific assumptions about tumor-immune cell interactions, for example a higher immune visibility of MMR-deficient cell clones with a high mutation load. This is part of future work.
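One natural way to add a death or disappearance state is to append an absorbing state and route a per-state loss rate into it. The sketch below does this under the same column convention as above (dx/dt = A x). The helper `add_disappearance_state` and all rates are hypothetical. If the original matrix conserves the number of crypts (zero column sums), the augmented matrix does as well, so disappeared crypts remain accounted for.

```python
import numpy as np
from scipy.linalg import expm


def add_disappearance_state(a: np.ndarray, loss: np.ndarray) -> np.ndarray:
    """Append an absorbing 'disappeared' state to a generator a (column convention).

    loss[i] is a hypothetical rate at which crypts in state i vanish. That flow is
    subtracted from the diagonal and routed into the new last row.
    """
    n = a.shape[0]
    aug = np.zeros((n + 1, n + 1))
    aug[:n, :n] = a - np.diag(loss)
    aug[n, :n] = loss
    return aug


# Toy two-state model wt -> mut with placeholder rates (per year).
mu = 0.01
a = np.array([[-mu, 0.0],
              [mu, 0.0]])
loss = np.array([0.0, 0.1])  # assumption: only mutated crypts can regress
a_aug = add_disappearance_state(a, loss)

x0 = np.array([1.0, 0.0, 0.0])  # all crypts start in the wild-type state
x_t = expm(a_aug * 30.0) @ x0   # state fractions after 30 years
print(x_t, x_t.sum())
```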

Supporting information

S1 Appendix. Mathematical background.

This includes basic notions from graph theory, graph products, the Kronecker sum of matrices, linear dynamical systems and their solution.

(PDF)

Data Availability

All relevant data are within the manuscript and its Supporting information files.

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians. 2018;68(6):394–424.
  • 2. Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nature Medicine. 2015;21(11):1350–1356. 10.1038/nm.3967
  • 3. Klimstra D, Klöppel G, La Rosa S, Rindi G. Classification of neuroendocrine neoplasms of the digestive system. WHO Classification of tumours, 5th Edition Digestive system tumours. 2019; p. 16–19.
  • 4. Jasperson KW, Tuohy TM, Neklason DW, Burt RW. Hereditary and familial colon cancer. Gastroenterology. 2010;138(6):2044–2058. 10.1053/j.gastro.2010.01.054
  • 5. Boland CR, Goel A. Microsatellite Instability in Colorectal Cancer. Gastroenterology. 2010;138(6):2073–2087.e3. 10.1053/j.gastro.2009.12.064
  • 6. Carethers JM. Differentiating Lynch-like from Lynch syndrome. Gastroenterology. 2014;146(3):602–604. 10.1053/j.gastro.2014.01.041
  • 7. Vogelstein B, Kinzler KW. The multistep nature of cancer. Trends in genetics. 1993;9(4):138–141. 10.1016/0168-9525(93)90209-Z
  • 8. Nielsen M, Aretz S. Familial Adenomatous Polyposis or APC-Associated Polyposis. In: Valle L, Gruber SB, Capellá G, editors. Hereditary Colorectal Cancer: Genetic Basis and Clinical Implications. Cham: Springer International Publishing; 2018. p. 99–111.
  • 9. Wunderlich V. Early references to the mutational origin of cancer. International journal of epidemiology. 2006;36(1):246–247. 10.1093/ije/dyl272
  • 10. Edler L, Kopp-Schneider A. Origins of the mutational origin of cancer. International journal of epidemiology. 2005;34(5):1168–1170. 10.1093/ije/dyi134
  • 11. Nowell P, Hungerford D. A minute chromosome in human chronic granulocytic leukemia. Landmarks in Medical Genetics: Classic Papers with Commentaries. 2004;132(51):103.
  • 12. Vogelstein B, Fearon ER, Hamilton SR, Kern SE, Preisinger AC, Leppert M, et al. Genetic alterations during colorectal-tumor development. New England Journal of Medicine. 1988;319(9):525–532. 10.1056/NEJM198809013190901
  • 13. Tomasetti C, Marchionni L, Nowak MA, Parmigiani G, Vogelstein B. Only three driver gene mutations are required for the development of lung and colorectal cancers. Proceedings of the National Academy of Sciences. 2015;112(1):118–123. 10.1073/pnas.1421839112
  • 14. Kloor M, von Knebel Doeberitz M. The Immune Biology of Microsatellite-Unstable Cancer. Trends in Cancer. 2016;2(3):121–133. 10.1016/j.trecan.2016.02.004
  • 15. Seppälä TT, Ahadova A, Dominguez-Valentin M, Macrae F, Evans DG, Therkildsen C, et al. Lack of association between screening interval and cancer stage in Lynch syndrome may be accounted for by over-diagnosis: a prospective Lynch syndrome database report. Hereditary Cancer in Clinical Practice. 2019;17(1). 10.1186/s13053-019-0106-8
  • 16. Ahadova A, von Knebel Doeberitz M, Bläker H, Kloor M. CTNNB1-mutant colorectal carcinomas with immediate invasive growth: a model of interval cancers in Lynch syndrome. Familial cancer. 2016;15(4):579–586. 10.1007/s10689-016-9899-z
  • 17. Møller P, Seppälä T, Bernstein I, Holinski-Feder E, Sala P, Evans DG, et al. Cancer risk and survival in path_MMR carriers by gene and gender up to 75 years of age: a report from the Prospective Lynch Syndrome Database. Gut. 2017. 10.1136/gutjnl-2017-314057
  • 18. Robert Koch-Institut. Cancer in Germany 2011/2012; 2016.
  • 19. Kloor M, von Knebel Doeberitz M, Gebert JF. Molecular testing for microsatellite instability and its value in tumor characterization. Expert Review of Molecular Diagnostics. 2005;5(4):599–611. 10.1586/14737159.5.4.599
  • 20. de la Chapelle A. Microsatellite instability. New England Journal of Medicine. 2003;349(3):209–210. 10.1056/NEJMp038099
  • 21. Kolodner R. Biochemistry and genetics of eukaryotic mismatch repair. Genes & development. 1996;10(12):1433–1442. 10.1101/gad.10.12.1433
  • 22. Ahadova A, Gallon R, Gebert J, Ballhausen A, Endris V, Kirchner M, et al. Three molecular pathways model colorectal carcinogenesis in Lynch syndrome. International journal of cancer. 2018;143(1):139–150. 10.1002/ijc.31300
  • 23. Ahadova A, Seppälä TT, Engel C, Gallon R, Burn J, Holinski-Feder E, et al. The “unnatural” history of colorectal cancer in Lynch syndrome: Lessons from colonoscopy surveillance. International Journal of Cancer. 2020;148(4):800–811. 10.1002/ijc.33224
  • 24. Engel C, Vasen HF, Seppälä T, Aretz S, Bigirwamungu-Bargeman M, de Boer SY, et al. No difference in colorectal Cancer incidence or stage at detection by colonoscopy among 3 countries with different lynch syndrome surveillance policies. Gastroenterology. 2018;155(5):1400–1409. 10.1053/j.gastro.2018.07.030
  • 25. Engel C, Ahadova A, Seppälä TT, Aretz S, Bigirwamungu-Bargeman M, Bläker H, et al. Associations of Pathogenic Variants in MLH1, MSH2, and MSH6 With Risk of Colorectal Adenomas and Tumors and With Somatic Mutations in Patients With Lynch Syndrome. Gastroenterology. 2020;158(5):1326–1333. 10.1053/j.gastro.2019.12.032
  • 26. van Leeuwen IMM, Byrne HM, Jensen OE, King JR. Crypt dynamics and colorectal cancer: advances in mathematical modelling. Cell Proliferation. 2006;39(3):157–181. 10.1111/j.1365-2184.2006.00378.x
  • 27. Cooper GM. The Cell. Eighth ed. Oxford University Press; 2019. Available from: https://global.oup.com/academic/product/the-cell-9781605358635.
  • 28. Nicholson AM, Olpe C, Hoyle A, Thorsen AS, Rus T, Colombé M, et al. Fixation and Spread of Somatic Mutations in Adult Human Colonic Epithelium. Cell Stem Cell. 2018;22(6):909–918.e8. 10.1016/j.stem.2018.04.020
  • 29. Haupt S, Gleim N, Ahadova A, Bläker H, von Knebel Doeberitz M, Kloor M, et al. Computational model investigates the evolution of colonic crypts during Lynch syndrome carcinogenesis. bioRxiv. 2020. 10.1101/2020.11.15.383323
  • 30. Armitage P, Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis. British journal of cancer. 1954;8(1):1. 10.1038/bjc.1954.1
  • 31. Armitage P, Doll R. A two-stage theory of carcinogenesis in relation to the age distribution of human cancer. British journal of cancer. 1957;11(2):161. 10.1038/bjc.1957.22
  • 32. Kendall DG. Birth-and-death processes, and the theory of carcinogenesis. Biometrika. 1960;47(1/2):13–21. 10.2307/2332953
  • 33. Serio G. Two-stage stochastic model for carcinogenesis with time-dependent parameters. Statistics & Probability Letters. 1984;2(2):95–103. 10.1016/0167-7152(84)90057-9
  • 34. Tan WY, Brown CC. A nonhomogeneous two-stage model of carcinogenesis. Mathematical and Computer Modelling. 1988;11:445–448. 10.1016/0895-7177(88)90531-6
  • 35. Tan WY, Hanin LG. Handbook of cancer models with applications. vol. 9. World Scientific; 2008.
  • 36. Binder H, Hopp L, Schweiger MR, Hoffmann S, Jühling F, Kerick M, et al. Genomic and transcriptomic heterogeneity of colorectal tumours arising in Lynch syndrome. The Journal of pathology. 2017;243(2):242–254. 10.1002/path.4948
  • 37. Baker AM, Cereser B, Melton S, Fletcher AG, Rodriguez-Justo M, Tadrous PJ, et al. Quantification of crypt and stem cell evolution in the normal and neoplastic human colon. Cell reports. 2014;8(4):940–947. 10.1016/j.celrep.2014.07.019
  • 38. Baker AM, Gabbutt C, Williams MJ, Cereser B, Jawad N, Rodriguez-Justo M, et al. Crypt fusion as a homeostatic mechanism in the human colon. Gut. 2019; p. gutjnl–2018–317540. 10.1136/gutjnl-2018-317540
  • 39. Desper R, Jiang F, Kallioniemi OP, Moch H, Papadimitriou CH, Schäffer AA. Inferring tree models for oncogenesis from comparative genome hybridization data. Journal of computational biology. 1999;6(1):37–51. 10.1089/cmb.1999.6.37
  • 40. Gerstung M, Baudis M, Moch H, Beerenwinkel N. Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics. 2009;25(21):2809–2815. 10.1093/bioinformatics/btp505
  • 41. Woerner SM, Gebert J, Yuan YP, Sutter C, Ridder R, Bork P, et al. Systematic identification of genes with coding microsatellites mutated in DNA mismatch repair-deficient cancer cells. International journal of cancer. 2001;93(1):12–19. 10.1002/ijc.1299
  • 42. Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, et al. The Evolutionary History of 2,658 Cancers. Nature. 2020;578(7793):122–128. 10.1038/s41586-019-1907-7
  • 43. Mitchell TJ, Turajlic S, Rowan A, Nicol D, Farmery JHR, O’Brien T, et al. Timing the landmark events in the evolution of clear cell renal cell cancer: TRACERx renal. Cell. 2018;173(3):611–623. 10.1016/j.cell.2018.02.020
  • 44. Burini D, Angelis E, Lachowicz M. A Continuous–Time Markov Chain Modeling Cancer–Immune System Interactions. Communications in Applied and Industrial Mathematics. 2018;9:106–118. 10.2478/caim-2018-0018
  • 45. Lakatos E, Williams MJ, Schenck RO, Cross WCH, Househam J, Zapata L, et al. Evolutionary dynamics of neoantigens in growing tumors. Nature Genetics. 2020;52(10):1057–1066. 10.1038/s41588-020-0687-1
  • 46. Ballhausen A, Przybilla MJ, Jendrusch M, Haupt S, Pfaffendorf E, Seidler F, et al. The shared frameshift mutation landscape of microsatellite-unstable cancers suggests immunoediting during tumor evolution. Nature Communications. 2020;11(1). 10.1038/s41467-020-18514-5
  • 47. Thiis-Evensen E, Hoff GS, Sauar J, Majak BM, Vatn MH. The effect of attending a flexible sigmoidoscopic screening program on the prevalence of colorectal adenomas at 13-year follow-up. The American journal of gastroenterology. 2001;96(6):1901–1907. 10.1111/j.1572-0241.2001.03891.x
  • 48. Komarova NL, Lengauer C, Vogelstein B, Nowak MA. Dynamics of genetic instability in sporadic and familial colorectal cancer. Cancer biology & therapy. 2002;1(6):685–692. 10.4161/cbt.321
  • 49. Ashkenazi R, Gentry SN, Jackson TL. Pathways to Tumorigenesis—Modeling Mutation Acquisition in Stem Cells and Their Progeny. Neoplasia. 2008;10(11):1170–IN6. 10.1593/neo.08572
  • 50. Liu Z, Chen J, Pang J, Bi P, Ruan S. Modeling and Analysis of a Nonlinear Age-Structured Model for Tumor Cell Populations with Quiescence. Journal of Nonlinear Science. 2018; p. 1–29.
  • 51. Iwasa Y, Michor F, Nowak MA. Stochastic tunnels in evolutionary dynamics. Genetics. 2004;166(3):1571–1579. 10.1534/genetics.166.3.1571
  • 52. Nowak MA, Komarova NL, Sengupta A, Jallepalli PV, Shih IM, Vogelstein B, et al. The role of chromosomal instability in tumor initiation. Proceedings of the National Academy of Sciences. 2002;99(25):16226–16231. 10.1073/pnas.202617399
  • 53. Naxerova K, Reiter JG, Brachtel E, Lennerz JK, Van De Wetering M, Rowan A, et al. Origins of lymphatic and distant metastases in human colorectal cancer. Science. 2017;357(6346):55–60. 10.1126/science.aai8515
  • 54. Turajlic S, McGranahan N, Swanton C. Inferring mutational timing and reconstructing tumour evolutionary histories. Biochimica et Biophysica Acta (BBA)-Reviews on Cancer. 2015;1855(2):264–275. 10.1016/j.bbcan.2015.03.005
  • 55. Beerenwinkel N, Rahnenführer J, Däumer M, Hoffmann D, Kaiser R, Selbig J, et al. Learning multiple evolutionary pathways from cross-sectional data. Journal of computational biology. 2005;12(6):584–598. 10.1089/cmb.2005.12.584
  • 56. Chen H, Zhang F. The expected hitting times for finite Markov chains. Linear Algebra and its Applications. 2008;428(11-12):2730–2749. 10.1016/j.laa.2008.01.003
  • 57. Buckley JJ. Fuzzy Markov Chains. In: Buckley JJ, editor. Fuzzy Probabilities and Fuzzy Sets for Web Planning. Studies in Fuzziness and Soft Computing. Berlin, Heidelberg: Springer; 2004. p. 35–43.
  • 58. Komarova NL, Sengupta A, Nowak MA. Mutation–selection networks of cancer initiation: tumor suppressor genes and chromosomal instability. Journal of theoretical biology. 2003;223(4):433–450. 10.1016/S0022-5193(03)00120-6
  • 59. Paterson C, Clevers H, Bozic I. Mathematical Model of Colorectal Cancer Initiation. Proceedings of the National Academy of Sciences. 2020. 10.1073/pnas.2003771117
  • 60. Van Leeuwen IM, Edwards CM, Ilyas M, Byrne HM. Towards a multiscale model of colorectal cancer. World journal of gastroenterology: WJG. 2007;13(9):1399. 10.3748/wjg.v13.i9.1399
  • 61. Arnold A, Tronser M, Sers C, Ahadova A, Endris V, Mamlouk S, et al. The majority of β-catenin mutations in colorectal cancer is homozygous. BMC Cancer. 2020;20(1). 10.1186/s12885-020-07537-2
  • 62. Huels DJ, Ridgway RA, Radulescu S, Leushacke M, Campbell AD, Biswas S, et al. E-Cadherin Can Limit the Transforming Properties of Activating β-Catenin Mutations. The EMBO journal. 2015;34(18):2321–2333. 10.15252/embj.201591739
  • 63. Knudson AG. Mutation and cancer: statistical study of retinoblastoma. Proceedings of the National Academy of Sciences. 1971;68(4):820–823. 10.1073/pnas.68.4.820
  • 64. Dihlmann S, Gebert J, Siermann A, Herfarth C, von Knebel Doeberitz M. Dominant negative effect of the APC1309 mutation: a possible explanation for genotype-phenotype correlations in familial adenomatous polyposis. Cancer Res. 1999;59(8):1857–1860.
  • 65. Werner B, Case J, Williams MJ, Chkhaidze K, Temko D, Fernández-Mateos J, et al. Measuring Single Cell Divisions in Human Tissues from Multi-Region Sequencing Data. Nature Communications. 2020;11(1):1035. 10.1038/s41467-020-14844-6
  • 66. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, et al. Reference Sequence (RefSeq) Database at NCBI: Current Status, Taxonomic Expansion, and Functional Annotation. Nucleic Acids Research. 2016;44(D1):D733–745. 10.1093/nar/gkv1189
  • 67. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery. 2012;2(5):401–404. 10.1158/2159-8290.CD-12-0095
  • 68. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Science Signaling. 2013;6(269):pl1–pl1. 10.1126/scisignal.2004088
  • 69. Porkka N, Valo S, Nieminen TT, Olkinuora A, Mäki-Nevala S, Eldfors S, et al. Sequencing of Lynch Syndrome Tumors Reveals the Importance of Epigenetic Alterations. Oncotarget. 2017;8(64):108020–108030. 10.18632/oncotarget.22445
  • 70. Galeota-Sprung B, Guindon B, Sniegowski P. The Fitness Cost of Mismatch Repair Mutators in Saccharomyces Cerevisiae: Partitioning the Mutational Load. Heredity. 2020;124(1):50–61. 10.1038/s41437-019-0267-2
  • 71. Baker AM, Graham TA. Quantifying human intestinal stem cell and crypt dynamics: the implications for cancer screening and prevention. Expert Review of Gastroenterology & Hepatology. 2016;10(3):277–279. 10.1586/17474124.2016.1134314
  • 72. Kloor M, Huth C, Voigt AY, Benner A, Schirmacher P, von Knebel Doeberitz M, et al. Prevalence of mismatch repair-deficient crypt foci in Lynch syndrome: a pathological study. The Lancet Oncology. 2012;13(6):598–606. 10.1016/S1470-2045(12)70109-2
  • 73. Hounnou G, Destrieux C. Anatomical study of the length of the human intestine. Surgical and radiologic anatomy. 2002;24(5):290–294. 10.1007/s00276-002-0057-y
  • 74. Leiserson MD, Wu HT, Vandin F, Raphael BJ. CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer. Genome Biology. 2015;16(1):160. 10.1186/s13059-015-0700-7
  • 75. Nagase H, Nakamura Y. Mutations of the APC (adenomatous polyposis coli) gene. Human mutation. 1993;2(6):425–434. 10.1002/humu.1380020602
  • 76. Rashid M, Fischer A, Wilson CH, Tiffen J, Rust AG, Stevens P, et al. Adenoma development in familial adenomatous polyposis and MUTYH-associated polyposis: somatic landscape and driver genes. The Journal of pathology. 2016;238(1):98–108. 10.1002/path.4643
  • 77. Gryfe R. Inherited colorectal cancer syndromes. Clinics in colon and rectal surgery. 2009;22(4):198. 10.1055/s-0029-1242459
  • 78. Kinzler KW, Vogelstein B. Lessons from hereditary colorectal cancer. Cell. 1996;87(2):159–170. 10.1016/S0092-8674(00)81333-1
  • 79. Goldstein NS, Bhanot P, Odish E, Hunter S. Hyperplastic-like Colon Polyps That Preceded Microsatellite-Unstable Adenocarcinomas. American Journal of Clinical Pathology. 2003;119(6):778–796. 10.1309/DRFQ0WFUF1G13CTK
  • 80. Staffa L, Echterdiek F, Nelius N, Benner A, Werft W, Lahrmann B, et al. Mismatch repair-deficient crypt foci in Lynch syndrome–molecular alterations and association with clinical parameters. PLoS One. 2015;10(3):e0121980. 10.1371/journal.pone.0121980
  • 81. Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nature Genetics. 2017;49(12):1779–1784. 10.1038/ng.3984
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008970.r001

Decision Letter 0

Natalia L Komarova, Jing Chen

18 Nov 2020

Dear Mrs Haupt,

Thank you very much for submitting your manuscript "Mathematical Modeling of Multiple Pathways in Colorectal Carcinogenesis using Dynamical Systems with Kronecker Structure" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Jing Chen

Guest Editor

PLOS Computational Biology

Natalia Komarova

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Below, I comment on the starting points/assumptions for the reasoning and leave the correctness of the specific mathematics and its references to a professional statistical expert.

The paper is very difficult to read, partly because of a lack of clear organization. The definitions of the terms used are not presented distinctly to the reader. Neither are the basic conceptual understanding of what is meant by ‘cancer’ (which seemingly is confused with ‘tumor’ in parts of the mns) or the basic assumptions underlying the statistics. I also do not find the obligatory section discussing the impact of these assumptions or their violation. Deep inside the text I find three key factors mentioned:

1. The order in which mutations are accumulated is ignored

2. The edge connectivity assumption states that no two mutations in different genes can occur at the exactly same point in time

3. The third assumption entails that a mutation in one gene does not change the mutation probability in another gene.

The first of these seemingly conflicts with the basic understanding, mentioned initially, that the order of the events is of interest. The second is partly invalid, and the third is obviously wrong. If an initial mutation increases the proliferation rate, the probability for one of the cells carrying the mutation to have a second hit is, to me, obviously increased. This is what we teach students when explaining carcinogenesis, and it is why the order of mutations is believed to be significant. I do see that there may be ways to read the paper in which these arguments are not contradictory, but a scientific paper should not be a text that must be read in good faith to be understood correctly.

The shifts back and forth between considering cancer/tumors in general and MMR-deficient tumours specifically have, to me, the flavor of a paper explaining something to the general public with examples, which is not what is expected in a scientific paper. These shifts make it very difficult to verify the relationships between the assumptions/input to the calculations and the results.

The introduction mentions the immune system counteracting cancer development but does not refer to the growing knowledge of ‘overdiagnosis’: established cancer may stop growing and/or disappear. By and large, the mns seemingly considers cancer the result of a process having some probability to occur. It does not consider the common understanding that these biological events are much more frequent than the end result, cancer, indicates, because opposing systems remove the mutated cells. This is briefly mentioned but, to my reading, not discussed as a major factor. When considering Lynch syndrome, this is important because the initiating factor (MMR deficiency) may, and will, constantly impose new mutations all along the way. These mutations may both block the effect of the first mutations and trigger new biological systems, with the effect of a Darwinian reduction in the fitness of the growing cells/clone. To me this is an element of probability, and one that seemingly is denied in the assumptions:

“Most of the cancers in the general population occur by chance. These cancers are called sporadic. However, in some families, certain types of cancer appear more frequently. This is either a familial or a hereditary form of cancer…. From a modeling point of view, the advantage of focusing on hereditary tumors is that there are clearly defined molecular events determining the onset of the disease and thereby representing a known mechanism underlying carcinogenesis.”

To this referee, these assumptions are questionable. I would rather say that any individual has a large set of inherent and acquired probabilities for biological mechanisms going wrong. On this view, the difference between familial/inherited cancers and ‘sporadic’ ones is that familial/inherited cancer cases have different probabilities for some unwanted outcomes in the billions of biological events happening in their cells each second they live. Following such a way of thinking, cancer is close to inevitable if one lives long enough. With increasingly good health and long life, the probability of contracting cancer increases because the incidence of dying before doing so decreases. In such a context, the probability of being born alive and living for many years with a significant failure in one of the important systems for mending errors is about a small brim of errors between what is ‘normal’ and what is compatible with life. Boiling it down to a short analogy: the human body is not a machine, because it has the inherent capability of repairing itself, and to me this basic understanding is not sufficiently visible in the mns.

In sum, the mns was interesting reading to me, but I cannot describe the validity of the conclusions because, to me, the authors do not.

Reviewer #2: This is a solid paper that describes a mathematical approach to deterministic modeling of cancer initiation. I have two concerns about the paper: 1) their model is extremely similar to the recently published Paterson et al. paper (Ref. 37) and 2) many of the parameter values are arbitrarily set, even when estimates are available from the literature.

Regarding issue 1), the authors mention the Paterson et al. paper, but are not clear as to how much of their model is virtually a verbatim copy of the Paterson et al. model. They consider 5 genes, APC, KRAS, TP53, CTNNB1 and "MMR", whereas Paterson et al. consider APC, KRAS and TP53. The graph of genotypes and the possible states for the three genes in common are exactly the same as in Paterson et al, as well as the general approach to calculating gene-specific numbers of driver positions, studying accumulation of mutations in a population of crypts (i.e. disregarding tissue hierarchy), assuming APC and KRAS provide fitness advantage but not TP53 etc. The Paterson et al. model is stochastic, but the authors use a dynamical system approximation to derive analytic results, which is also similar to the present paper. Even though the focus of the manuscript is on Lynch syndrome, the authors should clearly state that the approach they take is very similar to the one of Paterson et al. in many aspects.

My second issue is regarding how parameters were set.

For example, regarding results in Fig 10 the authors state: “Number of MMR-deficient Crypts over the Life of a typical Lynch Syndrome Patient for MLH1 and MSH2. The parameters in the model are set in such a way that the simulation results are in concordance with published data”. Why did the authors choose to set parameter values in this way? Since there are many parameters in the model, how did they choose these specific values? A better approach would have been to estimate all parameters from the literature, whenever possible. For example, b(KRAS) has been estimated to be 0.07/year (Ref. 67), and the authors set it to 0.01. Estimates similarly exist for b(APC) (Ref. 13).

There are also some formatting issues, e.g. eq. (1).

Reviewer #3: The manuscript entitled "Mathematical Modeling of Multiple Pathways in Colorectal Carcinogenesis using Dynamical Systems with Kronecker Structure" by Haupt and colleagues proposes an extensible mathematical model of colorectal carcinogenesis for Lynch syndrome based on the Kronecker sum of the adjacency matrices of mutation graphs. The authors focus on an important problem; however, they hardly put their work into context. Modeling hereditary colorectal cancer in terms of genomic alterations occurring in defined pathways is interesting; however, the enthusiasm for this work is greatly diminished by a few major shortcomings, as discussed below.

Major comments:

1) It is not clear what purpose this whole exercise will serve. What does this model do for a patient? The authors vaguely mention "individual needs for prevention and treatment", but they make no effort to develop this point.

2) The mathematical modeling is very complicated, for sure inaccessible to the (lay) audience working in the cancer research field. The authors should make a substantial effort to make it less obscure.

3) The mathematical framework appears to be "ad hoc", with many assumptions that are not necessarily adequate, justifiable, or justified, so it is hard to appraise how generalizable this approach would be. Even the choice to focus on Lynch syndrome is partial, since this is not the "mainstream" path to carcinogenesis.

4) The paper is, in summary, a very long list of assumptions, some of them questionable from the cancer biology point of view. Some are sound, some very well explained, some not explained that well. An effort should be made to justify all assumptions. That said, how generalizable is this approach if it depends on so many assumptions?

Below follow a few comments to support my review.

Specific comments

A) Abstract:

- The abstract requires major revision: after reading one can only guess what the manuscript is about. It appears it is about tumor evolution in a very specific hereditary syndrome, and that this should be important for developing more rational clinical treatments. Yet, this is not put into context, and the discussion of the mathematical models and their assumptions remains vaguely linked to the problem under investigation.

- The authors, in the abstract and introduction, make claims about "different types of colorectal cancer with individual needs for prevention and treatment." Yet these are not introduced and discussed. Explicitly grounding the work in current colorectal cancer research (and the related knowledge on molecular subtypes, etc...) would be highly beneficial.

- I definitely agree with the idea that "... a single model describing the whole process of carcinogenesis ... is too complex to build". I think that this issue is grounded in the reductionist (and often positivist) approach of most scientific research in this field (and others too, of course). And yes, "... the increasing medical knowledge ..." has made it more evident. Yet, even postulating the possibility (of a unifying theory) in the first place might perhaps be too ambitious. That said, the authors should perhaps consider restructuring the Introduction section to make it more consequential. Discussing mathematical modeling before discussing cancer and its pathogenesis is a little odd, and does not help the readership to follow the narrative. More explicitly: sections 1.1 and 1.2 should go after Section 2, and maybe section 1.3 should come at the beginning.

- Citation 16 might not be the correct one: did the authors mean to cite "doi: 10.1093/bioinformatics/btp505"?

- Lines 109-112: "..., while loss of heterozygosity (LOH) refers to the loss of larger regions of a chromosome, which can result in the deletion of whole genes." -> This is not correct, or well explained. LOH is not COPY NUMBER LOSS or COPY NUMBER ALTERATION... LOH is when one copy of a normally diploid genome is lost at some resolution (a focal loss, or a larger region, or a whole chromosome, it doesn't matter). This can result in loss of function, when coupled with mutations (or other alteration types) in the remaining copy. Please correct.

- Lines 113-119: Discarding 99% of the genome is "old school". Pathogenic non-coding mutations affecting coding genes have been known for years (see the globin genes in thalassemia, for instance), and non-coding genes are becoming more and more relevant. Hence, the following statement should be amended: "As about 99 % of the genome is not translated into proteins, most mutations do not have a direct consequence on cell viability or behavior."

- Lines 120-121: See above, therefore the following statement – "This means that, from all the possible mutations that can occur, we need to identify the relatively few key events which have a functional impact on the cell" – is not entirely correct and should be revised. There might be only a few CODING mutations that might be relevant, however we have no clues about the non-coding domain.

- Lines 123-125: There is also the possibility that a cancer is controlled by non-cell-autonomous mechanisms that are not defined by the combination of mutational events happening in the initiating cell (i.e., affecting oncogenes, tumor suppressors, and the detrimental mutations affecting the fitness of the cancer "to be" cell)... For instance, the role of immune surveillance... etc.

- Lines 126-128: "Different combinations of key mutations result in several distinct pathways." This sentence does not make sense. Signaling pathways exist independently of the mutations in the genes that encode the proteins participating in said pathways. Different combinations of mutations result in aberrations of specific pathways, usually non-redundant ones. Furthermore, pathways don't "arise"; they exist in a normal state, they dynamically regulate various processes in cell and tissue homeostasis, and they are altered in response to genomic hits (whatever their nature is).

- Lines 128-131: The authors should put this "high-dimensional and complex problem" into context. Perhaps they should mention that a tumor harbors hundreds/thousands of mutations, and that they are scattered over many genes/pathways/processes, etc... Perhaps they should also define the "problem". Is the problem making sense of all the possible mutation combinations, or finding the one or few combinations of drivers? If so, for a given cancer? For all cancers in a population? Since "medical interpretability" is also mentioned, are the authors referring to individual patients? If so, it should be explicitly stated and explained.

- Lines 134-136: "Mutations are errors which occur during DNA replication within cell division and which are not repaired by one of the error detection, repair and control systems present in all organisms." It is odd to explain the biochemistry of mutations in details here, after having discussed them before in the manuscript, giving for granted the reader was already well acquainted with the concept.

- Lines 147-152: perhaps when distinguishing hereditary from sporadic cancers the authors could introduce the concept of somatic versus germ-line mutation. The explanation provided is confusing, for instance using the word "genetic" ("The former is due to a combination of genetic and environmental factors...") points to the genetic material, the DNA, but it might also suggest heritability, etc... And so on... Making the distinction between "somatic" and "germ-line" mutation is key when discussing sporadic and hereditary cancers... Especially since somatic mutations are mentioned later on in the paragraph (line 158).

- Lines 168-173: Here the existence of stem cells in the crypts should perhaps be discussed, followed by tissue renewal and why it is important to look at stem cells and hence crypts...

- Lines 180-185: here the authors seem to imply that MMR genes are not to be considered tumor suppressors... If they are not tumor suppressors, what are they? Aren't tumor suppressor genes involved, broadly, in these following processes?

1) Stop cells from dividing; 2) Repair DNA damage; and 3) Start programmed cell death.

So, it appears that MMR genes are bona fide tumor suppressors...

B) Introduction – Section 3.

- Lines 215-217: "... the accumulation of mutations in two different genes of the same cell, ...", perhaps the authors want to say the "OCCURRENCE of mutations", since rarely in the same cell mutations accumulate in the same gene...

- Lines 230-237: About the "Independence of the Processes", it is stated that it "... entails that a mutation in one gene does not change the mutation probability in another gene." Is this assumption reasonable? MMR mutations are exactly the mutations that make other mutations more likely. Similarly, mutations in other tumor suppressor genes are linked to subsequent genomic alterations; therefore, this assumption is not correct. The possibility that this could be useful to create a "baseline" to measure how much this assumption is violated is beside the point... What value is there in measuring the deviation from a "wrong" baseline? The derived model can still be good at describing tumorigenesis. At least the authors could make the argument that this wrong assumption is necessary to start working, and then... maybe down the road they will be able to revisit the whole thing and make better assumptions...

C) Section 4.1

- Lines 313-316: When the authors discuss the "ll" state, they mention that "... in CTNNB1, APC and TP53 damage a cell in such a way that it directly leads to cell death. Thus, there will be no crypt with all cells being in that state. As we model the evolution of genotypic states of crypts, we do not consider the ll states for CTNNB1, APC and TP53." It is not clear whether this assumption is reasonable. Let's take TP53 into consideration. Usually, mutations in this gene are dominant negative, so one hit is enough to have a phenotype, so to speak (despite TP53 being considered a tumor suppressor). So, parsimony in tumor evolution might better explain the absence of the "ll" state for this gene, rather than lethality... Furthermore, it is not clear why "ll" is distinguished from "mm"... Two hits of "m" type can also potentially be lethal, so it is not clear why "ll" is made a special case and not "mm" or "ml" for these genes... This assumption might be justifiable in some ways, and it could even be grounded in tumor biology, but the authors should make a better case for it than they do. They could look into large genomic studies, for instance at the frequencies of double hits in their favourite genes, and perhaps empirically justify their assumption... It is true that TP53 is very rarely mutated twice, etc...

- Lines 319-322: These lines make the assumption above even more confusing: the authors know that TP53 is considered a tumor suppressor, and they also know about the dominant negative effect, yet... While they ignore this effect for APC, they make no comment about TP53... In other words, without some clear, consequential explanations, it might look like the authors "bend" cancer biology as necessary.

Additional minor comments:

1) Abstract: "Like many other tumors, colorectal cancers develop through multiple pathways and different driver mutations." -> I would rephrase this as "... colorectal cancers through ALTERATIONS AFFECTING multiple pathways RESULTING FROM different driver mutations".

2) Abstract: "We suggest a linear autonomous dynamical system modeling the evolution of the different pathways of colorectal carcinogenesis." -> Unclear, please rephrase.

3) Figure 1 would be better explained by labeling the panels and then explaining them accordingly in the corresponding legend.

4) Introduction: references for the Lynch syndrome, MSI, etc should be added.

5) Please check format of references 9 and 10.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Pål Møller

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008970.r003

Decision Letter 1

Natalia L Komarova, Jing Chen

21 Feb 2021

Dear Mrs Haupt,

Thank you very much for submitting your manuscript "Mathematical Modeling of Multiple Pathways in Colorectal Carcinogenesis using Dynamical Systems with Kronecker Structure" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations, especially those about more accurate description of the background and assumptions, and discussions about the limitations of the work.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Jing Chen

Guest Editor

PLOS Computational Biology

Natalia Komarova

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:


Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Firstly, I am grateful for the opportunity to discuss this paper with the authors, and trust them to understand that my expected contribution is to discuss problems, not to give global statements or to promote parts with which I may agree. I also take the opportunity to add that the problems the authors address are well known to me, as are the difficulties of describing them in mathematical models which are as simple as possible (Descartes) but not too simple (Einstein). Now that IT systems have the ability to handle the numerical complexity of the analyses described, the time has come to do so. Since no one has all the knowledge needed to do so alone, each of us will have a different focus. Mine is on the assumptions used in the mathematical algorithms and their consequences.

In general, the authors have responded to my initial comments, but the mns is still very complex and wordy, and I still find problems that need to be clarified. I discuss these below partly in the order spotted when reading the revised mns and partly trying to make a readable narrative, not in the order of importance as is usual, because it is unclear to me to which degree they may modify the results, and identifying the problems may be the only step we can currently agree upon, which in itself is a prerequisite for eventually arriving at a conclusion.

If I had a solution to the problems I describe below, I would have submitted a paper on them myself, but I have not. In part, the solution may be that there is no solution, for which I may refer interested readers to the conflict between the works/conceptions of Newton and Hawking on causation and probabilities (to put it simply: a rare stochastic event does not have a cause). My intention in describing the problems below is to ask for a clearer definition of the terms and assumptions used, and a more distinct discussion of the limitations of the conclusions arrived at.

The definition of ‘cancer’ is still lacking, and combined with the terminology ‘a linear model’ and ‘cause of death’, one gets the understanding that an untreated cancer will cause death. In contrast, the many crypts in the gut are constantly proliferating, and the cells are expelled into the lumen and die. This may be the fate of MMR-deficient crypt cells: growing precancers but not visible tumours because they are, in contrast to adenomas, constantly losing their mass. Also, the immune system is known to recognize MMR-deficient cells as ‘foreign’ and to try to kill them (cf. immunocompetent cell infiltrations, HLA associations, and vaccination attempts). The assumption (Vogelstein) in which we all believed some decades ago was that 1) there is a limited number of precancerous tumour stages and we may prevent cancer if these are removed, 2) the initial cancer is local without spread, and 3) if a manifest cancer is removed before spread, the patient is cured/will not die. The initially observed effect of intervention (colonoscopy) did not meet the first assumption, and this was commonly agreed upon as the basis for shortening the intervals between colonoscopies. To our surprise, shorter intervals between colonoscopies had no effect on, or possibly increased, colon cancer incidence in LS, which is incompatible with the first assumption. And, even more surprisingly, shorter intervals between colonoscopies have not been demonstrated to reduce mortality from colon cancer, which is incompatible with the second assumption. While I do agree that there is no knowledge indicating exactly how to quantify these problems (in sum referred to as ‘overdiagnosis’), to me it should be possible to estimate some parameters to make some examples (often referred to as iteration using statistical models to arrive at the best fit with the observations made).
I do acknowledge that there are too many parameters and too heavy computational procedures to undertake Monte-Carlo-like simulations searching for strata by testing all possible combinations, and this is basically why the statistical models used have been developed to arrive at the most probable overall results, but assuming some (extreme) parameters as a sort of sensitivity analysis to assess the robustness of the results should be doable. As long as this is not done, to me the results amount to ‘assuming this, the results are that, but we don’t know to which degree the assumptions may be true and to which degree they may influence the results’.

As an illustration of the above, the leftmost diagram in Fig 1 is not what Ahadova et al. published, but, in contrast to their attempt to quantify the results, the figure indicates that the bottom pathway starting with an MMR-deficient crypt is dominating. This is consistent with the results in this paper, and should lead to a reconsideration of the assumptions made (iterating). The inherent problem of probabilistic modelling is that each argument must initially (a-priori) be given a value (NULL is not tolerated), and the values must not be too low (a zero a-priori value will give zero as the result, whatever the other arguments are, when calculated as a product), and a very low a-priori value will be a probability ‘dark well’ from which one may never escape. To escape from this, a-priori values are often arbitrarily set to 0.5, and no argument is set lower than 0.1 (cf. the InSiGHT classification system for determining the pathogenicity of the variants discussed in this paper). The resulting probabilities are consequently by definition ‘false’: they are not ‘real’ probabilities but the consequences of the assumptions required to use the algorithms. While the independent mutations assumed in this mns have estimable prevalences, the complex mutations I discuss have such low a-priori probabilities that they may not easily be included in probability algorithms. If these actually are the causes of cancers killing the patients, however, the main results of this paper will not be valid to explain why some patients die and others do not. Again, this is about what is meant by ‘cancer’ and to which degree ‘cancer’ may be a surrogate for ‘death’. What I ask for is not an expansion in words to describe the principles in more detail, but short statement(s) of results when including different a-priori assumptions in the calculations.

The statement “Mutations can either activate oncogenes (called gain-of-function mutations), which normally promote appropriate cell growth and proliferation, or mutations can damage or destroy tumor suppressor genes (called loss-of-function mutations), which normally limit cell growth and proliferation.” is misleading/not true: the consensus view is that oncogene transcription is activated by disabling the normal homeostatic control exerted by suppressor genes, that is, by deranging the suppressor genes so that they lose their ability to suppress oncogene transcription. The oncogenes were denoted oncogenes because of the misconception that mutations in these could increase their transcription, while the opposite is true: oncogenes lose their ability to increase cell division if mutated. The upshot is that mutations in suppressor genes are the main cause of gain of function of oncogenes. The normal homeostatic mechanisms activating oncogenes are intricate, and to the degree these are implicated in carcinogenesis, it may be related to the inflammatory mechanisms associated with increased BMI, which is now increasing the incidence of colorectal and endometrial cancer and which may be partly counteracted by COX inhibitors (aspirin and naproxen). Chronic myeloid leukemia, which is not inherited, is caused by a chimeric de novo gene resulting from a somatic mutation upstream of the functional coding part, deranging the gene's ability to be switched off by the normally controlling suppressor genes. Such mutations increasing the transcription of an oncogene are by and large not compatible with life and not seen in newborns. They should not be mentioned in the current context. Whatever references one may have to the contrary, the old concepts I was taught in medical school were misconceptions blocking the understanding of both normal homeostasis and carcinogenesis. The take-home message for true gain-of-function mutations should be that they are incompatible with life and therefore not inherited.
(A more detailed discussion, including inherited EPCAM tail deletion causing methylation/epigenetic inactivation of MSH2 with the same effect as a deranging variant in the MSH2 coding structure, is outside the scope of this comment.) Inherited cancer is an observed (slightly) reduced capability of de-regulating oncogenes, leading to a slightly increased probability of carcinogenesis by age. In particular, dominantly inherited cancer is incompatible with substantially increased cancer incidence at fertile ages. This is exemplified, but not discussed, when FAP is used as an example: the probability of a single adenoma caused by the CIN pathway becoming a cancer is very low, and the disease is recessively inherited because heterozygotes have a very low probability of producing null-allelic cells that become adenomas, which in turn each have a very low probability of becoming cancers. Discussing FAP and LS together, the former a recessively inherited disorder where the adenoma rarely progresses to cancer and where cancer by and large emerges in the adenomas, and LS where adenomas were assumed to be rare but obligate precursors with a very high risk of becoming cancers, has been a major confounder prohibiting the understanding of LS carcinogenesis. To me, the current mns repeats this confusion, whereas it should distinctly be described as a misunderstanding.

The statements: “Lynch syndrome is caused by an inherited mismatch repair (MMR) gene mutation [4]. Colorectal cancers which develop due to Lynch syndrome therefore are MMR-deficient and show microsatellite instability (MSI) [5].” are not correct and misleading: as mentioned earlier, an inherited variant should not be denoted a ‘mutation’, and the logical connection between the inherited variant causing cancer and the MSI phenotype is an observed association, not necessarily a logical consequence. The word ‘caused’ is misused in the present context of probability calculations: the inherited variant is associated with an increased probability of developing a cancer, and, actually, acquired mutations in the MLH1 gene may cause MSI cancers more frequently than inherited pathogenic variants do. The major cause of an MSI cancer is not an ‘inherited’ variant. The concept of causation is theoretically very difficult and should be avoided in the current context: no single factor is ‘the cause’ of cancer; it is a coincidence of many factors.

This is again implicated in the following statement: “The step-wise hypothesis has been validated subsequently in many independent studies for many different cancer types. Currently, it is expected that a minimum number of three mutation events is required to transform a normal cell into a cancer cell. This hypothesis is called the three strikes hypothesis [13].

Individuals with Lynch syndrome are predisposed to developing certain malignancies with a substantially higher lifetime risk compared to the general population. The most common Lynch syndrome manifestations are colorectal cancer (CRC, 50 % [19] compared to 6 % in the normal population) and endometrial cancer (40–60 % compared to 2.6 % in women without Lynch syndrome) [4, 20]. Further, individuals have an increased lifetime risk for many other types of cancer such as in the stomach, small bowel, brain, skin, pancreas, biliary tract, ovary (only for women) and upper urinary tract [21]. Lynch syndrome carriers are predisposed to develop MSI cancers due to having an inherited pathogenic variant in one allele of the affected MMR genes MLH1 , MSH2 , MSH6 or PMS2 [18] passed down in the family from parent to child. Upon the second somatic hit inactivating the remaining allele, MMR deficiency manifests in the affected cell [17]. DNA replication errors, especially those which occur at repetitive sequences (microsatellites consisting of a consecutive series of identical basepairs) cannot be corrected by the mismatch repair system. MMR deficiency leads to microsatellite instability. MMR deficiency can be an initiating or a secondary event in Lynch syndrome carcinogenesis.
This is reflected by the hypothesis of three pathways responsible for colorectal carcinogenesis in Lynch syndrome [22] (see Fig 1): One pathway of carcinogenesis starts with adenoma formation, then MMR deficiency and cancer outgrowth; the second is initiated by MMR deficiency, then adenoma formation and cancer outgrowth; and the third shows MMR deficiency as initiating event and invasive cancer growth.” In this statement, LS is initially considered one delineated entity, while thereafter it is declared to be four distinct syndromes caused by four different genes when deranged, and EPCAM is not mentioned, although it is another gene causing epigenetic inactivation of MSH2 (the MSH2 gene is not ‘mutated’ when the pathogenic inherited variant is in EPCAM). The four different genes have different capacities to correct replication errors in different structures in the genome. They behave differently in every demonstrated way of measuring their activities and the corresponding consequences when not functioning properly. It is not proper to lump them all together in one group in the current context.

In particular, it seems unlikely to me that the triggering event in colon cancer associated with PMS2 is the PMS2 variant. In contrast to MLH1 and MSH2 carriers, PMS2 colon cancer may follow the CIN/APC pathway, and the effects of intervention indicate that these cancers actually may be prevented by adenomectomy: in Fig 1, PMS2-associated colon cancer may by and large follow the upper pathway (upper left). To what degree the same is true for MSH6 carriers is unclear to me. In my view, the paper is based on observations in carriers of pathogenic MLH1/MSH2 variants, and it should state so both in the title and in the text.

Reading the paper, I get the impression that a ‘linear’ three-step model is commonly accepted as the ordinary carcinogenetic pathway. To me, this is not true. It does not hold for the most frequent cancer in women, breast cancer, nor for the most frequent inherited cancer in women, inherited breast/ovarian cancer. In contrast to colon cancer, where the patient is considered cured if there is no spread at diagnosis and no recurrence after some years, the recurrence rate for breast cancer never flattens out. In particular, BRCA1-associated breast/ovarian cancer has been demonstrated to spread before a tumour is detectable, which has led to the clinical advice of prophylactic oophorectomy and mastectomy in healthy carriers. For these frequent cancers, both sporadic and inherited, the three-step model assumed by the authors is now recognized as false: often, and especially so in BRCA1 carriers, the step causing spread (which should be the biological precursor of death) occurs very early. Breast cancer oncologists traditionally call this ‘dormant breast cancer’. Over-diagnosis in breast cancer screening is commonly acknowledged; the corresponding phenomenon in LS should come as no surprise, and a model to explain inherited cancer in path_MLH1/MSH2 carriers cannot ignore it. The problem with the LS discussion is that, in contrast to breast cancer screening, one assumes that cancer may be prevented by early diagnosis. This never was the goal of breast cancer screening, and it has now been proven not to be true for LS either. Nor is it uniformly accepted to be true for sporadic colon cancer.

In short: the graphical image of path_MLH1/MSH2 carcinogenesis may be considered as opposing vectors in a multi-axis space, where the observed cancer incidence is the summary projection onto one axis and has been demonstrated to be a function of the time between the examinations that detect cancer: over-diagnosis is a time axis in this space. Without a discussion of this, the results presented give limited information.

The above arguments are meant to indicate why a ‘linear model’ is not obviously a valid assumption: over-diagnosis at colonoscopy is a demonstrated problem, and the assumption that LS colorectal cancer is preventable by sufficiently frequent colonoscopy has not been demonstrated to be true, although, in contrast to BRCA1-associated breast/ovarian cancer, early diagnosis has been proven to dramatically reduce mortality.

The arguments above are meant to illustrate that, while the paper gives me the impression that cancer equals death, thereby validating ‘cancer’ as a surrogate endpoint for death, I do not consider this to be so. And, back to the start, the paper still has no definition of ‘cancer’: an adenoma with micro-invasive cell growth may be biologically very different from a metastatic tumour.

In short, I do not consider the generalization of the concepts discussed above to be valid, nor the overall use of the term LS when the paper describes biological findings in MLH1/MSH2 carriers.

Independently of the above, the assumptions “we assume that all genes implement all mutational processes that are independent of each other, which is either due to an independence indicated by data or due to missing medical insight suggesting otherwise

Independence of the Processes. We require that the processes are independent of each other. In the context of the example above, we interpret from the first assumption that all combinations of mutations in the different genes are possible (i.e., there are no mutations that prevent other mutations) and that there are no additional states. This also implies that the order in which mutations are accumulated is ignored.” are wrong.

Mutations as events may be interrelated; in linkage terminology, when estimating the genetic distance between loci, this is denoted ‘interference’. A current example is a SARS-CoV-2 variant with an increased infection rate due to multiple ‘mutations’, for which the ‘missing link’ as an evolutionary process has not been demonstrated. Evolutionary evidence includes copies of large gene sequences (transposable elements). In the current context, an obvious example is the copying of the basic immune system structure into multiple near-identical genetic parts, dispersed and specialized to serve different functions as afferent and efferent components of the immune response, including the response to LS cancers. Such multi-locus, multi-mutation events are well known in cancer; in most cases they may cause cell death, but extremely rarely they may produce viable cells with complex alterations in their genomes. Given the very many cell divisions in MSI crypts in MLH1/MSH2 carriers, cancer may extremely rarely arise from such events, producing a viable cancer cell that is not identified and destroyed by the host immune system. Such a hypothesis is consistent with the current understanding of cancer and evolutionary genetics and in accordance with empirical observations in LS.

In sum, there are arguments to be discussed when interpreting the conclusions. While the results may be correct, in my view the manuscript still does not adequately discuss to what degree the mechanisms described are the ‘causes’ of the cancers observed. Demonstrating that the concepts in the paper are true does not validate that they are the major and most frequent causes of either cancer or death. This is not about the algorithms and results per se, but about how to discuss the assumptions and conclusions in a broader perspective. The manuscript would, in my view, be more valuable, and would not invite the counter-arguments above, if it proactively discussed these points in more detail.

Reviewer #3: The manuscript entitled "Mathematical Modeling of Multiple Pathways in Colorectal Carcinogenesis using Dynamical Systems with Kronecker Structure" by Haupt and colleagues has been extensively revamped and reorganized, incorporating most, if not all, of the suggestions made by this and the other reviewers. The revised manuscript is definitely clearer and easier to read (for instance, there is now an explicit distinction between the assumptions made for the Lynch syndrome case and those that are general, and so on).

That said, the text still needs to be double-checked and proofread, since some sentences are still a little unclear or require revised punctuation. One example is on page 15/55 (in the manuscript version with highlighted changes), lines 341-342: "For other types of cancer [COMMA HERE] or once new medical insights are gathered, they can and should be adapted."

Finally, I think a main point still remains to be addressed (or at least explicitly discussed): mutations and genetic alterations in cancer occur "clustered" in pathways –– NB: here not intended as sequences of events leading to carcinogenesis, but as signaling and biological pathways regulating cell processes –– so it would be interesting to understand and discuss how the model can accommodate this concept. The Wnt pathway, for instance, can be affected via alternative hits (in APC, Axin, β-catenin, etc, ...), so one is left to wonder how this is (or would be) handled in the model (if handled at all).

In other words, is the assumption that the Wnt pathway (or the MMR-deficient process, or the RAS pathway) is altered as a whole –– hence the genes in said pathways are interchangeable –– or the genes in the model are individually considered?

For MMR-deficiency it is acknowledged that the mutations could be in different genes, what about the other pathways?

There are implications here, since assuming and parametrizing only using the APC gene for the whole Wnt pathway, for instance, does not exclude that when this gene is not mutated, other genes with the same effect (i.e., altering Wnt signaling) are indeed mutated. Hence the whole model might be somewhat incorrect. Given the methodological nature of the paper, this aspect could be simply discussed, not overselling the results obtained on the medical and biological side (since they could not be exactly correct...)

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Pål Møller

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008970.r005

Decision Letter 2

Natalia L Komarova, Jing Chen

16 Apr 2021

Dear Mrs Haupt,

We are pleased to inform you that your manuscript 'Mathematical Modeling of Multiple Pathways in Colorectal Carcinogenesis using Dynamical Systems with Kronecker Structure' has been provisionally accepted for publication in PLOS Computational Biology.

As you can see in the comments attached, Reviewer 1 has suggested minor editing to place your work in a future perspective, which you could consider. 

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Jing Chen

Guest Editor

PLOS Computational Biology

Natalia Komarova

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I have carefully read the authors' responses to my last comments. I think I rest my case: this is their way of seeing the topic addressed, which is of interest, and which I should not try to distort beyond clarifying concepts and nomenclature, as I have tried to do, so that their position is not misunderstood.

I trust that the authors have amended the manuscript as indicated in their response, and I see no need to spend my time re-reading the manuscript in detail once more, which would have delayed this response.

I still have one last wish: that the manuscript declares that it presents one way of seeing the topic, while other valid positions may exist as well that do not necessarily invalidate one another. Proving one theory to be right does not necessarily imply that another theory is wrong. I would be happy if the manuscript stated this clearly, especially when dealing with LS, which has shown us that most of what we thought was 'true' a short time ago is still 'true', but that there is another reality as well which is also 'true' and possibly as important when considering LS. This would also proactively address critics now and in the future: one may expect further facts to modify the paradigms we believe in, and there is no reason to assume that we have reached the final step of knowledge today. In short, to me the paper would be even more valuable, and probably more robust, if it proactively mentioned that it presents but one way of putting what we currently know into context.

Again, thanks for the opportunity to referee this manuscript; it has been an interesting ride.

Reviewer #3: This manuscript is covering a very specific problem in a very specific field and it would not be of interest to the general public.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Pål Møller

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008970.r006

Acceptance letter

Natalia L Komarova, Jing Chen

11 May 2021

PCOMPBIOL-D-20-01806R2

Mathematical Modeling of Multiple Pathways in Colorectal Carcinogenesis using Dynamical Systems with Kronecker Structure

Dear Dr Haupt,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Katalin Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Mathematical background.

    This includes basic notions from graph theory, graph products, the Kronecker sum of matrices, linear dynamical systems and their solution.

    (PDF)

    Attachment

    Submitted filename: Point-by-point_response.pdf

    Attachment

    Submitted filename: Point-by-point_response.pdf

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
