Abstract
Characterizing the mode – the way, manner, or pattern – of evolution in tumours is important for clinical forecasting and optimizing cancer treatment. Sequencing studies have inferred various modes, including branching, punctuated, and neutral evolution, but it is unclear why a particular pattern predominates in any given tumour. Here we propose that tumour architecture is key to explaining the variety of observed genetic patterns. We examine this hypothesis using spatially explicit population genetic models and demonstrate that, within biologically relevant parameter ranges, different spatial structures can generate four tumour evolutionary modes: rapid clonal expansion; progressive diversification; branching evolution; and effectively almost neutral evolution. Quantitative indices are presented for describing and classifying these evolutionary modes. Using these indices, we show that our model predictions are consistent with empirical observations for cancer types with corresponding spatial structures. The manner of cell dispersal and the range of cell-cell interaction are found to be essential factors in accurately characterizing, forecasting and controlling tumour evolution.
A tumour is a product of somatic evolution in which mutation, selection, genetic drift, and cell dispersal generate a patchwork of cell subpopulations (clones) with varying degrees of aggressiveness and treatment sensitivity. 1 A primary goal of modern cancer research is to characterize this evolutionary process, to enable precise, patient-specific prognostic forecasts and to optimize targeted therapy regimens. However, studies revealing the evolutionary features of particular cancers raise as many questions as they answer. Why do different tumour types exhibit different modes of evolution 2–8 ? What conditions sustain the frequently observed pattern of branching evolution, in which clones diverge and evolve in parallel 2,9–11 ? And why do some pan-cancer analyses indicate that many tumours evolve neutrally, 12 whereas others support extensive selection 13 ?
Factors proposed as contributing to tumour evolution include microenvironmental heterogeneity, niche construction, and positive ecological interactions between clones. 1,14–17 But because such factors have not been well characterized across human cancer types, it remains unclear how they might relate to evolutionary modes. On the other hand, it is well established that tumours exhibit a wide range of architectures and types of cell dispersal 18,19 (Figure 1), the evolutionary effects of which have not been systematically examined. Because gene flow (the transfer of genetic information between localized populations 20 ) is a principal force in evolutionary dynamics, we hypothesized that different tumour structures might result in different evolutionary modes. To test this hypothesis, we developed a way to formulate multiple classes of mathematical models, each tailored to a different class of tumour, within a single general framework, and we implemented this framework as a stochastic computer program.
Our modelling approach is built on basic tenets of cancer evolutionary theory. 1 Simulated tumours arise from a single cell that has acquired a fitness-enhancing mutation. Each time a tumour cell divides, its daughter cells can acquire passenger mutations, which have no fitness effect, and more rarely driver mutations, which confer a fitness advantage. In solid tumours, we assume that cells compete with one another for space and other resources. Whereas previous studies have assumed that tumours grow into empty space, our model also allows us to simulate the invasion of normal tissue − a defining feature of malignancy.
Results
Tumour architecture can determine the mode of evolution
To test whether varying tumour architecture suffices to alter the tumour evolutionary mode, we consider four particular models with different spatial structures and manners of cell dispersal but identical evolutionary parameters (driver mutation rate and distribution of driver fitness effects). We set the dispersal probability per cell division such that all tumours take a similar amount of time to grow from one cell to one million cells, corresponding to several years in real time.
Our first case is a non-spatial model that has been proposed as appropriate to leukaemia, 21,22 a tumour type in which mutated stem cells in semi-solid bone marrow produce cancer cells that mix and proliferate in the bloodstream (Figure 1a). When simulating tumour growth in the absence of spatial constraints, rapid clonal expansions can result from driver mutations that increase the cell division rate by as little as a few percent, and eventually the vast majority of cells share the same set of driver mutations (Figure 2a-d). These characteristics are reminiscent of chronic myeloid leukaemia, in which cell proliferation is driven by a single change to the genome, 23 and acute myeloid leukaemia, which has relatively few drivers. 24
In our second model, consistent with the biology of colorectal adenoma, 25 and in common with previous computational models of colorectal carcinoma, 5, 26, 27 we simulate a tumour that consists of large glands (Figure 1b) and that grows via gland fission (bifurcation). Although the driver mutation rate and the fitness effect are exactly the same as in the previous case, the addition of spatial structure dramatically alters the mode of tumour evolution. The organization of cells into glands limits the extent to which driver mutations can spread through the population, so that selective sweeps become progressively localized as the tumour expands. For our parameter values this process leads to a highly branched, fan-like driver phylogenetic tree and ever greater spatial diversity, with different combinations of driver mutations predominating even in neighbouring glands (Figure 2e-h). The mean tumour cell fitness increases substantially but there is also extensive, positively correlated intra-tumour variation in cell fitness values and passenger mutation counts (Extended Data Figure 1a-b). Model outcomes are similar even if cells are able to acquire drivers that directly increase the gland fission rate, because such mutations rarely spread within glands (Extended Data Figure 2a).
The third case corresponds to a glandular tumour that grows by invading adjacent normal tissue, as documented in various types of solid tumour, including many colorectal, breast, and lung cancers. 19, 28 Glandular tumours are subdivided into localized cell communities (Figure 1c), whose small size has previously been inferred by community detection methods 29 and mathematical modelling. 30 To obtain additional estimates of gland size in four cancer types, we used semi-automated analysis of histology slides (Extended Data Figure 3) and found that each gland contains between a few hundred and a few thousand cells (Extended Data Figure 4a). In simulations with gland sizes within this range, we find that even small increases in cell fitness can spark rapid clonal expansions. Clonal interference nevertheless inhibits selective sweeps, resulting in a zonal tumour in which large regions share the same combination of driver mutations (Figure 2i-l; Extended Data Figure 1c-d). Simulated invasive glandular tumours typically exhibit stepwise increases in driver diversity and a phylogeny with several long branches, qualitatively consistent with observations in numerous cancer types. 2, 3, 11 Restricting cell dispersal to the tumour boundary, with no dispersal within the tumour bulk (to simulate tumours that lack intratumoural budding 28 or in which proliferation is confined to the boundary 31 ), results in somewhat shorter branches (Extended Data Figure 2b).
Our fourth and final model represents a tumour with no glandular structure and with growth confined to its boundary (Figure 1d). Expansive tumour growth associated with a clearly defined boundary and no sign of active migration occurs in tissues that impose relatively weak physical resistance. 18 Boundary-growth models have in particular been proposed as appropriate for simulating the evolution of certain kinds of hepatocellular carcinoma (HCC), 7,32 although it should be noted that HCC in general exhibits a wide range of growth patterns. 33 The spatial structure of the boundary-growth model favours genetic drift, rather than selection. For our fixed parameter values, tumour evolution in this case is effectively almost neutral (Figure 2m-p; Extended Data Figure 1e), and mutations can spread only by surfing on a wave of population expansion. 34−36 Consequently the mutation burden generally increases from the tumour core to its boundary (Extended Data Figure 1f). Selection is only slightly more prominent when cells can compete with their nearest neighbours within the tumour mass (Extended Data Figure 2c). Suppression of selection in the boundary-growth model is consistent with evidence of effectively neutral evolution in HCC, 7 as well as the existence of large, well-differentiated benign tumours such as leiomyomas 37 and fibroadenomas 38 that only rarely progress to malignancy.
Characterization of evolutionary modes and comparison with data
Together, our models demonstrate that variation in the range of cell-cell interaction and the manner of cell dispersal alone can generate distinct modes of tumour evolution. We next seek to describe these modes more precisely in terms of summary evolutionary indices that can be computed from both our simulations and real cancer genomic data (Figure 3a). The first index we consider is clonal diversity (denoted D), which grows with the number of large nodes in the driver phylogenetic tree (as in the final column of Figure 2). The second index n is the mean number of driver mutations per cell, which represents the average depth of the driver phylogenetic tree. Any pair of values of these two indices corresponds to a distinct set of phylogenetic trees. The nodes of these trees represent clones, and their size is proportional to clone population size. The space of attainable n and D values (Figure 3b) is bounded below by the line D = 1 and above by the curve D = 1/(2 − n)² (see Methods). Locations close to the upper boundary correspond to more highly branched trees than locations close to the lower boundary, and locations on the left correspond to trees with shorter branches than locations on the right.
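As a minimal sketch of how these two indices could be computed from clone sizes, the following Python code assumes that D is the inverse Simpson index of clone frequencies. That definition is an assumption made here for illustration (the precise definition is given in Methods), although it is consistent with the upper bound quoted above. The function names and the input format are hypothetical.

```python
def evolutionary_indices(clones):
    """Return (n, D) for a list of (cell_count, driver_count) pairs, one per clone.

    n is the mean number of driver mutations per cell.
    D is computed as the inverse Simpson index, 1 / sum(p_i^2); this
    definition is an assumption made for illustration.
    """
    total = sum(size for size, _ in clones)
    freqs = [size / total for size, _ in clones]
    n = sum(p * drivers for p, (_, drivers) in zip(freqs, clones))
    D = 1.0 / sum(p * p for p in freqs)
    return n, D


def upper_bound(n):
    """Upper limit of D quoted in the text; finite only for n < 2."""
    return 1.0 / (2.0 - n) ** 2


# A root clone with one driver plus many single-cell, two-driver subclones
# (a star-shaped tree) lies just below the upper boundary.
star = [(500, 1)] + [(1, 2)] * 500
n, D = evolutionary_indices(star)
```

Under this assumption, a tree dominated by one clone gives D close to 1, whereas the star-shaped example gives a D value just under the bound at the same n.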
To compare our model outcomes to data, we determined the evolutionary indices of phylogenetic trees previously inferred from multi-region sequencing of four solid cancer types: clear cell renal cell carcinoma (ccRCC), 9 non-small-cell lung cancer (NSCLC), 10 breast cancer 39 and mesothelioma. 40 We also calculated indices from single-cell sequencing data for breast cancer 41 and uveal melanoma. 42 Despite their methodological diversity, these six studies yielded remarkably similar evolutionary indices. The majority of data points (28 of 35) lie above a trajectory corresponding to sequential selective sweeps (pink curve in Figure 3b) and below a reference curve that represents an intermediate degree of branching (pale blue curve in Figure 3b; Supplementary information). All the tumours have 1 < D < 12 and 3 ≤ n < 14. Notwithstanding limitations of sampling, sequencing, and phylogenetic inference methods, a useful computational model of invasive tumour evolution should generate summary indices that are consistent with these data points, corresponding to branching evolution with a small number of main branches.
The simulation results of the four models discussed previously form four distinct clusters with respect to the summary indices n and D (Figure 3c; mean silhouette width 0.60). Neutral counterparts of these four models − which have the same parameter values, except that the driver fitness effect is reduced to zero − cluster together, near the boundary-growth model. As expected, we find that the evolutionary indices of the solid tumours are consistent with outcomes of our invasive glandular model, and this consistency is robust to varying gland size, driver mutation rate, and driver mutation effect within plausible ranges (Extended Data Figure 5). Particularly close agreement between unadjusted model output and data occurs when the average driver fitness effect is 0.2 (Figure 3d).
An important caveat in the above comparison is that the unadjusted model output includes all driver mutations down to a frequency of one in a million, whereas solid tumour sequencing protocols fail to detect most mutations at frequencies below 5%. 9 This difference in sensitivity means that D values calculated from data are expected to underestimate true tumour diversity. It follows that a fairer comparison can be made by removing rare mutations from the model output, to simulate imperfect sensitivity. Such adjustment strengthens the agreement between model and data (Figure 3e and Extended Data Figure 6).
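The adjustment for imperfect sensitivity can be sketched as follows. This is an illustrative simplification rather than the exact procedure used in the study: it assumes clones are stored as a mapping from a set of driver identifiers to a cell count (a hypothetical format), drops every driver carried by fewer than 5% of cells, and merges clones that become indistinguishable. Diversity is again computed as an inverse Simpson index, which is an assumption of this sketch.

```python
from collections import Counter


def remove_rare_drivers(clones, threshold=0.05):
    """Drop drivers carried by fewer than `threshold` of cells, then merge clones.

    `clones` maps a frozenset of driver IDs to a cell count (hypothetical format).
    """
    total = sum(clones.values())
    carriers = Counter()
    for drivers, size in clones.items():
        for driver in drivers:
            carriers[driver] += size
    detectable = {d for d, count in carriers.items() if count / total >= threshold}

    merged = Counter()
    for drivers, size in clones.items():
        # Clones that differ only by undetectable drivers collapse together.
        merged[frozenset(drivers & detectable)] += size
    return dict(merged)


def inverse_simpson(clones):
    """Clonal diversity as 1 / sum(p_i^2) (assumed definition)."""
    total = sum(clones.values())
    return 1.0 / sum((size / total) ** 2 for size in clones.values())
```

In this toy representation, removing rare drivers can only merge clones, so the adjusted diversity never exceeds the unadjusted value.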
Since the non-spatial model most plausibly represents liquid tumour evolution, we compared its predictions to additional data for acute myeloid leukaemia. 24 We found robust correspondence between the model and this data set (Figure 3b,c; Supplementary Figure 1). Within plausible parameter ranges, 83% of tumours simulated using a non-spatial model have (n, D) coordinates consistent with the selective-sweeps evolutionary mode.
Alternative models that have different spatial structures are less consistent with data for both solid and liquid tumours. For the gland fission model, 83% of simulated tumours have coordinates above the intermediate-branching curve, corresponding to high values of D relative to n (Supplementary Figure 2). For the boundary growth model, both n and D are typically close to 1 (Supplementary Figure 3). These outcomes are summarised in Table 1, which provides quantitative definitions of evolutionary modes in terms of evolutionary indices (see also Figure 3f and Supplementary Table 1).
Table 1.
Evolutionary mode | Role of selection | Definition in terms of summary indices | Tree shape | Associated tumour characteristics | Agreement |
---|---|---|---|---|---|
Selective sweeps | Strong | D < 10/3 and below I-B curve | Approx. linear | Non-spatial (or little spatial structure) | 99% (83%) |
Progressive diversification | Locally strong | n > 2; D > 20 | Highly branched | Gland fission | 98% (39%) |
Branching | Strong but constrained by clonal interference | n > 2; 10/3 < D < 20 | Branched | Invasive glandular (budding; infiltration) | 94% (62%) |
Effectively almost neutral | Weak | n < 2 and D above I-B curve | Approx. star-shaped | Boundary growth (or very rapid growth) | 99% (85%) |
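The index-based definitions in Table 1 can be expressed as a simple decision rule. The sketch below is hypothetical: the intermediate-branching (I-B) curve is defined in the Supplementary information and is not reproduced here, so it must be supplied by the caller as a function `ib_curve` that returns the curve's D value at a given n. Points that satisfy none of the four definitions are returned as unclassified.

```python
def classify_mode(n, D, ib_curve):
    """Classify a tumour using the index thresholds listed in Table 1.

    `ib_curve` is a caller-supplied function giving the D value of the
    intermediate-branching curve at n; its true form is NOT given here.
    """
    ib = ib_curve(n)
    if D < 10 / 3 and D < ib:
        return "selective sweeps"
    if n < 2 and D > ib:
        return "effectively almost neutral"
    if n > 2 and D > 20:
        return "progressive diversification"
    if n > 2 and 10 / 3 < D < 20:
        return "branching"
    return "unclassified"


# Placeholder curve used only to exercise the function; it is not the
# curve from the Supplementary information.
def toy_curve(n):
    return 2.0 * n
```

For example, with the placeholder curve, `classify_mode(8, 10, toy_curve)` falls in the branching region because n > 2 and D lies between 10/3 and 20.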
Results for a variant of the invasive glandular model, in which normal cells are absent and the tumour grows into empty space, are also less consistent with data (Supplementary Figure 4). In this empty space model, the speed at which the tumour expands (via cell dispersal into empty space) typically exceeds the speed at which clones spread within the tumour (via cell dispersal into fully occupied glands), which leads to a more star-shaped or highly-branched phylogeny (high D relative to n). Conversely, when tumour cells must compete with normal cells at the tumour boundary (as in the third row of Figure 2), the speed at which driver mutations spread within the tumour is similar to the speed of tumour growth, which enables some driver mutations to reach high frequency and results in sparser branching (Extended Data Figure 5). Yet another alternative model, which includes normal cells but confines cell dispersal to the tumour boundary, thwarts the spread of driver mutations and generates similar D but smaller n values (Supplementary Figure 5).
Further analysis of tumour evolutionary modes
A complementary way to describe modes of tumour evolution is in terms of phylogenetic tree shape or balance. Because tree balance indices developed for characterizing organismal evolution are poorly suited to tumour data, we developed an index 43 that is robust to variation in sampling and sequencing protocols (Methods). This index, J¹, takes a high value for trees in which branching events tend to split the tree into subtrees of similar size. Low values are assigned to trees that are approximately linear or are dominated by a single node.
Just as for indices n and D, the tree balance values predicted by our invasive glandular tumour model are consistent with the values obtained from sequencing data (Figure 4a). Typical J¹ values for both this model and the data are between 0 and 0.5 − substantially below the maximum value of 1 corresponding to perfectly balanced trees. The consistency remains when we adjust the model output by removing rare driver mutations (Extended Data Figure 6, which constitutes a fairer comparison), even though the associated trees appear very different (Extended Data Figure 7) and have dissimilar degree distributions (Extended Data Figure 8). Agreement between model and data is also observed for alternative balance indices after removing rare mutations (Supplementary Figures 6, 7 and 8). Conversely, neutral models and models that do not account for glandular structure predict smaller or more variable tree balance values than the data for solid tumours (Figure 4a). Tree balance values for the non-spatial model are consistent with data for acute myeloid leukaemia (Figure 4a).
Whereas n, D and J¹ are determined only by the final tumour state, other indices can be based on time series data. For example, the mean clonal turnover magnitude provides an alternative to n for measuring the extent of evolutionary change, and the mean clonal turnover time indicates whether evolutionary change occurs mostly early or late during tumour growth (Methods). As expected, after an appropriate axis transformation, the pattern of clusters of D versus the mean clonal turnover magnitude (Figure 4b) resembles the pattern of clusters of D versus n (Figure 3c). Plotting D versus a transformed mean clonal turnover time reveals a somewhat similar pattern, except that models with low turnover magnitude exhibit high stochastic variation in turnover time (Figure 4c). Clonal turnover occurs relatively late in the non-spatial model but throughout tumour growth in the gland fission model. Given sufficient data, evolutionary modes can thus be described and classified in terms of various summary indices capturing distinct aspects of tumour evolution (for alternative diversity indices see Extended Data Figure 9).
Influence of tumour architecture on mutation frequency distributions
As researchers and clinicians seldom have access to either multi-region sequencing data or the longitudinal data needed to track how tumour clone sizes change over time, tumour phylogenies and evolutionary parameters are commonly inferred from mutation frequencies measured from a single biopsy sampled at a single time point. Moreover, current cancer sequencing technologies are neither sensitive enough to detect the majority of low-frequency mutations, nor precise enough to distinguish between high-frequency and clonal (100% frequency) mutations. Accordingly, the most relevant part of the mutation frequency distribution for practical purposes is in the intermediate frequency range. One way to examine differences between distributions within this intermediate range is to plot the cumulative mutation count (the number of mutations present at or above frequency f) versus the inverse mutation frequency (1/f). In a neutral non-spatial model, this graph is a straight line (Figure 5a, blue points). Because the transformed mutation frequency distributions of many human cancers are also approximately linear, it has been proposed that neutral tumour evolution is widespread. 12 Deviations from this theoretical straight line have been taken as evidence of selection. 27, 44
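The transformation described above can be sketched in a few lines of Python. The function names are hypothetical, and the default frequency range is an illustrative assumption rather than a value taken from the text.

```python
def cumulative_counts(frequencies, thresholds):
    """For each threshold f, count the mutations present at or above frequency f."""
    return [sum(1 for x in frequencies if x >= f) for f in thresholds]


def inverse_frequency_curve(frequencies, f_min=0.1, f_max=0.25, steps=20):
    """Return (1/f, cumulative count) pairs over an intermediate frequency range.

    The points are evenly spaced in 1/f, so a neutral non-spatial model
    would be expected to give an approximately straight line.
    """
    inv_lo, inv_hi = 1.0 / f_max, 1.0 / f_min
    xs = [inv_lo + i * (inv_hi - inv_lo) / (steps - 1) for i in range(steps)]
    return [(x, cumulative_counts(frequencies, [1.0 / x])[0]) for x in xs]
```

Fitting a straight line to the returned pairs and inspecting the residuals is one way to quantify departures from linearity, subject to the caveats about population structure discussed in this section.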
Our population genetic modelling illustrates how not only selection but also tumour architecture has important effects on tumour mutation frequency distributions (Figure 5; Extended Data Figure 10). In particular, when the cumulative mutation count is plotted against the inverse mutation frequency for spatially structured models, the curve for the neutral model is no longer linear. Indeed, for spatial models, the average non-neutral curve can be closer to a straight line than the average neutral model curve. These results confirm and extend previous findings 27, 35 indicating that methods using mutation frequencies to infer selection in solid tumours can yield incorrect conclusions if they fail to account for effects of population structure. Inappropriate choice of null model can therefore explain otherwise contradictory findings regarding the prevalence of neutral evolution in human cancers. 13, 45
Discussion
In summary, we have found that differences in the range of cell-cell interaction and the manner of cell dispersal are sufficient to generate a spectrum of tumour evolutionary modes. This finding has important implications both for understanding tumour genomic data and for interpreting the results of previous computational models. Whereas mathematical oncologists have focussed on mutation fitness effects 5, 12, 27, 44, 46, 47 or microenvironmental heterogeneity, 15 our perspective instead emphasizes the importance of population structure and gene flow in tumour evolution.
Prominent studies have variously used non-spatial models, 12, 21, 44, 46 gland fission models, 5, 27 or variants of the Eden growth model (in which cells compete with their nearest neighbours) 32, 47 to investigate aspects of tumour evolution (see Methods for further discussion of prior work). Our results imply that, at best, each of these model types is appropriate only in special cases. Accurate models of solid tumour evolution must faithfully recapitulate interactions within localized patches of cancer cells − so-called tumour communities 29 − and between cancer cells and normal cells.
Consistent with previous work, 48 our models predict substantial variability in tumour evolutionary modes due to stochasticity in the timing, location, and fitness effects of driver mutations. Our finding that this random variation approaches the variability observed within and between solid tumour types (Figure 3b, c) suggests it will be challenging to infer precise information about tumour structure and growth patterns from phylogenetic data, even given prior knowledge of mutation rates and fitness effects. Nevertheless, of the model types we have examined, we have shown that the most plausible for simulating evolution in the majority of malignant solid tumours, which exhibit branching evolution, 11 is the invasive glandular model introduced herein. A key feature of this model is that the speed at which a fitter clone spreads within its immediate ancestor is similar to the ancestor’s own expansion speed.
It follows from our findings that tumour architecture determines how well biopsy samples reflect intra-tumour heterogeneity. Oncologists typically base treatment decisions on the presence or absence of particular mutations in cells taken from only a small region of a solid tumour. Tumour types with structures that promote diversification are predicted to be the least responsive to targeted therapies unless truncal mutations can be reliably identified and targeted.
Our framework also implies that a change in tumour architecture during cancer progression can lead to a change in the mode of tumour evolution. For example, the “big bang” model of colorectal cancer 4, 5 posits that early selective sweeps are followed by effectively neutral evolution, such that mutation frequency is determined by the time of mutation occurrence. This idea was previously examined using a computational model of tumour growth via gland fission, with a maximum of one driver mutation per cell. 5 Based on more sophisticated population genetic modelling, we find reason to expect ongoing selection throughout the very early stages of colorectal tumour progression (when growth is driven by gland fission), enabling multiple driver mutations to reach high frequencies. In later stages, after cells from the adenoma invade neighbouring tissue and give rise to an adenocarcinoma, we predict a transition to either branching evolution or − because the invasion begins with numerous and/or highly transformed, rapidly expanding subclones 49 − effectively neutral evolution. Punctuated evolution in colorectal tumours can thus be explained by the transition from gland fission to invasive growth. This explanation is broadly consistent with the big bang model and more recent multi-region sequencing studies, 6, 49, 50 while also agreeing with results of comparative genomic analysis, which indicate that colorectal cancers evolve subject to strong positive selection and have more driver mutations per cell than most other cancer types. 13, 51 Transitions between evolutionary modes were recently investigated in a mathematical modelling study of ductal carcinoma, 52 which complements the current work by likewise highlighting the importance of spatial competition.
In clear cell renal cell carcinoma, separate studies have found that tumour architecture 53, 54 and evolutionary trajectory 9 are predictors of cancer progression and survival. Evolutionary mode correlates with both tumour architecture and clinical outcome in childhood cancers. 8, 55 The manner of cell dispersal has prognostic value in colorectal and other solid cancers. 28 By mechanistically connecting tumour architecture to the mode of tumour evolution, our work provides the blueprint for a new generation of patient-specific models for forecasting tumour progression 16, 48 and for optimizing evolutionarily-informed treatment regimens. 56–58 This drive towards personalized models motivates further efforts to characterize how spatial structure interacts with other biological factors, such as spatially-varying carrying capacity, 29 alternative manners of cell dispersal, 19 immune interactions, 59 cancer stem cell hierarchies, 60 and frequency- or density-dependent cell fitness. 17
Methods
Previous mathematical models of tumour population genetics
Many previous studies of tumour population genetics have used non-spatial branching processes, 21 in which cancer clones grow exponentially without interacting. Unless driver mutations increase cell fitness by less than 1%, these models predict lower clonal diversity and lower numbers of driver mutations than typically observed in solid tumours. 46 Among spatial models, a popular option is the Eden growth model (or boundary-growth model), in which cells are located on a regular grid with a maximum of one cell per site, and a cell can divide only if an unoccupied neighbouring site is available to receive the new daughter cell. 32, 47, 61 Other methods with one cell per site include the voter model 32, 62, 63 (in which cells can invade neighbouring occupied sites) and the spatial branching process 47 (in which cells budge each other to make space to divide). Further mathematical models have been designed to recapitulate glandular tumour structure by allowing each grid site or “deme” to contain multiple cells and by simulating tumour growth via deme fission throughout the tumour 5, 26 or only at the tumour boundary. 27 A class of model in which cancer cells are organized into demes and disperse into empty space has also been proposed. 36, 52, 64 Supplementary Table 2 summarizes selected studies representing the state of the art of stochastic modelling of tumour population genetics.
Our main methodological innovations are to implement all these distinct model structures, and additional models of invasive tumours, within a common framework, and to combine them with methods for tracking driver and passenger mutations at single-cell resolution. The result is a highly flexible framework for modelling tumour population genetics that can be used to examine consequences of variation not only in mutation rates and selection coefficients, but also in spatial structure and manner of cell dispersal. 65
Computational model structure
Simulated tumours in our models are made up of patches of interacting cells located on a regular grid of sites. In keeping with the population genetics literature, we refer to these patches as demes. All demes within a model have the same carrying capacity, which can be set to any positive integer. Each cell belongs to both a deme and a genotype. If two cells belong to the same deme and the same genotype then they are identical in every respect, and hence the model state is recorded in terms of such subpopulations rather than in terms of individual cells. For the sake of simplicity, computational efficiency, and mathematical tractability, we assume that cells within a deme form a well-mixed population. The well-mixed assumption is consistent with previous mathematical models of tumour evolution 5, 26, 27, 36, 64 and with experimental evidence in the case of stem cells within colonic crypts. 66
Initial conditions
A simulation begins with a single tumour cell located in a deme at the centre of the grid. If the model is parametrized to include normal cells then these are initially distributed throughout the grid such that each deme’s population size is equal to its carrying capacity. Otherwise, if normal cells are absent, then the demes surrounding the tumour are initially unoccupied.
Stopping condition
The simulation stops when the number of tumour cells reaches a threshold value. Because we are interested only in tumours that reach a large size, if the tumour cell population succumbs to stochastic extinction then results are discarded and the simulation is restarted (with a different seed for the pseudo-random number generator).
Within-deme dynamics
Tumour cells undergo stochastic division, death, dispersal, and mutation events, whereas normal cells undergo only division and death. The within-deme death rate is density-dependent. When the deme population size is less than or equal to the carrying capacity, the death rate takes a fixed value d₀ that is less than the initial division rate. When the deme population size exceeds carrying capacity, the death rate takes a different fixed value d₁ that is much greater than the largest attainable division rate. Hence all genotypes grow approximately exponentially until the carrying capacity is attained, after which point the within-deme dynamics resemble a birth-death Moran process − a standard, well characterized model of population genetics.
In all spatially structured simulations we set d₀ = 0 to prevent demes becoming empty. For the non-spatial (branching process) model, we set d₀ > 0 and dispersal rate equal to zero, so that all cells always belong to a single deme (with carrying capacity greater than the maximum tumour population size).
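The density-dependent death rate can be illustrated with a toy simulation of a single deme. This is a sketch under simplifying assumptions, not the study's implementation: it follows only one genotype, ignores dispersal and mutation, and tracks the sequence of events without their waiting times. All names and parameter values are hypothetical.

```python
import random


def death_rate(n_cells, capacity, d0, d1):
    """Per-cell death rate: d0 at or below carrying capacity, d1 above it."""
    return d0 if n_cells <= capacity else d1


def simulate_deme(capacity, division_rate, d0, d1, n_events, seed=0):
    """Follow the size of a single-genotype deme through `n_events` events."""
    rng = random.Random(seed)
    n_cells = 1
    for _ in range(n_events):
        if n_cells == 0:
            break  # stochastic extinction, possible only when d0 > 0
        d = death_rate(n_cells, capacity, d0, d1)
        # Each event is a division with probability b / (b + d), else a death.
        if rng.random() < division_rate / (division_rate + d):
            n_cells += 1
        else:
            n_cells -= 1
    return n_cells
```

With d0 = 0 and d1 much larger than the division rate, the deme grows without loss until it reaches carrying capacity and then fluctuates just above it, since almost every division beyond capacity is quickly offset by a death.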
Mutation
When a cell divides, each daughter cell inherits its parent’s genotype plus a number of additional mutations, drawn from a Poisson distribution. Each mutation is unique, consistent with the infinite-sites assumption of canonical population genetics models. Whereas some previous studies have examined the effects of only a single driver mutation (Supplementary Table 2), in our model there is no limit on the number of mutations a cell can acquire. Most mutations are passenger mutations with no phenotypic effect. The remainder are drivers, each of which increases the cell division or dispersal rate.
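A minimal sketch of this mutation step is shown below. It assumes, for illustration only, that passenger and driver mutations are drawn from two independent Poisson distributions with hypothetical rates `passenger_rate` and `driver_rate`, and that a genotype is stored as a pair of sets of mutation identifiers. The Poisson sampler uses Knuth's multiplication method so that the example needs only the standard library.

```python
import itertools
import math
import random

_next_id = itertools.count()  # every mutation gets a fresh ID (infinite sites)


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's method; adequate for small means."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def daughter_genotype(parent, rng, passenger_rate, driver_rate):
    """Return a daughter's (drivers, passengers) sets given its parent's."""
    drivers, passengers = parent
    new_passengers = {next(_next_id) for _ in range(poisson(rng, passenger_rate))}
    new_drivers = {next(_next_id) for _ in range(poisson(rng, driver_rate))}
    return drivers | new_drivers, passengers | new_passengers
```

Because identifiers are never reused, two daughters of the same parent share only the mutations they inherited, which is the behaviour the infinite-sites assumption requires.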
The program records the immediate ancestor of each clone (defined in terms of driver mutations) and the matrix of Hamming distances between clones (that is, for each pair of clones, how many driver mutations are found in only one clone), which together allow us to reconstruct driver phylogenetic trees. To improve efficiency, the distance matrix excludes clones that failed to grow to more than ten cells and failed to produce any other clone before becoming extinct.
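The distance described here is the size of the symmetric difference between two clones' driver sets. A short sketch, with hypothetical function names and clones represented as sets of driver identifiers:

```python
def hamming_distance(drivers_a, drivers_b):
    """Number of driver mutations found in exactly one of the two clones."""
    return len(set(drivers_a) ^ set(drivers_b))


def distance_matrix(clones):
    """Symmetric matrix of pairwise Hamming distances for a list of clones."""
    return [[hamming_distance(a, b) for b in clones] for a in clones]
```

For instance, clones carrying drivers {1, 2, 3} and {1, 4} are at distance 3, because drivers 2, 3 and 4 each occur in only one of them.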
Driver mutation effects
Whereas previous models have typically assumed that the effects of driver mutations combine multiplicatively, this can potentially result in implausible trait values (especially in the case of division rate if the rate of acquiring drivers scales with the division rate). To remain biologically realistic, our model invokes diminishing returns epistasis, such that the average effect of driver mutations on a trait value r decreases as r increases. Specifically, the effect of a driver is to multiply the trait value r by a factor of 1 + s(1 − r/m), where s > 0 is the mutation effect and m is an upper bound. Nevertheless, because we set m to be much larger than the initial value of r, the combined effect of drivers in all models in the current study is approximately multiplicative. For each mutation, the value of the selection coefficient s is drawn from an exponential distribution.
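The update rule for a driver mutation can be written directly from the formula above. In this sketch the function names are hypothetical, and `mean_effect` denotes the mean of the exponential distribution from which s is drawn.

```python
import random


def apply_driver(r, s, m):
    """Multiply trait value r by 1 + s * (1 - r / m)."""
    return r * (1.0 + s * (1.0 - r / m))


def draw_effect(rng, mean_effect):
    """Draw a mutation effect s from an exponential distribution."""
    return rng.expovariate(1.0 / mean_effect)


# When m is much larger than r, the factor is close to 1 + s, so successive
# drivers combine almost multiplicatively.
rate = 1.0
for _ in range(3):
    rate = apply_driver(rate, 0.1, 1e6)
```

After three drivers with s = 0.1 the rate in this example is very close to 1.1³ ≈ 1.331, whereas at r = m the multiplying factor is exactly 1 and a driver has no effect.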
Dispersal
Depending on model parameterization, dispersal occurs via either invasion or deme fission (Supplementary Table 3). In the case of invasion, the dispersal rate corresponds to the probability that a cell newly created by a division event will immediately attempt to invade a neighbouring deme. This particular formulation ensures consistency with a standard population genetics model known as the spatial Moran process. The destination deme is chosen uniformly at random from the four nearest neighbours (Von Neumann neighbourhood). Invasion can be restricted to the tumour boundary, in which case the probability that a deme can be invaded is 1 − N/K if N ≤ K and 0 otherwise, where N is the number of tumour cells in the deme and K is the carrying capacity. If a cell fails in an invasion attempt then it remains in its original deme. If invasion is not restricted to the tumour boundary then invasion attempts are always successful.
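A minimal sketch of the invasion rule follows. The function names, the coordinate convention and the `boundary_only` flag are hypothetical; only the probability 1 − N/K and the four-neighbour choice come from the description above.

```python
import random

# Offsets of the four nearest neighbours on a square grid (von Neumann neighbourhood).
VON_NEUMANN_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def invasion_success_probability(n_tumour_cells, carrying_capacity, boundary_only=True):
    """Probability that an invasion attempt into a deme succeeds."""
    if not boundary_only:
        return 1.0  # invasion allowed throughout the tumour always succeeds
    if n_tumour_cells <= carrying_capacity:
        return 1.0 - n_tumour_cells / carrying_capacity
    return 0.0


def choose_destination(x, y, rng=random):
    """Pick one of the four neighbouring demes uniformly at random."""
    dx, dy = rng.choice(VON_NEUMANN_OFFSETS)
    return x + dx, y + dy
```

Under boundary-restricted invasion a deme that is already full of tumour cells cannot be invaded, so successful invasion events concentrate at the tumour edge.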
In fission models, a deme can undergo fission only if its population size is greater than or equal to carrying capacity. As with invasion, deme fission immediately follows cell division (so that results for the different dispersal types are readily comparable). The probability that a deme will attempt fission is equal to the sum of the dispersal rates of its constituent cells (up to a maximum of 1). Deme fission involves moving half of the cells from the original deme into a new deme, which is placed beside the original deme. If the dividing deme contains an odd number of cells then the split is necessarily unequal, in which case each deme has a 50% chance of receiving the larger share. Genotypes are redistributed between the two demes without bias according to a multinomial distribution. Cell division rate has only a minor effect on deme fission rate because a deme created by fission takes only a single cell generation to attain carrying capacity.
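The splitting step can be illustrated as follows. This is only a sketch under stated assumptions: it represents a deme as a list of genotype labels and partitions the cells by an unbiased random shuffle, which is a simple stand-in for the multinomial redistribution mentioned above rather than a reproduction of it. All names are hypothetical.

```python
import random


def fission_probability(dispersal_rates):
    """Sum of the dispersal rates of the cells in the deme, capped at 1."""
    return min(1.0, sum(dispersal_rates))


def split_deme(cells, rng=random):
    """Split a list of genotype labels into two demes of (nearly) equal size.

    With an odd number of cells, either deme receives the larger share
    with probability 0.5.
    """
    shuffled = list(cells)
    rng.shuffle(shuffled)  # unbiased assignment of genotypes to the two demes
    half = len(shuffled) // 2
    if len(shuffled) % 2 == 1 and rng.random() < 0.5:
        half += 1
    return shuffled[:half], shuffled[half:]
```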
If fission is restricted to the tumour boundary then the new deme’s assigned location is chosen uniformly at random from the four nearest neighbours, and if the assigned location already contains tumour cells then the fission attempt fails. If fission is allowed throughout the tumour then an angle is chosen uniformly at random, and demes are budged along a straight line at that angle to make space for the new deme beside the original deme.
Our particular method of cell dispersal is chosen to enable comparison between our results and those of previous studies and to facilitate mathematical analysis. In particular, when the deme carrying capacity is set to 1, our model approximates an Eden growth model (if fission is restricted to the tumour boundary, or if dispersal is restricted to the tumour boundary and normal cells are absent), a voter model (if invasion is allowed throughout the tumour), or a spatial branching process (if fission is allowed throughout).
To fairly compare different spatial structures and manners of cell dispersal, we set dispersal rates in each case such that the time taken for a tumour to grow from one cell to one million cells is approximately the same as in the neutral Eden growth model with maximal dispersal rate. This means that, across models, the cell dispersal rate decreases with increasing deme size. Given that tumour cell cycle times are on the order of a few days, the timespans of several hundred cell generations in our models realistically correspond to several years of tumour growth. More specifically, if we assume tumours take between 5 and 50 years to grow and the cell cycle time is between 1 and 10 days (both uniform priors) then the number of cell generations is between 400 and 8,000 in 95% of plausible cases. This order of magnitude is consistent with tumour ages inferred from molecular data. 67
We note that, in addition to gland fission, gland fusion has been reported in normal human intestine, 68 which raises the possibility that gland fusion occurs during colorectal tumour development. However, the rate of crypt fission in tumours is much elevated, relative to the rate in healthy tissue, and must exceed the rate of crypt fusion (or else the tumour would not grow). Therefore, even if crypt fusion occurs in human tumours, we do not expect it to have a substantial influence on evolutionary mode. This view is supported by previous computational modelling. 69
Two versus three dimensions
We chose to conduct our study in two dimensions for two main reasons. First, the effects of deme carrying capacity on evolutionary dynamics are qualitatively similar in two and three dimensions, yet a two-dimensional model is simpler, easier to analyse, and easier to visualize. Second, we aimed to create a method that is readily reproducible using modest computational resources and yet can represent the long-term evolution of a reasonably large tumour at single-cell resolution.
One million cells in two dimensions corresponds to a cross-section of a three-dimensional tumour with many more than one million cells. Therefore, compared to a three-dimensional model, a two-dimensional model can provide richer insight into how evolutionary dynamics change over a large number of cell generations. Developing an approximate, coarse-grained analogue of our model that can efficiently simulate the population dynamics of very large tumours with different spatial structures in three dimensions is an important direction for future research.
Implementation
The program implements Gillespie’s exact stochastic simulation algorithm 70 for statistically correct simulation of cell events. The order of event selection is 1. deme, 2. cell type (normal or tumour), 3. genotype, and 4. event type. At each stage, the probability of selecting an item (deme, cell type, genotype or event type) is proportional to the sum of event rates for that item, within the previous item. We measure elapsed time in terms of cell generations, where a generation is equal to the expected cell cycle time of the initial tumour cell.
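The hierarchical selection scheme can be sketched as below. This is a generic illustration of Gillespie's direct method, not the program's own code: the nested-dictionary layout (deme, then cell type, then genotype, then event type) and all names are assumptions made for the example.

```python
import random


def total_rate(node):
    """Sum of all event rates beneath a node of the nested rate dictionary."""
    if isinstance(node, dict):
        return sum(total_rate(child) for child in node.values())
    return float(node)


def weighted_choice(items, weights, rng):
    """Choose an item with probability proportional to its (non-negative) weight."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    chosen = None
    for item, weight in zip(items, weights):
        if weight <= 0:
            continue
        chosen = item
        cumulative += weight
        if threshold < cumulative:
            break
    return chosen


def gillespie_step(rates, rng=random):
    """Return (waiting_time, (deme, cell_type, genotype, event_type))."""
    grand_total = total_rate(rates)
    if grand_total <= 0:
        raise ValueError("no event can occur")
    waiting_time = rng.expovariate(grand_total)  # exponential waiting time
    path, node = [], rates
    while isinstance(node, dict):  # descend one level at a time
        keys = list(node)
        key = weighted_choice(keys, [total_rate(node[k]) for k in keys], rng)
        path.append(key)
        node = node[key]
    return waiting_time, tuple(path)


# Hypothetical rates: a single deme in which only tumour cell division can occur.
rates = {"deme0": {"tumour": {"g1": {"division": 1.0, "death": 0.0}},
                   "normal": {"wt": {"division": 0.0}}}}
```

Selecting level by level yields the same event probabilities as a single draw over all events, because the product of the conditional probabilities at each level equals the event's rate divided by the total rate.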
Sequencing data
We surveyed the multiregion and single-cell tumour sequencing literature to identify data sets suitable for comparison with our model results. Studies published prior to 2015 (for example, 71–74 ) were found to have insufficient sequencing depth for our purposes. We also excluded studies that reconstructed phylogenies using samples from metastases or from multifocal tumours (for example, 75–80 ) because our model is not designed to correspond to such scenarios. The seven studies we chose to include in our comparison are characterized by either high-coverage multiregion sequencing or large-sample single-cell sequencing of several tumours.
The ccRCC investigation 81 we selected involved multi-region deep sequencing, targeting a panel of more than 100 putative driver genes. Three studies of NSCLC, 10 mesothelioma 40 and breast cancer 39 conducted multi-region whole-exome sequencing (first two studies) or whole-genome sequencing (latter study), and reported putative driver mutations. We also used data from single-cell RNA sequencing studies of uveal melanoma 42 and breast cancer, 41 in which chromosome copy number variations were used to infer clonal structure, and a study of AML that used single-cell DNA sequencing. 24 All seven studies constructed phylogenetic trees, which are readily comparable to the trees predicted by our modelling. The methodological diversity of these studies contributes to demonstrating the robustness of the patterns we seek to explain.
From each of the seven cohorts we obtained data for between three and eight tumours. In the ccRCC data set, we focussed on the five tumours for which driver frequencies were reported in the original publication. For NSCLC, we used data for the five tumours for which at least six multi-region samples were sequenced. In mesothelioma, we selected the six tumours that had at least five samples taken. From the breast cancer multi-region study, we used data for the three untreated tumours that were subjected to multi-region sequencing. From the single-cell sequencing studies of uveal melanoma and breast cancer we used all the published data (eight tumours in each case), and from the AML study we selected a random sample of eight tumours.
In multi-region sequencing data sets it is uncertain whether all putative driver mutations were true drivers of tumour progression. One way to interpret the data (interpretation I1) is to assume that all putative driver mutations were true drivers that occurred independently. Alternatively, the more conservative interpretation I2 assumes that each mutational cluster (a distinct peak in the variant allele frequency distribution) corresponds to exactly one driver mutation, while all other mutations are hitchhikers. Thus I1 permits linear chains of nodes that in I2 are combined into single nodes (compare Supplementary Figures 9 and 10), and I1 leads to a higher estimate of the mean number of driver mutations per cell (our summary index n). If both the fraction of putative driver mutations that are not true drivers (false positives) and the fraction of true driver mutations that are not counted as such (false negatives) are low, or if these fractions approximately cancel out, then interpretation I1 will give a good approximation of n whereas I2 will give a lower bound. For the ccRCC, NSCLC and breast cancer cases in our data set, I1 generates values of n in the range 3-10 (mean 6.1), consistent with estimates based on other methodologies, 13, 51 whereas for I2 the range is only 1-4 (mean 2.5). Accordingly we use interpretation I1.
Clonal diversity index
To measure clonal diversity, we use the inverse Simpson index, defined as
$$D = \frac{1}{\sum_i p_i^2},$$
where $p_i$ is the frequency of the ith combination of driver mutations. For example, if the population comprises k clones of equal size then $p_i = 1/k$ for every value of i, and so $D = 1/(k \times 1/k^2) = k$. Clonal diversity has a lower bound D = 1. The inverse Simpson index is relatively robust to adding or removing rare types, which makes it appropriate for comparing data sets with differing sensitivity thresholds. Further examples are illustrated in Supplementary Figure 11.
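For concreteness, the inverse Simpson index can be computed from clone sizes as in the following sketch (the function name and the input format, a list of clone population sizes, are assumptions):

```python
def inverse_simpson(clone_sizes):
    """Inverse Simpson index D = 1 / sum_i p_i^2 from a list of clone sizes."""
    total = sum(clone_sizes)
    return 1.0 / sum((size / total) ** 2 for size in clone_sizes)


equal_clones = inverse_simpson([5, 5, 5, 5])      # four clones of equal size
with_rare_clone = inverse_simpson([500, 500, 1])  # two large clones plus one rare clone
```

Adding a single rare clone to two equally sized clones changes D only slightly, which illustrates the robustness property noted above.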
D is constrained by an upper bound for trees with n < 2, where n is the mean number of driver mutations per cell. Indeed, $n = \sum_i i \cdot p_i \ge p_1 + 2(1 - p_1) = 2 - p_1$, so $p_1 \ge 2 - n > 0$, since n < 2. Therefore
$$D = \frac{1}{\sum_i p_i^2} \le \frac{1}{p_1^2} \le \frac{1}{(2 - n)^2}.$$
To see that this bound is tight, assume 1 ≤ n < 2 and consider a star-shaped tree with N nodes such that $p_1 = 2 - n$ and other nodes have equal weights $p_i = (1 - p_1)/(N - 1) = (n - 1)/(N - 1)$ for i ≥ 2. The mean number of driver mutations per cell is $p_1 + 2(1 - p_1) = 2 - p_1 = n$, and the inverse Simpson index is
$$D = \frac{1}{(2 - n)^2 + (N - 1)\left(\frac{n - 1}{N - 1}\right)^2} = \frac{1}{(2 - n)^2 + \frac{(n - 1)^2}{N - 1}}.$$
This quantity goes to $1/(2 - n)^2$ as the number of nodes N goes to infinity, so the bound $1/(2 - n)^2$ may be approached arbitrarily closely.
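The tightness argument can be checked numerically. The sketch below builds the star-shaped tree described above (root frequency 2 − n, all other nodes equally weighted) and compares its diversity with the bound 1/(2 − n)^2; the function names are hypothetical.

```python
def star_tree_diversity(n, num_nodes):
    """Mean driver count and inverse Simpson index of the star-shaped tree in the text."""
    p_root = 2.0 - n                      # root clone carries one driver
    p_leaf = (n - 1.0) / (num_nodes - 1)  # each other clone carries two drivers
    mean_drivers = p_root * 1 + (num_nodes - 1) * p_leaf * 2
    diversity = 1.0 / (p_root ** 2 + (num_nodes - 1) * p_leaf ** 2)
    return mean_drivers, diversity


def diversity_upper_bound(n):
    """Upper bound on D for trees with n < 2."""
    return 1.0 / (2.0 - n) ** 2


# With n = 1.5 the bound is 4; the star tree approaches it as nodes are added.
_, d_small = star_tree_diversity(1.5, 3)
mean_large, d_large = star_tree_diversity(1.5, 10 ** 6)
```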
It is informative to derive the relationship between D and n for a population that evolves via a sequence of clonal sweeps, such that each new sweep begins only after the previous sweep is complete. For a given value of n, our simulations rarely produce trees with D values below the curves of this trajectory. Suppose that a population comprises a parent type and a daughter type, with frequencies p and 1 − p, respectively. If the daughter has m driver mutations then the parent must have m − 1 driver mutations and n must satisfy m − 1 ≤ n ≤ m. More specifically,
$$n = (m - 1)p + m(1 - p) = m - p, \quad \text{so that} \quad p = m - n = 1 - \{n\} \quad \text{for } m - 1 \le n < m,$$
where {n} denotes the fractional part of n. The trajectory is therefore described by
$$D = \frac{1}{p^2 + (1 - p)^2} = \frac{1}{(1 - \{n\})^2 + \{n\}^2}.$$
We additionally calculate a curve representing the maximum possible diversity of linear trees. In the main text and below, we refer to this curve as corresponding to trees with an intermediate degree of branching. Specifically, this intermediate-branching curve is defined such that for every point below the curve (and with D > 1) there exist both linear trees and branching trees that have the corresponding values of n and D, whereas for every point above the curve there exist only branching trees. Derivation of the curve’s equation is provided in Supplementary information. A first-order approximation (accurate within 1% for n ≥ 2.2) is D ≈ 9(2n −1)/8.
To assess the extent to which clusters of points (n, D) are well separated, we calculated silhouette widths using the cluster R package. 82 A positive mean silhouette width indicates that clusters are distinct.
Other diversity indices
Our diversity index fulfils the same purpose as the intratumour heterogeneity (ITH) index used in the TRACERx Renal study, 9 defined as the ratio of the number of subclonal driver mutations to the number of clonal driver mutations. However, compared to ITH, our index has the advantages of being a continuous variable and being robust to methodological differences that affect ability to detect low frequency mutations. In calculating ITH from sequencing data we include all putative driver mutations, whereas ref 9 used only a subset of mutations. For model output, we classify mutations with frequency above 99% as clonal and we exclude mutations with frequency less than 1%. ITH and the inverse Simpson index are strongly correlated across our models (Spearman’s ρ = 0.98, or ρ = 0.81 for cases with D > 2; Extended Data Figure 9c).
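The ITH calculation for model output can be sketched as follows, using the 99% and 1% thresholds stated above. The function name, the input format (a list of driver mutation frequencies) and the handling of the case with no clonal drivers are assumptions.

```python
def ith_index(driver_frequencies, clonal_threshold=0.99, detection_threshold=0.01):
    """Ratio of subclonal to clonal driver mutations.

    Mutations above `clonal_threshold` are clonal; mutations below
    `detection_threshold` are excluded; the rest are subclonal.
    """
    clonal = sum(1 for f in driver_frequencies if f > clonal_threshold)
    subclonal = sum(1 for f in driver_frequencies
                    if detection_threshold <= f <= clonal_threshold)
    if clonal == 0:
        raise ValueError("ITH is undefined when there are no clonal drivers")
    return subclonal / clonal


# Hypothetical frequencies: two clonal, two subclonal, one below the detection limit.
example_ith = ith_index([1.0, 0.995, 0.5, 0.2, 0.005])
```

Because the index depends only on counts, detecting one additional low-frequency driver shifts ITH by a full step, whereas it barely changes the inverse Simpson index.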
The Shannon index, defined as $H = -\sum_i p_i \log p_i$, is another alternative to the Simpson index. The exponential of this index has the same units as the inverse Simpson index (equivalent number of types). Compared to the Simpson index, the Shannon index gives more weight to rare types, which makes it somewhat less suitable for comparing data sets with differing sensitivity thresholds.
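The difference in sensitivity to rare types can be seen in a small numerical comparison. The sketch below takes the Shannon index with its conventional negative sign, so that its exponential is an equivalent number of types; the function names and the example clone sizes are hypothetical.

```python
import math


def shannon_index(clone_sizes):
    """Shannon index H = -sum_i p_i log p_i (natural logarithm)."""
    total = sum(clone_sizes)
    return -sum((s / total) * math.log(s / total) for s in clone_sizes if s > 0)


def effective_number_shannon(clone_sizes):
    """exp(H): equivalent number of equally abundant types."""
    return math.exp(shannon_index(clone_sizes))


def inverse_simpson(clone_sizes):
    """Inverse Simpson index, for comparison."""
    total = sum(clone_sizes)
    return 1.0 / sum((s / total) ** 2 for s in clone_sizes)


# Two large clones plus three rare ones: rare clones raise exp(H) more than D.
sizes = [500, 500, 1, 1, 1]
shannon_types = effective_number_shannon(sizes)
simpson_types = inverse_simpson(sizes)
```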
Defining evolutionary modes in terms of indices D and n
In defining regions in terms of indices D and n (Table 1 and Figure 3c), we first note that if a population undergoes a succession of non-overlapping clonal sweeps then at most two clones coexist at any time, and hence D ≤ 2. Allowing for some overlap between sweeps, we define the selective sweeps region as having D < 10/3 and D below the intermediate-branching curve. We put the upper boundary at D = 10/3 because this intersects with the intermediate-branching curve at n = 2.
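The statement that non-overlapping sweeps give D ≤ 2 follows from the two-clone case and can be checked directly. The sketch assumes a parent clone at frequency p carrying m − 1 drivers and a daughter clone at frequency 1 − p carrying m drivers, so that n = m − p; the function names are hypothetical.

```python
import math


def two_clone_diversity(p):
    """Inverse Simpson index of a population with two clones at frequencies p and 1 - p."""
    return 1.0 / (p ** 2 + (1.0 - p) ** 2)


def sweep_trajectory(n):
    """D as a function of mean driver count n during a sequence of non-overlapping sweeps."""
    frac = n - math.floor(n)  # fractional part of n equals the daughter frequency
    return 1.0 / ((1.0 - frac) ** 2 + frac ** 2)


# D peaks at 2 halfway through each sweep and returns to 1 when the sweep completes.
peak = max(two_clone_diversity(i / 100) for i in range(101))
```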
We use D = 20 to define the boundary between the branching and progressive diversification regions. The TRACERx Renal study 9 instead categorized trees containing more than 10 clones as highly branched, as opposed to branched. It is appropriate for us to use a higher threshold because our regions are based on true tumour diversity values, rather than the typically lower values inferred from multi-region sequencing data. Finally, we define an effectively almost neutral region containing star-shaped trees with n < 2 and D above the intermediate-branching curve.
It is possible to construct trees that do not fit the labels we have assigned to regions. For example (as shown in Supplementary information), there exist linear trees within the branching and progressive diversification regions. Such exceptions are an unavoidable consequence of representing high-dimensional objects, such as phylogenetic trees, in terms of a small number of summary indices. Our labels are appropriate for the subset of trees that we have shown arise from tumour evolution.
Previously defined tree balance indices
Conventionally, the balance of a tree is the degree to which branching events split the tree into subtrees with the same number of leaves, or terminal nodes. A balanced tree thus indicates more equal extinction and speciation rates than an unbalanced tree. 83 Tree balance indices are commonly used to assess the correctness of tree reconstruction methods and to classify trees. We consider three previously defined indices, all of which are imbalance indices, which means that more balanced trees are assigned smaller values. We subtract each of these indices from 1 to obtain measurements of tree balance.
Let T = (V, E) be a tree with a set of nodes V and edges E. Let |V| = N, and hence |E| = N − 1 (since each node has exactly one parent, except the root). We define l as the number of leaves of the tree. The root is labelled 1 and the leaves are numbered from N − l + 1 to N. There is only one cladogram with two leaves, which is maximally balanced according to all the previously defined indices discussed below. We also consider the single-node tree to be maximally balanced with respect to these previously defined indices. The following definitions then apply when l ≥ 3.
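For each leaf j, we define $\upsilon_j$ as the number of interior nodes between j and the root, which is included in the count. Then a normalised version of Sackin’s index, originally introduced in ref 84, is defined as
$$I_{S,\mathrm{norm}}(T) = \frac{\sum_{j=N-l+1}^{N} \upsilon_j - l}{\frac{1}{2}(l - 1)(l + 2) - l},$$
where, to be able to compare indices of trees with different numbers of leaves l, we subtract the minimal value for a given l (equal to l, attained by the star-shaped tree) and divide by the range of the index over all trees with l leaves (the maximal value, (l − 1)(l + 2)/2, is attained by the caterpillar tree), as in ref 85.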
For i an interior node of a binary tree T, define $T_L(i)$ as the number of leaves subtended by the left branch of $T_i$, the subtree rooted at i, and $T_R(i)$ the number of leaves subtended by its right branch. Then the unnormalised Colless index 86 of T is
$$C(T) = \sum_{i} \left| T_L(i) - T_R(i) \right|,$$
where the sum runs over all interior nodes i of T.
Since Colless’ index is defined only for bifurcating trees, we use the default normalised Colless-like index $\mathfrak{C}_{\mathrm{MDM},\,\ln(l+e),\,\mathrm{norm}}$ defined in ref 85. This consists of measuring the dissimilarity between the subtrees T′ rooted at a given internal node by computing the mean deviation from the median (MDM) of the f-sizes of these subtrees. In this case f (l) = ln(l + e) and the f-size of T′ is defined as
$$\delta_f(T') = \sum_{v \in V(T')} f(\deg(v)) = \sum_{v \in V(T')} \ln(\deg(v) + e),$$
where deg(v) is the out-degree of node v.
These dissimilarities are then summed and the result is normalised as for Sackin’s index.
The cophenetic value φ(i, j) of a pair of leaves i, j is the depth of their lowest common ancestor (such that the root has depth 0). The total cophenetic index 87 of T is then the sum of the cophenetic values over all pairs of leaves, and a normalised version is
$$I_{\Phi,\mathrm{norm}}(T) = \frac{1}{\binom{l}{3}} \sum_{N-l+1 \le i < j \le N} \varphi(i, j),$$
where the maximal value $\binom{l}{3}$ is attained by the caterpillar tree, and the minimal value of the cophenetic index is 0 for all l (for a star-shaped tree with l leaves).
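The three classical indices can be illustrated on small cladograms. The sketch below is not the implementation used in this study: it stores a tree as a dictionary from each interior node to its list of children, computes the unnormalised Colless index only for bifurcating trees (the Colless-like index of ref 85 is not implemented), and normalises the Sackin and cophenetic indices on the assumption that their extremes are attained by the star-shaped and caterpillar trees. All function and node names are hypothetical.

```python
from math import comb


def leaf_counts(children, root):
    """Map every node to the number of leaves in the subtree rooted at it."""
    counts = {}

    def visit(node):
        kids = children.get(node, [])
        counts[node] = 1 if not kids else sum(visit(k) for k in kids)
        return counts[node]

    visit(root)
    return counts


def sackin_index(children, root):
    """Sum of leaf depths, computed as the sum over interior nodes of leaves below."""
    counts = leaf_counts(children, root)
    return sum(c for node, c in counts.items() if children.get(node))


def colless_index(children, root):
    """Sum over interior nodes of |left leaves - right leaves| (bifurcating trees only)."""
    counts = leaf_counts(children, root)
    total = 0
    for node, kids in children.items():
        if kids:
            if len(kids) != 2:
                raise ValueError("the Colless index requires a bifurcating tree")
            total += abs(counts[kids[0]] - counts[kids[1]])
    return total


def total_cophenetic_index(children, root):
    """Sum over leaf pairs of the depth of their lowest common ancestor."""
    counts = leaf_counts(children, root)
    return sum(comb(c, 2) for node, c in counts.items()
               if children.get(node) and node != root)


def normalised_sackin(children, root):
    l = leaf_counts(children, root)[root]
    if l < 3:
        raise ValueError("normalisation is defined for at least three leaves")
    lo, hi = l, (l - 1) * (l + 2) // 2  # assumed extremes: star and caterpillar
    return (sackin_index(children, root) - lo) / (hi - lo)


def normalised_cophenetic(children, root):
    l = leaf_counts(children, root)[root]
    if l < 3:
        raise ValueError("normalisation is defined for at least three leaves")
    return total_cophenetic_index(children, root) / comb(l, 3)


caterpillar = {"r": ["a", "x"], "x": ["b", "y"], "y": ["c", "d"]}
balanced = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
star = {"r": ["a", "b", "c", "d"]}
```

Since these are imbalance indices, the caterpillar scores 1 and the star scores 0 after normalisation; subtracting from 1 gives the corresponding balance values.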
These three balance indices were designed for analysing species phylogenies and are thus defined on cladograms, which are trees in which leaves correspond to extant species and internal nodes are hypothetical common ancestors. Conventional cladograms have no notion of node size. Cladograms also lack linear components as each internal node necessarily corresponds to a branching event. The driver phylogenetic trees reported in multiregion sequencing studies and generated by our models are instead clone trees (also known as mutation trees), in which all nodes of non-zero size represent extant clones. To apply previous balance indices to driver phylogenetic trees we first converted the trees to cladograms by adding a leaf to each non-zero-sized internal node and by collapsing linear chains of zero-sized nodes.
Whereas diversity indices such as D are relatively robust to the addition or removal of rare clones, the balance indices described above are much less robust because they treat all clones equally, regardless of population size (Supplementary Figures 6, 7 and 8). This hampers comparison between model results and data for two reasons. First, due to sampling error, even high quality multiregion sequencing studies underestimate the number of subclonal, locally abundant driver mutations by approximately 25%. 81 Second, bulk sequencing cannot detect driver mutations present in only a very small fraction of cells.
A robust tree balance index
To overcome the shortcomings of previous indices, we have developed a more robust tree balance index based on an extended definition: tree balance is the degree to which internal nodes split the tree into subtrees of equal size, where size refers to the sum of all node populations.
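Let f (v) > 0 denote the size of node v. For i an internal node, let $V(T_i)$ denote the set of nodes of $T_i$, the subtree rooted at i. We then define
$$S_i = \sum_{v \in V(T_i)} f(v), \qquad S_i^* = \sum_{v \in V(T_i),\, v \ne i} f(v) = S_i - f(i).$$
For i in the set of internal nodes Ṽ, and j in the set C(i) of children of i, we define $p_{ij} = S_j / S_i^*$. We then compute the balance score of a node i ∈ Ṽ as the normalised Shannon entropy of the sizes of the subtrees rooted at the children of i:
$$W_i^1 = -\sum_{j \in C(i)} p_{ij} \log_{d^+(i)} p_{ij},$$
where $d^+(i)$ is the out-degree (the number of children) of node i, and $W_i^1 = 0$ if $d^+(i) = 1$. Finally, for each node i, we weight the balance score by the product of $S_i^*$ and a non-root dominance factor $S_i^*/S_i$. Our normalised balance index is then
$$J^1 = \frac{1}{\sum_{k \in \tilde{V}} S_k^*} \sum_{i \in \tilde{V}} S_i^* \, \frac{S_i^*}{S_i} \, W_i^1.$$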
Supplementary Figure 11 illustrates the calculation of $J^1$ for four exemplary trees. We further describe the desirable properties of this index, and its relationship to other tree balance indices, in another article. 43
When n ≤ 2 (where n is the mean number of driver mutations per cell), the non-root dominance factor cannot exceed n − 1, while the other factors in $J^1$ are at most 1, which implies $J^1 \le n - 1$ for all n ≤ 2. Also for n > 2 we have $J^1 \le 1 < n - 1$. Thus it is impossible to construct trees that have $J^1 > n - 1$, as shown in Figure 4a.
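A compact sketch of the balance index follows. It rests on an assumed reading of the verbal definition: each internal node i contributes the normalised Shannon entropy of its children's subtree sizes, weighted by the subtree size excluding the node itself (written s_star below) and by the non-root dominance factor s_star divided by the full subtree size, and the sum is normalised by the total of the s_star weights. The tree representation and all names are hypothetical.

```python
import math


def j1_index(children, sizes, root):
    """Balance index for a clone tree.

    children: interior node -> list of child nodes
    sizes:    node -> population size f(v)
    """
    subtree = {}

    def visit(node):
        subtree[node] = sizes[node] + sum(visit(k) for k in children.get(node, []))
        return subtree[node]

    visit(root)
    numerator = denominator = 0.0
    for node, kids in children.items():
        if not kids:
            continue
        s_star = subtree[node] - sizes[node]  # subtree size excluding the node itself
        if s_star == 0:
            continue
        denominator += s_star
        if len(kids) < 2:
            continue  # a node with a single child has balance score 0
        entropy = 0.0
        for k in kids:
            p = subtree[k] / s_star
            if p > 0:
                entropy -= p * math.log(p, len(kids))  # logarithm to base out-degree
        numerator += s_star * (s_star / subtree[node]) * entropy
    return numerator / denominator if denominator else 0.0


# Star tree with root frequency 0.5 and two equal leaves, so that n = 1.5.
star_value = j1_index({"r": ["a", "b"]}, {"r": 0.5, "a": 0.25, "b": 0.25}, "r")
# Linear two-node tree: no branching, hence zero balance.
linear_value = j1_index({"r": ["a"]}, {"r": 1.0, "a": 1.0}, "r")
```

In the star-tree example the index equals n − 1, the upper limit stated above for n ≤ 2.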
Clonal turnover indices
For each time point t ≥ τ, we define a clonal turnover index as
$$\Theta(t) = \sum_i \left( f_i(t) - f_i(t - \tau) \right)^2,$$
where $f_i(t)$ is the frequency of clone i at time t, and τ is 10% of the total simulation time measured in cell generations. The mean value $\bar{\Theta}$ over time measures the total extent of clonal turnover.
To determine whether clonal turnover mostly occurred early, late or throughout tumour evolution we calculate the weighted average
$$\bar{\Theta}_T = \frac{\sum_t t \, \Theta(t)}{\max(t) \sum_t \Theta(t)},$$
where max(t) denotes the final time of the simulation. This quantity takes values between 0 and 1, and is higher if clonal turnover occurs mostly late during tumour growth. If the rate of clonal turnover is constant over time then $\bar{\Theta}_T$ ≈ 0.55.
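The value of approximately 0.55 for constant turnover can be reproduced with a short calculation. The sketch assumes that the turnover index is the sum of squared changes in clone frequency over the lag τ and that the weighted average is the turnover-weighted mean time divided by the final time; both forms, and all names, are assumptions made for illustration.

```python
def turnover_index(freq_now, freq_before):
    """Sum of squared changes in clone frequency between two time points (assumed form)."""
    clones = set(freq_now) | set(freq_before)
    return sum((freq_now.get(c, 0.0) - freq_before.get(c, 0.0)) ** 2 for c in clones)


def weighted_mean_time(times, turnover):
    """Turnover-weighted mean time, scaled by the final time."""
    total = sum(turnover)
    return sum(t * x for t, x in zip(times, turnover)) / (max(times) * total)


# Constant turnover measured from t = tau (10% of the run) to the final time.
times = list(range(10, 101))  # hypothetical run of 100 generations with tau = 10
constant = weighted_mean_time(times, [1.0] * len(times))
# Complete replacement of clone "a" by clone "b" within one lag.
full_replacement = turnover_index({"b": 1.0}, {"a": 1.0})
```

Because the index is only defined from t = τ onwards, constant turnover yields the midpoint of the interval from 0.1 to 1, which is 0.55.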
Histology slide analysis to determine the number of cells per gland
We randomly selected five tumours of each of four cancer types (colorectal cancer, clear cell renal cancer, lung adenocarcinoma and breast cancer) from the TCGA reference database (http://portal.gdc.cancer.gov). Using QuPath v0.2.0m4, 88 we manually delineated five representative groups of tumour cells in each image and automatically counted the number of cells in each group. We defined a group as a set of tumour cells directly touching each other, separated from other groups by stroma or other non-tumour tissue (Extended Data Figure 3).
The number of cells per group ranged from 5 to 8,485, with 50% of cases between 53 and 387 cells (Extended Data Figure 4a). Variation in the number of cells per group was larger between tumours than within tumours, whereas cell density was relatively consistent between tumours (Extended Data Figure 4b). Because our cell counts are derived from cross sections, they will underestimate the true populations of three-dimensional glands. On the other hand, it is unknown what proportion of cells are able to self-renew and contribute to long-term tumour growth and evolution. 89 On balance, therefore, it is reasonable to assume that each gland of an invasive, glandular tumour can contain between a few hundred and a few thousand interacting cells. This range of values is, moreover, remarkably consistent with results of a recent study that used a very different method to infer the number of cells in tumour-originating niches. Across a range of tissue types, the latter study concluded that cells typically interact in communities of 300-1,900 cells. 30 Another recent study of breast cancer applied the Louvain method for community detection to identify two-dimensional tumour communities typically in the range of 10-100 cells. 29
Acknowledgments
We thank Michael Hochberg, Mykola Lebid, Katharina Jahn, Richard Neher, Andreas Deutsch, Kiril Korolev, Cleo-Aron Weis, Benjamin Werner, Andrea Sottoriva and Artem Kaznatcheev for helpful discussions, and we thank Darlan Minussi for sharing data. R.N. and N.B. were supported by ERC Synergy Grant 609883. J.N.K. was supported by the German Consortium for Translational Cancer Research (DKTK/DKFZ) fellowship program and by RWTH Aachen START grant 2018/691906. R.N. was also supported by the National Cancer Institute of the National Institutes of Health under Award Number U54CA217376. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The results published here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Footnotes
Author contributions
R.N. conceived the research question and designed and created the modelling framework. R.N., D.B. and C.L.S. ran computational models and analysed the output. R.N. and J.L. developed tree balance indices. J.N.K. obtained and analysed histology slides. Y.V. derived the formula for the maximum possible diversity of linear trees (Supplementary information). R.N. wrote the manuscript with critical comments and input from N.B. and the other authors. All authors have read and approved the final manuscript.
Competing interests
The authors declare no competing interests.
Contributor Information
Robert Noble, Email: robert.noble@city.ac.uk.
Niko Beerenwinkel, Email: niko.beerenwinkel@bsse.ethz.ch.
Data availability
Data can be accessed at https://github.com/robjohnnoble/ModesOfEvolution.
Code availability
Our computational modelling code is available in an online repository. 65 Our data analysis R package is at https://github.com/robjohnnoble/demon_analysis. Specific figure plotting code is at https://github.com/robjohnnoble/ModesOfEvolution. Muller plots were drawn using the ggmuller R package. 90
References
- [1].Greaves M, Maley CC. Clonal evolution in cancer. Nature. 2012;481:306–13. doi: 10.1038/nature10762. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Davis A, Gao R, Navin N. Tumor evolution Linear, branching, neutral or punctuated? Biochimica et Biophysica Acta - Reviews on Cancer. 2017;1867:151–161. doi: 10.1016/j.bbcan.2017.01.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Turajlic S, Sottoriva A, Graham T, Swanton C. Resolving genetic heterogeneity in cancer. Nature Reviews Genetics. 2019;20:404–416. doi: 10.1038/s41576-019-0114-6. [DOI] [PubMed] [Google Scholar]
- [4].Sun R, Hu Z, Curtis C. Big Bang Tumor Growth and Clonal Evolution. Cold Spring Harbor Perspectives in Medicine a. 2017:028381. doi: 10.1101/cshperspect.a028381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Sottoriva A, et al. A Big Bang model of human colorectal tumor growth. Nature Genetics. 2015;47:209–216. doi: 10.1038/ng.3214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Suzuki Y, et al. Multiregion ultra-deep sequencing reveals early intermixing and variable levels of intra-tumoral heterogeneity in colorectal cancer. Molecular Oncology. 2017;11:124–139. doi: 10.1002/1878-0261.12012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Ling S, et al. Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution. Proceedings of the National Academy of Sciences. 2015;112:E6496–E6505. doi: 10.1073/pnas.1519556112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Karlsson J, et al. Four evolutionary trajectories underlie genetic intratumoral variation in childhood cancer. Nature Genetics. 2018;50:944–950. doi: 10.1038/s41588-018-0131-y. [DOI] [PubMed] [Google Scholar]
- [9].Turajlic S, et al. Deterministic Evolutionary Trajectories Influence Primary Tumor Growth: TRACERx Renal. Cell. 2018;173:595–610.:e11. doi: 10.1016/j.cell.2018.03.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Jamal-Hanjani M, et al. Tracking the Evolution of Non?Small-Cell Lung Cancer. New England Journal of Medicine. 2017;376 doi: 10.1056/NEJMoa1616288. NEJMoa1616288. [DOI] [PubMed] [Google Scholar]
- [11].Dentro SC, et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell. 2021:1–16. doi: 10.1016/j.cell.2021.03.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nature Genetics. 2016;48:238–244. doi: 10.1038/ng.3489. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Martincorena I, et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell. 2017;171:1029–1041.:e21. doi: 10.1016/j.cell.2017.09.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta. 2010;1805:1–28. doi: 10.1016/j.bbcan.2009.11.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Lloyd MC, et al. Darwinian Dynamics of Intratumoral Heterogeneity: Not Solely Random Mutations but Also Variable Environmental Selection Forces. Cancer Research. 2016;76:3136–3144. doi: 10.1158/0008-5472.CAN-15-2962. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Maley CC, et al. Classifying the evolutionary and ecological features of neoplasms. Nature Reviews Cancer. 2017;17:605–619. doi: 10.1038/nrc.2017.69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Noble RJ, et al. Paracrine Behaviors Arbitrate Parasite-Like Interactions Between Tumor Subclones. Frontiers in Ecology and Evolution. 2021;9:1–15. doi: 10.3389/fevo.2021.675638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Friedl P, Alexander S. Cancer invasion and the microenvironment: Plasticity and reciprocity. Cell. 2011;147:992–1009. doi: 10.1016/j.cell.2011.11.016. [DOI] [PubMed] [Google Scholar]
- [19].Pandya P, Orgaz JL, Sanz-Moreno V. Modes of invasion during tumour dissemination. Molecular Oncology. 2017;11:5–27. doi: 10.1002/1878-0261.12019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Slarkin M. Gene flow in natural populations. Annual Review of Ecology and Systematics. 1985;16:393–430. [Google Scholar]
- [21].Durrett R. Branching Process Models of Cancer. Springer International Publishing; Cham: 2015. [Google Scholar]
- [22].Clapp G, Levy D. A review of mathematical models for leukemia and lymphoma. Drug Discovery Today: Disease Models. 2015;16:1–6. doi: 10.1016/j.ddmod.2014.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Hehlmann R, Hochhaus A, Baccarani M. Chronic myeloid leukaemia. The Lancet. 2007;370:342–350. doi: 10.1016/S0140-6736(07)61165-9. [DOI] [PubMed] [Google Scholar]
- [24].Morita K, et al. Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nature Communications. 2020;11:5327. doi: 10.1038/s41467-020-19119-8.
- [25].Preston SL, et al. Bottom-up Histogenesis of Colorectal Adenomas: Origin in the Monocryptal Adenoma and Initial Expansion by Crypt Fission. Cancer Research. 2003:3819–3825.
- [26].Ryser MD, Min B-H, Siegmund KD, Shibata D. Spatial mutation patterns as markers of early colorectal tumor cell mobility. Proceedings of the National Academy of Sciences. 2018:201716552. doi: 10.1073/pnas.1716552115.
- [27].Sun R, et al. Between-region genetic divergence reflects the mode and tempo of tumor evolution. Nature Genetics. 2017;49:1015–1024. doi: 10.1038/ng.3891.
- [28].Lugli A, Zlobec I, Berger MD, Kirsch R, Nagtegaal ID. Tumour budding in solid cancers. Nature Reviews Clinical Oncology. 2021;18:101–115. doi: 10.1038/s41571-020-0422-y.
- [29].Jackson HW, et al. The single-cell pathology landscape of breast cancer. Nature. 2020;578:615–620. doi: 10.1038/s41586-019-1876-x.
- [30].Buder T, Deutsch A, Klink B, Voss-Böhme A. Patterns of Tumor Progression Predict Small and Tissue-Specific Tumor-Originating Niches. Frontiers in Oncology. 2019;8:1–8. doi: 10.3389/fonc.2018.00668.
- [31].Hoefflin R, et al. Spatial niche formation but not malignant progression is a driving force for intratumoural heterogeneity. Nature Communications. 2016;7:1–12. doi: 10.1038/ncomms11845.
- [32].Waclaw B, et al. A spatial model predicts that dispersal and cell turnover limit intratumour heterogeneity. Nature. 2015;525:261–264. doi: 10.1038/nature14971.
- [33].Schlageter M, Terracciano LM, D’Angelo S, Sorrentino P. Histopathology of hepatocellular carcinoma. World Journal of Gastroenterology. 2014;20:15955. doi: 10.3748/wjg.v20.i43.15955.
- [34].Hallatschek O, Nelson DR. Gene surfing in expanding populations. Theoretical Population Biology. 2008;73:158–70. doi: 10.1016/j.tpb.2007.08.008.
- [35].Fusco D, Gralka M, Kayser J, Anderson A, Hallatschek O. Excess of mutational jackpot events in expanding populations revealed by spatial Luria–Delbrück experiments. Nature Communications. 2016;7:12760. doi: 10.1038/ncomms12760.
- [36].Gidoin C, Peischl S. Range Expansion Theories Could Shed Light on the Spatial Structure of Intra-tumour Heterogeneity. Bulletin of Mathematical Biology. 2018. doi: 10.1007/s11538-018-00540-6.
- [37].Murase E, Siegelman ES, Outwater EK, Perez-Jaffe LA, Tureck RW. Uterine leiomyomas: Histopathologic features, MR imaging findings, differential diagnosis, and treatment. Radiographics. 1999;19:1179–1197. doi: 10.1148/radiographics.19.5.g99se131179.
- [38].Basara Akin I, Balci P. Fibroadenomas: a multidisciplinary review of the variants. Clinical Imaging. 2021;71:83–100. doi: 10.1016/j.clinimag.2020.10.058.
- [39].Yates LR, et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nature Medicine. 2015;21. doi: 10.1038/nm.3886.
- [40].Zhang M, et al. Clonal architecture in mesothelioma is prognostic and shapes the tumour microenvironment. Nature Communications. 2021;12:1–12. doi: 10.1038/s41467-021-21798-w.
- [41].Minussi DC, et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature. 2021;592:302–308. doi: 10.1038/s41586-021-03357-x.
- [42].Durante MA, et al. Single-cell analysis reveals new evolutionary complexity in uveal melanoma. Nature Communications. 2020;11. doi: 10.1038/s41467-019-14256-1.
- [43].Lemant J, Le Sueur C, Manojlovic V, Noble R. Robust, Universal Tree Balance Indices. bioRxiv. 2021. doi: 10.1101/2021.08.25.457695.
- [44].Williams MJ, et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nature Genetics. 2018;50:895–903. doi: 10.1038/s41588-018-0128-6.
- [45].Tarabichi M, et al. Neutral tumor evolution? Nature Genetics. 2018;50:1630–1633. doi: 10.1038/s41588-018-0258-x.
- [46].Bozic I, et al. Accumulation of driver and passenger mutations during tumor progression. Proceedings of the National Academy of Sciences of the United States of America. 2010;107:18545–50. doi: 10.1073/pnas.1010978107.
- [47].Chkhaidze K, et al. Spatially constrained tumour growth affects the patterns of clonal selection and neutral drift in cancer genomic data. PLOS Computational Biology. 2019;15:e1007243. doi: 10.1371/journal.pcbi.1007243.
- [48].Noble R, Burley JT, Le Sueur C, Hochberg ME. When, why and how tumour clonal diversity predicts survival. Evolutionary Applications. 2020;13:1558–1568. doi: 10.1111/eva.13057.
- [49].Ryser MD, et al. Minimal barriers to invasion during human colorectal tumor growth. Nature Communications. 2020;11:1280. doi: 10.1038/s41467-020-14908-7.
- [50].Cross W, et al. The evolutionary landscape of colorectal tumorigenesis. Nature Ecology and Evolution. 2018;2:1661–1672. doi: 10.1038/s41559-018-0642-z.
- [51].Sabarinathan R, et al. The whole-genome panorama of cancer drivers. bioRxiv. 2017;10. doi: 10.1101/190330.
- [52].West J, Schenck RO, Gatenbee C, Robertson-Tessi M, Anderson ARA. Normal tissue architecture determines the evolutionary course of cancer. Nature Communications. 2021;12:2060. doi: 10.1038/s41467-021-22123-1.
- [53].Fukatsu A, et al. Growth Pattern, an Important Pathologic Prognostic Parameter for Clear Cell Renal Cell Carcinoma. American Journal of Clinical Pathology. 2013;140:500–505. doi: 10.1309/AJCPIMPE6ZFT8AME.
- [54].Verine J, et al. Architectural Patterns are a Relevant Morphologic Grading System for Clear Cell Renal Cell Carcinoma Prognosis Assessment. The American Journal of Surgical Pathology. 2018;42:423–441. doi: 10.1097/PAS.0000000000001025.
- [55].Andersson N, et al. Extensive clonal branching shapes the evolutionary history of high-risk pediatric cancers. Cancer Research. 2020;80:1512–1523. doi: 10.1158/0008-5472.CAN-19-3468.
- [56].Bacevic K, et al. Spatial competition constrains resistance to targeted cancer therapy. Nature Communications. 2017;8:1995. doi: 10.1038/s41467-017-01516-1.
- [57].Gallaher JA, Enriquez-Navas PM, Luddy KA, Gatenby RA, Anderson AR. Spatial Heterogeneity and Evolutionary Dynamics Modulate Time to Recurrence in Continuous and Adaptive Cancer Therapies. Cancer Research. 2018;78:2127–2139. doi: 10.1158/0008-5472.CAN-17-2649.
- [58].Viossat Y, Noble R. A theoretical analysis of tumour containment. Nature Ecology and Evolution. 2021;5:826–835. doi: 10.1038/s41559-021-01428-w.
- [59].von Loga K, et al. Extreme intratumour heterogeneity and driver evolution in mismatch repair deficient gastro-oesophageal cancer. Nature Communications. 2020;11:1–14. doi: 10.1038/s41467-019-13915-7.
- [60].Scott JG, Maini PK, Anderson ARA, Fletcher AG. Inferring Tumor Proliferative Organization from Phylogenetic Tree Measures in a Computational Model. Systematic Biology. 2020;69:623–637. doi: 10.1093/sysbio/syz070.
- [61].Eden M. A Two-Dimensional Poisson Growth Process. Journal of the Royal Statistical Society Series B. 1965;27:497–504.
- [62].Williams T, Bjerknes R. Stochastic model for abnormal clone spread through epithelial basal layer. Nature. 1972;236:19–21. doi: 10.1038/236019a0.
- [63].Ahmed Z, Gravel S. Intratumor Heterogeneity and Circulating Tumor Cell Clusters. Molecular Biology and Evolution. 2018;35:2135–2144. doi: 10.1093/molbev/msy115.
- [64].Durrett R, Foo J, Leder K. Spatial Moran models, II: cancer initiation in spatially structured tissue. Journal of Mathematical Biology. 2016;72:1369–1400. doi: 10.1007/s00285-015-0912-1.
- [65].Noble R. demon: deme-based oncology model. 2019. URL https://github.com/robjohnnoble/demon_model .
- [66].Snippert HJ, et al. Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. Cell. 2010;143:134–144. doi: 10.1016/j.cell.2010.09.016.
- [67].Gerstung M, et al. The evolutionary history of 2,658 cancers. Nature. 2020;578:122–128. doi: 10.1038/s41586-019-1907-7.
- [68].Baker A-M, et al. Crypt fusion as a homeostatic mechanism in the human colon. Gut. 2019;68:1986–1993. doi: 10.1136/gutjnl-2018-317540.
- [69].Nicholson AM, et al. Fixation and Spread of Somatic Mutations in Adult Human Colonic Epithelium. Cell Stem Cell. 2018:1–10. doi: 10.1016/j.stem.2018.04.020.
- [70].Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry. 1977;81:2340–2361.
- [71].Gerlinger M, et al. Intratumor Heterogeneity and Branched Evolution Revealed by Multiregion Sequencing. New England Journal of Medicine. 2012;366:883–892. doi: 10.1056/NEJMoa1113205.
- [72].Zhang J, et al. Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science. 2014;346:256–259. doi: 10.1126/science.1256930.
- [73].Sottoriva A, et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proceedings of the National Academy of Sciences of the United States of America. 2013;110:4009–14. doi: 10.1073/pnas.1219747110.
- [74].Gerlinger M, et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nature Genetics. 2014;46:225–233. doi: 10.1038/ng.2891.
- [75].Cooper CS, et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nature Genetics. 2015;47:367–372. doi: 10.1038/ng.3221.
- [76].Boutros PC, et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nature Genetics. 2015:1–14. doi: 10.1038/ng.3315.
- [77].Zhao Z-M, et al. Early and multiple origins of metastatic lineages within primary tumors. Proceedings of the National Academy of Sciences. 2016;113:2140–2145. doi: 10.1073/pnas.1525677113.
- [78].McPherson A, et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nature Genetics. 2016;48:758–67. doi: 10.1038/ng.3573.
- [79].Rasche L, et al. Spatial genomic heterogeneity in multiple myeloma revealed by multi-region sequencing. Nature Communications. 2017;8:1–10. doi: 10.1038/s41467-017-00296-y.
- [80].Pectasides E, et al. Genomic heterogeneity as a barrier to precision medicine in gastroesophageal adenocarcinoma. Cancer Discovery. 2018;8:37–48. doi: 10.1158/2159-8290.CD-17-0395.
- [81].Turajlic S, et al. Tracking Cancer Evolution Reveals Constrained Routes to Metastases: TRACERx Renal. Cell. 2018:1–14. doi: 10.1016/j.cell.2018.03.057.
- [82].Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K. cluster: Cluster Analysis Basics and Extensions. 2019.
- [83].Shao K-T, Sokal RR. Tree Balance. Systematic Zoology. 1990;39:266.
- [84].Sackin MJ. “Good” and “Bad” Phenograms. Systematic Biology. 1972;21:225–226.
- [85].Mir A, Rotger L, Rosselló F. Sound Colless-like balance indices for multifurcating trees. PLoS ONE. 2018;13:559–560. doi: 10.1371/journal.pone.0203401.
- [86].Colless DH. Review of Phylogenetics, The Theory and Practice of Phylogenetic Systematics. Systematic Zoology. 1982;31:100–104.
- [87].Mir A, Rosselló F, Rotger L. A new balance index for phylogenetic trees. Mathematical Biosciences. 2013;241:125–136. arXiv:1202.1223. doi: 10.1016/j.mbs.2012.10.005.
- [88].Bankhead P, et al. QuPath: Open source software for digital pathology image analysis. Scientific Reports. 2017;7:16878. doi: 10.1038/s41598-017-17204-5.
- [89].Batlle E, Clevers H. Cancer stem cells revisited. Nature Medicine. 2017;23:1124–1134. doi: 10.1038/nm.4409.
- [90].Noble R. ggmuller: Create Muller Plots of Evolutionary Dynamics. 2019. URL https://cran.r-project.org/package=ggmuller .
- [91].St Hill CA, et al. The high affinity selectin glycan ligand C2-O-sLex and mRNA transcripts of the core 2 β-1,6-N-acetylglusaminyltransferase (C2GnT1) gene are highly expressed in human colorectal adenocarcinomas. BMC Cancer. 2009;9:79. doi: 10.1186/1471-2407-9-79.
- [92].Brunet É, Derrida B, Mueller AH, Munier S. Effect of selection on ancestry: An exactly soluble case and its phenomenological generalization. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics. 2007;76:1–20. arXiv:0704.3389. doi: 10.1103/PhysRevE.76.041104.
- [93].Neher RA, Hallatschek O. Genealogies of rapidly adapting populations. Proceedings of the National Academy of Sciences of the United States of America. 2013;110:437–42. doi: 10.1073/pnas.1213113110.
Data Availability Statement
Data can be accessed at https://github.com/robjohnnoble/ModesOfEvolution.
Our computational modelling code is available in an online repository. 65 Our data analysis R package is at https://github.com/robjohnnoble/demon_analysis. Specific figure plotting code is at https://github.com/robjohnnoble/ModesOfEvolution. Muller plots were drawn using the ggmuller R package. 90