Abstract
Clonal genome evolution is a key feature of asexually reproducing species and human cancer development. While many studies have described the landscapes of clonal genome evolution in cancer, few determine the underlying evolutionary parameters from molecular data, and even fewer integrate theory with data. We derived theoretical results linking mutation rate, time, expansion dynamics, and biological/clinical parameters. Subsequently, we inferred time-resolved estimates of evolutionary parameters from mutation accumulation, mutational signatures and selection. We then applied this framework to predict the time of speciation of the marbled crayfish, an enigmatic, globally invasive parthenogenetic freshwater crayfish. The results predict that speciation occurred between 1986 and 1990, which is consistent with biological records. We also used our framework to analyze whole-genome sequencing datasets from primary and relapsed glioblastoma, an aggressive brain tumor. The results identified evolutionary subgroups and showed that tumor cell survival could be inferred from genomic data that was generated during the resection of the primary tumor. In conclusion, our framework allowed a time-resolved, integrated analysis of key parameters in clonally evolving genomes, and provided novel insights into the evolutionary age of marbled crayfish and the progression of glioblastoma.
Author summary
Genomes evolve under the accumulation of mutations, and under the pressure of selective forces. While additional mechanisms are at play in sexually reproducing species, this is not the case in clonal genomes. Our study focuses on a parthenogenetic animal and on cancer, since both possess a clonal genome, and in both cases evolutionary forces are key to understanding expansion. We used modelling of mutation accumulation, in combination with Darwinian selection and with clock-like mutagenic processes. Using this framework, we showed a remarkably recent emergence date for P. virginalis and established its potential as a model system for clonal genome evolution. We highlighted subtle temporal dynamics of selection in tumor samples, and showed that tumor cell survival was correlated with the time to recurrence. Our findings illustrate the potential of this framework for modelling of clonal evolution and for the use of evolutionary parameters in a clinical context.
Introduction
The evolution of genomes is shaped by many factors, among which the random accumulation of mutations over time plays a fundamental role [1,2]. Because of this, the characteristics of mutated sites can be used as a lens to observe the evolutionary processes that shaped the genome in the past. For instance, the ratio of nonsynonymous to synonymous mutations reveals whether selection had an impact on the genome [3]. Furthermore, biological processes, including clock-like processes, leave a footprint in the form of recently developed mutational signatures [4]. The frequency of mutated alleles could also elucidate the timeline of the evolving genome, or selection [5], but this can be confounded by stochastic drift, or by alterations of ploidy [6,7]. These individual evolutionary parameters may or may not exert an influence on each other. An integrated analysis aims at modelling these elements and their interplay, in order to gain a better understanding of their role in shaping the genome.
Far from being homogeneous, the probability of a mutation depends on many factors such as the genomic location [3], mutator alleles, local nucleotide context or mutagenic exposures [4]. Other genomic modifications include recombination in sexual reproduction, copy number variants and genomic rearrangements, gene transfers and hybridization. The capacity of any genomic modification to be inherited is partly stochastic, for instance through genetic drift [8], but can be favored or disfavored by positive or negative selection. Genome evolution was historically observed through the analysis of phenotypes [9], and can now be determined more precisely using high-throughput sequencing in parallel with experimental or cohort settings, such as mutation accumulation experiments, or the analysis of genetic trios [10,11].
Under certain conditions, genomes can evolve clonally, which involves a more limited set of mechanisms. This is particularly relevant for asexually reproducing species and for human cancers. Mutation rate, growth and variant frequencies are key parameters of clonally evolving genomes [12]. They determine the speed of evolution and function under the influence of selective pressures.
Selection can be quantified using the ratio of nonsynonymous to synonymous mutations (dNdS), where a lower than expected ratio indicates purifying (negative) selection, and a higher ratio indicates positive selection. The expected dNdS ratio is not trivial to determine, because the identity of the neighboring genomic bases, or the location of the mutation in the gene transcript, can alter the frequency of certain mutations. Selection is also a multifaceted, dynamic event which actively depends on the environment [13,14]. Of note, the notion of stochastic drift, which corresponds to the random variation of the frequency of alleles (or, of clones), is a process distinct from selection. Stochastic drift can happen without the advent of selection (neutral drift), or in addition to it [1].
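To make the normalization issue concrete, the following minimal sketch computes a site-normalized dNdS ratio. It is a simplified illustration and not the estimator used in this study: the function name `dnds_ratio` and all counts are hypothetical, and the numbers of nonsynonymous and synonymous sites are assumed to be known in advance, which ignores the dependence on nucleotide context and transcript position mentioned above.

```python
def dnds_ratio(n_nonsyn, n_syn, sites_nonsyn, sites_syn):
    """Site-normalized ratio of nonsynonymous to synonymous mutations.

    Hypothetical helper: each mutation count is divided by the number of
    sites at which such a mutation could occur, so that a value near 1 is
    expected in the absence of selection.
    """
    if n_syn == 0:
        raise ValueError("dNdS is undefined without synonymous mutations")
    rate_nonsyn = n_nonsyn / sites_nonsyn
    rate_syn = n_syn / sites_syn
    return rate_nonsyn / rate_syn


# Hypothetical counts: three times more nonsynonymous mutations, but also
# three times more nonsynonymous sites, so the normalized ratio is neutral.
neutral = dnds_ratio(n_nonsyn=30, n_syn=10, sites_nonsyn=300, sites_syn=100)
# Fewer nonsynonymous mutations than expected: purifying selection.
purifying = dnds_ratio(n_nonsyn=15, n_syn=10, sites_nonsyn=300, sites_syn=100)
print(neutral, purifying)
```

Under these assumptions, a ratio below 1 points to purifying selection and a ratio above 1 to positive selection, provided the confidence bounds exclude unity.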
A prominent species with a clonally evolving genome is the marbled crayfish (Procambarus virginalis), a newly discovered freshwater crayfish [15,16]. Marbled crayfish reproduce by apomictic parthenogenesis, with the offspring being genetically identical copies of their mothers [17,18]. Interestingly, genetic analyses have suggested that the global marbled crayfish population represents a single clone, indicating that it was formed only recently and by a single foundational animal [19,20]. Morphological and genetic examinations have identified Procambarus fallax, a sexually reproducing slough crayfish from Florida, as the parent species of the marbled crayfish [21]. Furthermore, a recent phylo-geographic analysis of P. fallax suggested that the anthropogenic transport and cultivation of a triploid and parthenogenetically reproducing P. fallax specimen could be the origin of the marbled crayfish [21]. The offspring of this foundational specimen were subsequently distributed through the aquarium trade and released into various environments, thus forming numerous stable wild populations around the globe [20]. However, important details about the speciation of the marbled crayfish are not known and need to be supported by genetic analysis.
Clonal genome evolution also plays an important role in cancer formation. Indeed, cancer genome evolution is characterized by the accumulation of somatic mutations into a pathogenic tumoral genome. Several authors have described the critical role of mutational patterns and selection in cancer [1,3,22], while neutral evolution is still debated [5,7]. In glioblastoma, the analysis of tumor trajectories revealed a tumor initiation years before diagnosis [23]. Consequently, it would be of great interest to infer evolutionary parameters over the course of tumor progression.
In this study, we aimed to develop an integrated analysis of clonal genome evolution. To this end, we reformulated the dependence of mutation accumulation on variant allele frequency, and used this formulation to determine the links between the mutation rate, growth and survival rates. We further integrated these parameters with selection estimates, obtained from the non-synonymous to synonymous ratio. Finally, we integrated time estimates in our model, based on clock-like mutational signatures. We applied our approach to the clonally evolving marbled crayfish. We provided a detailed view of mutation accumulation and selection, and estimated the time of speciation. We further applied our framework to clonal genome evolution in cancer, using recently published samples of primary and recurrent glioblastoma [23].
Results
The genetic near-monoclonality of the marbled crayfish population [19,20] establishes this species as an excellent model system for studying clonal genome evolution. In order to assess the mutation rate of the P. virginalis genome, we used paired-end whole-genome sequencing at an average of 17x coverage (per strand), for a line of one ancestral animal and two direct descendants from our laboratory colony of P. virginalis, that were sampled over a period of seven years (Fig 1A). The mutation rate was calculated as the average number of de novo mutations in animals 34 and 35 as compared to animal 1, per nucleotide and per year. From these samples, we obtained a range for the mutation rate equal to [3.51×10⁻⁸; 1.165×10⁻⁴]/nt/y. The lower bound corresponds to strictly filtered mutations divided by the number of sites in the whole genome, while the upper bound corresponds to the number of mutations remaining after a relaxed filtering, divided by the number of callable sites. The range of the mutation rate of the P. virginalis genome overlaps with known mutation rates in arthropods, and is also comparable to the mutation rates observed in human somatic healthy or cancerous cells (Fig 1B).
Fig 1. Mutation rate of P. virginalis and coalescent.
(A) Genealogy of laboratory animals, with sequenced animals marked in grey. (B) Mutation rate in P. virginalis, in other arthropods (Silliman et al. 2021 [24], Yang et al. 2015 [25], Liu et al. 2017 [26], Oppold et al. 2016 [27], Flynn et al. 2017 [13], Ho et al. 2020 [28], Keightley et al. 2014 [29], Keightley et al. 2015 [30]), and in Homo sapiens (Ohno et al. 2019 [31], Lee-Six et al. 2018 [32], Blokzijl et al. 2016 [33], Martincorena et Campbell 2015 [10], Ma et al. 2018 [34]). HSC: Hematopoietic Stem Cells, ASCs: Adult Stem Cells (small intestine, colon and liver). Error bars correspond to 95% confidence intervals. (C) Coalescent tree based on a constant mutation rate and sequences of sampled animals. The posterior probability of each branch is indicated in red.
We evaluated the evolutionary age of P. virginalis, using a Markov Chain Monte Carlo with Bayesian evolutionary analysis [35] on whole-genome sequencing datasets from 13 animals (Fig 1C). We generated 10 million states, which allowed convergence of the sampled states, and led to adequately large effective sample sizes (see Methods for details). The resulting coalescent tree showed that animals 1, 34 and 35 correctly clustered together, as well as animals from German wild populations (Hannover, Reilingen, Moosweiher) and from the likely foundational laboratory lineage of the German wild populations (Heidelberg). Furthermore, samples from Madagascar formed a separate branch. Interestingly, Petshop 2 [19] was nested in the branch of animals from Madagascar. This is consistent with the notion that the Malagasy population was founded by an animal that was originally obtained from a German pet shop. Posterior probabilities (Fig 1C, red annotations) indicate highly probable branching for all but the top coalescent event, which has a probability of 0.5206. From this tree, the most recent common ancestor of the 13 animals was dated to 1988.0 (95% CI: [1986.1; 1989.8]). This predates, and is therefore broadly consistent with, the first documented appearance of P. virginalis in 1995 [16].
We next modeled mutation accumulation under a fast growth scenario. In this model, the number of mutations dM, arising in a time increment, scaled with the mutation rate and other evolutionary parameters (S1 Text, p.2, expression (1)). Then, we noticed that allele frequency could be expressed based on ploidy and the number of animals [5], and that the number of animals could be further expressed using the rate of reproduction, offspring survival, and the population size (S1 Text, p.4, expression (7)). This reformulation of allele frequency appeared advantageous, because it could then be used to simplify the expression of dM. As a result, under this fast growth scenario, dM could be simply expressed using the mutation rate, constant terms, and allele frequency (S1 Text, p.5, expression (12)). As a consequence, mutation accumulation gave information on the mutation rate, but not on selection (S1 Text). This provided the rationale for examining the dynamics of the mutation rate in P. virginalis using the M(1/f) curve, where M is the number of mutations and f is the allelic frequency (Fig 2A). The resulting curve suggested that the mutation rate changed over time, with 4 phases delineated by a segmented regression (Fig 2A; p = 0.06). The mutation rate was reduced in phase 3, as compared to phases 1 and 2, and increased in phase 4 (Fig 2A). Under our model, selection s is not observable using M(1/f) (S1 Text, Eq 12). We therefore used the ratio of non-synonymous to synonymous mutations to estimate s (Fig 2B). The resulting values were compatible with unity, suggesting the absence of selection.
Fig 2. Mutation accumulation, selection and time course of P. virginalis genome evolution.
(A) Mutation accumulation as a function of the inverse allele frequency 1/f (black) and phases from automated segmentation (breakpoints in grey, segments in red). The confidence band at 95% level is shown in grey. (B) Non-synonymous to synonymous ratio (dNdS). The smoothed ratio is shown in red. (C) Stack plot of exposure, the contribution of each mutational signature. This includes the clock-like single-base substitution 1 (SBS1) signature, and the clock-like single-base substitution 5 (SBS5) signature. Confidence bands at 95% probability are indicated in grey. (D) Mutation accumulation as a function of time. The confidence band at 95% is shown in grey and the smoothed mutation accumulation is shown in red.
In order to obtain time-resolved estimates, we then used previously established clock-like mutational single-base signatures (SBS1 and SBS5) [36–38] as a proxy for the time course of mutation accumulation (Fig 2C). Because mutational signatures are currently lacking in arthropods, but the underlying mechanisms appear conserved in evolution [39,40], we used human mutational signatures. We further assumed that the arrow of time from past to present corresponds to the arrow of increasing 1/f. To obtain a time course in arbitrary units, we calculated the integral of the clock-like components of mutation accumulation (Fig 2D, Methods). According to the mathematical model, the slope of this curve is proportional to the mutation rate as a function of time (S1 Text, Eq 12). The results (Fig 2D) showed that this mutation rate exhibited less variation than the mutation rate per division (Fig 2A). Because the temporal and per-division mutation rates differ in particular by the growth rate (S1 Text), this might indicate fluctuations in the growth rate of the marbled crayfish population. As a whole, our analyses suggested distinct phases, detected significant variations of evolutionary parameters in P. virginalis, and allowed us to trace its speciation to a time point that is consistent with biological records.
Using P. virginalis, we developed a framework to analyse the evolution of a clonal genome, which is driven by germline mutations. This framework can in principle also be applied to analyse the clonal evolution of a tumor genome, which is driven by somatic mutations. Since glioblastoma is a high grade tumor with systematic recurrence and poor patient survival, a better understanding of its evolutionary parameters is important. We therefore applied our framework to a published set of whole-genome sequencing data of primary and recurrent glioblastoma tumors [23]. This study also estimated the age of primary tumors, allowing further data integration. Based on the curve M(1/f), we generated mutation rate profiles (Fig 3A; see S1A Fig for individual samples), which we further segmented into phases (Fig 3A, p < 2.2×10⁻¹⁶). The results indicated distinct variations in the mutation rate in primary and recurrent samples (Fig 3; S1 Table). In the exemplary sample 1 in Fig 3A, the segmentation separates 5 phases significantly. K-fold cross-validation showed that the mean square error was 2379.4 for 4 phases and 1184.0 for 5 phases. The difference between test and validation mean square errors was 291.0 for 4 phases and 135.5 for 5 phases. These results strongly suggest the absence of overfitting. After we excluded the outermost phases 1 and 5, where changes in mutation frequencies may correspond to an early slow growth phase, or where our analysis may miss low-frequency mutations [23], the mutation rate per division decreased steadily in phases 2–4.
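The comparison of phase counts by K-fold cross-validation can be illustrated with the sketch below. It is a simplified stand-in and not the segmentation procedure used in this study: breakpoints are fixed rather than estimated, each phase is fitted by an independent least-squares line (so the fit need not be continuous at breakpoints), and the data, function names and breakpoint values are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def phase_of(x, breakpoints):
    """Index of the phase containing x, given sorted breakpoints."""
    return sum(1 for b in breakpoints if x >= b)


def fit_phases(xs, ys, breakpoints):
    """Fit one line per phase; returns {phase index: (slope, intercept)}."""
    grouped = {}
    for x, y in zip(xs, ys):
        gx, gy = grouped.setdefault(phase_of(x, breakpoints), ([], []))
        gx.append(x)
        gy.append(y)
    return {k: fit_line(gx, gy) for k, (gx, gy) in grouped.items()}


def predict(model, x, breakpoints):
    """Predict M at x, falling back to the nearest fitted phase if needed."""
    k = phase_of(x, breakpoints)
    if k not in model:
        k = min(model, key=lambda j: abs(j - k))
    slope, intercept = model[k]
    return slope * x + intercept


def kfold_mse(xs, ys, breakpoints, k=5, seed=0):
    """Mean held-out squared error over k folds."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    fold_errors = []
    for fold in (order[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = fit_phases([xs[i] for i in train], [ys[i] for i in train], breakpoints)
        squared = [(ys[i] - predict(model, xs[i], breakpoints)) ** 2 for i in fold]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Synthetic, noise-free M(1/f) curve whose slope changes at 1/f = 10.
inv_f = [0.5 * i for i in range(40)]
m = [x if x < 10 else 10 + 5 * (x - 10) for x in inv_f]
mse_one_phase = kfold_mse(inv_f, m, breakpoints=[])
mse_two_phases = kfold_mse(inv_f, m, breakpoints=[10.0])
print(mse_one_phase, mse_two_phases)
```

In this toy setting the held-out error drops when the true breakpoint is included. A further drop in held-out error when another phase is added, as reported above for 5 versus 4 phases, is the kind of evidence that argues against overfitting.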
Fig 3. Mutation accumulation, selection and time dynamics of a representative glioblastoma tumor (patient 1, primary tumor), expansion patterns and survival ratio.
(A) Mutation accumulation as a function of the inverse allele frequency 1/f (black) and phases from automated segmentation (breakpoints are indicated as dashed vertical lines, segments are indicated in blue). The confidence band at 95% level is shown in grey. (B) Non-synonymous to synonymous ratio dNdS. In the lower inset, purple and blue bars show non-synonymous and synonymous mutations, respectively. The smoothened ratio is shown in red. (C) Clock-like and non-clock-like mutational signatures. (D) Mutation accumulation as a function of time. (E) Stratification of expansion curves ωγN into 4 subgroups: A: Convex, B: Peak, C: Increase, D: Paused Start (42 primary tumor GBM samples). (F) Dependence of time to recurrence on the γR/γP ratio. Fit1 corresponds to a linear regression of time versus log10(γR/γP), with intercept = 19.511 (standard error SE = 2.544) and slope = -5.819 (SE = 1.455), fit2 corresponds to a linear regression of time versus log10(log10(γR/γP)), with intercept = 18.922 (SE = 1.806) and slope = -29.321 (SE = 5.285).
We next looked at selection using the dN/dS ratio. Taking confidence bounds into account, the results were compatible with neutral evolution for most tumors (Fig 3B; S1B Fig per sample). However, 11 primary tumor samples showed evidence of negative selection during some intervals, for instance sample 35 (S1B Fig for sample 35). We also observed evidence for positive selection in two primary tumor samples (Samples 2 and 7, S1B Fig). Interestingly, 7 out of 9 recurrent tumor samples underwent prolonged phases of negative selection (for example, sample 4, 1/f in [1/0.5; 1/0.1], S1B Fig), while 2 samples exhibited short phases of negative selection. No recurrent tumor sample showed any significant phase of positive selection.
We next determined the timeline of tumor evolution, by examining the frequency of the stable, clock-like SBS1 signature. Using the information on the clock-like signature SBS1, and using Eq (9) (Methods), we reconstructed M as a function of time (Figs 3D, S1D per sample), in arbitrary units. The slope of this curve is proportional to the mutation rate per time unit. Similar to the mutation rate per division in Fig 3A, the mutation rate per time unit decreased from phase 2 to phase 4, though less markedly. Furthermore, similar to the situation for P. virginalis, the difference observed in sample 1 might indicate fluctuating growth during phases 2–4. More specifically, differences between per-division and temporal mutation rates corresponded in our model to the growth rate, the survival rate, and the number of cells (S1 Text, Eq 8).
We next asked if these terms could be used to characterize the set of 42 primary and recurrent tumor pairs. Using temporal and per division mutation rates, we could reconstruct these terms, which correspond to the product ωγN. This quantity corresponds to growth (ω), modulated by the survival rate (γ), and scaled by the number of cells (N). In other words, the product ωγN reflects the effective expansion of the tumor. Consequently, we refer to the product ωγN as the expansion parameter in the following. We examined the corresponding curves and found that unsupervised clustering allowed us to classify the tumors into four subgroups: (A) Convex, (B) Peak, (C) Increase and (D) Paused Start (Fig 3E). We then looked at a possible association between the patterns of the ωγN curve in the primary tumors, and the time to the recurrence, but the results were inconclusive (p = 0.4916, n = 19). However, the time difference between the resection of the primary tumor and the resection of the recurrence is known for a subset of samples, and the age of tumors was estimated previously [23]. This allowed us to transform the time course from arbitrary units into real units (Methods, Eq 2, S2 Fig). Furthermore, we extended our modelling to be able to express the transition from the primary to the recurrent tumor (Methods, Eq 4). With this, we could determine the tumor survival ratio from time estimates. Using the previously established values of 2 years and 7 years [23] as the lowest and highest limits for the time course of the primary tumors, we could determine a range for the value of the tumor survival ratio γR/γP for each individual sample (Methods, Eq 8, S2 Table). As a result, the lowest value of the ratio γR/γP, corresponding to a tumor emergence about 2 years before diagnosis, was always higher than 1 (median = 27.8 [17.4; 54.0] for the lower bound, median = 97.5 [60.9; 189.0] for the upper bound, n = 20 samples).
These results indicated that tumor cell survival was higher at the start of the recurrence than at the end of the primary tumor growth. Not surprisingly, γR/γP ratios were associated with the time to recurrence (Fig 3F, padj = 1.258×10⁻³ and padj = 8.649×10⁻⁴), with higher γR/γP ratios corresponding to shorter time to recurrence. Collectively, these results uncover substantial variations of evolutionary parameters among glioblastoma samples, and provide an improved understanding of growth and survival in tumor subgroups.
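As a worked illustration of the two regressions in Fig 3F, the sketch below evaluates fit1 and fit2 with the intercepts and slopes given in the figure legend. The function names are hypothetical, the input ratios are arbitrary example values, and the output is expressed in the time units of Fig 3F, which are not restated here. Note that fit2 is only defined for γR/γP > 1, since it takes the logarithm of log10(γR/γP).

```python
import math

# Coefficients from the legend of Fig 3F: (intercept, slope).
FIT1 = (19.511, -5.819)   # time versus log10(ratio)
FIT2 = (18.922, -29.321)  # time versus log10(log10(ratio))


def time_to_recurrence_fit1(ratio):
    """Predicted time to recurrence from fit1, in the units of Fig 3F."""
    if ratio <= 0:
        raise ValueError("the survival ratio must be positive")
    intercept, slope = FIT1
    return intercept + slope * math.log10(ratio)


def time_to_recurrence_fit2(ratio):
    """Predicted time to recurrence from fit2, in the units of Fig 3F."""
    if ratio <= 1:
        raise ValueError("fit2 requires a survival ratio above 1")
    intercept, slope = FIT2
    return intercept + slope * math.log10(math.log10(ratio))


# Arbitrary example ratios: a tenfold increase in the survival ratio
# shortens the fit1 prediction by |slope| = 5.819 time units.
for ratio in (10.0, 100.0):
    print(ratio, time_to_recurrence_fit1(ratio), time_to_recurrence_fit2(ratio))
```

Both fits have negative slopes, which is the quantitative form of the statement that higher γR/γP ratios correspond to shorter times to recurrence.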
Discussion
In this study, we presented an integrated framework to analyse the evolution of clonally evolving genomes. We first determined the mutation rate of P. virginalis to be in the range of [3.51×10⁻⁸; 1.165×10⁻⁴]/nt/y, which encompasses the mutation rates in other arthropods, the human germline, and in human somatic healthy cells. The upper end is also comparable to microsatellites in arthropods and other species [41–44], and close to the somatic mutation rate in human cancer. Data about mutation rates in triploid genomes are scarce, and it appears possible that triploidy is associated with higher mutation rates. Interestingly, a high mutation rate was reported in polyploid plants (on the order of 10⁻⁵ [45]). We detected separate evolutionary phases, during which the mutation rate varied significantly. However, the dNdS ratio remained relatively constant, indicating the absence of selection. These findings support the argument that the mutation rate should not be considered constant [1,46,47].
We traced the speciation of P. virginalis to 1988 (95% confidence limits: [1986; 1990]), in agreement with first reports of this animal in 1995 [16]. This exceptionally young evolutionary age is consistent with the largely monoclonal population structure showing only incipient genetic differentiation [19,20]. It also provides experimental support for the hypothesis that the global marbled crayfish population descended from a single anthropogenic transport and release event [16,21] and further establishes the species as a unique model system.
In tumor samples, our approach allowed a single patient-level analysis of evolutionary parameters, and similarly revealed the presence of different phases, variations of the mutation rate, and a few significant events of selection. Multisector and single-cell sequencing studies have highlighted high levels of heterogeneity and either clonal selection, or an almost complete overlap, between primary and relapsed glioblastoma tumors [48–50]. In this context, our study identified varied patterns, either of selection, or of neutral evolution. This appears comparable to previously published results [3,51], where either selection or neutrality was observed, depending on the context. Interestingly, negative selection occurred most often early in the mutation accumulation process of primary tumors, and corresponded to a low mutational load (S1 Fig, exemplary tumors 2,28,42), in agreement with recent results [52], whereas selection in recurrent tumors followed this pattern only to some extent.
Utilizing the difference between temporal and per-division mutation rate, we could stratify the samples into 4 subgroups. While clinical subtypes for GBM have been described, single-cell studies revealed high intratumoral heterogeneity [48]. In this context, our approach offers a possible alternative, although association with clinical outcome remains to be established. Building on previously estimated tumor age, we could also derive the survival ratio for tumor cells in the recurrence, relative to the primary tumor. We found that tumor cells survive better at the start of the recurrence, albeit with important variations. This supports the notion that GBM regrowth can be more aggressive after surgical resection [53,54], possibly because resection-induced astrocyte injury can support faster growth [54], or because the tumor microenvironment can promote tumor regrowth [55–57]. Conversely, a stronger immune response might also inhibit tumor regrowth.
As our study aims to explore novel connections between diverse fields of research, we find it important to explain several limitations. For example, a more precise determination of the P. virginalis mutation rate could be achieved by the development of novel tools that are more amenable to triploid genomes and by experimental validation [25,29]. Also, the coalescent tree could be refined by the use of sequencing datasets with higher genome coverage to reduce the potential impact of noisy variants. For the tumor samples, it would be important to better understand the potential effect of ploidy changes. This could be achieved by restricting the analysis to diploid regions, or by the integration of ploidy information into the model. In an extended model, copy number information could also provide information on the timing of certain mutations [58,59].
In conclusion, this integrated analysis of mutation accumulation, dNdS ratio and mutational signatures provided a detailed landscape of evolutionary parameters in two paradigms of clonal genome evolution. We showed an exceptionally recent emergence date for P. virginalis and established its potential as a model system. We highlighted subtle temporal dynamics of selection in tumor samples, and showed that a quantification of tumor cell survival was correlated with the time to recurrence. Our findings illustrate the potential of this framework for modelling of clonal evolution and for the use of evolutionary parameters in a clinical context.
Materials and methods
Ethics statement
The committee responsible for the usage of human subject data from the EGAS00001003184 study is the DKFZ-HIPO Data Access Committee of Heidelberg Center for Personalized Oncology. The approval was granted by this committee. We used the data in compliance with the declared, and approved, usage.
Procambarus virginalis samples
Freshwater crayfish samples from [19] were used. Additionally, the Madagascar 1 and Moosweiher samples were resequenced (S3 and S4 Tables). Animal 1 corresponds to the lab strain, acquired from a pet shop. New genomic DNA samples were taken from animal 34 and animal 35, which, like animal 1, are lab strain animals, and which are direct offspring of animal 1. These new samples were prepared and submitted for whole genome sequencing following the protocol already described [19]. The genealogy and birth date of animals were retrieved from laboratory records and field records (S3 Table). Sequence data was trimmed using Trimmomatic v0.32 (settings: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:40, adapter sequence: TruSeq3-PE). Next, trimmed data was mapped to Pvir genome assembly v04 (https://www.ncbi.nlm.nih.gov, Bioproject Accession: PRJNA356499), using Bowtie2 (v2.2.6, setting: --sensitive). The quality of this assembly is comparable to other published genomes in non-standard organisms, but there is still a higher level of fragmentation. This might preclude mutation detection, or render it more difficult, in those parts of the genome that are located in or at the boundary of a gap. Aligned reads were sorted, cleared of duplicates, sorted again and indexed using samtools. Subsequently, variant calling was performed using Freebayes v0.9.21-g7dd41db (parameters: --report-all-haplotype-alleles -P 0.7 -p 3 --min-mapping-quality 30 --min-base-quality 20 --min-coverage 6 --report-genotype-likelihood-max).
Glioblastoma Multiforme samples
The glioblastoma primary and recurrent tumor samples correspond to the WGS cohort already described in [23]. In particular, summary information can be found in supplementary table 1 of [23]. After approval of the research project, access to the SNP data of primary and recurrent tumor samples, as well as time to recurrence when available, was granted.
Mutation rate of P. virginalis
Mutation accumulation between animal 1 and descendant animals 34 and 35 was used to calculate the mutation rate. SNP variants were examined in terms of quality and coverage. Variants with quality ≥35 and coverage between a lower cutoff (≥50 for the strict filter used for the lower bound of the mutation rate, ≥25 for the relaxed filter used for the upper bound) and an upper cutoff of ≤200 were retained for the main estimate of the mutation rate. Coverages above 200 exhibited an altered SNP distribution and were thus excluded, because they possibly correspond to a distinct part of the P. virginalis genome (possibly highly repetitive and variable domains). The number of callable sites was 486234 for the upper bound of the mutation rate. Eligible sites were considered when the alternate allele was present in only one line [60]. Subsequently, the mutation rate per nucleotide per year was calculated as the count of biallelic mutated nucleotides in animal 34 (respectively, animal 35) as compared to animal 1, divided either by the count of nucleotides in the triploid genome of P. virginalis, N = 10.5×10⁹ (for the lower bound of the mutation rate), or by the number of callable sites (for the upper bound), and divided by the time (5.75 and 6.08 years) between the birth dates of animal 1 and animal 34 (respectively, animal 35). First, we assumed that counts of new mutations follow a Binomial distribution, and with this we determined the standard deviation on the count of mutations observed (genotyping uncertainty). Second, we assumed that the standard deviation for the date of animal birth equated to a third of the total uncertainty on time of birth. Third, we calculated the standard deviation between mutation rates for animal 34 and animal 35 (biological variability). Finally, we took the total standard deviation as the quadratic sum of these three components (assuming that the different sources of variability follow a normal distribution).
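The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the mutation counts and the birth-date uncertainty are hypothetical placeholders, the timing uncertainty is propagated to the rate to first order, and the per-animal genotyping and timing components are averaged over the two animals. Only the number of callable sites and the two time intervals are taken from the text.

```python
import math
import statistics

CALLABLE_SITES = 486_234  # from the text (upper bound of the mutation rate)
# (hypothetical mutation count, years since birth of animal 1 from the text)
ANIMALS = {"animal_34": (300, 5.75), "animal_35": (320, 6.08)}
BIRTH_UNCERTAINTY_YEARS = 0.25  # hypothetical total uncertainty on birth date


def mutation_rate(n_mutations, n_sites, years):
    """Mutations per nucleotide per year."""
    return n_mutations / (n_sites * years)


def genotyping_sd(n_mutations, n_sites, years):
    """SD of the rate from a Binomial(n_sites, p) model of the count."""
    p = n_mutations / n_sites
    sd_count = math.sqrt(n_sites * p * (1.0 - p))
    return sd_count / (n_sites * years)


def timing_sd(rate, years, total_uncertainty):
    """SD of the rate from the birth date (assumed first-order propagation)."""
    sd_years = total_uncertainty / 3.0  # a third of the total uncertainty
    return rate * sd_years / years


def quadratic_sum(*components):
    """Combine independent standard deviations in quadrature."""
    return math.sqrt(sum(c * c for c in components))


rates = [mutation_rate(n, CALLABLE_SITES, t) for n, t in ANIMALS.values()]
mean_rate = statistics.mean(rates)
sd_genotyping = statistics.mean(
    genotyping_sd(n, CALLABLE_SITES, t) for n, t in ANIMALS.values()
)
sd_timing = statistics.mean(
    timing_sd(r, t, BIRTH_UNCERTAINTY_YEARS)
    for r, (_, t) in zip(rates, ANIMALS.values())
)
sd_biological = statistics.stdev(rates)  # variability between the two animals
total_sd = quadratic_sum(sd_genotyping, sd_timing, sd_biological)
print(f"rate = {mean_rate:.3e} +/- {total_sd:.1e} per nt per year")
```

Replacing `CALLABLE_SITES` by the size of the triploid genome, together with the strictly filtered counts, would give the corresponding lower bound.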
Coalescent time
Time to most recent common ancestor for P. virginalis samples was determined using Bayesian evolutionary analysis by sampling trees (BEAST v1.10.4 [35]). Mutation data with quality >35 and coverage depth >15 was used in this analysis (a coverage cutoff of 25 was not justified here because samples other than animals 1, 34 and 35 possessed a notably lower average sequencing depth). Sample birth dates were used as tip dates. Further BEAST parameters used were: simple substitution model with estimated base frequencies, strict clock, skyride coalescent prior. The length of chain for the Markov chain Monte Carlo was 10M. These parameters were built into the BEAST input file, using the utility BEAUTi. The outputs were analyzed using the utility TRACER. In particular, convergence was read from the sampled states curves of the different parameters, and effective sample sizes were well above 100 (2984 or more), indicating sufficiently decorrelated sampled states.
Study of mutation accumulation
An infinitesimal increment of mutations, dM, was defined from the evolutionary parameters: mutation rate, ploidy, cell survival, growth and number of cells (see S1 Text for a detailed description). Because these parameters may be heterogeneous in the population (of cells, or of animals), we stratified this expression by subclone, with homogeneous parameters within each subpopulation (subclone). We then noted that the observable allele frequency of a mutation is the one obtained after sequencing and SNP calling, and adapted the expression given in [5] accordingly. We linked the observable allele frequency to the features of the subclone in which the mutation appeared, namely the change in subclone size, the number of cells, ploidy and time (S1 Text, Eq (5)). Next, to obtain an expression of allele frequency that could be substituted into the expression of dM, we derived a formula for the increment of the number of cells, dN, and for the increment of inverse allele frequency, d(1/f). For the latter increment, we assumed that ploidy was constant. This allowed us to use the expression of allele frequency in the expression of dM. As a result, we obtained the mutation accumulation dM in each subclone as a function of the inverse allele frequency 1/f, the mutation rate, and constants. Finally, the equation for mutation accumulation over all subclones was obtained by summing the individual contributions dM of each subclone. The mutation accumulation curve M(1/f), whose slope corresponds to dM/d(1/f), was plotted from SNP data (filtered by Phred quality score QUAL ≥30, depth ≥10, depth of alternate allele ≥3) and from the corresponding allele frequencies. For uncertainties, we used bins of ±0.25 over 1/f. In each bin, the mutation count was subjected to bootstrap resampling, which yielded a bootstrap distribution.
The 95% confidence interval was taken as the interval bounded by the 2.5% and 97.5% quantiles of the bootstrap distribution. For P. virginalis, the confidence interval for mutation accumulation was calculated assuming a Student's t distribution.
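As an illustration, the construction of the curve M(1/f) and of a percentile bootstrap interval for a bin count can be sketched in Python. This is one possible reading of the procedure, not the original implementation: the function names, the resampling of whole mutation sets, and the nearest-rank quantile rule are assumptions made for the example.

```python
import random


def accumulation_curve(allele_freqs, grid):
    """M(x): number of mutations whose inverse allele frequency 1/f is <= x.

    `grid` must be sorted in ascending order.
    """
    inv = sorted(1.0 / f for f in allele_freqs)
    curve = []
    i = 0
    for x in grid:
        while i < len(inv) and inv[i] <= x:
            i += 1
        curve.append(i)
    return curve


def bin_count(inv_freqs, centre, half_width=0.25):
    """Number of mutations with 1/f within +/- half_width of the bin centre."""
    return sum(1 for v in inv_freqs if abs(v - centre) <= half_width)


def bootstrap_ci(inv_freqs, centre, n_boot=1000, seed=0):
    """Percentile 95% interval (2.5% and 97.5% quantiles) for a bin count."""
    rng = random.Random(seed)
    n = len(inv_freqs)
    counts = sorted(
        bin_count(rng.choices(inv_freqs, k=n), centre) for _ in range(n_boot)
    )
    # ASSUMPTION: nearest-rank approximation of the quantiles.
    lower = counts[int(0.025 * (n_boot - 1))]
    upper = counts[int(0.975 * (n_boot - 1))]
    return lower, upper
```

A call such as `accumulation_curve(freqs, grid)` returns one cumulative count per grid value, and `bootstrap_ci(inv_freqs, centre)` returns the interval for the bin centred on `centre`.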
Mutation annotation and dN/dS ratio
Mutations were annotated as synonymous or non-synonymous (including splice or stopgain mutations) using SNPdat v1.0.5. The bias-correcting method dNdScv [3] was tested, but could not be applied to the relatively low number of mutations at hand. In addition, dNdScv does not provide longitudinal estimates. As a more flexible, but non-bias-correcting method, we calculated the dN/dS ratio as the quotient of non-synonymous to synonymous mutations in a sample, divided by the average quotient in the full genome. The average quotient of non-synonymous to synonymous mutations in humans was calculated from the reference coding sequences (hg19): coding sequences not starting with AUG or not ending with a stop codon were filtered out of the fasta sequences, and the remaining sequences were converted to codons, sorted and counted using a custom bash script. Using a spreadsheet, all non-redundant mutations were evaluated as synonymous or non-synonymous. The total count of possible synonymous (respectively, non-synonymous) mutations per codon was taken as the count of this codon type multiplied by the number of possible synonymous (non-synonymous) mutations evaluated from the spreadsheet. The reference quotient was then taken as the sum over all codons of the count of non-synonymous mutations, divided by the sum over all codons of the count of synonymous mutations. Uncertainties were determined by bootstrap resampling of mutations and calculation of the dN/dS ratio for each bootstrap sample.
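The normalized ratio and its bootstrap uncertainty can be sketched as follows in Python. This is a minimal sketch under stated assumptions: the function names are ours, mutations are represented by the hypothetical labels "N" (non-synonymous) and "S" (synonymous), and the genome-wide quotient is passed in as a precomputed number rather than derived from the hg19 codon counts.

```python
import random


def dnds_ratio(n_nonsyn, n_syn, genome_quotient):
    """Sample quotient N/S divided by the genome-wide N/S quotient."""
    return (n_nonsyn / n_syn) / genome_quotient


def bootstrap_dnds(labels, genome_quotient, n_boot=1000, seed=0):
    """Percentile 95% interval of the ratio from resampled mutations.

    `labels` holds one entry per mutation: "N" or "S" (hypothetical encoding).
    Replicates without any synonymous mutation are skipped.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        sample = rng.choices(labels, k=len(labels))
        n_syn = sample.count("S")
        if n_syn > 0:
            ratios.append(dnds_ratio(sample.count("N"), n_syn, genome_quotient))
    if not ratios:
        raise ValueError("no bootstrap replicate contained a synonymous mutation")
    ratios.sort()
    lower = ratios[int(0.025 * (len(ratios) - 1))]
    upper = ratios[int(0.975 * (len(ratios) - 1))]
    return lower, upper
```

With a hypothetical genome-wide quotient of 3.0, a sample with 30 non-synonymous and 10 synonymous mutations gives `dnds_ratio(30, 10, 3.0) == 1.0`, the value expected in the absence of selection.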
Mutational signatures
Mutational signatures are combinations of mutations which are representative of the action of different mutagenic processes, such as exogenous (ultraviolet light) and endogenous (5-methylcytosine deamination) mechanisms, enzymatic DNA editing, DNA repair mistakes, and DNA replication infidelity [4]. The single-base substitution mutational signatures (in trinucleotide context) for human subjects were downloaded from the COSMIC database (https://cancer.sanger.ac.uk/signatures/; version 3.1 as of 11.08.2020). Since mutational mechanisms are conserved across the animal kingdom, we hypothesized that human mutational signatures could be applied to the marbled crayfish. For the longitudinal analysis of the mutational signatures, mutation data were binned using a bin half-width of 0.5 on the inverse allele frequency. The exposure of the binned data was determined using R 3.5.2 with the package YAPSA (version 1.8.0), where exposure corresponds to the individual contribution of each signature. Uncertainty on mutational signatures was determined by bootstrap resampling of mutations, followed by generation of the binned data and YAPSA exposures on the resampled data. We used 1000 bootstrap replicates as a compromise between an ideally larger number of replicates (1M) and a reasonable computing time. Large mutation sets (>100,000 mutations) were subsampled to 50,000–60,000 mutations for the bootstrap analysis. Mean, median, percentiles and 95% confidence bounds were determined from the resulting bootstrap distribution.
Time course
We assumed that the clock-like mutational signatures SBS1 and SBS5 were a surrogate indicator for time (SBS1 only for glioblastoma, in agreement with [36]). Because mutagenic mechanisms are conserved across the animal kingdom, and because mutational signatures in arthropods are currently lacking, we assumed that human mutational signatures are sufficiently representative in the marbled crayfish. We further assumed that the arrow of time could be identified with the arrow of inverse allele frequency 1/f. Under these assumptions, an increment of time can be determined in arbitrary units by integrating θ, which denotes the exposure to the clock-like mutational signatures SBS1 or SBS5, over an increment of inverse allele frequency 1/f. Since the exposure θ is also proportional to the number of cells in the tumor, it is necessary to normalize θ by dividing its value by the number of cells N. Since N is proportional to 1/f under some assumptions (S1 Text), N can be replaced, up to an unknown constant, by 1/f. This yielded the formula for determining time t in arbitrary units, over an interval of time which is unknown, but identifiable with an interval over inverse allele frequency 1/f:
$$t_{a.u.} = \int_{(1/f)_{\min}}^{1/f} \frac{\theta}{1/f}\, d(1/f) \qquad (1)$$
where ta.u. is the time in arbitrary units (a.u.), and θ corresponds to the exposure to clock-like mutational signatures. By computing ta.u. at all values of 1/f, we obtained a vector of values Ta.u. for the time in arbitrary units, from its minimum value, min(Ta.u.) (0 by definition), to its maximum value, max(Ta.u.) (which corresponds to integration from (1/f)min to (1/f)max). The 95% confidence limits for time were calculated using the confidence bounds for mutational signature SBS1, taken as exposure θ.
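The integration described above can be sketched numerically in Python. This is a minimal sketch, assuming that the exposure θ is available at a sorted series of 1/f values and that a trapezoidal rule is an acceptable approximation; the function name and the choice of quadrature are ours, not taken from the original analysis.

```python
def time_arbitrary_units(inv_f, theta):
    """Cumulative integral of theta / (1/f) over 1/f, by the trapezoidal rule.

    inv_f : inverse allele frequencies, sorted ascending and strictly positive
    theta : clock-like signature exposure at each value of inv_f
    Returns the vector T_a.u., which starts at 0 by definition.
    """
    if len(inv_f) != len(theta):
        raise ValueError("inv_f and theta must have the same length")
    t = [0.0]
    for k in range(1, len(inv_f)):
        g_prev = theta[k - 1] / inv_f[k - 1]  # normalized exposure at the left point
        g_curr = theta[k] / inv_f[k]          # normalized exposure at the right point
        t.append(t[-1] + 0.5 * (g_prev + g_curr) * (inv_f[k] - inv_f[k - 1]))
    return t
```

When θ grows in proportion to 1/f, the normalized exposure is constant and the time in arbitrary units grows linearly with 1/f, which gives a simple sanity check for the function.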
Time calibration in recurrent glioblastomas. In a subset of glioblastoma samples, we additionally know the time-to-relapse, denoted T, in months. We thereby identify the time course of the relapse [0; T] in months, with the time course in arbitrary units [0; max(Ta.u.)] (see the preceding paragraph "Time course" for the calculation of Ta.u.). To obtain t, the time in real units at any instant t in the interval [0; T], ta.u. is multiplied by the scaling factor T/max(Ta.u.):
$$t = t_{a.u.} \times \frac{T}{\max(T_{a.u.})} \qquad (2)$$
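The calibration to real units amounts to a single rescaling, sketched here in Python. The function name is ours, and the example values are hypothetical.

```python
def calibrate_time(t_au, time_to_relapse):
    """Rescale times in arbitrary units so that max(t_au) maps to the time-to-relapse T."""
    t_max = max(t_au)
    if t_max <= 0:
        raise ValueError("the time course in arbitrary units must have a positive maximum")
    scale = time_to_relapse / t_max
    return [scale * t for t in t_au]


# HYPOTHETICAL example: a relapse after 12 months.
months = calibrate_time([0.0, 1.0, 4.0], 12.0)
```

The output keeps the relative spacing of the time points and only changes the unit, so the first point stays at 0 and the last point equals T.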
Time propagation from primary to recurrent glioblastomas. To obtain a link between the time course in the primary tumor, and the time course in the recurrent tumor, we studied the ratio of mutation accumulation between the end of primary tumor (subscripted ’P’, taken as the last 5% time points) and start of recurrence (subscript ’R’, first 5% time points), over the entire tumor. This is justified by the observation that the passage from the primary tumor to the recurrence corresponds to the instant of primary tumor resection. Using Eq 1 (S1 Text), this ratio could be written as follows:
| (3) |
where subscript i per subclone is not used, because we considered here that evolution parameters are taken over the whole tumor. The constant term π can be normalized out of this ratio. Further, we have assumed that the mutation rate μ and division rate ω stay constant over this short period, because they are intrinsic features of the tumor cells. However, the count of tumor cells N(t), and the tumor cell survival rate γ(t) could not be considered constant. Regarding N(t), we expressed it as the ratio of inverse allele frequency, since it is proportional to N [5]:
| (4) |
Of note, expression (4) is biased in practice by mutations which are not de novo in the recurrence, but inherited from the primary tumor. Ideally, only de novo mutations should be included to perform this calculation. Finally, the survival rate of tumor cells, γ, also could not be considered constant, and we had no indicator or surrogate for this value. For this reason, we set an arbitrary value for the survival at the end of the primary tumor, relative to the start of the recurrence, γP/γR = 1/300.
Using the above, we obtained expression (5) for dM/dt at the end of the primary tumor:
| (5) |
From this, and since the number of mutations at the end of the primary tumor, as well as the remaining parameters, were known, the time in real units at the end of the primary tumor could be determined as follows:
| (6) |
As for the time course in the recurrence, the fact that the last time increment of the primary tumor, dtP, was known allowed us to calibrate the time course t in the primary tumor at each instant, by multiplying by the scaling factor dtP/dta.u.,P:
$$t = t_{a.u.} \times \frac{dt_P}{dt_{a.u.,P}} \qquad (7)$$
Tumor cell survival ratio
For the time calibration to real units, we made an assumption on the tumor cell survival ratio γR/γP in order to determine real time in the primary tumor. To quantify the survival ratio, we proceeded the other way around, using an assumption on the time course in the primary tumor in order to determine the ratio γR/γP. We used previously established values from [23], which estimated that the time to the most recent common ancestor (TMRCA) lies either 2 years or 7 years before primary tumor resection. As a consequence, we assumed that these durations corresponded to the lowest and highest limits for the time course of the primary tumors. The tumor cell survival ratio was calculated by reformulating expression (6) into the following expression:
| (8) |
where all terms on the right hand side are known, either from the data (dM, f, dtR) or from assumptions (dtP).
Tumor expansion profile
According to Eq (8) in S1 Text, increments of time and of 1/f are proportional, with the growth rate ω, the tumor cell survival rate γ, and the number of tumor cells N acting as modulators:
| (9) |
where the increment d(1/f) is known from the data, and the increment dt was determined above, in section "Time course". As a consequence, one obtains the product ωγN by dividing the increment d(1/f) by the increment dt. The expansion parameters ωγN are known up to the constants of expression (8) from S1 Text, which are π/Ki,r. The curves giving ωγN as a function of time are denoted expansion curves or expansion profiles.
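The division of increments described above can be sketched with finite differences in Python. This is an illustrative sketch only: the function name, the use of interval midpoints as time coordinates, and the skipping of non-increasing time steps are our assumptions, and the result is known only up to the constants mentioned in the text.

```python
def expansion_profile(times, inv_f):
    """Finite-difference estimate of d(1/f)/dt, proportional to omega * gamma * N.

    times : calibrated time points, in ascending order
    inv_f : inverse allele frequency at each time point
    Returns a list of (midpoint time, slope) pairs.
    """
    if len(times) != len(inv_f):
        raise ValueError("times and inv_f must have the same length")
    profile = []
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            continue  # skip degenerate steps where time does not advance
        slope = (inv_f[k] - inv_f[k - 1]) / dt
        profile.append((0.5 * (times[k] + times[k - 1]), slope))
    return profile
```

For example, `expansion_profile([0.0, 1.0, 3.0], [2.0, 4.0, 5.0])` yields a slope of 2.0 over the first interval and 0.5 over the second, which would be read as a slowing expansion.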
Curve segmentation
Segments for the curves M(1/f) were determined using the R package segmented (v1.1–0), with a target adjusted R2 of 0.995 (P. virginalis) or 0.9995 (GBM), retaining the lowest number of segments that attained this target, up to a maximum of 20 breakpoints. P-values for significant changes between segments were evaluated using Davies’ test (as implemented in the R package segmented). Because the regions before the first breakpoint and after the last breakpoint display marked and consistent differences from the general profile of the curve, we hypothesized that the automated segmentation isolated clonal mutations, or mutations which could originate from contamination by normal tissue (mutations with inverse allele frequency lower than the first breakpoint), as well as mutations likely affected by the limit of detection of SNVs in sequencing data (mutations with inverse allele frequency higher than the last breakpoint). We excluded those mutations and restricted the accepted range to the interval between the first and last breakpoints.
Classification of expansion profiles
Expansion profiles ωγN(t) were subjected to curve clustering by k-means. To this end, the expansion profiles of 42 primary tumors were converted onto a uniform arbitrary timeline of 1 to 1000, centered and standardized. Clusters were then determined using the R package kml version 2.4.6, with the default number of clusters and 5 redrawings. The quality of clustering was inspected using the Calinski-Harabasz (CH) criterion, whose value was 25.43599 and 24.73392 for 3 and 4 clusters, respectively (CH values were in the 20.85733–22.32881 range for 2, 5 or 6 clusters). Because the partition into 4 clusters had a reasonably high CH value and was able to represent the "Increasing" type of curve, while the partitions with 2 or 3 clusters were not, the partition into 4 clusters was retained.
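The preprocessing step before clustering (conversion onto a uniform timeline, centering and standardization) can be sketched in Python. This is a sketch under assumptions: linear interpolation and the population standard deviation are our choices for the example, the function names are ours, and the clustering itself (R package kml in the study) is not reproduced here.

```python
import statistics


def resample_uniform(times, values, n_points=1000):
    """Linearly interpolate a profile onto n_points equally spaced time points.

    times must be strictly increasing and contain at least two points.
    """
    t0, t1 = times[0], times[-1]
    out = []
    i = 0
    for j in range(n_points):
        x = t0 + (t1 - t0) * j / (n_points - 1)
        # Advance to the segment [times[i], times[i + 1]] that contains x.
        while i < len(times) - 2 and times[i + 1] < x:
            i += 1
        frac = (x - times[i]) / (times[i + 1] - times[i])
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("a constant profile cannot be standardized")
    return [(v - mean) / sd for v in values]
```

Each profile processed this way becomes a vector of equal length on a common scale, so that profiles from tumors with different durations and magnitudes can be compared by shape.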
Statistical analyses
R [61] and Python [62] were used for all statistical analyses. The 95% confidence interval for the tree root in P. virginalis was taken as the 95% highest posterior density (HPD) interval. All statistical tests were unpaired and two-sided, with a level of significance set at 5%. Segmentation p-values correspond to the test of significant difference between segments (Davies’ test, R package segmented). To check against overfitting, k-fold cross-validation of the segmentation procedure was conducted, with k = 10 folds. The resulting difference between test and validation mean square errors (MSE) was then examined. A non-increasing difference, when including one additional phase, indicates the absence of overfitting. Correlation coefficients between SBS1 and SBS5 were determined using Pearson's method, and summarized by their median and IQR over GBM samples. A comparison between groups was made using an unpaired Wilcoxon rank-sum test. Differential time to recurrence between subgroups in the manually sorted ωγN curves was assessed using a Kruskal-Wallis rank-sum test between curve types "Peak", "Increase" and "Paused Start". The "Convex" curve type was excluded, because there was only 1 instance of this kind of curve in the subset of samples with information on the time to recurrence. The association of the γR/γP ratio with the time to recurrence was assessed with a linear regression, using a simple or double log10 scale on the γR/γP ratio, with Bonferroni adjustment.
Supporting information
Panels A1-D1 show the primary tumor T1, panels A2-D2 show the recurrent tumor T2. (A) Mutation accumulation as a function of the inverse of allele frequency 1/f (black) and phases from automated segmentation (breakpoints (grey) and segments (blue)). The confidence band at 95% level is indicated in grey. (B) Nonsynonymous to synonymous ratio. Purple and blue stars show nonsynonymous and synonymous mutations, respectively. The smoothened ratio is shown in red. (C) Clock-like and non-clock-like mutational signatures. (D) Mutation accumulation as a function of time.
(PDF)
(A) Dynamics of growth rate ω times tumor cell survival rate γ times number of cells N, for (P) the primary tumor and (R) the recurrence. (B) Time-resolved mutation accumulation for primary tumor and recurrence.
(PDF)
(PDF)
P-values for test of difference between segments for P. virginalis and glioblastoma samples.
(DOCX)
Characteristics of tumor cell survival ratio γR/γP (n = 20).
(DOCX)
Procambarus virginalis Samples.
(DOCX)
Raw sequencing data for new and resequenced Procambarus virginalis samples.
(DOCX)
(XLSX)
Acknowledgments
We would like to thank Verena Körber and Thomas Höfer for helpful discussions and for providing data, and Julian Gutekunst for discussions about the methods. We would also like to thank Katharina Hanna for data and for crayfish culture, and Sina Tönges for sample processing. We further acknowledge the German Cancer Research Center Genomics and Proteomics Core Facility for whole-genome sequencing.
Data Availability
Sequence data for marbled crayfish have been deposited as a National Center for Biotechnology Information BioProject (accession number: PRJNA356499). Glioblastoma data were accessed from the European Genome-phenome Archive (EGA) database (accession number: EGAS00001003184).
Funding Statement
ANR STEM R20117HH
References
- 1. Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, et al. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet. 2016 Oct 14;17(11):704–14. doi: 10.1038/nrg.2016.104
- 2. Kent DG, Green AR. Order Matters: The Order of Somatic Mutations Influences Cancer Evolution. Cold Spring Harb Perspect Med. 2017 Apr 3;7(4):a027060. doi: 10.1101/cshperspect.a027060
- 3. Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, Van Loo P, et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell. 2017 Nov 16;171(5):1029–1041.e21. doi: 10.1016/j.cell.2017.09.042
- 4. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013 Aug 22;500(7463):415–21. doi: 10.1038/nature12477
- 5. Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat Genet. 2016 Mar;48(3):238–44. doi: 10.1038/ng.3489
- 6. Sottoriva A, Barnes CP, Graham TA. Catch my drift? Making sense of genomic intra-tumour heterogeneity. Biochim Biophys Acta Rev Cancer. 2017 Apr;1867(2):95–100. doi: 10.1016/j.bbcan.2016.12.003
- 7. Balaparya A, De S. Revisiting signatures of neutral tumor evolution in the light of complexity of cancer genomic data. Nat Genet. 2018 Dec;50(12):1626–8. doi: 10.1038/s41588-018-0219-4
- 8. Tataru P, Simonsen M, Bataillon T, Hobolth A. Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data. Syst Biol. 2017 Jan 1;66(1):e30–46. doi: 10.1093/sysbio/syw056
- 9. Benton ML, Abraham A, LaBella AL, Abbot P, Rokas A, Capra JA. The influence of evolutionary history on human health and disease. Nat Rev Genet. 2021 May;22(5):269–83. doi: 10.1038/s41576-020-00305-9
- 10. Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science. 2015 Sep 25;349(6255):1483–9. doi: 10.1126/science.aab4082
- 11. Katju V, Bergthorsson U. Old Trade, New Tricks: Insights into the Spontaneous Mutation Process from the Partnering of Classical Mutation Accumulation Experiments with High-Throughput Genomic Approaches. Genome Biol Evol. 2018 Nov 26;11(1):136–65.
- 12. Burrell RA, McGranahan N, Bartek J, Swanton C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature. 2013 Sep;501(7467):338–45. doi: 10.1038/nature12625
- 13. Flynn JM, Chain FJJ, Schoen DJ, Cristescu ME. Spontaneous Mutation Accumulation in Daphnia pulex in Selection-Free vs. Competitive Environments. Mol Biol Evol. 2017 Jan;34(1):160–73. doi: 10.1093/molbev/msw234
- 14. Hausser J, Alon U. Tumour heterogeneity and the evolutionary trade-offs of cancer. Nat Rev Cancer. 2020 Apr;20(4):247–57. doi: 10.1038/s41568-020-0241-6
- 15. Scholtz G, Braband A, Tolley L, Reimann A, Mittmann B, Lukhaup C, et al. Ecology: Parthenogenesis in an outsider crayfish. Nature. 2003 Feb 20;421(6925):806. doi: 10.1038/421806a
- 16. Lyko F. The marbled crayfish (Decapoda: Cambaridae) represents an independent new species. Zootaxa. 2017 Dec 12;4363(4):544–52. doi: 10.11646/zootaxa.4363.4.6
- 17. Martin P, Dorn NJ, Kawai T, Heiden C van der, Scholtz G. The enigmatic Marmorkrebs (marbled crayfish) is the parthenogenetic form of Procambarus fallax (Hagen, 1870). Contrib Zool. 2010;79(3):107.
- 18. Vogt G. The marbled crayfish: a new model organism for research on development, epigenetics and evolutionary biology. J Zool. 2008;276(1):1–13.
- 19. Gutekunst J, Andriantsoa R, Falckenhayn C, Hanna K, Stein W, Rasamy J, et al. Clonal genome evolution and rapid invasive spread of the marbled crayfish. Nat Ecol Evol. 2018 Mar;2(3):567–73. doi: 10.1038/s41559-018-0467-9
- 20. Maiakovska O, Andriantsoa R, Tönges S, Legrand C, Gutekunst J, Hanna K, et al. Genome analysis of the monoclonal marbled crayfish reveals genetic separation over a short evolutionary timescale. Commun Biol. 2021 Jan 18;4(1):1–7.
- 21. Gutekunst J, Maiakovska O, Hanna K, Provataris P, Horn H, Wolf S, et al. Phylogeographic reconstruction of the marbled crayfish origin. Commun Biol. 2021 Sep 17;4:1096. doi: 10.1038/s42003-021-02609-w
- 22. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature. 2020 Feb;578(7793):94–101. doi: 10.1038/s41586-020-1943-3
- 23. Körber V, Yang J, Barah P, Wu Y, Stichel D, Gu Z, et al. Evolutionary Trajectories of IDHWT Glioblastomas Reveal a Common Path of Early Tumorigenesis Instigated Years ahead of Initial Diagnosis. Cancer Cell. 2019 Apr 15;35(4):692–704.e12. doi: 10.1016/j.ccell.2019.02.007
- 24. Silliman K, Indorf JL, Knowlton N, Browne WE, Hurt C. Base-substitution mutation rate across the nuclear genome of Alpheus snapping shrimp and the timing of isolation by the Isthmus of Panama. BMC Ecol Evol. 2021 May 28;21(1):104. doi: 10.1186/s12862-021-01836-3
- 25. Yang S, Wang L, Huang J, Zhang X, Yuan Y, Chen JQ, et al. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature. 2015 Jul 23;523(7561):463–7. doi: 10.1038/nature14649
- 26. Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S. Direct Determination of the Mutation Rate in the Bumblebee Reveals Evidence for Weak Recombination-Associated Mutation and an Approximate Rate Constancy in Insects. Mol Biol Evol. 2017 Jan;34(1):119–30. doi: 10.1093/molbev/msw226
- 27. Oppold AM, Pedrosa JAM, Bálint M, Diogo JB, Ilkova J, Pestana JLT, et al. Support for the evolutionary speed hypothesis from intraspecific population genetic data in the non-biting midge Chironomus riparius. Proc Biol Sci. 2016 Feb 24;283(1825):20152413. doi: 10.1098/rspb.2015.2413
- 28. Ho EKH, Macrae F, Latta LC, McIlroy P, Ebert D, Fields PD, et al. High and Highly Variable Spontaneous Mutation Rates in Daphnia. Mol Biol Evol. 2020 Nov 1;37(11):3258–66. doi: 10.1093/molbev/msaa142
- 29. Keightley PD, Ness RW, Halligan DL, Haddrill PR. Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics. 2014 Jan;196(1):313–20. doi: 10.1534/genetics.113.158758
- 30. Keightley PD, Pinharanda A, Ness RW, Simpson F, Dasmahapatra KK, Mallet J, et al. Estimation of the spontaneous mutation rate in Heliconius melpomene. Mol Biol Evol. 2015 Jan;32(1):239–43. doi: 10.1093/molbev/msu302
- 31. Ohno M. Spontaneous de novo germline mutations in humans and mice: rates, spectra, causes and consequences. Genes Genet Syst. 2019 Apr 9;94(1):13–22. doi: 10.1266/ggs.18-00015
- 32. Lee-Six H, Øbro NF, Shepherd MS, Grossmann S, Dawson K, Belmonte M, et al. Population dynamics of normal human blood inferred from somatic mutations. Nature. 2018 Sep;561(7724):473–8. doi: 10.1038/s41586-018-0497-0
- 33. Blokzijl F, de Ligt J, Jager M, Sasselli V, Roerink S, Sasaki N, et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature. 2016 Oct 13;538(7624):260–4. doi: 10.1038/nature19768
- 34. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018 Mar 15;555(7696):371–6. doi: 10.1038/nature25795
- 35. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007 Nov 8;7:214. doi: 10.1186/1471-2148-7-214
- 36. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational processes in human somatic cells. Nat Genet. 2015 Dec;47(12):1402–7. doi: 10.1038/ng.3441
- 37. Petljak M, Alexandrov LB, Brammeld JS, Price S, Wedge DC, Grossmann S, et al. Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell. 2019 Mar 7;176(6):1282–1294.e20. doi: 10.1016/j.cell.2019.02.012
- 38. Baez-Ortega A, Gori K, Strakova A, Allen JL, Allum KM, Bansse-Issa L, et al. Somatic evolution and global expansion of an ancient transmissible cancer lineage. Science. 2019 Aug 2;365(6452):eaau9923. doi: 10.1126/science.aau9923
- 39. Bauer NC, Corbett AH, Doetsch PW. The current state of eukaryotic DNA base damage and repair. Nucleic Acids Res. 2015 Dec 2;43(21):10083–101. doi: 10.1093/nar/gkv1136
- 40. Sedgwick B. Repairing DNA-methylation damage. Nat Rev Mol Cell Biol. 2004 Feb;5(2):148–57. doi: 10.1038/nrm1312
- 41. Seyfert AL, Cristescu MEA, Frisse L, Schaack S, Thomas WK, Lynch M. The rate and spectrum of microsatellite mutation in Caenorhabditis elegans and Daphnia pulex. Genetics. 2008 Apr;178(4):2113–21. doi: 10.1534/genetics.107.081927
- 42. Chapuis MP, Plantamp C, Streiff R, Blondin L, Piou C. Microsatellite evolutionary rate and pattern in Schistocerca gregaria inferred from direct observation of germline mutations. Mol Ecol. 2015;24(24):6107–19. doi: 10.1111/mec.13465
- 43. Yue GH, David L, Orban L. Mutation rate and pattern of microsatellites in common carp (Cyprinus carpio L.). Genetica. 2007 Mar 1;129(3):329–31. doi: 10.1007/s10709-006-0003-8
- 44. Jakovlić I, Gui JF. Recent invasion and low level of divergence between diploid and triploid forms of Carassius auratus complex in Croatia. Genetica. 2011 Jun 1;139(6):789–804. doi: 10.1007/s10709-011-9584-y
- 45. Ramsey J, Schemske DW. Pathways, Mechanisms, and Rates of Polyploid Formation in Flowering Plants. Annu Rev Ecol Syst. 1998;29(1):467–501.
- 46. Rubanova Y, Shi R, Harrigan CF, Li R, Wintersinger J, Sahin N, et al. Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig. Nat Commun. 2020 Feb 5;11(1):731. doi: 10.1038/s41467-020-14352-7
- 47. DeWitt WS, Harris KD, Ragsdale AP, Harris K. Nonparametric coalescent inference of mutation spectrum history and demography. Proc Natl Acad Sci U S A. 2021 May 25;118(21):e2013798118. doi: 10.1073/pnas.2013798118
- 48. Hernández Martínez A, Madurga R, García-Romero N, Ayuso-Sacido Á. Unravelling glioblastoma heterogeneity by means of single-cell RNA sequencing. Cancer Lett. 2022 Feb 28;527:66–79. doi: 10.1016/j.canlet.2021.12.008
- 49. Li J, Zhao YH, Tian SF, Xu CS, Cai YX, Li K, et al. Genetic alteration and clonal evolution of primary glioblastoma into secondary gliosarcoma. CNS Neurosci Ther. 2021 Dec;27(12):1483–92. doi: 10.1111/cns.13740
- 50. Kim H, Zheng S, Amini SS, Virk SM, Mikkelsen T, Brat DJ, et al. Whole-genome and multisector exome sequencing of primary and post-treatment glioblastoma reveals patterns of tumor evolution. Genome Res. 2015 Mar;25(3):316–27. doi: 10.1101/gr.180612.114
- 51. Sottoriva A, Kang H, Ma Z, Graham TA, Salomon MP, Zhao J, et al. A Big Bang model of human colorectal tumor growth. Nat Genet. 2015 Mar;47(3):209–16. doi: 10.1038/ng.3214
- 52. Tilk S, Tkachenko S, Curtis C, Petrov DA, McFarland CD. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. eLife. 2022;11:e67790. doi: 10.7554/eLife.67790
- 53. Okolie O, Bago JR, Schmid RS, Irvin DM, Bash RE, Miller CR, et al. Reactive astrocytes potentiate tumor aggressiveness in a murine glioma resection and recurrence model. Neuro-Oncol. 2016 Dec;18(12):1622–33. doi: 10.1093/neuonc/now117
- 54. Pirzkall A, McGue C, Saraswathy S, Cha S, Liu R, Vandenberg S, et al. Tumor regrowth between surgery and initiation of adjuvant therapy in patients with newly diagnosed glioblastoma. Neuro-Oncol. 2009 Dec 1;11(6):842–52. doi: 10.1215/15228517-2009-005
- 55. Scott JN, Rewcastle NB, Brasher PM, Fulton D, MacKinnon JA, Hamilton M, et al. Which glioblastoma multiforme patient will become a long-term survivor? A population-based study. Ann Neurol. 1999 Aug;46(2):183–8.
- 56. Monteiro AR, Hill R, Pilkington GJ, Madureira PA. The Role of Hypoxia in Glioblastoma Invasion. Cells. 2017 Nov 22;6(4):45. doi: 10.3390/cells6040045
- 57. Sabelström H, Quigley DA, Fenster T, Foster DJ, Fuchshuber CAM, Saxena S, et al. High density is a property of slow-cycling and treatment-resistant human glioblastoma cells. Exp Cell Res. 2019 May 1;378(1):76–86. doi: 10.1016/j.yexcr.2019.03.003
- 58. McGranahan N, Favero F, de Bruin EC, Birkbak NJ, Szallasi Z, Swanton C. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci Transl Med. 2015 Apr 15;7(283):283ra54. doi: 10.1126/scitranslmed.aaa1408
- 59. Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, et al. The evolutionary history of 2,658 cancers. Nature. 2020 Feb;578(7793):122–8. doi: 10.1038/s41586-019-1907-7
- 60. López-Cortegano E, Craig RJ, Chebib J, Samuels T, Morgan AD, Kraemer SA, et al. De Novo Mutation Rate Variation and Its Determinants in Chlamydomonas. Mol Biol Evol. 2021 Sep 1;38(9):3709–23. doi: 10.1093/molbev/msab140
- 61. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2018.
- 62. Van Rossum G, Drake FL Jr. Python reference manual. Amsterdam: Centrum voor Wiskunde en Informatica; 1995.



