PLOS Genetics. 2023 Dec 14;19(12):e1011085. doi: 10.1371/journal.pgen.1011085

Time-resolved, integrated analysis of clonally evolving genomes

Carine Legrand 1,2,*, Ranja Andriantsoa 1, Peter Lichter 3,4, Günter Raddatz 1, Frank Lyko 1
Editor: Ville Mustonen 5
PMCID: PMC10754456  PMID: 38096267

Abstract

Clonal genome evolution is a key feature of asexually reproducing species and human cancer development. While many studies have described the landscapes of clonal genome evolution in cancer, few determine the underlying evolutionary parameters from molecular data, and even fewer integrate theory with data. We derived theoretical results linking mutation rate, time, expansion dynamics, and biological/clinical parameters. Subsequently, we inferred time-resolved estimates of evolutionary parameters from mutation accumulation, mutational signatures and selection. We then applied this framework to predict the time of speciation of the marbled crayfish, an enigmatic, globally invasive parthenogenetic freshwater crayfish. The results predict that speciation occurred between 1986 and 1990, which is consistent with biological records. We also used our framework to analyze whole-genome sequencing datasets from primary and relapsed glioblastoma, an aggressive brain tumor. The results identified evolutionary subgroups and showed that tumor cell survival could be inferred from genomic data that was generated during the resection of the primary tumor. In conclusion, our framework allowed a time-resolved, integrated analysis of key parameters in clonally evolving genomes, and provided novel insights into the evolutionary age of marbled crayfish and the progression of glioblastoma.

Author summary

Genomes evolve through the accumulation of mutations and under the pressure of selective forces. While additional mechanisms are at play in sexually reproducing species, this is not the case in clonal genomes. Our study focuses on a parthenogenetic animal and on cancer, since both possess a clonal genome, and in both cases evolutionary forces are key to understanding expansion. We modelled mutation accumulation in combination with Darwinian selection and clock-like mutagenic processes. Using this framework, we showed a remarkably recent emergence date for P. virginalis and established its potential as a model system for clonal genome evolution. We highlighted subtle temporal dynamics of selection in tumor samples, and showed that tumor cell survival was correlated with the time to recurrence. Our findings illustrate the potential of this framework for modelling of clonal evolution and for the use of evolutionary parameters in a clinical context.

Introduction

The evolution of genomes is shaped by many factors, among which the random accumulation of mutations over time plays a fundamental role [1,2]. Because of this, the characteristics of mutated sites can be used as a lens to observe the evolutionary processes that shaped the genome in the past. For instance, the ratio of nonsynonymous to synonymous mutations reveals whether selection had an impact on the genome [3]. Furthermore, biological processes, including clock-like processes, leave a footprint in the form of mutational signatures, which have been characterized only recently [4]. The frequency of mutated alleles could also elucidate the timeline of the evolving genome, or selection [5], but this can be confounded by stochastic drift or by alterations of ploidy [6,7]. These individual evolutionary parameters may or may not influence each other. An integrated analysis aims to model these elements and their interplay, in order to better understand their role in shaping the genome.

Far from being homogeneous, the probability of a mutation depends on many factors such as the genomic location [3], mutator alleles, local nucleotide context or mutagenic exposures [4]. Other genomic modifications include recombination in sexual reproduction, copy number variants and genomic rearrangements, gene transfers and hybridization. The capacity of any genomic modification to be inherited is partly stochastic, for instance through genetic drift [8], but can be favored or disfavored by positive or negative selection. Genome evolution was historically observed through the analysis of phenotypes [9], and can now be determined more precisely using high-throughput sequencing in parallel with experimental or cohort settings, such as mutation accumulation experiments, or the analysis of genetic trios [10,11].

Under certain conditions, genomes can evolve clonally, which involves a more limited set of mechanisms. This is particularly relevant for asexually reproducing species and for human cancers. Mutation rate, growth and variant frequencies are key parameters of clonally evolving genomes [12]. They determine the speed of evolution and function under the influence of selective pressures.

Selection can be quantified using the ratio of nonsynonymous to synonymous mutations (dNdS), where a lower than expected ratio indicates purifying (negative) selection, and a higher ratio indicates positive selection. The expected dNdS ratio is not trivial to determine, because the identity of the neighboring genomic bases, or the location of the mutation in the gene transcript, can alter the frequency of certain mutations. Selection is also a multifaceted, dynamic event which actively depends on the environment [13,14]. Of note, the notion of stochastic drift, which corresponds to the random variation of the frequency of alleles (or, of clones), is a process distinct from selection. Stochastic drift can happen without the advent of selection (neutral drift), or in addition to it [1].
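As a minimal illustration of the ratio described above, the Python sketch below computes a site-normalized dN/dS and labels a confidence interval as negative, neutral or positive selection relative to the neutral value of 1. It assumes that the numbers of nonsynonymous and synonymous sites (the opportunities for each mutation type) are already known; the function names and counts are hypothetical, and the sketch ignores the sequence-context effects mentioned above, which dedicated dN/dS methods model explicitly.

```python
def dnds(n_nonsyn: int, n_syn: int, nonsyn_sites: float, syn_sites: float) -> float:
    """Site-normalized dN/dS (hypothetical helper): nonsynonymous mutations per
    nonsynonymous site divided by synonymous mutations per synonymous site."""
    if n_syn == 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("dN/dS is undefined for these inputs")
    dn = n_nonsyn / nonsyn_sites
    ds = n_syn / syn_sites
    return dn / ds


def classify_selection(ci_low: float, ci_high: float) -> str:
    """Label a dN/dS confidence interval relative to the neutral value 1."""
    if ci_high < 1.0:
        return "negative (purifying)"
    if ci_low > 1.0:
        return "positive"
    return "compatible with neutrality"


if __name__ == "__main__":
    # Illustrative counts only, not data from this study.
    ratio = dnds(n_nonsyn=30, n_syn=20, nonsyn_sites=300, syn_sites=100)
    print(f"dN/dS = {ratio:.2f}")
    print(classify_selection(0.3, 0.8))
```

Basing the call on the confidence interval rather than on the point estimate mirrors how a ratio "compatible with unity" is read as absence of detectable selection.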

A prominent species with a clonally evolving genome is the marbled crayfish (Procambarus virginalis), a newly discovered freshwater crayfish [15,16]. Marbled crayfish reproduce by apomictic parthenogenesis, with the offspring being genetically identical copies of their mothers [17,18]. Interestingly, genetic analyses have suggested that the global marbled crayfish population represents a single clone, indicating that it was formed only recently and by a single foundational animal [19,20]. Morphological and genetic examinations have identified Procambarus fallax, a sexually reproducing slough crayfish from Florida, as the parent species of the marbled crayfish [21]. Furthermore, a recent phylo-geographic analysis of P. fallax suggested that the anthropogenic transport and cultivation of a triploid and parthenogenetically reproducing P. fallax specimen could be the origin of the marbled crayfish [21]. The offspring of this foundational specimen were subsequently distributed through the aquarium trade and released into various environments, thus forming numerous stable wild populations around the globe [20]. However, important details about the speciation of the marbled crayfish are not known and need to be supported by genetic analysis.

Clonal genome evolution also plays an important role in cancer formation. Indeed, cancer genome evolution is characterized by the accumulation of somatic mutations into a pathogenic tumoral genome. Several authors have described the critical role of mutational patterns and selection in cancer [1,3,22], while neutral evolution is still debated [5,7]. In glioblastoma, the analysis of tumor trajectories indicated that tumors are initiated years before diagnosis [23]. Consequently, it would be of great interest to infer evolutionary parameters over the course of tumor progression.

In this study, we aimed to develop an integrated analysis of clonal genome evolution. To this end, we reformulated the dependence of mutation accumulation on variant allele frequency, and used this formulation to determine the links between the mutation rate, growth and survival rates. We further integrated these parameters with selection estimates, obtained from the non-synonymous to synonymous ratio. Finally, we integrated time estimates in our model, based on clock-like mutational signatures. We applied our approach to the clonally evolving marbled crayfish. We provided a detailed view of mutation accumulation and selection, and estimated the time of speciation. We further applied our framework to clonal genome evolution in cancer, using recently published samples of primary and recurrent glioblastoma [23].

Results

The genetic near-monoclonality of the marbled crayfish population [19,20] establishes this species as an excellent model system for studying clonal genome evolution. To assess the mutation rate of the P. virginalis genome, we used paired-end whole-genome sequencing at an average coverage of 17x (per strand) for a line of one ancestral animal and two direct descendants from our laboratory colony of P. virginalis, sampled over a period of seven years (Fig 1A). The mutation rate was calculated as the average number of de novo mutations in animals 34 and 35 as compared to animal 1, per nucleotide and per year. From these samples, we obtained a range for the mutation rate of μ = [3.51×10⁻⁸; 1.17×10⁻⁴] /nt/y. The lower bound corresponds to strictly filtered mutations divided by the number of sites in the whole genome, while the upper bound corresponds to the number of mutations remaining after relaxed filtering, divided by the number of callable sites. This range overlaps with known mutation rates in arthropods, and is also comparable to the mutation rates observed in human somatic healthy or cancerous cells (Fig 1B).

Fig 1. Mutation rate of P. virginalis and coalescent.


(A) Genealogy of laboratory animals, with sequenced animals marked in grey. (B) Mutation rate in P. virginalis, in other arthropods (Silliman et al. 2021 [24], Yang et al. 2015 [25], Liu et al. 2017 [26], Oppold et al. 2016 [27], Flynn et al. 2017 [13], Ho et al. 2020 [28], Keightley et al. 2014 [29], Keightley et al. 2015 [30]), and in Homo sapiens (Ohno et al. 2019 [31], Lee-Six et al. 2018 [32], Blokzijl et al. 2016 [33], Martincorena and Campbell 2015 [10], Ma et al. 2018 [34]). HSC: hematopoietic stem cells; ASCs: adult stem cells (small intestine, colon and liver). Error bars correspond to 95% confidence intervals. (C) Coalescent tree based on a constant mutation rate and sequences of sampled animals. The posterior probability of each branch is indicated in red.

We estimated the evolutionary age of P. virginalis using Markov chain Monte Carlo sampling with Bayesian evolutionary analysis [35] on whole-genome sequencing datasets from 13 animals (Fig 1C). We generated 10 million states, which allowed convergence of the sampled states and led to adequately large effective sample sizes (see Methods for details). The resulting coalescent tree showed that animals 1, 34 and 35 correctly clustered together, as did animals from German wild populations (Hannover, Reilingen, Moosweiher) and from the likely foundational laboratory lineage of the German wild populations (Heidelberg). Furthermore, samples from Madagascar formed a separate branch. Interestingly, Petshop 2 [19] was nested in the branch of animals from Madagascar. This is consistent with the notion that the Malagasy population was founded by an animal originally obtained from a German pet shop. Posterior probabilities (Fig 1C, red annotations) indicate highly probable branching for all but the top coalescent event, which has a posterior probability of 0.5206. From this tree, the most recent common ancestor of the 13 animals occurred in 1988.0 (95% CI: [1986.1; 1989.8]). This predates, and is therefore broadly consistent with, the first documented appearance of P. virginalis in 1995 [16].
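An interval such as the one above summarizes the posterior distribution of the time to the most recent common ancestor. TRACER reports such summaries as highest posterior density (HPD) intervals; the sketch below shows one simple way to compute a 95% HPD interval from posterior samples, assuming the samples have already been converted to calendar years. It is an illustration on synthetic draws, not TRACER's code and not the study's output.

```python
import numpy as np


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples
    (the HPD interval for a unimodal posterior)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # samples that must lie inside the interval
    if not 1 <= k <= n:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k sorted samples
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


if __name__ == "__main__":
    # Synthetic posterior draws of the TMRCA as a calendar year (illustration only).
    rng = np.random.default_rng(1)
    tmrca_years = rng.normal(loc=1988.0, scale=1.0, size=10_000)
    low, high = hpd_interval(tmrca_years)
    print(f"mean {tmrca_years.mean():.1f}, 95% HPD [{low:.1f}; {high:.1f}]")
```

For a symmetric posterior the HPD interval coincides with the equal-tailed interval; for a skewed posterior it is shorter and shifted toward the mode.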

We next modeled mutation accumulation under a fast growth scenario. In this model, the number of mutations dM, arising in a time increment, scaled with the mutation rate and other evolutionary parameters (S1 Text, p.2, expression (1)). Then, we noticed that allele frequency could be expressed based on ploidy and the number of animals [5], and that the number of animals could be further expressed using the rate of reproduction, offspring survival, and the population size (S1 Text, p.4, expression (7)). This reformulation of allele frequency appeared advantageous, because it could then be used to simplify the expression of dM. As a result, under this fast growth scenario, dM could be simply expressed using the mutation rate, constant terms, and allele frequency (S1 Text, p.5, expression (12)). As a consequence, mutation accumulation gave information on the mutation rate, but not on selection (S1 Text). This provided the rationale for examining the dynamics of the mutation rate in P. virginalis using the M(1/f) curve, where M is the number of mutations and f is the allelic frequency (Fig 2A). The resulting curve suggested that the mutation rate changed over time, with 4 phases delineated by a segmented regression (Fig 2A; p = 0.06). The mutation rate was reduced in phase 3, as compared to phases 1 and 2, and increased in phase 4 (Fig 2A). Under our model, selection s is not observable using M(1/f) (S1 Text, Eq 12). We therefore used the ratio of non-synonymous to synonymous mutations to estimate s (Fig 2B). The resulting values were compatible with unity, suggesting the absence of selection.
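To make the M(1/f) curve concrete, the sketch below counts, for each value on a grid of 1/f, the mutations whose variant allele frequency is at least f (equivalently, whose inverse allele frequency is at most 1/f), and estimates the slope dM/d(1/f) over a window by least squares. We assume this cumulative convention, which is common in analyses of mutation accumulation; the allele frequencies and function names are hypothetical.

```python
import numpy as np


def cumulative_mutations(vafs, inv_f_grid):
    """M(1/f): number of mutations whose allele frequency is >= f,
    i.e. whose inverse allele frequency is <= 1/f."""
    inv_vaf = np.sort(1.0 / np.asarray(vafs, dtype=float))
    return np.searchsorted(inv_vaf, np.asarray(inv_f_grid, dtype=float), side="right")


def slope_dm_dinvf(inv_f_grid, m_values):
    """Least-squares slope of M against 1/f over the supplied window."""
    slope, _intercept = np.polyfit(inv_f_grid, m_values, deg=1)
    return float(slope)


if __name__ == "__main__":
    vafs = [0.5, 0.4, 0.25, 0.2, 0.1]          # illustrative allele frequencies
    grid = np.array([2.0, 4.0, 10.0])
    print(cumulative_mutations(vafs, grid))    # -> [1 3 5]
```

Under the fast-growth model described above, a steeper local slope of M(1/f) corresponds to a higher mutation rate per division in that range of 1/f, so delineating phases amounts to locating where this slope changes.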

Fig 2. Mutation accumulation, selection and time course of P. virginalis genome evolution.


(A) Mutation accumulation as a function of the inverse allele frequency 1/f (black) and phases from automated segmentation (breakpoints in grey, segments in red). The confidence band at 95% level is shown in grey. (B) Non-synonymous to synonymous ratio (dNdS). The smoothed ratio is shown in red. (C) Stack plot of exposure, i.e., the contribution of each mutational signature. This includes the clock-like single-base substitution 1 (SBS1) and single-base substitution 5 (SBS5) signatures. Confidence bands at 95% probability are indicated in grey. (D) Mutation accumulation as a function of time. The confidence band at 95% is shown in grey and the smoothed mutation accumulation is shown in red.

To obtain time-resolved estimates, we then used previously established clock-like single-base mutational signatures (SBS1 and SBS5) [36–38] as a proxy for the time course of mutation accumulation (Fig 2C). Because mutational signatures are currently lacking for arthropods, but the underlying mechanisms appear to be conserved in evolution [39,40], we used human mutational signatures. We further assumed that the arrow of time from past to present corresponds to the direction of increasing 1/f. To obtain a time course in arbitrary units, we calculated the integral of the clock-like components of mutation accumulation (Fig 2D, Methods). According to the mathematical model, the slope of this curve is proportional to the mutation rate as a function of time (S1 Text, Eq 12). The results (Fig 2D) showed that this temporal mutation rate varied less than the mutation rate per division (Fig 2A). Because the temporal and per-division mutation rates differ in particular by the growth rate (S1 Text), this might indicate fluctuations in the growth rate of the marbled crayfish population. Overall, our analyses revealed distinct evolutionary phases and significant variations of evolutionary parameters in P. virginalis, and allowed us to trace its speciation to a time point that is consistent with biological records.
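One plausible reading of this integration step is sketched below: within each 1/f bin, the number of mutations is multiplied by the fraction attributed to clock-like signatures (SBS1 + SBS5), and the cumulative sum of these clock-like counts serves as a time axis in arbitrary units. This is an assumption made for illustration, not the exact equation of the Methods; the per-bin counts, clock-like fractions and function names are hypothetical, for example the output of a signature-fitting step.

```python
import numpy as np


def clock_time_course(delta_m, clock_fraction):
    """Return (t, M): a time proxy in arbitrary units and accumulated mutations.

    delta_m[i]        -- mutations in the i-th 1/f bin, ordered by increasing 1/f
    clock_fraction[i] -- share of those mutations attributed to clock-like signatures
    """
    delta_m = np.asarray(delta_m, dtype=float)
    clock_fraction = np.asarray(clock_fraction, dtype=float)
    if np.any((clock_fraction <= 0) | (clock_fraction > 1)):
        raise ValueError("clock-like fractions must lie in (0, 1]")
    t = np.cumsum(delta_m * clock_fraction)   # clock-like mutations act as the clock
    m = np.cumsum(delta_m)                    # total accumulated mutations
    return t, m


def temporal_rate(t, m):
    """Per-bin slope dM/dt, proportional to the mutation rate per unit time."""
    t0 = np.concatenate(([0.0], t))
    m0 = np.concatenate(([0.0], m))
    return np.diff(m0) / np.diff(t0)


if __name__ == "__main__":
    t, m = clock_time_course([10, 20, 30], [0.5, 0.5, 0.25])
    print(t, m, temporal_rate(t, m))
```

In this reading the slope within each bin equals the inverse of the clock-like fraction, so bins dominated by non-clock-like mutations appear as periods of faster mutation accumulation per unit time.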

In P. virginalis, we developed a framework to analyse the evolution of a clonal genome, which is driven by germline mutations. In principle, this framework can also be applied to the clonal evolution of a tumor genome, which is driven by somatic mutations. Since glioblastoma is a high-grade tumor with systematic recurrence and poor patient survival, a better understanding of its evolutionary parameters is important. We therefore applied our framework to a published set of whole-genome sequencing data of primary and recurrent glioblastoma tumors [23]. This study also estimated the age of primary tumors, allowing further data integration. Based on the curve M(1/f), we generated mutation rate profiles (Fig 3A; see S1A Fig for individual samples), which we further segmented into phases (Fig 3A, p < 2.2×10⁻¹⁶). The results indicated distinct variations in the mutation rate in primary and recurrent samples (Fig 3; S1 Table). In the exemplary sample 1 in Fig 3A, the segmentation identified 5 significantly distinct phases. K-fold cross-validation showed that the mean square error was 2379.4 for 4 phases and 1184.0 for 5 phases. The difference between test and validation mean square errors was 291.0 for 4 phases and 135.5 for 5 phases. These results strongly suggest the absence of overfitting. After we excluded the outermost phases 1 and 5, where changes in mutation frequencies may correspond to an early slow-growth phase, or where our analysis may miss low-frequency mutations [23], the mutation rate per division decreased steadily in phases 2–4.
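The k-fold comparison of segmentations described above can be illustrated with a small sketch: a continuous piecewise-linear (segmented) fit of M against 1/f using a hinge basis, the slope of each phase, and the mean square errors on held-out and training folds. Unlike dedicated segmented-regression software, this sketch does not estimate breakpoint positions; they are supplied by the caller. All names and the synthetic data are hypothetical, and reading the held-out minus training gap as an overfitting indicator is our assumption.

```python
import numpy as np


def hinge_design(x, breakpoints):
    """Design matrix [1, x, max(0, x - b1), ...] for a continuous piecewise-linear fit."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - b) for b in breakpoints]
    return np.column_stack(cols)


def fit_segmented(x, y, breakpoints):
    coef, *_ = np.linalg.lstsq(hinge_design(x, breakpoints), y, rcond=None)
    return coef


def phase_slopes(coef):
    """Slope of each phase: the base slope plus the hinge terms accumulated so far."""
    return np.cumsum(coef[1:])


def kfold_mse(x, y, breakpoints, k=5):
    """Mean held-out and training MSE over k interleaved folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.arange(x.size) % k
    test_mse, train_mse = [], []
    for i in range(k):
        train, test = folds != i, folds == i
        coef = fit_segmented(x[train], y[train], breakpoints)
        resid_test = y[test] - hinge_design(x[test], breakpoints) @ coef
        resid_train = y[train] - hinge_design(x[train], breakpoints) @ coef
        test_mse.append(np.mean(resid_test ** 2))
        train_mse.append(np.mean(resid_train ** 2))
    return float(np.mean(test_mse)), float(np.mean(train_mse))


if __name__ == "__main__":
    # Synthetic M(1/f)-like data with slope changes at 1/f = 4 and 7.
    x = np.linspace(1.0, 10.0, 100)
    y = 3 * x - 2 * np.maximum(0, x - 4) + 4 * np.maximum(0, x - 7)
    for bps in ([], [4.0], [4.0, 7.0]):
        test_err, train_err = kfold_mse(x, y, bps)
        print(bps, round(test_err, 3), round(test_err - train_err, 3))
    print(phase_slopes(fit_segmented(x, y, [4.0, 7.0])))
```

The hinge basis keeps the fitted curve continuous at each breakpoint, which matches the idea that M(1/f) is a cumulative quantity whose slope, not its level, changes between phases.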

Fig 3. Mutation accumulation, selection and time dynamics of a representative glioblastoma tumor (patient 1, primary tumor), expansion patterns and survival ratio.


(A) Mutation accumulation as a function of the inverse allele frequency 1/f (black) and phases from automated segmentation (breakpoints are indicated as dashed vertical lines, segments are indicated in blue). The confidence band at 95% level is shown in grey. (B) Non-synonymous to synonymous ratio dNdS. In the lower inset, purple and blue bars show non-synonymous and synonymous mutations, respectively. The smoothed ratio is shown in red. (C) Clock-like and non-clock-like mutational signatures. (D) Mutation accumulation as a function of time. (E) Stratification of expansion curves ωγN into 4 subgroups: A: Convex, B: Peak, C: Increase, D: Paused Start (42 primary tumor GBM samples). (F) Dependence of time to recurrence on the γRP ratio. Fit1 corresponds to a linear regression of time versus log10(γRP), with intercept = 19.511 (standard error SE = 2.544) and slope = -5.819 (SE = 1.455); fit2 corresponds to a linear regression of time versus log10(log10(γRP)), with intercept = 18.922 (SE = 1.806) and slope = -29.321 (SE = 5.285).

We next examined selection using the dN/dS ratio. Taking confidence bounds into account, the results were compatible with neutral evolution for most tumors (Fig 3B; S1B Fig per sample). However, 11 primary tumor samples showed evidence of negative selection during certain intervals, for instance sample 35 (S1B Fig). We also observed evidence for positive selection in two primary tumor samples (samples 2 and 7, S1B Fig). Interestingly, 7 out of 9 recurrent tumor samples underwent prolonged phases of negative selection (for example, sample 4, 1/f in [1/0.5; 1/0.1], S1B Fig), while 2 samples exhibited short phases of negative selection. No recurrent tumor sample showed any significant phase of positive selection.

We next determined the timeline of tumor evolution, by examining the frequency of the stable, clock-like SBS1 signature. Using the information on the clock-like signature SBS1, and using Eq (9) (Methods), we reconstructed M as a function of time (Figs 3D, S1D per sample), in arbitrary units. The slope of this curve is proportional to the mutation rate per time unit. Similar to the mutation rate per division in Fig 3A, the mutation rate per time unit decreased from phase 2 to phase 4, though less markedly. Furthermore, similar to the situation for P. virginalis, the difference observed in sample 1 might indicate fluctuating growth during phases 2–4. More specifically, differences between per-division and temporal mutation rates corresponded in our model to the growth rate, the survival rate, and the number of cells (S1 Text, Eq 8).

We next asked whether these terms could be used to characterize the set of 42 primary and recurrent tumor pairs. Using temporal and per-division mutation rates, we could reconstruct these terms, which correspond to the product ωγN. This quantity corresponds to growth (ω), modulated by the survival rate (γ) and scaled by the number of cells (N). In other words, the product ωγN reflects the effective expansion of the tumor; we therefore refer to ωγN as the expansion parameter in the following. We examined the corresponding curves and found that unsupervised clustering classified the tumors into four subgroups: (A) Convex, (B) Peak, (C) Increase and (D) Paused Start (Fig 3E). We then looked for a possible association between the patterns of the ωγN curve in the primary tumors and the time to recurrence, but the results were inconclusive (p = 0.4916, n = 19). However, the time difference between the resection of the primary tumor and the resection of the recurrence is known for a subset of samples, and the age of tumors was estimated previously [23]. This allowed us to transform the time course from arbitrary units into real units (Methods, Eq 2, S2 Fig). Furthermore, we extended our modelling to express the transition from the primary to the recurrent tumor (Methods, Eq 4). With this, we could determine the tumor survival ratio from time estimates. Using the previously established values of 2 years and 7 years [23] as the lower and upper limits for the time course of the primary tumors, we determined a range for the tumor survival ratio γRP for each individual sample (Methods, Eq 8, S2 Table). As a result, the lowest value of the ratio γRP, corresponding to a tumor emergence about 2 years before diagnosis, was always higher than 1 (median = 27.8 [17.4; 54.0] for the lower bound, median = 97.5 [60.9; 189.0] for the upper bound, n = 20 samples).
These results indicated that tumor cell survival was higher at the start of the recurrence than at the end of the primary tumor growth. Not surprisingly, γRP ratios were associated with the time to recurrence (Fig 3F, padj = 1.258×10−3 and padj = 8.649×10−4), with higher γRP ratios corresponding to shorter time to recurrence. Collectively, these results uncover substantial variations of evolutionary parameters among glioblastoma samples, and provide an improved understanding of growth and survival in tumor subgroups.
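As a worked example of the two regressions reported in the Fig 3F legend, the sketch below evaluates both fitted lines at a given γRP, using the published intercepts and slopes. The time units are those of Fig 3F, which are not restated in this section; fit2 is only defined for γRP > 1, which holds for all reported lower-bound values. The function names are hypothetical.

```python
import math

# Coefficients from the Fig 3F legend (time to recurrence versus transformed γRP).
FIT1 = {"intercept": 19.511, "slope": -5.819}    # time ~ log10(γRP)
FIT2 = {"intercept": 18.922, "slope": -29.321}   # time ~ log10(log10(γRP))


def predicted_time_fit1(gamma_rp: float) -> float:
    """Fit1 prediction: intercept + slope * log10(γRP)."""
    return FIT1["intercept"] + FIT1["slope"] * math.log10(gamma_rp)


def predicted_time_fit2(gamma_rp: float) -> float:
    """Fit2 prediction: intercept + slope * log10(log10(γRP)); needs γRP > 1."""
    if gamma_rp <= 1:
        raise ValueError("fit2 requires gamma_RP > 1")
    return FIT2["intercept"] + FIT2["slope"] * math.log10(math.log10(gamma_rp))


if __name__ == "__main__":
    g = 27.8   # median lower-bound γRP reported in the text
    print(round(predicted_time_fit1(g), 2), round(predicted_time_fit2(g), 2))
```

At the median lower-bound γRP of 27.8, the two fits give roughly 11.1 and 14.2, respectively; both decrease as γRP grows, consistent with higher γRP ratios corresponding to shorter times to recurrence.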

Discussion

In this study, we presented an integrated framework to analyse clonally evolving genomes. We first determined the mutation rate of P. virginalis to be in the range of [3.51×10⁻⁸; 1.165×10⁻⁴]/nt/y, which encompasses the mutation rates in other arthropods, the human germline, and human somatic healthy cells. The upper end is also comparable to mutation rates at microsatellites in arthropods and other species [41–44], and close to the somatic mutation rate in human cancer. Data about mutation rates in triploid genomes are scarce, and it appears possible that triploidy is associated with higher mutation rates. Interestingly, a high mutation rate was reported in polyploid plants (on the order of 10⁻⁵ [45]). We detected separate evolutionary phases, during which the mutation rate varied significantly. However, the dNdS ratio remained relatively constant, indicating the absence of selection. These findings support the argument that the mutation rate should not be considered constant [1,46,47].

We traced the speciation of P. virginalis to 1988 (95% confidence limits: [1986; 1990]), in agreement with first reports of this animal in 1995 [16]. This exceptionally young evolutionary age is consistent with the largely monoclonal population structure showing only incipient genetic differentiation [19,20]. It also provides experimental support for the hypothesis that the global marbled crayfish population descended from a single anthropogenic transport and release event [16,21] and further establishes the species as a unique model system.

In tumor samples, our approach allowed an analysis of evolutionary parameters at the level of individual patients, and similarly revealed the presence of different phases, variations of the mutation rate, and a few significant events of selection. Multisector and single-cell sequencing studies have highlighted high levels of heterogeneity and either clonal selection or an almost complete overlap between primary and relapsed glioblastoma tumors [48–50]. In this context, our study identified varied patterns of either selection or neutral evolution. This appears comparable to previously published results [3,51], where either selection or neutrality was observed, depending on the context. Interestingly, negative selection occurred most often early in the mutation accumulation process of primary tumors and corresponded to a low mutational load (S1 Fig, exemplary tumors 2, 28 and 42), in agreement with recent results [52], whereas selection in recurrent tumors followed this pattern only to some extent.

Utilizing the difference between temporal and per-division mutation rates, we could stratify the samples into 4 subgroups. While clinical subtypes of GBM have been described, single-cell studies revealed high intratumoral heterogeneity [48]. In this context, our approach offers a possible alternative, although its association with clinical outcome remains to be established. Building on previously estimated tumor ages, we could also derive the survival ratio of tumor cells in the recurrence relative to the primary tumor. We found that tumor cells survive better at the start of the recurrence, albeit with important variations. This supports the notion that GBM regrowth can be more aggressive after surgical resection [53,54], possibly because resection-induced astrocyte injury can support faster growth [54], or because the tumor microenvironment can promote tumor regrowth [55–57]. Conversely, a stronger immune response might also inhibit tumor regrowth.

As our study aims to explore novel connections between diverse fields of research, we find it important to explain several limitations. For example, a more precise determination of the P. virginalis mutation rate could be achieved by the development of novel tools that are more amenable to triploid genomes and by experimental validation [25,29]. Also, the coalescent tree could be refined by the use of sequencing datasets with higher genome coverage to reduce the potential impact of noisy variants. For the tumor samples, it would be important to better understand the potential effect of ploidy changes. This could be achieved by restricting the analysis to diploid regions, or by the integration of ploidy information into the model. In an extended model, copy number information could also provide information on the timing of certain mutations [58,59].

In conclusion, this integrated analysis of mutation accumulation, dNdS ratio and mutational signatures provided a detailed landscape of evolutionary parameters in two paradigms of clonal genome evolution. We showed an exceptionally recent emergence date for P. virginalis and established its potential as a model system. We highlighted subtle temporal dynamics of selection in tumor samples, and showed that a quantification of tumor cell survival was correlated with the time to recurrence. Our findings illustrate the potential of this framework for modelling of clonal evolution and for the use of evolutionary parameters in a clinical context.

Materials and methods

Ethics statement

The committee responsible for the usage of human subject data from the EGAS00001003184 study is the DKFZ-HIPO Data Access Committee of the Heidelberg Center for Personalized Oncology. Approval was granted by this committee, and we used the data in compliance with the declared and approved usage.

Procambarus virginalis samples

Freshwater crayfish samples from [19] were used. Additionally, the Madagascar 1 and Moosweiher samples were resequenced (S3 and S4 Tables). Animal 1 corresponds to the lab strain, acquired from a pet shop. New genomic DNA samples were taken from animals 34 and 35, which, like animal 1, belong to the lab strain and are direct offspring of animal 1. These new samples were prepared and submitted for whole-genome sequencing following the protocol described previously [19]. The genealogy and birth dates of animals were retrieved from laboratory and field records (S3 Table). Sequence data were trimmed using Trimmomatic v0.32 (settings: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:40, adapter sequence: TruSeq3-PE). Next, trimmed data were mapped to the Pvir genome assembly v04 (https://www.ncbi.nlm.nih.gov, BioProject accession: PRJNA356499) using Bowtie2 (v2.2.6, setting: --sensitive). The quality of this assembly is comparable to other published genomes of non-standard organisms, but it still shows a higher level of fragmentation. This may prevent or hinder mutation detection in parts of the genome located in, or at the boundary of, an assembly gap. Aligned reads were sorted, deduplicated, sorted again and indexed using samtools. Subsequently, variant calling was performed using Freebayes v0.9.21-g7dd41db (parameters: --report-all-haplotype-alleles -P 0.7 -p 3 --min-mapping-quality 30 --min-base-quality 20 --min-coverage 6 --report-genotype-likelihood-max).
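To make the mapping and variant-calling settings above unambiguous, the Python sketch below assembles the Bowtie2 and Freebayes command lines as argument lists. It uses only the options quoted in this section plus each tool's standard input/output options (-x, -1, -2 and -S for Bowtie2; -f for the Freebayes reference). File names are hypothetical, the Trimmomatic and samtools steps are omitted because their full arguments are not given here, and option availability should be checked against the installed tool versions.

```python
import shlex


def bowtie2_command(index_prefix, reads_1, reads_2, out_sam):
    """Paired-end Bowtie2 mapping with the --sensitive preset."""
    return ["bowtie2", "--sensitive", "-x", index_prefix,
            "-1", reads_1, "-2", reads_2, "-S", out_sam]


def freebayes_command(reference_fasta, bam, ploidy=3):
    """Freebayes call with the parameters listed above (triploid genome: -p 3)."""
    return [
        "freebayes",
        "-f", reference_fasta,
        "--report-all-haplotype-alleles",
        "-P", "0.7",
        "-p", str(ploidy),
        "--min-mapping-quality", "30",
        "--min-base-quality", "20",
        "--min-coverage", "6",
        "--report-genotype-likelihood-max",
        bam,
    ]


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed rather than executed.
    print(shlex.join(bowtie2_command("pvir_v04", "a34_R1.trimmed.fq.gz",
                                     "a34_R2.trimmed.fq.gz", "a34.sam")))
    print(shlex.join(freebayes_command("pvir_v04.fa", "a34.sorted.dedup.bam")))
```

Building argument lists rather than shell strings keeps each option and value as a separate token, which avoids the fused-flag problem that arises when parameters are copied as free text.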

Glioblastoma Multiforme samples

The glioblastoma primary and recurrent tumor samples correspond to the WGS cohort already described in [23]. In particular, summary information can be found in supplementary table 1 of [23]. After approval of the research project, access to the SNP data of primary and recurrent tumor samples, as well as time to recurrence when available, was granted.

Mutation rate of P. virginalis

Mutation accumulation between animal 1 and its descendants, animals 34 and 35, was used to calculate the mutation rate. SNP variants were examined in terms of quality and coverage. Variants with quality ≥35 and coverage between a minimum of 50 (strict cutoff, for the lower bound of the mutation rate) or 25 (relaxed cutoff, for the upper bound of the mutation rate) and a maximum of 200 (≤200) were retained for the main estimate of the mutation rate. Coverages of 200 and higher exhibited an altered SNP distribution and were thus excluded, because they possibly correspond to a distinct part of the P. virginalis genome (possibly highly repetitive and variable domains). The number of callable sites was 486234 for the upper bound of the mutation rate. Sites were considered eligible when the alternate allele was present in only one line [60]. Subsequently, the mutation rate per nucleotide per year was calculated as the count of biallelic mutated nucleotides in animal 34 (respectively, animal 35) as compared to animal 1, divided by the count of nucleotides in the triploid genome of P. virginalis, N = 10.5×10⁹ (for the lower bound of the mutation rate), or by the number of callable sites (for the upper bound), and divided by the time between the birth dates of animal 1 and animal 34 (respectively, animal 35), i.e. 5.75 and 6.08 years. First, we assumed that counts of new mutations follow a binomial distribution, and used this to determine the standard deviation of the observed mutation count (genotyping uncertainty). Second, we assumed that the standard deviation of the birth date equated to a third of the total uncertainty on the time of birth. Third, we calculated the standard deviation between the mutation rates of animals 34 and 35 (biological variability). Finally, we took the total standard deviation as the quadratic sum of these three components (assuming that the different sources of variability follow a normal distribution).
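The rate calculation and the combination of uncertainties described above can be sketched as follows. The genome size N and the two time intervals are taken from the text; the mutation counts and the birth-date uncertainty are hypothetical placeholders. Converting each source of uncertainty into a standard deviation of the rate (dividing the count SD by N·Δt, first-order propagation for time) and averaging the per-animal components are our assumptions, not details given in the Methods.

```python
import math
import statistics

GENOME_SIZE = 10.5e9                              # nucleotides in the triploid genome (lower bound)
YEARS = {"animal_34": 5.75, "animal_35": 6.08}    # time since the birth of animal 1


def mutation_rate(count, n_sites, years):
    """De novo mutations per nucleotide per year."""
    return count / (n_sites * years)


def binomial_rate_sd(count, n_sites, years):
    """Genotyping uncertainty: SD of a Binomial(n_sites, p) count, expressed as a rate."""
    p = count / n_sites
    return math.sqrt(n_sites * p * (1 - p)) / (n_sites * years)


def birth_date_rate_sd(rate, years, total_birth_uncertainty):
    """Birth-date uncertainty: SD = one third of the total uncertainty on the birth
    time, propagated to the rate to first order (assumption)."""
    return rate * (total_birth_uncertainty / 3) / years


def total_sd(*components):
    """Quadratic (root-sum-of-squares) combination of independent SDs."""
    return math.sqrt(sum(c * c for c in components))


if __name__ == "__main__":
    counts = {"animal_34": 2000, "animal_35": 2100}   # hypothetical mutation counts
    birth_uncertainty = 0.1                            # hypothetical total uncertainty, years
    rates = {a: mutation_rate(c, GENOME_SIZE, YEARS[a]) for a, c in counts.items()}
    mean_rate = statistics.mean(rates.values())
    # Averaging the per-animal genotyping and timing SDs is a simplifying assumption.
    geno_sd = statistics.mean(binomial_rate_sd(c, GENOME_SIZE, YEARS[a]) for a, c in counts.items())
    timing_sd = statistics.mean(birth_date_rate_sd(rates[a], YEARS[a], birth_uncertainty) for a in rates)
    bio_sd = statistics.stdev(list(rates.values()))   # between-animal variability
    print(f"rate = {mean_rate:.3g} /nt/y, SD = {total_sd(geno_sd, timing_sd, bio_sd):.2g}")
```

For the upper bound, the same functions apply with the number of callable sites (486234) in place of the genome size.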

Coalescent time

The time to the most recent common ancestor of P. virginalis samples was determined using Bayesian evolutionary analysis by sampling trees (BEAST v1.10.4 [35]). Mutation data with quality >35 and coverage depth >15 were used in this analysis (a coverage cutoff of 25 was not justified here, because samples other than animals 1, 34 and 35 had a notably lower average sequencing depth). Sample birth dates were used as tip dates. Further BEAST parameters were: simple substitution model with estimated base frequencies, strict clock, and skyride coalescent prior. The Markov chain Monte Carlo chain length was 10 million states. These parameters were built into the BEAST input file using the utility BEAUTi. The outputs were analyzed using the utility TRACER. In particular, convergence was assessed from the sampled-state traces of the different parameters, and effective sample sizes were well above 100 (2984 or more), indicating sufficiently decorrelated sampled states.
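The effective sample size used above as a convergence criterion measures how many independent draws an autocorrelated MCMC trace is worth. The sketch below uses a simple estimator, n / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation; it illustrates the idea and is not necessarily the estimator implemented in TRACER. The function name and the synthetic traces are hypothetical.

```python
import numpy as np


def effective_sample_size(trace):
    """ESS = n / (1 + 2 * sum of lag-k autocorrelations), summed until the first
    lag whose autocorrelation is <= 0."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iid = rng.normal(size=5000)          # nearly independent draws
    ar = np.empty(5000)                  # strongly autocorrelated AR(1) chain
    ar[0] = 0.0
    for i in range(1, ar.size):
        ar[i] = 0.95 * ar[i - 1] + rng.normal()
    print(round(effective_sample_size(iid)), round(effective_sample_size(ar)))
```

A strongly autocorrelated chain of the same length yields a much smaller ESS than independent draws, which is why long chains (here, 10 million states) are needed to reach ESS values well above 100.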

Study of mutation accumulation

An infinitesimal increment of mutations dM was defined from the evolutionary parameters: mutation rate, ploidy, cell survival, growth and number of cells (see S1 Text for a detailed description). Because these parameters may be heterogeneous in the population (of cells, or animals), we stratified this expression by subclone, assuming homogeneous parameters within each subpopulation (subclone). Because the observable allele frequency of a mutation is the one obtained after sequencing and SNP calling, we adapted the expression given in [5] accordingly. We linked the observable allele frequency to the features of the subclone in which the mutation appeared, namely the change in subclone size, the number of cells, ploidy and time (S1 Text, Eq (5)). To obtain an expression of allele frequency that could be substituted into dM, we derived formulas for the increment of the number of cells, dN, and for the increment of inverse allele frequency, d(1/f). For the latter, we assumed that ploidy was constant. This allowed us to substitute the expression of allele frequency into dM. As a result, we expressed the mutation accumulation dM in each subclone as a function of the inverse allele frequency 1/f, the mutation rate, and constants. Finally, the mutation accumulation over all subclones, dM, was obtained by summing the contributions of the individual subclones. The mutation accumulation curve M(1/f), whose slope corresponds to dM/d(1/f), was plotted from SNP data (filtered by Phred quality score QUAL ≥30, depth ≥10 and alternate allele depth ≥3) and the corresponding allele frequencies. To estimate uncertainties, we used bins of ±0.25 over 1/f. In each bin, the mutation count was subjected to bootstrap resampling, which yielded a bootstrap distribution.
The 95% confidence interval was taken as the interval bounded by the 2.5% and 97.5% quantiles of the bootstrap distribution. For P. virginalis, the confidence interval for mutation accumulation was calculated assuming a Student's t-distribution.
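The curve M(1/f) and the binned bootstrap interval can be sketched as below. This is a minimal illustration under stated assumptions: M(1/f) is taken as the cumulative number of mutations whose inverse allele frequency is at most 1/f, bins are closed intervals center ± 0.25 on 1/f, and the function names, seed and replicate count are hypothetical choices.

```python
import bisect
import random
import statistics

def accumulation_curve(freqs, inv_f_grid):
    """M(1/f): number of mutations with inverse allele frequency <= each grid value."""
    inv = sorted(1.0 / f for f in freqs)
    return [bisect.bisect_right(inv, x) for x in inv_f_grid]

def bootstrap_bin_ci(freqs, center, half_width=0.25, n_boot=1000, seed=1):
    """95% percentile bootstrap interval for the mutation count in one 1/f bin."""
    rng = random.Random(seed)
    inv = [1.0 / f for f in freqs]
    counts = []
    for _ in range(n_boot):
        resample = rng.choices(inv, k=len(inv))  # resample mutations with replacement
        counts.append(sum(center - half_width <= x <= center + half_width for x in resample))
    q = statistics.quantiles(counts, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return q[0], q[-1]
```

Under neutral growth, M(1/f) is expected to be linear in 1/f [5], so departures from linearity in this curve are what the later segmentation step looks for.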

Mutation annotation and dNdS ratio

Mutations were annotated as synonymous or non-synonymous (including splice-site and stop-gain mutations) using SNPdat v1.0.5. The bias-correcting method dNdScv [3] was tested, but could not be applied to the relatively low number of mutations available; moreover, dNdScv does not provide longitudinal estimates. As a more flexible, but non-bias-correcting, alternative, we calculated the dN/dS ratio as the ratio of non-synonymous to synonymous mutations in a sample, divided by the average ratio in the full genome. The average ratio of possible non-synonymous to synonymous mutations in humans was calculated from the reference coding sequences (hg19): coding sequences not starting with AUG or not ending with a stop codon were filtered out of the FASTA sequences, and the remaining sequences were converted to codons, sorted and counted using a custom bash script. Using a spreadsheet, all non-redundant single-nucleotide changes of each codon were evaluated as synonymous or non-synonymous. The total count of possible synonymous (respectively, non-synonymous) mutations for each codon type was taken as the count of that codon multiplied by its number of possible synonymous (non-synonymous) mutations from the spreadsheet. The genome-wide ratio used as the denominator of dN/dS was taken as the sum over all codons of the count of possible non-synonymous mutations, divided by the sum over all codons of the count of possible synonymous mutations. Uncertainties were determined by bootstrap resampling of mutations and recalculation of the dN/dS ratio for each bootstrap sample.
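The codon-level normalization can be sketched in Python instead of a spreadsheet and bash script. This is an illustrative reimplementation under assumptions: it uses the standard genetic code, counts changes to a stop codon as non-synonymous (matching the inclusion of stop-gain mutations above), expects `codon_counts` to contain sense codons only, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def site_counts(codon):
    """Possible (synonymous, non-synonymous) single-nucleotide changes of a codon."""
    syn = nonsyn = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                syn += 1
            else:
                nonsyn += 1  # includes stop-gain changes
    return syn, nonsyn

def expected_ratio(codon_counts):
    """Genome-wide ratio of possible non-synonymous to synonymous changes, weighted by codon counts."""
    total_s = total_n = 0
    for codon, count in codon_counts.items():
        s, n = site_counts(codon)
        total_s += count * s
        total_n += count * n
    return total_n / total_s

def dnds(observed_nonsyn, observed_syn, codon_counts):
    """Observed N/S quotient divided by the expected quotient (no bias correction)."""
    return (observed_nonsyn / observed_syn) / expected_ratio(codon_counts)
```

For example, TTT (Phe) has one synonymous change (to TTC) and eight non-synonymous ones, so a genome made only of TTT codons would give an expected ratio of 8.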

Mutational signatures

Mutational signatures are combinations of mutation types that are characteristic of distinct mutagenic processes, such as exogenous (ultraviolet light) and endogenous (5-methylcytosine deamination) mechanisms, enzymatic DNA editing, DNA repair errors, and DNA replication infidelity [4]. The single-base substitution mutational signatures (in trinucleotide context) for human subjects were downloaded from the COSMIC database (https://cancer.sanger.ac.uk/signatures/; version 3.1 as of 11.08.2020). Since mutational mechanisms are conserved across the animal kingdom, we hypothesized that human mutational signatures could be applied to the marbled crayfish. For the longitudinal analysis of mutational signatures, mutation data were binned with a bin half-width of 0.5 on the inverse allele frequency. The exposures of the binned data, i.e. the individual contributions of each signature, were determined using R 3.5.2 with the package YAPSA (version 1.8.0). Uncertainty on mutational signatures was determined by bootstrap resampling of mutations, followed by regeneration of the binned data and the YAPSA exposures on the resampled data. We used 1000 bootstrap replicates as a compromise between an ideally larger number of replicates (1M) and reasonable computing time. Large mutation sets (>100,000 mutations) were subsampled to 50,000–60,000 mutations for the bootstrap analysis. Mean, median, percentiles and 95% confidence bounds were determined from the resulting bootstrap distribution.
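Before exposures can be fitted, each binned set of mutations has to be summarized as counts over the 96 trinucleotide channels used by the COSMIC SBS signatures. The sketch below shows this classification step only. It assumes each mutation is given as its reference trinucleotide (mutated base in the middle) plus the alternate base. The function names are hypothetical, and the exposure fitting done by YAPSA is not reproduced.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def sbs96_channel(trinucleotide, alt):
    """Map a substitution to its SBS96 label, e.g. 'A[C>T]G'.

    Mutations at purine (A/G) reference bases are folded onto the pyrimidine
    strand by reverse-complementing the context and the alternate base.
    """
    trinucleotide, alt = trinucleotide.upper(), alt.upper()
    ref = trinucleotide[1]
    if ref in "AG":
        trinucleotide = trinucleotide.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
        ref = trinucleotide[1]
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"

def all_channels():
    """The 96 channels: 6 pyrimidine substitution types x 16 flanking contexts."""
    subs = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{five}[{ref}>{alt}]{three}"
            for ref, alt in subs for five, three in product("ACGT", repeat=2)]

def catalogue(mutations):
    """Count (trinucleotide, alt) pairs into a 96-channel mutation catalogue."""
    counts = dict.fromkeys(all_channels(), 0)
    for tri, alt in mutations:
        counts[sbs96_channel(tri, alt)] += 1
    return counts
```

Folding onto the pyrimidine strand is what reduces the 192 possible strand-specific substitutions to 96 channels.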

Time course

We assumed that the clock-like mutational signatures SBS1 and SBS5 were surrogate indicators of time (SBS1 only for glioblastoma, in agreement with [36]). Because mutagenic mechanisms are conserved across the animal kingdom, and because mutational signatures for arthropods are currently lacking, we assumed that human mutational signatures are sufficiently representative for the marbled crayfish. We further assumed that the arrow of time could be identified with the arrow of inverse allele frequency 1/f. Under these assumptions, an increment of time can be determined in arbitrary units by integrating θ, the exposure to clock-like mutations (SBS1 or SBS5), over an increment of inverse allele frequency 1/f. Since the exposure θ is also proportional to the number of cells in the tumor, θ must be normalized by dividing it by the number of cells N. Since N is proportional to 1/f under some assumptions (S1 Text), N can be replaced, up to an unknown constant, by 1/f, so that θ/N becomes θ·f. This yields the following formula for time t in arbitrary units, over an interval of time that is unknown but can be identified with an interval of inverse allele frequency 1/f:

$$t_{a.u.} = \int_{(1/f)_{\min}}^{(1/f)_{\max}} \theta \cdot f \; d(1/f), \qquad (1)$$

where $t_{a.u.}$ is the time in arbitrary units (a.u.), and θ is the exposure to clock-like mutational signatures. By computing $t_{a.u.}$ at all values of 1/f, we obtained a vector $T_{a.u.}$ of times in arbitrary units, from its minimum value, $\min(T_{a.u.})$ (0 by definition), to its maximum value, $\max(T_{a.u.})$ (which corresponds to integration from $(1/f)_{\min}$ to $(1/f)_{\max}$). 95% confidence limits for time were calculated by using the confidence bounds of mutational signature SBS1 as the exposure θ.
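Eq (1) can be evaluated numerically along the sorted inverse allele frequencies to obtain the whole vector of arbitrary-unit times. The sketch assumes the exposures θ are available at the same 1/f points (e.g. bin centers) and uses the trapezoidal rule, which is our choice of quadrature rather than one stated in the text; the function name is hypothetical.

```python
def time_course_au(inv_f, theta):
    """Cumulative time in arbitrary units along increasing 1/f (Eq 1).

    Integrates theta * f = theta / (1/f) over 1/f with the trapezoidal rule,
    so the result starts at 0 and ends at max(T_a.u.).
    """
    if len(inv_f) != len(theta):
        raise ValueError("inv_f and theta must have the same length")
    integrand = [th / x for th, x in zip(theta, inv_f)]  # theta * f
    t = [0.0]
    for i in range(1, len(inv_f)):
        step = inv_f[i] - inv_f[i - 1]
        t.append(t[-1] + 0.5 * (integrand[i] + integrand[i - 1]) * step)
    return t
```

For instance, if θ grows exactly in proportion to 1/f, the integrand θ·f is constant and arbitrary-unit time increases linearly with 1/f.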

Time calibration in recurrent glioblastomas. For a subset of glioblastoma samples, the time to relapse, denoted T, is additionally known (in months). We therefore identified the time course of the relapse, [0; T] in months, with the time course in arbitrary units, $[0; \max(T_{a.u.})]$ (see the preceding paragraph "Time course" for the calculation of $T_{a.u.}$). To obtain t, the time in real units at any instant of the interval [0; T], $t_{a.u.}$ is multiplied by the scaling factor $T/\max(T_{a.u.})$:

$$t = t_{a.u.} \cdot \frac{T}{\max(T_{a.u.})}. \qquad (2)$$
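Eq (2) is a single linear rescaling of the arbitrary-unit time vector. A minimal sketch, with a hypothetical function name and the time to relapse assumed to be given in months:

```python
def calibrate_recurrence(t_au, time_to_relapse_months):
    """Eq (2): map [0, max(T_a.u.)] linearly onto [0, T] in months."""
    scale = time_to_relapse_months / max(t_au)
    return [t * scale for t in t_au]
```

Because the mapping is linear, relative positions along the recurrence time course are preserved; only the unit changes.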

Time propagation from primary to recurrent glioblastomas. To link the time course in the primary tumor with the time course in the recurrent tumor, we studied the ratio of mutation accumulation between the end of the primary tumor (subscript P, taken as the last 5% of time points) and the start of the recurrence (subscript R, first 5% of time points), over the entire tumor. This is justified by the observation that the transition from the primary tumor to the recurrence corresponds to the instant of primary tumor resection. Using Eq (1) in S1 Text, this ratio can be written as follows:

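$$\frac{(dM/dt)_P}{(dM/dt)_R} = \frac{\left(\mu(t)\cdot\pi\cdot\omega(t)\cdot\gamma(t)\cdot N(t)\right)_P}{\left(\mu(t)\cdot\pi\cdot\omega(t)\cdot\gamma(t)\cdot N(t)\right)_R}, \qquad (3)$$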

where the per-subclone subscript i is omitted, because the evolutionary parameters are taken here over the whole tumor. The constant term π cancels out of this ratio. Further, we assumed that the mutation rate μ and the division rate ω stay constant over this short period, because they are intrinsic features of the tumor cells. However, the number of tumor cells N(t) and the tumor cell survival rate γ(t) could not be considered constant. We therefore expressed the ratio of cell numbers as the ratio of inverse allele frequencies, since 1/f is proportional to N [5]:

$$\frac{N(t)_P}{N(t)_R} = \frac{(1/f)_P}{(1/f)_R}. \qquad (4)$$

Of note, expression (4) is biased in practice by mutations that are not de novo in the recurrence but inherited from the primary tumor; ideally, only de novo mutations should be included in this calculation. Finally, the tumor cell survival rate γ could not be considered constant either, and we had no indicator or surrogate for its value. For this reason, we set an arbitrary value for the survival at the end of the primary tumor relative to the start of the recurrence, $\gamma_P/\gamma_R = 1/300$.

Using the above, we obtained expression (5) for dM/dt at the end of the primary tumor:

$$\left(\frac{dM}{dt}\right)_P = \frac{(1/f)_P}{(1/f)_R}\cdot\frac{\gamma_P}{\gamma_R}\cdot\left(\frac{dM}{dt}\right)_R. \qquad (5)$$

From this, and since the number of mutations at the end of the primary tumor, as well as the remaining parameters, were known, the time increment in real units at the end of the primary tumor could be determined as follows:

$$dt_P = \frac{(dM)_P}{\frac{(1/f)_P}{(1/f)_R}\cdot\frac{\gamma_P}{\gamma_R}\cdot\left(\frac{dM}{dt}\right)_R}. \qquad (6)$$

As for the time course in the recurrence, knowing the time increment at the end of the primary tumor, $dt_P$, allowed us to calibrate the time course in the primary tumor at each instant, by multiplying by the scaling factor $dt_P/dt_{a.u.,P}$:

$$dt = dt_{a.u.}\cdot\frac{dt_P}{dt_{a.u.,P}}. \qquad (7)$$
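Eqs (5) to (7) chain together as below. This is a sketch with hypothetical function names; the inputs $(1/f)_P$, $(1/f)_R$, $(dM)_P$ and $(dM/dt)_R$ are assumed to have already been summarized from the last and first 5% of time points, a step not shown here. The default survival ratio is the arbitrary value $\gamma_P/\gamma_R = 1/300$ stated above.

```python
def primary_end_rate(inv_f_P, inv_f_R, dMdt_R, gamma_ratio_P_over_R=1 / 300):
    """Eq (5): (dM/dt)_P from the recurrence rate, the 1/f ratio and the survival ratio."""
    return (inv_f_P / inv_f_R) * gamma_ratio_P_over_R * dMdt_R

def primary_end_dt(dM_P, inv_f_P, inv_f_R, dMdt_R, gamma_ratio_P_over_R=1 / 300):
    """Eq (6): real-time increment at the end of the primary tumor."""
    return dM_P / primary_end_rate(inv_f_P, inv_f_R, dMdt_R, gamma_ratio_P_over_R)

def calibrate_primary(dt_au, dt_au_P, dt_P):
    """Eq (7): rescale the arbitrary-unit increments of the primary tumor."""
    return [d * dt_P / dt_au_P for d in dt_au]
```

Because $dt_P$ is inversely proportional to the assumed survival ratio, any error in that arbitrary value propagates linearly into the whole calibrated primary time course.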

Tumor cell survival ratio

To calibrate time in real units, we made an assumption on the tumor cell survival ratio in order to determine real time in the primary tumor. To quantify the survival ratio $\gamma_R/\gamma_P$, we proceeded the other way around, using an assumption on the time course in the primary tumor. We used previously established values from [23], which estimate that the most recent common ancestor (MRCA) lies 2 to 7 years before primary tumor resection. Accordingly, we assumed that these durations corresponded to the lower and upper limits of the time course of the primary tumors. The tumor cell survival ratio was calculated by rearranging expression (6) into the following expression:

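$$\frac{\gamma_R}{\gamma_P} = \frac{\left(\frac{dM}{dt}\right)_R}{\frac{(1/f)_R}{(1/f)_P}\cdot\left(\frac{dM}{dt}\right)_P}, \qquad (8)$$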

where all terms on the right-hand side are known, either from the data (dM, f, $dt_R$) or from assumptions ($dt_P$).
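Eq (8) can be evaluated for both assumed primary time courses (2 and 7 years) to bracket the survival ratio. In the sketch below, `dt_end_from_duration` is a hypothetical helper encoding our assumption that the assumed duration D maps linearly onto the arbitrary-unit primary time course, so that $dt_P = dt_{a.u.,P} \cdot D / \max(T_{a.u.})$; the text does not spell out this step.

```python
def dt_end_from_duration(duration_years, dt_au_P, t_au_max):
    """Assumption: real-time increment at the end of the primary tumor when
    [0, max(T_a.u.)] is mapped linearly onto an assumed duration (e.g. 2 or 7 years)."""
    return dt_au_P * duration_years / t_au_max

def survival_ratio_R_over_P(dMdt_R, inv_f_R, inv_f_P, dM_P, dt_P):
    """Eq (8): gamma_R / gamma_P, with (dM/dt)_P = dM_P / dt_P."""
    dMdt_P = dM_P / dt_P
    return dMdt_R / ((inv_f_R / inv_f_P) * dMdt_P)
```

Calling `survival_ratio_R_over_P` once with the $dt_P$ derived from 2 years and once with that from 7 years gives the two bounds; a longer assumed duration yields a larger $dt_P$, hence a smaller $(dM/dt)_P$ and a larger ratio.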

Tumor expansion profile

From Eq (8) in S1 Text, increments of time and of 1/f are proportional, modulated by the growth rate ω, the tumor cell survival rate γ, and the number of tumor cells N:

$$d(1/f) \propto \omega \cdot \gamma \cdot N \cdot dt, \qquad (9)$$

where the increment d(1/f) is known from the data, and the increment dt was determined above (see "Time course"). Consequently, the product ωγN is obtained by dividing the increment d(1/f) by the increment dt. The expansion parameters ωγN are thus known up to the constant of expression (8) in S1 Text, $\pi/K_{i,r}$. The curves of ωγN as a function of time are referred to as expansion curves or expansion profiles.
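The division d(1/f)/dt can be sketched with finite differences between consecutive time points. This assumes paired, time-ordered samples of 1/f and calibrated time t; the function name is hypothetical, assigning each value to the midpoint of its interval is our choice, and the result is only defined up to the constant $\pi/K_{i,r}$.

```python
def expansion_profile(inv_f, t):
    """omega*gamma*N, up to a constant, as d(1/f)/dt between consecutive points.

    Returns (midpoint time, value) pairs; steps with non-increasing time are skipped.
    """
    out = []
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        if dt <= 0:
            continue
        out.append(((t[i] + t[i - 1]) / 2, (inv_f[i] - inv_f[i - 1]) / dt))
    return out
```

A rising sequence of values indicates accelerating expansion, which is the kind of shape the later clustering distinguishes.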

Curve segmentation

Segments of the curves M(1/f) were determined using the R package segmented (v1.1-0), with a target adjusted R² of 0.995 (P. virginalis) and 0.9995 (GBM), using the lowest number of segments that attained this target, up to a maximum of 20 breakpoints. P-values for significant changes between segments were evaluated using Davies' test (as implemented in the R package segmented). Because the regions before the first breakpoint and after the last breakpoint display marked and consistent differences from the general profile of the curve, we hypothesized that automated segmentation revealed clonal mutations or mutations possibly originating from contamination by normal tissue (inverse allele frequency below the first breakpoint), as well as mutations likely affected by the limit of detection of SNVs in sequencing data (inverse allele frequency above the last breakpoint). We excluded these mutations and restricted the accepted range to the interval between the first and last breakpoints.
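The stopping rule (add breakpoints until the adjusted R² target is met) and the trimming of the outer regions can be illustrated with a simpler algorithm than the one in the R package segmented: the sketch below uses greedy binary segmentation with independent least-squares lines per segment, which is a swapped-in technique, not the package's breakpoint estimation. How parameters are counted in the adjusted R² (two per segment, minus the global intercept) is also an assumption, and all function names are hypothetical.

```python
def _ols_rss(x, y):
    """Residual sum of squares of a least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))

def adjusted_r2(rss, tss, n, n_segments):
    p = 2 * n_segments - 1  # assumption: slope + intercept per segment, minus one
    if n - p - 1 <= 0:
        return float("-inf")
    return 1 - (rss / tss) * (n - 1) / (n - p - 1)

def segment_curve(x, y, target=0.995, max_breakpoints=20, min_size=3):
    """Greedy binary segmentation of a curve sorted by x; returns breakpoint x values."""
    n = len(x)
    my = sum(y) / n
    tss = sum((b - my) ** 2 for b in y)
    segments = [(0, n)]  # half-open index ranges, kept in order
    while True:
        rss = sum(_ols_rss(x[a:b], y[a:b]) for a, b in segments)
        if tss == 0 or adjusted_r2(rss, tss, n, len(segments)) >= target:
            break
        if len(segments) - 1 >= max_breakpoints:
            break
        best = None  # (RSS reduction, segment index, split index)
        for i, (a, b) in enumerate(segments):
            base = _ols_rss(x[a:b], y[a:b])
            for s in range(a + min_size, b - min_size + 1):
                gain = base - _ols_rss(x[a:s], y[a:s]) - _ols_rss(x[s:b], y[s:b])
                if best is None or gain > best[0]:
                    best = (gain, i, s)
        if best is None or best[0] <= 0:
            break
        _, i, s = best
        a, b = segments[i]
        segments[i:i + 1] = [(a, s), (s, b)]
    return [x[a] for a, _ in segments[1:]]

def restrict_to_inner(x, breakpoints):
    """Keep values between the first and last breakpoints (outer regions excluded)."""
    if len(breakpoints) < 2:
        return list(x)
    return [v for v in x if breakpoints[0] <= v <= breakpoints[-1]]
```

Greedy splitting does not guarantee the globally lowest number of segments, which is one reason the package's method is preferable for the actual analysis.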

Classification of expansion profiles

Expansion profiles ωγN(t) were clustered by k-means curve clustering. To this end, the expansion profiles of 42 primary tumors were converted to a uniform arbitrary timeline from 1 to 1000, centered and standardized. Clusters were then determined using the R package kml version 2.4.6, with the default number of clusters and 5 redrawings. Clustering quality was inspected using the Calinski-Harabasz (CH) criterion, whose value was 25.43599 and 24.73392 for 3 and 4 clusters, respectively (CH values ranged from 20.85733 to 22.32881 for 2, 5 or 6 clusters). Because the partition into 4 clusters had a reasonably high CH value and captured the "Increasing" curve type, whereas the partitions into 2 or 3 clusters did not, the partition into 4 clusters was retained.
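The preprocessing and the quality criterion can be sketched as follows; the k-means step performed by kml is not reproduced. Assumptions: each curve is linearly interpolated onto 1000 equally spaced points spanning its own time range, standardization uses the population standard deviation, and the CH index follows its textbook definition (between-cluster over within-cluster dispersion, each divided by its degrees of freedom), which may differ in detail from kml's implementation. Function names are hypothetical.

```python
import statistics

def to_uniform_timeline(times, values, n_points=1000):
    """Linearly interpolate a curve (increasing times) onto n_points equally spaced positions."""
    t0, t1 = times[0], times[-1]
    out, j = [], 0
    for k in range(n_points):
        t = t0 + (t1 - t0) * k / (n_points - 1)
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        span = times[j + 1] - times[j]
        w = (t - times[j]) / span if span else 0.0
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out

def standardize(curve):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu, sd = statistics.fmean(curve), statistics.pstdev(curve)
    return [(v - mu) / sd for v in curve]

def calinski_harabasz(points, labels):
    """CH index = (B / (k - 1)) / (W / (n - k)) for equal-length vectors."""
    n, dim = len(points), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))
```

Higher CH values indicate tighter, better-separated clusters, which is how the values reported above for 2 to 6 clusters can be compared.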

Statistical analyses

R [61] and Python [62] were used for all statistical analyses. 95% confidence intervals for the tree root in P. virginalis were taken as the 95% highest posterior density (HPD) interval. All statistical tests were unpaired and two-sided, with a significance level of 5%. Segmentation p-values correspond to the test of significant difference between segments (Davies' test, R package segmented). To check for overfitting, k-fold cross-validation of the segmentation procedure was conducted with k = 10 folds, and the resulting difference between test and validation mean squared errors (MSE) was examined. A non-increasing difference when adding one further segment indicates the absence of overfitting. Correlation coefficients between SBS1 and SBS5 were determined using Pearson's method and summarized by their median and IQR over GBM samples. Comparisons between groups were made using an unpaired Wilcoxon rank-sum test. Differences in time to recurrence between subgroups of the manually sorted ωγN curves were assessed using a Kruskal-Wallis rank-sum test across the curve types "Peak", "Increase" and "Paused Start". The "Convex" curve type was excluded, because there was only 1 instance of this curve type in the subset of samples with information on the time to recurrence. The association of the $\gamma_R/\gamma_P$ ratio with the time to recurrence was assessed by linear regression, using a single or double log10 scale for the $\gamma_R/\gamma_P$ ratio, with Bonferroni adjustment.
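The three-group Kruskal-Wallis comparison can be sketched with the standard library alone. This is an illustration, not the authors' script: it computes the tie-corrected H statistic (as `scipy.stats.kruskal` also does) and uses the fact that, with exactly three groups, H is compared with a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-H/2). Function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Tie-corrected H and its p-value; the closed form exp(-H/2) needs exactly 3 groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups (2 degrees of freedom)")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)  # tie correction
    return h, math.exp(-h / 2)
```

Here the three groups would hold the times to recurrence for the "Peak", "Increase" and "Paused Start" curve types.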

Supporting information

S1 Fig. Mutation accumulation, selection and time dynamics of GBM tumors.

Panels A1-D1 show the primary tumor T1, panels A2-D2 show the recurrent tumor T2. (A) Mutation accumulation as a function of the inverse allele frequency 1/f (black) and phases from automated segmentation (breakpoints (grey) and segments (blue)). The 95% confidence band is indicated in grey. (B) Non-synonymous to synonymous ratio. Purple and blue stars show non-synonymous and synonymous mutations, respectively. The smoothed ratio is shown in red. (C) Clock-like and non-clock-like mutational signatures. (D) Mutation accumulation as a function of time.

(PDF)

S2 Fig. Transition of primary tumor to recurrent tumor.

(A) Dynamics of growth rate ω times tumor cell survival rate γ times number of cells N, for (P) the primary tumor and (R) the recurrence. (B) Time-resolved mutation accumulation for primary tumor and recurrence.

(PDF)

S1 Text. Supplementary Methods.

(PDF)

S1 Table. Supplementary Table 1.

P-values for test of difference between segments for P. virginalis and glioblastoma samples.

(DOCX)

S2 Table. Supplementary Table 2.

Characteristics of the tumor cell survival ratio $\gamma_R/\gamma_P$ (n = 20).

(DOCX)

S3 Table. Supplementary Table 3.

Procambarus virginalis Samples.

(DOCX)

S4 Table. Supplementary Table 4.

Raw sequencing data for new and resequenced Procambarus virginalis samples.

(DOCX)

S1 Data. Source Data for Figures.

(XLSX)

Acknowledgments

We would like to thank Verena Körber and Thomas Höfer for helpful discussions and for providing data, and Julian Gutekunst for discussions about the methods. We would also like to thank Katharina Hanna for data and for crayfish culture, and Sina Tönges for sample processing. We further acknowledge the German Cancer Research Center Genomics and Proteomics Core Facility for whole-genome sequencing.

Data Availability

Sequence data for the marbled crayfish have been deposited as a National Center for Biotechnology Information BioProject (accession number: PRJNA356499). Glioblastoma data were accessed from the European Genome-phenome Archive (EGA) database, accession number: EGAS00001003184.

Funding Statement

ANR STEM R20117HH

References

1. Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, et al. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet. 2016 Oct 14;17(11):704–14. doi: 10.1038/nrg.2016.104
2. Kent DG, Green AR. Order Matters: The Order of Somatic Mutations Influences Cancer Evolution. Cold Spring Harb Perspect Med. 2017 Apr 3;7(4):a027060. doi: 10.1101/cshperspect.a027060
3. Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, Van Loo P, et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell. 2017 Nov 16;171(5):1029–1041.e21. doi: 10.1016/j.cell.2017.09.042
4. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013 Aug 22;500(7463):415–21. doi: 10.1038/nature12477
5. Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat Genet. 2016 Mar;48(3):238–44. doi: 10.1038/ng.3489
6. Sottoriva A, Barnes CP, Graham TA. Catch my drift? Making sense of genomic intra-tumour heterogeneity. Biochim Biophys Acta Rev Cancer. 2017 Apr;1867(2):95–100. doi: 10.1016/j.bbcan.2016.12.003
7. Balaparya A, De S. Revisiting signatures of neutral tumor evolution in the light of complexity of cancer genomic data. Nat Genet. 2018 Dec;50(12):1626–8. doi: 10.1038/s41588-018-0219-4
8. Tataru P, Simonsen M, Bataillon T, Hobolth A. Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data. Syst Biol. 2017 Jan 1;66(1):e30–46. doi: 10.1093/sysbio/syw056
9. Benton ML, Abraham A, LaBella AL, Abbot P, Rokas A, Capra JA. The influence of evolutionary history on human health and disease. Nat Rev Genet. 2021 May;22(5):269–83. doi: 10.1038/s41576-020-00305-9
10. Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science. 2015 Sep 25;349(6255):1483–9. doi: 10.1126/science.aab4082
11. Katju V, Bergthorsson U. Old Trade, New Tricks: Insights into the Spontaneous Mutation Process from the Partnering of Classical Mutation Accumulation Experiments with High-Throughput Genomic Approaches. Genome Biol Evol. 2018 Nov 26;11(1):136–65.
12. Burrell RA, McGranahan N, Bartek J, Swanton C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature. 2013 Sep;501(7467):338–45. doi: 10.1038/nature12625
13. Flynn JM, Chain FJJ, Schoen DJ, Cristescu ME. Spontaneous Mutation Accumulation in Daphnia pulex in Selection-Free vs. Competitive Environments. Mol Biol Evol. 2017 Jan;34(1):160–73. doi: 10.1093/molbev/msw234
14. Hausser J, Alon U. Tumour heterogeneity and the evolutionary trade-offs of cancer. Nat Rev Cancer. 2020 Apr;20(4):247–57. doi: 10.1038/s41568-020-0241-6
15. Scholtz G, Braband A, Tolley L, Reimann A, Mittmann B, Lukhaup C, et al. Ecology: Parthenogenesis in an outsider crayfish. Nature. 2003 Feb 20;421(6925):806. doi: 10.1038/421806a
16. Lyko F. The marbled crayfish (Decapoda: Cambaridae) represents an independent new species. Zootaxa. 2017 Dec 12;4363(4):544–52. doi: 10.11646/zootaxa.4363.4.6
17. Martin P, Dorn NJ, Kawai T, Heiden C van der, Scholtz G. The enigmatic Marmorkrebs (marbled crayfish) is the parthenogenetic form of Procambarus fallax (Hagen, 1870). Contrib Zool. 2010;79(3):107.
18. Vogt G. The marbled crayfish: a new model organism for research on development, epigenetics and evolutionary biology. J Zool. 2008;276(1):1–13.
19. Gutekunst J, Andriantsoa R, Falckenhayn C, Hanna K, Stein W, Rasamy J, et al. Clonal genome evolution and rapid invasive spread of the marbled crayfish. Nat Ecol Evol. 2018 Mar;2(3):567–73. doi: 10.1038/s41559-018-0467-9
20. Maiakovska O, Andriantsoa R, Tönges S, Legrand C, Gutekunst J, Hanna K, et al. Genome analysis of the monoclonal marbled crayfish reveals genetic separation over a short evolutionary timescale. Commun Biol. 2021 Jan 18;4(1):1–7.
21. Gutekunst J, Maiakovska O, Hanna K, Provataris P, Horn H, Wolf S, et al. Phylogeographic reconstruction of the marbled crayfish origin. Commun Biol. 2021 Sep 17;4:1096. doi: 10.1038/s42003-021-02609-w
22. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature. 2020 Feb;578(7793):94–101. doi: 10.1038/s41586-020-1943-3
23. Körber V, Yang J, Barah P, Wu Y, Stichel D, Gu Z, et al. Evolutionary Trajectories of IDHWT Glioblastomas Reveal a Common Path of Early Tumorigenesis Instigated Years ahead of Initial Diagnosis. Cancer Cell. 2019 Apr 15;35(4):692–704.e12. doi: 10.1016/j.ccell.2019.02.007
24. Silliman K, Indorf JL, Knowlton N, Browne WE, Hurt C. Base-substitution mutation rate across the nuclear genome of Alpheus snapping shrimp and the timing of isolation by the Isthmus of Panama. BMC Ecol Evol. 2021 May 28;21(1):104. doi: 10.1186/s12862-021-01836-3
25. Yang S, Wang L, Huang J, Zhang X, Yuan Y, Chen JQ, et al. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature. 2015 Jul 23;523(7561):463–7. doi: 10.1038/nature14649
26. Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S. Direct Determination of the Mutation Rate in the Bumblebee Reveals Evidence for Weak Recombination-Associated Mutation and an Approximate Rate Constancy in Insects. Mol Biol Evol. 2017 Jan;34(1):119–30. doi: 10.1093/molbev/msw226
27. Oppold AM, Pedrosa JAM, Bálint M, Diogo JB, Ilkova J, Pestana JLT, et al. Support for the evolutionary speed hypothesis from intraspecific population genetic data in the non-biting midge Chironomus riparius. Proc Biol Sci. 2016 Feb 24;283(1825):20152413. doi: 10.1098/rspb.2015.2413
28. Ho EKH, Macrae F, Latta LC, McIlroy P, Ebert D, Fields PD, et al. High and Highly Variable Spontaneous Mutation Rates in Daphnia. Mol Biol Evol. 2020 Nov 1;37(11):3258–66. doi: 10.1093/molbev/msaa142
29. Keightley PD, Ness RW, Halligan DL, Haddrill PR. Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics. 2014 Jan;196(1):313–20. doi: 10.1534/genetics.113.158758
30. Keightley PD, Pinharanda A, Ness RW, Simpson F, Dasmahapatra KK, Mallet J, et al. Estimation of the spontaneous mutation rate in Heliconius melpomene. Mol Biol Evol. 2015 Jan;32(1):239–43. doi: 10.1093/molbev/msu302
31. Ohno M. Spontaneous de novo germline mutations in humans and mice: rates, spectra, causes and consequences. Genes Genet Syst. 2019 Apr 9;94(1):13–22. doi: 10.1266/ggs.18-00015
32. Lee-Six H, Øbro NF, Shepherd MS, Grossmann S, Dawson K, Belmonte M, et al. Population dynamics of normal human blood inferred from somatic mutations. Nature. 2018 Sep;561(7724):473–8. doi: 10.1038/s41586-018-0497-0
33. Blokzijl F, de Ligt J, Jager M, Sasselli V, Roerink S, Sasaki N, et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature. 2016 Oct 13;538(7624):260–4. doi: 10.1038/nature19768
34. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018 Mar 15;555(7696):371–6. doi: 10.1038/nature25795
35. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007 Nov 8;7:214. doi: 10.1186/1471-2148-7-214
36. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational processes in human somatic cells. Nat Genet. 2015 Dec;47(12):1402–7. doi: 10.1038/ng.3441
37. Petljak M, Alexandrov LB, Brammeld JS, Price S, Wedge DC, Grossmann S, et al. Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell. 2019 Mar 7;176(6):1282–1294.e20. doi: 10.1016/j.cell.2019.02.012
38. Baez-Ortega A, Gori K, Strakova A, Allen JL, Allum KM, Bansse-Issa L, et al. Somatic evolution and global expansion of an ancient transmissible cancer lineage. Science. 2019 Aug 2;365(6452):eaau9923. doi: 10.1126/science.aau9923
39. Bauer NC, Corbett AH, Doetsch PW. The current state of eukaryotic DNA base damage and repair. Nucleic Acids Res. 2015 Dec 2;43(21):10083–101. doi: 10.1093/nar/gkv1136
40. Sedgwick B. Repairing DNA-methylation damage. Nat Rev Mol Cell Biol. 2004 Feb;5(2):148–57. doi: 10.1038/nrm1312
41. Seyfert AL, Cristescu MEA, Frisse L, Schaack S, Thomas WK, Lynch M. The rate and spectrum of microsatellite mutation in Caenorhabditis elegans and Daphnia pulex. Genetics. 2008 Apr;178(4):2113–21. doi: 10.1534/genetics.107.081927
42. Chapuis MP, Plantamp C, Streiff R, Blondin L, Piou C. Microsatellite evolutionary rate and pattern in Schistocerca gregaria inferred from direct observation of germline mutations. Mol Ecol. 2015;24(24):6107–19. doi: 10.1111/mec.13465
43. Yue GH, David L, Orban L. Mutation rate and pattern of microsatellites in common carp (Cyprinus carpio L.). Genetica. 2007 Mar 1;129(3):329–31. doi: 10.1007/s10709-006-0003-8
44. Jakovlić I, Gui JF. Recent invasion and low level of divergence between diploid and triploid forms of Carassius auratus complex in Croatia. Genetica. 2011 Jun 1;139(6):789–804. doi: 10.1007/s10709-011-9584-y
45. Ramsey J, Schemske DW. Pathways, Mechanisms, and Rates of Polyploid Formation in Flowering Plants. Annu Rev Ecol Syst. 1998;29(1):467–501.
46. Rubanova Y, Shi R, Harrigan CF, Li R, Wintersinger J, Sahin N, et al. Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig. Nat Commun. 2020 Feb 5;11(1):731. doi: 10.1038/s41467-020-14352-7
47. DeWitt WS, Harris KD, Ragsdale AP, Harris K. Nonparametric coalescent inference of mutation spectrum history and demography. Proc Natl Acad Sci U S A. 2021 May 25;118(21):e2013798118. doi: 10.1073/pnas.2013798118
48. Hernández Martínez A, Madurga R, García-Romero N, Ayuso-Sacido Á. Unravelling glioblastoma heterogeneity by means of single-cell RNA sequencing. Cancer Lett. 2022 Feb 28;527:66–79. doi: 10.1016/j.canlet.2021.12.008
49. Li J, Zhao YH, Tian SF, Xu CS, Cai YX, Li K, et al. Genetic alteration and clonal evolution of primary glioblastoma into secondary gliosarcoma. CNS Neurosci Ther. 2021 Dec;27(12):1483–92. doi: 10.1111/cns.13740
50. Kim H, Zheng S, Amini SS, Virk SM, Mikkelsen T, Brat DJ, et al. Whole-genome and multisector exome sequencing of primary and post-treatment glioblastoma reveals patterns of tumor evolution. Genome Res. 2015 Mar;25(3):316–27. doi: 10.1101/gr.180612.114
51. Sottoriva A, Kang H, Ma Z, Graham TA, Salomon MP, Zhao J, et al. A Big Bang model of human colorectal tumor growth. Nat Genet. 2015 Mar;47(3):209–16. doi: 10.1038/ng.3214
52. Tilk S, Tkachenko S, Curtis C, Petrov DA, McFarland CD. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. eLife. 2022;11:e67790. doi: 10.7554/eLife.67790
53. Okolie O, Bago JR, Schmid RS, Irvin DM, Bash RE, Miller CR, et al. Reactive astrocytes potentiate tumor aggressiveness in a murine glioma resection and recurrence model. Neuro-Oncol. 2016 Dec;18(12):1622–33. doi: 10.1093/neuonc/now117
54. Pirzkall A, McGue C, Saraswathy S, Cha S, Liu R, Vandenberg S, et al. Tumor regrowth between surgery and initiation of adjuvant therapy in patients with newly diagnosed glioblastoma. Neuro-Oncol. 2009 Dec 1;11(6):842–52. doi: 10.1215/15228517-2009-005
55. Scott JN, Rewcastle NB, Brasher PM, Fulton D, MacKinnon JA, Hamilton M, et al. Which glioblastoma multiforme patient will become a long-term survivor? A population-based study. Ann Neurol. 1999 Aug;46(2):183–8.
56. Monteiro AR, Hill R, Pilkington GJ, Madureira PA. The Role of Hypoxia in Glioblastoma Invasion. Cells. 2017 Nov 22;6(4):45. doi: 10.3390/cells6040045
57. Sabelström H, Quigley DA, Fenster T, Foster DJ, Fuchshuber CAM, Saxena S, et al. High density is a property of slow-cycling and treatment-resistant human glioblastoma cells. Exp Cell Res. 2019 May 1;378(1):76–86. doi: 10.1016/j.yexcr.2019.03.003
58. McGranahan N, Favero F, de Bruin EC, Birkbak NJ, Szallasi Z, Swanton C. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci Transl Med. 2015 Apr 15;7(283):283ra54. doi: 10.1126/scitranslmed.aaa1408
59. Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, et al. The evolutionary history of 2,658 cancers. Nature. 2020 Feb;578(7793):122–8. doi: 10.1038/s41586-019-1907-7
60. López-Cortegano E, Craig RJ, Chebib J, Samuels T, Morgan AD, Kraemer SA, et al. De Novo Mutation Rate Variation and Its Determinants in Chlamydomonas. Mol Biol Evol. 2021 Sep 1;38(9):3709–23. doi: 10.1093/molbev/msab140
61. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2018.
62. Van Rossum G, Drake FL Jr. Python reference manual. Amsterdam: Centrum voor Wiskunde en Informatica; 1995.

Decision Letter 0

Justin C Fay, Ville Mustonen

26 Jun 2023

Dear Dr Legrand,

Thank you very much for submitting your Research Article entitled 'Time-resolved, integrated analysis of clonally evolving genomes' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Ville Mustonen

Guest Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this manuscript, Legrand and colleagues develop a framework to study mutation accumulation in asexual organisms, in which they look at the relationship between mutation accumulation and variant allele frequency as a proxy for mutational age. This allows disentangling the links between mutation rate, tissue growth and survival of de novo mutations. Authors apply their framework in both a seemingly young parthenogenetic species (the marbled crayfish Procambarus virginalis) and clonally-evolving tissues (glioblastoma), analyzing previously published data and some newly sequenced crayfish sequences.

My background is in empirical population genomics and I have some experience working with non-model parthenogenetic species. My expertise is very limited regarding cancer evolution. As a result, my comments below reflect my expertise (or lack thereof) on these different aspects. Overall, I enjoyed reading the paper: from what I can tell, the approach is original and mostly sound, and the analyses are rigorous (but see my first major comment below). However, I believe that the authors could do a better job of demonstrating the importance of their findings and their broad interest to the PLOS Genetics audience (see major comment 2).

Major comment 1 – The framework heavily relies on a precise estimation of the mutation rate, which is a complex endeavor for non-model organisms. Thankfully, authors can rely on a marbled crayfish line which was maintained in the lab over seven years (2012-2019), basically providing a mutation accumulation experiment. Authors used whole-genome sequencing of three individuals sampled in 2012, 2017 and 2018 to call SNPs using state-of-the-art methods.

1.1 – First, it is not possible from this manuscript alone to assess the quality of the resequencing data, and I believe a reference was omitted at L294-295 (“These new samples were prepared and submitted for whole genome sequencing following the protocol already described”). From Table S3 in Gutekunst et al. 2018 (whose data were partly re-analyzed for this study), the average sequencing depth for the 2012 sample (“Animal 1”) seems to be around 72x, but no information is provided in the current manuscript regarding the number of reads per individual, the sequencing strategy (I guess 2x150 bp) or the average sequencing depth in the 2017 and 2018 samples, while deep sequencing is crucial for accurate SNP calling in the context of de novo mutation detection. The reader can only assume that sequencing depth is high, since “variants with quality≥35, coverage≥50 and ≤200 were retained for the main estimate of the mutation rate” (L314). More details on the data would help readers assess their quality.

1.2 – My second and more important point stems from this sentence (L314): the authors identified mutations at sites where depth was between 50 and 200, which would be defined as callable sites. After finding candidate de novo mutations in the two more recent individuals (how many per individual? Were they in heterozygous or homozygous states?), they divided their count by the total count of nucleotides in the triploid genome (10.5 Gb, L320). Unless the authors made a typo, this is not correct: the denominator should not be the total genome size, but the number of callable sites (i.e., the fraction of the genome that could actually be surveyed). For a correct estimation of the mutation rate, the authors could for example look at Lopez-Cortegano et al. (doi:10.1093/molbev/msab140, see p. 3719). I would assume this error biases the mutation rate estimated for marbled crayfish downwards, which may have implications for the calibration of the tree inferred with BEAST (Fig. 1C) and the history of the parthenogenetic lineage as a whole (e.g., L76, L269-271). Note that a similar error might have occurred with the dN/dS ratio analysis, but it is unclear in the current version how the “average quotient” was estimated for the full genome (L366). I would kindly ask the authors to estimate the mutation rate for marbled crayfish with a correct method (callable sites can easily be identified from BAM files using mosdepth, doi: 10.1093/bioinformatics/btx699), and update their findings accordingly.
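The denominator issue raised in comment 1.2 can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' pipeline: depth intervals are assumed to be already parsed into the four-column layout (chrom, start, end, depth) of a per-base depth BED such as the one mosdepth writes; the `Variant` record, the function names and the per-year time unit are hypothetical; and multiplying callable bases by ploidy is an assumption that mirrors the manuscript's use of the triploid genome size. The thresholds are those quoted in comment 1.1.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Thresholds mirroring the filter quoted in comment 1.1 (QUAL >= 35, 50 <= depth <= 200).
MIN_QUAL = 35
MIN_DEPTH, MAX_DEPTH = 50, 200

# (chrom, start, end, depth): half-open intervals, as in a per-base depth BED file.
DepthInterval = Tuple[str, int, int, int]


@dataclass
class Variant:
    """Hypothetical minimal record for a candidate de novo variant."""
    chrom: str
    pos: int
    qual: float
    depth: int


def callable_bases(intervals: Iterable[DepthInterval]) -> int:
    """Number of bases whose depth lies inside [MIN_DEPTH, MAX_DEPTH]."""
    return sum(end - start
               for _chrom, start, end, depth in intervals
               if MIN_DEPTH <= depth <= MAX_DEPTH)


def passes_filter(v: Variant) -> bool:
    """Apply the same quality/depth window that defines callable sites."""
    return v.qual >= MIN_QUAL and MIN_DEPTH <= v.depth <= MAX_DEPTH


def mutation_rate(variants: Iterable[Variant],
                  intervals: Iterable[DepthInterval],
                  years: float,
                  ploidy: int = 3) -> float:
    """De novo mutations per callable site per year.

    Denominator = callable bases x ploidy (assumption), not the whole genome.
    """
    n_mut = sum(1 for v in variants if passes_filter(v))
    n_sites = callable_bases(intervals) * ploidy
    if n_sites == 0 or years <= 0:
        raise ValueError("need callable sites and a positive time span")
    return n_mut / (n_sites * years)


if __name__ == "__main__":
    toy_intervals = [("chr1", 0, 1000, 80), ("chr1", 1000, 1500, 30),
                     ("chr2", 0, 2000, 120), ("chr2", 2000, 2100, 250)]
    toy_variants = [Variant("chr1", 10, 50.0, 80),    # kept
                    Variant("chr1", 1200, 60.0, 30),  # depth below window
                    Variant("chr2", 5, 20.0, 120),    # quality too low
                    Variant("chr2", 50, 40.0, 100)]   # kept
    print(callable_bases(toy_intervals))  # 3000
    print(mutation_rate(toy_variants, toy_intervals, years=5))
```

Dividing the same mutation count by the whole genome instead of the callable fraction lowers the estimate by the ratio of callable to total bases, which is the downward bias the reviewer anticipates. A per-generation rate would divide by the number of generations between samples instead of years.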

Major comment 2 – While I usually enjoy short papers, I believe this manuscript would benefit from a more detailed writing in many places. As it combines theory and empirical data analysis in both a non-model organism and tumors, additional details would help convey the context, importance and broad interest of these findings. Below I provide some examples of parts where I would want to get more detailed information, trying to rank these by perceived importance.

2.1 – Introduction: I would expect some framing of the scientific question before getting into the specifics of the study systems. At the moment, it is unclear what is gained by developing “an integrated analysis of clonal genome evolution”, but it should become apparent in the first few paragraphs of the introduction. Likewise, the marbled crayfish is a puzzling system, and more details should be provided in the section L68-77: genetic differentiation compared to what (L74)? What is the “particular mode of asexual reproduction” (L71)? What is the specific event mentioned L76? Is parthenogenesis obligatory (are we sure there are no unintentional crosses in the lab)? More information on the origin of this species would help understand its value for the current study.

2.2 – Discussion: After reading the manuscript several times, I am still not sure which are the “important insights” (L286) being delivered. What does the framework tell us about clonal evolution, and/or the evolution of glioblastoma? It would help if authors drew more connections to relevant literature. To provide an example (but as stated above, I am not an expert in cancer genomics), are the pervasive signatures of selection detected in some samples (L206) in line with recent findings (e.g., Tilk et al. 2022, doi: https://doi.org/10.7554/eLife.67790)? What is the general impact of selection on the results? What is the (clinical? biological?) relevance of the four subtypes detected (L276)? Finally, which are the potential limits of this work and what could be added?

2.3 – Results: The rationale explained in the S1 file should be integrated a bit more in the main text, as the approach is currently quite vague (L133-135).

I cannot interpret the comparison of clock-like and non-clock-like mutational signatures as it stands, and I don’t get why human mutational signatures are used for crayfish genomes (I must be missing something). Are expansion parameters (L231) loosely linked to any other measure used in cancer genomics? The invasion history of the marbled crayfish should be more detailed on L120-127.

Below are some minor comments:

L57-58: in my understanding, the term “mutation” is used for two different things here: the process itself (L58, the appearance of a new variant in the pool) and its consequence (L57, the new variant itself, i.e., a substitution). Maybe consider editing.

L92: unclear why “time” here.

L96-100: see major comment 1, but please add more information here (number of individuals, average depth…).

L105: please provide references for known mutation rates in the main text on top of the legend of Fig. 1. Also it seems there is at least one mutation rate estimated for Crustaceans (https://doi.org/10.1186/s12862-021-01836-3).

L109: grey colors are not displayed on the Fig. 1.

L130: is there any outgroup available to root the tree? If not, is it a problem?

L144: is the dN/dS ratio actually computed from fixed differences, or from polymorphisms (so pN/pS)?

L150: red instead of blue.

L153: grey instead of red.

L167-168: from what I understood, the dating of the marbled crayfish lineage comes from the (incorrect) estimation of the mutation rate, not the framework, which seems at odds with the statement L91-93.

L174: “better understanding of *its* evolutionary parameters”.

L178-179: Fig. 3A not 4A.

L204-212: this heterogeneity is interesting and could be discussed further.

L244: is this only for the tumor shown in Fig. 3, or for all of them? Likewise, what is the impact of the assumption made on the TMRCA L444-445? Please clarify.

L262-264: this statement seems at odds with L163-165 and L266-269. More generally, this makes me wonder if I really understand how the framework disentangles between variable mutation rate and growth. This could be made clearer in the main text.

L295: please add references for the sequencing protocol etc…

L302-305: the reference genome was built from short reads and is quite fragmented, is this supposed to have an impact on mutation detection? Likewise, was anything done to remove sites close to indels?

L348: define “features of the subclone”.

L351: is this assumption of constant ploidy reasonable for tumors? Please elaborate.

L371: maybe introduce a bit what are these mutation signatures.

L383: are these good indicators for time in the crayfish?

L477-479: as is, I don’t get the inherent value of this classification.
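Reviewer #1's question on L144 (fixed differences or polymorphisms, i.e. dN/dS versus pN/pS) concerns which changes are counted; the arithmetic of a basic ratio is the same in both cases. The sketch below shows a naive, context-free dN/dS with Nei–Gojobori-style site counting under the standard genetic code. It is hypothetical and is not the authors' “average quotient”; in particular it does not model trinucleotide sequence context, which is what dNdScv (recommended by Reviewer #2) adds. It assumes an in-frame coding sequence containing only A/C/G/T, and treats changes to stop codons as nonsynonymous, which is one of several conventions.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_counts(cds: str) -> Tuple[float, float]:
    """Return (synonymous_sites, nonsynonymous_sites) for an in-frame CDS.

    Each position of each sense codon contributes (synonymous single-base
    changes) / 3 to the synonymous sites; the remainder of the 3 sites per
    codon is nonsynonymous. Stop codons in the sequence are skipped.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    syn = 0.0
    n_codons = 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODE[codon]
        if aa == "*":
            continue
        n_codons += 1
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    syn += 1 / 3
    return syn, 3 * n_codons - syn


def naive_dnds(n_nonsyn: int, n_syn: int, cds: str) -> float:
    """(nonsynonymous changes per N site) / (synonymous changes per S site)."""
    s_sites, n_sites = site_counts(cds)
    if n_syn == 0 or s_sites == 0:
        raise ValueError("ratio undefined without synonymous changes or sites")
    return (n_nonsyn / n_sites) / (n_syn / s_sites)
```

For example, `ATGGGG` has one synonymous site (the third position of GGG) and five nonsynonymous sites, so two nonsynonymous and one synonymous change give a ratio of (2/5)/(1/1) = 0.4. Feeding in polymorphisms rather than fixed differences would make the same quantity a pN/pS.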

The data availability section only mentions already published sequencing data (both for the marbled crayfish and glioblastoma). There is no mention of samples re-sequenced for this study (L290: samples Madagascar 1 and Moosweiher, L292: animal 34 and animal 35). The numerical data that underlies figures is not provided. Finally, and while it is not stated in PLOS' data availability policy, scripts are also not provided, but I think these should be available somewhere, for reproducibility and if readers are interested in your framework’s implementation.

Reviewer #2: Legrand et al. apply mutational timing methods previously introduced in the cancer genomics literature to infer evolutionary parameters from two types of clonally evolving populations: one cohort of clonally evolving marbled crayfish including both published and newly generated datasets, and one published dataset of glioblastoma patients. The authors should be commended for the integrative analysis of clonal genomes in both species and cancer evolution. That being said, for this paper to be accepted, it is important that the authors address the major technical concerns summarized below.

Major points:

- The authors did not account for potential confounders of the background mutation rate of each gene in their dN/dS analyses. They should control for the sequence composition of the gene and mutational signatures, by e.g. using a method like dNdScv that avoids common mutation biases affecting dN/dS using trinucleotide context-dependent substitution matrices.

- The automated segmentation of inverse allele frequency spectra M(1/f) appears to overfit the data:

> Examples of this are statements like: “In the exemplary sample 1 in Fig 3A, the segmentation separates 5 phases significantly.”

> The authors should reexamine the frequency spectrum using previously published methods to segment the spectra, by distinguishing mutations in the non-clonal tail from clonal and subclonal mutations (e.g. using SNV clustering methods like MOBSTER).

- In the primary-recurrent analysis, the classification of expansion profiles is extremely weak. Expansion profiles ωγN(t) are classified visually and manually curated without further explanation. This entire analysis should be removed unless a well-grounded methodology can be used to define expansion profiles (e.g. using unsupervised clustering).

- Could the authors expand on the impact of ploidy on their analysis? Particularly in the case of GBM, where the tumors often have a substantial number of copy number aberrations. They should ensure to limit their timing analysis to diploid regions. Could they also speculate in the Discussion about combining their approach with the timing of copy number gains using SNVs in amplified regions?

- Could the authors demonstrate that a lower coverage threshold >15 is justified for coalescent time analyses? Are noisy variants affecting the coalescence estimates?
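As an illustration of the unsupervised alternative suggested above for classifying expansion profiles ωγN(t), here is a minimal k-means sketch. It is not the authors' method: it assumes each profile has already been sampled on one shared time grid with strictly positive values, log-transforms them (an assumption, since such profiles can span orders of magnitude), and leaves the choice of k to the user. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_profiles(profiles: List[Sequence[float]], k: int, n_iter: int = 100):
    """k-means on log10-transformed expansion profiles.

    Initialization is deterministic (first profile, then repeatedly the
    profile farthest from all chosen centers), so reruns give identical labels.
    Returns (labels, centers).
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    points = [[math.log10(v) for v in p] for p in profiles]
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Selecting k (for example by comparing within-cluster spread across several k values) is not shown; the point is only that group membership then follows from an explicit, reproducible rule rather than from visual curation.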

Minor points:

- In line 183, could the authors explain why mutation frequencies in the high-frequency phase (phase 1) are expected to include artifacts? This should not really be an issue unless germline variants appear to be clonal in the GBM samples and have not been exhaustively filtered out.

- In lines 178 and 179, the figure callouts should refer to Fig. 3 instead of Fig. 4.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: The data availability section only mentions already published sequencing data (both for the marbled crayfish and glioblastoma). There is no mention of samples resequenced for this study (L290: samples Madagascar 1 and Moosweiher, L292: animal 34 and animal 35).

The numerical data that underlies figures is not provided.

Finally, and while it is not stated in PLOS' data availability policy, scripts are also not provided.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Decision Letter 1

Justin C Fay, Ville Mustonen

27 Nov 2023

Dear Dr Legrand,

We are pleased to inform you that your manuscript entitled "Time-resolved, integrated analysis of clonally evolving genomes" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Ville Mustonen

Guest Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I believe authors have addressed all my comments appropriately and, in my opinion, those of the other reviewer as well. I am looking forward to seeing this in print.

Reviewer #2: The authors have addressed most of my comments or described them as limitations of the data in the Discussion.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-23-00320R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Justin C Fay, Ville Mustonen

8 Dec 2023

PGENETICS-D-23-00320R1

Time-resolved, integrated analysis of clonally evolving genomes

Dear Dr Legrand,

We are pleased to inform you that your manuscript entitled "Time-resolved, integrated analysis of clonally evolving genomes" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofi Zombor

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data


    Supplementary Materials

    S1 Fig. Mutation accumulation, selection and time dynamics of GBM tumors.

    Panels A1-D1 show the primary tumor T1, panels A2-D2 show the recurrent tumor T2. (A) Mutation accumulation as a function of the inverse of allele frequency 1/f (black) and phases from automated segmentation (breakpoints in grey, segments in blue). The 95% confidence band is indicated in grey. (B) Nonsynonymous to synonymous ratio. Purple and blue stars show nonsynonymous and synonymous mutations, respectively. The smoothed ratio is shown in red. (C) Clock-like and non-clock-like mutational signatures. (D) Mutation accumulation as a function of time.

    (PDF)
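Panel A of S1 Fig plots mutation accumulation against the inverse allele frequency 1/f. As a single-phase baseline, the neutral exponential-growth model widely used in cancer genomics predicts that the cumulative number M(f) of subclonal mutations with allele frequency between f and an upper bound f_max is linear in 1/f, M(f) = a(1/f − 1/f_max), where the slope a is commonly interpreted as the mutation rate μ divided by an effective growth parameter β. The sketch below fits that one line by least squares through the origin. It is a hypothetical illustration, not the authors' segmented ωγN framework; the frequency window f_min to f_max and the grid size are user-chosen assumptions.

```python
from typing import List, Sequence, Tuple


def cumulative_counts(vafs: Sequence[float], freqs: Sequence[float],
                      f_max: float) -> List[int]:
    """M(f): number of variants with f <= VAF <= f_max, for each f in freqs."""
    return [sum(1 for v in vafs if f <= v <= f_max) for f in freqs]


def fit_inverse_frequency(vafs: Sequence[float], f_min: float, f_max: float,
                          n_grid: int = 50) -> Tuple[float, float]:
    """Fit M(f) = a * (1/f - 1/f_max) through the origin on a grid of f values.

    Returns (a, r_squared); r_squared uses the centered total sum of squares.
    """
    if not 0 < f_min < f_max or n_grid < 2:
        raise ValueError("need 0 < f_min < f_max and n_grid >= 2")
    step = (f_max - f_min) / (n_grid - 1)
    freqs = [f_min + i * step for i in range(n_grid)]
    x = [1 / f - 1 / f_max for f in freqs]
    y = cumulative_counts(vafs, freqs, f_max)
    a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - a * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return a, r2
```

Comparing the fit quality of this single line with that of a multi-segment fit over the same window is one simple way to judge whether additional phases add information or mainly fit noise, the overfitting concern raised by Reviewer #2.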

    S2 Fig. Transition of primary tumor to recurrent tumor.

    (A) Dynamics of the product ωγN of growth rate ω, tumor cell survival rate γ and number of cells N, for (P) the primary tumor and (R) the recurrence. (B) Time-resolved mutation accumulation for the primary tumor and the recurrence.

    (PDF)

    S1 Text. Supplementary Methods.

    (PDF)

    S1 Table. Supplementary Table 1.

    P-values for test of difference between segments for P. virginalis and glioblastoma samples.

    (DOCX)

    S2 Table. Supplementary Table 2.

    Characteristics of tumor cell survival ratio γRP (n = 20).

    (DOCX)

    S3 Table. Supplementary Table 3.

    Procambarus virginalis samples.

    (DOCX)

    S4 Table. Supplementary Table 4.

    Raw sequencing data for new and resequenced Procambarus virginalis samples.

    (DOCX)

    S1 Data. Source Data for Figures.

    (XLSX)

    Attachment

    Submitted filename: PGENETICS-D-23-00320R1_Response-to-reviewers.pdf

    Data Availability Statement

    Sequence data for marbled crayfish have been deposited as a National Center for Biotechnology Information BioProject (accession number: PRJNA356499). Glioblastoma data were accessed from the European Genome-phenome Archive (EGA) database (accession number: EGAS00001003184).
