Tumor growth is an evolutionary process governed by somatic mutation, clonal selection and random genetic drift, constrained by the co-evolution of the microenvironment1,2. Tumor subclones are subpopulations of tumor cells with a common set of mutations resulting from the expansion of a single cell during tumor development, and have been observed in a significant fraction of cancers and across multiple cancer types3. Peter Nowell proposed that tumors evolve through sequential genetic events4, whereby one cell acquires a selective advantage so that its lineage becomes predominant. According to this traditional model, the selective advantage is conferred by a small set of driver mutations, but, as the subclones that bear them expand successively, they accumulate passenger mutations as well, which can be detected in sequencing experiments1. Genomes of individual tumors contain hundreds to many thousands of these genetic variants, at a wide range of frequencies5,6. Given that genetic drift alone can drive novel variants to high frequencies, it is of great interest to discern the relative importance of selection and drift in shaping the frequency distribution of variants in any given tumor.
Williams et al.7 recently proposed a way to do so. They found that a simple model of tumor growth in which all novel variants are selectively neutral, that is, their dynamics are governed entirely by drift, predicts a linear relationship between the cumulative number of mutations M(f) present in a fraction f of cells and the reciprocal of that fraction:
$$M(f) = \frac{\mu}{\beta}\left(\frac{1}{f} - \frac{1}{f_{max}}\right),$$
where µ is the mutation rate, β is the fraction of effective cell divisions (those giving rise to two surviving lineages) and f_max is the highest frequency included in the fit. They argued that deviation from this null model, i.e. an R² of the linear fit below the minimum observed in neutral simulations (R² < 0.98), indicates the presence of selection, and that this can be tested using variant allele frequencies (VAFs), from which f can be derived. Applying this rationale to cancer data from The Cancer Genome Atlas (TCGA), Williams et al. did not reject the null model of neutrality in about one third of the cases and concluded that these tumors are neutrally evolving. More recently, multiple myelomas showing the proposed linear relationship were associated with poorer prognosis8.
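The core of the test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `neutrality_r2` is hypothetical, VAFs are used directly as f (only approximately valid; for heterozygous mutations in a pure diploid sample f would be 2·VAF), the fit is restricted to the 0.12 to 0.24 VAF window that Williams et al. used, and the line is forced through the origin in x = 1/f − 1/f_max, which is an assumption about how the fit is parameterized.

```python
import numpy as np


def neutrality_r2(vafs, fmin=0.12, fmax=0.24):
    """Fit the cumulative count M(f) against x = 1/f - 1/fmax; return (R^2, slope).

    Hypothetical helper illustrating the idea of the test; not the authors' code.
    """
    vafs = np.asarray(vafs, dtype=float)
    # Keep variants inside the VAF window, sorted from highest to lowest frequency
    f = np.sort(vafs[(vafs >= fmin) & (vafs <= fmax)])[::-1]
    if f.size < 3:
        raise ValueError("too few variants in the VAF window")
    # M(f): number of variants with frequency >= f (rank in descending order)
    m = np.arange(1, f.size + 1, dtype=float)
    x = 1.0 / f - 1.0 / fmax
    # Least-squares slope of a line through the origin (the model has M = 0 at fmax)
    slope = np.dot(x, m) / np.dot(x, x)
    ss_res = np.sum((m - slope * x) ** 2)
    ss_tot = np.sum((m - m.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, slope


# Usage: draw VAFs whose cumulative distribution follows the neutral 1/f law exactly
# (inverse-CDF sampling within the window)
rng = np.random.default_rng(0)
u = rng.uniform(size=500)
neutral_vafs = 1.0 / (1.0 / 0.24 + u * (1.0 / 0.12 - 1.0 / 0.24))
r2, slope = neutrality_r2(neutral_vafs)
print(f"R^2 = {r2:.3f}, slope = {slope:.1f}")
```

Calling neutrality when the returned R² is at least 0.98 reproduces the decision rule described above; the sampling step in the usage example only shows that ideal neutral data pass it, not that passing it implies neutrality.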
Although Williams et al. provide an interesting approach to infer selection in human cancers, their analysis rests on four major simplifying assumptions that may render its conclusions questionable.
First, inferring the f of a variant from its VAF requires accurate estimates of local copy number, overall tumor purity and ploidy. Williams et al. attempted to account for some of these factors by restricting their analyses to variants with VAF between 0.12 and 0.24 located in copy-neutral regions of the genome. However, even within that limited VAF window, the VAF of a mutation often does not reflect its true f. For example, in tumors with whole-genome duplication, which make up 37% of tumors in the analyzed dataset9, the peak of clonal mutations acquired after the doubling event lies at or below VAF = 0.25 (one out of four copies in a 100% pure tumor sample). This would lead to an artificial deviation from the linear fit within that VAF window.
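The whole-genome duplication example can be made concrete with the standard expression for the expected VAF of a somatic single-nucleotide variant given purity, local copy number, multiplicity and cancer cell fraction. The formula is general background rather than something taken from Williams et al., normal cells are assumed diploid, and the helper name `expected_vaf` is hypothetical. For a clonal mutation acquired after doubling in a tetraploid tumor sampled at 80% purity, it gives 0.8/3.6 ≈ 0.22, inside the 0.12 to 0.24 window even though every tumor cell carries the mutation.

```python
def expected_vaf(purity, tumor_cn, multiplicity, ccf=1.0, normal_cn=2):
    """Expected VAF of a somatic SNV in a bulk sample (hypothetical helper).

    purity: fraction of tumor cells in the sample
    tumor_cn: total local copy number in tumor cells
    multiplicity: mutated copies per mutated tumor cell
    ccf: fraction of tumor cells carrying the mutation (1.0 = clonal)
    normal_cn: copy number in normal cells (assumed diploid)
    """
    mutant_copies = purity * ccf * multiplicity
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutant_copies / total_copies


window = (0.12, 0.24)
# Clonal mutation acquired after whole-genome doubling: 1 of 4 copies, ccf = 1
for purity in (1.0, 0.8, 0.6):
    vaf = expected_vaf(purity, tumor_cn=4, multiplicity=1)
    inside = window[0] <= vaf <= window[1]
    print(f"purity {purity:.1f}: expected VAF {vaf:.3f}, inside window: {inside}")
```

Inverting the same expression for ccf is what "deriving f from the VAF" amounts to, which is why errors in purity or copy number propagate directly into f.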
Second, the interpretation of the analyses is inconsistent with the use of neutrality as a null model. Failure to reject a null hypothesis does not prove it true: the fact that all neutral simulations yield R² > 0.98 does not imply that non-neutral simulations never do. To infer neutrality from R² > 0.98, one would need to demonstrate that this condition is sufficient, that is, that no equally suited model of non-neutral tumor growth also yields R² > 0.98.
To assess this, we simulated simple tumor growth with one explicitly modeled subclonal expansion that has a selective advantage, obtained by increasing the subclone's division rate λ and/or its mutation rate µ (Supplementary Methods). Using the original method described by Williams et al., neutrality is rejected only within a narrow range of the tested λ and µ values that lead to detectable subclones (true rejection of neutrality in ~11% of simulations; Fig. 1a). We conclude that a linear fit with R² > 0.98 is not sufficient to call neutrality, and that improper use of this model could result in substantial over-calling of neutrality.
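The way each simulated tumor is labeled can be written as a small function. This sketch uses only the thresholds stated in the Fig. 1 legend (a subclone is detectable and its sweep incomplete when 10% ≤ subclone % ≤ 90%; a neutral call is R² ≥ 0.98); the function names and label strings are hypothetical.

```python
def neutrality_call(r2, threshold=0.98):
    """Williams et al.-style call: neutral if the R^2 of the 1/f fit reaches the threshold."""
    return "neutral" if r2 >= threshold else "non-neutral"


def label_simulation(subclone_pct, r2, threshold=0.98, min_pct=10.0, max_pct=90.0):
    """Label a simulated tumor with one selected subclone (hypothetical helper).

    The subclone counts as detectable, with an incomplete sweep, when
    min_pct <= subclone_pct <= max_pct.
    """
    call = neutrality_call(r2, threshold)
    if not (min_pct <= subclone_pct <= max_pct):
        return f"beyond detection ({call})"
    # Detectable subclone under selection: only a non-neutral call is correct
    return "true non-neutral" if call == "non-neutral" else "false neutral"


# Hypothetical (subclone %, R^2) pairs
for pct, r2 in [(45.0, 0.995), (45.0, 0.93), (5.0, 0.99), (97.0, 0.96)]:
    print(pct, r2, "->", label_simulation(pct, r2))
```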
Figure 1.
(a) Neutrality calls in simulations of tumor growth with subclonal expansion underlying selective sweeps. The tree topology being modelled is represented on the right, together with the parameters of the neutral evolution equations for the two subpopulations of cells (Supplementary Methods). The subclone's fraction (subclone %) increases with its selective advantage adv_subclone. We vary the subclone's parameters λ = 1 + adv_subclone and µ along a grid. Simulations are defined as true non-neutral (light blue) or false neutral (dark blue) when the growing subclone has expanded sufficiently to be detectable and the sweep is not complete, i.e. 10% ≤ subclone % ≤ 90%; otherwise the subclone is considered beyond detection (light green). Non-neutral call: R² < 0.98; neutral call: R² ≥ 0.98. (b) As (a), using the Gillespie algorithm to simulate branching processes10. Simulations leading to subclones beyond detection are either called neutral (light green) or non-neutral (dark green). Because of the stochastic nature of branching processes, different subclone % values are obtained across simulations with the same adv_subclone value. For five increasing adv_subclone values, we report the median ± median absolute deviation (MAD) of the subclone % across the simulations. (c) Summary ROC curve for the neutral vs. non-neutral classification based on the R² values of 1,919 non-neutral simulations from (b) and 1,919 simulations of neutral tumors. The false positive rate and the true positive rate are highlighted for the R² = 0.98 cutoff used by Williams et al. (d) dN/dS analysis. Maximum likelihood estimates of the dN/dS ratios and associated 95% confidence intervals for (sub)clonal mutations in TCGA tumors categorized into neutral and non-neutral groups. Ratios for missense and truncating mutations are given. dN/dS > 1 indicates positive selection.
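Panel (c) treats R² as a score for calling a tumor neutral. A minimal sketch of the two summary quantities, assuming neutral simulations are the positive class and a higher R² means "more neutral" (the convention is an assumption, and the function names are hypothetical): the area under the ROC curve via the Mann-Whitney formulation, i.e. the probability that a random neutral tumor has a higher R² than a random non-neutral one with ties counted as one half, and the false and true positive rates at the 0.98 cutoff. This is not the code used to produce Fig. 1c.

```python
import numpy as np


def roc_auc(pos_scores, neg_scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) (Mann-Whitney formulation)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    # Compare every positive with every negative score
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def rates_at_cutoff(pos_scores, neg_scores, cutoff=0.98):
    """(FPR, TPR) when calling 'neutral' (the positive class) for R^2 >= cutoff."""
    tpr = float(np.mean(np.asarray(pos_scores, dtype=float) >= cutoff))
    fpr = float(np.mean(np.asarray(neg_scores, dtype=float) >= cutoff))
    return fpr, tpr


# Hypothetical R^2 values for neutral (positive) and non-neutral (negative) simulations
r2_neutral = [0.995, 0.97, 0.99, 0.93]
r2_non_neutral = [0.985, 0.96, 0.999, 0.90]
print("AUC:", roc_auc(r2_neutral, r2_non_neutral))
print("FPR, TPR at 0.98:", rates_at_cutoff(r2_neutral, r2_non_neutral))
```

Under this convention an AUC of 0.5 corresponds to a random ranking, and a value below 0.5 means that a non-neutral simulation more often has the higher R².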
Third, the deterministic model of tumor growth described by Williams et al. relies on strong biological assumptions, including synchronous cell divisions, constant cell death, and constant mutation and division rates. Stochastic models of tumor growth are biologically more realistic, as they allow for asynchronous divisions and for probabilistic mutation acquisition, cell death and division. Using simple branching processes to simulate neutral and non-neutral growth10 (Supplementary Methods), we show that R² > 0.98 is neither a necessary nor a sufficient property of neutrally evolving tumors (Fig. 1b). Although it can be shown that the expected cumulative number of mutations, i.e. the average over many independent samples, follows the linear relationship with 1/f10, the biological noise modeled in branching processes causes a typical realization of the neutral process in a single sample to deviate substantially from the expected linear fit, rendering an R² threshold inaccurate for inferring neutrality. As a result, discrimination of neutral and non-neutral simulated tumors using a linear fit is almost arbitrary, with 53.5% false positive neutral calls in non-neutral tumors (Fig. 1b) and an area under the ROC curve of 0.42 for the classification of 1,919 neutral and 1,919 non-neutral tumors (Fig. 1c).
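A neutral branching process of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the simulation code behind Fig. 1b: every cell divides with rate `birth` and dies with rate `death`, each daughter acquires a Poisson(`mu`) number of new mutations under the infinite-sites assumption, and growth stops at `n_final` cells; the function name and parameter values are hypothetical. Because all cells share the same rates under neutrality, the sketch uses the embedded jump chain of the Gillespie algorithm: event times do not change the population composition at a given size, so they are not tracked.

```python
import numpy as np


def simulate_neutral_branching(n_final=1000, birth=1.0, death=0.2, mu=5.0, seed=0):
    """Simulate neutral growth to n_final cells; return the cell fraction of each mutation.

    Hypothetical helper. Each cell is a tuple of the IDs of the mutations it carries
    (infinite-sites assumption: every new mutation gets a fresh ID).
    """
    rng = np.random.default_rng(seed)
    while True:  # restart if the population goes extinct before reaching n_final
        cells = [()]
        next_id = 0
        while 0 < len(cells) < n_final:
            i = int(rng.integers(len(cells)))  # all cells share the same rates (neutral)
            if rng.random() < birth / (birth + death):
                parent = cells[i]
                daughters = []
                for _ in range(2):  # each daughter gains Poisson(mu) new mutations
                    k = int(rng.poisson(mu))
                    daughters.append(parent + tuple(range(next_id, next_id + k)))
                    next_id += k
                cells[i] = daughters[0]
                cells.append(daughters[1])
            else:  # death: remove cell i by moving the last cell into its slot
                cells[i] = cells[-1]
                cells.pop()
        if cells:
            break
    counts = {}
    for genome in cells:
        for mut in genome:
            counts[mut] = counts.get(mut, 0) + 1
    return np.array(list(counts.values()), dtype=float) / len(cells)


for seed in range(3):
    f = simulate_neutral_branching(seed=seed)
    vaf = f / 2  # heterozygous mutations in a pure diploid sample (assumption)
    in_window = int(np.sum((vaf >= 0.12) & (vaf <= 0.24)))
    print(f"seed {seed}: {f.size} mutations, {in_window} with VAF in [0.12, 0.24]")
```

Comparing the cumulative distributions of individual realizations across seeds, rather than their average, is one way to see the sample-to-sample noise that an R² threshold ignores.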
Fourth, we reason that in tumors called neutral, no subclonal selection should be detected. To evaluate this, we used an orthogonal method to identify selection, based on the observed variants themselves rather than on their allele frequencies. dN/dS analysis compares the fraction of non-synonymous positions that are mutated with the fraction of synonymous positions that are mutated in coding regions. It has been widely used to detect negative or positive selection acting on non-synonymous variants in coding regions11,12. We applied a dN/dS model optimized for the detection of selection in somatic cancer variants13 to TCGA exome data, using a published list of 192 cancer genes14 (Supplementary Methods). The analysis was performed separately for variants called as clonal or subclonal (Supplementary Methods), in tumors called neutral and non-neutral according to the rationale outlined by Williams and colleagues7. dN/dS analysis revealed significant positive selection among subclonal mutations of tumors classified as neutral (Fig. 1d), further suggesting that the approach described by Williams et al. is ill-equipped to detect the presence or absence of selection.
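The quantity behind this analysis can be illustrated with a naive estimator: the number of non-synonymous mutations per non-synonymous site divided by the number of synonymous mutations per synonymous site, with an approximate 95% confidence interval from a log-scale normal (delta-method) approximation that treats both counts as Poisson. This is a deliberately simplified stand-in for the model used here13, which additionally accounts for the trinucleotide context of substitutions and for variation in mutation rates across genes; the function name and the counts in the example are hypothetical.

```python
import math


def naive_dnds(n_nonsyn, n_syn, sites_nonsyn, sites_syn, z=1.96):
    """Naive dN/dS with an approximate confidence interval (hypothetical helper).

    n_*: observed mutation counts; sites_*: number of potential sites of each class.
    Not a substitute for context-aware models such as the one in ref. 13.
    """
    if min(n_nonsyn, n_syn) <= 0:
        raise ValueError("need at least one mutation of each class")
    dn = n_nonsyn / sites_nonsyn  # non-synonymous mutations per non-synonymous site
    ds = n_syn / sites_syn        # synonymous mutations per synonymous site
    ratio = dn / ds
    # Delta-method standard error of log(ratio), treating both counts as Poisson
    se_log = math.sqrt(1.0 / n_nonsyn + 1.0 / n_syn)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical counts, with three times as many non-synonymous as synonymous sites
ratio, lo, hi = naive_dnds(n_nonsyn=450, n_syn=100, sites_nonsyn=3.0e5, sites_syn=1.0e5)
print(f"dN/dS = {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In this example the point estimate is 1.5; a confidence interval lying entirely above 1 would be read as evidence of positive selection, and one entirely below 1 as evidence of negative selection.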
In summary, Williams et al. proposed that about one third of tumors are neutrally evolving. However, we highlight four simplifying assumptions, to our knowledge not previously discussed, and find that the proposed approach will often identify individual tumors as neutral when they are non-neutral, and as non-neutral when they are neutral. A new paper by the same group15 introduces a Bayesian test for detecting selection from VAFs. The test estimates selection coefficients and, as such, is an important advance over the frequentist test of Williams et al., which does not. The authors acknowledge that the test can only detect large fitness differences, but nevertheless call tumors in which it detects no selection "neutral", when these are merely tumors in which a weak test has failed to detect selection. We note that neutral theory has been developed in population genetics, ecology and cultural evolution, and that similar tests have been proposed in each of these fields and have eventually been found wanting in all of them for the same reason: variant abundance distributions do not contain enough information to exclude selection16–18. It is of clinical importance to identify and better understand the drivers of potentially more aggressive (sub)clones expanding under selective biological or therapeutic pressure, as these are good candidates for predicting resistance and for exploring combination therapy. Williams et al. are to be commended for having introduced explicit neutral tumor growth models into tumor genomics. However, quantifying the relative importance of drift and selection in shaping the allele frequencies of single tumors clearly remains an open challenge. Studies relying on the proposed test (e.g. ref. 8) may therefore need to be re-evaluated.
Acknowledgments
This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202), and the Wellcome Trust (FC001202). MT is a postdoctoral fellow supported by the European Union's Horizon 2020 research and innovation program (Marie Skłodowska-Curie Grant Agreement No. 747852-SIOMICS). PVL is a Winton Group Leader in recognition of the Winton Charitable Foundation's support towards the establishment of The Francis Crick Institute. IM is funded by a Cancer Research UK Career Development Fellowship (C57387/A21777). DCW is funded by the Li Ka Shing Foundation. This work was supported by grant 1U24CA210957 to PTS. FM would like to acknowledge the support of The University of Cambridge, Cancer Research UK and Hutchison Whampoa Limited. Parts of this work were funded by CRUK core grant C14303/A17197. This project was enabled through access to the MRC eMedLab Medical Bioinformatics infrastructure, supported by the Medical Research Council (grant number MR/L016311/1). Parts of the results published here are based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.
Footnotes
Members of the PCAWG Evolution and Heterogeneity Working Group
Stefan C. Dentro1,2,3,*, Ignaty Leshchiner4,*, Moritz Gerstung5,*, Clemency Jolly1,*, Kerstin Haase1,*, Maxime Tarabichi1,2,*, Jeff Wintersinger6,7,*, Amit G. Deshwar6,7,*, Kaixian Yu8,*, Santiago Gonzalez5,*, Yulia Rubanova6,7,*, Geoff Macintyre9,*, David J. Adams2, Pavana Anur10, Rameen Beroukhim4,11, Paul C. Boutros6,12, David D. Bowtell13, Peter J. Campbell2, Shaolong Cao8, Elizabeth L. Christie13,14, Marek Cmero14,15, Yupeng Cun16, Kevin J. Dawson2, Jonas Demeulemeester1,17, Nilgun Donmez18,19, Ruben M. Drews9, Roland Eils20,21, Yu Fan8, Matthew Fittall1, Dale W. Garsed13,14, Gad Getz4,22,23,24, Gavin Ha4, Marcin Imielinski25,26, Lara Jerman5,27, Yuan Ji28,29, Kortine Kleinheinz20,21, Juhee Lee30, Henry Lee-Six2, Dimitri G. Livitz4, Salem Malikic18,19, Florian Markowetz9, Inigo Martincorena2, Thomas J. Mitchell2,31, Ville Mustonen32, Layla Oesper33, Martin Peifer16, Myron Peto10, Benjamin J. Raphael34, Daniel Rosebrock4, S. Cenk Sahinalp19,35, Adriana Salcedo12, Matthias Schlesner20, Steven Schumacher4, Subhajit Sengupta28, Ruian Shi6, Seung Jun Shin8,36, Lincoln D. Stein12, Ignacio Vázquez-García2,31, Shankar Vembu6, David A. Wheeler37, Tsun-Po Yang16, Xiaotong Yao25,26, Ke Yuan9,38, Hongtu Zhu8, Wenyi Wang8,#, Quaid D. Morris6,7,#, Paul T. Spellman10,#, David C. Wedge3,39,#, Peter Van Loo1,17,#
1The Francis Crick Institute, London NW1 1AT, United Kingdom; 2Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom; 3Big Data Institute, University of Oxford, Oxford OX3 7LF, United Kingdom; 4Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; 5European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge CB10 1SD, United Kingdom; 6University of Toronto, Toronto, ON M5S 3E1, Canada; 7Vector Institute, Toronto, ON M5G 1L7, Canada; 8The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; 9Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, United Kingdom; 10Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97231, USA; 11Dana-Farber Cancer Institute, Boston, MA 02215, USA; 12Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada; 13Peter MacCallum Cancer Centre, Melbourne, VIC 3000, Australia; 14University of Melbourne, Melbourne, VIC 3010, Australia; 15Walter + Eliza Hall Institute, Melbourne, VIC 3000, Australia; 16University of Cologne, 50931 Cologne, Germany; 17University of Leuven, B-3000 Leuven, Belgium; 18Simon Fraser University, Burnaby, BC V5A 1S6, Canada; 19Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada; 20German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; 21Heidelberg University, 69120 Heidelberg, Germany; 22Massachusetts General Hospital Center for Cancer Research, Charlestown, MA 02129, USA; 23Massachusetts General Hospital, Department of Pathology, Boston, MA 02114, USA; 24Harvard Medical School, Boston, MA 02215, USA; 25Weill Cornell Medicine, New York, NY 10065, USA; 26New York Genome Center, New York, NY 10013, USA; 27University of Ljubljana, 1000 Ljubljana, Slovenia; 28NorthShore University HealthSystem, Evanston, IL 60201, USA; 29The University of Chicago, Chicago, IL 60637, USA; 30University of California Santa Cruz, Santa Cruz, CA 95064, USA; 31University of Cambridge, Cambridge CB2 0QQ, United Kingdom; 32University of Helsinki, 00014 Helsinki, Finland; 33Carleton College, Northfield, MN 55057, USA; 34Princeton University, Princeton, NJ 08540, USA; 35Indiana University, Bloomington, IN 47405, USA; 36Korea University, Seoul, 02481, Republic of Korea; 37Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; 38University of Glasgow, Glasgow G12 8RZ, United Kingdom; 39Oxford NIHR Biomedical Research Centre, Oxford OX4 2PG, United Kingdom.
Competing interest
The authors declare no competing interests.
Author contribution
MT, IM, MG, AML, FM, PTS, QDM, OCL, DCW, PVL participated in argumentation. MT, OCL, DCW and PVL derived the deterministic equations. MT wrote the code and generated the figures, with input from IM, MG, OCL, DCW and PVL. MT, OCL, DCW, PVL drafted the manuscript, revised by IM, MG, AML, FM, PTS, and QDM. All authors read and approved the manuscript.
References
- 1. Greaves M, Maley CC. Clonal evolution in cancer. Nature. 2012;481:306–313. doi: 10.1038/nature10762.
- 2. Yates LR, Campbell PJ. Evolution of the cancer genome. Nat Rev Genet. 2012;13:795–806. doi: 10.1038/nrg3317.
- 3. Andor N, et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat Med. 2016;22:105–113. doi: 10.1038/nm.3984.
- 4. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194:23–28. doi: 10.1126/science.959840.
- 5. Nik-Zainal S, et al. The life history of 21 breast cancers. Cell. 2012;149:994–1007. doi: 10.1016/j.cell.2012.04.023.
- 6. Dentro SC, Wedge DC, Van Loo P. Principles of reconstructing the subclonal architecture of cancers. Cold Spring Harb Perspect Med. 2017;7. doi: 10.1101/cshperspect.a026625.
- 7. Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat Genet. 2016;48:238–244. doi: 10.1038/ng.3489.
- 8. Johnson DC, et al. Neutral tumor evolution in myeloma is associated with poor prognosis. Blood. 2017;130:1639–1643. doi: 10.1182/blood-2016-11-750612.
- 9. Zack TI, et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet. 2013;45:1134–1140. doi: 10.1038/ng.2760.
- 10. Bozic I, Gerold JM, Nowak MA. Quantifying clonal and subclonal passenger mutations in cancer evolution. PLOS Comput Biol. 2016;12:e1004731. doi: 10.1371/journal.pcbi.1004731.
- 11. Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–426. doi: 10.1093/oxfordjournals.molbev.a040410.
- 12. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–736. doi: 10.1093/oxfordjournals.molbev.a040153.
- 13. Martincorena I, et al. Universal patterns of selection in cancer and somatic tissues. Cell. 2017;171:1029–1041.e21. doi: 10.1016/j.cell.2017.09.042.
- 14. Forbes SA, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 2017;45:D777–D783. doi: 10.1093/nar/gkw1121.
- 15. Williams MJ, et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat Genet. 2018. doi: 10.1038/s41588-018-0128-6.
- 16. Hammal OA, Alonso D, Etienne RS, Cornell SJ. When can species abundance data reveal non-neutrality? PLOS Comput Biol. 2015;11:e1004134. doi: 10.1371/journal.pcbi.1004134.
- 17. Herzog HA, Bentley RA, Hahn MW. Random drift and large shifts in popularity of dog breeds. Proc R Soc B Biol Sci. 2004;271:S353–S356. doi: 10.1098/rsbl.2004.0185.
- 18. Leigh EG. Neutral theory: a historical perspective. J Evol Biol. 2007;20:2075–2091. doi: 10.1111/j.1420-9101.2007.01410.x.