Author manuscript; available in PMC: 2018 Apr 5.
Published in final edited form as: Nat Rev Genet. 2017 Feb 13;18(4):213–229. doi: 10.1038/nrg.2016.170

The evolution of tumour phylogenetics: principles and practice

Russell Schwartz 1, Alejandro A Schäffer 2
PMCID: PMC5886015  NIHMSID: NIHMS952104  PMID: 28190876

Abstract

Rapid advances in high-throughput sequencing and a growing realization of the importance of evolutionary theory to cancer genomics have led to a proliferation of phylogenetic studies of tumour progression. These studies have yielded not only new insights but also a plethora of experimental approaches, sometimes reaching conflicting or poorly supported conclusions. Here, we consider this body of work in light of the key computational principles underpinning phylogenetic inference, with the goal of providing practical guidance on the design and analysis of scientifically rigorous tumour phylogeny studies. We survey the range of methods and tools available to the researcher, their key applications, and the various unsolved problems, closing with a perspective on the prospects and broader implications of this field.


Cancer is a genetic disease characterized by a progressive accumulation of genomic aberrations that are sometimes augmented by predisposing germline mutations1. In the 1970s, Nowell2 and others proposed that this accumulation of mutations is guided by evolutionary principles via a process of diversification and selection for mutations that promote tumour cell proliferation and survival. The idea that evolutionary mechanisms underlie cancer progression has become a guiding principle in understanding, predicting, and controlling cancer progression3, metastasis4, and therapeutic responses5,6. Models of tumour evolution have incorporated advanced evolutionary theory7–9 and complex evolutionary mechanisms that have been revealed by modern genomic technologies10,11. The application of evolutionary principles to cancers has blossomed into a field in its own right, with a rich foundation of theory and methods for interpreting tumour evolution12,13. Here, we survey one influential thread: the use of phylogenetics (that is, evolutionary tree building) to understand tumour progression.

Although evolutionary theory has proven to be powerful for understanding cancer progression, evolutionary processes are quite different in cancers versus in species14 in ways that are important to phylogenetic inference. These differences manifest in at least four areas: first, the types of aberration that commonly arise; second, the rates of mutation; third, the extent and intensity of selection; and fourth, the typically high heterogeneity of tumour cell subclones. One frequent feature of cancer evolution is hypermutability15, often associated with types of mutation that are rare in species evolution. Hypermutability phenotypes include chromosome instability (CIN) phenotypes that are characteristic of p53 dysfunction16, microsatellite instability (MIN)17, and elevated point mutation phenotypes, such as those arising from dysregulation of the APOBEC family of deaminase proteins17,18. Some variant types, such as copy number variants (CNVs), may be induced by multiple mechanisms, including breakage–fusion–bridge (BFB) cycles, missegregation of chromosomes, and genome doubling, each producing distinct scales and locations of aberrations19–22. Other tumour-specific mutational mechanisms include the following: kataegis23, in which single nucleotide variants (SNVs) occur at a high rate in a small chromosomal region; chromothripsis24, in which a single chromosome shatters and reassembles in a seemingly random manner; and chromoplexy25, a complex structural variation characterized by chains of BFB-induced chromosome rearrangements occurring in successive mitoses.

Likewise, patterns of elevated SNV accumulation can differ widely by tissue of origin or from patient to patient. Alexandrov et al.26 characterized dozens of ‘mutation signatures’ defining the nucleotide biases exhibited in subsets of cancers, some with known environmental triggers27, others attributable to specific sources of somatic hypermutability18, and some of unknown cause. Mechanisms of hypermutability may vary by tumour and over time in ways that are not observed in species evolution21,28–31. Treatment creates another complication, as chemotherapy or radiation therapy can themselves cause double-strand breaks in the DNA32 or other forms of hypermutation33,34, inducing new mutation signatures26,30. Conversely, prophylactic therapies can suppress hypermutability35.

The predominant mechanisms of selection in cancers also differ from those in species evolution. Most studies of tumour evolution have assumed selection for mutations that promote survival, proliferation, or other phenotypic hallmarks of cancer36. Selection, like diversification, can be dynamic, as cell populations adapt to or change their microenvironment11. However, recent work has suggested that selection often plays only a minor part in tumour evolution, in contrast to its role in Darwinian evolution of species. The repeated observation of substantial intra-tumour heterogeneity21,37–44 runs counter to the idea that only the fittest subclones survive. Some recent studies have suggested that some tumours evolve by effectively neutral processes without selection, at least pretreatment45–47. It has been suggested that strong versus weak selection might be reconciled by a ‘punctuated equilibrium’ model9, in which long periods of slow mutation under weak selection are interrupted by short bursts of rapid evolution under strong selection, although this model cannot explain the evidence for a lack of selection in some tumours48.

Therapy must also be considered when modelling selection49,50. In contrast to the disagreement about whether tumour evolution is non-Darwinian at the pretreatment stage, there is general agreement that treatment leads to selection which can alter the dominant clones10,14,34,51. Single-agent treatment can lead to relapse49,52 by selecting for non-responsive clones29,53. Durable targeted therapies may require the identification of driver mutations in all tumour subclones and the design of patient-specific drug combinations8,11,54,55.

High heterogeneity is another characteristic feature of tumour evolution. Higher intra-tumour heterogeneity has been associated with poorer prognosis8,56–58 and linked with the ability of the tumour to resist immune surveillance and therapy3,59,60. Progression, metastasis, and therapeutic resistance frequently proceed from clones that were rare at earlier progression stages41,43,49,61. Interactions among distinct clones may also drive tumour progression, for example through tumour self-seeding4,62 and cooperation between clones63,64.

This Review examines one important direction in which evolutionary models are shaping cancer research: the use of phylogenetic methods in interpreting genomic data from cancers. We specifically seek to provide guidance to the users of phylogenetic methods in cancer research and to those critically reading about those uses, especially those lacking formal training in phylogenetics. To accomplish that, we give a short overview of the field, we review past uses of tumour phylogenetics, and we explain some relevant principles of phylogenetic inference. We conclude with speculation about the challenges and opportunities for realizing the potential of phylogenetics in cancer research.

Overview of tumour phylogenetics

The recognition that cancer is an evolutionary phenomenon led to the insight that computational methods for reconstructing evolutionary processes — that is, phylogenetics — might prove valuable for making sense of tumour progression processes. Tsao et al. were among the first to suggest that variations in microsatellite markers could be used to infer a tree model of the evolution of tumour cells65. The idea was subsequently put into practice for bulk comparative genomic hybridization (CGH) data by Desper et al.66. After percolating for a decade within a specialist community of evolutionary and computational biologists, this type of analysis has exploded to become a new field known as tumour phylogenetics, which aims to reconstruct tumour evolution from genomic variations. In almost all cases, the goal of such work is to produce evolutionary trees, potentially allowing for uncertainty among the space of possible trees explaining a data set21,67,68.

Within that basic framework, tumour phylogenetics encompasses diverse methods. This diversity includes various data types, referring both to the basic study design (cross-cohort studies of many tumours, single-patient studies of regional bulk genomic assays, or studies of single-cell variability in single tumours) (FIG. 1) and the type or types of genomic data profiled (initially, pre-sequencing marker types such as large-scale CGH66 or fluorescence in situ hybridization (FISH)69; now, predominantly next-generation sequencing (NGS)-derived SNVs70 or CNVs71, and sometimes more exotic variant types such as gene expression, DNA methylation, or histone marks10,14,32,72,73). The diversity also includes variation by mathematical model; that is, the mathematical representation of the kinds of mutational processes one intends to study. The model may capture both the kind of mutations considered (for example, SNVs versus structural variants (SVs)20,74) and basic questions such as whether those mutations are assumed to be under selection2,7,11,14,17,72,75 or selectively neutral4,76–79. Furthermore, this diversity of methods includes variation in the algorithms applied; that is, the computational instructions used to find an optimal tree or trees consistent with both the data and the model. The importance and utility of in silico models to study various phenomena in cancer goes far beyond tumour phylogenetics, and other kinds of models have been reviewed elsewhere12,13. Many of the papers cited therein take a traditional mathematical modelling approach with emphasis on the mathematics, on simulation studies, on parameter estimation, and on validating the model. As tumour phylogenetics has gained in popularity, phylogenetics now tends to show up as a small part of high-impact studies. These studies are understandably focused on data sets that were derived from human subjects and were expensive and complicated to collect. One of the main messages of this Review is that when mathematical models are used in these studies, the importance of validating the models against simulated and observed data should not be forgotten.

Figure 1. Classification of tumour phylogeny methods by study design.

a | Cross-sectional tumour phylogeny methods model distinct tumours (coloured circles) sampled from multiple patients as though they are species. These methods infer phylogenies (also known as oncogenetic trees) in which tumours are grouped approximately into subtypes, with tree edges corresponding to common recurring mutations that identify a subtype. b | Regional bulk tumour phylogeny methods are applied to bulk genomic samples from a single patient, typically subregions of a tumour or distinct tumour sites (coloured circles). Trees provide a coarse model of the major cell lineages developing over the course of progression in the single patient. c | Single-cell tumour phylogeny methods build phylogenetic trees using variations between single cells (coloured circles) in one or more tumour sites. Trees group cells into major clonal subgroups and infer shared ancestry and mutation events at the level of single clones.

Most studies of tumour phylogenetics to date have adapted standard algorithms that were developed for species phylogenetics (for example, maximum parsimony21,61, minimum evolution73, neighbour joining71,80, UPGMA21, or various maximum likelihood or Bayesian probabilistic inference methods81,82), occasionally comparing multiple standard approaches in a single study21,83 (TABLES 1,2). Only recently have new phylogeny algorithms emerged to deal with the peculiarities of tumour versus species evolution84–88. In the next section, we survey the diversity of methods available, with particular focus on those suited to modern sequencing technologies.
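
As an illustration of how one of the standard distance-based algorithms named above operates, the following is a minimal sketch of UPGMA applied to binary mutation profiles. It is not the implementation used in any cited study. The sample names, the five-marker profiles, the use of Hamming distance, and the function names `hamming` and `upgma` are all assumptions made for this example.

```python
from itertools import combinations

def hamming(a, b):
    """Number of marker positions at which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(profiles):
    """Cluster samples by UPGMA; returns the tree as nested tuples of names."""
    # Each current cluster (a name or a nested tuple) maps to its leaf count.
    sizes = {name: 1 for name in profiles}
    dist = {frozenset(pair): hamming(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}
    while len(sizes) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        for c in sizes:
            if c not in (a, b):
                # Size-weighted average of the distances to the two merged clusters.
                dist[frozenset((merged, c))] = (
                    sizes[a] * dist[frozenset((a, c))]
                    + sizes[b] * dist[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))

# Hypothetical data: a normal sample and three tumour regions, five markers each.
example_profiles = {
    "normal": [0, 0, 0, 0, 0],
    "R1": [1, 1, 1, 0, 0],
    "R2": [1, 1, 1, 1, 0],
    "R3": [1, 1, 0, 0, 1],
}
example_tree = upgma(example_profiles)
```

UPGMA assumes a constant rate of change along all lineages, which is one reason that methods borrowed from species phylogenetics may fit tumour data poorly.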

Table 1.

Software tools available for tumour phylogenetics

Tool* | Data type | Model type | Algorithm type | Refs
Cross-sectional data
TO-DAG | Bulk, presence/absence of any aberrations | Probabilistic | Combinatorial optimization | 91
ct-cbn | CNVs | Probabilistic (non-standard) | Specialized | 113
NAM | CNVs | Probabilistic (non-standard) | Maximum likelihood (EM) | 112
N/A | CNVs | Distance-based (various) | Several off-the-shelf | 125
N/A | CNVs | Maximum parsimony | Combinatorial optimization | 177
RESIC (generalized from genes to pathways) | DNAseq-based SNVs and CNVs, and gene expression | Probabilistic (pathway generalization of RESIC) | Specialized (simulation-based) | 118
RESIC | DNAseq-based SNVs and CNVs or gene expression | Probabilistic | Specialized (simulation-based) | 117,§
N/A | Gain/loss events | Probabilistic (non-standard) | Statistical hypothesis testing and PCA | 109
METREX | Gene expression | Distance-based (various) | WLS via Fitch (from Phylip), neighbour joining, and FASTME | 123
N/A | Gene expression | Distance-based (WLS minimum evolution) | Fitch (from Phylip) | 124
unmix | Gene expression | Distance-based (minimum spanning tree) | Combinatorial optimization (with deconvolution) | 139,§
Rtreemix | Generalized binary mutation array, cross-sectional | Probabilistic (mixture model) | Maximum likelihood (EM) | 115
Mtreemix | Generalized mutation array | Probabilistic (mixture model) | Maximum likelihood (EM) | 111§
oncotrees | Large CNVs or cytogenetic breaks | Statistical (non-standard) | Combinatorial optimization | 66§
oncotrees | Large CNVs or cytogenetic breaks | Distance-based (non-standard) | Combinatorial optimization | 122
DiProg | Large CNVs or cytogenetic breaks | Probabilistic | Combinatorial optimization (ILP) | 120
oncomodel | Large CNVs or cytogenetic breaks | Probabilistic | Maximum likelihood | 108
N/A | Large CNVs or cytogenetic breaks | Statistical (non-standard) | Custom heuristic optimization | 178
BML | Mutational array | Probabilistic | Bayesian sampling (MCMC) | 90
CAPRI, TRONCO, and PiCnic | Mutational array | Specialized probabilistic (PiCnic is a general pipeline) | Custom heuristic optimization | 121,179,180
Single-patient, bulk data
PhyloSub | SNVs | Probabilistic | Bayesian sampling (MCMC) and maximum likelihood (EM) | 119§
BitPhylogeny | Methylation, WGS | Probabilistic | Bayesian sampling (MCMC) | 86§
GRAFT | DNAseq-based SNVs, CNVs, and rearrangements | Specialized | Combinatorial optimization | 134
Single-patient, multiple-site, bulk data
cITUP | DNAseq-based SNV VAFs | Probabilistic (joint deconvolution and phylogenetics) | Combinatorial optimization (quadratic programming) | 144
MEDICC | DNAseq- or CGH-based CNVs | Minimum evolution | Combinatorial optimization | 85§
TuMult | CNVs (large-scale) | Maximum parsimony | Combinatorial optimization | 129§
Clomial | DNAseq-based SNV VAFs | Probabilistic | Maximum likelihood (EM) | 142
PhyloWGS | DNAseq-based SNV and CNV VAFs | Probabilistic | Bayesian sampling (MCMC) | 135§
Canopy | DNAseq-based SNV and CNV VAFs | Probabilistic | Bayesian sampling (MCMC) | 137
SPRUCE | DNAseq-based SNV and CNV VAFs | Specialized (joint deconvolution and phylogenetics) | Combinatorial enumeration | 136
SubcloneSeeker | Any variant with a VAF | Specialized (joint deconvolution and phylogenetics) | Combinatorial enumeration | 138
AncesTree | SNVs | Weighted parsimony | Combinatorial optimization (ILP) | 131
rec-BTP | SNVs | Specialized (joint deconvolution and phylogenetics) | Combinatorial optimization | 130
LICHeE | SNVs | Specialized (joint deconvolution and phylogenetics) | Combinatorial optimization | 132
SCHISM | Output of a clone prediction program such as PyClone or SciClone | Probabilistic | Maximum likelihood (genetic algorithm) | 143
Single-patient, single-cell data
N/A | FISH | Probabilistic | Maximum likelihood (EM) | 181
N/A | FISH | Probabilistic | Maximum likelihood (EM) | 69
N/A | FISH | Weighted maximum parsimony (with constraint satisfaction) | Combinatorial optimization (ILP) | 169
FISHtrees | FISH | Maximum parsimony (with several different formulations of the optimization problem) | Combinatorial optimization | 84§,149§,151§,152§
N/A | FISH | Maximum parsimony (rectilinear) | Combinatorial optimization | 153
N/A | qPCR and FISH | Maximum parsimony | Combinatorial optimization (PAUP) | 182
OncoNEM | scSeq-based SNVs | Probabilistic | Maximum likelihood (specialized heuristic) | 154§
SCITE | scSeq-based SNVs | Probabilistic | Bayesian sampling (MCMC) | 87
muttree | SNVs | Probabilistic | Maximum likelihood (specialized optimization) | 89

CGH, comparative genomic hybridization; CNV, copy number variant; DNAseq, bulk DNA sequencing; EM, expectation maximization; FISH, fluorescence in situ hybridization; ILP, integer linear programming; MCMC, Markov chain Monte Carlo; MST, minimum spanning tree; N/A, not applicable; PCA, principal components analysis; qPCR, quantitative PCR; scSeq, single-cell sequencing; SNV, single nucleotide variant; VAF, variant allele frequency; WGS, whole-genome sequencing; WLS, weighted least squares.

* Additional related tools, including tools that identify subclones by deconvolution, are listed in Supplementary information S1 (table), which also contains more information, including the URLs, for the tools listed here. For consistency with the text, the order of tools is sorted primarily by study type and secondarily by data type. Supplementary information S1 (table) is provided in Excel and includes an explicit study type column to allow the reader to sort the rows in the same way or in other ways.

These studies have some phylogenetic aspects, but do not produce phylogenies as their primary output.

§ These studies use some of the more important or innovative software packages.

Table 2.

Case studies using tumour phylogenetics

Data type* | Method | Refs
WGS | Bayesian | 183
WES and FISH cytogenetics | Bayesian | 101
Binary SGAs | BEAST and PAUP | 35
WES | Broad Institute custom heuristic (parsimony, branch sibling model, and grafting at tips) | 98
WES | Broad Institute custom heuristic (parsimony, branch sibling model, and grafting at tips) | 147
WES-based SNVs and SNP-based CNVs | Broad Institute custom heuristic (parsimony, branch sibling model, and grafting at tips) | 22
DNAseq | REFS 23,53 | 67
WGS and targeted deep sequencing | Custom heuristic in three stages | 50
WES | Custom heuristic in three stages | 148
Ultra-deep multi-region DNAseq-based SNVs | Custom heuristic in three stages | 28
FISH | FISHtrees | 41
FISH | FISHtrees | 102
FISH | FISHtrees | 58
DNAseq, methylation, and CNVs | Minimum evolution and third-party tool for some CNVs | 73
CNVs | MEDICC | 43
DNAseq, aCGH, and FISH | MEDICC | 47
SNP-based CNVs, and LOH | MEDICC | 184
Cross-sectional DNAseq | Mtreemix | 110
Karyotyping | Mtreemix and REF. 114 | 92
aCGH | Neighbour joining | 56
Regional aCGH | Neighbour joining | 145
scSeq-based CNVs | Neighbour joining | 71
WGA scSeq | Neighbour joining | 156
scSeq | Neighbour joining | 80
scSeq | Neighbour joining | 185
CNVs, RNA expression, and methylation | Neighbour joining and RESIC | 186
Single-cell microsatellite | Neighbour joining using L1 distance between alleles | 78
Targeted methylation | Neighbour joining using methylation Hamming distance and ABC parameter inference | 79
Targeted methylation | Neighbour joining using methylation Hamming distance and ABC parameter inference | 187
Targeted DNAseq | Neighbour joining from ape with clones from PyVCF | 100
Single-cell microsatellite | Neighbour joining with Manhattan distance | 76
Separate SNVs and CNVs with blood normal | Neighbour joining, maximum likelihood, and ultrametric | 83
20 poly(G) tracts, regional | Neighbour joining with bootstrapping, and MrBayes | 106
Deep DNAseq | Not documented | 54
CNVs | Not documented | 188
WES-based SNVs and CNVs | Not documented | 64
Ultra-deep multi-region DNAseq | Not documented | 40
WES, and genotyping for SNVs | Not documented | 46
Regional WES-based VAFs, and aneuploidy | Maximum parsimony | 146
Microsatellites | Maximum parsimony (Camin–Sokal) from Phylip (using MIX) | 172
DNAseq-based SNVs, CNVs, and gene-fusion VAFs | Maximum parsimony (manual) | 158
WES | Maximum parsimony (Wagner) from Phylip | 61
WES | Maximum parsimony (Wagner) from Phylip | 99
SNVs and CNVs | Maximum parsimony from Phylip (using B&B) | 42
Deep DNAseq-based SNVs and indels | Maximum parsimony from Phylip (using Dnapars) | 49
Microsatellites | Maximum parsimony from Phylip (using penny) | 96
Regional WES-based SNVs | Maximum parsimony in phangorn (in Bioconductor) | 30
scSeq-based CNVs | Maximum parsimony in phangorn (in Bioconductor) | 157
WES, with verification by ultra-deep NGS | Maximum parsimony, with a third-party max-mini tool | 167
WES | Maximum parsimony, maximum likelihood, and Bayesian | 68
Targeted deep sequencing | Maximum parsimony from Phylip with bootstrapping | 189
WGS and WES | Maximum parsimony and UPGMA using MEGA5 | 21
WES, FISH, and targeted deep sequencing | PyClone and neighbour joining on three FISH loci | 70
WGS-based SNVs and CNVs, and a custom targeted method for clonality | PyClone, EXPANDS and BIoNJ | 53
WES | PyClone and customized maximum likelihood | 190
WGS- or array-based CNVs, and scSeq | TITAN and PyClone for bulk data, and MrBayes for scSeq data | 191
CNVs (and other data for non-phylogenetic analyses) | TuMult | 59
WES | TEDGs | 34
Microsatellites | Statistical analysis of allele sizes | 107
X chromosome microsatellites in males | Statistical analysis of allele sizes | 165

See Supplementary information S2 (table) for more information about each of these studies, and also for some comparable studies that did not use tumour phylogenetics but addressed similar problems by non-phylogenetic methods. ABC, approximate Bayesian computation; aCGH, array comparative genomic hybridization; CNV, copy number variant; FISH, fluorescence in situ hybridization; DNAseq, bulk DNA sequencing; indel, small insertions or deletions; LOH, loss of heterozygosity; NGS, next-generation sequencing (typically a mixture of WES and WGS); scSeq, single-cell sequencing; SGAs, somatic genetic abnormalities; SNV, single nucleotide variant; TEDGs, tumor evolutionary directed graphs; VAFs, variant allele frequencies; WES, whole-exome sequencing; WGA, whole-genome amplification; WGS, whole-genome sequencing.

* DNA samples are bulk samples unless they are explicitly noted as coming from single cells.

These studies are, in our opinion, the most important, innovative or controversial.

This variety of phylogeny methods has corresponded to a variety of applications. Tumour evolutionary trees, which were once merely conceptual models2, are now central in the results of many studies11. Early uses of phylogeny methods often focused on applying the new tool of tumour phylogenetics to old problems, such as using evidence of evolutionary selection to separate driver mutations from passenger mutations29,50, or using novel algorithms to find the order and timing of driver mutations89–91 or to determine how these driver mutations associate with progression stages92. Other key results have emerged organically, for example from studies addressing the still controversial question of whether tumour evolution follows the expectations of classical clonal evolution theory93–95 in producing predominantly linear phylogenies54,76,96,97, whether it exhibits predominantly branched evolution exemplified by the early divergence of subclones30,33,40,42,49,73,83,98–100, or whether it occupies some continuum encompassing both extremes in different tumours34,101. Researchers continue to find new applications for phylogeny models, such as the use of phylogenies prognostically to predict the likely future progression of a tumour43,58,85,92,102; such applications are an evolution of older approaches that have been used to predict progression from simpler measures of tumour heterogeneity38,58,59,102–105.

One worrisome trend among these studies is their seemingly conflicting conclusions about the evolutionary trajectories of cancers, such as on the questions of linear versus branched evolution or Darwinian selection versus no selection. The distinctions may be traced to differences in the application of phylogenetics, such as looking at distinct marker types (for example, SNVs versus CNVs) or using distinct evolutionary models or phylogeny algorithms. For example, the studies that concluded that there was little selection in some tumours looked mostly at SNVs and CNVs, but perhaps there is selection in those tumours via evolutionary mechanisms that would be apparent only when looking at other marker types, such as karyotypes or methylation patterns. Few studies have tested whether the phylogenetic inferences made are robust to a change of methods, with notable exceptions68,106.
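
One simple way to check robustness to a change of methods is to compare the trees that two methods return for the same samples. The sketch below measures the fraction of clades (groups of samples descending from a common internal node) that two rooted trees share. The nested-tuple tree encoding, the sample names, and the functions `clades` and `shared_clade_fraction` are assumptions introduced for illustration; this is a simplified relative of established tree-distance measures, not a method taken from the studies discussed here.

```python
def clades(tree):
    """Set of non-trivial clades (as frozensets of leaf names) in a rooted tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf is represented by its name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)              # the root clade is shared trivially
    return found

def shared_clade_fraction(tree_a, tree_b):
    """Jaccard similarity between the clade sets of two trees on the same leaves."""
    ca, cb = clades(tree_a), clades(tree_b)
    if not ca and not cb:
        return 1.0
    return len(ca & cb) / len(ca | cb)

# Hypothetical trees produced by two different methods on the same four samples.
tree_method_1 = ("normal", ("R3", ("R1", "R2")))
tree_method_2 = ("normal", ("R1", ("R2", "R3")))
agreement = shared_clade_fraction(tree_method_1, tree_method_2)
```

A value near 1 suggests that the inferred groupings do not depend strongly on the method, whereas a low value indicates that conclusions drawn from either tree alone should be treated with caution.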

Variations on tumour phylogenetics

Recent years have seen a rapid proliferation of methods for tumour phylogenetics. In this section, we categorize some of the seminal advances made. We can roughly distinguish three classes of method, based on the kind of phylogeny study for which they are designed: cross-sectional methods, which use data on many tumours to build trees describing the common progression pathways across a population; regional bulk methods, which build trees for single patients through bulk genomic assays of distinct tumour sites or regions; and single-cell methods, which build trees from the cell-to-cell variations in single tumours (FIG. 1). Not all methods fit neatly within one category, but the categories provide a crude organization for the description of methods below.

Within these high-level categories, we see a diversity of genomic data types (TABLE 3), evolutionary models, and phylogeny algorithms. Below, we consider a subset of methods that were of particular importance in introducing new techniques to the field or were of unique value to likely users. TABLE 1 and the extended version, Supplementary information S1 (table), provide a more comprehensive list of important methods. TABLE 2 and the extended version, Supplementary information S2 (table), list important studies that have made use of tumour phylogeny methods.

Table 3.

Experimental technologies and data types for tumour phylogenetics

Technology and data type | Comments | Refs
Pre-NGS technologies
Large-scale cytogenetic abnormalities | Convenient before sequencing became ubiquitous, but superseded by more comprehensive genomic studies | 66
Microsatellite markers | Rapidly evolving, usually neutral markers | 78
FISH | Useful for probing small numbers of CNVs in single cells; largely displaced by scSeq, but still important owing to its practicality for much larger numbers of single cells | 69
aCGH | Early high-throughput method for bulk CNV profiling; still in use, although being displaced by DNAseq | 145
Expression microarrays | Convenient for high-throughput before RNAseq became widely available; not commonly used for phylogenetics, as it provides only a noisy and indirect measure of genetic evolution | 123
SNP chips | Designed initially for genotyping and association studies, but also used in many cancer studies to infer copy number profiles along the genome and to infer CNVs |
Bulk sequence technologies
DNAseq SNVs | Perhaps the most commonly used marker type, it provides whole-exome or whole-genome profiles of evolution by point mutations | 117
DNAseq CNVs | CNVs can be inferred by local changes in sequence coverage, instead of using aCGH or SNP arrays | 117
RNAseq expression | More precise and accurate replacement for expression microarrays; nonetheless remains a niche technology for phylogeny studies | 124
DNA methylation | Measured by bisulfite sequencing, provides unique information on the evolution of the cell state that is not apparent from conventional DNAseq methods; some methylation markers are neutral, others evolve to select for gene expression | 79
scSeq technologies
DNAseq SNVs | Uniquely powerful method for identifying large numbers of phylogenetic markers at the single-cell level; only recently making inroads as the technology has matured and data quality has improved | 87,154
DNAseq CNVs | Perhaps the dominant technology for single-cell tumour phylogenetics, offering coarse-grained profiles of evolution by copy number change; robust to data quality issues in emerging scSeq technologies | 71
Single-cell microsatellites | Not a widely used technique but one important to early tumour phylogeny studies; offers important advantages in profiling a putatively selectively neutral marker type | 76

aCGH, array comparative genomic hybridization; CNVs, copy number variants; DNAseq, bulk DNA sequencing; FISH, fluorescence in situ hybridization; NGS, next-generation sequencing; RNAseq, RNA sequencing; scSeq, single-cell sequencing; SNVs, single nucleotide variants.

Cross-sectional tumour phylogenetics

Key ideas behind cross-sectional tumour phylogenetics originate in the pre-phylogenetic work of Fearon and Vogelstein, who proposed that bulk analysis of collections of tumours from multiple patients could allow one to infer the likely orders of aberrations and stages of progression (for example, from adenoma to carcinoma) so that each aberration is associated with progression to a specific stage93. They proposed a linear (event 2 follows event 1 follows event 0) model for the progression of colorectal cancer. This Fearon–Vogelstein model, although a simplification107, has been highly influential on thinking about tumour evolution. Phylogenetic methods were first brought to the reconstruction of tumour progression pathways by Desper et al., who generalized the Fearon–Vogelstein linear progression model to allow branching in the form of a tree, sometimes called an oncogenetic tree66. FIGURE 1a provides an illustration of the oncogenetic tree model for interpreting cross-sectional data that has come from multiple patients. In the original oncogenetic tree model, each tree edge corresponds to a possible aberration with an associated probability of occurrence. Paths in the tree correspond to possible sequences of accumulating aberrations.
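
The oncogenetic tree model described above can be made concrete with a short sketch. It assumes the common formulation in which each aberration can occur only if its parent in the tree has occurred, and then does so independently with the probability attached to its edge. The event names `A` to `D`, the edge probabilities, and the function `observation_probability` are hypothetical and are not taken from the original study.

```python
def observation_probability(parent, edge_prob, observed):
    """Probability that exactly the events in `observed` occur under the tree.

    parent    -- maps each event to its parent event (or "root")
    edge_prob -- maps each event to the probability of the edge leading to it
    observed  -- set of events seen in one tumour
    """
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == "root" or par in observed
        if event in observed:
            if not parent_present:
                return 0.0                 # an event cannot precede its parent
            prob *= edge_prob[event]
        elif parent_present:
            prob *= 1.0 - edge_prob[event]  # event was possible but did not occur
    return prob

# Hypothetical tree: root -> A, A -> B, B -> C, A -> D.
example_parent = {"A": "root", "B": "A", "C": "B", "D": "A"}
example_edge_prob = {"A": 0.8, "B": 0.5, "C": 0.25, "D": 0.5}

p_ab = observation_probability(example_parent, example_edge_prob, {"A", "B"})
p_c_alone = observation_probability(example_parent, example_edge_prob, {"C"})
```

In this sketch a linear Fearon–Vogelstein-style model is simply the special case in which every event has at most one child.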

Many methods have since applied this basic strategy of inferring trees or graphs of possible progression sequences from combinations of mutations observed across a patient cohort. We refer the reader to a general phylogenetics text82 for more background on the basic classes of phylogenetic models and algorithms summarized in TABLE 4 and their trade-offs. The original Desper et al. method66 was a character-based phylogeny method, meaning that it modelled evolution from a discrete set of phylogenetic markers (variant loci), and it was specifically a kind of maximum parsimony method, meaning that it was a combinatorial optimization method that sought to explain a data set with the smallest number of distinct mutations possible. Character-based methods tend to be most informative for reconstructing the sequence of mutations and unobserved ancestral states, but they become computationally infeasible on large marker sets. Parsimony methods are the most computationally efficient of the character-based methods, but they depend on the assumption that mutations are rather rare, which is a questionable assumption for tumours. The field later moved largely towards more sophisticated probabilistic character-based methods108,109, which seek either the most probable tree (the maximum likelihood method) or some measure of the space of possible trees and tree parameters (Bayesian sampling). Compared with the earlier approaches, such models better handle high mutation rates, noisy data, and uncertainty in tree inferences, but can be more computationally demanding than parsimony methods. Beerenwinkel et al.110,111 introduced an important class of probabilistic model that enables the joint inference of several possible trees for binary mutation data, via the Mtreemix tool, an approach that was later generalized to CNV data112,113 and became the basis of the newer Rtreemix package114–116. More recent approaches include making better use of the detailed information specifically offered by DNA sequencing (for example, as in the RESIC117 approach and a later pathway-level variant118). Algorithmically, most such methods rely on comparatively faster maximum likelihood techniques113. However, more advanced Bayesian models commonly use variants of Markov chain Monte Carlo (MCMC) sampling, which is a statistical technique for exploring the ranges of possible tree models and evolutionary parameters but at a much greater computational cost than maximum likelihood methods90,119. The recurring theme of trade-offs between more realistic and more computationally tractable models has inspired a great deal of research into more exotic algorithmic techniques in this domain91,120,121.
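
To make the maximum parsimony criterion concrete, the sketch below scores a fixed candidate tree by the minimum number of mutation events needed to explain a set of binary markers, using Fitch's algorithm for unordered characters. It illustrates the general principle only and is not the Desper et al. method. The nested-tuple tree encoding, the sample names, the marker profiles, and the function names are assumptions made for this example.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree."""
    def visit(node):
        if isinstance(node, str):          # leaf: the observed state of that sample
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:                         # children can agree: no new change
            return common, left_cost + right_cost
        # Children cannot agree: one change is needed on an edge below this node.
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

def parsimony_score(tree, profiles):
    """Total Fitch score over all markers (columns) of the profiles."""
    n_markers = len(next(iter(profiles.values())))
    return sum(
        fitch_score(tree, {name: p[i] for name, p in profiles.items()})
        for i in range(n_markers)
    )

# Hypothetical data: two trunk markers, one shared by R1 and R2, two private.
example_profiles = {
    "normal": [0, 0, 0, 0, 0],
    "R1": [1, 1, 1, 0, 0],
    "R2": [1, 1, 1, 1, 0],
    "R3": [1, 1, 0, 0, 1],
}
tree_a = ("normal", ("R3", ("R1", "R2")))
tree_b = (("normal", "R2"), ("R1", "R3"))
score_a = parsimony_score(tree_a, example_profiles)
score_b = parsimony_score(tree_b, example_profiles)
```

Here the first tree explains each marker with a single mutation, whereas the second requires the marker shared by R1 and R2 to arise twice, so a parsimony method would prefer the first. A full method must also search over tree topologies, which is where the computational cost arises.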

Table 4. Phylogeny models and algorithms

Model or algorithm name | Description | Refs

Evolutionary models and objective functions
Maximum parsimony (MP) | Simplest phylogeny model; assumes that mutations are rare and so the tree with the fewest mutations is the most plausible | 96
Minimum evolution (ME) | Distance-based analogue of maximum parsimony; assumes that the tree with the least amount of evolution is the most plausible | 85
Probabilistic | Broad class of models well suited to complicated evolutionary scenarios, noisy data, and sampling over unknown evolutionary parameters; generally divided into maximum likelihood (ML), used to find one best-fitting tree for the data and model, and Bayesian, used to identify the space of plausible trees and parameters consistent with the model and data | 86,110,111,135,137
Weighted least-squares (WLS) | Distance-based model defining the most plausible tree as that most closely approximating an input set of distances between taxa by a mean-square measure | 121

Phylogeny algorithms and algorithmic techniques
Combinatorial | Broad class of methods frequently used for character-based phylogenies to optimize over a discrete set of possible topologies; generally the most efficient methods, but suitable only for simpler models; examples include B&B, in which one exhaustively searches a space of all possible solutions while avoiding provably unproductive subspaces, and integer linear programming (ILP) or quadratic programming (QP), in which one converts the problem to a special class of mathematical optimization for which efficient solver programs are available | 89,131,144
Heuristic search | Broad class of algorithms designed to approximately search a space of trees that are based on empirical effectiveness but are not proven to find the best possible trees; also used when solving for phylogeny models for which efficient, exact methods are unknown; a common generic heuristic is a genetic algorithm (for example, REF. 143), in which one generates a pool of possible solutions and ‘evolves’ them under a model of mutation and mating; many phylogeny-specific heuristics have also been developed (for example, REF. 82) | 121,143,176
Neighbour joining (NJ) | Fast method for phylogenetics by successively refining subtrees, approximating a minimum-evolution tree while allowing a possibility of temporally impossible scenarios | 76
Unweighted pair group with arithmetic mean (UPGMA) | Method for hierarchically constructing a tree by successively joining subtrees, yielding fast tree reconstruction but being dependent on the molecular clock hypothesis (that variation accumulates at equal rates in all tree branches) | 21
Markov chain Monte Carlo (MCMC) | Class of algorithms that is suitable to many forms of probabilistic model and allows one to explore parameter ranges and uncertainty in assignments, but is generally too computationally costly to use on trees of more than a small number of nodes | 87

The major alternatives to character-based methods are distance-based methods, which use mutation data to estimate evolutionary distances between samples; these distances then serve as the basis for tree inference. Such methods can handle much larger marker sets at the cost of losing the fine-scale modelling of mutational events achieved by character-based methods. Desper et al. extended their approach to distance-based methods122 and later extended those from DNA to RNA expression data123. Riester et al.124 developed a similar approach specifically for RNA sequencing data using minimum evolution phylogenies, which is a distance-based analogue of parsimony methods. Liu et al.125 applied cross-sectional distance-based methods to CNVs using several off-the-shelf distance-based phylogeny tools.
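The two-stage logic of distance-based methods (first reduce marker data to pairwise distances, then build a tree from the distances alone) can be sketched as follows. This is a minimal illustration assuming Hamming distance on hypothetical binary mutation profiles and a plain UPGMA (average-linkage) join rule. It is not the procedure of any of the cited tools, and UPGMA additionally assumes a molecular clock.

```python
from itertools import combinations

# Hypothetical binary mutation profiles (1 = variant present) for four samples.
PROFILES = {
    "s1": [1, 1, 0, 0, 0, 0],
    "s2": [1, 1, 1, 0, 0, 0],
    "s3": [0, 0, 0, 1, 1, 0],
    "s4": [0, 0, 0, 1, 1, 1],
}

def hamming(a, b):
    """Number of markers at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(profiles):
    """Average-linkage clustering; returns the tree as nested tuples of sample names."""
    # Each key is a (sub)tree; its value lists the leaves that subtree contains.
    clusters = {name: [name] for name in profiles}

    def dist(c1, c2):
        # Mean pairwise distance between the leaves of two clusters.
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(hamming(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters into a new subtree.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = merged
    return next(iter(clusters))

tree = upgma(PROFILES)
```

Once the distance function has been applied, the tree-building step never looks at individual markers again. This is why such methods scale to large marker sets, and also why they lose the event-level detail that character-based methods retain.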

Oncogenetic tree methods in recent years have been primarily used to analyse DNA sequencing-derived SNV or CNV data43,70,85, but they have also been used for methylation data73,79. They have proven to be valuable primarily for the original purpose of identifying combinations and orders of recurring driver mutations. In hindsight, the cross-sectional tumour phylogeny methods are domain-specific clustering methods that happen to use phylogenetics tools on the assumption that distinct tumours can share common evolutionary trajectories. However, this was not clear until sequencing studies revealed both inter-tumour and intra-tumour heterogeneity, and this finding is part of the ‘evolution’ of tumour phylogenetics alluded to in the title of this Review.

Given the diversity of methods available, though, one should be aware that simulation studies126,127 have shown that qualitative results may depend considerably on the model used to generate the data. Furthermore, most methods for cross-sectional data were developed before the extent of intra-tumour heterogeneity in observed data was appreciated37, and tree inferences from cross-sectional data can be unreliable in the presence of intra-tumour heterogeneity128. These latter observations help motivate the trend towards phylogenetic studies of single tumours, discussed below.

Regional bulk tumour phylogenetics

A major step forwards was the recognition that one could produce phylogenies for single patients, initially through sampling multiple regions or tumour sites. One treats each site sequenced as if it were a species and infers a tree connecting those species. FIGURE 1b provides an illustration of a regional bulk phylogeny built from samples of multiple tumour sites and multiple regions within a tumour site for a single patient. The earliest such tools used data types that predate NGS, such as large-scale CNVs used by TuMult, a parsimony-based combinatorial algorithm129. Similar ideas have since been brought to DNA sequencing-derived data types, including SNVs (as used in rec-BTP130, AncesTree131, and LICHeE132) and CNVs (as used in TITAN133 and MEDICC85). Given the variations in the rates and mechanisms of SNV versus CNV evolution, some methods have found particular power in combining data types, as is done by GRAFT134, PhyloWGS135, SPRUCE136, and Canopy137. The available methods also cover a range of models and algorithmic techniques, including various combinatorial (parsimony-like) character-based methods130,131,134,138, probabilistic character-based methods133,135, and distance-based minimum evolution85.

An important variation on regional bulk tumour phylogenetics is the combination of phylogenetics with clonal deconvolution from bulk sequence139. Here, deconvolution means the inference of clonal subpopulations from one or more bulk genomic samples. Numerous tools are now available for clonal deconvolution (for example, SciClone140, PyClone141, and Clomial142). Some tumour phylogeny methods listed in TABLE 1 explicitly depend on clonal deconvolution either as a preprocessing step or integrated into the phylogenetic inference strategy. These include some early approaches to deconvolution that were motivated explicitly by the application to tumour phylogenetics139, tumour phylogeny methods such as SCHISM143, which require a third-party clonal deconvolution program to generate their input data, and tools such as cITUP144 and LICHeE132, which fully integrate deconvolution and phylogenetics into a single inference.

Regional bulk phylogenetics has been used in several seminal studies, building on earlier work on multi-region progression without explicit phylogenetics104. Early, pre-NGS examples of true multi-region tumour phylogenetics include the use of microsatellite markers by Khalique et al.96 and of array CGH (aCGH) by Navin et al.145. Regional bulk tumour phylogeny methods using DNA sequencing-derived markers have since seen extensive use (TABLE 2). Many studies that apply regional bulk phylogenetics approaches have relied on standard methods or phylogeny programs derived from species evolution42,61,73,79,146. Others have developed custom heuristic phylogeny approaches28,50,98,147,148 or relied on manual phylogeny-like inferences33,52. Only recently have mature third-party tools become available (TABLE 1) and begun to appear in case studies (TABLE 2). Examples include the studies by Schwarz et al.43, which applied the MEDICC software to ovarian tumours to demonstrate that relapse tumours typically show early divergence from the primary tumour, by Sottoriva et al.47, which also used MEDICC applied to colorectal tumours, resulting in the ‘big bang model’ of evolution without apparent selection, and by Sottoriva et al.59, which utilized TuMult129 to help demonstrate the role of intra-tumour heterogeneity in promoting resistance in glioblastomas.

Single-cell tumour phylogenetics

The advance that most raised awareness of tumour phylogenetics among non-computational cancer researchers was its application to single-cell data, allowing the generation of a phylogenetic tree based on individual tumour cells extracted from a single patient (FIG. 1c). Single-cell tumour phylogenetics predates single-cell sequencing (scSeq), as it was applied through various older methods offering more limited profiling of single cells via microsatellite76 or FISH69 markers; such approaches remain valuable owing to their ability to examine much larger numbers of cells than scSeq39,84,149 (TABLE 3). Nevertheless, the introduction of scSeq to tumour phylogenetics by Navin et al.71 deserves much of the credit for bringing tumour phylogenetics into the mainstream of cancer research. Since that work, methods for and applications of scSeq in tumour evolution have proliferated, along with related analyses on the data needs of robust scSeq-based phylogenetic analysis150.

The majority of published tools for single-cell phylogenetics are still based on pre-scSeq technologies84,148,151–153, with just a handful having been developed specifically for scSeq. Kim and Simon89 introduced the muttree program, which uses a custom combinatorial inference to find trees optimized for a specialized probabilistic model that differs from the models used by other tools which accept the same input. Ross and Markowetz154 and Jahn et al.87 developed sophisticated Bayesian probabilistic models for scSeq-derived SNVs, and these models were implemented in OncoNEM and SCITE, respectively.

Most applications of scSeq phylogenetics to date have thus relied on tools for general species phylogenetics or on phylogenies that have been manually constructed without an explicit model or algorithm (for example, see REF. 39). Navin et al.71 relied on neighbour joining155, which had earlier been used by Frumkin et al.76 with microsatellite data, to infer phylogenies from scSeq-derived CNVs. Neighbour joining was also used by Xu et al.80 for application to renal cancers and by Wang et al.156 for what was, until recently157, the largest scSeq study of tumour evolution.

An illustrative tutorial

Data, models, algorithms

As more cancer research groups embrace phylogenetic methods, it becomes important to understand what goes into a phylogenetic analysis and how to interpret its results. Phylogenetics is a complicated subject for which tools can easily be misused. In this section, we provide guidance to aspiring users of tumour phylogenetics and those who want to read such work critically.

Our primary take-away message is this: there is no such thing as a generically ‘correct’ approach to phylogenetics. Phylogenetic inference, like many problems in bioinformatics, depends on a model representing the biological processes we seek to explain, a data source that we seek to explain in terms of that model, and an algorithm to fit the data to the model. Effective use of phylogenetics involves making appropriate choices of model, data, and algorithm so that all three are mutually consistent and suited to the question at hand.

To frame this discussion, we present it in terms of a hypothetical research study. Let us suppose we have a general scientific question: what are the common recurring sequences and timings of CNVs over the progression from healthy breast tissue to breast cancer? We further suppose that we have gathered data to address that question: whole-genome DNA sequencing at 50× coverage from 200 single cells from a tumour and matched normal control. We then imagine that we built a phylogeny using an off-the-shelf neighbour joining phylogeny program152, as was done in several prominent studies71,76,80,97,145,158. This is a qualitatively similar plan to the pioneering work of Navin et al.71,157. How can we evaluate, and perhaps improve upon, this initial plan?

Is our model consistent with our data?

If we carry out the study described above, it will yield a phylogenetic tree, and we can expect that tree to be qualitatively similar to those of Navin et al.71 and Wang et al.156: an early split of clones into ploidy classes (diploid, pseudodiploid, and tetraploid) followed by later separation by more fine-scale CNVs. This may indeed be the true evolutionary history of the tumour, but we also need to consider that it may be an artefact of the approach. In fact, the study as described will yield this outcome regardless of the actual evolutionary history of the tumour for reasons implicit in the model of evolution that our strategy assumed.

The described approach uses a phylogeny model designed primarily for SNV data; such a model assumes that evolution occurs by mutations independently accumulating one at a time, with roughly equal rates. This happens to be a reasonable simplification for species evolution82, and probably even for tumour evolution, provided that we are tracking evolution in which SNVs accumulate largely without selection8,46–48. However, it is a questionable model for CNVs, as CNVs violate the model assumption that changes in distinct variant regions accumulate independently; instead, CNVs accumulate at multiple scales, from localized gene-scale variants to variation at the scale of large chromosome segments, whole chromosomes or even whole-genome ploidy17,74,159. The mismatch between model and data can lead to discrepancies between evolutionary distance measures. For example, a 3 billion bp change induced by a whole-genome duplication will yield the same estimated evolutionary distance as 300,000 independent 10 kb changes, even though genome duplications are common events21,22,147,160 and far more likely to occur than 300,000 independent 10 kb changes. That discrepancy will lead to large-scale changes being misinterpreted as being older than they actually are relative to localized changes, which could radically skew our trees.
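The arithmetic behind this example can be checked with a short sketch. The function `naive_distance` is a hypothetical stand-in for any measure that simply totals the base pairs whose copy number changed. It is an assumption made for illustration, not the distance used by a particular program.

```python
GENOME_BP = 3_000_000_000   # approximate genome size used in the text
LOCAL_EVENT_BP = 10_000     # size of one localized 10 kb change

def naive_distance(event_sizes_bp):
    """Toy per-base distance: total base pairs affected, ignoring how many events caused them."""
    return sum(event_sizes_bp)

# One whole-genome duplication versus 300,000 independent 10 kb changes.
wgd_events = [GENOME_BP]
local_events = [LOCAL_EVENT_BP] * 300_000

wgd_distance = naive_distance(wgd_events)
local_distance = naive_distance(local_events)

print("per-base distance:", wgd_distance, "vs", local_distance)
print("number of events: ", len(wgd_events), "vs", len(local_events))
```

The per-base totals are identical, although one scenario involves a single event and the other 300,000. A model that counted events, or weighted them by their plausibility, would separate the two cases.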

If we recognize this issue, it would be logical to propose that we fix the model. There are models for representing the more complex nature of evolution by CNVs versus evolution by SNVs51, and there are some custom-designed phylogeny tools for specific variants of CNV evolution43,69,84,85,151,152. A Bayesian probabilistic model, as has been used in many tumour phylogeny approaches47,79,86,135, can handle arbitrarily complex evolutionary scenarios and is well suited to learning the complicated lineage-specific rate parameters one would need for such a model78,101,152, given the diversity of CNV hypermutability mechanisms any given tumour might exhibit20,25,94,161. Let us suppose, then, that we replace the Euclidean distance model with a Bayesian probabilistic model that captures the multi-scale nature of CNV evolution, thus bringing our model in line with our data. Are we now finished?

Aligning algorithm and model

Unfortunately, the change to a Bayesian model is insufficient because we cannot change the model without also changing the algorithm. In principle, one could use neighbour joining with a more nuanced probabilistic model of evolutionary distances. However, a distance-based method such as neighbour joining will work poorly if we lack large numbers of mutations of each type to average out uncertainty over the mutation frequencies and relative orders, and will therefore be likely to fail for important but rarer CNVs, such as ploidy changes.

Recognizing that problem, we can adopt a more appropriate algorithm for a probabilistic model, such as the MCMC approach of the BitPhylogeny program86. Although other classes of algorithm can be used with such probabilistic models108,112,113,143,162,163, MCMC sampling is the standard for accurately fitting a complicated probabilistic model for which we do not yet have a specialized body of theory87. Let us suppose, then, that we have replaced the neighbour joining algorithm with an MCMC Bayesian sampler, making our algorithm consistent with our model. Having synchronized algorithm to model, and model to data, are we finished?
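The sampling idea behind MCMC can be illustrated on a far simpler problem than tree inference. The sketch below is a generic Metropolis sampler for a single mutation-rate parameter, assuming hypothetical per-branch mutation counts, a Poisson likelihood and a flat prior. It is not the BitPhylogeny algorithm, which samples over trees and many parameters jointly.

```python
import math
import random

COUNTS = [3, 5, 4, 6, 2]   # hypothetical mutation counts on five branches

def log_posterior(rate):
    """Poisson log-likelihood with a flat prior on rate > 0 (constant terms dropped)."""
    if rate <= 0:
        return -math.inf
    return sum(k * math.log(rate) - rate for k in COUNTS)

def metropolis(n_steps, step_sd=1.0, start=1.0, seed=0):
    """Random-walk Metropolis sampler; returns the full chain of sampled rates."""
    rng = random.Random(seed)
    rate, samples = start, []
    for _ in range(n_steps):
        proposal = rate + rng.gauss(0.0, step_sd)
        delta = log_posterior(proposal) - log_posterior(rate)
        # Accept with probability min(1, posterior ratio); otherwise keep the current rate.
        if rng.random() < math.exp(min(0.0, delta)):
            rate = proposal
        samples.append(rate)
    return samples

samples = metropolis(20_000)
kept = samples[2_000:]                    # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)
```

The output is a collection of sampled values rather than a single best estimate, which is what allows uncertainty to be summarized. The sampler returns a chain however few steps it is run for, so whether it has run long enough must be checked separately.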

Aligning model and data

Unfortunately, the algorithm change is insufficient, because in synchronizing our algorithm to our model, we selected an algorithm that is not appropriate to our data. Algorithms, like models, carry assumptions and limitations. One limitation of MCMC is computational cost82: the number of steps for which one must run an MCMC algorithm to get accurate results generally grows exponentially with the number of species (or cells) in the data. This limitation is perilous to the novice user because an MCMC algorithm can still generate a tree as an output, even if it has not run for long enough to identify the right tree. MCMC phylogeny algorithms were therefore traditionally used only for the order of 10–20 species, although somewhat larger numbers are now possible85. State-of-the-art Bayesian methods in tumour phylogenetics are commonly accelerated with a technique called approximate Bayesian computation (ABC)164, as used in two recent investigations47,79; with this technique, one accelerates sampling by collapsing sets of solutions that appear to be similar by one or more summary statistics. However, the curse of exponential blow-up (in which the number of possible trees an algorithm must consider grows exponentially with the number of ‘species’ they contain) is that better algorithms will allow us only a few more species, not the order-of-magnitude increase we need to handle 200 cells. In short, the algorithm we have chosen is probably unsuitable for our single-cell data.
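The scale of the blow-up can be made concrete with the standard count of distinct rooted binary trees on n labelled leaves, (2n − 3)!! = 3 × 5 × … × (2n − 3). The sketch below evaluates it for a few sizes, with the function name chosen here for illustration. It counts topologies only and says nothing about how many of them a particular sampler actually needs to visit.

```python
def rooted_tree_count(n_leaves):
    """Number of distinct rooted binary trees on n labelled leaves: (2n - 3)!!."""
    if n_leaves < 2:
        return 1
    count = 1
    for odd in range(3, 2 * n_leaves - 2, 2):   # 3, 5, ..., 2n - 3
        count *= odd
    return count

for n in (4, 10, 20, 200):
    digits = len(str(rooted_tree_count(n)))
    print(f"{n:>3} leaves: about 10^{digits - 1} rooted trees")
```

Moving from 10 or 20 leaves to 200 multiplies the search space by hundreds of orders of magnitude, which is why a modest gain in algorithmic efficiency buys only a few additional species.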

A logical next step, then, would be to ask whether we might use a different kind of data more appropriate to our approach. There are other marker types that we could consider — such as SNVs87,154, expression123, methylation73, or microsatellites165. However, as we are interested in evolution by CNVs, we must keep the marker type unchanged and instead change only the study design. We might propose to use a regional bulk method, replacing our 200 single cells with bulk sequencing of 10 regions from each of 20 tumours. Each of the twenty resulting trees is on a scale an MCMC sampler can handle. Similar regional MCMC strategies for regional bulk sequencing have yielded important insights into tumour evolution in prior studies47,59,79 and have been used successfully for CNV data86,129. We will then have a model appropriate to our data, an algorithm appropriate to our model, and a data set appropriate to our algorithm, harmonizing the three components of our method. Are we now finished?

Aligning method and questions

We are unfortunately still not finished because by changing the data collection strategy to smaller sets of ‘species’ per tree, we have ended up with data sets that are too small to resolve the fine-scale trajectories of CNV evolution. Most solid tumours have chromosome replication defects that lead to rapid accumulation of CNVs20, and progression can happen via clones that are minor or rare in the earlier tumour stages11,41,50,53,54,166,167 and that may lie dormant through much of the clinical progression168. We can expect that there are too many CNVs among ten tumour regions to have hope of resolving the orders or timings of CNV events157. Our study design might be fine for other questions about CNV evolution, but not for the question we are asking. We still have not managed to find a model, algorithm, and data source that are consistent with one another and with the question we are asking.

FIGURE 2 provides a simplified overview of the pitfalls in this process, as we seek to infer a true tree (FIG. 2a) but struggle with erroneous inferences induced by a mismatch between the evolutionary model and data type (FIG. 2b), between the algorithm and the model (FIG. 2c), and between the data type and the research question (FIG. 2d).

Figure 2. Some challenges in synchronizing data, models and algorithms when applying tumour phylogenetics to a scientific question.

An illustration of a hypothetical scenario described in the main text, in which we seek to infer a phylogenetic history of copy number variant (CNV) events in the progression of a single tumour. Each tree shows the potential evolution of genomic copy number profiles for a set of observed clones (blue lines) and computationally inferred intermediate states (red lines) for a single tumour. a | The hypothetical ‘true’ tree describing the evolution of a set of clones from a diploid root via a series of CNVs: gain or loss of copy number in a localized region, as well as whole-genome duplication, leading to a doubled copy number genome-wide. b | Incorrect inference due to the use of a model designed for single-base changes, leading to a substantially incorrect phylogeny involving various biologically ‘impossible’ evolutionary events, such as partial (non-integer) gain, loss, or whole-genome duplications, leading to fractional copy numbers. c | Improved but still inaccurate inference after correcting to an evolutionary model cognizant of the type of variation occurring with CNVs; this eliminates impossible events and leads to a more accurate tree topology, but still fails to identify the correct tree because the analysis is using an algorithm that identifies biologically plausible but still sub-optimal phylogenies for this kind of evolutionary model. d | Still inaccurate inference after changing to a more sophisticated model and algorithm that are well suited to CNV evolution but make it impractical to use single-cell sequence data; this forces a change to a bulk genomic data type, leading to inadequate sampling of extant clones to capture the rapid mutation process typical of CNV-driven evolution and observed in the true tree.

That does not mean that we are out of options. We could try a wholly different approach, perhaps reverting to our original scSeq study design but using a parsimony model with a faster algorithm that might be better able to handle the scale of data. That is essentially what was done in a recent scSeq study of breast cancer157. TABLES 1,2 give examples of available methods and prior studies that may be helpful for finding an existing tool or strategy that has been successfully applied to similar questions. We could try to bring in more exotic algorithms (for example, integer linear programming (ILP)120,131,144,169) or completely different marker types (for example, FISH69,84,151,152). However, we should be aware that we might run through every existing option for a model, an algorithm, and the data type and still fail to find a combination that is mutually consistent and appropriate to our questions. What, then, is the cancer researcher to do?

A final important point is that posing a computational problem is not the same thing as solving it, even if we have perfect data and a perfect model of the relevant biological mechanisms. Many reasonable phylogenetics problems are classified as ‘intractable’ (REF. 170), which informally means that for even moderately large data sets, we may not have any method for finding a good tree efficiently. Often, we will need to develop new computational theory to find an adequate explanation of the data within the models of evolution that we believe describe them. The answer to the question above — what, then, is the cancer researcher to do? — is often to recognize that there is no standard off-the-shelf technique available for many important questions and that developing one is a research problem which will require time and significant expertise in computational biology. BOX 1 provides a few examples of important unsolved methodological problems in tumour phylogenetics.

Box 1 | Outstanding problems in tumour phylogenetics.

Novel or heterogeneous data sources

There are many tumour phylogeny methods for common forms of DNA variation (such as single nucleotide variants (SNVs) and copy number variants (CNVs)), some methods for other genomic data (for example, karyotypes, expression data or methylation data), and a few methods integrating multiple forms of data29,73. The field is just beginning to make sense of other sources of information, such as spatial distributions of cell populations70,79,83, and to make productive use of heterogeneous data10,134.

Comprehensive evolutionary models

We are currently lacking the quantitative models that are required for phylogenetic inference of complex evolutionary events, particularly for recently discovered mechanisms such as chromothripsis or kataegis. However, models are also lacking even for SNVs, which exhibit nuanced combinations of mutational signatures26,27 and selective pressures8 that vary by tumour type, patient, and time9,21,48,146,175 and may require extensive sampling to estimate46.

Tumour-specific phylogeny algorithms

Most tumour phylogeny work to date has relied on a handful of conventional phylogeny algorithms (neighbour joining, maximum parsimony, maximum likelihood or Markov chain Monte Carlo), and it remains unclear which, if any, are sufficiently accurate for tumour evolution for any given data type and tumour type. Phylogenies including more exotic tumour-specific mechanisms will require new classes of algorithms, which is a largely unexplored topic.

Beyond ‘species’ trees

Models drawn from species trees themselves may be inadequate descriptions of clonal evolution of tumours10 for such reasons as cooperation between clones, seeding of metastases by multiple clones, reseeding of primary tumours8, or co-evolution with the microenvironment176. More specialized tree models, generalizations to non-tree-based evolution, and methods informed by more sophisticated population genetics and ecological models are just beginning to emerge47.

Statistical analysis, study design and reproducibility

Few studies examine enough subjects to draw statistically sound conclusions in the presence of extensive inter-tumour heterogeneity14, particularly for single-cell sequencing studies, which usually involve at most a few tumours150. Questions that depend on finding reproducible features across many tumours — for example, whether tumour evolution is linear or branched, or whether it branches early or late — have largely been addressed anecdotally rather than by adequately powered analyses. There are currently no accepted methods to judge whether a phylogenetic tree provides a well-supported fit to a single tumour86,89. The field has barely explored the problem of how to plan a study to ensure that informative and robust phylogenetic tree building will even be possible.

Conclusions and discussion

The use of phylogenetic techniques in cancer research is growing, as is evidenced by the large body of work completed in the past 3 years and referred to herein. Studies of cancer phylogenetics have advanced far beyond the theoretical evolutionary model of Nowell2 to reveal the enormous complexity of the actual processes of tumour evolution14,20,26,171, and to uncover the heterogeneity of those processes both patient to patient101,110,172 and lineage to lineage in a single patient3,8,21,40,73,145. Such studies have revealed mechanisms underlying this heterogeneity21,85,145,173, the dynamics by which these mechanisms themselves evolve over tumour progression47, and possibilities for novel prognostic indicators43,56,57,105. As our knowledge of tumour evolution has expanded, tumour phylogenetics has itself evolved from a new tool for asking old questions, such as distinguishing driver from passenger mutations13,174, to a source of new questions on topics such as how the evolutionary landscapes of tumours are shaped by environmental factors16,21,26,61 and treatment32 and how they can reveal the past and possibly the future of the progression of a tumour27–30,47,59 in ways that are tangential to the specific driver mutations dominant in a given tumour at some time. In this Review, we have sought to survey key methods used and results obtained to date and to provide insight into how best to harness phylogenetic tools for new applications. We conclude by considering where tumour phylogenetics might go next.

Most uses of tumour phylogenetics to date have been in retrospective studies; a major opportunity is moving to prospective studies in research clinics. Looking ahead to this opportunity, we consider what happened in clinical cancer research with the advent of gene expression microarrays and NGS. Both technologies were expected to have an impact on cancer diagnosis and treatment, but they have had different outcomes. Gene expression microarrays were shown to have prognostic value in hundreds of research studies, but are not currently widely used in the clinic. By contrast, NGS is being used in the clinic and has led to the phenomenon of ‘tumour boards’ formed by multidisciplinary scientists and clinicians who study mutation profiles, determine which mutations are ‘actionable’ on the basis of approved drugs, and develop ‘precision medicine’ treatment plans. We hypothesize that the difference is primarily due to the type of output that these two technologies produce. NGS generates lists of discrete mutations that can be validated and evaluated individually. By contrast, microarrays yield patterns of expression changes, sometimes called gene expression signatures, that are conceptually similar to the nucleotide variant mutational signatures of cancer genomes26. Similarly, so far the prognostic value from tumour phylogenetics has come from analysing the patterns of evolution43,58, not from analysing discrete events.

Tumour phylogenetics is far from achieving the reproducibility that is needed for clinical work. The field will need to overcome resistance to complex data and dynamic analysis, and must develop principled, robust methods of analysis implemented in software that is used in many laboratories. Tumour phylogenetics is itself evolving, but not as quickly as biotechnologies to measure aberrations in tumours. In the future, new phylogeny methods will need to be tested on more data sets and compared head to head. The head-to-head comparisons will be very complicated if new methods address one of the key unsolved problems in the field: the integration of different data types beyond CNVs and SNVs (BOX 1). The analysis methods that are deemed most fit will be selected by more research groups and gain wider usage. After some specific methods are used in hundreds of studies, we hope that the results will be sufficiently robust and interpretable to aid in patient prognosis and treatment planning. Similarly, clinicians interpreting phylogenetic analyses should insist that different methods be tried and that results are actionable only when different methods of analysis lead to the same qualitative understanding of a patient’s tumour68,106.

Finally, one cannot comprehensively discuss the future prospects of tumour phylogenetics without considering the education of cancer researchers, or biomedical researchers in general. We have provided guidance on how someone new to tumour phylogenetics might evaluate and carry out research in this domain, but such basic principles can only go so far. As we have shown here, effective use of even well-developed tools requires some understanding of their mathematical and algorithmic underpinnings. Finding or developing the appropriate phylogeny tools for a given application will often involve difficult problems of model selection (as reported by Yuan et al.86) and algorithm design that lie far beyond what we can discuss here. Furthermore, by the time one has identified a study cohort and planned data collection, the questions that it is possible to ask with these tools are already constrained. In addition, there are limits to what one can ask at all with the available computational tools: some important questions require new tool development or theoretical advances before they can be answered. It is therefore crucial to involve computational biologists early in the study design phase, to ensure that it will be possible, in principle, for the study to resolve the questions that motivate it. More specifically, these specialists can determine whether analysis tools appropriate to the data, evolutionary models, and questions currently exist or whether there is a plausible path to developing them. Even the casual user must be able to recognize when a study has run into these limits. Actually posing and solving new data-driven questions, within the limits of biotechnology and human cohorts, is a demanding skill that will be needed by the leaders of future research efforts in cancer evolution.

Few life scientists today are adequately trained in the fundamentals of computational thinking to handle these questions, and not many computational scientists are adequately trained in the challenges of genomic data and research involving human subjects. If we are to realize the full potential of cancer phylogenetics, we will require a sea change in the training of cancer researchers to inculcate a sophisticated understanding of how to reason about data-driven research. The required changes in education practice are likely to face institutional obstacles, but resolving them is as important to the future progress of cancer research as any purely scientific question considered here.

Supplementary Material

supplementary table 1
supplementary table 2

Acknowledgments

This research was supported in part by the Intramural Research Program of the National Library of Medicine (part of the US National Institutes of Health) and by a grant from the Pennsylvania Department of Health (grant number 4100070287). The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations or conclusions.

Glossary

Selection

An evolutionary process in which one population (or subclone, in the context of cancer) is favoured for growth or survival over another

Cancer progression

A change of cancer from a less serious to a more serious state, typically in a manner recognizable by pathologists

Metastasis

A progression in which cancer cells spread to a location in the body that is physically distant from the primary tumour site

Subclones

Subpopulations of cells in a tumour; the cells in each subclone are almost or completely genetically identical for all measured cancer-related variants

Hypermutability

An elevated mitotic mutation rate, relative to that in healthy cells; this is often specific to a given mutation type (for example, a single nucleotide variant or a copy number variant)

Intra-tumour heterogeneity

Variation in the genomes of different cells in the same tumour

Tumour self-seeding

A process by which descendants of cells that escaped the primary tumour re-enter circulation and return to the primary site

Mathematical model

A formal mathematical abstraction of a physical or biological process, such as a set of evolutionary mechanisms

Footnotes

Competing interests statement

The authors declare competing interests: see Web version for details.

SUPPLEMENTARY INFORMATION

See online article: S1 (table) | S2 (table)

References

  • 1. Hanks S, et al. Constitutional aneuploidy and cancer predisposition caused by biallelic mutations in BUB1B. Nat. Genet. 2004;36:1159–1161. doi: 10.1038/ng1449.
  • 2. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194:23–28. doi: 10.1126/science.959840. This is a seminal paper proposing that solid tumours evolve clonally while accumulating mutations from one mitosis to the next via a process of selection of mutant subpopulations from a common progenitor cell.
  • 3. Polyak K. Is breast tumor progression really linear? Clin. Cancer Res. 2008;14:339–341. doi: 10.1158/1078-0432.CCR-07-2188.
  • 4. Naxerova K, Jain RK. Using tumour phylogenetics to identify the roots of metastasis in humans. Nat. Rev. Clin. Oncol. 2015;12:258–272. doi: 10.1038/nrclinonc.2014.238.
  • 5. Foo J, Michor F. Evolution of acquired resistance to anti-cancer therapy. J. Theor. Biol. 2014;355:10–20. doi: 10.1016/j.jtbi.2014.02.025.
  • 6. Enriquez-Navas PM, et al. Exploiting evolutionary principles to prolong tumor control in preclinical models of breast cancer. Sci. Transl Med. 2016;8:327ra24. doi: 10.1126/scitranslmed.aad7842.
  • 7. Merlo LMF, Pepper JW, Reid BJ, Maley CC. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer. 2006;6:924–935. doi: 10.1038/nrc2013.
  • 8. Burrell RA, Swanton C. Re-evaluating clonal dominance in cancer evolution. Trends Cancer. 2016;2:263–276. doi: 10.1016/j.trecan.2016.04.002.
  • 9. Cross WC, Graham TA, Wright NA. New paradigms in clonal evolution: punctuated equilibrium in cancer. J. Pathol. 2016;240:126–136. doi: 10.1002/path.4757.
  • 10. Podlaha O, Riester M, De S, Michor F. Evolution of the cancer genome. Trends Genet. 2012;28:155–163. doi: 10.1016/j.tig.2012.01.003.
  • 11. Ding L, Raphael BJ, Chen F, Wendl MC. Advances for studying clonal evolution in cancer. Cancer Lett. 2013;340:212–219. doi: 10.1016/j.canlet.2012.12.028.
  • 12. Altrock PM, Liu LL, Michor F. The mathematics of cancer: integrating quantitative models. Nat. Rev. Cancer. 2015;15:730–745. doi: 10.1038/nrc4029.
  • 13. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer evolution: mathematical models and computational inference. Syst. Biol. 2015;64:e1–e25. doi: 10.1093/sysbio/syu081. This is an in-depth review of applications of mathematical models of evolution to many problems in cancer research, including examples of various techniques drawn from phylogenetics, population genetics, stochastic processes, and game theory and related areas.
  • 14. Greaves M, Maley CC. Clonal evolution in cancer. Nature. 2012;481:306–313. doi: 10.1038/nature10762.
  • 15. Loeb LA. Mutator phenotype may be required for multistage carcinogenesis. Cancer Res. 1991;51:3075–3079.
  • 16. Greenblatt MS, Bennett WP, Hollstein M, Harris CC. Mutations in the p53 tumor suppressor gene: clues to cancer etiology and molecular pathogenesis. Cancer Res. 1994;54:4855–4878.
  • 17. Cahill DP, Kinzler KW, Vogelstein B, Lengauer C. Genetic instability and darwinian selection in tumours. Trends Cell Biol. 1999;9:M57–M60.
  • 18. Harris R, Petersen-Mahrt S, Neuberger M. RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol. Cell. 2002;10:1247–1253. doi: 10.1016/s1097-2765(02)00742-6.
  • 19. Campbell PJ, et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature. 2010;467:1109–1113. doi: 10.1038/nature09460.
  • 20. Heng HH, et al. Chromosome instability (CIN): what it is and why it is crucial to cancer evolution. Cancer Metastasis Rev. 2013;32:325–340. doi: 10.1007/s10555-013-9427-7.
  • 21. de Bruin EC, et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science. 2014;346:251–256. doi: 10.1126/science.1253462. This is a particularly instructive study for critically evaluating the application of phylogenetics to bulk tumour samples, in part because it considers multiple phylogenetic methods and recognizes that some samples yield multiple optimal tree topologies.
  • 22. Gibson WJ, et al. The genomic landscape and evolution of endometrial carcinoma progression and abdominopelvic metastasis. Nat. Genet. 2016;48:848–855. doi: 10.1038/ng.3602.
  • 23. Nik-Zainal S, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149:979–993. doi: 10.1016/j.cell.2012.04.024.
  • 24. Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40. doi: 10.1016/j.cell.2010.11.055.
  • 25. Baca SC, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677. doi: 10.1016/j.cell.2013.03.021.
  • 26. Alexandrov LB, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421. doi: 10.1038/nature12477.
  • 27. Alexandrov LB, Stratton MR. Mutational signatures: the patterns of somatic mutations hidden in cancer genomes. Curr. Opin. Genet. Devel. 2014;24:52–60. doi: 10.1016/j.gde.2013.11.014.
  • 28. Hong M, et al. Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer. Nat. Commun. 2015;6:6605. doi: 10.1038/ncomms7605.
  • 29. McGranahan N, et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl Med. 2015;7:283ra54. doi: 10.1126/scitranslmed.aaa1408.
  • 30. Murugaesu N, et al. Tracking the genomic evolution of esophageal adenocarcinoma through neoadjuvant chemotherapy. Cancer Discov. 2015;5:821–831. doi: 10.1158/2159-8290.CD-15-0412.
  • 31. Kim YA, Madan S, Przytycka TM. WeSME: uncovering mutual exclusivity of cancer drivers and beyond. Bioinformatics. 2016. doi: 10.1093/bioinformatics/btw242.
  • 32. Allan JM, Travis LB. Mechanisms of therapy-related carcinogenesis. Nat. Rev. Cancer. 2005;5:943–955. doi: 10.1038/nrc1749.
  • 33. Johnson BE, et al. Mutational analysis reveals the origin and therapy-driven evolution of recurrent glioma. Science. 2014;343:189–193. doi: 10.1126/science.1239947.
  • 34. Wang J, et al. Clonal evolution of glioblastoma under therapy. Nat. Genet. 2016;48:768–776. doi: 10.1038/ng.3590.
  • 35. Kostadinov RL, et al. NSAIDs modulate clonal evolution in Barrett’s esophagus. PLoS Genet. 2013;9:e1003553. doi: 10.1371/journal.pgen.1003553. This is an important investigation for demonstrating the ability of treatment to shape the pre-cancer evolutionary landscape. It provides evidence of a more than order-of-magnitude decrease in mutation rates for patients with Barrett oesophagus who took non-steroidal anti-inflammatory drugs (NSAIDs) versus those who did not.
  • 36. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–674. doi: 10.1016/j.cell.2011.02.013.
  • 37. Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim. Biophys. Acta. 2010;1805:105–117. doi: 10.1016/j.bbcan.2009.11.002.
  • 38. Park SY, Gönen M, Kim HJ, Michor F, Polyak K. Cellular and genetic diversity in the progression of in situ human breast carcinomas to an invasive phenotype. J. Clin. Invest. 2010;120:636–644. doi: 10.1172/JCI40724.
  • 39. Anderson K, et al. Genetic variegation of clonal architecture and propagating cells in leukaemia. Nature. 2011;469:356–361. doi: 10.1038/nature09650.
  • 40. Gerlinger M, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 2012;366:883–892. doi: 10.1056/NEJMoa1113205.
  • 41. Heselmeyer-Haddad K, et al. Single-cell genetic analysis of ductal carcinoma in situ and invasive breast cancer reveals enormous tumor heterogeneity, yet conserved genomic imbalances and gain of MYC during progression. Am. J. Pathol. 2012;181:1807–1822. doi: 10.1016/j.ajpath.2012.07.012.
  • 42. Kim TM, et al. Subclonal genomic architectures of primary and metastatic colorectal cancer based on intratumoral genetic heterogeneity. Clin. Cancer Res. 2015;21:4461–4472. doi: 10.1158/1078-0432.CCR-14-2413.
  • 43. Schwarz RF, et al. Spatial and temporal heterogeneity in high-grade serous ovarian cancer: a phylogenetic analysis. PLoS Med. 2015;12:e1001789. doi: 10.1371/journal.pmed.1001789.
  • 44. Turajlic S, McGranahan N, Swanton C. Inferring mutational timing and reconstructing tumour evolutionary histories. Biochim. Biophys. Acta. 2015;1855:264–275. doi: 10.1016/j.bbcan.2015.03.005.
  • 45. Hong WS, Shpak M, Townsend JP. Inferring the origin of metastases from cancer phylogenies. Cancer Res. 2015;75:4021–4025. doi: 10.1158/0008-5472.CAN-15-1889.
  • 46. Ling S, et al. Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution. Proc. Natl Acad. Sci. USA. 2015;112:E6496–E6505. doi: 10.1073/pnas.1519556112.
  • 47. Sottoriva A, et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 2015;47:209–216. doi: 10.1038/ng.3214. This study on the evolution of colorectal cancer illustrates the importance of deep evolutionary theory in interpreting genomic data from tumours. It provides evidence that largely selectively neutral mutations can occur, in contrast to one of the two evolutionary principles of Nowell (reference 2) and others: that cancer evolves by a gradual series of genomic aberrations and that there is strong selection for those aberrations that are more favourable to tumour progression.
  • 48. Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 2016;48:238–244. doi: 10.1038/ng.3489.
  • 49. Shi H, et al. Acquired resistance and clonal evolution in melanoma during BRAF inhibitor therapy. Cancer Discov. 2014;4:80–93. doi: 10.1158/2159-8290.CD-13-0642.
  • 50. Yates LR, et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 2015;21:751–759. doi: 10.1038/nm.3886.
  • 51. Andor N, Harness JV, Müller S, Mewes HW, Petritsch C. EXPANDS: expanding ploidy and allele frequency on nested subpopulations. Bioinformatics. 2014;30:50–60. doi: 10.1093/bioinformatics/btt622.
  • 52. Juric D, et al. Convergent loss of PTEN leads to clinical resistance to a PI(3)Kα inhibitor. Nature. 2015;518:240–244. doi: 10.1038/nature13948.
  • 53. Morrissy AS, et al. Divergent clonal selection dominates medulloblastoma at recurrence. Nature. 2016;529:351–357. doi: 10.1038/nature16478.
  • 54. Ding L, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510. doi: 10.1038/nature10738.
  • 55. Fisher R, Pusztai L, Swanton C. Cancer heterogeneity: implications for targeted therapeutics. Br. J. Cancer. 2013;108:479–485. doi: 10.1038/bjc.2012.581.
  • 56. Cooke SL, et al. Intra-tumour genetic heterogeneity and poor chemoradiotherapy response in cervical cancer. Br. J. Cancer. 2011;104:361–368. doi: 10.1038/sj.bjc.6605971.
  • 57. Almendro V, et al. Inference of tumor evolution during chemotherapy by computational modeling and in situ analysis of genetic and phenotypic cellular diversity. Cell Rep. 2014;6:514–527. doi: 10.1016/j.celrep.2013.12.041.
  • 58. Wangsa D, et al. Phylogenetic analysis of multiple FISH markers in oral tongue squamous cell carcinoma suggests that a diverse distribution of copy number changes is associated with poor prognosis. Int. J. Cancer. 2016;138:98–109. doi: 10.1002/ijc.29691.
  • 59. Sottoriva A, et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc. Natl Acad. Sci. USA. 2013;110:4009–4014. doi: 10.1073/pnas.1219747110.
  • 60. McGranahan N, et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science. 2016;351:1463–1469. doi: 10.1126/science.aaf1490.
  • 61. Zhang J, et al. Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science. 2014;346:256–259. doi: 10.1126/science.1256930.
  • 62. Comen E, Norton L, Massagué J. Clinical implications of cancer self-seeding. Nat. Rev. Clin. Oncol. 2011;8:369–377. doi: 10.1038/nrclinonc.2011.64.
  • 63. Marusyk A, et al. Non-cell-autonomous driving of tumour growth supports sub-clonal heterogeneity. Nature. 2014;514:54–58. doi: 10.1038/nature13556.
  • 64. Sanborn JZ, et al. Phylogenetic analyses of melanoma reveal complex patterns of metastatic dissemination. Proc. Natl Acad. Sci. USA. 2015;112:10995–11000. doi: 10.1073/pnas.1508074112.
  • 65. Tsao J, et al. Tracing cell fates in human colorectal tumors from somatic microsatellite mutations: evidence of adenomas with stem cell architecture. Am. J. Pathol. 1998;153:1189–1200. doi: 10.1016/S0002-9440(10)65663-5.
  • 66. Desper R, et al. Inferring tree models of oncogenesis from comparative genomic hybridization data. J. Comput. Biol. 1999;6:37–51. doi: 10.1089/cmb.1999.6.37. This is the first report to suggest that there might be difficulties with modelling tumour progression as a tree construction problem in phylogenetics.
  • 67. Papaemmanuil E, et al. Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood. 2013;122:3616–3627. doi: 10.1182/blood-2013-08-518886.
  • 68. Zhao Z, et al. Early and multiple origins of metastatic lineages within primary tumors. Proc. Natl Acad. Sci. USA. 2016;113:2140–2145. doi: 10.1073/pnas.1525677113. This study is an important advance over prior bulk sequencing studies for at least three reasons: it compares different phylogenetic methods and draws inferences only when the methods agree on the tree topology; it combines the SNVs in a manner that does not require the use of variant allele frequencies to infer subclones; and it provides clear evidence that some metastases branch early and in parallel, whereas others have a single late origin, reconciling contradictory conclusions reached by earlier studies.
  • 69. Pennington G, Smith CA, Shackney S, Schwartz R. Reconstructing tumor phylogenies from heterogeneous single-cell data. J. Bioinform. Comput. Biol. 2007;5:407–427. doi: 10.1142/s021972000700259x.
  • 70. Bashashati A, et al. Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational profiling. J. Pathol. 2013;231:21–34. doi: 10.1002/path.4230.
  • 71. Navin N, et al. Tumour evolution inferred by single-cell sequencing. Nature. 2011;472:90–94. doi: 10.1038/nature09807. This is the seminal paper in developing and demonstrating the biotechnology to perform scSeq in tumours and apply it to phylogenetic inferences of single tumours.
  • 72. Klein CA. Selection and adaptation during metastatic cancer progression. Nature. 2013;501:365–372. doi: 10.1038/nature12628.
  • 73. Brocks D, et al. Intratumor DNA methylation heterogeneity reflects clonal evolution in aggressive prostate cancer. Cell Rep. 2014;8:798–806. doi: 10.1016/j.celrep.2014.06.053. This bulk tumour phylogeny study is of interest because the authors combine CNV data and DNA methylation data, showing a high correlation of inferred inter-sample evolutionary distances between inferences derived from genetic and from epigenetic data.
  • 74. Hastings P, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat. Rev. Genet. 2009;10:551–564. doi: 10.1038/nrg2593.
  • 75. Attolini CSO, Michor F. Evolutionary theory of cancer. Ann. NY Acad. Sci. 2009;1168:23–51. doi: 10.1111/j.1749-6632.2009.04880.x.
  • 76. Frumkin D, et al. Cell lineage analysis of a mouse tumor. Cancer Res. 2008;68:5924–5931. doi: 10.1158/0008-5472.CAN-07-6216.
  • 77. Salk JJ, Horwitz MS, Risques RA. Passenger mutations as a marker of clonal cell lineages in emerging neoplasia. Semin. Cancer Biol. 2010;20:294–303. doi: 10.1016/j.semcancer.2010.10.008.
  • 78. Shlush LI, et al. Cell lineage analysis of acute leukemia relapse uncovers the role of replication-rate heterogeneity and microsatellite instability. Blood. 2012;120:603–612. doi: 10.1182/blood-2011-10-388629.
  • 79. Sottoriva A, Spiteri I, Shibata D, Curtis C, Tavaré S. Single-molecule genomic data delineate patient-specific tumor profiles and cancer stem cell organization. Cancer Res. 2013;73:41–49. doi: 10.1158/0008-5472.CAN-12-2273.
  • 80. Xu X, et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell. 2012;148:886–895. doi: 10.1016/j.cell.2012.02.025.
  • 81. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP. Bayesian inference of phylogeny and its impact on evolutionary biology. Science. 2001;294:2310–2314. doi: 10.1126/science.1065889.
  • 82. Felsenstein J. Inferring Phylogenies. Sinauer Associates, Inc.; 2004.
  • 83. Boutros PC, et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat. Genet. 2015;47:736–745. doi: 10.1038/ng.3315.
  • 84. Chowdhury SA, et al. Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations. Bioinformatics. 2013;29:i189–i198. doi: 10.1093/bioinformatics/btt205.
  • 85. Schwarz RF, et al. Phylogenetic quantification of intra-tumour heterogeneity. PLoS Comput. Biol. 2014;10:e1003535. doi: 10.1371/journal.pcbi.1003535. This work provides an important example of a robust cross-platform computational tool for tumour-specific phylogenetic inference, MEDICC, which carries out phylogenetic analysis of multiple samples from a tumour by quantifying intra-tumour heterogeneity while taking into account dependencies between genomic changes.
  • 86. Yuan K, Sakoparnig T, Markowetz F, Beerenwinkel N. BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biol. 2015;16:36. doi: 10.1186/s13059-015-0592-6. This study represents an exciting advance in the development and implementation of tumour phylogeny methods for third-party use, developing a full Bayesian model that can be applied to both bulk sequencing data and single-cell data.
  • 87. Jahn K, Kuipers J, Beerenwinkel N. Tree inference for single-cell data. Genome Biol. 2016;17:96. doi: 10.1186/s13059-016-0936-x.
  • 88. Nicolau M, Levine AJ, Carlsson G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc. Natl Acad. Sci. USA. 2011;108:7265–7270. doi: 10.1073/pnas.1102826108.
  • 89. Kim KI, Simon R. Using single cell sequencing data to model the evolutionary history of a tumor. BMC Bioinformatics. 2014;15:27. doi: 10.1186/1471-2105-15-27.
  • 90. Misra N, Szczurek E, Vingron M. Inferring the paths of somatic evolution in cancer. Bioinformatics. 2014;30:2456–2463. doi: 10.1093/bioinformatics/btu319.
  • 91. Lecca P, Casiraghi N, Demichelis F. Defining order and timing of mutations during cancer progression: the TO-DAG probabilistic graphical model. Front. Genet. 2015;6:309. doi: 10.3389/fgene.2015.00309.
  • 92. Urbschat S, et al. Clonal cytogenetic progression within intratumorally heterogeneous meningiomas predicts tumor recurrence. Int. J. Oncol. 2011;39:1601–1608. doi: 10.3892/ijo.2011.1199.
  • 93. Fearon E, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990;61:759–767. doi: 10.1016/0092-8674(90)90186-i.
  • 94. Tomlinson I, Bodmer W. Selection, the mutation rate and cancer: ensuring that the tail does not wag the dog. Nat. Med. 1999;5:11–12. doi: 10.1038/4687.
  • 95. Nowak MA, et al. The role of chromosomal instability in tumor initiation. Proc. Natl Acad. Sci. USA. 2002;99:16226–16231. doi: 10.1073/pnas.202617399.
  • 96. Khalique L, et al. The clonal evolution of metastases from primary serous epithelial ovarian cancers. Int. J. Cancer. 2009;124:1579–1586. doi: 10.1002/ijc.24148.
  • 97. Hou Y, et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell. 2012;148:873–885. doi: 10.1016/j.cell.2012.02.028.
  • 98. Brastianos PK, et al. Genomic characterization of brain metastases reveals branched evolution and potential therapeutic targets. Cancer Discov. 2015;5:1164–1177. doi: 10.1158/2159-8290.CD-15-0369. This is one of the most interesting and instructive bulk tumour phylogeny studies to date because the study design was to investigate the evolution of metastasis across many tumour types, and the sample size (86) is among the largest for the bulk tumour studies carried out thus far.
  • 99. Joung JG, et al. Nonlinear tumor evolution from dysplastic nodules to hepatocellular carcinoma. Oncotarget. 2016. doi: 10.18632/oncotarget.10502.
  • 100. Paracchini L, et al. Regional and temporal heterogeneity of epithelial ovarian cancer tumor biopsies: implications for therapeutic strategies. Oncotarget. 2016. doi: 10.18632/oncotarget.10505.
  • 101. Bolli N, et al. Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat. Commun. 2014;5:2997. doi: 10.1038/ncomms3997.
  • 102. Heselmeyer-Haddad K, et al. Single-cell genetic analysis reveals insights into clonal development of prostate cancers and indicates loss of PTEN as a marker of poor prognosis. Am. J. Pathol. 2014;184:2671–2686. doi: 10.1016/j.ajpath.2014.06.030.
  • 103. Janocko LE, et al. Distinctive patterns of Her-2/neu, c-myc, and cyclin D1 gene amplification by fluorescence in situ hybridization in primary breast cancers. Cytometry. 2001;46:136–149. doi: 10.1002/cyto.1098.
  • 104. Maley CC, et al. Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nat. Genet. 2006;38:468–473. doi: 10.1038/ng1768.
  • 105. Andor N, et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat. Med. 2016;22:105–113. doi: 10.1038/nm.3984.
  • 106. Naxerova K, et al. Hypermutable DNA chronicles the evolution of human colon cancer. Proc. Natl Acad. Sci. USA. 2014;111:E1889–E1898. doi: 10.1073/pnas.1400179111.
  • 107. Tsao J, et al. Colorectal adenoma and cancer divergence: evidence of multilineage progression. Am. J. Pathol. 1999;154:1815–1824. doi: 10.1016/S0002-9440(10)65437-5.
  • 108. von Heydebreck A, Gunawan B, Füzesi L. Maximum likelihood estimation of oncogenetic tree models. Biostatistics. 2004;5:545–556. doi: 10.1093/biostatistics/kxh007.
  • 109. Bilke S, et al. Inferring a tumor progression model for neuroblastoma from genomic data. J. Clin. Oncol. 2005;23:7322–7331. doi: 10.1200/JCO.2005.03.2821.
  • 110. Beerenwinkel N, et al. Learning multiple evolutionary pathways from cross-sectional data. J. Comput. Biol. 2005;12:584–598. doi: 10.1089/cmb.2005.12.584.
  • 111. Beerenwinkel N, et al. Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics. 2005;21:2106–2107. doi: 10.1093/bioinformatics/bti274.
  • 112. Hjelm M, Höglund M, Lagergren J. New probabilistic network models and algorithms for oncogenesis. J. Comput. Biol. 2006;13:853–865. doi: 10.1089/cmb.2006.13.853.
  • 113. Gerstung M, Baudis M, Moch H, Beerenwinkel N. Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics. 2009;25:2809–2815. doi: 10.1093/bioinformatics/btp505.
  • 114. Rahnenführer J, et al. Estimating cancer survival and clinical outcome based on genetic tumor progression scores. Bioinformatics. 2005;21:2438–2446. doi: 10.1093/bioinformatics/bti312.
  • 115. Bogojeska J, Alexa A, Altmann A, Lengauer T, Rahnenführer J. Rtreemix: an R package for estimating evolutionary pathways and genetic progression scores. Bioinformatics. 2008;24:2391–2392. doi: 10.1093/bioinformatics/btn410.
  • 116. Bogojeska J, Lengauer T, Rahnenführer J. Stability analysis of mixtures of mutagenetic trees. BMC Bioinformatics. 2008;9:165. doi: 10.1186/1471-2105-9-165.
  • 117. Attolini CS, et al. A mathematical framework to determine the temporal sequence of somatic genetic events in cancer. Proc. Natl Acad. Sci. USA. 2010;107:17604–17609. doi: 10.1073/pnas.1009117107.
  • 118. Cheng Y, et al. A mathematical methodology for determining the temporal order of pathway alterations arising during gliomagenesis. PLoS Comput. Biol. 2012;8:e1002337. doi: 10.1371/journal.pcbi.1002337.
  • 119. Jiao W, Vembu S, Deshwar AG, Stein L, Morris Q. Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics. 2014;15:35. doi: 10.1186/1471-2105-15-35.
  • 120. Shahrabi Farahani H, Lagergren J. Learning oncogenetic networks by reducing to mixed integer linear programming. PLoS ONE. 2013;8:e65773. doi: 10.1371/journal.pone.0065773.
  • 121. Ramazzotti D, et al. CAPRI: efficient inference of cancer progression models from cross-sectional data. Bioinformatics. 2015;31:3016–3026. doi: 10.1093/bioinformatics/btv296.
  • 122. Desper R, et al. Distance-based reconstruction of tree models for oncogenesis. J. Comput. Biol. 2000;7:789–803. doi: 10.1089/10665270050514936.
  • 123. Desper R, Khan J, Schäffer AA. Tumor classification using phylogenetic methods on expression data. J. Theor. Biol. 2004;228:477–496. doi: 10.1016/j.jtbi.2004.02.021.
  • 124. Riester M, Attolini C, Downey RJ, Singer S, Michor F. A differentiation-based phylogeny of cancer subtypes. PLoS Comput. Biol. 2010;6:e1000777. doi: 10.1371/journal.pcbi.1000777.
  • 125. Liu J, Bandyopadhyay N, Ranka S, Baudis M, Kahveci T. Inferring progression models for CGH data. Bioinformatics. 2009;25:2208–2215. doi: 10.1093/bioinformatics/btp365.
  • 126. Hainke K, Rahnenführer J, Fried R. Cumulative disease progression models for cross-sectional data: a review and comparison. Biom. J. 2012;54:617–640. doi: 10.1002/bimj.201100186.
  • 127. Diaz-Uriarte R. Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling. BMC Bioinformatics. 2015;16:41. doi: 10.1186/s12859-015-0466-7.
  • 128. Sprouffske K, Pepper JW, Maley CC. Accurate reconstruction of the temporal order of mutations in neoplastic progression. Cancer Prev. Res. 2011;4:1135–1144. doi: 10.1158/1940-6207.CAPR-10-0374.
  • 129. Letouzé E, Allory Y, Bollet MA, Radvanyi F, Guyon F. Analysis of the copy number profiles of several tumor samples from the same patient reveals the successive steps in tumorigenesis. Genome Biol. 2010;11:R76. doi: 10.1186/gb-2010-11-7-r76.
  • 130.Hajirasouliha I, Mahmoody A, Raphael BJ. A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics. 2014;30:i78–i86. doi: 10.1093/bioinformatics/btu284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.El-Kebir M, Oesper L, Acheson-Field H, Raphael BJ. Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics. 2015;31:i62–i70. doi: 10.1093/bioinformatics/btv261. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Popic V, et al. Fast and scalable inference of multi-sample cancer lineages. Genome Biol. 2015;16:91. doi: 10.1186/s13059-015-0647-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Ha G, et al. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 2014;24:1881–1893. doi: 10.1101/gr.180281.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Greenman CD, et al. Estimation of rearrangement phylogeny for cancer genomes. Genome Res. 2012;22:346–361. doi: 10.1101/gr.118414.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Deshwar AG, et al. PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol. 2015;16:35. doi: 10.1186/s13059-015-0602-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136.El-Kebir M, Satas G, Oesper L, Raphael BJ. Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures. Cell Syst. 2016;3:43–53. doi: 10.1016/j.cels.2016.07.004. [DOI] [PubMed] [Google Scholar]
  • 137.Jiang Y, Qiu Y, Minn AJ, Zhang NR. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc. Natl Acad. Sci. USA. 2016;113:E5528–E5537. doi: 10.1073/pnas.1522203113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Qiao Y, et al. SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization. Genome Biol. 2014;15:443. doi: 10.1186/s13059-014-0443-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Schwartz R, Shackney SE. Applying unmixing to gene expression data for tumor phylogeny inference. BMC Bioinformatics. 2010;11:42. doi: 10.1186/1471-2105-11-42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Miller CA, et al. SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput. Biol. 2014;10:e1003665. doi: 10.1371/journal.pcbi.1003665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Roth A, et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods. 2014;11:396–398. doi: 10.1038/nmeth.2883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Zare H, et al. Inferring clonal composition from multiple sections of a breast cancer. PLoS Comput. Biol. 2014;10:e1003703. doi: 10.1371/journal.pcbi.1003703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Niknafs N, Beleva-Guthrie V, Naiman DQ, Karchin R. Subclonal hierarchy inference from somatic mutations: automatic reconstruction of cancer evolutionary trees from multi-region next generation sequencing. PLoS Comput. Biol. 2015;11:e1004416. doi: 10.1371/journal.pcbi.1004416. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144.Malikic S, McPherson AA, Donmez N, Sahinalp CS. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics. 2015;31:1349–1356. doi: 10.1093/bioinformatics/btv003. [DOI] [PubMed] [Google Scholar]
  • 145.Navin N, et al. Inferring tumor progression from genomic heterogeneity. Genome Res. 2010;20:68–80. doi: 10.1101/gr.099622.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.Newburger DE, et al. Genome evolution during progression to breast cancer. Genome Res. 2013;23:1097–1106. doi: 10.1101/gr.151670.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Stachler MD, et al. Paired exome analysis of Barrett’s esophagus and adenocarcinoma. Nat. Genet. 2015;47:1047–1055. doi: 10.1038/ng.3343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.Gundem G, et al. The evolutionary history of lethal metastatic prostate cancer. Nature. 2015;520:353–357. doi: 10.1038/nature14347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149.Gertz EM, et al. FISHtrees 3.0: tumor phylogenetics using a ploidy probe. PLoS ONE. 2016;11:e0158569. doi: 10.1371/journal.pone.0158569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Spiro A, Shapiro E. Accuracy of answers to cell lineage questions depends on single-cell genomics data quality and quantity. PLoS Comput. Biol. 2016;12:e1004983. doi: 10.1371/journal.pcbi.1004983. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 151.Chowdhury SA, et al. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLoS Comput. Biol. 2014;10:e1003740. doi: 10.1371/journal.pcbi.1003740. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Chowdhury SA, et al. Inferring models of multiscale copy number evolution for single-tumor phylogenetics. Bioinformatics. 2015;31:i258–i267. doi: 10.1093/bioinformatics/btv233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Zhou J, Lin Y, Rajan V, Hoskins W, Tang J. In: Proc. 15th Int. Workshop on Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science. Pop M, Touzet H, editors. Vol. 9289. Springer; 2015. pp. 108–120. [Google Scholar]
  • 154.Ross EM, Markowetz F. OncoNEM: inferring tumor evolution from single-cell sequencing data. Genome Biol. 2016;17:69. doi: 10.1186/s13059-016-0929-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454. [DOI] [PubMed] [Google Scholar]
  • 156.Wang Y, et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature. 2014;512:155–160. doi: 10.1038/nature13600. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Gao R, et al. Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nat. Genet. 2016;48:1119–1130. doi: 10.1038/ng.3641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Tao Y, et al. Rapid growth of a hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data. Proc. Natl Acad. Sci. USA. 2011;108:12042–12047. doi: 10.1073/pnas.1108715108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Lengauer C, Kinzler KW, Vogelstein B. Genetic instabilities in human cancers. Nature. 1998;396:643–649. doi: 10.1038/25292. [DOI] [PubMed] [Google Scholar]
  • 160.Dewhurst SM, et al. Tolerance of whole-genome doubling propagates chromosomal instability and accelerates cancer genome evolution. Cancer Discov. 2014;4:175–185. doi: 10.1158/2159-8290.CD-13-0285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 161.Aguilera A, Gómez-González B. Genome instability: a mechanistic view of its causes and consequences. Nat. Rev. Genet. 2008;9:204–217. doi: 10.1038/nrg2268. [DOI] [PubMed] [Google Scholar]
  • 162.Youn A, Simon R. Estimating the order of mutations during tumorigenesis from tumor genome sequencing data. Bioinformatics. 2012;28:1555–1561. doi: 10.1093/bioinformatics/bts168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Purdom E, et al. Methods and challenges in timing chromosomal abnormalities within cancer samples. Bioinformatics. 2013;29:3113–3120. doi: 10.1093/bioinformatics/btt546. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Beaumont MA. Approximate Bayesian computation in evolution and ecology. Annu. Rev. Ecol. Evol. Syst. 2010;41:379–406. [Google Scholar]
  • 165.Tsao J, et al. Genetic reconstruction of individual colorectal tumor histories. Proc. Natl Acad. Sci. USA. 2000;97:1236–1241. doi: 10.1073/pnas.97.3.1236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166.Ding L, et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature. 2010;464:999–1005. doi: 10.1038/nature08989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167.Gerlinger M, et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat. Genet. 2014;46:225–233. doi: 10.1038/ng.2891. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Eyles J, et al. Tumor cells disseminate early, but immunosurveillance limits metastatic outgrowth, in a mouse model of melanoma. J. Clin. Invest. 2010;120:2030–2039. doi: 10.1172/JCI42002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169.Catanzaro D, Shackney S, Schäffer AA, Schwartz R. Classifying the progression of ductal carcinoma from single-cell sampled data via integer linear programming: a case study. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016;13:643–655. doi: 10.1109/TCBB.2015.2476808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170.Garey MR, Johnson DS. Computers and Intractability. WH Freeman; New York: 2002. [Google Scholar]
  • 171.Salk JJ, et al. Clonal expansions in ulcerative colitis identify patients with neoplasia. Proc. Natl Acad. Sci. USA. 2009;106:20871–20876. doi: 10.1073/pnas.0909428106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172.McGlynn KA, et al. A phylogenetic analysis identifies heterogeneity among hepatocellular carcinomas. Hepatology. 2002;36:1341–1348. doi: 10.1053/jhep.2002.37261. [DOI] [PubMed] [Google Scholar]
  • 173.Oesper L, Mahmoody A, Raphael BJ. THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol. 2013;14:R80. doi: 10.1186/gb-2013-14-7-r80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174.Greenman C, et al. Patterns of somatic mutations in human cancer genomes. Nature. 2007;446:153–158. doi: 10.1038/nature05610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 175.Turajlic S, Swanton C. Metastasis as an evolutionary process. Science. 2016;352:169–175. doi: 10.1126/science.aaf2784. [DOI] [PubMed] [Google Scholar]
  • 176.Martincorena I, et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science. 2015;348:880–886. doi: 10.1126/science.aaa6806. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 177.Subramanian A, Shackney S, Schwartz R. Inference of tumor phylogenies from genomic assays on heterogeneous samples. J. Biomed. Biotechnol. 2012;2012:797812. doi: 10.1155/2012/797812. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 178.Szabo A, Boucher K. Estimating an oncogenetic tree when false negatives and positives are present. Math. Biosci. 2002;176:219–236. doi: 10.1016/s0025-5564(02)00086-x. [DOI] [PubMed] [Google Scholar]
  • 179.De Sano L, et al. TRONCO: an R package for the inference of cancer progression models from heterogeneous genomic data. Bioinformatics. 2016;32:1911–1913. doi: 10.1093/bioinformatics/btw035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 180.Caravagna G, et al. Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc. Natl Acad. Sci. USA. 2016;113:E4025–E4034. doi: 10.1073/pnas.1520213113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 181.Pennington G, Smith CA, Shackney S, Schwartz R. Expectation-maximization method for reconstructing tumor phylogenies from single-cell data. Comput. Syst. Bioinformatics Conf. 2006;2006:371–380. [PubMed] [Google Scholar]
  • 182.Potter NE, et al. Single cell mutational profiling and clonal phylogeny in cancer. Genome Res. 2013;23:2115–2125. doi: 10.1101/gr.159913.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 183.Cooper CS, et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nat. Genet. 2015;47:367–372. doi: 10.1038/ng.3221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 184.Cresswell GD, et al. Intra-tumor genetic heterogeneity in Wilms tumor: clonal evolution and clinical implications. EBioMedicine. 2016;9:120–129. doi: 10.1016/j.ebiom.2016.05.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 185.Yang Z, et al. Single-cell sequencing reveals variants in ARID1A, GPRC5A and MLL2 driving self-renewal of human bladder cancer stem cells. Eur. Urol. 2017;71:8–12. doi: 10.1016/j.eururo.2016.06.025. [DOI] [PubMed] [Google Scholar]
  • 186.Ozawa T, et al. Most human non-GCIMP glioblastoma subtypes evolve from a common proneural-like precursor glioma. Cancer Cell. 2014;26:288–300. doi: 10.1016/j.ccr.2014.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 187.Eskilsson E, et al. EGFRvIII mutations can emerge as late and heterogenous events in glioblastoma development and promote angiogenesis through Src activation. Neuro Oncol. 2016;18:1644–1655. doi: 10.1093/neuonc/now113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 188.Notta F, et al. Evolution of human BCR-ABL1 lymphoblastic leukaemia-initiating cells. Nature. 2011;469:362–367. doi: 10.1038/nature09733. [DOI] [PubMed] [Google Scholar]
  • 189.Campbell PJ, et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc. Natl Acad. Sci. USA. 2008;105:13081–13086. doi: 10.1073/pnas.0801523105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 190.Lamy P, et al. Paired exome analysis reveals clonal evolution and potential therapeutic targets in urothelial carcinoma. Cancer Res. 2016;76:5894–5906. doi: 10.1158/0008-5472.CAN-16-0436. [DOI] [PubMed] [Google Scholar]
  • 191.Eirew P, et al. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature. 2015;518:422–426. doi: 10.1038/nature13952. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

supplementary table 1
supplementary table 2