Abstract
The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential to open up the design of treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential.
The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks.
In this review, we present the state-of-the-art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.
Keywords: Single-cell sequencing, Cancer evolution, Tumour heterogeneity, Phylogenetics
1. Tumour evolution and heterogeneity
Cancerous cells experience complex and diverse genomic aberrations which may induce characteristic hallmarks [1], [2] and allow tumour progression. The view of a sequence of genetic changes providing a fitness advantage and leading to a clonal expansion of cells inheriting those characteristics was crystallised by Nowell [3], and exemplified for colon cancer [4]. The consequences of an evolutionary model of competing clones in a Darwinian framework are complex and heterogeneous tumours, as were also initially observed [5] and seen as a founder of metastases [6]. Tumour heterogeneity was quickly established and examined (as reviewed in [7]) but the evolutionary view of competing populations of tumour cells came back into focus with the turn of the millennium [8], [9], [10] with the arrival of genome sequencing.
The collection of large amounts of genetic data with next generation sequencing (NGS), spearheaded by the compilation of large public databases by consortia like The Cancer Genome Atlas (TCGA) [11] or the International Cancer Genome Consortium (ICGC) [12], cemented the view of cancer as a dynamic evolutionary process with clones arising and expanding, and descendant cells differentiating into further competing subclones [13], [14], [15]. Detailed genomic data have also uncovered the clonal complexity and heterogeneity across many cancer types, as recently reviewed [16].
The negative effects of clonal diversity on tumour progression were observed clinically for esophageal adenocarcinoma [17], allowing the use of diversity as a biomarker [18]. This example spurred the examination of the clinical implications of the genetic diversity resulting from tumour heterogeneity [19]. Heterogeneity or diversity is also a cause of drug resistance or relapse [15], [20], [21], [22]. Treatment may target the most common clone; its remission, together with the new selective pressures of treatment, may then allow smaller subclones to emerge, develop resistance and progress [23], [24], [25]. Subclones may also cooperate [26], which connects back to the ideas of Heppner [7], who emphasised that subclones belong to a complex tumour ecosystem. The order of mutations can also affect disease progression and response to treatment [27]. The large amounts of genomic data have therefore not only shone light on the complex makeup of tumours, but now highlight how a deeper understanding of their diversity and evolutionary history is needed for more effective and precise cancer therapies [15], [16], [25], [28], [29], [30].
1.1. Decoding heterogeneity and evolutionary histories
Typically, approaches to study heterogeneity and clonal evolution have looked at bulk samples which mix the DNA of thousands or millions of cells before sequencing. The resulting output is an estimate of the frequencies of various variants in each sample. To understand the diversity and subclone structure, one needs to be able to decode the evolutionary history from such bulk data. The problem of moving from variant frequencies to evolutionary histories reduces to one of deconvolving the mutations in the mixture into clones and their phylogenetic relationship. We review methods developed for resolving this problem in Section 2.
As depicted in Fig. 1, there are situations where the frequencies alone cannot distinguish between different histories. This can be improved by taking multiple samples [31], [32] or samples at different times [33]. The results from bulk data, however, tend to provide rather low-resolution indications of the evolutionary history and heterogeneity [34], [35] because low-frequency mutations cannot be reliably separated into new clones and tend to be placed together or in existing clones. Again, multiple samples can help in improving the resolution.
To arrive at the highest possible resolution of a tumour's history, the sequencing of individual cells has been advocated [35]. All cells in the body and in tumours descend along a binary genealogical tree of which the cells themselves are the taxa, as depicted in Fig. 2. Reconstructing the tree then requires no deconvolution. It does, though, require that mutations, once they arise, are preserved from generation to generation and that they occur only once in the evolutionary tree, a condition known as the infinite sites assumption. With this assumption and perfect calling of the mutations in each cell, the phylogeny can be reconstructed very efficiently [36]. The challenge with single-cell data, though, is that the errors in mutation calling can be very large and unbalanced. In particular, when the single copy of a cell's DNA is amplified to allow it to be sequenced, the coverage may be rather uneven so that some genome positions cannot be called and are effectively missing. Due to feedback in the amplification, one allele may happen to predominate at certain genomic positions so that mutations on the other allele do not appear in the sequencing data. Algorithms have therefore been developed to deal specifically with single-cell data, which we review in Section 4 after discussing the advances in single-cell sequencing in Section 3. An overview of the sequencing and phylogenetic reconstruction processes for both bulk and single-cell samples is presented in Fig. 3.
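To make the role of the infinite sites assumption concrete: with error-free calls and an unmutated root, a binary cell-by-mutation matrix fits a perfect phylogeny exactly when the sets of cells carrying any two mutations are either nested or disjoint. The following is a minimal sketch of that pairwise check. It is quadratic in the number of mutations and is therefore simpler, but slower, than the efficient reconstruction cited above; the function name and the toy matrices are illustrative assumptions.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if a binary cell-by-mutation matrix fits a rooted perfect phylogeny.

    matrix[i][j] == 1 means mutation j was called in cell i. The root is
    assumed to be unmutated (all zeros), and calls are assumed error-free.
    """
    n_mut = len(matrix[0]) if matrix else 0
    # Set of cells carrying each mutation.
    carriers = [
        {i for i, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_mut)
    ]
    for a, b in combinations(carriers, 2):
        shared = a & b
        # Conflict: the two sets overlap but neither contains the other.
        if shared and shared != a and shared != b:
            return False
    return True


# Rows are cells, columns are mutations (toy data).
clean = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
conflicting = [
    [1, 1],
    [1, 0],
    [0, 1],
]
print(admits_perfect_phylogeny(clean))        # True
print(admits_perfect_phylogeny(conflicting))  # False
```

A single false negative or false positive call can turn a compatible matrix into a conflicting one, which is why this exact test is of limited use on noisy single-cell data.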
2. Bulk sequencing phylogeny approaches
Due to the higher prevalence of bulk-sequencing data, most approaches to reconstruct evolutionary histories of individual tumours are based on this data type. Sequencing the admixed cell populations of hundreds of thousands or even millions of cells that compose a bulk sample only reveals the allele frequencies of the individual mutations in the mixture, leaving the number of present subclones, their prevalences, their individual mutation profiles and their genealogy undetermined [35]. Phrased in terms of classic phylogeny reconstruction, this is a situation where the number of taxa, their relative population sizes, their individual character states, as well as their phylogenetic relationships need to be established, while the only information available is the set of characters and an estimate of their relative frequencies across the admixed populations. This constitutes a highly underdetermined problem for which classic approaches to phylogeny reconstruction are not suited. Hence many tools customised to this problem have been developed in recent years.
2.1. Phylogeny reconstruction from SNV data
An overview of software tools for reconstructing tumour evolution based on single-nucleotide variant (SNV) data is given in Table 1. We discuss in the following the shared and distinctive features of the underlying methods.
Table 1.
Software | Year | Reference | Phylogeny | Multiple samples | Inference |
---|---|---|---|---|---|
TrAp | 2013 | [37] | Y | N | Exhaustive search |
Clomial | 2014 | [31] | N | Y | Binomial/EM |
PhyloSub | 2014 | [32] | Y | Y | Tree-structured stick-breaking/MCMC |
PyClone | 2014 | [38] | N | Y | Dirichlet process, beta-binomial/MCMC |
RecBTP | 2014 | [39] | Y | N | Approximation algorithm |
SciClone | 2014 | [40] | N | N | Beta mixture model |
AncesTree | 2015 | [41] | Y | Y | Optimisation/MILP |
CITUP | 2015 | [42] | Y | Y | Optimisation/QIP |
LICHeE | 2015 | [43] | Y | Y | Heuristic |
BayClone | 2015 | [44] | N | Y | Gibbs sampling/Metropolis-Hastings |
CTPsingle | 2016 | [45] | Y | N | Dirichlet process, beta-binomial/MCMC |
Cloe | 2016 | [46] | Y | Y | Metropolis-coupled MCMC |
An important preprocessing step for reconstructing tumour phylogenies from SNV data is the correction of allele frequencies for ploidy aberrations - due to copy number alterations (CNAs) or loss of heterozygosity (LOH) - to estimate the cellular prevalences of the mutations [38], [47]. In practice many SNV based approaches focus on mutations at copy number neutral sites [39], [40], [41], [42], [45], in which case the cellular prevalence of heterozygous mutations is just two times their relative allele frequency.
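For the copy number neutral case, the conversion from allele frequency to cellular prevalence is a single multiplication. The sketch below assumes a diploid site, a heterozygous mutation and a pure tumour sample; the function name and read counts are illustrative, and the cap at 1.0 is an added safeguard against sampling noise rather than part of any cited method.

```python
def cellular_prevalence(variant_reads, total_reads):
    """Estimate the fraction of cells carrying a heterozygous SNV.

    Assumes a copy-number-neutral (diploid) site in a pure tumour sample,
    so each carrying cell contributes one mutated allele out of two.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    vaf = variant_reads / total_reads
    # Sampling noise can push 2 * VAF slightly above 1, so cap the estimate.
    return min(1.0, 2.0 * vaf)


print(cellular_prevalence(30, 100))  # 30% of reads -> about 60% of cells
```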
A key assumption shared by nearly all approaches focusing on phylogeny reconstruction from SNV data is that of infinite sites which restricts the space of possible mutation histories in two ways: First, no genomic site is hit by more than one mutation throughout the entire evolutionary history of a tumour, and second, once present, a mutation persists in the whole lineage founded by the cell where it initially occurred. The motivation for this assumption is mainly its plausibility given the size of the genome and the relatively low number of mutations observed in tumour samples. However it also has the welcome side-effect of reducing the underdetermination of the deconvolution problem and the tree search space.
The next step common to most SNV based approaches is a clustering of mutations with approximately equal allele frequencies. Some approaches use Bayesian mixture models for this step [47], [48]. The assumption behind the clustering is that variants with identical frequency are either jointly present or jointly absent in every subpopulation. A scenario for such a connection to arise could be a driver mutation occurring in a cell with a pre-existing passenger mutation. Then the increased fitness of the cell with the driver and its descendants may have led to the extinction of all cells carrying only the passenger mutation. For mutation sets with a shared cell prevalence >50% such a connection is the only way they can fit on a single tree. This follows from the infinite sites assumption, which prevents mutations from being split onto separate tree parts, and the pigeonhole principle, by which some cell population of the tumour has to have both mutations as the sum of cell prevalences cannot exceed 100%. For smaller cell prevalences - especially for low-frequency mutations - it is less obvious why the assumption should be generally true. Two low-frequency mutations could have the same approximate cell prevalence by chance without the driver/passenger link described above and could still be erroneously clustered together. It has been shown that the deconvolution problem can be solved without grouping mutations by cellular prevalence [37]. However, the complexity of the problem increases significantly with increasing numbers of subclones, and indeed Strino et al. could only solve instances of up to 25 aberrations [37], such that tree inference would in most cases be restricted to a selection of mutations.
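The pigeonhole argument can be written out as simple arithmetic: if two mutations have prevalences p_a and p_b, at least p_a + p_b - 1 of the cells must carry both. A small sketch with illustrative function names:

```python
def minimum_overlap(prev_a, prev_b):
    """Smallest possible fraction of cells carrying both mutations."""
    return max(0.0, prev_a + prev_b - 1.0)


def must_share_lineage(prev_a, prev_b):
    """True if some cell must carry both mutations (pigeonhole principle).

    Under the infinite sites assumption this forces the two mutations onto
    the same root-to-leaf path of the tree.
    """
    return minimum_overlap(prev_a, prev_b) > 0.0


print(must_share_lineage(0.6, 0.6))  # True: at least ~20% of cells carry both
print(must_share_lineage(0.3, 0.3))  # False: the two could sit on separate branches
```

For the low-prevalence pair the check is inconclusive, which mirrors the caveat above: equal frequencies alone do not establish that two low-frequency mutations belong to the same subclone.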
Once the clustering is fixed, the remaining task is to arrange the mutations in a tree consistent with the cell prevalences of the mutations. The mutation states of the subclones and their relative frequencies in the sample follow immediately from the consistent tree. Consistency here means that the cellular prevalence of each node is at least as large as the sum of the prevalences of its child nodes. This is necessary as the nodes are then interpreted as subclones that contain all the mutations along the path from the root to this node, such that the prevalence of a mutation at a node has to be shared with the whole subtree below the node. This constraint is also referred to as the ‘sum rule’ [32]. While it substantially restricts the solution space, it is typically not enough to find a unique solution. For example, a linear chain of mutations sorted by decreasing prevalence is always consistent with a single sample. Biologically motivated constraints, such as minimising the number of populated subclones or the tree depth can be used to pick plausible topologies [37], [39].
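The sum rule and the resulting ambiguity can be illustrated with a small check. In this sketch a tree is given as a child-to-parent mapping and each node carries the cellular prevalence of its mutation cluster; the data structures, names and numbers are illustrative assumptions, not the representation used by any of the cited tools.

```python
def satisfies_sum_rule(parent, prevalence, tol=1e-9):
    """Check the sum rule for a mutation tree in a single sample.

    parent: dict mapping each non-root node to its parent node.
    prevalence: dict mapping every node to its cellular prevalence.
    """
    child_sum = {node: 0.0 for node in prevalence}
    for child, par in parent.items():
        child_sum[par] += prevalence[child]
    return all(prevalence[n] + tol >= child_sum[n] for n in prevalence)


def clone_fractions(parent, prevalence):
    """Fraction of cells whose genotype ends exactly at each node."""
    frac = dict(prevalence)
    for child, par in parent.items():
        frac[par] -= prevalence[child]
    return frac


prev = {"A": 1.0, "B": 0.6, "C": 0.3}
chain = {"B": "A", "C": "B"}      # A -> B -> C
branched = {"B": "A", "C": "A"}   # B and C are both children of A

print(satisfies_sum_rule(chain, prev), clone_fractions(chain, prev))
print(satisfies_sum_rule(branched, prev), clone_fractions(branched, prev))

# Raising the prevalence of C to 0.5 rules out the branched tree,
# while the linear chain remains consistent.
prev_high = {"A": 1.0, "B": 0.6, "C": 0.5}
print(satisfies_sum_rule(chain, prev_high))     # True
print(satisfies_sum_rule(branched, prev_high))  # False
```

With the first set of prevalences both topologies pass, but they imply different subclone compositions, so the single sample cannot decide between them.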
Here it is also advantageous that studies increasingly analyse multiple samples per patient. These could either be from spatially distinct tumour parts [49], tumour metastasis pairs, or longitudinal studies such as tumour/relapse pairs [20], or xenograft models [50]. When multiple samples of the same tumour are available, there is a second constraint, the ‘fork rule’, which states that if among two mutations, the first is more prevalent in one sample and the second in another sample, they need to be placed in separate branches [32]. In general, the more samples available the more topologies can be excluded, as long as their subclone composition differs sufficiently. However, in practice this process is complicated by inaccuracies in the estimated cell prevalences and possible errors in the clustering due to which no tree may be consistent with all data. One solution here is to find a tree that minimises the errors in the estimated cell prevalences to fit them to a tree [32], [42], or to exclude some mutations from the tree [41].
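The fork rule amounts to a pairwise comparison of prevalences across samples: an ancestor must be at least as prevalent as its descendant in every sample, so a flip in the ordering excludes both ancestral relationships. A minimal sketch assuming noise-free prevalences (the function name and return labels are illustrative):

```python
def pairwise_relation(prev_a, prev_b):
    """Classify how two mutations may be arranged given multi-sample prevalences.

    prev_a, prev_b: sequences of cellular prevalences, one value per sample.
    An ancestor must be at least as prevalent as its descendant in every sample.
    """
    pairs = list(zip(prev_a, prev_b))
    a_dominates = all(a >= b for a, b in pairs)
    b_dominates = all(b >= a for a, b in pairs)
    if a_dominates and b_dominates:
        return "either_order"        # identical prevalences in all samples
    if a_dominates:
        return "a_may_precede_b"
    if b_dominates:
        return "b_may_precede_a"
    return "separate_branches"       # fork rule: the ordering flips between samples


print(pairwise_relation([0.8, 0.5], [0.3, 0.2]))  # a_may_precede_b
print(pairwise_relation([0.8, 0.2], [0.3, 0.5]))  # separate_branches
```

Note that "a_may_precede_b" only states that an ancestral relationship is not excluded; placing the two mutations in separate branches may still be possible if the sum rule permits it. With real data the comparisons would need a tolerance for estimation error, which is where the error-minimising approaches mentioned above come in.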
While all SNV based reconstruction approaches make use of the combinatoric constraints, they employ vastly different methodologies. Three major lines can be identified: Some perform an exhaustive search enumerating all trees that fulfil the combinatoric constraints plus additional biological restrictions [37] or an approximation thereof [39]. Others represent the constraints via a directed ancestry graph, which contains the optimal solutions in the form of spanning trees [41], [43], and finally there is a group of Bayesian approaches that give a posterior distribution over the tree space, thereby quantifying uncertainty in the inference [32], [45]. Recently another Bayesian approach for tree inference has been proposed that merely penalises trees for violations of the infinite sites assumption instead of generally excluding them [46].
For high-frequency subclones, tree reconstruction from SNV bulk data has sufficient discriminative power to reveal their evolutionary relationships. However, for low-frequency populations, the signal in the admixed variant allele frequencies seems to be too weak for a reliable reconstruction [35]. Also, the clustering by allele frequency is less convincing for low-frequency mutations, leaving their correct placement in the tree a largely unsolved problem. Advances in sequencing technology towards longer reads may provide further constraints in the future, as mutations located on a single read cannot be placed in different tree branches.
2.2. Phylogeny reconstruction from SNV and CNA data
There exist a few approaches such as THetA [51], THetA2 [52] and TITAN [53] that use CNA data alone to infer subclones, but none of them reconstructs tumour phylogenies. More recently CNA and SNV data have been combined to increase the discriminative power in the reconstruction process. A summary of methods following this strategy and their key features are given in Table 2.
Table 2.
Software | Year | Reference | Phylogeny | Multiple samples | Inference |
---|---|---|---|---|---|
CHAT | 2014 | [54] | N | N | Dirichlet process Gaussian mixture model/MCMC |
CloneHD | 2014 | [55] | N | Y | HMM/local optimisation |
SubcloneSeeker | 2014 | [56] | Y | Y | Exhaustive enumeration |
PhyloWGS | 2015 | [58] | Y | Y | Tree-structured stick-breaking/MCMC |
SCHISM | 2015 | [57] | Y | Y | Likelihood ratio tests/genetic algorithm |
SPRUCE | 2016 | [59] | Y | Y | Exhaustive enumeration |
CANOPY | 2016 | [60] | Y | Y | MCMC |
The methods CHAT [54] and CloneHD [55] estimate cellular prevalences of both SNVs and CNAs but do not set them into a phylogenetic context. SubcloneSeeker infers trees based on cellular prevalences of both SNV and CNA data [56]. However it relies on other tools to accurately estimate these prevalences in a preprocessing step and is restricted to two samples such as tumour/relapse pairs. SCHISM [57] also relies on pre-established cellular prevalences. The inference is then a two-step process: It first uses a hypothesis testing framework to establish subclones and their pairwise relationships and then applies a genetic algorithm to find a matching phylogeny.
PhyloWGS [58] extends the probabilistic framework of PhyloSub [32] to integrate copy number information. It is also the first approach to model overlaps between CNA and SNV data. Estimates of CNA copy number status and population frequencies are required as input which are then used to transform sites affected by a CNA, or by a CNA and SNV, into pseudo-SNV sites to apply the SNV based probabilistic tree inference method of PhyloSub.
All of the tree inference approaches discussed so far make the infinite sites assumption, which should be revisited in the context of copy number changes. Since these events typically affect larger segments, the likelihood of two of them overlapping is not negligible. Likewise the chance of a mutated allele being lost by a segmental loss is much higher than that of a point mutation reverting it back to its original state. Neither scenario is compatible with the infinite sites model such that it is debatable whether the assumption is still safe to make.
SPRUCE [59] relaxes the assumption to a model where a mutation can change its state multiple times but cannot twice attain the same state independently in the tree. This restriction is known as the infinite alleles assumption or multi-state perfect phylogeny. While this is a step in the right direction, it still overlooks many plausible scenarios, such as a site undergoing a copy number change that is later reverted.
CANOPY [60] solves the issue of recurrent mutation states in a different way: While it nominally keeps the infinite sites assumption, it restricts the scenarios in which it could be violated to such a small number that the assumption becomes reasonable again. For example a mutation event would only be considered as recurrent when it sets the exact same genomic segment to the exact same copy number state in different parts of the phylogeny. As the endpoints of the segments are defined at the resolution of nucleotide positions, such a recurrence is unlikely to be observed.
In contrast to the other methods discussed so far, CANOPY is also the only one to recognise that copy number alterations are interdependent and should be modelled as sequences of events rather than as independent changes of chromosome segments. This view on genome evolution will become even more useful once tree inference models start to consider structural rearrangements and their potential in confounding read-depth data. Pioneering work in this direction was performed by Greenman et al. [61] and Purdom et al. [62]. Neither of these two studies focuses on tree construction, but they estimate the order of genomic rearrangement events. Many of the concepts introduced in these works, such as the use of external linkage information, e.g. HapMap data, for phasing, and the assignment of copy numbers to one of the physical alleles [61], may be worthwhile to integrate in future approaches to reconstruct mutation histories of tumours from bulk sequencing data. An approach for phasing using only major and minor allele copy number profiles was recently suggested by Schwarz et al. [63]. Besides the phasing, it computes the tree topology and assigns genomes to ancestral states based on the minimum evolution criterion.
3. Single-cell advances
After the arrival of NGS and the accompanying drop in price of obtaining genomic information, efforts to understand tumour diversity were epitomised by the collection and archiving of thousands of tumour samples by TCGA [11] and the ICGC [12]. Efforts were later also underway to understand intra-tumour diversity at full resolution by sequencing individual tumour cells. The technical advances are reviewed for example in [64], [65] and expounded in [66], and here we focus on their use to uncover tumour heterogeneity from a modelling perspective.
3.1. Single-cell sequencing
The first results for single-cell genomics were for mRNA sequencing of a mouse blastomere [67] where the major challenge was to have sensitive enough sequencing for the small amount of primary material. For DNA this involves amplifying the initial single copy enough to be passed on to sequencers. The first successful results [68] used a modified version of PCR for the initial amplification, before further PCR amplification and sequencing. The low resulting coverage (≈ 10%) allowed for the identification of copy number variations, but not high confidence mutation calling. Higher coverage was then quickly achieved through the use of multiple-displacement amplification (MDA) [69], [70], [71], [72] allowing the identification of SNVs.
The MDA process involves the attachment of randomly primed Φ29 enzymes which synthesise DNA to create additional and displaced strands, which may then themselves be further amplified. From a modelling perspective the amplification of the two original alleles is more akin to a Pólya urn model: starting with two balls representing the genomic base on each allele, repeatedly one ball is selected at random, duplicated and returned with the duplicate to the urn. This feedback in the MDA process can also lead to rather non-uniform coverage. Sites with low coverage cannot be reliably used for SNV calling, leading to high levels of missing data in early experiments (≈ 60% in [69]).
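The urn analogy is easy to simulate. The sketch below starts with one copy of each allele, repeatedly duplicates a randomly chosen copy, and records the final share of the first allele. It is only a toy version of the analogy described above, not a model of the amplification chemistry; the number of draws, the number of runs and the 5% threshold are arbitrary illustrative choices.

```python
import random


def polya_urn_fraction(n_draws, seed=None):
    """Simulate a two-colour Polya urn and return the final fraction of allele A."""
    rng = random.Random(seed)
    counts = [1, 1]  # one copy of each allele at the start
    for _ in range(n_draws):
        total = counts[0] + counts[1]
        # Pick one existing copy uniformly at random and duplicate it.
        picked = 0 if rng.random() < counts[0] / total else 1
        counts[picked] += 1
    return counts[0] / (counts[0] + counts[1])


fractions = [polya_urn_fraction(500, seed=s) for s in range(500)]
skewed = sum(f < 0.05 or f > 0.95 for f in fractions) / len(fractions)
print(f"runs in which one allele ends below 5%: {skewed:.2f}")
```

Because early draws are reinforced by every later duplication, the final allele share is spread widely between 0 and 1 instead of concentrating near 50%, which is the imbalance that makes mutations on the under-represented allele hard to detect.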
To obtain higher uniformity, although at the cost of higher error rates, hybrid amplification methods have also been developed and utilised [73], [74], [75], [76], [77]. Using cells where the DNA had just duplicated [78] reduced the amount of early amplification needed leading to lower error and missing data rates and can be part of the single nucleus exome sequencing (SNES) protocol of [79].
With current techniques, single-cell sequencing (SCS) provides high coverage and low false positive rates, but the largest source of uncertainty comes from allelic dropout (AD), where one strand (or part of it) does not get amplified (or not sufficiently) in the early stages and is not detectable in the final sequencing. Although the rate of AD, which leads to false negatives, has fallen from highs of 40% or more [69], it currently still lies in the range of 10–20%. False negatives therefore remain a very important component for any modelling of SCS data.
Although the false positive error rates are low (≲ 10⁻⁵), many base positions can be tested across the whole exome or genome so that the total number of falsely detected SNVs may still be in the hundreds or thousands per cell. For cells from the same tumour sample, a simple consensus of SNVs across two or more cells reduces the error rates back to low values, which is fortuitous from a modelling perspective because mutations observed in only one cell are also uninformative for reconstructing the evolutionary history of the tumour. Since SNVs are selected for analysis when they are detected, the false positive rate among them may be enriched compared to the per base pair error rate of the SCS technique.
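The scale of the effect can be estimated with a short calculation. The sketch below assumes that false positives occur independently across sites and cells, which real amplification errors need not satisfy, and uses a hypothetical 3 × 10⁷ tested positions and 50 cells purely for illustration.

```python
from math import comb


def expected_false_positives(n_sites, fp_rate):
    """Expected number of falsely called SNVs in a single cell."""
    return n_sites * fp_rate


def prob_at_least(k, n, p):
    """Binomial probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def expected_consensus_false_positives(n_sites, fp_rate, n_cells, min_cells=2):
    """Expected number of sites falsely called in at least `min_cells` cells.

    Assumes independent errors across cells and sites (an idealisation).
    """
    return n_sites * prob_at_least(min_cells, n_cells, fp_rate)


n_sites = 30_000_000   # hypothetical number of tested positions
fp_rate = 1e-5         # per-position false positive rate
per_cell = expected_false_positives(n_sites, fp_rate)
consensus = expected_consensus_false_positives(n_sites, fp_rate, n_cells=50)
print(f"per cell: {per_cell:.0f}, shared by >= 2 of 50 cells: {consensus:.1f}")
```

Under these assumptions a few hundred spurious calls per cell shrink to a handful once a site must be called in at least two cells.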
An exciting alternative to whole exome sequencing (WES), or whole genome sequencing, of each single cell to reduce the cost while offering low error rates was to first perform deep bulk sequencing and to liberally select sites which may possess a mutation. A personalised panel was then developed for 6 leukaemia patients to use for the final sequencing and mutation calling [80]. The preselection of sites to test reduces the enrichment of false positives, but AD and other false negatives still occur during the amplification. A further alternative to amplifying the DNA of single cells is to culture individual cells (as done for organoids [81], [82]) before harvesting a large number and performing standard bulk sequencing with the downside that culturing will bias the sample by selecting for viable cells, and may introduce new mutations.
Before individual cells can have their DNA amplified and sequenced, the cells themselves need to be isolated first. One approach has been to collect circulating tumour cells (CTCs) from blood samples which for DNA experiments first had low coverage for CNA calling [83], [84], [85] and later with WES [86]. For primary tumour cells, early experiments focussed on micropipetting [69], [70], [73], [74], [87] or nuclei sorting [68], [78], [88]. Higher throughput experiments, combined with panel sequencing, have turned to microfluidics [80] or FACS [89], [90]. Barcoding methods [91] are also promising to increase the scope of SCS at lower costs. Microwells or drops combined with barcoded beads [92], [93] now allow the parallel RNA sequencing of thousands of cells. A more recent version of barcoding for DNA sequencing [94] offers the possibility to sequence 48–96 cells simultaneously broadening the scope of single cell sequencing experiments. High-throughput protocols also offer the joint RNA and DNA sequencing of single cells [95].
However the individual cells are isolated, a key point in SCS experiments is to verify that the cells are indeed unique. Any doublet samples obviously break the single cell assumption at the heart of methods designed specifically to analyse single-cell data. Some cell isolating techniques may have high rates of doublet sampling in the range of 10–40% [96] which are important to control experimentally and to bear in mind when modelling.
3.2. Single-cell histories
Once the single cells have been sequenced, and the mutations or copy number events uncovered with standard bioinformatics pipelines, one focus is on understanding the evolutionary history of tumours and their diversity. We highlight some of the key datasets, with their characteristics summarised in Table 3, and how the single-cell phylogenetic history informed their analysis.
Table 3.
Cancer type | Year and reference | Number of patients | Number of samples | Number of mutations | Number of cells | False positive rate | Allelic drop out rate | Missing data |
---|---|---|---|---|---|---|---|---|
Myeloproliferative neoplasm | (2012) [69] | 1 | 1 | 712 | 58 | 6.04 × 10⁻⁵ | 0.4309 | 58% |
Kidney | (2012) [70] | 1 | 1 | 35 | 17 | 2.67 × 10⁻⁵ | 0.1643 | 22% |
Bladder | (2012) [71] | 1 | 1 | 443 | 44 | 6.7 × 10⁻⁵ | 0.4 | 55% |
Colon | (2014) [87] | 1 | 1 | 176 | 63 | <1 × 10⁻⁴ | >0.5 | – |
Breast | (2014) [78] | 2 | 1 | 40/519 | 47/16 | 1.24 × 10⁻⁶ | 0.0973 | 1% |
Leukemia | (2014) [77] | 3 | 1 | ≤ 1953a | 11–12 | – | 0.12 | 28% |
Leukemia | (2014) [80] | 6 | 1 | 10–105 | 96–150 | – | ≤ 0.3 | – |
Breast (and xenografts) | (2015) [50] | 2 | 2/3 | 37/45b | 120/90 | – | ≈ 0.2 | 7–12% |
Ovarian (intraperitoneal) | (2016) [97] | 3 | 4–5 | 23–33b | 420–672 | – | – | – |
One of the first single-cell datasets comes from a JAK2-negative myeloproliferative neoplasm [69], where PCA was employed to uncover a likely monoclonal origin of the tumour. The authors also found that the patient-specific mutations did not coincide with the genes commonly implicated in that tumour type.
Back-to-back a kidney cancer sample [70] was published and no real evidence of clonal subpopulations was uncovered using neighbour-joining (NJ) [98]. However there was large diversity in mutations suggesting an accumulation of passenger mutations. The cancer cells were also close to the non-tumour controls indicating a short time frame for the cancer's progression.
The first evidence for a branching mutation history in single-cell data was discovered in a bladder cancer [71] using hierarchical clustering. This revealed two main subclones which seemed to be outgrowing the ancestral clone since they appeared late in the tumour development but still made up sizeable proportions of the tumour itself.
Hierarchical clustering was also employed on a colon cancer sample [87] which uncovered a minor clone alongside a much larger main clone. The main clone possessed early mutations in TP53 and APC, which are highly prevalent in colon cancer, but they were missing in the minor clone pointing to it having a distinct origin and separate development.
Advances in SCS technology led to better coverage and lower error rates for two breast cancer samples [78]. Phylogenetic histories were reconstructed with NJ. Since copy number analysis was also performed on the same single cells, they could uncover an early phase of aneuploid rearrangements followed by clonal expansion dominated by point mutations. For one sample they saw a linear progression of clonal expansions, while for the second sample the clones separated into subclones, with one subclone founded by another aneuploidy event. This combination of copy number and SNV calling on the same individual cells highlighted how both sets of information can be combined to improve the understanding of the phylogenetic history.
Single cells were analysed from three leukaemia patients [77]. In particular they compared different SNV callers, opting for joint calling across samples, and specifically sequenced doublet samples to test for their contamination in the single-cell data. To infer the phylogenetic history, they learnt a maximum likelihood tree from the genetic distances between each pair of single cells. The evolution was mostly linear (with major subclones for one patient sample) but also exhibited low frequency heterogeneity and branching.
Since SNV callers (like [99], [100], [101], [102], [103], [104], [105]) are aimed at uncovering variants of different frequencies from bulk sequencing data, they are less applicable to single-cell data where the underlying number of copies of any variant is a (low) integer but the amplification and sequencing are much more noisy. To account particularly for the non-uniform coverage of SCS, [106] clustered the reads to correct for errors. More recently a mutation caller designed for single-cell data has been developed [107] which treats the underlying mutation states in a single cell, allowing it to outperform bulk SNV callers.
For single cell samples from 6 leukaemia patients (from targeted panel sequencing), [80] looked in the other direction of modifying the phylogenetic reconstruction to account for the particularities of single-cell data. With high dropouts from the MDA step before sequencing the error rates in single-cell data are highly unbalanced. The distance based approaches employed before (whether in constructing a tree, in hierarchical clustering or NJ) implicitly weigh both kinds of errors equally, which can adversely affect the reconstruction. Instead [80] introduced a binomial mixture model to cluster the single-cell genotypes, where the probability of a mutation or its absence varies for each cluster according to the data. Once clustered, the phylogeny can be found as the minimum spanning tree, which for five of the six patient samples featured coexisting high-frequency clones. Often the ancestral clones were also still present in the population. Along with the phylogenies, the clustering highlighted cells sharing mutations from different lineages indicating that they were the result of doublet sampling.
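The effect of unbalanced error rates on distance-based comparisons can be seen in a toy likelihood calculation. The sketch below is not the binomial mixture model of [80]; it only scores an observed binary profile against two candidate genotypes under separate false positive and false negative rates, with all names and values chosen for illustration.

```python
from math import log


def log_likelihood(observed, true, fp_rate, fn_rate):
    """Log-probability of observed binary calls given a true genotype.

    fp_rate: P(observe 1 | true 0); fn_rate: P(observe 0 | true 1).
    Missing calls are encoded as None and skipped.
    """
    total = 0.0
    for o, t in zip(observed, true):
        if o is None:
            continue
        if t == 1:
            total += log(fn_rate if o == 0 else 1.0 - fn_rate)
        else:
            total += log(fp_rate if o == 1 else 1.0 - fp_rate)
    return total


def hamming(observed, true):
    """Number of called positions at which the two profiles differ."""
    return sum(o != t for o, t in zip(observed, true) if o is not None)


observed = [1, 0, 0, 0]
clone_a = [1, 1, 0, 0]   # explains the data with one dropout
clone_b = [0, 0, 0, 0]   # explains the data with one false positive

for name, clone in (("clone_a", clone_a), ("clone_b", clone_b)):
    ll = log_likelihood(observed, clone, fp_rate=1e-3, fn_rate=0.2)
    print(name, hamming(observed, clone), round(ll, 2))
```

Both candidates are one mismatch away from the observed profile, so a distance-based method cannot separate them, whereas the likelihood clearly favours the genotype that requires a dropout over the one that requires a false positive.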
More recently, the clustering in [80] was refined to a variational Bayes approach [108] which could also explicitly model the presence of doublet samples. The clustering however, like in [80], was performed without enforcing a phylogeny.
After performing deep bulk sequencing on primary tumours and derived xenograft lines from 15 patients, and studying their clonal composition and dynamics with PyClone [38], two examples were selected in [50] for high resolution follow up with SCS: one with strong initial selection upon transplantation, and one with complex clonal evolution through the xenograft generations. For the SCS a targeted panel was designed for each example based on mutations detected with the bulk sequencing. For inferring the tree structure of the single cells, the Bayesian phylogenetic approach of [109] was employed. The resulting single-cell phylogenies were mainly used to corroborate the genotype clusters found by PyClone from the bulk sequencing, but with the advantage of also providing the ancestral histories of the clones. For the example with strong initial selection, the single cell data indicated complete separation between the primary tumour and a late xenograft sample and that the xenograft clone was founded by a very minor clone of the original tumour. The other example showed complex clonal evolution with two main lineages. The second lineage expanded heavily during the second xenograft generation to then vanish compared to further generations of the first lineage.
Likewise utilising SCS to enrich bulk sequencing data, the intraperitoneal spread of high-grade ovarian cancer was examined over 68 samples from 7 patients in [97]. For three patients, each with 4 or 5 spatially distinct samples, a total of 1680 single cells were isolated and subjected to targeted sequencing of a small number of genomic sites. The clonal composition of those tumours was inferred from the single cells using the clustering method of [108]. This augmented the bulk clustering analysis by providing higher quality genotypes. From the phylogenetic analysis of the multiple spatial samples for each of the 7 patients, the nature of the clonal spread from the ovaries to the intraperitoneal sites could be uncovered [97]. Particularly striking was that along with the five patients exhibiting monoclonal seeding, two patients exhibited reseeding and polyclonal spread. As well as indicating different possible modes of peritoneal spread, this could also suggest that the different microenvironment of the peritoneal cavity leads to novel selective pressures on heterogeneous tumours.
4. Single-cell phylogenetic reconstruction
Along with approaches to call mutations in single cells [107] and cluster them [80], [108], a different direction has been to modify the phylogenetic inference to account for the specifics of single-cell data.
All cells in a tumour live on a genealogical tree, Fig. 2 (c), where they connect with each other at their common ancestors. If we take the infinite sites assumption that the genome is essentially so long that there is no chance that the same position may mutate more than once in the entire tumour's history (which also means that no mutations are lost once they arise), then the mutations in the cells form a perfect phylogeny [36]. However, fast and straightforward phylogenetic algorithms, like hierarchical clustering, NJ, perfect phylogeny or distance-based tree constructions like a minimum spanning tree, can struggle or fail completely when presented with noisy data. Extensions of the perfect phylogeny problem exist to handle imperfect data, but typically aim to discard data until no inconsistencies remain. For example, they may find the minimum number of mutations to remove [110], [111] or the minimum number of sampled cells [112]. A further difficulty with single-cell data, and where these approaches still struggle, is that the errors are very unbalanced. In single-cell data, AD or false negative rates are generally over 10% while false positive rates are of the order of 10⁻⁵ or less.
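To make the perfect phylogeny condition concrete, the following minimal sketch checks a binary cell-by-mutation matrix for pairwise conflicts. It assumes a mutation-free root, in which case a perfect phylogeny exists exactly when the sets of cells carrying any two mutations are either nested or disjoint. The function names and the toy matrices are illustrative assumptions and are not taken from any of the cited methods.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return pairs of mutation indices whose carrier sets are neither nested nor disjoint.

    matrix[i][j] is 1 if cell i carries mutation j and 0 otherwise.
    """
    n_mut = len(matrix[0]) if matrix else 0
    # Set of cells carrying each mutation.
    carriers = [{i for i, row in enumerate(matrix) if row[j]} for j in range(n_mut)]
    conflicts = []
    for a, b in combinations(range(n_mut), 2):
        shared = carriers[a] & carriers[b]
        only_a = carriers[a] - carriers[b]
        only_b = carriers[b] - carriers[a]
        # All three patterns (1,1), (1,0) and (0,1) present: no tree can explain both.
        if shared and only_a and only_b:
            conflicts.append((a, b))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True if no pair of mutations conflicts (mutation-free root assumed)."""
    return not conflicting_pairs(matrix)


# Toy data: mutation 0 is clonal, mutations 1 and 2 sit on separate branches.
clean = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1]]

# A single false negative (dropout of mutation 0 in cell 1) breaks the condition.
noisy = [row[:] for row in clean]
noisy[1][0] = 0
```

In this toy example one dropout is enough to make mutations 0 and 1 incompatible, which illustrates why methods that require error-free input struggle with single-cell data.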
To account for this fully, probabilistic approaches have been introduced which select possible phylogenetic trees by how well they explain the single-cell data and which consider the full dataset with all of its inconsistencies and the errors due to the technical challenges of sequencing single cells. In particular the methods start with a given tree which allows one to check which cells should exhibit which mutations. If a cell is supposed to possess a mutation under the tree model, but it is absent in the observed data this would be considered a false negative, with a probability of occurrence given by the false negative rate. Conversely if the tree model predicts no mutations, but one is observed, the model would indicate a false positive. Repeating this for all cells provides the joint probability of observing the data for that particular tree and error rates. This is the likelihood of obtaining the observed data under the tree model and naturally accounts for differences in the error rates. A common approach is to find the tree which maximises the likelihood and fits the data most closely. Alternatively, Bayes theorem may be employed to find the probability of the tree from the data as a measure of fit of the tree to the data. These underlying ideas link the methods developed for single-cell phylogenetic inference [113], [114], [115], [116] although the exact details of the models and their inference vary, as we summarise in Table 4 and now explore in some detail.
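The likelihood computation described above can be sketched as follows for a mutation tree. This is a simplified illustration under stated assumptions, not the implementation of any cited method: the tree is encoded as a hypothetical parent vector, `alpha` and `beta` denote the false positive and false negative rates, and each cell is averaged uniformly over all possible attachment points.

```python
import math


def expected_genotypes(parent):
    """List the genotype predicted by the tree for every attachment point.

    parent[j] is the parent mutation of mutation j, or -1 if j hangs off the root.
    The first genotype is the mutation-free root; genotype j + 1 carries
    mutation j together with all of its ancestors.
    """
    n = len(parent)
    genotypes = [[0] * n]
    for j in range(n):
        genotype = [0] * n
        node = j
        while node != -1:
            genotype[node] = 1
            node = parent[node]
        genotypes.append(genotype)
    return genotypes


def log_likelihood(data, parent, alpha, beta):
    """Log-likelihood of binary data (cells x mutations) for a given tree and error rates."""
    # Keys are (observed, expected) states.
    prob = {(0, 0): 1 - alpha, (1, 0): alpha, (0, 1): beta, (1, 1): 1 - beta}
    genotypes = expected_genotypes(parent)
    total = 0.0
    for cell in data:
        per_attachment = [
            math.prod(prob[(obs, exp)] for obs, exp in zip(cell, genotype))
            for genotype in genotypes
        ]
        # Average over attachment points (uniform attachment prior, an assumption).
        total += math.log(sum(per_attachment) / len(per_attachment))
    return total


# Toy data consistent with a linear history 0 -> 1 -> 2, with one dropout in the last cell.
data = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 0, 1]]
linear_tree = [-1, 0, 1]
star_tree = [-1, -1, -1]
ll_linear = log_likelihood(data, linear_tree, alpha=1e-5, beta=0.2)
ll_star = log_likelihood(data, star_tree, alpha=1e-5, beta=0.2)
```

Because false positives are far less probable than false negatives, the linear tree, which explains the data with a single dropout, scores much higher than the star tree, which would require several false positives.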
Table 4.
Method | Phylogenetic representation | Inference |
---|---|---|
Kim and Simon [113] | Mutation tree | Pairwise ordering and maximum spanning tree |
BitPhylogeny [114] | Clonal tree | Tree-structure stick-breaking MCMC |
OncoNEM [115] | Sample/clonal tree | Greedy structure search |
SCITE [116] | Mutation treeᵃ | MCMC |
ᵃ SCITE [116] provides the option of using the sample tree representation.
Despite the elevated error rates, an advantage of single-cell data is that, assuming diploid cells and the infinite sites assumption, mutations should be present in either none or one of the alleles, rather than at arbitrary frequencies, and these are the only two cases that need to be tested. Of course, the presence of mutations across single-cell samples is not independent but related by the phylogenetic history, and in general the challenge lies in dealing with the vast number of trees that exist and in finding optimal trees, or a good set of them.
The first probabilistic single-cell approach of [113] considered three mutation states for the data of [69]: wildtype, and heterozygous and homozygous variants. Homozygous variants are presumed to be the result of an allelic dropout of the normal allele so that only the alternative is amplified. The likelihood of [113] consisted of the probability of the three observable states given either of the two underlying states and the allelic dropout and false positive rates. For the trees themselves, the representation in terms of mutation trees, Fig. 2 (d), was employed with the aim of uncovering the mutation ordering and evolutionary history. Rather than examining the tree as a whole, first the pairwise ordering of each pair of mutations was considered [113]. In particular the likelihood of the data when the pair of mutations are in the same or different lineages was computed. By simulating genealogical trees [Fig. 2 (c)], Monte Carlo estimates of the prior probability of mutations sharing a lineage were obtained resulting in a posterior estimate of the probability of different relationships between each pair of mutations. In simulating genealogical trees, a parameter was introduced to model the relative time of the first branching event. This parameter, which influences the prior distribution, was inferred from the data (an approach known as empirical Bayes).
In order to build the mutation tree, first estimates for the pairwise ancestral relationships of all mutations were obtained. The maximal posterior ordering between each pair was encoded as an edge in a directed graph, weighted by the posterior probability. The mutation tree is then defined as the maximum spanning tree. Specifically, edges were removed to achieve a tree which maximised the remaining weights. Although this procedure returns a tree, it is not necessarily the tree with the highest likelihood as a whole model since the ancestral relations inferred earlier behave more like parent-child relationships when embedded in the directed graph. For the 18 cancer-related mutations in the 58 single cells of [69], for example, the empirical Bayes estimate of the prior tree structure is highly linear while the resulting maximum spanning tree is rather branched.
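The idea of keeping the heaviest set of edges that still forms a tree can be illustrated with Kruskal's algorithm. Note that this sketch is a deliberate simplification: it treats the graph as undirected, whereas the graph of [113] is directed and would call for a spanning arborescence algorithm instead. The edge weights below are made-up stand-ins for posterior probabilities.

```python
def maximum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm on (weight, u, v) edges, keeping the heaviest acyclic set."""
    root = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the component representative (with path halving).
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    kept = []
    # Visit edges from heaviest to lightest; keep an edge only if it joins two components.
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            root[root_u] = root_v
            kept.append((weight, u, v))
    return kept


# Hypothetical pairwise weights between four mutations.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.3, 0, 2), (0.6, 2, 3), (0.5, 1, 3)]
tree = maximum_spanning_tree(4, edges)
```

The greedy choice optimises the sum of retained pairwise weights, not the likelihood of the tree as a whole, which is the limitation noted in the text.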
BitPhylogeny [114] works on the sample tree representation, but rather than using the single cells as leaves they are clustered together into clones. Since the number of clones and their composition is unknown, the number of nodes and branches in the cluster tree is also unknown. BitPhylogeny therefore considers in its search space all trees with an arbitrary number of clones. A prior for the trees is derived from a nested stick-breaking process following [117]. A stick, or unit interval, is chopped into many parts. Each part is then further divided with the same process, and this is repeated at all scales. At each stage the first part denotes a clone which is a child of the clone at the previous stage, providing the tree structure. The stick-breaking process involves parameters which influence the shape and number of clones in the prior distribution. The process has also been applied to bulk data [32], [58] and BitPhylogeny also includes a model for methylation data [114].
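The repeated division of a stick described above can be sketched in a few lines. This is only a loose illustration: the prior of [117] allows an unbounded number of parts and levels, whereas the sketch truncates both, and the Beta(1, alpha) proportions, the parameter values and the function names are assumptions chosen for the example.

```python
import random


def break_stick(length, alpha, n_parts, rng):
    """Break a stick into n_parts pieces; the last piece takes whatever remains."""
    pieces, remaining = [], length
    for _ in range(n_parts - 1):
        fraction = rng.betavariate(1.0, alpha)  # proportion of the remainder broken off
        pieces.append(remaining * fraction)
        remaining *= 1.0 - fraction
    pieces.append(remaining)
    return pieces


def nested_sticks(length, alpha, n_parts, depth, rng):
    """Recursively re-break every piece, returning nested dicts that mirror a tree."""
    node = {"length": length, "children": []}
    if depth > 0:
        for piece in break_stick(length, alpha, n_parts, rng):
            node["children"].append(nested_sticks(piece, alpha, n_parts, depth - 1, rng))
    return node


def leaf_lengths(node):
    """Collect the lengths of the pieces at the finest scale."""
    if not node["children"]:
        return [node["length"]]
    return [x for child in node["children"] for x in leaf_lengths(child)]


rng = random.Random(0)
tree = nested_sticks(1.0, alpha=2.0, n_parts=3, depth=2, rng=rng)
leaves = leaf_lengths(tree)
```

The pieces at every level sum to the length of the piece they were broken from, so the finest-scale pieces always partition the unit interval. A smaller `alpha` tends to concentrate the mass on the first pieces, while a larger one spreads it across more parts.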
Returning to the single-cell treatment, BitPhylogeny employs the Markov chain Monte Carlo (MCMC) inference scheme of [117]. Essentially one component, like the composition of the clones or the division of the stick at a particular stage, is updated while keeping the rest fixed. In the phylogenetic model of [114] the mutations occur along the edges of the clonal tree with the same rate. This leads to a transition probability of mutations accumulating across the phylogeny so that the appearance of mutations in descendant clones is treated probabilistically. For the inference of the tree itself these probabilistic appearances are averaged over so that the mutations become marginalised out.
The combining of cells into clones can be seen as a way of correcting for the high error rates of SCS (like [80]) while respecting a phylogeny enforced by the tree framework. The MCMC sampling also provides a posterior distribution of trees and parameters, better representing the uncertainty in the phylogeny than a single maximum likelihood estimate. However the inference scheme is relatively computationally costly which might cause convergence issues for more intricate or larger clonal trees. For the example of the full 712 mutations uncovered in the data of [69], BitPhylogeny [114] finds one large clone consisting of 70% of the cells and some smaller clones that branch off near the root of the tree.
The more recent approaches [115], [116] returned to the full tree model with likelihoods given by the false positives and negatives. From there they take complementary paths: OncoNEM [115] focuses on the sample tree, Fig. 2 (c), by marginalising or averaging over the placement of mutations along the edges; SCITE [116] focuses on the mutation tree, Fig. 2 (d), by averaging over the attachment of sampled cells. The averaging serves to vastly simplify and speed up the tree inference but a complete tree can be obtained from both approaches.
For the phylogenetic inference, both methods utilise a search-and-score framework: OncoNEM with a greedy search and SCITE with a stochastic MCMC scheme. The latter can either provide a single maximum likelihood estimate or a full posterior sample accounting for uncertainty in the inferred trees. After the greedy search in the sample tree space, OncoNEM [115] then attempts to cluster similar cells together into clones in a second step to provide a clone tree like BitPhylogeny [114]. Both of the more recent methods [115], [116] allow error rates to be learnt from the data and significantly outperform previous single-cell approaches and bulk data methods applied to single-cell data.
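A stochastic search-and-score scheme of this kind can be sketched with a basic Metropolis-Hastings sampler over mutation trees. The sketch below is not the SCITE implementation: it assumes a uniform prior over trees, fixed error rates `alpha` and `beta`, a hypothetical parent-vector encoding, and a single prune-and-reattach move, which is symmetric because the set of allowed reattachment points is the same before and after the move.

```python
import math
import random


def log_likelihood(data, parent, alpha, beta):
    """Score a mutation tree, summing each cell over all attachment points."""
    n = len(parent)
    log_p = {(0, 0): math.log(1 - alpha), (1, 0): math.log(alpha),
             (0, 1): math.log(beta), (1, 1): math.log(1 - beta)}
    genotypes = [[0] * n]  # attachment at the mutation-free root
    for j in range(n):
        genotype, node = [0] * n, j
        while node != -1:
            genotype[node] = 1
            node = parent[node]
        genotypes.append(genotype)
    total = 0.0
    for cell in data:
        scores = [sum(log_p[(o, e)] for o, e in zip(cell, g)) for g in genotypes]
        top = max(scores)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(s - top) for s in scores))
    return total


def descendants(parent, j):
    """Return j together with every mutation below it in the tree."""
    found, changed = {j}, True
    while changed:
        changed = False
        for child, p in enumerate(parent):
            if p in found and child not in found:
                found.add(child)
                changed = True
    return found


def propose(parent, rng):
    """Prune one mutation (with its subtree) and reattach it outside that subtree."""
    j = rng.randrange(len(parent))
    blocked = descendants(parent, j)
    options = [v for v in range(-1, len(parent)) if v not in blocked]
    candidate = list(parent)
    candidate[j] = rng.choice(options)
    return candidate


def mcmc_search(data, alpha, beta, n_steps, seed=0):
    """Run the chain from a star tree and return the best tree visited with its score."""
    rng = random.Random(seed)
    current = [-1] * len(data[0])
    current_score = log_likelihood(data, current, alpha, beta)
    best, best_score = current, current_score
    for _ in range(n_steps):
        candidate = propose(current, rng)
        score = log_likelihood(data, candidate, alpha, beta)
        # Metropolis acceptance with a symmetric proposal and uniform tree prior.
        if rng.random() < math.exp(min(0.0, score - current_score)):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


data = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 0, 1]]
best, best_score = mcmc_search(data, alpha=1e-5, beta=0.2, n_steps=2000)
```

Recording the visited trees instead of only the best one would give a posterior sample. Replacing the random acceptance step with "accept only improvements" turns the same skeleton into a greedy search.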
The different choice of representation between sample and mutation trees as in Fig. 2 (d) is mainly one of interest: if the key question concerns the clonal composition of the tumour then a sample tree is more appropriate, while questions concerning the order and evolutionary history of the mutations are better answered with the mutation trees. The choice is also partly dictated by the nature of the single-cell data. Mutations which occur in only one cell, or in all of them, are not informative for the tree reconstruction (although they may still inform the inferred error rates). If the number of remaining mutations is much larger than the number of cells, then the sample tree representation can be much more computationally efficient. When the number of sampled cells dominates then mutation tree inference is much faster. This occurs for example with the leukaemia datasets of [80] and especially when a targeted panel is utilised as in [50], [97]. SCITE [116] offers the option to change the representation depending on the data.
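As a small illustration of the filtering step mentioned above, the sketch below keeps only the mutations observed in more than one cell but not in all of them. The function name and the toy matrix are assumptions made for the example.

```python
def informative_mutations(matrix):
    """Indices of mutations observed in more than one cell but not in every cell.

    matrix[i][j] is 1 if mutation j is observed in cell i and 0 otherwise.
    """
    n_cells = len(matrix)
    counts = [sum(column) for column in zip(*matrix)]
    return [j for j, count in enumerate(counts) if 1 < count < n_cells]


# Four cells, four mutations: mutation 0 is in every cell, mutation 2 in a single cell.
matrix = [[1, 1, 0, 1],
          [1, 1, 0, 1],
          [1, 0, 1, 1],
          [1, 0, 0, 0]]
kept = informative_mutations(matrix)
```

Comparing the number of retained mutations with the number of cells then gives a rough indication of which representation is likely to be cheaper to search.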
In reanalysing previous data, both OncoNEM and SCITE were applied to the 58 sequenced cells of [69] with OncoNEM considering the full set of 712 SNVs and SCITE looking at the 18 cancer-related mutations or the set of 78 non-synonymous ones due to the different representations. Both found highly linear or sequential trees suggesting monoclonal evolution and trees with much higher likelihoods than those found previously in [113], [114] with the same data. OncoNEM [115] additionally considered the bladder cancer data set of [71] finding very similar results to the original paper, but refining the clonal composition. SCITE [116] found another highly linear tree for the kidney cancer data of [70], again suggesting monoclonal expansion, but a tree with a long trunk region followed by complex branching lower down for the higher quality ER+ breast tumour sample of [78]. This would be consistent with an early build up of mutations which fixate in the tumour before a more recent division into competing subclones.
5. Discussion
Studying the evolutionary history of tumours and their heterogeneity covers computational aspects from processing raw sequencing data to resolving the phylogeny. For bulk data, the discovery of the prevalence of mutations in the sample is reasonably accurate, except for low-frequency events. However, low-frequency mutations are common and could account for much of a tumour's diversity and be relevant for treatment. Deeper sequencing can help to determine their prevalence more accurately and so to resolve their evolutionary history [118]. Apart from the difficulties in resolving low-frequency mutations, the main issue is with untangling the clonal structure from the mixture of DNA from a large number of cells. Computational approaches started focusing on the clustering [31], [38], [47] or the phylogenetic [37], [39], [41] aspects before considering their inference jointly [32], [42], [58], [60].
For single-cell data, the deconvolution is no longer needed, but the need for extensive amplification of the initial DNA material, and feedback within the amplification process, introduce more noise in the sequencing data and make uncovering mutations harder. Computational approaches have each so far focused separately on one facet of single-cell data: mutation calling designed for the specifics of SCS [107], clustering to correct for errors in the calling [80], [108], or probabilistic phylogenetic methods tailored for those high (and unbalanced) errors [113], [114], [115], [116]. Mirroring the advances for bulk data, we can expect the next advances for single-cell based approaches to offer a holistic treatment for the process from sequencing to phylogeny, while also considering a larger range of mutation types.
A first step would be to account for the uncertainty in the mutation calling (as performed by [111] for bulk data and as can be extracted from [107] for single cells) in the input for the phylogenetic inference [115], [116], but overall the aim would be joint inference of the mutations and their phylogenetic structure. Along with combining the raw sequencing data with the tree reconstruction, models will also need to account for further technical errors in single-cell data, like the inadvertent sampling of doublets (as was recently considered in the clustering approach of [108]).
Another aspect concerns copy number and aneuploidy changes, which often occur in cancer evolution and can inform the tumour phylogeny. These raise a number of interesting challenges for single-cell data, both for the mutation calling, where the underlying frequencies can differ from those expected in diploid cells, and for the tree reconstruction, where such events can impact several mutations at once. For copy number variations in single cells this problem also arises for copy number changes at the different scales of the gene and chromosome level. Algorithms have been developed to find the most parsimonious set of aberration events consistent with the data [119]. The data concerned were obtained using fluorescent imaging rather than sequencing, but sequencing data will only add higher resolution of small-scale events down to SNVs. Since it has already been shown that CNAs and SNVs can be discovered from the same SCS data [78], we expect further and corresponding modelling frameworks to arise to deal with such data.
A further aspect that CNAs thrust into the spotlight is the infinite sites assumption, that mutations or aberrations only occur once in the evolutionary history and persist afterwards. Although a priori reasonable for sparse point mutations, this is not compatible with back mutations due to LOH. Indeed, developing and employing a probabilistic model allowing for deletions and loss of mutations, bulk sequencing of ovarian cancer uncovered different CNAs affecting the same genomic regions providing routes to convergent evolution [97]. The copy number changes were still assumed to only occur once, a generalisation of the infinite sites assumption to infinite alleles [59]. Convergent evolution has also been observed at a gene level, with the same driver gene affected in different evolutionary lineages and spatial areas of tumours [120], [121], albeit with mutations at distinct genomic sites consistent with the infinite sites assumption. At the level of point mutations, the resolution of SCS actually allows one to test the persistence of mutations and for convergent recurrence [122]. Results from SCS datasets strongly indicate that the infinite sites assumption is frequently violated [122]. Although employed in the current single-cell phylogenetic methods [113], [114], [115], [116], and bulk methods, as it greatly simplifies the inference, this will need to be relaxed for more general models which capture the full complexity of tumour evolution. These can build on models allowing (and penalising) a single recurrence [122], allowing the loss of mutations [97], or with substitution models allowing arbitrary recurrence and loss as in [123] and the methylation model of BitPhylogeny [114]. Alternatively, phylogenetic clustering approaches which do not need to enforce the infinite sites assumption, like [46], can be further explored. When relaxing the infinite sites assumption, it will be important to account for and appropriately penalise the increase in complexity of more general models.
One general limitation of SCS is that from a relatively small sample of cells it is difficult to obtain an accurate picture of the prevalence of clones and their mutations, especially for highly heterogeneous tumours. Low frequency clones are unlikely to be sampled, and those which happen to be sampled would appear more frequent than they really are. Sequencing more cells obviously gives a clearer picture, but at a higher cost and likely to recapitulate high frequency clones while providing little extra information about the low frequency ones. Deep sequencing of bulk samples, however, can give complementary information on these frequencies, which could also inform the phylogenetic reconstruction. This is highlighted by [50], [97] where selected and targeted SCS was employed to enrich bulk analyses. The challenge would be to combine both single-cell and bulk data, with their individual characteristics, into a coherent modelling framework. Several bulk samples may help in particular (as for the bulk phylogeny problem [32], [41], [42], [43], [56], [57], [58], [59], [60]) and importantly this sort of framework could inform experiments on which combinations of bulk and single-cell data would offer the most detailed picture of the tumour's history and heterogeneity.
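The sampling limitation can be quantified with a short calculation. The sketch below assumes that cells are sampled independently and uniformly from the tumour, so that a clone of prevalence `f` is missed by all `n` sampled cells with probability (1 - f)^n. The function names and the example prevalence of 1% are illustrative assumptions.

```python
import math


def detection_probability(clone_frequency, n_cells):
    """Probability that at least one of n_cells sampled cells belongs to the clone."""
    return 1.0 - (1.0 - clone_frequency) ** n_cells


def cells_needed(clone_frequency, target_probability):
    """Smallest number of cells giving at least the target detection probability."""
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - clone_frequency))


# A clone making up 1% of the tumour, with 50 cells sequenced.
p_detect = detection_probability(0.01, 50)

# If the clone is sampled at all, its observed prevalence is at least 1/50 = 2%,
# which is already twice its true prevalence.
min_observed_prevalence = 1 / 50

# Number of cells required to sample the clone with 95% probability.
n_required = cells_needed(0.01, 0.95)
```

Under these assumptions, sampling a 1% clone reliably needs several hundred cells, which illustrates why deep bulk sequencing is a useful complement for low-frequency clones.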
For single-cell data with high coverage and current error rates [78] we can expect a good reconstruction of the mutation order and history with a couple of cells sampled per relevant mutation [116]. For the 40 mutations uncovered in an ER+ breast tumour, even the 47 cells sequenced by [78] offer a detailed picture of the clonal expansion and subsequent separation into subclones [116] since probabilistic phylogenetic models account for the uncertainties in the mutations observed or missed in each cell and combine this information when inferring the tree structure. By considering current single-cell datasets, it would seem that sequencing 50–100 single cells should give a high resolution picture of the tumour. Sequencing more cells obviously improves the resolution, but at a higher cost and may be of less marginal value than several very deep bulk sequences. Better estimates will however arise once methods arrive to combine single-cell and bulk data. Experimentally it is also worthwhile verifying that samples are indeed single cells before sequencing to avoid contamination from doublets.
A related aspect is to consider the spatial resolution and heterogeneity of tumours, as recently performed by [124], and the temporal evolution for example by following tumour progression through xenograft generations [50]. Spatiotemporal dynamics also play a key role for the spread of tumours [97] and the link between the primary tumour and metastases [111]. Here a key question, and one with great treatment relevance, is whether the metastases were seeded early in the tumour's development or are derived from later cells. Again we can consider which sorts and combinations of data would best help to answer such questions. To understand where metastases fit in the evolutionary history of the primary tumour and their origin, ideally we would possess a high resolution understanding of the primary tumour with single-cell and deep bulk data. Assuming a single seeding event of each metastasis suggests that their bulk sequencing would suffice (as in the data of [111]), but to test this assumption would also require high resolution of the heterogeneity within the metastases themselves. As well as answering the question of the origin of metastases, SCS and its ability to provide clear understanding of a tumour's evolutionary history offers great potential for examining tumour development under the action of clinical therapies through serial biopsies or even time course collection of CTCs.
Looking to a future where high quality single-cell (and bulk) data is available across many patient samples, as is currently the case for the TCGA and ICGC databases for bulk samples, such data and its analysis will not only help in the identification of further driver mutations but will also allow the identification of recurring mutational patterns. These may be informative for cancer treatment and in predicting cancer progression. Furthermore, combining evolutionary histories from real patient data with evolutionary models (like [125], [126], [127]) offers the possibility to infer the fitness landscape of the tumour's aberrations. Different evolutionary models result in different phylogenetic patterns so that single-cell analysis could further help to distinguish between different models of tumour evolution like clonal expansion [15], neutral evolution [124], [128], ‘Big Bang’ models [129] of a sudden selective change followed by mostly neutral evolution, and punctuated evolution [130] of flurries of aberrations followed by clonal expansion.
List of abbreviations
- AD: Allelic dropout
- CNA: Copy number alteration
- CTC: Circulating tumour cell
- EM: Expectation maximisation
- FACS: Fluorescence-activated cell sorting
- ICGC: International Cancer Genome Consortium
- LOH: Loss of heterozygosity
- MCMC: Markov chain Monte Carlo
- MDA: Multiple-displacement amplification
- MILP: Mixed integer linear programming
- NGS: Next generation sequencing
- QIP: Quadratic integer programming
- PCA: Principal component analysis
- PCR: Polymerase chain reaction
- SCS: Single cell sequencing
- SNES: Single nucleus exome sequencing
- SNV: Single nucleotide variant
- TCGA: The Cancer Genome Atlas
- WES: Whole exome sequencing
Author contribution
JK, KJ and NB wrote the manuscript.
Funding
JK was supported by ERC Synergy Grant 609883 (http://erc.europa.eu/). KJ was supported by SystemsX.ch RTD Grant 2013/150 (http://www.systemsx.ch/).
Transparency document
The Transparency Document associated with this article can be found in the online version.
References
- 1. Hanahan D., Weinberg R.A. The hallmarks of cancer. Cell. 2000;100:57–70. doi: 10.1016/s0092-8674(00)81683-9.
- 2. Hanahan D., Weinberg R.A. Hallmarks of cancer: the next generation. Cell. 2011;144:646–674. doi: 10.1016/j.cell.2011.02.013.
- 3. Nowell P.C. The clonal evolution of tumor cell populations. Science. 1976;194:23–28. doi: 10.1126/science.959840.
- 4. Vogelstein B., Fearon E.R., Hamilton S.R., Kern S.E., Preisinger A.C., Leppert M., Nakamura Y., White R., Smits A.M., Bos J.L. Genetic alterations during colorectal tumor development. N. Engl. J. Med. 1988;319:525–532. doi: 10.1056/NEJM198809013190901.
- 5. Dexter D.L., Kowalski H.M., Blazar B.A., Fligiel Z., Vogel R., Heppner G.H. Heterogeneity of tumor cells from a single mouse mammary tumor. Cancer Res. 1978;38:3174–3181.
- 6. Fidler I.J. Tumor heterogeneity and the biology of cancer invasion and metastasis. Cancer Res. 1978;38:2651–2660.
- 7. Heppner G.H. Tumor heterogeneity. Cancer Res. 1984;44:2259–2265.
- 8. Michor F., Iwasa Y., Nowak M.A. Dynamics of cancer progression. Nat. Rev. Cancer. 2004;4:197–205. doi: 10.1038/nrc1295.
- 9. Merlo L.M., Pepper J.W., Reid B.J., Maley C.C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer. 2006;6:924–935. doi: 10.1038/nrc2013.
- 10. Pepper J.W., Scott Findlay C., Kassen R., Spencer S.L., Maley C.C. SYNTHESIS: cancer research meets evolutionary biology. Evol. Appl. 2009;2:62–70. doi: 10.1111/j.1752-4571.2008.00063.x.
- 11. McLendon R. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. doi: 10.1038/nature07385.
- 12. Hudson T.J. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
- 13. Yates L.R., Campbell P.J. Evolution of the cancer genome. Nat. Rev. Genet. 2012;13:795–806. doi: 10.1038/nrg3317.
- 14. Nik-Zainal S., Van Loo P., Wedge D.C., Alexandrov L.B., Greenman C.D., Lau K.W., Raine K., Jones D., Marshall J., Ramakrishna M. The life history of 21 breast cancers. Cell. 2012;149:994–1007. doi: 10.1016/j.cell.2012.04.023.
- 15. Greaves M., Maley C.C. Clonal evolution in cancer. Nature. 2012;481:306–313. doi: 10.1038/nature10762.
- 16. Burrell R.A., Swanton C. Re-evaluating clonal dominance in cancer evolution. Trends in Cancer. 2016;2:263–276. doi: 10.1016/j.trecan.2016.04.002.
- 17. Maley C.C., Galipeau P.C., Finley J.C., Wongsurawat V.J., Li X., Sanchez C.A., Paulson T.G., Blount P.L., Risques R.A., Rabinovitch P.S., Reid B.J. Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nat. Genet. 2006;38:468–473. doi: 10.1038/ng1768.
- 18. Merlo L.M., Shah N.A., Li X., Blount P.L., Vaughan T.L., Reid B.J., Maley C.C. A comprehensive survey of clonal diversity measures in Barrett's esophagus as biomarkers of progression to esophageal adenocarcinoma. Cancer Prev. Res. 2010;3:1388–1397. doi: 10.1158/1940-6207.CAPR-10-0108.
- 19. Marusyk A., Polyak K. Tumor heterogeneity: causes and consequences. BBA Rev. Cancer. 2010;1805:105–117. doi: 10.1016/j.bbcan.2009.11.002.
- 20. Ding L., Ley T.J., Larson D.E., Miller C.A., Koboldt D.C., Welch J.S., Ritchey J.K., Young M.A., Lamprecht T., McLellan M.D. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510. doi: 10.1038/nature10738.
- 21. Marusyk A., Almendro V., Polyak K. Intra-tumour heterogeneity: a looking glass for cancer? Nat. Rev. Cancer. 2012;12:323–334. doi: 10.1038/nrc3261.
- 22. Burrell R.A., McGranahan N., Bartek J., Swanton C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature. 2013;501:338–345. doi: 10.1038/nature12625.
- 23. Gillies R.J., Verduzco D., Gatenby R.A. Evolutionary dynamics of carcinogenesis and why targeted therapy does not work. Nat. Rev. Cancer. 2012;12:487–493. doi: 10.1038/nrc3298.
- 24. Burrell R.A., Swanton C. Tumour heterogeneity and the evolution of polyclonal drug resistance. Mol. Oncol. 2014;8:1095–1111. doi: 10.1016/j.molonc.2014.06.005.
- 25. McGranahan N., Swanton C. Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. Cancer Cell. 2015;27:15–26. doi: 10.1016/j.ccell.2014.12.001.
- 26. Bonavia R., Inda M.M., Cavenee W.K., Furnari F.B. Heterogeneity maintenance in glioblastoma: a social network. Cancer Res. 2011;71:4055–4060. doi: 10.1158/0008-5472.CAN-11-0153.
- 27. Ortmann C.A., Kent D.G., Nangalia J., Silber Y., Wedge D.C., Grinfeld J., Baxter E.J., Massie C.E., Papaemmanuil E., Menon S. Effect of mutation order on myeloproliferative neoplasms. N. Engl. J. Med. 2015;372:601–612. doi: 10.1056/NEJMoa1412098.
- 28. Stratton M.R., Campbell P.J., Futreal P.A. The cancer genome. Nature. 2009;458:719–724. doi: 10.1038/nature07943.
- 29. Swanton C. Intratumor heterogeneity: evolution through space and time. Cancer Res. 2012;72:4875–4882. doi: 10.1158/0008-5472.CAN-12-2217.
- 30. Allison K.H., Sledge G.W. Heterogeneity and cancer. Oncol. 2014;28:772–778.
- 31. Zare H., Wang J., Hu A., Weber K., Smith J., Nickerson D., Song C., Witten D., Blau C.A., Noble W.S. Inferring clonal composition from multiple sections of a breast cancer. PLoS Comput. Biol. 2014;10:e1003703. doi: 10.1371/journal.pcbi.1003703.
- 32. Jiao W., Vembu S., Deshwar A.G., Stein L., Morris Q. Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinf. 2014;15:35. doi: 10.1186/1471-2105-15-35.
- 33. Schuh A., Becq J., Humphray S., Alexa A., Burns A., Clifford R., Feller S.M., Grocock R., Henderson S., Khrebtukova I. Monitoring chronic lymphocytic leukemia progression by whole genome sequencing reveals heterogeneous clonal evolution patterns. Blood. 2012;120:4191–4196. doi: 10.1182/blood-2012-05-433540.
- 34. Van Loo P., Voet T. Single cell analysis of cancer genomes. Curr. Opin. Genet. Dev. 2014;24:82–91. doi: 10.1016/j.gde.2013.12.004.
- 35. Navin N.E. Cancer genomics: one cell at a time. Genome Biol. 2014;15. doi: 10.1186/s13059-014-0452-9.
- 36. Gusfield D. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press; Cambridge: 1997.
- 37. Strino F., Parisi F., Micsinai M., Kluger Y. TrAp: a tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Res. 2013;41:e165. doi: 10.1093/nar/gkt641.
- 38. Roth A., Khattra J., Yap D., Wan A., Laks E., Biele J., Ha G., Aparicio S., Bouchard-Côté A., Shah S.P. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods. 2014;11:396–398. doi: 10.1038/nmeth.2883.
- 39. Hajirasouliha I., Mahmoody A., Raphael B.J. A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics. 2014;30:i78–i86. doi: 10.1093/bioinformatics/btu284.
- 40. Miller C.A., White B.S., Dees N.D., Griffith M., Welch J.S., Griffith O.L., Vij R., Tomasson M.H., Graubert T.A., Walter M.J. SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput. Biol. 2014;10:e1003665. doi: 10.1371/journal.pcbi.1003665.
- 41.El-Kebir M., Oesper L., Acheson-Field H., Raphael B.J. Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics. 2015;31:i62–i70. doi: 10.1093/bioinformatics/btv261. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Malikic S., McPherson A.W., Donmez N., Sahinalp C.S. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics. 2015;31:1349–1356. doi: 10.1093/bioinformatics/btv003. [DOI] [PubMed] [Google Scholar]
- 43.Popic V., Salari R., Hajirasouliha I., Kashef-Haghighi D., West R.B., Batzoglou S. Fast and scalable inference of multi-sample cancer lineages. CoRR. 2014;abs/1412.8574 doi: 10.1186/s13059-015-0647-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Sengupta S., Wang J., Lee J., Müller P., Gulukota K., Banerjee A., Ji Y. Pacific Symposium on Biocomputing. Vol. 20. 2015. Bayclone: Bayesian nonparametric inference of tumor subclones using NGS data. pp. 467–478. [PubMed] [Google Scholar]
- 45.Donmez N., Malikic S., Wyatt A.W., Gleave M.E., Collins C.C., Sahinalp S.C. International Conference on Research in Computational Molecular Biology. Springer; 2016. Clonality inference from single tumor samples using low coverage sequence data; pp. 83–94. [DOI] [PubMed] [Google Scholar]
- 46.Marass F., Mouliere F., Yuan K., Rosenfeld N., Markowetz F. A phylogenetic latent feature model for clonal deconvolution. Ann. Appl. Stat. 2016;10:2377–2404. [Google Scholar]
- 47.Shah S.P., Roth A., Goya R., Oloumi A., Ha G., Zhao Y., Turashvili G., Ding J., Tse K., Haffari G. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486:395–399. doi: 10.1038/nature10933. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Larson N.B., Fridley B.L. PurBayes: estimating tumor cellularity and subclonality in next-generation sequencing data. Bioinformatics. 2013;29:1888–1889. doi: 10.1093/bioinformatics/btt293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Gerlinger M., Horswell S., Larkin J., Rowan A.J., Salm M.P., Varela I., Fisher R., McGranahan N., Matthews N., Santos C.R. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat. Genet. 2014;46:225–233. doi: 10.1038/ng.2891. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Eirew P., Steif A., Khattra J., Ha G., Yap D., Farahani H., Gelmon K., Chia S., Mar C., Wan A. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature. 2015;518:422–426. doi: 10.1038/nature13952. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Oesper L., Mahmoody A., Raphael B.J. THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol. 2013;14:R80. doi: 10.1186/gb-2013-14-7-r80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Oesper L., Satas G., Raphael B.J. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics. 2014;30:3532–3540. doi: 10.1093/bioinformatics/btu651. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Ha G., Roth A., Khattra J., Ho J., Yap D., Prentice L.M., Melnyk N., McPherson A., Bashashati A., Laks E. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genet. Res. 2014;24:1881–1893. doi: 10.1101/gr.180281.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Li B., Li J.Z. A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data. Genome Biol. 2014;15:473. doi: 10.1186/s13059-014-0473-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Fischer A., Vázquez-García I., Illingworth C.J., Mustonen V. High-definition reconstruction of clonal composition in cancer. Cell Rep. 2014;7:1740–1752. doi: 10.1016/j.celrep.2014.04.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Qiao Y., Quinlan A.R., Jazaeri A.A., Verhaak R.G., Wheeler D.A., Marth G.T. SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization. Genome Biol. 2014;15:443. doi: 10.1186/s13059-014-0443-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Niknafs N., Beleva-Guthrie V., Naiman D.Q., Karchin R. SubClonal hierarchy inference from somatic mutations: automatic reconstruction of cancer evolutionary trees from multi-region next generation sequencing. PLoS Comput. Biol. 2015;11:e1004416. doi: 10.1371/journal.pcbi.1004416. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Deshwar A.G., Vembu S., Yung C.K., Jang G.H., Stein L., Morris Q. PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol. 2015;16:35. doi: 10.1186/s13059-015-0602-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.El-Kebir M., Satas G., Oesper L., Raphael B.J. Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures. Cell Syst. 2016;3:43–53. doi: 10.1016/j.cels.2016.07.004. [DOI] [PubMed] [Google Scholar]
- 60.Jiang Y., Qiu Y., Minn A.J., Zhang N.R. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc. Natl. Acad. Sci. 2016;113:E5528–E5537. doi: 10.1073/pnas.1522203113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Greenman C.D., Pleasance E.D., Newman S., Yang F., Fu B., Nik-Zainal S., Jones D., Lau K.W., Carter N., Edwards P.A. Estimation of rearrangement phylogeny for cancer genomes. Genome Res. 2012;22:346–361. doi: 10.1101/gr.118414.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Purdom E., Ho C., Grasso C.S., Quist M.J., Cho R.J., Spellman P. Methods and challenges in timing chromosomal abnormalities within cancer samples. Bioinformatics. 2013;29:3113–3120. doi: 10.1093/bioinformatics/btt546. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Schwarz R.F., Trinh A., Sipos B., Brenton J.D., Goldman N., Markowetz F. Phylogenetic quantification of intra-tumour heterogeneity. PLoS Comput. Biol. 2014;10:e1003535. doi: 10.1371/journal.pcbi.1003535. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Wang Y., Navin N.E. Advances and applications of single-cell sequencing technologies. Mol. Cell. 2015;58:598–609. doi: 10.1016/j.molcel.2015.05.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Gawad C., Koh W., Quake S.R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 2016;17:175–188. doi: 10.1038/nrg.2015.16. [DOI] [PubMed] [Google Scholar]
- 66.Navin N.E. The first five years of single-cell cancer genomics and beyond. Genome Res. 2015;25:1499–1507. doi: 10.1101/gr.191098.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Tang F., Barbacioru C., Wang Y., Nordman E., Lee C., Xu N., Wang X., Bodeau J., Tuch B.B., Siddiqui A. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods. 2009;6:377–382. doi: 10.1038/nmeth.1315. [DOI] [PubMed] [Google Scholar]
- 68.Navin N., Kendall J., Troge J., Andrews P., Rodgers L., McIndoo J., Cook K., Stepansky A., Levy D., Esposito D. Tumour evolution inferred by single-cell sequencing. Nature. 2011;472:90–94. doi: 10.1038/nature09807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Hou Y., Song L., Zhu P., Zhang B., Tao Y., Xu X., Li F., Wu K., Liang J., Shao D. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell. 2012;148:873–885. doi: 10.1016/j.cell.2012.02.028. [DOI] [PubMed] [Google Scholar]
- 70.Xu X., Hou Y., Yin X., Bao L., Tang A., Song L., Li F., Tsang S., Wu K., Wu H. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell. 2012;148:886–895. doi: 10.1016/j.cell.2012.02.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Li Y., Xu X., Song L., Hou Y., Li Z., Tsang S., Li F., Im K.M., Wu K., Wu H. Single-cell sequencing analysis characterizes common and cell-lineage-specific mutations in a muscle-invasive bladder cancer. GigaScience. 2012;1:1–14. doi: 10.1186/2047-217X-1-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Wang J., Fan H.C., Behr B., Quake S.R. Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell. 2012;150:402–412. doi: 10.1016/j.cell.2012.06.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Zong C., Lu S., Chapman A.R., Xie X.S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science. 2012;338:1622–1626. doi: 10.1126/science.1229164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Lu S., Zong C., Fan W., Yang M., Li J., Chapman A.R., Zhu P., Hu X., Xu L., Yan L. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science. 2012;338:1627–1630. doi: 10.1126/science.1229112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Hou Y., Fan W., Yan L., Li R., Lian Y., Huang J., Li J., Xu L., Tang F., Xie X.S., Qiao J. Genome analyses of single human oocytes. Cell. 2013;155:1492–1506. doi: 10.1016/j.cell.2013.11.040. [DOI] [PubMed] [Google Scholar]
- 76.Voet T., Kumar P., Van Loo P., Cooke S.L., Marshall J., Lin M.-L., Zamani Esteki M., Van der Aa N., Mateiu L., McBride D.J. Single-cell paired-end genome sequencing reveals structural variation per cell cycle. Nucleic Acids Res. 2013;41:6119–6138. doi: 10.1093/nar/gkt345. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Hughes A.E., Magrini V., Demeter R., Miller C.A., Fulton R., Fulton L.L., Eades W.C., Elliott K., Heath S., Westervelt P. Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing. PLoS Genet. 2014;10:e1004462. doi: 10.1371/journal.pgen.1004462. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Wang Y., Waters J., Leung M.L., Unruh A., Roh W., Shi X., Chen K., Scheet P., Vattathil S., Liang H. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature. 2014;512:155–160. doi: 10.1038/nature13600. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Leung M.L., Wang Y., Waters J., Navin N.E. SNES: single nucleus exome sequencing. Genome Biol. 2015;16:1–10. doi: 10.1186/s13059-015-0616-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Gawad C., Koh W., Quake S.R. Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc. Natl. Acad. Sci. 2014;111:17947–17952. doi: 10.1073/pnas.1420822111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Sachs N., Clevers H. Organoid cultures for the analysis of cancer phenotypes. Curr. Opin. Genet. Dev. 2014;24:68–73. doi: 10.1016/j.gde.2013.11.012. [DOI] [PubMed] [Google Scholar]
- 82.Boj S.F., Hwang C.-I., Baker L.A., Chio I.I.C., Engle D.D., Corbo V., Jager M., Ponz-Sarvise M., Tiriac H., Spector M.S. Organoid models of human and mouse ductal pancreatic cancer. Cell. 2015;160:324–338. doi: 10.1016/j.cell.2014.12.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Heitzer E., Auer M., Gasch C., Pichler M., Ulz P., Hoffmann E.M., Lax S., Waldispuehl-Geigl J., Mauermann O., Lackner C. Complex tumor genomes inferred from single circulating tumor cells by array-CGH and next-generation sequencing. Cancer Res. 2013;73:2965–2975. doi: 10.1158/0008-5472.CAN-12-4140. [DOI] [PubMed] [Google Scholar]
- 84.Ni X., Zhuo M., Su Z., Duan J., Gao Y., Wang Z., Zong C., Bai H., Chapman A.R., Zhao J. Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients. Proc. Natl. Acad. Sci. 2013;110:21083–21088. doi: 10.1073/pnas.1320659110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Dago A.E., Stepansky A., Carlsson A., Luttgen M., Kendall J., Baslan T., Kolatkar A., Wigler M., Bethel K., Gross M.E. Rapid phenotypic and genomic change in response to therapeutic pressure in prostate cancer inferred by high content analysis of single circulating tumor cells. PloS One. 2014;9:e101777. doi: 10.1371/journal.pone.0101777. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Lohr J.G., Adalsteinsson V.A., Cibulskis K., Choudhury A.D., Rosenberg M., Cruz-Gordillo P., Francis J., Zhang C.-Z., Shalek A.K., Satija R. Whole exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer. Nat. Biotechnol. 2014;32:479. doi: 10.1038/nbt.2892. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Yu C., Yu J., Yao X., Wu W.K., Lu Y., Tang S., Li X., Bao L., Li X., Hou Y. Discovery of biclonal origin and a novel oncogene SLC12A5 in colon cancer by single-cell sequencing. Cell Res. 2014;24:701–712. doi: 10.1038/cr.2014.43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.McConnell M.J., Lindberg M.R., Brennand K.J., Piper J.C., Voet T., Cowing-Zitron C., Shumilina S., Lasken R.S., Vermeesch J.R., Hall I.M., Gage F.H. Mosaic copy number variation in human neurons. Science. 2013;342:632–637. doi: 10.1126/science.1243472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Potter N.E., Ermini L., Papaemmanuil E., Cazzaniga G., Vijayaraghavan G., Titley I., Ford A., Campbell P., Kearney L., Greaves M. Single-cell mutational profiling and clonal phylogeny in cancer. Genome Res. 2013;23:2115–2125. doi: 10.1101/gr.159913.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Papaemmanuil E., Rapado I., Li Y., Potter N.E., Wedge D.C., Tubio J., Alexandrov L.B., Van Loo P., Cooke S.L., Marshall J. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat. Genet. 2014;46:116–125. doi: 10.1038/ng.2874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Baslan T., Kendall J., Ward B., Cox H., Leotta A., Rodgers L., Riggs M., D’Italia S., Sun G., Yong M. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling. Genome Res. 2015;25:714–724. doi: 10.1101/gr.188060.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Fan H.C., Fu G.K., Fodor S.P.A. Combinatorial labeling of single cells for gene expression cytometry. Science. 2015;347 doi: 10.1126/science.1258367. [DOI] [PubMed] [Google Scholar]
- 93.Macosko E.Z., Basu A., Satija R., Nemesh J., Shekhar K., Goldman M., Tirosh I., Bialas A.R., Kamitaki N., Martersteck E.M. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–1214. doi: 10.1016/j.cell.2015.05.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Leung M.L., Wang Y., Kim C., Gao R., Jiang J., Sei E., Navin N.E. Highly multiplexed targeted DNA sequencing from single nuclei. Nat. Protoc. 2016;11:214–235. doi: 10.1038/nprot.2016.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Macaulay I.C., Teng M.J., Haerty W., Kumar P., Ponting C.P., Voet T. Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq. Nat. Protoc. 2016;11:2081–2103. doi: 10.1038/nprot.2016.138. [DOI] [PubMed] [Google Scholar]
- 96.Fluidigm . White Paper PN 101-2711 A1. 2016. Doublet Rate and Detection on the C1 IFCs. [Google Scholar]
- 97.McPherson A., Roth A., Laks E., Masud T., Bashashati A., Zhang A.W., Ha G., Biele J., Yap D., Wan A. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 2016;48:758–767. doi: 10.1038/ng.3573. [DOI] [PubMed] [Google Scholar]
- 98.Saitou N., Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454. [DOI] [PubMed] [Google Scholar]
- 99.Gerstung M., Beisel C., Rechsteiner M., Wild P., Schraml P., Moch H., Beerenwinkel N. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat. Commun. 2012;3:811. doi: 10.1038/ncomms1814. [DOI] [PubMed] [Google Scholar]
- 100.DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A., Del Angel G., Rivas M.A., Hanna M. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 2011;43:491–498. doi: 10.1038/ng.806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Roth A., Ding J., Morin R., Crisan A., Ha G., Giuliany R., Bashashati A., Hirst M., Turashvili G., Oloumi A. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics. 2012;28:907–913. doi: 10.1093/bioinformatics/bts053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Cibulskis K., Lawrence M.S., Carter S.L., Sivachenko A., Jaffe D., Sougnez C., Gabriel S., Meyerson M., Lander E.S., Getz G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 2013;31:213–219. doi: 10.1038/nbt.2514. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Larson D.E., Harris C.C., Chen K., Koboldt D.C., Abbott T.E., Dooling D.J., Ley T.J., Mardis E.R., Wilson R.K., Ding L. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012;28:311–317. doi: 10.1093/bioinformatics/btr665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Koboldt D.C., Zhang Q., Larson D.E., Shen D., McLellan M.D., Lin L., Miller C.A., Mardis E.R., Ding L., Wilson R.K. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Nikolenko S.I., Korobeynikov A.I., Alekseyev M.A. BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics. 2013;14:S7. doi: 10.1186/1471-2164-14-S1-S7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Zafar H., Wang Y., Nakhleh L., Navin N., Chen K. Monovar: single-nucleotide variant detection in single cells. Nat. Methods. 2016;13:505–507. doi: 10.1038/nmeth.3835. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Roth A., McPherson A., Laks E., Biele J., Yap D., Wan A., Smith M.A., Nielsen C.B., McAlpine J.N., Aparicio S., Bouchard-Cote A., Shah S.P. Clonal genotype and population structure inference from single-cell tumor sequencing. Nat. Methods. 2016;13:573–576. doi: 10.1038/nmeth.3867. [DOI] [PubMed] [Google Scholar]
- 109.Ronquist F., Teslenko M., van der Mark P., Ayres D.L., Darling A., Höhna S., Larget B., Liu L., Suchard M.A., Huelsenbeck J.P. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 2012;61:539–542. doi: 10.1093/sysbio/sys029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Chen D., Eulenstein O., Fernandez-Baca D., Sanderson M. Minimum-flip supertrees: complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 2006;3:165–173. doi: 10.1109/TCBB.2006.26. [DOI] [PubMed] [Google Scholar]
- 111.Reiter J.G., Makohon-Moore A.P., Gerold J.M., Bozic I., Chatterjee K., Iacobuzio-Donahue C.A., Vogelstein B., Nowak M.A. Reconstructing phylogenies of metastatic cancers. Nat. Commun. 2017;8:14114. doi: 10.1038/ncomms14114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112.Gusfield D., Frid Y., Brown D. Springer; Berlin: 2007. Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data; pp. 51–64. (Computing and Combinatorics). [Google Scholar]
- 113.Kim K.I., Simon R. Using single cell sequencing data to model the evolutionary history of a tumor. BMC Bioinf. 2014;15:27. doi: 10.1186/1471-2105-15-27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Yuan K., Sakoparnig T., Markowetz F., Beerenwinkel N. BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biol. 2015;16:36. doi: 10.1186/s13059-015-0592-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Ross E., Markowetz F. OncoNEM: inferring tumour evolution from single-cell sequencing data. Genome Biol. 2016;17:69. doi: 10.1186/s13059-016-0929-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Jahn K., Kuipers J., Beerenwinkel N. Tree inference for single-cell data. Genome Biol. 2016;17:86. doi: 10.1186/s13059-016-0936-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Adams R.P., Ghahramani Z., Jordan M.I. Advances in Neural Information Processing Systems. Vol. 23. 2010. Tree-structured stick breaking for hierarchical data; pp. 19–27. [Google Scholar]
- 118.Griffith M., Miller C.A., Griffith O.L., Krysiak K., Skidmore Z.L., Ramu A., Walker J.R., Dang H.X., Trani L., Larson D.E. Optimizing cancer genome sequencing and analysis. Cell Syst. 2015;1:210–223. doi: 10.1016/j.cels.2015.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Chowdhury S.A., Gertz E.M., Wangsa D., Heselmeyer-Haddad K., Ried T., Schffer A.A., Schwartz R. Inferring models of multiscale copy number evolution for single-tumor phylogenetics. Bioinformatics. 2015;31:i258–i267. doi: 10.1093/bioinformatics/btv233. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120.Gerlinger M., Rowan A.J., Horswell S., Larkin J., Endesfelder D., Gronroos E., Martinez P., Matthews N., Stewart A., Tarpey P. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 2012;366(10):883–892. doi: 10.1056/NEJMoa1113205. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121.Kovac M., Navas C., Horswell S., Salm M., Bardella C., Rowan A., Stares M., Castro-Giner F., Fisher R., De Bruin E.C. Recurrent chromosomal gains and heterogeneous driver mutations characterise papillary renal cancer evolution. Nat. Commun. 2015;6:6336. doi: 10.1038/ncomms7336. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Kuipers J., Jahn K., Beerenwinkel N. A statistical test on single-cell data reveals widespread recurrent mutations in tumor evolution. bioRxiv. 2016:094722. [Google Scholar]
- 123.Zafar H., Tzen A., Navin N., Chen K., Nakhleh L. SiFit: a method for inferring tumor trees from single-cell sequencing data under finite-site models. bioRxiv. 2016:091595. doi: 10.1186/s13059-017-1311-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124.Ling S., Hu Z., Yang Z., Yang F., Li Y., Lin P., Chen K., Dong L., Cao L., Tao Y. Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution. Proc. Natl. Acad. Sci. 2015;112:E6496–E6505. doi: 10.1073/pnas.1519556112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 125.Beerenwinkel N., Antal T., Dingli D., Traulsen A., Kinzler K.W., Velculescu V.E., Vogelstein B., Nowak M.A. Genetic progression and the waiting time to cancer. PLoS Comput. Biol. 2007;3:e225. doi: 10.1371/journal.pcbi.0030225. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126.Bozic I., Antal T., Ohtsuki H., Carter H., Kim D., Chen S., Karchin R., Kinzler K.W., Vogelstein B., Nowak M.A. Accumulation of driver and passenger mutations during tumor progression. Proc. Natl. Acad. Sci. 2010;107:18545–18550. doi: 10.1073/pnas.1010978107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127.Waclaw B., Bozic I., Pittman M.E., Hruban R.H., Vogelstein B., Nowak M.A. A spatial model predicts that dispersal and cell turnover limit intratumour heterogeneity. Nature. 2015;525:261–264. doi: 10.1038/nature14971. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128.Williams M.J., Werner B., Barnes C.P., Graham T.A., Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 2016;48:238–244. doi: 10.1038/ng.3489. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 129.Sottoriva A., Kang H., Ma Z., Graham T.A., Salomon M.P., Zhao J., Marjoram P., Siegmund K., Press M.F., Shibata D. A Big Bang model of human colorectal tumor growth. Nat. Genet. 2015;47:209–216. doi: 10.1038/ng.3214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130.Gao R., Davis A., McDonald T.O., Sei E., Shi X., Wang Y., Tsai P.-C., Casasent A., Waters J., Zhang H. Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nat. Genet. 2016;48:1119–1130. doi: 10.1038/ng.3641. [DOI] [PMC free article] [PubMed] [Google Scholar]