Abstract
Purpose
Recent studies have highlighted the existence of sub-clones in tumors. Lymph nodes are generally the first location of metastasis for most solid epithelial tumors including colorectal cancer (CRC). We sought to understand the genetic origin of lymph node metastasis in CRC by evaluating the relationship between CRC tumor sub-clones present in primary tumors and lymph nodes.
Experimental Design
A total of 33 samples from seven CRC’s, including two or three spatially disparate regions from each primary tumor and one to four matched lymph nodes for each tumor, underwent next-generation whole-exome DNA sequencing, Affymetrix OncoScan SNP arrays, and targeted deep confirmatory sequencing. We performed mapping between SNPs and copy number events from the primary tumor and matched lymph node samples, allowing us to profile heterogeneity and the mutational origin of LN metastases. The computational method PyClone was used to define sub-clones within each tumor. The method Citup was subsequently used to infer phylogenetic relationships among sub-clones.
Results
We found there was substantial heterogeneity in mutations and copy number changes among all samples from any given patient. For each patient, the primary tumor regions and matched lymph node metastases were each polyclonal and the clonal populations differed from one lymph node to another. In some patients, the cancer cell populations in a given lymph node originated from multiple distinct regions of a tumor.
Conclusions
Our data support a model of lymph node metastatic spread in colorectal cancer whereby metastases originate from multiple waves of seeding from the primary tumor over time.
Introduction
Death from colorectal cancer (CRC) typically occurs due to distant metastasis (1). Lymph node metastases are thought to occur before distant metastasis. We and others have previously shown that CRC primary tumors are composed of multiple distinct genetic sub-clones (2, 3). Prior models have proposed that individual cells or groups of cells escape from the primary tumor and enter into the lymphatics or blood vessels to seed local lymph nodes and potentially remote sites (4). Rare, individual metastatic cells escaping from the tumor would produce clonal metastatic lesions. In contrast, polyclonal metastatic lesions could arise from multiple different sub-clones from a primary cancer simultaneously seeding a metastatic site or by recurrent waves of seeding by different individual cancer cells and/or by polyclonal metastatic clusters of cells.
Prior limited studies of CRC lymph node metastasis have shown discordance between KRAS mutation status in primary tumors and associated lymph node metastasis in about thirty percent of patients, suggesting heterogeneity among metastatic CRC cells in individual patients (5). However comparisons of mutational status between primary CRC and distant metastases, have been mixed with some showing concordance (6, 7) and others discordant mutation status (8). Recent studies of breast cancer lung metastasis in mice have shown that the lung metastases were polyclonal (9) but in human high-grade serous ovarian cancer, omental metastases are predominantly made of individual clones (10). Thus, the clonality of metastasis may be tumor type or site specific or neither, depending when in time the tumor and the metastases are sampled.
In the work presented here, we studied multiple primary tumor regions and cancer-containing lymph nodes from seven CRC patients with the goal of understanding whether a given lymph node metastasis was mono- or polyclonal in its composition. In addition, we sought to determine whether a given lymph node metastasis harbors cells from a single area of the cancer or whether the lymph node harbors cells from multiple spatially discrete areas of the primary tumor. Moreover, we wanted to assess the relationships among the cancer cell populations present in different metastatic lymph nodes from individual patients and determine whether the metastatic sub-clones present in a lymph node originated early or late in tumor evolution. Our overall strategy for addressing these inter-related questions was to start by identifying individual somatic single nucleotide changes that could be detected by sequencing, or copy number events that could be measured using SNP arrays, in specific regions of patient’s primary CRC specimen. Then, the somatic mutations and copy number changes found in the primary tumor regions were sought in the metastatic lymph node lesions of a given patient. We then analyzed the primary tumor and lymph node data in depth to infer sub-clonal variant populations and their phylogenetic ordering.
We found that LN metastases are polyclonal, differ from one LN to another, and can originate from multiple geographic regions of the primary tumor. Additionally, we found that single LN’s can harbor sub-clones from different geographic regions in the primary tumor. Our findings are consistent with a model of metastasis where multiple waves of metastatic cells escape the primary tumor over time and seed a given lymph node during tumor progression.
Methods
Sample collection and DNA extraction
For seven colon cancers, DNA was isolated from banked surgical resection materials present in the Michigan Medicine Tissue Procurement Core. A qualified gastrointestinal pathologist (JG) chose two or three spatially distinct areas of adenocarcinoma in each primary tumor specimen via inspection of hematoxylin- and eosin- (H&E)-stained glass slides. In addition, multiple matched formalin fixed paraffin embedded (FFPE) LN’s containing tumor metastasis were identified. Each tumor region intended for analysis was at least one centimeter in distance from the other area(s). Normal colon tissue was also collected from the surgical specimens. The relevant areas of each frozen or FFPE tumor block were identified and manually dissected from the blocks. Clinical data on each patient was abstracted from the medical records. The Michigan Medicine Institutional Review Board approved this study. DNA was extracted, using the Allprep mini kit (Qiagen, Valencia, California) according to the manufacturer’s instructions.
Whole-exome sequencing
An overview of the main workflow for this experiment is provided in Supplemental Figure 1.
Genomic DNA samples were fragmented to a target size of 300bp using a Covaris S2 fragmentation system. The samples were end-repaired, a-tailed, and custom adapters were ligated using the NEBNext® DNA Library Prep kit according to the manufacturers recommended protocols. The custom adapters included 6bp barcodes designed using BARCRAWL software (11) and synthesized by Integrated DNA Technologies. After ligation, the samples were size selected to 300bp on a 2 percent agarose gel and 1mm gel slices were retained. Samples were isolated from the gel using the Qiagen QIAquick gel extraction system. Either 10 μl or 15 μl of each ligation product was enriched using the Phusion master mix kit and custom polymerase chain reaction (PCR) primers for a total of 14 cycles of PCR amplification. The PCR products were purified using AmpureXP® beads.
Library QC was performed using the Agilent Bioanalyzer and qPCR. Each pool was captured using the Nimblegen SeqCap EZ V3 Exome Enrichment Kit. The capture pools were combined and sequenced on the lllumina HiSeq 2500 platform with paired-end 100bp reads using v3 reagents. Paired reads were adapter-trimmed and mapped to the hg19 reference genome using the Burrows-Wheeler Aligner (BWA mem v0.7.15) (12). Duplicates were removed using Picard v1.140 and local realignment and base quality recalibration performed using GATK v3.2-2. Tumor/normal pairs of the analysis-ready alignment files were analyzed using three somatic variant callers: MuTect v1.1.4, VarScan somatic v2.3.7, and Strelka v1.0.14 using default parameters, with the exception of VarScan for which the minimum variant allele frequency was reduced to 0.05. VarScan variants were filtered using the fpfilter tool. The resulting variant calls were compiled using a custom tool called Jacquard (GitHub repository: https://github.com/umich-brcf-bioinf/Jacquard). High-confidence somatic variants identified by any one of the three callers were retained as targets for deep-sequencing and validation. Variants were annotated using VarSeq (Golden Helix, v.1.4.1), classifying them by region (e.g. exon, intron, 5′-UTR) and amino acid impact (e.g. synonymous, non-synonymous).
Deep targeted sequencing
To validate mutations identified from exome sequencing and to facilitate sub-clone profiling, we designed custom capture panels targeting somatic loci detected in the primary tumors using the Agilent SureSelect XT platform (13). The total target size across all capture panels was 1.772Mb. Samples were sequenced on two lanes of an Illumina HiSeq 4000 using paired-end 150bp reads.
Deep sequencing data were processed via a workflow identical to exome sequencing described above except that VarScan alternate allele frequency was permitted down to 0.01. Variants were retained from the deep sequencing data if at least two callers reported the mutation as somatic and if the variant was located in an exonic or UTR region of a RefSeq v105 gene. In addition to variant validation, these data were used as input into the PyClone and Citup tools for assessing subclonality and phylogeny. Driver genes were annotated according to the recent publication of Giannakis and colleagues assessing recurrent mutations in 619 colorectal cancers (14).
Oncoscan SNP arrays and Copy Number Variation analysis
ThermoFischer (Affymetrix) OncoScan v3 arrays were run on all samples. The assay detects copy number change by generating data at 50–100kb resolution across a set of 891 cancer genes and 300–400kb across the rest of the genome. Raw array florescence intensity data generated on the Affymetrix scanners in the form of CEL files were loaded into the OncoScan Console software v.1.1.0 (ThermoFisher, Waltham, Massachusetts) and processed using the standard Affymetrix reference control files, based on sample type (normal or FFPE).
The copy number data were processed using the Nexus Copy Number software v7.5 (BioDiscovery, El Segundo, California) using their SNP-FASST2 algorithm for analysis and segmentation, generating a median Log2 Ratio and a median B-allele frequency (BAF) for each genomic segment. The significance threshold for segmentation was set at 1.0E-5 also requiring a minimum of three probes per segment and a maximum probe spacing of 1000kbp between adjacent probes before breaking a segment. Segments were classified as having gains or losses if the Log2Ratio (L2R) exceeded or fell lower than thresholds manually set for each tumor depending on its inferred purity and the quality of the SNP array (Supplementary Table 1, and section “Purity Estimation” below). The reported median BAF is the median BAF of the markers identified as heterozygous; if the number of heterozygous markers in the segment was below 10 or the percent of homozygous markers was above 85% no value was reported. The BAF values are used to determine whether a segment is in a loss of heterozygosity (LOH) or an allelic imbalance state.
By default, probe sets were automatically centered to the median for all samples by the Nexus software. For individual samples where the median probe set value was not diploid, specified regions of balanced heterozygosity were manually identified by visual inspection of L2R and BAF plots and defined as diploid regions, permitting the Nexus software to re-center the entire probe set.
Purity estimation
The Affymetrix Oncoscan Console software produces an estimate of tumor purity and ploidy using a custom version of the ASCAT algorithm (14) called TuScan. We used TuScan results as well as the ASCAT algorithm directly to infer purity and ploidy of the tumors and utilized the sunrise plots that are generated by the algorithm to assist in inferring or confirming the purity estimate for the tumor samples. In the case of multiple conflicting solutions, we chose solutions that either most closely corresponded with other samples from the same tumor or that favored lower overall ploidy (e.g. diploid rather than tetraploid). Purity estimates for each tumor are utilized as input into the PyClone algorithm.
PyClone analysis
PyClone datasets were assembled for each patient including variant and reference read depth, an alternate read depth, as well as major and minor copy number values for the region in which the variant resides. The reference and alternate read depths were extracted from the alignment (bam) files for each deep sequencing sample using a custom software tool called Zither (https://github.com/umich-brcf-bioinf/Zither). Copy number data processed via Nexus as described above were compiled for each variant in a semi-manual manner using the TAPS tool in the Patchwork software library (15). Copy number analysis results were exported from the Nexus software as text and Imported into TAPS, and chromosomal plots were generated representing the log-ratio vs allelic imbalance of every copy number segment (Supplemental Figure 2). Chromosomal plots were manually inspected to assign copy number states to clustered regions on the plots. TAPS then uses this information to assign a copy number state and allelic ratio to each copy number segment. The segmental results from TAPS were mapped to detected somatic variants based on each variant’s genomic coordinate using a custom Python script.
PyClone is a hierarchical Bayesian model that infers the cellular prevalence of each variant (the proportion of tumor cells in a sample that contain the variant), clustering variants based on co-variance of those prevalence estimates across multiple samples of the same patient (10, 16–18) (Supplemental File 3). The PyClone v0.13 Markov Chain Monte Carlo model was run for 10k iterations for each patient, discarding the first 1000 as burn-in. Prior to downstream analysis, variants for which PyClone produced overly broad posterior cellular prevalence distributions were removed, defined specifically as variants in a patient data set with cellular prevalence standard deviation estimates greater than 0.2 in at least 25% of the samples. PyClone clusters were retained for subsequent analysis and reporting if they contained four or more variants, or fewer variants if the cluster contained the variant of a driver gene or if the cluster’s cellular prevalence measurement was the highest of all clusters for more than one sample.
Phylogeny analysis with Citup
The Clonality Inference in Tumors Using Phylogeny (CITUP) tool (19) (v0.1.0 of the Bitbucket version, https://bitbucket.org/dranew/citup/) was run for the assembled dataset of filtered cellular prevalence estimates for each variant in each patient generated by PyClone. CITUP enumerates all possible phylogenetic trees up to a given number of nodes, assigning variants to nodes in the tree and solving a quadratic inference problem that minimizes error in the assignment of variants to nodes in the tree. The QIP-based method of the tool was used and PyClone cluster assignments provided for each variant, using 1000 restarts and selecting the tree solution with the minimum Bayesian information criterion (BIC) score. The max number of nodes was set to eight. Higher max nodes counts were attempted for tumors for which PyClone predicted more sub-clones than eight, but were computationally prohibitive.
Data sharing
The called mutation data can be found at the European Variation Archive website: http://www.ebi.ac.uk/eva/?Home accession number: ***. The OncoScan array data are available from the NCBI’s Gene Expression Omnibus (20) at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=*** accession number ***.
Results
Tumor characteristics
All of the tumors analyzed were stage III colon cancer. In total, 43 samples were collected from the seven colon cancers, five to eight samples per patient. Of these, seven normal, 15 primary tumor, and 18 lymph samples yielded sufficient amounts of DNA for complete analysis. Table 1 shows tumor and patient characteristics including patient age, tumor location, size, and number of lymph nodes containing tumor.
Table 1. Tumor sample characteristics.
Seven stage III colon cancer samples were analyzed, including a matching normal and 2–3 primary tumor samples, and 2–9 lymph node samples per patient. Of these, fifteen primary tumor and eighteen lymph node samples yielded sufficient material for analysis.
| ID# | Age | Sex | Stage | Tumor location | # tumor samples | # LN positive | # LN studied |
|---|---|---|---|---|---|---|---|
| CP08 | 62 | Male | T3N1M0 | Transverse colon | 2 | 2 | 1 |
| CP11 | 38 | Male | T3N2M0 | Cecum | 2* | 8 | 3 |
| CP14 | 66 | Male | T2N2M0 | Cecum | 3 | 9 | 3 |
| CP15 | 74 | Male | T3N1M0 | Cecum | 2 | 3 | 2 |
| CP17 | 72 | Male | T3N2M0 | Sigmoid | 2 | 9 | 2 |
| CP18 | 56 | Female | T4N2M1 | Sigmoid | 2 | 8 | 4 |
| CP19 | 56 | Male | T3N1M0 | Sigmoid | 2 | 4 | 3 |
SNP Array analysis failed for one CP11 primary tumor sample.
Sequencing
Whole-exome sequencing was performed on tumor tissue samples and matched adjacent normal tissue samples, yielding an average read depth of 48x. Somatic variants were detected using three variant callers, selecting all variants identified by at least one caller. The number of somatic variants initially identified per patient across all tumor subsections was between 171 and 591 in six patients, with one hyper-mutated patient (CP11) yielding 4718 somatic variants.
Custom targeted sequencing panels were designed around all somatic mutations in each patient to generate more accurate alternate allele measurements and to validate identified mutations. An exception was CP11 for which only LOF, missense, synonymous, and splice-region loci were included (1639 variants). Lymph node metastatic samples were also assessed via these deep sequencing panels. An overall average read depth of 514× was achieved across target regions in all deep sequenced samples, with an average depth of 736× across detected variant locations. The overall result is a corpus of variants detected by sequencing, organized by patient, containing 2,530 somatic variants: 196 Loss Of Function (LOF) mutations, 1,454 missense alterations, and 880 silent mutations. Counts of variants by patient are shown in Supplemental Table 2.
Expected LOF APC mutations were detected in tumors CP14, CP15, CP17, CP18, and CP19 (Supplemental Table 2). CP11 is the only tumor of the seven examples without a detected APC variation, although it is a hypermutated sample with damaging mutations detected in CTNNB1, KRAS, and as well as several other driver genes (Supplemental Table 2). Predicted damaging KRAS mismatch mutations were also detected in CP14, CP15, and CP17, and LOF TP53 mutations were present in CP08, CP14, and CP19. CP17 also had a predicted damaging FBXW7 mutation; CP15 had a SMAD4 mutation; and CP18 had a PIK3CA mutation. Mutation effect predictions were obtained via dbNSFP annotation available via our annotation software (VarSeq, Golden Helix, Bozeman, MT, USA) (21), which aggregates results of six effect prediction tools: SIFT, Polyphen2 HumVar, MutationTaster, Mutation Assessor, FATHMM, FATHMM MKL Coding (18).
An annotated listing of the variants detected in this experiment is provided in Supplemental File 2.
Copy Number Alterations
CNV data were analyzed using TuScan/ASCAT to determine purity and ploidy for each specimen (Supplemental Table 1). Tumor purity was quite diverse in this cohort, ranging from 20% to 95%. This diversity appears to be a general characteristic of colon cancer samples in our experience and complicates analysis. Purity was higher in the tumors than in the lymph nodes (average purity of 0.65 vs 0.50, respectively, two-sided t-test 0.0532), and there was a higher diversity in purities of the lymph node samples vs the tumors in each patient: standard deviation between the tumor samples averaged 0.06 whereas the lymph nodes averaged 0.24, t-test 0.0063).
The tumors were largely diploid, with the exception of CP18 and CP15LN3 which have an overall triploid genome. These tumors showed substantial overall copy number changes across the genome, with an average of 36% of the genome modified by a copy number event (Supplemental Table 1, Supplemental File 1). The exception is CP11, which is a hypermutated tumor with characteristically little copy number change. A significant portion of the genome of these tumors was in allelic imbalance, with an average of 23.8% of the genomes designated as Loss Of Heterozygosity (LOH) regions. Over 50% of the samples exhibited copy number gains of Chr7, the q-arm of Chr8, and the q-arm of Chr20, and over 33% of the samples showed deletion events in the q-arm of Chr18. APC is in a loss of heterozygosity (LOH) state in all CP08, CP15, and PC17 samples as well (Supplemental File 1). Primary tumor samples from CP19 show APC in in LOH as well as LN1 and LN2 but not LN4. In addition, APC is in an amplified region in all CP18 samples. All CP19 samples contain a significant focal amplification encapsulating the EGFR gene (Supplemental File 1, Chr7: 55,086,725 – 55,238,738, p-arm near centromere). Supplemental File 1 shows log R and BAF profiles for all samples.
Primary tumors and lymph node metastases are genetically heterogeneous
All tumors had multiple examples of copy number heterogeneity between samples. Intra-tumor heterogeneity was observable between the primary tumor samples as well as between the primary samples and the metastases, and even between the metastases themselves. Copy number events in the primary tumors were often found in one or more of the LNs, however a number of CNVs in the LNs were not found in the primary tumors and either arose after metastasis occurred or originated from an un-sampled portion of the tumor.
Figure 1 shows an example of heterogeneity of CNV within a primary tumor and its matched LNs. In this example, LN2 and LN7 more closely resemble T1, and differences in allelic balance can be found amongst these two lymph samples themselves. This type of assessment was carried out for all regions of all tumors to assess the geographic origin of metastases and is discussed below.
Figure 1. Colorectal primary tumors and lymph node metastasis are genetically heterogeneous.
A: SNP Array profile of patient CP14 Chromosome 2. Top panels are log2 ratio (L2R) and the bottom panels B-allele Frequency (BAF) plots. Probes falling in copy gain and copy loss segments are colored blue and red, respectively; probes corresponding to allelic imbalance and LOH segments are colored purple and gold. The lymph node profiles LN2, LN7 and LN8 are of higher quality (higher signal-to-noise) than the primary tumor samples T1–T3. Overall, the CNV and allelic imbalance patterns of LN2 and LN7 match those of T1, while LN9 matches T2 and T3. Regions A, D, and J indicate a deletion event resulting in loss of heterozygosity in T1, LN2, and LN7. Regions B and I are copy neutral with no allelic imbalance. Regions C and E contain copy gain events in LN2 and LN7 which are possibly present in T1 but uncalled by the segmentation algorithm. The copy gain segments in region C show corresponding allelic imbalance in T1, LN2, LN7, and possibly LN9 as well; differences in the BAF plots between LN2 and LN3 in this region (purple vs gold) may be due to thresholding differences and may not reflect biological difference. Regions E and G contain copy gain and loss events in LN2 and LN7 with corresponding allelic imbalance in the BAF plots that are unique events in the lymphs; with the exception of several small sub-regions, these events are not present in T1 or the other tumor samples. Region F indicates a region of allelic imbalance in LN7 but not LN2 with no corresponding change in the primary tumors. Arrows at the top of the figure indicate regions used in the main copy number analysis (Table 3).
Assessment of the deep targeted sequencing across samples also revealed heterogeneity across tumor samples and matched LN’s in each patient. Using the presence or absence of variants between samples as a rough measure of heterogeneity indicated that between 1.2% and 89.1% of mutations detected in a tumor are shared across all samples in individual patients (Supplemental Table 2). Tumor CP14 appeared to have the highest degree of heterogeneity in detected variants between all samples, with only four mutations shared across all seven samples. Many of the variants in CP14 (160) were unique to individual samples. In tumor CP18, 66 out of 181 total variants were unique to a single sample, all in primary tumor samples, and 83 were shared by all. CP15 had 42 out of the total 297 private to individual samples, all but one in the primary tumor samples, and 231 shared by all. All of CP08’s unique variants were in T1. Tumors CP08 and CP11 showed a relatively low amount of variant heterogeneity between samples, with 89% and 78% of the variants shared by all, respectively. Tumor CP19 had only 37 of 95 variants shared amongst all samples, but only 7 variants private to individual samples; 27 variants were shared between T1, T2, LN1 and LN2. LN4 of CP19 (LN_C in Supplemental Table 3) was a low-purity sample with relatively fewer variants detected, as was LN9 of CP14 (LN_D); the private versus shared variant patterns in these tumors may be skewed as a result. As seen in Table 2 and described below, unique variants detected in the LN metastases can be mapped uniquely to specific regions of the primary tumor.
Table 2. Mapping of private sequencing variants.
Presence of variants that are unique to a single region in a primary tumor was assessed in each LN specimen from that tumor. In all tumors, individual LN’s contained variants that were unique to single tumor regions. In tumors 14, 15, and 17, LN’s contained unique mutations from multiple different regions, whereas the metastatic variants detected in the other four tumors are mapped to a single primary tumor region. Note, tumor CP11 is not listed because only a single primary tumor sample was available.
| tumor | primary | lymph | count |
|---|---|---|---|
| CP08 | T1 | LN1 | 0 |
| LN2 | 0 | ||
| LN3 | 0 | ||
| T2 | LN1 | 2 | |
| LN2 | 7 | ||
| LN3 | 7 | ||
| CP14 | T1 | LN2 | 21 |
| LN6 | 17 | ||
| LN7 | 19 | ||
| LN9 | 4 | ||
| T2 | LN2 | 6 | |
| LN6 | 18 | ||
| LN7 | 5 | ||
| LN9 | 8 | ||
| T3 | LN2 | 8 | |
| LN6 | 8 | ||
| LN7 | 10 | ||
| LN9 | 12 | ||
| CP15 | T1 | LN2 | 2 |
| LN3 | 1 | ||
| T2 | LN2 | 11 | |
| LN3 | 2 | ||
| CP17 | T1 | LN1 | 1 |
| LN5 | 1 | ||
| LN7 | 2 | ||
| T2 | LN1 | 2 | |
| LN5 | 2 | ||
| LN7 | 3 | ||
| CP18 | T1 | LN5 | 0 |
| LN6 | 5 | ||
| LN7 | 4 | ||
| LN8 | 0 | ||
| T2 | LN5 | 0 | |
| LN6 | 0 | ||
| LN7 | 0 | ||
| LN8 | 0 | ||
| CP19 | T1 | LN1 | 1 |
| LN2 | 6 | ||
| LN4 | 3 | ||
| T2 | LN1 | 0 | |
| LN2 | 0 | ||
| LN4 | 0 |
Primary tumors and lymph node metastases are polyclonal
Deep sequencing data and copy number data from SNP arrays were used to profile the sub-clonality of each tumor using PyClone. As discussed above, PyClone uses copy number data and tumor purity to estimate the cellular prevalence of each variant based on its alternate allele frequency, grouping variants into clusters based on their cellular prevalence profiles across samples (Supplementary File 3). Cellular prevalence plots along with inferred phylogenies for all tumors are shown in Figure 2. The cellular prevalence plots provide a profile of the sub-clones inferred in each sample, and indicate the changing sub-clonal composition across samples. The ordering of the samples in the cellular prevalence plots is not temporal: each profile is an assessment of the composition of a sample at a single time point.
Figure 2. Sub-clonal composition and phylogenies of tumors with LN metastases originating from multiple geographic regions of the primary tumor.
Composition profiles generated by PyClone (left) plot the mean cellular prevalence of the variants in each cluster in each sample. The order of the samples along the x-axis is not temporal. The size of the marker for each mean cellular prevalence measurement is proportional to the number of variants in the cluster. Vertical lines at each point represent one standard deviation. The best scoring phylogenetic tree inferred by Citup is shown for each tumor (right). Each tree node is labeled with the PyClone cluster ID(s) to which it corresponds. Vertical line lengths in the phylogenies correspond to the number of mutations occurring in the node below. Mutations occurring in known driver genes are indicated (17). Tumors for which metastases originate from multiple geographic regions of the primary tumor are shown in a), and those for which metastases originate from a single geographic region of the primary tumor are shown in b). Tumor CP08 has only one lymph sample, whereas CP11 only has one primary tumor sample.
All tumor samples showed evidence of sub-clonal structure, with the number of mutation clusters detected by PyClone varying between six and fifteen. However, inspection of the similarity matrices and prevalence plots indicates that clusters with very similar profiles may possibly be merged and considered a single sub-clone: e.g. clusters 6 and 10 of CP08, clusters 14 and 15 of CP11, and clusters 8 and 9 of CP15 (Figure 2, Supplemental File 3). The number of distinct clusters is not of primary importance for this analysis: of greater significance is the presence of distinctly definable sub-clones and their mapping between lymph and primary tumors.
Heterogeneity between primary tumor samples is evident, not only in differing prevalence measurements of clusters across samples, but also evidenced by of mutations present in one but not other primary tumor samples, as discussed next. Lymph nodes also often have substantially different clonal profiles from each other, and between themselves and the primary tumor samples (Figure 2). These sub-clonal profiles clearly indicate that lymph node metastases are not monoclonal.
Lymph node metastases originate from single or multiple geographic regions of the primary tumor
We examined variant data to identify mutations uniquely present in individual primary tumor samples with the goal of using these variants as markers of the region of origin of lymph node metastasis in the matched primary tumor. For each tumor, variants that are private to only one of the primary tumor samples were mapped to the lymph nodes in which they are detected. Counts of these private variant ‘marker’ mappings are shown for each of the lymph node samples in Table 1. Multiple marker variants were manually validated by visualizing the raw pileup data for the region of the variant in a genome browser. The variant data indicate that for tumors CP08, CP18, and CP19, all private marker variants unique to a single area of the primary tumor found in the LN metastases originate from a single geographic region of the primary tumor (Table 1). This is consistent with a group of cells carrying these unique mutations moving from a single region of the tumor to the LN’s, or single cells from that same region moving over a series of events. However, in tumors CP14, CP15 and CP17, each lymph node harbors marker mutations originating from multiple regions of the primary tumor, indicating that multiple tumor regions contributed to each LN metastasis (Table 1). This is consistent with seeding of the lymph nodes with the unique mutations from a single region of the primary tumor at one time and from a different region of the primary tumor during another event.
Next we performed the same lymph node mapping using copy number variants. Figure 1, introduced earlier, is an example of the method that was applied across the genome of each sample. The goal of the profiling was to identify copy number events private to individual regions of the primary tumors that could be used to indicate a possible clonal origin for each metastasis. The overall profile of lymph nodes 2 and 7 (LN2 and LN7) in Figure 1 appears to clearly match primary tumor sample 1 (T1) and not the other samples from the primary tumor. However, there are unique regions of copy number and zygosity change present in these lymph node samples that are not reflected in any of the sampled regions of the primary tumor, as well as a small region of heterogeneity between the lymph LN2 and LN7 samples themselves (region F). LN9 exhibits similarity to T2 and T3 in this chromosome in that there are no gross CNV events displayed in these samples. Care must be taken in these analyses to distinguish between simple purity differences and actual copy number changes. LN9 has a lower fraction of aberrant cells, however CNV events would easily be detectable if they were present in this region as evidenced by a number of other CNV events clearly detected in this sample in other chromosomes (Supplemental File 1).
As with the sequencing variants, copy number events detected in the lymph node metastases which could be traced to a single primary tumor region were identified. A summary of these distinguishing events for each tumor is provided in Table 3, which also lists events that are unique to the lymph nodes and not present in the primary tumor samples. The table summarizes distinguishing events at the chromosomal level. As can be seen in Table 3 and in Supplemental File 1, the copy number data for CP08 suggest a strong correlation between the lymph node and the T2 primary sample. All distinguishing copy number events appear to originate in T2 (Table 2), in agreement with the variant profile from Table 1. Tumor CP11 is a hypermutated sample with very little copy number change. The copy number data for CP14 suggests a specific match to T1 in many regions, in apparent contrast to the variant profile which revealed distinguishing variants originating from all three tumor samples, which may reflect limitations of the data. The lymph node samples of CP15 show a similar pattern matching both T1 and T2 across the genome, but there are several specific regions in which the lymph node samples match T2 only, and a region in LN2 (q-arm of Chr7) that maps uniquely to T1. A significant fraction of the CP15_LN2 genome has a unique copy number profile as well, and chr14 signals an event present in T1 but not T2 which is missing in both lymph node samples. The copy number data for CP17 do not suggest a specific primary tumor sample of origin for the lymph metastases. There are a couple regions in LN1 that are unique and not present in the primary tumor samples however, and there are several regions in which an event is evident in T1 that are not reflected in the lymph nodes as well as regions of heterogeneity between the lymph nodes. In CP18, there are entire chromosome regions with copy number states that match T2 but not T1, in contrast to the variant data. However, there are also smaller regions of lymph nodes LN7 and LN8 which indicate a T1 origin. The bulk of the genome of this tumor exhibits matching copy number events for both primary tumor samples, and there are significant regions of the lymph node genomes that contain unique events not seen in the primary tumor samples. CP19 shows copy number similarity to both T1 and T2 across much of the genome in LN1 and LN2; LN4 is very impure and difficult to assess. All three lymph nodes display a 150kb deletion event in chr20 that is present in T2 but not T1, and there is a region of amplification in chr5 visible in LN1 and LN2 which matches T1 only. There are regions of unique change visible in these lymph nodes as well.
Table 3. Copy number events in lymph nodes mapping to unique primary tumors.
For each chromosome, the table lists whether there exists a copy number event detected in a lymph sample that is private to an individual primary tumor location. If so, the primary tumor ID is indicated (T1 or T2 for each patient; there were no unique distinguishing CNVs detectable in CP14 T3). If the lymph sample contains a unique copy number event not present in any primary tumor, a “U” is indicated. See Supplemental File 1 for full copy number profiles for each tumor.
| CP08 | CP11 | CP14 | CP15 | CP17 | CP18 | CP19 | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LN3 | LN1 | LN6 | LN7 | LN2 | LN7 | LN9 | LN2 | LN3 | LN1 | LN7 | LN5 | LN6 | LN7 | LN8 | LN1 | LN2 | LN4 | |||||||||||||||||||
| LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | LogR | BAF | |
| chr1 | T2 | U | U | U | U | T1 | U | U | ||||||||||||||||||||||||||||
| chr2 | T2 | T2 | U | T1 | T1 | T1 | T1 | U | ||||||||||||||||||||||||||||
| chr3 | T2 | T2 | T1 | U | U | U | ||||||||||||||||||||||||||||||
| chr4 | T2 | T2 | T2 | T2 | T2 | T2 | T1 | |||||||||||||||||||||||||||||
| chr5 | T2 | T2 | T1 | T1 | T1 | T1 | U | U | U | |||||||||||||||||||||||||||
| chr6 | T2 | U | U | U | U | U | ||||||||||||||||||||||||||||||
| chr7 | T1 | T1 | T1 | T1 | T1 | U | ||||||||||||||||||||||||||||||
| chr8 | T1 | T1 | T1 | T1 | T2 | T2 | U | U | U | |||||||||||||||||||||||||||
| chr9 | T2 | U | U | U | U | |||||||||||||||||||||||||||||||
| chr10 | T2 | T2 | U | U | U | |||||||||||||||||||||||||||||||
| chr11 | T1 | T1 | U | U | T2 | T2 | U | |||||||||||||||||||||||||||||
| chr12 | T2 | T2 | T1 | T1 | T1 | T1 | U | U | U | U | T1 | T1 | ||||||||||||||||||||||||
| chr13 | T2 | T2 | U | U | ||||||||||||||||||||||||||||||||
| chr14 | U | U | U | U | ||||||||||||||||||||||||||||||||
| chr15 | T2 | U | U | U | U | U | U | |||||||||||||||||||||||||||||
| chr16 | T2 | U | U | U | U | |||||||||||||||||||||||||||||||
| chr17 | T2 | T1 | T1 | T1 | T1 | U | U | U | U | U | U | |||||||||||||||||||||||||
| chr18 | T2 | T2 | T1 | T1 | T1 | T1 | T1 | T1 | ||||||||||||||||||||||||||||
| chr19 | T2 | T1 | T1 | T1 | T1 | T1 | U | U | U | U | ||||||||||||||||||||||||||
| chr20 | T1 | T1 | T1 | T1 | T2 | T2 | ||||||||||||||||||||||||||||||
| chr21 | U | U | U | |||||||||||||||||||||||||||||||||
| chr22 | T1 | T1 | T1 | T1 | ||||||||||||||||||||||||||||||||
Lastly, the unique sub-clones identified by PyClone can be utilized within the cellular prevalence plots (Figure 2) to assess origin. For example, in CP8, subclone 2 (orange) is unique to T2 and is found in LN3 indicating the region of origin for the metastasis is T2. Another example is CP15, where subclone 19 (light green) found only in T2 is in LN2 but not LN3 indicating the origin for LN2 is T2.
These results suggest that LN metastasis can originate from one or more primary tumor regions. With some exceptions noted above, the origins of LN metastases are in general agreement across all methods tested including variants, CNV, and PyClone sub-clonality profiling. The results support a model of polyclonal metastasis (Figure 3) the provenance of which is diverse, spanning one or more distinct tumor regions.
Figure 3. Model of lymph node metastases with multiple waves of seeding of the LN’s by sub-clones from the primary tumor over time.

In panels A–D, subclones A–D move from the primary tumor to one or more lymph nodes. Panel A shows a single clone populating multiple nodes. Panel B–D indicate subsequent subclones each successively populating different subsets of lymph nodes.
LN metastasis contain early and late sub-clones
Phylogenies were derived using Citup and illustrate the sequence of the origins of each sub-clone in each patient (Figure 2, right). Phylogenies reveal examples of LN metastasis that are early events such as sub-clone 13 in CP14 which is found in all samples as well as late events such as sub-clone 4 in CP14 which is a late sub-clone and is found in only one tumor sample (Figure 2a). Sub-clones 7, 8, and 12 of CP17, as another example, are present in all four samples, whereas sub-clones 1 and 3 are present in primary samples T1 and T2, respectively. Sub-clones that occur late in the phylogenies are generally found in only a single tumor sample and at most a single LN; examples include sub-clone 15 in CP15 which is found only in primary tumor sample T1, and sub-clone 13 in CP19 which is found in primary tumor sample T1 at a low cellular prevalence and then only found in LN2. The most common driver genes are generally found in sub-clones in all samples from a tumor and are early in the phylogeny. For example sub-clone 12, 21 (brown) in CP08 contains a TP53 mutation and is in all samples and the green sub-clone in CP15 which contains APC, KRAS and TP53 mutations and is in all samples (Figure 2). Less common drivers are more often found in sub-clones found only in a single region of the primary tumor, for example, the sub-clone containing the SMAD4 mutation in CP15 is only in T2. The drivers listed in Figure 2 are from the study by Giannakis and colleagues which identified multiple recurrent drivers across 161 tumors. We find some examples of apparent sub-clones that do not contain these known drivers and yet are metastatic. When the composition of these sub-clones is assessed, putative drivers are identified. For example, in the pink cluster (cluster 10) in CP15 there are 2 loss of function mutations and 56 missense mutations, several of which could be candidate drivers. Another example of a tumor with sub-clones that metastasize but do not contain a known driver is sub-clone 6 (red) in CP8. This sub-clone does contain a loss of function mutation (SLC46A3) and 57 missense mutations. In CP11, there is also a sub-clone (14, purple), which is metastatic but does not have a labeled driver in Figure 2, however, it does have 16 loss of function mutations and many missense mutations which could be drivers.
Multiple metastatic events seed lymph nodes over time
We present a model of LN metastasis explaining the relationship between tumor sub-clones within the primary tumor and sub-clones found in the LN’s (Figure 3). As can be seen from the data above, each site in the primary tumor is polyclonal and each LN is polyclonal. We have shown evidence supporting polyclonal LN’s by assessing unique mutations in the primary tumor and noting that in 3 tumors, we find mutations unique to more than one geographic region of the tumor in the individual LN’s. This, in addition to the PyClone data defining multiple tumor sub-clones in each tumor and LN supports a model of polyclonal LN metastasis arising from multiple individual metastatic events from the primary tumor over time. Additional support for this model is found in the phylogenies showing that sub-clones that were early, mid, and late in tumor evolution are all found in the LN’s, again, suggesting multiple clones leaving the primary tumor at different times.
Discussion
We have explored the genetics of primary colorectal cancer and matched lymph node metastases of seven tumors to uncover temporal and spatial patterns of metastatic spread. Using current informatics tools for inferring sub-clonality and phylogeny from bulk tumor samples, as well as comparative studies of mutations and CNV in multiple regions of primary tumors and their matched metastasis, we have found that each lymph node contains multiple sub-clones derived from the primary tumor. Additionally, we showed that there is substantial heterogeneity of clones present in different lymph nodes from any individual patient and within a single primary tumor. Together, these results support a model of metastasis with shedding of tumor, either as single cells or groups of cells, over time as shown in Figure 3. This model is similar to the one by Cheung and colleagues regarding seeding of distant metastasis which proposes that polyclonal metastases are generated through polyclonal clusters (4). Our model differs in that it describes LN metastases and we hypothesize that seeding occurs in multiple waves over time because we find multiple tumors with evidence of seeding from multiple different geographic locations within the primary tumor. It is possible that these groups of seeding cells could be composed of single or multiple sub-clones. We do not find evidence of a superior “metastastic” sub-clone that is alone responsible for metastasis in any lymph node tested.
This model supported by our data may bear profound clinical consequences. Although lymph node metastases remain a strong prognostic factors in colorectal cancer, it is clear from our work that a particular positive node may be quite different in its composition from another positive lymph node from a given patient. Thus, this work suggests at least two important future lines of investigation: first to couple clonal heterogeneity to the ability to understand the phenotypic consequences of individual clones and second to repeat this analyses on larger cohorts with known survival outcomes to better understand how tumor and lymph node heterogeneity may be employed to prioritize actionable clones.
Very few studies have been performed assessing molecular genetics and mutational heterogeneity in LN metastases and none have assessed regional origin of metastasis. McPherson and colleagues studied clonal relationships in multiple samples from seven metastatic high-grade serous ovarian cancer and found that the majority of metastatic sites in each patient were clonally pure and polyclonal metastases were found in only two patients (10). Comparison between our study and the McPherson study raises the possibility that the mechanisms of metastases may be tumor (CRC vs ovarian cancer) or site (LN vs omentum) specific.
The data presented here offer support for polyclonal LN metastasis due to multiple waves of seeding of the LN’s from the primary tumor over time. It is important to understand the relationship between tumor sub-clones and thus intra-tumor heterogeneity and metastasis in order to understand response to therapy. None of the patients whose tumors were studied here had previously received chemotherapy revealing that colorectal cancer metastases are highly complex even prior to treatment. How LN metastases relate to distant metastases is yet to be determined.
Multiple recent studies, including this one, have called into question the details of the adenoma to carcinoma pathway: namely clonal sweeps after APC, KRAS, and then TP53 mutations as proposed in 1990 (22). For example, CP14 does not follow any simple model of clonal evolution (figure 2A). In a recent study by Thirlwell and colleagues, they looked at multiple adenomas and sporadic cancers and assessed the mutation status of certain drivers in individual crypts in the neoplasms and found that the lesions were polyclonal and that in constructed phylogenies, there was not a clear obligate order of genetic events (23). Additionally, the recent publication by Gausachs and colleagues performed deep nanofluidic PRC in multiple tumors and found that different crypts from the same tumor contained different APC and KRAS mutations and even found crypts that did not contain APC mutations calling into question a clonal origin of colorectal cancer (24). Query in cBioPortal of the 212 TCGA tumors with sequencing and CNV data for mutations and copy number alterations in APC, KRAS, and TP53 reveals that although 91% of colorectal cancers have an alteration in at least one of these genes, only 16.5% harbor alterations in all 3 (25). We see sub-clones that do not include previously validated drivers, perhaps providing evidence for previously un-recognized drivers, or evidence for other driver events (e.g. copy number, epigenetic modification, post-translational protein modification) that are not accounted for in a more direct manner in bulk sequencing or SNP array profiling. Explanations for this also include inaccuracies in the algorithm. As noted in the results, we do identify potential drivers in these sub-clones. One drawback of PyClone is that the algorithm does not include CNV events in defining sub-clones which could be driver events.
Supplementary Material
Statement of Translational Relevance.
The likelihood of cure is substantially decreased in colorectal cancer patients with metastatic lesions. Regional lymph node metastasis is seen in about 35% of patients with colorectal cancer, but genetic factors leading to colorectal metastasis to lymph nodes are not well understood. Intra-tumor genetic heterogeneity and sub-clonal variants are present in primary cancers but their relationship to metastases is largely uncertain. We utilized multiple in-depth studies in seven primary colorectal cancers and their matched lymph node metastases to define the relationships among primary cancers and metastases. We found that the relationships between individual primary tumors and associated metastatic lesions are complex; the metastases are polyclonal and can originate from more than one region of the primary tumor. The data support a model of lymph node metastases that likely reflects multiple waves of tumor seeding of a given lymph node over time. Our findings on the molecular evolution of metastasis in colorectal cancer patients have implications for the origins and behavior of metastatic cancer cell populations and treatment strategies.
Acknowledgments
Grant Support: KMH: 5P50CA130810, American Surgical Association Foundation Fellowship; 5P30A046592; John S. and Suzanne C. Munn Cancer Research Fund, K08CA190645
ERF: R01CA082223 and P30CA046592
Dr Hardiman is funded by 5P50CA130810, American Surgical Association Foundation Fellowship; 5P30A046592; John S. and Suzanne C. Munn Cancer Research Fund, and K08CA190645 from the National Cancer Institute. Dr Fearon is funded by R01CA082223 and P30CA046592. The authors would like to acknowledge Jun Li and Sofia Merajver for thoughtful criticism as well as Chris Gates, Bob Lyons, Jeanne Geskes, Melissa Coon, Chris Krebs, Peter Graf, Sam Dougaparsad, Patricial Beals, and Zhiwei Che. We’d also like to thank the authors of the informatics tools used in this study—PyClone, Citup, and TAPS--for their work.
Abbreviations
- AF
allele frequencies
- CRC
Colorectal cancer
- DP
read depth
- LN
lymph node
- TCGA
The Cancer Genome Atlas
Footnotes
Conflict of Interest: The authors declare no potential conflicts of interest
References
- 1.Edwards BK, Noone AM, Mariotto AB, Simard EP, Boscoe FP, Henley SJ, et al. Annual Report to the Nation on the status of cancer, 1975–2010, featuring prevalence of comorbidity and impact on survival among persons with lung, colorectal, breast, or prostate cancer. Cancer. 2014;120(9):1290–314. doi: 10.1002/cncr.28509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Hardiman KM, Ulintz PJ, Kuick RD, Hovelson DH, Gates CM, Bhasi A, et al. Intra-tumor genetic heterogeneity in rectal cancer. Lab Invest. 2016;96(1):4–15. doi: 10.1038/labinvest.2015.131. Epub 2015/11/17. labinvest2015131 [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Sottoriva A, Kang H, Ma Z, Graham TA, Salomon MP, Zhao J, et al. A Big Bang model of human colorectal tumor growth. Nat Genet. 2015;47(3):209–16. doi: 10.1038/ng.3214ng.3214. Epub 2015/02/11. [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Cheung KJ, Ewald AJ. A collective route to metastasis: Seeding by tumor cell clusters. Science. 2016;352(6282):167–9. doi: 10.1126/science.aaf6546352/6282/167. Epub 2016/04/29. [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Baldus SE, Schaefer KL, Engers R, Hartleb D, Stoecklein NH, Gabbert HE. Prevalence and heterogeneity of KRAS, BRAF, and PIK3CA mutations in primary colorectal adenocarcinomas and their corresponding metastases. Clin Cancer Res. 2010;16(3):790–9. doi: 10.1158/1078-0432.CCR-09-24461078-0432.CCR-09-2446. Epub 2010/01/28. [pii] [DOI] [PubMed] [Google Scholar]
- 6.Brannon AR, Vakiani E, Sylvester BE, Scott SN, McDermott G, Shah RH, et al. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions. Genome Biol. 2014;15(8):454. doi: 10.1186/s13059-014-0454-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Kim TM, Jung SH, An CH, Lee SH, Baek IP, Kim MS, et al. Subclonal Genomic Architectures of Primary and Metastatic Colorectal Cancer Based on Intratumoral Genetic Heterogeneity. Clin Cancer Res. 2015;21(19):4461–72. doi: 10.1158/1078-0432.CCR-14-2413. [DOI] [PubMed] [Google Scholar]
- 8.Sebagh M, Allard MA, Bosselut N, Dao M, Vibert E, Lewin M, et al. Evidence of intermetastatic heterogeneity for pathological response and genetic mutations within colorectal liver metastases following preoperative chemotherapy. Oncotarget. 2016;7(16):21591–600. doi: 10.18632/oncotarget.7809. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Cheung KJ, Padmanaban V, Silvestri V, Schipper K, Cohen JD, Fairchild AN, et al. Polyclonal breast cancer metastases arise from collective dissemination of keratin 14-expressing tumor cell clusters. Proc Natl Acad Sci U S A. 2016;113(7):E854–63. doi: 10.1073/pnas.1508541113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.McPherson A, Roth A, Laks E, Masud T, Bashashati A, Zhang AW, et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat Genet. 2016;48(7):758–67. doi: 10.1038/ng.3573ng.3573. Epub 2016/05/18. [pii] [DOI] [PubMed] [Google Scholar]
- 11.Frank DN. BARCRAWL and BARTAB: software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing. BMC Bioinformatics. 2009;10:362. doi: 10.1186/1471-2105-10-362. Epub 2009/10/31. 1471-2105-10-362 [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. doi: 10.1093/bioinformatics/btp324. Epub 2009/05/20. btp324 [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Hagemann IS, Devarakonda S, Lockwood CM, Spencer DH, Guebert K, Bredemeyer AJ, et al. Clinical next-generation sequencing in patients with non-small cell lung cancer. Cancer. 2015;121(4):631–9. doi: 10.1002/cncr.29089. Epub 2014/10/28. [DOI] [PubMed] [Google Scholar]
- 14.Giannakis M, Mu XJ, Shukla SA, Qian ZR, Cohen O, Nishihara R, et al. Genomic Correlates of Immune-Cell Infiltrates in Colorectal Carcinoma. Cell Rep. 2016;17(4):1206. doi: 10.1016/j.celrep.2016.10.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Mayrhofer M, DiLorenzo S, Isaksson A. Patchwork: allele-specific copy number analysis of whole-genome sequenced tumor tissue. Genome Biol. 2013;14(3):R24. doi: 10.1186/gb-2013-14-3-r24gb-2013-14-3-r24. Epub 2013/03/28. [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Roth A, Khattra J, Yap D, Wan A, Laks E, Biele J, et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods. 2014;11(4):396–8. doi: 10.1038/nmeth.2883nmeth.2883. Epub 2014/03/19. [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Lamy P, Nordentoft I, Birkenkamp-Demtroder K, Thomsen MB, Villesen P, Vang S, et al. Paired Exome Analysis Reveals Clonal Evolution and Potential Therapeutic Targets in Urothelial Carcinoma. Cancer Res. 2016;76(19):5894–906. doi: 10.1158/0008-5472.CAN-16-0436. [DOI] [PubMed] [Google Scholar]
- 18.Findlay JM, Castro-Giner F, Makino S, Rayner E, Kartsonaki C, Cross W, et al. Differential clonal evolution in oesophageal cancers in response to neo-adjuvant chemotherapy. Nat Commun. 2016;7:11111. doi: 10.1038/ncomms11111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Malikic S, McPherson AW, Donmez N, Sahinalp CS. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics. 2015;31(9):1349–56. doi: 10.1093/bioinformatics/btv003btv003. Epub 2015/01/09. [DOI] [PubMed] [Google Scholar]
- 20.Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10. doi: 10.1093/nar/30.1.207. Epub 2001/12/26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Liu X, Wu C, Li C, Boerwinkle E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum Mutat. 2016;37(3):235–41. doi: 10.1002/humu.22932. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990;61(5):759–67. doi: 10.1016/0092-8674(90)90186-i. Epub 1990/06/010092-8674(90)90186-I[pii] [DOI] [PubMed] [Google Scholar]
- 23.Thirlwell C, Will OC, Domingo E, Graham TA, McDonald SA, Oukrif D, et al. Clonality assessment and clonal ordering of individual neoplastic crypts shows polyclonality of colorectal adenomas. Gastroenterology. 2010;138(4):1441–54. 54e1–7. doi: 10.1053/j.gastro.2010.01.033. [DOI] [PubMed] [Google Scholar]
- 24.Gausachs M, Borras E, Chang K, Gonzalez S, Azuara D, Delgado Amador A, et al. Mutational Heterogeneity in APC and KRAS Arises at the Crypt Level and Leads to Polyclonality in Early Colorectal Tumorigenesis. Clin Cancer Res. 2017;23(19):5936–47. doi: 10.1158/1078-0432.CCR-17-0821. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.“http://www.cbioportal.org/index.do?session_id=5a01d0ae498e5df2e297d7ad&show_samples=false&” Nov 7, 2017.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.



