2020 Feb 15;146(9):2488–2497. doi: 10.1002/ijc.32910

Quantification of multicellular colonization in tumor metastasis using exome‐sequencing data

Jo Nishino 1, Shuichi Watanabe 2, Fuyuki Miya 1,3, Takashi Kamatani 1,4, Toshitaka Sugawara 2, Keith A Boroevich 3, Tatsuhiko Tsunoda 1,3,4,5
PMCID: PMC7079087  PMID: 32020592

Abstract

Metastasis is a major cause of cancer‐related mortality, and it is essential to understand how metastasis occurs in order to overcome it. One relevant question is the origin of a metastatic tumor cell population. Although the hypothesis of a single‐cell origin for metastasis from a primary tumor has long been prevalent, several recent studies using mouse models have supported a multicellular origin of metastasis. Human bulk whole‐exome sequencing (WES) studies have also demonstrated a multiple “clonal” origin of metastasis, with different mutational compositions. However, how many founder cells colonize a metastatic tumor has not yet been rigorously determined. To address this question, under the metastatic model of “single bottleneck followed by rapid growth,” we developed a method to quantify the “founder cell population size” in a metastasis using paired WES data from primary and metachronous metastatic tumors. Simulation studies demonstrated that the proposed method gives unbiased results with sufficient accuracy in the range of realistic settings. When the proposed method was applied to real WES data from four colorectal cancer patients, all samples supported a multicellular origin of metastasis, and the founder size was quantified as ranging from 3 to 17 cells. Such a wide range of founder sizes estimated by the proposed method suggests that there are large variations in genetic similarity between primary and metastatic tumors in the same subjects, which may explain the observed (dis)similarity of drug responses between tumors.

Keywords: metastasis, multicellular colonization, founder population size, exome sequencing

Short abstract

What's new?

The founder cell population size of a metastatic tumor is one of the most important parameters for metastasis dynamics. However, multicellular colonization has not yet been quantified in human metastatic tumors. Using the ‘single bottleneck followed by rapid growth’ metastatic model and whole‐exome sequencing data from primary and metastatic tumors in colorectal cancer patients, this quantification method supports the multi‐cellular origin of metastasis, with founder population sizes ranging from 3 to 17 cells. The wide‐ranging population sizes suggest large variations in genetic similarity between primary and metastatic tumors within individual patients, possibly explaining variations in drug responses between the tumors.

Introduction

Metastasis is the main cause of cancer‐related death. Although it is essential to understand its mechanisms and the dynamics of distant site colonization in order to properly treat it, until recently little has been known. The founder cell population size of a metastatic tumor is one of the most important parameters for metastasis dynamics, which involves the change of mutational compositions from the primary to metastatic tumors (Fig. 1). The drastic genetic changes in the metastatic tumor relative to the primary one, brought about by limited cell migration, that is, the “bottleneck effect,” might result in a difference in drug response between the two tumors in the same patient.

Figure 1.

A schematic view of the proposed methodology. (a) Exome data from paired primary and metastatic tumors, and normal tissue. (b) Input of the method. (c) Illustration of basic premise for the estimation of founder sizes by computer simulations. Low correlation of observed VAFs in exome between the primary and the metastatic tumors in the small founder size, Nb = 2 (left). High correlation of observed VAFs between the primary and metastatic tumors in the large founder size, Nb = 50 (right).

Although the hypothesis that a metastatic tumor originates from a single tumor cell has been long prevalent,1, 2, 3 several recent studies using mouse models of cancer have demonstrated multicellular seeding.4, 5, 6 In humans, bulk whole‐exome sequencing (WES) studies of metastatic tumors, often including primary tumors from the same individuals, demonstrated metastases to have originated from multiple clones, where a “clone” was a cluster of tumor cells belonging to the same phylogenetic clade estimated by the variant allele frequency information.7, 8 While founder “cells,” but not “clones,” in the metastatic tumor have another clear meaning in understanding metastatic dynamics, the quantification of multicellular colonization has not been attempted so far in human metastatic tumors.

Here, we model metastatic colonization as “single bottleneck followed by rapid growth” for tumor cell populations and propose a method to quantify the founder cell population size of a metastatic tumor using paired WES data from the primary and metachronous metastatic tumors. This method uses the outputs from commonly used mutation callers, that is, variant allele frequencies (mutant allele counts and sequence depths), and quickly gives an unbiased estimate of the founder size in a realistic range. We applied our proposed method to the high‐depth WES data from a study of four colorectal cancer (CRC) patients.

Methods

Overview for quantifying founder cell population size in metastasis

We use paired WES data from a primary tumor and a metachronous metastatic tumor together with the data from normal tissue (Fig. 1 a). The input file is composed of sequence depths, D1 and D2, and the mutation read counts, m1 and m2, for each called mutation in the primary and metastatic tumors, respectively (Table 1 and Fig. 1 b; see Supporting Information Appendix and Supporting Information Fig. S1 for more details of the input file). When the founder population size is large, the variant allele frequencies (VAFs) for called mutations in the metastatic tumor show high similarity to those in the primary tumor (Fig. 1 c, right). Conversely, when the number of founder cells is small, the VAFs in the primary and metastatic tumors are only weakly correlated (Fig. 1 c, left). In this case, due to the severe “bottleneck effect,” many variants can become extinct or have significantly higher VAFs in the metastatic tumor.
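As a minimal sketch of the premise above, the observed VAFs can be computed as m/D for each site and compared between the two tumors. The tuple layout (m1, D1, m2, D2) and the use of a Pearson correlation are illustrative assumptions here, not the input format or statistic of the published tool.

```python
import math

def observed_vafs(sites):
    """Return two lists of observed VAFs (primary, metastatic).

    `sites` is a list of (m1, d1, m2, d2) tuples; this layout is a
    hypothetical stand-in for the real input file.
    """
    vaf1 = [m1 / d1 for m1, d1, _, _ in sites]
    vaf2 = [m2 / d2 for _, _, m2, d2 in sites]
    return vaf1, vaf2

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) read counts: VAFs are identical in both tumors.
example_sites = [(10, 100, 20, 200), (30, 100, 15, 50), (5, 100, 5, 100)]
vaf1, vaf2 = observed_vafs(example_sites)
r = pearson(vaf1, vaf2)
```

A correlation near 1 is what a large founder population would tend to produce; a severe bottleneck would push many metastatic VAFs toward 0 or toward high values and lower the correlation.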

Table 1.

Notations in the model and the simulation study

| Notation | Description |
| --- | --- |
| Nb | Founder cell population size, to be estimated. |
| R | Number of mutations used for estimation of Nb. |
| m1, m2 | Mutation read counts for the primary (m1) and metastatic tumors (m2) at a site. |
| m1(min) | Minimum mutation read count in WES data from the primary tumor. For estimating Nb, we use only the sites with m1(min) or more mutant reads. |
| D1, D2 | Sequence depths for the primary (D1) and metastatic tumors (D2) at a site. |
| p1, p2 | Population VAFs in the primary (p1) and metastatic tumor (p2) at a site. |
| f(p1) | Probability distribution of p1. |
| Mb | Number of mutant cells among Nb founders. |
| γ1, γ2 | Tumor purity in the WES samples from the primary (γ1) and metastatic tumors (γ2). |
| Additional notations in the simulation study | |
| K | Number of clonal mutations inherited from the initial primary tumor. |
| μ | Mutation rate per tumor‐cell division in the primary tumor. |
| N1 | Cell population size in the final primary tumor. |
| D¯ | Mean sequence depth in the primary and metastatic tumor. |
| γ | Tumor purity in the WES samples from the primary and metastatic tumors (γ1 = γ2). |
| Simulation for selection in the primary tumor | |
| b | Birth rate of cells in the primary tumor. |
| d | Death rate of cells in the primary tumor. |
| Nsub. occ. | Primary tumor size at which one advantageous mutation occurs. |
| a | Coefficient for birth rate. Birth rate of a cell with k non‐neutral mutations is (1 + a)^k. |
| Simulation for selective colonization | |
| psmet | Proportion of mutations with advantage in metastatic colonization. |
| smet | Coefficient for ability of metastatic colonization (smet > 0). Ability of metastatic colonization of a cell with l advantageous mutations is (1 + smet)^l. |
| Simulation for stochastic evolution of metastatic tumor | |
| bmet | Birth rate of cells in the metastatic tumor. |
| dmet | Death rate of cells in the metastatic tumor. |
| Nmet | Cell population size in the final metastatic tumor. |

Model and estimation methods

Consider a diploid tumor cell population in a primary tumor. A somatic variant in the population has a VAF, p1, or equivalently a cancer cell fraction (CCF), 2p1 (see Table 1 for notations). The model assumes no recurrent mutation at the same site, so the VAF is at most 0.5, that is, p1 ≤ 0.5. The VAF follows some distribution, p1 ~ f(p1); the present implementation specifies f(p1) by assuming “neutral” evolution with a high cell birth rate for the tumor population9, 10 (see Implementation section in Supporting Information Appendix; and see Results section for the robustness of the assumptions). In the bulk‐WES of the primary tumor, the sampled mutation read count, m1, at the variant site with sequence depth, D1, follows a binomial distribution with parameters D1 and p1,

$$m_1 \sim \mathrm{Bin}(m_1 \mid D_1, p_1).$$

Metastatic colonization is modeled as follows. A single bottleneck occurs during colonization and is followed by rapid growth, so that the VAF in the full‐blown metastatic tumor is the same as that in the metastatic founder. We perform WES on samples from the full‐blown metastatic tumor. Then, in the WES of the metastatic tumor, the sampled mutation read count, m2, at the variant site with sequence depth, D2, is generated by a composite process of metastatic colonization and exome sequencing as follows:

$$m_2 \sim \sum_{M_b=0}^{N_b} \mathrm{Bin}(M_b \mid N_b, 2p_1)\,\mathrm{Bin}(m_2 \mid D_2, p_2),$$

where Nb, Mb and p2 are the number of founder cells (founder population size) in metastatic colonization, the number of mutant cells among the Nb founder cells, and the VAF in the metastatic tumor, respectively. In the above distribution for m2, the Nb founder cells are assumed to be randomly selected from the primary tumor and to colonize a metastatic site. Thus, the number of mutant cells in the metastatic site, Mb, follows a binomial distribution with parameters Nb and 2p1 (the mutant cell fraction), and p2 is given by p2 = Mb/(2Nb). In the bulk‐WES of the metastatic tumor, the sampled mutation read count, m2, follows a binomial distribution with parameters D2 and p2.
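The composite process above can be sketched as a small forward simulation for one variant site. This is an illustration of the stated model only; the function names are hypothetical, and the binomial sampler is a deliberately simple standard-library implementation.

```python
import random

def draw_binomial(n, p, rng):
    """Draw one Binomial(n, p) variate by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def simulate_site(p1, d1, d2, nb, rng):
    """Simulate read counts (m1, m2) for one site with primary VAF p1.

    Hypothetical helper following the model in the text:
    m1 ~ Bin(D1, p1), Mb ~ Bin(Nb, 2*p1), p2 = Mb / (2*Nb), m2 ~ Bin(D2, p2).
    """
    m1 = draw_binomial(d1, p1, rng)      # sequencing of the primary tumor
    mb = draw_binomial(nb, 2 * p1, rng)  # mutant cells among the Nb founders
    p2 = mb / (2 * nb)                   # VAF fixed by the bottleneck
    m2 = draw_binomial(d2, p2, rng)      # sequencing of the metastatic tumor
    return m1, m2

rng = random.Random(0)
absent = simulate_site(0.0, 100, 100, 5, rng)   # variant absent in the primary
clonal = simulate_site(0.5, 80, 60, 5, rng)     # clonal variant, p1 = 0.5
```

With a small Nb, Mb is often 0 or a large share of Nb, which is the source of the extinct or strongly elevated VAFs described for severe bottlenecks.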

Taken together, the probability of observing m1 and m2 mutations in the primary and metastatic exome with depths D1 and D2, respectively, is given by

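$$\int_{p_1=0}^{1} f(p_1)\,\mathrm{Bin}(m_1 \mid D_1, p_1) \sum_{M_b=0}^{N_b} \mathrm{Bin}(M_b \mid N_b, 2p_1)\,\mathrm{Bin}\!\left(m_2 \,\middle|\, D_2, \frac{M_b}{2N_b}\right) dp_1.$$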

For quality control, we use only the sites with m1(min) (>0) or more mutant reads in the primary tumor. Note that, in the metastatic tumor, all mutations called in the primary tumor are tracked in order to use greater information on VAF change from the primary to the metastatic tumor. Finally, the probability of observing m1(≥m1(min)) and m2(≥0) mutation reads in the primary and metastatic tumors, respectively, is expressed as

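$$\frac{\int_{p_1=0}^{1} f(p_1)\,\mathrm{Bin}(m_1 \mid D_1, p_1) \sum_{M_b=0}^{N_b} \mathrm{Bin}(M_b \mid N_b, 2p_1)\,\mathrm{Bin}\!\left(m_2 \,\middle|\, D_2, \frac{M_b}{2N_b}\right) dp_1}{\sum_{m_1=m_{1(\min)}}^{D_1} \int_{p_1=0}^{1} f(p_1)\,\mathrm{Bin}(m_1 \mid D_1, p_1)\, dp_1},$$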

where the sum in the denominator runs over the possible read counts, m1, in the primary tumor. Explicitly, let p1i, D1i, m1i, D2i and m2i denote p1, D1, m1, D2 and m2 for the specific ith variant site, respectively. Assuming independence among all R variants, each with m1i(≥m1(min)) mutation reads in the primary tumor, the likelihood of the founder size, Nb, is given by

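$$\mathrm{Likelihood}(N_b) = \prod_{i}^{R} \frac{\int_{p_{1i}=0}^{1} f(p_{1i})\,\mathrm{Bin}(m_{1i} \mid D_{1i}, p_{1i}) \sum_{M_b=0}^{N_b} \mathrm{Bin}(M_b \mid N_b, 2p_{1i})\,\mathrm{Bin}\!\left(m_{2i} \,\middle|\, D_{2i}, \frac{M_b}{2N_b}\right) dp_{1i}}{\sum_{m_{1i}=m_{1(\min)}}^{D_{1i}} \int_{p_{1i}=0}^{1} f(p_{1i})\,\mathrm{Bin}(m_{1i} \mid D_{1i}, p_{1i})\, dp_{1i}}. \tag{1}$$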

By maximizing the likelihood (1), we obtain the maximum likelihood estimate (MLE) of Nb (for implementation details, see Supporting Information Appendix). In reality, the independence assumption among variants does not hold since the unit of the tumor evolution is the cell, and mutations in the same cell evolve and colonize a metastatic site together. The effect of the independence assumption on the estimation of Nb is investigated below using simulations.
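A minimal sketch of how likelihood (1) could be evaluated and maximized numerically is given below. It is not the published implementation: the discretized form of f(p1) (weights proportional to 1/p^2 on a grid plus a point mass at 0.5 for clonal mutations), the grid size, the lower VAF bound and the search range for Nb are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass function, computed on the log scale."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log1p(-p))

def vaf_prior(n_grid=200, p_min=0.005, clonal_frac=0.1):
    """Assumed discretized f(p1): list of (p1, weight) pairs summing to 1."""
    step = (0.5 - p_min) / n_grid
    grid = [p_min + step * (j + 0.5) for j in range(n_grid)]
    raw = [1.0 / p ** 2 for p in grid]
    total = sum(raw)
    points = [(p, (1.0 - clonal_frac) * w / total) for p, w in zip(grid, raw)]
    points.append((0.5, clonal_frac))  # clonal mutations sit at VAF 0.5
    return points

def site_loglik(m1, d1, m2, d2, nb, prior, m1_min=2):
    """Log of one factor of likelihood (1), with the integral as a grid sum."""
    num = 0.0
    den = 0.0
    for p1, w in prior:
        bottleneck = sum(
            binom_pmf(mb, nb, 2 * p1) * binom_pmf(m2, d2, mb / (2 * nb))
            for mb in range(nb + 1)
        )
        num += w * binom_pmf(m1, d1, p1) * bottleneck
        # Sum over m1 >= m1_min equals 1 - P(m1 < m1_min).
        den += w * (1.0 - sum(binom_pmf(k, d1, p1) for k in range(m1_min)))
    return math.log(num) - math.log(den)

def mle_nb(sites, nb_max=50, m1_min=2, prior=None):
    """Grid-search MLE of Nb over 1..nb_max for (m1, d1, m2, d2) tuples."""
    prior = prior if prior is not None else vaf_prior()

    def loglik(nb):
        return sum(site_loglik(m1, d1, m2, d2, nb, prior, m1_min)
                   for m1, d1, m2, d2 in sites)

    return max(range(1, nb_max + 1), key=loglik)
```

As a usage illustration with made-up counts, sites whose metastatic VAFs are either 0 or near 0.5 (for example `(10, 100, 0, 100)` and `(20, 100, 50, 100)`) should favor a very small Nb, whereas sites with nearly unchanged VAFs should favor a larger one. The denominator does not depend on Nb, so it does not affect the maximizer; it is kept to mirror the form of (1).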

The tumor purities, that is, the fractions of cancer cells in the primary (γ1) and metastatic tumor tissue samples (γ2), are incorporated into the model simply by replacing p1i in the terms Bin(m1i | D1i, p1i) in both the numerator and the denominator of (1) with γ1p1i, and Mb/(2Nb) in the term Bin(m2i | D2i, Mb/(2Nb)) with γ2Mb/(2Nb).

Data availability

The data that support the findings of our study were derived from Supporting Information Tables S5–S12 in reference 8. The modified data will be made available upon reasonable request. The code to estimate the founder size used for the study is available from the GitHub repository: https://github.com/jonishino/MetaCellNum.

Results

Validation of our proposed method: pure birth tumor evolution model

We assessed our proposed method using simulated data, generated by a “pure birth model” for tumor evolution (see Methods and Supporting Information Appendix for details; see also Table 1 for notations). Briefly, a single tumor cell with K mutations generates two daughter cells, each with on average μ new mutations, and cell divisions repeat until the population has grown to the final primary tumor size, N1. Nb cells are randomly sampled from the N1 cells to make up a metastatic tumor. Note that the above K mutations result in clonal mutations in the primary and metastatic tumors. Exome samples in the primary and metastatic tumor have mean depth D¯ and purity γ. Our proposed method was applied to sites with m1(min) or more mutant reads in the primary tumor. We ran 100 simulations for each parameter set. Mouse models have suggested that metastasis occurs via colonization of one circulating tumor cell (CTC) cluster rather than serial arrivals of CTC clusters (or single CTCs) and that most CTC clusters contain between 2 and 20 tumor cells, with a median of 6 cells.6 We mainly focused on this range of founder sizes in the simulations.
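The pure birth simulation described above can be sketched as follows. This is a simplified illustration, not the authors' simulator: the Poisson distribution for the number of new mutations per daughter cell, the random choice of the dividing cell and all function names are assumptions, and the example uses a far smaller N1 than the study.

```python
import math
import random

def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_pure_birth(n1, k_clonal, mu, rng):
    """Grow one founder cell with k_clonal mutations to n1 cells, no death.

    Each cell is a frozenset of mutation ids. A randomly chosen cell is
    replaced by two daughters, each gaining Poisson(mu) new mutations
    (the Poisson choice is an assumption of this sketch).
    """
    next_id = k_clonal
    cells = [frozenset(range(k_clonal))]
    while len(cells) < n1:
        parent = cells.pop(rng.randrange(len(cells)))
        for _ in range(2):
            n_new = draw_poisson(mu, rng)
            cells.append(parent | frozenset(range(next_id, next_id + n_new)))
            next_id += n_new
    return cells

def population_vaf(cells):
    """Population VAF of each mutation in a diploid, heterozygous setting."""
    counts = {}
    for cell in cells:
        for mutation in cell:
            counts[mutation] = counts.get(mutation, 0) + 1
    return {m: c / (2 * len(cells)) for m, c in counts.items()}

def founder_vaf(cells, nb, rng):
    """Sample Nb founder cells at random and return their VAFs (p2)."""
    return population_vaf(rng.sample(cells, nb))

rng = random.Random(1)
primary = simulate_pure_birth(n1=200, k_clonal=5, mu=1.0, rng=rng)
p1 = population_vaf(primary)
p2 = founder_vaf(primary, nb=5, rng=rng)
```

Read counts for the two exomes could then be drawn binomially from p1 and p2 at each site, using depths with mean D¯ and scaling the VAFs by the purity γ.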

In Figures 2 a–2 d, all simulations were performed under the conditions of N1 = 100,000 and μ = 2.5. First, the effect of varying the mean depth, D¯, on the estimation of Nb was investigated under K = 50, γ = 1 and m1(min) = 2 (Fig. 2 a). The numbers of variants generated in the exome samples in the simulations were realistic, ranging from 1 to around 500 (Supporting Information Fig. S2B). In cases of Nb = 2, 5, 10 and 20, when D¯ ≥ 50, the medians of estimates were very close to the true values, that is, the estimator is median‐unbiased, and the estimation accuracy is good. For example, when the depth was 50, the medians of the estimates (and interquartile ranges; IQRs) were 5.0 (4.0, 6.0), 10.0 (8.0, 13.0) and 20.0 (15.0, 28.25) for the true Nb = 5, 10 and 20, respectively. The unbiasedness with D¯ ≥ 50 held for larger Nb (for Nb = 1–100, see Supporting Information Fig. S2 a). The estimation accuracy increased as sequence depth increased. Even when the depth was D¯ = 30, the precision and accuracy were acceptable, and the medians of estimates (IQRs) were 5.0 (4.0, 6.0), 12.0 (8.0, 18.25) and 21.0 (14.0, 35.0) for the true Nb = 5, 10 and 20, respectively. Under D¯ = 30, and particularly for larger Nb ≥ 30, the estimate of Nb was biased and a reliable estimation was difficult to obtain (for Nb = 1–100, see Supporting Information Fig. S2 a). Note that, for all depth settings, the relative estimation errors were smaller for smaller Nb, as shown by the narrower log‐scaled boxplots of the estimated Nb in Figure 2 a (see also Supporting Information Fig. S2 a).

Figure 2.

Valid quantification of founder size, Nb, confirmed by simulations. All simulations used “pure birth model” with the primary tumor population size, N1 = 100,000, and mutation rate per cell division per exome, μ = 2.5. For each parameter set, number of simulations is 100. The lower and upper hinges correspond to the first and third quartiles. Boxplots show medians, 25th and 75th percentiles (hinges). The upper/lower whiskers extend to the largest/smallest value at most 1.5 times of IQR from the upper/lower hinges. (a) Varying mean sequencing depth, D¯ for K = 50, γ = 1 and m1(min) = 2. (b) Varying tumor purity, γ, for K = 50, D¯ = 100 and m1(min) = 2. (c) Varying number of clonal mutations, K, for D¯ = 100, γ = 1 and m1(min) = 2. (d) Varying minimum number of mutation reads, m1(min), for K = 50, D¯ = 100 and γ = 1. (Variants with m1(min) or more mutation reads were used.)

Next, the effects of the tumor purity, γ, on the estimation were investigated under K = 50, D¯ = 100 and m1(min) = 2 (Fig. 2 b). When γ ≥ 50%, the estimation was median‐unbiased and the accuracy was acceptable. For example, when γ = 50%, the medians of the estimates (IQRs) were 5.0 (4.0, 6.0), 10.0 (9.0, 13.0) and 22.0 (17.0, 33.5) for the true Nb = 5, 10 and 20, respectively. In conjunction with the result of Figure 2 a, defining the “effective sequence depth” as the depth multiplied by tumor purity, the proposed method gave unbiased results with acceptable accuracy when the effective sequence depth was 50. In the case of lower purity and large founder size, for example, γ ≤ 40% and Nb ≥ 30, a reliable estimation was difficult to obtain (for Nb = 1–100, see Supporting Information Fig. S3).

In the algorithm for Nb estimation, the proportion of clonal mutations in the primary tumors is fixed at 10% (Implementation section in Supporting Information Appendix). In practice, however, the number of clonal mutations varies among tumors. Thus, the impact of the number of clonal mutations was investigated under D¯ = 100, γ = 1 and m1(min) = 2 (Fig. 2 c). The number of clonal mutations in the population, K, affected neither the unbiasedness nor the accuracy of the estimation of Nb. The same is true for larger Nb with various numbers of variants in WES samples (for Nb = 1–100, see Supporting Information Fig. S4).

For the input of the proposed method, we use variants with m1(min) or more mutation reads in the primary tumor. The effects of various values of m1(min) on the estimation of Nb were therefore investigated under K = 50, D¯ = 100 and γ = 1 (Fig. 2 d). The estimation results for up to Nb = 100 were also assessed (Supporting Information Fig. S5). In the case of m1(min) = 5, the estimation accuracy was worse than for m1(min) < 5. The decreased accuracy was not due to the lower number of variants used for input (for the case of a larger number of variants, see Supporting Information Fig. S6, which replaces μ = 2.5 with μ = 12.5). When singletons were included in the input (m1(min) = 1), a small upward bias can occur (for a clearer bias at large Nb, see Supporting Information Figure S5). Thus, we recommend the criterion of “at least 2 or 3 mutation read counts,” m1(min) = 2 or 3, for the input of the proposed method.

The simulations above were performed mainly under the conditions of the primary tumor size, N1 = 100,000, and mutation rate, μ = 2.5. When values of N1 ranging from 1,000 to 300,000 were used under D¯ = 100, μ = 2.5, K = 50, γ = 1 and m1(min) = 2, the behavior of the estimates was generally the same as that under N1 = 100,000 (Supporting Information Fig. S7). When values of μ ranging from 0.5 to 10 were used under D¯ = 100, N1 = 100,000, K = 50, γ = 1 and m1(min) = 2, the behavior of the estimates was generally the same as that under μ = 2.5, although the estimation accuracy was slightly lower when the mutation rate was small (Supporting Information Fig. S8).

Robustness for cell death and selection in the primary tumor evolution

So far, we have assumed no cell death and no difference in cell division rates during the development of the primary tumor. Violation of these assumptions might make estimation of Nb difficult, since the VAF distribution, f(p1), can potentially deviate from the distribution postulated under “neutral” evolution with a high cell birth rate for the tumor population. Here, we investigated the consequences of this violation, keeping D¯ = 100 and all other settings as in Figure 2 a, that is, μ = 2.5, K = 50, γ = 1, m1(min) = 2 and N1 = 100,000. We ran 100 simulations for each parameter set.

First, to investigate the effect of “cell death,” a death rate, d, and a birth rate, b, per unit time were introduced. Limiting the case to d < b, which means growth of the tumor population, various values, d = 0.01, 0.1, 0.2, 0.5, 0.7, 0.9 and 0.99, against unit birth rate, b = 1, were assumed (the ratio of d to b defines the evolutionary system). High death rates, that is, d > 0.7, might be more realistic.11, 12 As an exception, N1 = 10,000 was used for d = 0.99 due to computer capacity limitations. For all death rates, the estimator for the founder size, Nb, is median‐unbiased and the estimation accuracy is sufficient, as in the case of no death (d = 0; Supporting Information Fig. S9 a). This is because the VAF distribution for d ≠ 0 does not vary greatly from that of the no‐death case (Supporting Information Figs. S9 b–S9 d).

Second, we considered the case in which one positively selected subclone in the primary tumor appears in the WES samples.13 One starting primary tumor cell with b = 1 and d = 0.1 is assumed to evolve, and at the time when the population reaches a particular population size, Nsub. occ., one selectively advantageous mutation occurs. The subclone with the advantageous mutation will have a larger birth rate, b = 2, 5 or 10. The values of Nsub. occ. are determined so that the corresponding frequencies of the selective mutation are low (~2%), middle (~16%) and high (~30%) at the WES sampling point. Although the distributions of VAF were shifted toward the frequency of the selective mutation (Supporting Information Figs. S10–S12 b, S12 c, S12 d), the estimator of the founder size, Nb, is median‐unbiased and the estimation accuracy is sufficient, as in the case of no selective subclone (Supporting Information Figs. S10–S12 a).

Finally, we considered the case that many mutations with small effects are accumulated in the developmental process of the primary tumor. Neutral mutations and non‐neutral mutations occur with the probabilities of 0.3 and 0.7, respectively, which mimics synonymous and nonsynonymous mutation rates in exon regions.

The birth rate of a cell with k non‐neutral mutations is given by (1 + a)^k, where a is a coefficient for birth rate and is set as a = ±0.01, ±0.05, ±0.1, ±0.15 and ±0.2. Positive and negative values of a denote advantageous and deleterious mutations, respectively. The death rate is always set to be one‐tenth of the population mean of birth rates. Advantageous mutations, particularly when a ≥ 0.1, shift the VAF distribution toward intermediate frequency (Supporting Information Figs. S13 b–S13 k). For deleterious or advantageous mutations with a ≤ 0.1, the estimator for the founder size, Nb, is median‐unbiased and the estimation accuracy is sufficient, as in the case of no selection, a = 0 (Supporting Information Fig. S13 a). When strong selection is present (a ≥ 0.15), the estimator is biased upwards and the accuracy is low. However, it is unrealistic that 70% of all mutations would have effects as strong as a ≥ 0.15.

Robustness for selective colonization in the metastasis tumor

The proposed method assumes that the founders of the metastatic tumor are randomly sampled from the primary tumor population during the process of metastatic colonization. In practice, however, some mutations might be preferentially selected. Here, we investigated the consequences of selective colonization (Supporting Information Fig. S14). A proportion psmet of all mutations in the primary tumor population is advantageous for metastatic colonization, and each such mutation increases the metastatic ability of a cell multiplicatively by (1 + smet) (see Supporting Information Appendix for more details). The other parameters were set as in Figure 2 a, that is, μ = 2.5, K = 50, γ = 1, m1(min) = 2, D¯ = 100 and N1 = 100,000. We ran 100 simulations for each parameter set.
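Selective colonization as described above can be sketched as weighted sampling of founders, where a cell carrying l advantageous mutations has weight (1 + smet)^l. Sampling without replacement and the function names are assumptions of this sketch; the exact sampling scheme is described in the Supporting Information Appendix and is not reproduced here.

```python
import random

def colonization_weights(cells, advantageous, s_met):
    """Weight (1 + s_met)**l for a cell with l advantageous mutations.

    `cells` is a list of frozensets of mutation ids; `advantageous` is the
    frozenset of mutation ids that favor colonization (hypothetical layout).
    """
    return [(1.0 + s_met) ** len(cell & advantageous) for cell in cells]

def sample_founders(cells, weights, nb, rng):
    """Draw nb distinct cells, each draw proportional to remaining weights."""
    remaining = list(range(len(cells)))
    remaining_weights = list(weights)
    founders = []
    for _ in range(nb):
        pick = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        founders.append(cells[remaining.pop(pick)])
        remaining_weights.pop(pick)
    return founders

toy_cells = [frozenset({1}), frozenset(), frozenset({1, 2})]
toy_weights = colonization_weights(toy_cells, frozenset({1, 2}), s_met=1.0)
toy_founders = sample_founders(toy_cells, toy_weights, nb=2, rng=random.Random(0))
```

With smet = 0 every weight equals 1 and the scheme reduces to the random sampling assumed by the estimator.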

When psmet = 0.1%, which corresponds to around 530 selective mutations in the primary tumor population, the estimator for Nb is almost median‐unbiased and the estimation accuracy is sufficient, as in the neutral case, even for very strong selection, smet = 100 (Supporting Information Fig. S14 a). When psmet = 1%, which corresponds to around 5,300 selective mutations in the primary tumor population, and smet ≤ 5, the estimator for Nb is robust and comparable to the neutral case. However, when selection is strong (smet > 5), the estimator for Nb clearly underestimates the true founder size (Supporting Information Fig. S14 d). When a substantial fraction of mutations is advantageous, psmet = 10%, which corresponds to around 53,000 selective mutations in the primary tumor population, the estimator for Nb is robust and comparable to the neutral case for smet ≤ 1. However, when smet > 1, the estimator for Nb again clearly underestimates the true founder size (Supporting Information Fig. S14 g). In summary, except for the cases of numerous advantageous mutations and/or a strong selection coefficient, the proposed method gave robust estimates.

Behavior of the proposed estimator in the stochastic evolution of the metastatic tumor

The proposed model attributes all genetic drift in metastatic tumor evolution to a “single bottleneck,” and the estimate of “the founder size” reflects that drift. Nevertheless, genetic drift due to stochastic evolution of the metastatic tumor should occur during an early stage of development of the metastatic tumor. Here, we investigated the behavior of the proposed estimator in the birth and death processes for metastatic development (Supporting Information Fig. S15). As with the primary tumor, a metastatic founder population consisting of Nb cells is assumed to develop according to the birth and death process, with birth rate, bmet, and death rate, dmet, per unit time. Cell divisions repeat until the metastatic tumor has grown to the final metastatic tumor size, Nmet, at which exome sequencing is done. Limiting the case to dmet < bmet, which means growth of the tumor population, various values, dmet = 0.0–0.9, against unit birth rate, bmet = 1, and Nmet = 10,000 were assumed (the ratio of dmet to bmet defines the evolutionary system). The other parameters were set as in Figure 2 a, that is, μ = 2.5, K = 50, γ = 1, m1(min) = 2, D¯ = 100 and N1 = 100,000. We ran 100 simulations for each parameter set.

In the case of dmet = 0 (i.e., the birth‐only process), although the estimation accuracy is worse than in the case of no stochastic metastatic evolution (Fig. 2 a), the estimator for the founder size, Nb, remains nearly median‐unbiased (Supporting Information Fig. S15). If, in addition to cell birth, slight to moderate cell death is introduced (dmet ≤ 0.1), the estimate of Nb remains nearly median‐unbiased. For example, when dmet = 0.1, the medians of the estimates (IQRs) were 6.0 (5.0, 7.0), 9.0 (8.0, 11.0) and 16.0 (14.0, 18.0) for the true Nb = 5, 10 and 20, respectively. However, in the case of substantial cell death (dmet > 0.1), the estimator for Nb clearly underestimates the true founder size. For example, when dmet = 0.5, the medians of the estimates (IQRs) were 3.0 (2.0, 5.0), 7.0 (5.0, 8.0) and 11.0 (9.0, 12.25) for the true Nb = 5, 10 and 20, respectively. This is expected, as substantial genetic drift due to cell death occurs during the early stage of development of the metastatic tumor. Indeed, when the death rate is large (e.g., dmet > 0.1), a substantial number of founder cells drop out of the metastatic population in the early stage (~100‐cell stage; Supporting Information Fig. S16). Note that only a small number of founder cells drop out in sufficiently large populations and the proposed method gives the same results irrespective of the final metastatic tumor size for Nmet ≥ 1,000 (data not shown).

Real data analysis for CRC patients

We used high‐depth WES data from a study of four CRC patients, which included at least one primary and metachronous metastatic tumor sample per patient.8 For each patient, the metastatic tumor(s) were sampled 1–3 years after the removal of the primary tumor(s). Information on the called mutations of each tumor was derived from the article.8 We estimated the founder population size of metastatic (or lymph node) tumors using all pairs of primary and metastatic or lymph node tumors in each patient, as follows.

We applied quality controls to the exome data of each tumor. Only called mutations with a sequencing depth of 300 or less and no copy number aberrations were considered. The second criterion ensured diploid tumor sequences, as assumed in the current model. Copy number aberrations were retrieved from the article.8 For mutation data meeting the criteria, we estimated tumor purities using PurBayes.14 Purity estimates ranged from 0.147 to 0.821 (Supporting Information Table S1). Next, we conducted quality controls on the exome data of each primary and metastatic (or lymph node) tumor pair. Mutation sites with at least two mutation reads in the primary tumor, that is, m1(min) = 2, and no mutation read in the normal sample were considered for further analysis. After quality control, the number of mutations ranged from 70 to 220, and the average sequence depths were 75.61–127.64 and 90.96–144.37 in the primary and metastatic tumor exomes, respectively (Supporting Information Table S2). The observed VAFs were somewhat correlated between the primary and metastatic tumors in each patient (Supporting Information Fig. S17, left).
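The quality‐control criteria above can be sketched as a simple site filter. The dictionary keys (`d1`, `d2`, `m1`, `cna`, `normal_alt`) are hypothetical field names chosen for this illustration and do not correspond to a documented file format; applying the depth limit to both tumors of a pair is also an assumption.

```python
def passes_qc(site, max_depth=300, m1_min=2):
    """Return True if a site meets the criteria described in the text.

    `site` is a dict with hypothetical keys:
      d1, d2      sequencing depths in the primary and metastatic tumors
      m1          mutation read count in the primary tumor
      cna         True if the site lies in a copy number aberration
      normal_alt  mutation read count in the matched normal sample
    """
    return (
        site["d1"] <= max_depth
        and site["d2"] <= max_depth
        and not site["cna"]
        and site["m1"] >= m1_min
        and site["normal_alt"] == 0
    )

# Made-up example records, for illustration only.
example_sites = [
    {"d1": 120, "d2": 95, "m1": 14, "cna": False, "normal_alt": 0},
    {"d1": 450, "d2": 95, "m1": 30, "cna": False, "normal_alt": 0},  # too deep
    {"d1": 110, "d2": 90, "m1": 1, "cna": False, "normal_alt": 0},   # singleton
    {"d1": 100, "d2": 80, "m1": 9, "cna": True, "normal_alt": 0},    # CNA region
    {"d1": 100, "d2": 80, "m1": 9, "cna": False, "normal_alt": 2},   # seen in normal
]
kept = [s for s in example_sites if passes_qc(s)]
```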

For exome pairs with sufficient purity estimates (averaged purity estimate ≥0.3) that passed quality control, we estimated the founder population size of the metastatic (or lymph node) tumors using the proposed method (Fig. 3). Founder population sizes were estimated to be from 3 to 17 as MLEs, supporting the “multi‐cellular origin” of metastatic tumors. Although founder sizes varied from sample pair to sample pair, similar estimates were obtained for each patient, which supported our implicit assumption of well‐mixed tumors to some extent. All pairs of exomes from patient A01 gave consistently large founder sizes, ranging from 11 to 17. Exceptionally, for A04 the pairs “P1 and M1” and “P1 and M2” also had relatively larger founder estimates of 10, while other pairs from the patient ranged from 3 to 5. Although estimates of founder sizes for exome pairs with low purity estimates (averaged purity estimate <0.3) were also obtained (Supporting Information Fig. S18 for estimates using all exome pairs; for VAFs, Supporting Information Figure S17, right), those were not considered reliable because of the low purities.

Figure 3.

Estimated founder sizes (Nb) for the four colorectal cancer patients reported by Wei et al. (2017). Results using only diploid regions (excluding copy number aberrations) are shown. “P”, “M” and “L” mean primary, metastatic and lymph node tumors, respectively. Circles with bars indicate maximum likelihood estimates of Nb and their 90% confidence intervals, based on 1,000 nonparametric bootstrap samples.
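A nonparametric bootstrap interval of the kind mentioned in the caption can be sketched as follows: mutation sites are resampled with replacement and the estimator is re-run on each resample. The percentile form of the interval is an assumption of this sketch (the caption does not state which bootstrap interval was used), and `estimator` stands for any function that maps a list of sites to an estimate of Nb.

```python
import random

def bootstrap_ci(sites, estimator, n_boot=1000, level=0.90, rng=None):
    """Percentile bootstrap interval for estimator(sites).

    Resamples the sites with replacement n_boot times and returns the
    lower and upper empirical quantiles of the resulting estimates.
    """
    rng = rng or random.Random()
    n = len(sites)
    estimates = sorted(
        estimator([sites[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Toy usage with a stand-in estimator (mean primary VAF), not an Nb estimator.
toy_sites = [(10, 100, 8, 100), (30, 100, 35, 100), (5, 100, 0, 100), (20, 100, 22, 100)]
mean_vaf = lambda s: sum(m1 / d1 for m1, d1, _, _ in s) / len(s)
ci = bootstrap_ci(toy_sites, mean_vaf, n_boot=200, rng=random.Random(42))
```

Resampling sites treats them as independent, which matches the independence assumption already made in the likelihood.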

Sensitivity and validation analysis for CRC data

We conducted three types of sensitivity analysis and one validation analysis to examine the stability/validity of our main results (Fig. 3). First, to see the impact of copy number aberrations, we estimated the founder population size using mutations in all WES data (summarized in Supporting Information Table S3) without limiting the analysis to diploid regions. The number of mutations in all WES data averaged ~20% higher than the number limited to diploid regions (see “# of mutations” of Supporting Information Tables S2 and S3). The estimates using all WES data (Supporting Information Fig. S19) were consistent with the main results (Fig. 3), and the impact of a small portion of copy number aberrations was very small.

In the main results, only mutations with two or more mutation reads in the primary tumor were considered, that is, m1(min) = 2. However, potential mutation sites supported by only a small number of mutation reads may be false positives. For example, the somatic mutation caller, Mutect,15 with the default settings, does not call sites supported by <5 mutation reads as true mutations at a sequence depth of 100, since the “tumor LOD scores” (log‐10 likelihood ratios of a model having mutations to a no‐mutation model in the tumor population) fail to reach the default threshold of 6.3. To examine the impact of this uncertainty, we assumed that 5–20% of potential mutation sites supported by 2, 3 or 4 mutation reads in the primary tumor were calling errors (erroneous sites). When the erroneous sites in the metastatic tumor were forced to have zero mutation reads, the estimates did not differ greatly from the main results, although lower estimates were obtained as the error rates increased (Supporting Information Fig. S20). Next, the erroneous sites in the metastatic tumor were assumed to have the same number of mutation reads as in the primary tumor. In this case, the (point) estimates were the same as the main results at all error rates (Supporting Information Fig. S21). Even when the erroneous sites in the metastatic tumor had double the number of mutation reads in the primary tumor, the estimates changed very little compared to the main results (Supporting Information Fig. S22).

As a third sensitivity analysis, we examined the effect of clonal mutations. Since the proposed method uses changes in VAF from the primary to the metastatic tumor, clonal mutations, which are not informative about the founder size, could bias the estimate. In the simulation study, the number of clonal mutations was shown not to affect the estimate (Fig. 2 c; Supporting Information Fig. S4), even when we arbitrarily assumed that 10% of mutations in the primary tumor were clonal. Defining mutations with observed VAFs > purity estimate × 0.5 × 0.9 as clonal, we confirmed that the estimates obtained without clonal mutations were similar to the main results, except for the estimates for A02, which were larger than the main results (Supporting Information Fig. S23).
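The clonality cutoff above can be written as a simple filter. The sketch below is a hedged illustration: the function names and the `(mut_reads, depth)` layout are assumptions, and it presumes the VAFs are those observed in the primary tumor. The interpretation of the factors (0.5 as the expected VAF of a heterozygous clonal mutation in a pure diploid sample, 0.9 as a tolerance margin) is our reading of the threshold.

```python
def is_clonal(vaf, purity):
    """Apply the cutoff from the text: VAF > purity * 0.5 * 0.9."""
    return vaf > purity * 0.5 * 0.9

def drop_clonal(mutations, purity):
    """Keep only mutations that are not classified as clonal.

    mutations: list of (mut_reads, depth) tuples (hypothetical layout).
    purity: estimated tumor purity of the sample, between 0 and 1.
    """
    return [(m, d) for m, d in mutations if not is_clonal(m / d, purity)]
```

For example, with a purity estimate of 0.6 the cutoff is 0.27, so a site with 30 mutation reads at depth 100 would be removed and a site with 20 mutation reads at depth 100 would be kept.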

There are mutations that are absent in the primary tumor sample but are present in the metastatic one. For each primary and metastatic tumor pair, we compared the VAFs of such mutations in the metastatic tumor samples to those from the corresponding simulation based on the “pure birth tumor evolution model” described above (Supporting Information Fig. S24). If the proposed model is relevant, the distributions of observed and simulated VAFs should correspond well with each other. For matched primary and metastatic pairs, the simulations were conducted using the estimated purities (Supporting Information Table S1) and founder size (Fig. 3; Supporting Information Fig. S18), and randomly assigning the depths from matched pairs for each mutation (for the summary of depths, see Supporting Information Table S2), keeping all other settings as in Figure 2 a, that is, μ = 2.5, K = 50, m1(min) = 2 and N1 = 100,000. We ran 100 simulations for each parameter set. For many pairs, the distributions were relatively similar to each other: medians of the observed VAFs were generally close to the simulated ones (Spearman's ρ = 0.534, p = 1.6 × 10⁻⁴). However, in many cases, the variations of the observed VAFs differed from the simulated ones, and the observations tended toward larger VAFs.

Discussion

We developed a method to quantify the founder population size of metastasis using paired WES data from primary and metachronous metastatic tumors. This method implicitly uses the fact that higher (lower) genetic similarity between the primary and metastatic tumors results from a larger (smaller) founder size (Fig. 1 c). It gives unbiased estimates of the founder population size with sufficient accuracy in the range of realistic founder sizes and settings, for example, sequencing depth, purity and number of clonal mutations (Fig. 2 and Supporting Information Figs. S2–S8). The method is also robust to more realistic models of primary tumor evolution, including cell death, selection among cells in primary tumor evolution and moderate selective colonization (Supporting Information Figs. S9–S14). Note that in the cases of numerous advantageous mutations and/or a strong selection coefficient, the proposed method underestimates the true founder size for the following reason: cells with many advantageous mutations, which have high probabilities of colonization, tend to be close relatives of each other and have similar mutational composition; thus, the “effective” number of founder cells should be smaller than the “actual” number.
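The link between founder size and genetic similarity can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions that are ours, not the authors' likelihood: founder cells are drawn independently from the primary tumor, a mutation is carried by a fraction f of primary cells, and growth, purity and sequencing noise are ignored. Under these assumptions the number of carriers among N founders is binomial, so the expected squared shift in the carrier fraction is f(1 − f)/N.

```python
import random

def founder_fraction(f, n_founders, rng):
    """Fraction of founder cells carrying a mutation present in a fraction f of primary cells."""
    carriers = sum(rng.random() < f for _ in range(n_founders))
    return carriers / n_founders

def mean_squared_shift(f, n_founders, n_rep=20000, seed=1):
    """Monte Carlo estimate of E[(fraction among founders - f)^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        total += (founder_fraction(f, n_founders, rng) - f) ** 2
    return total / n_rep

if __name__ == "__main__":
    # Founder sizes at the two ends of the range estimated for the CRC subjects.
    for n in (3, 17):
        print(n, mean_squared_shift(0.3, n), 0.3 * 0.7 / n)
```

In this toy setting, a founder size of 3 produces much larger frequency shifts than a founder size of 17, which is the signal the estimation exploits.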

Although relative estimation errors become worse as the founder size increases, this weakness can be overcome by deeper sequencing: WES data with 150× depth give sufficient accuracy even for a founder size of 100 (Supporting Information Fig. S1). As several advanced studies have shown,7, 8, 9, 13, 16, 17 the proposed method also shows the advantage of using VAF information (mutation read counts and depths), rather than only the presence or absence of mutations, to infer the tumor evolutionary process.

In the real data analysis of four CRC patients, we restricted the analysis to pairs of primary and metastatic (or lymph node) tumors with averaged purity ≥0.3 (Fig. 3). Although the estimation is unbiased, its variance becomes large when purity is ~0.3 (Fig. 2) and increases further when purity is <0.3. In fact, the 90% confidence intervals for the tumors with averaged purity <0.3 tended to be wide, for example, for the pairs of P1 and L1 (L2) for the patient A2 (Supporting Information Fig. S18).

Our method supports the multicellular origin of metastatic tumors, which is consistent with the observations of recent mouse model studies4, 5, 6 and WES studies.7, 8 Our method further quantified the founder population sizes, which ranged from 3 to 17 cells for the CRC subjects (Fig. 3).8 The wide range of founder sizes in metastasis might result in large variations of genetic similarity between primary and metastatic tumors and cause variation in drug response between primary and metastatic tumors. In particular, when the founder population size is small, variants with drastically increased VAFs in the metastatic tumors might lead to difficulty in treatment.

In the context of population genetics, demographic history is a confounding factor for detecting and quantifying natural selection acting on the genome.18, 19 The same should be true for the evolution of a tumor population. A potential advantage of the proposed method is to identify selectively recruited mutations in the metastatic tumors under the inferred demographic model for tumor populations, that is, the estimated founder size.

A limitation of our method is that it assumes a model in which a single bottleneck occurs just after WES of the primary tumor and is followed by rapid growth during metastatic colonization. However, genetic drift (random fluctuation of VAFs) may occur in the period between the first exome sampling and metastatic occurrence or between metastatic occurrence and the second exome sampling. Particularly in the latter, genetic drift may substantially shift the VAFs, or a substantial fraction of founder cells might die off due to genetic drift early after metastatic colonization. Our model attributes such genetic drift to a “single bottleneck,” and the estimate of “the founder size” reflects that drift. The simulation of stochastic metastatic tumor evolution showed that, for a death rate ≤0.1 relative to a unit birth rate, the proposed method gives nearly unbiased estimates of founder size (Supporting Information Fig. S15). For a death rate >0.1, the true founder size was underestimated due to non‐negligible genetic drift (which reduces the number of metastatic founder cells) during the early stages of development of the metastatic tumor (~100‐cell stage; Supporting Information Fig. S16). In addition, with respect to mutations that are absent in the primary tumor sample but are present in the metastatic tumor sample, while the central tendencies of the observed VAF distributions in the metastatic tumors were close to the simulated ones, the observed variabilities were larger than the simulated ones (Supporting Information Fig. S24). This may show that there is room for improvement on the present simple model, for example, by incorporating stochastic metastatic evolution. Furthermore, there are possibly more complex cell migration patterns, including reseeding or multisource seeding,17, 20 which are beyond the present study but worth investigating.
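The loss of founder lineages through early drift can be illustrated with a standard linear birth-death process, for which the probability that a lineage started by one cell eventually dies out is d/b when the death rate d is below the birth rate b. The sketch below checks this by simulation. It is an assumption-laden illustration rather than the authors' simulation: the function names, the 100-cell cap (beyond which extinction is treated as negligible) and the independence of lineages are choices made here.

```python
import random

def lineage_goes_extinct(death_rate, rng, birth_rate=1.0, cap=100):
    """Follow one founder lineage until it dies out or reaches `cap` cells.

    Only the order of birth and death events matters for extinction,
    so event times are not simulated.
    """
    p_death = death_rate / (birth_rate + death_rate)
    cells = 1
    while 0 < cells < cap:
        cells += -1 if rng.random() < p_death else 1
    return cells == 0

def extinction_fraction(death_rate, n_rep=10000, seed=2):
    """Fraction of simulated single-cell lineages that die out."""
    rng = random.Random(seed)
    extinct = sum(lineage_goes_extinct(death_rate, rng) for _ in range(n_rep))
    return extinct / n_rep

if __name__ == "__main__":
    for d in (0.1, 0.5):
        print(d, extinction_fraction(d))
```

With a unit birth rate, roughly a fraction d of founder lineages is expected to leave no descendants, which is consistent with the underestimation of the founder size seen at higher death rates.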

Conflicts of interest

The authors have declared no conflicts of interest.

Supporting information

Appendix S1 Supporting Information

Acknowledgements

This research was partially supported by JST CREST Grant Number JPMJCR1412, Japan, and JSPS KAKENHI Grant Numbers 17H06307 and 17H06299, Japan.

References

  • 1. Fidler IJ, Talmadge JE. Evidence that intravenously derived murine pulmonary melanoma metastases can originate from the expansion of a single tumor cell. Cancer Res 1986;46:5167–71.
  • 2. Maddipati R, Stanger BZ. Pancreatic cancer metastases harbor evidence of polyclonality. Cancer Discov 2015;5:1086–97.
  • 3. Talmadge JE, Fidler I. Evidence for the clonal origin of spontaneous metastases. Science 1982;217:361–3.
  • 4. Aceto N, Bardia A, Miyamoto DT, et al. Circulating tumor cell clusters are oligoclonal precursors of breast cancer metastasis. Cell 2014;158:1110–22.
  • 5. Cheung KJ, Ewald AJ. A collective route to metastasis: seeding by tumor cell clusters. Science 2016;352:167–9.
  • 6. Cheung KJ, Padmanaban V, Silvestri V, et al. Polyclonal breast cancer metastases arise from collective dissemination of keratin 14‐expressing tumor cell clusters. Proc Natl Acad Sci USA 2016;113:E854–63.
  • 7. Gundem G, Van Loo P, Kremeyer B, et al. The evolutionary history of lethal metastatic prostate cancer. Nature 2015;520:353–7.
  • 8. Wei Q, Ye Z, Zhong X, et al. Multiregion whole‐exome sequencing of matched primary and metastatic tumors revealed genomic heterogeneity and suggested polyclonal seeding in colorectal cancer metastasis. Ann Oncol 2017;28:2135–41.
  • 9. Williams MJ, Werner B, Barnes CP, et al. Identification of neutral tumor evolution across cancer types. Nat Genet 2016;48:238–44.
  • 10. Ohtsuki H, Innan H. Forward and backward evolutionary processes and allele frequency spectrum in a cancer cell population. Theor Popul Biol 2017;117:43–50.
  • 11. Bozic I, Antal T, Ohtsuki H, et al. Accumulation of driver and passenger mutations during tumor progression. Proc Natl Acad Sci USA 2010;107:18545–50.
  • 12. Diaz LA Jr, Williams RT, Wu J, et al. The molecular evolution of acquired resistance to targeted EGFR blockade in colorectal cancers. Nature 2012;486:537–40.
  • 13. Williams MJ, Werner B, Heide T, et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat Genet 2018;50:895–903.
  • 14. Larson NB, Fridley BL. PurBayes: estimating tumor cellularity and subclonality in next‐generation sequencing data. Bioinformatics 2013;29:1888–9.
  • 15. Cibulskis K, Lawrence MS, Carter SL, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 2013;31:213–9.
  • 16. Sun R, Hu Z, Sottoriva A, et al. Between‐region genetic divergence reflects the mode and tempo of tumor evolution. Nat Genet 2017;49:1015–24.
  • 17. El‐Kebir M, Satas G, Raphael BJ. Inferring parsimonious migration histories for metastatic cancers. Nat Genet 2018;50:718–26.
  • 18. Bamshad M, Wooding SP. Signatures of natural selection in the human genome. Nat Rev Genet 2003;4:99–111.
  • 19. Nielsen R, Hellmann I, Hubisz M, et al. Recent and ongoing selection in the human genome. Nat Rev Genet 2007;8:857–68.
  • 20. Sanborn JZ, Chung J, Purdom E, et al. Phylogenetic analyses of melanoma reveal complex patterns of metastatic dissemination. Proc Natl Acad Sci USA 2015;112:10995–1000.

Data Availability Statement

The data that support the findings of our study were derived from Supporting Information Tables S5–S12 of reference 8. The modified data will be made available upon reasonable request. The code used in the study to estimate the founder size is available from the GitHub repository: https://github.com/jonishino/MetaCellNum.


Articles from International Journal of Cancer are provided here courtesy of Wiley
