Author manuscript; available in PMC: 2021 Jul 1.
Published in final edited form as: Cancer. 2020 Apr 21;126(13):3140–3150. doi: 10.1002/cncr.32903

Prognostic significance of an invasive leader cell-derived mutation cluster on chromosome 16q

Brian Pedro 1, Manali Rupji 2, Bhakti Dwivedi 2, Jeanne Kowalski 2,3,^, Jessica M Konen 1,^^, Taofeek K Owonikoko 2,4, Suresh S Ramalingam 2,4, Paula M Vertino 5,6, Adam I Marcus 2,4,*
PMCID: PMC7275903  NIHMSID: NIHMS1589537  PMID: 32315457

Abstract

Background:

Intra-tumoral heterogeneity is defined by subpopulations with varying genotypes and phenotypes. Specialized, highly invasive leader cells and less invasive follower cells are phenotypically distinct subpopulations that cooperate during collective cancer invasion. Since leader cells are a rare subpopulation that would be missed by bulk sequencing, a novel image-guided genomics platform was employed to precisely select this subpopulation. We identified a novel leader cell mutation signature and tested its ability to predict prognosis in non-small cell lung cancer (NSCLC) patient cohorts.

Methods:

Spatiotemporal Genomic and Cellular Analysis (SaGA) was used to isolate and perform RNA-sequencing on leader and follower populations from the H1299 NSCLC cell line, revealing a leader-specific mutation cluster on chromosome 16q. Genomic data from lung squamous cell carcinoma (LUSC, n=475) and lung adenocarcinoma (LUAD, n=501) patients from The Cancer Genome Atlas (TCGA) were stratified by 16q mutation cluster status (16qMC+ vs. 16qMC-) and compared for overall survival, progression-free survival, and gene set enrichment analysis (GSEA).

Results:

Poorer overall survival and/or progression-free survival was found across all stages and among early-stage patients with 16qMC+ tumors within LUSC and LUAD cohorts. GSEA revealed 16qMC+ tumors to be enriched for expression of metastasis- and survival-associated gene sets.

Conclusion:

This represents the first leader cell mutation signature identified in patients and has the potential to better stratify high-risk NSCLC and ultimately improve patient outcomes.

Keywords: Leader cells, Mutations, Survival, Lung cancer, Hepatocellular carcinoma

Precis:

A novel leader cell-derived mutation cluster on chromosome 16q was discovered, and it was found that 16q mutation cluster-positive (16qMC+) patients experienced poorer survival than 16qMC- patients in multiple NSCLC cohorts. This represents the first leader cell mutation signature identified in patients and has the potential to better stratify high-risk NSCLC.

Introduction

Intra-tumoral heterogeneity stems from internal and external selection pressures,1–6 leading to cellular subpopulations with varying genomes and phenotypes. This heterogeneity is a key contributor to treatment resistance and cancer progression;3,5–10 however, it may be missed when only a portion of the tumor is bulk sequenced. Consequently, the complex genetic and phenotypic landscape of the tumor is not fully captured.

Evidence from in vitro studies and primary solid tumors suggests that rare cells missed by bulk sequencing are important for tumor progression and metastasis.11,12 Using a 3-D in vitro model of lung cancer invasion, we showed that collectively invading packs of tumor cells are heterogeneous, and include rare, specialized leader cells that pioneer invasive chains, and follower cells that adhere to and invade behind leaders.11 Collective invasion is widely observed in carcinomas and increases the overall success of metastasis.13–15 Leader cells promote collective invasion when mixed with poorly invasive follower cells, even when comprising as little as 1 percent of the population.11 In addition, leader cells are genetically distinct from followers, harboring unique gene expression profiles that may help to facilitate collective invasion.11

Rare subpopulations such as leader cells could be important for cancer metastasis, yet underrepresented by standard tumor sequencing. We therefore sought to use our imaging-guided genomics platform (Spatiotemporal Genomic and Cellular Analysis, or SaGA)11 to identify unique leader cell gene mutations and define higher-risk patient groups in non-small cell lung cancer (NSCLC), which includes squamous cell carcinoma (LUSC) and adenocarcinoma (LUAD). Using a novel, leader cell-specific cluster of mutated genes on chromosome 16q, we found that LUSC and LUAD patients with 1 or more mutation(s) within this cluster have poorer overall and progression-free survival, even among early-stage patients. This represents the first leader cell mutation signature identified in patients and has the potential to better stratify high-risk NSCLC and ultimately improve patient outcomes.

Results

Development of leader cell-specific 16q mutation cluster

We utilized leader and follower cell lines previously derived from the H1299 NSCLC cell line using the SaGA platform (described in11; schematic in Fig. 1A). As leader cells are crucial for collective invasion in 3-D assays,11 we hypothesized that NSCLC patients with genetic evidence of leader cells within the primary tumor could be at higher risk for disease progression and recurrence. Our previous data show that H1299 leader and follower cells contain distinct mutational profiles.16 Additional inclusion of known variants from the dbSNP database17 resulted in 17 leader-specific and 18 follower-specific mutations (Fig. 1B; Table S1). Notably, 7 leader-specific mutations were found on chromosome 16q (Table S2; Fig. 1C, solid lines). We hypothesized that these mutations could help detect leader cell subpopulations; therefore, after confirming comparable mRNA levels of each gene in the leader and follower populations (Fig. S1), we used these genes to define a leader cell mutation signature.

Figure 1. Identification of a leader cell-derived mutation cluster on chromosome 16q.

(A) Schematic of the SaGA technique for photoconversion, isolation, and downstream analysis of H1299 leader and follower cells. Adapted from11. (B) Variant allele frequency values from RNA-sequencing of H1299 leader and follower populations for 17 genes identified as containing leader-specific point mutations, and 18 genes identified as containing follower-specific point mutations. (C) Map of chromosome 16q annotated with locations of genes containing leader-specific mutations (solid lines) and adjacent genes subsequently included in the 16q mutation cluster (dotted lines). (D) Percentages of TCGA LUSC and LUAD cases with mutations in each of eight (LUSC) or nine (LUAD) 16q cluster genes.

Identification of 16q mutation cluster-positive tumors in NSCLC patient cohorts

Gene expression data and clinical outcomes information for LUSC patients (n=475) and LUAD patients (n=501) were extracted from The Cancer Genome Atlas (TCGA).18 Importantly, 37 of 475 (7.8%) LUSC patients and 30 of 501 (6.2%) LUAD patients had one or more mutations among the seven 16q mutation cluster genes (LUSC: Figs. 1D, 2A; LUAD: Figs. 1D, 2D). NQO1 was mutated in one patient and was excluded from subsequent analyses. Nearly all of the identified point mutations occurred at different loci, suggesting that they could result from a hyper-mutational process rather than being selected due to altered protein function. Among genes directly adjacent to the six leader-derived 16q genes (Fig. 1C, dashed lines), the same pattern of randomly-distributed mutations was observed in SLC12A3 and ZNF778 in the LUSC cohort (Fig. 2A), and SLC12A3, NFAT5 and SPG7 in the LUAD cohort (Fig. 2D). Taken together, 10.7% of LUSC patients and 11.6% of LUAD patients had at least one mutation within the respective 8- or 9-gene 16q clusters (Fig. 1D); these patients were defined as 16q mutation cluster-positive (16qMC+). The majority of 16qMC+ patients – 94.1% of LUSC and 86.2% of LUAD – had mutations in only one 16q cluster gene (Fig. S2). Most mutations were found at variant allele frequencies (VAF) of less than 50%, which likely indicates sub-clonal mutations barring any chromosomal alterations at that locus (Fig. S2). Additionally, 16qMC+ tumors had significantly higher mutation counts (Table 1). Within the LUAD cohort, the 16qMC+ group contained significantly more smokers (96.5% vs. 84.2%, P=0.009; Table 1), TP53 mutations (77.6% vs. 48.6%, P=0.0005; Table S3), and patients who received radiation therapy prior to resection (20.7% vs. 11.0%, P=0.024; Table S4).

Figure 2. 16qMC predicts poor prognosis in non-small cell lung cancer cohorts.

(A) Lollipop plots illustrating locations of point mutations in 16q cluster genes in TCGA LUSC patients. Black dots depict truncations; gray dots depict missense mutations; red outlines depict driver mutations indicated by OncoKB and/or Cancer Hotspots. (B) Kaplan Meier (KM) curves for OS and PFS of 16qMC+ and 16qMC− TCGA LUSC patients. Median OS: 5.0 years (16qMC-) vs. 2.6 years (16qMC+); median PFS: 8.0 years (16qMC-) vs. 2.7 years (16qMC+). (C) KM curves for OS and PFS of 16qMC+ and 16qMC− stage I and II TCGA LUSC patients. Median OS: 5.4 years (16qMC-) vs. 2.6 years (16qMC+); median PFS: 8.4 years (16qMC-) vs. 6.3 years (16qMC+). (D) Lollipop plots illustrating locations of point mutations in 16q cluster genes in TCGA LUAD patients. Black dots depict truncation mutations; gray dots depict missense mutations; red outlines depict driver mutations indicated by OncoKB and/or Cancer Hotspots. (E) KM curves for OS and PFS of 16qMC+ and 16qMC− TCGA LUAD patients. Median OS: 4.2 years (16qMC-) vs. 2.6 years (16qMC+); median PFS: 3.1 years (16qMC-) vs. 2.4 years (16qMC+). (F) KM curves for OS and PFS of 16qMC+ and 16qMC− stage I and II TCGA LUAD patients. Median OS: 5.6 years (16qMC-) vs. 3.2 years (16qMC+); median PFS: 3.4 years (16qMC-) vs. 4.0 years (16qMC+). P values calculated by log-rank test.

Table 1:

Patient characteristics for LUSC and LUAD TCGA cohorts

| Covariate | Statistic | Group | LUSC 16qMC+ (N=51) | LUSC 16qMC− (N=424) | LUSC P value (a, b) | LUAD 16qMC+ (N=58) | LUAD 16qMC− (N=443) | LUAD P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | N (%) | Female | 13 (25.49) | 112 (26.54) | 0.872 | 28 (48.28) | 240 (54.18) | 0.397 |
| | N (%) | Male | 38 (74.51) | 310 (73.46) | | 30 (51.72) | 203 (45.82) | |
| Pathologic stage | N (%) | Stage I & II | 41 (80.39) | 343 (81.47) | 0.852 | 46 (79.31) | 346 (78.46) | 0.882 |
| | N (%) | Stage III & IV | 10 (19.61) | 78 (18.53) | | 12 (20.69) | 95 (21.54) | |
| Smoking history | N (%) | Current/reformed smoker | 49 (98) | 399 (96.14) | 1 | 55 (96.49) | 362 (84.19) | 0.009 |
| | N (%) | Nonsmoker | 1 (2) | 16 (3.86) | | 2 (3.51) | 68 (15.81) | |
| Mutation count (c) | N (%) | High mut. count | 43 (84.31) | 235 (57.46) | <0.001 | 49 (84.48) | 202 (46.12) | <0.001 |
| | N (%) | Low mut. count | 8 (15.69) | 174 (42.54) | | 9 (15.52) | 236 (53.88) | |
| Age at diagnosis | Median (min-max) | | 67 (44–83) | 68 (39–90) | 0.363 | 65 (40–88) | 66 (38–87) | 0.494 |

a. P-values calculated by ANOVA for numerical covariates and chi-square or Fisher’s exact test for categorical covariates.

b. P-values calculated by Kruskal-Wallis test for numerical covariates.

c. High/low mutation count was defined by a cutoff of 192 mutations, based upon previous mutational burden analysis of TCGA cohorts.19

Prognostic validation of 16qMC in TCGA cohorts

We found that 16qMC+ patients had poorer overall survival (OS) (HR 1.79, 95% CI 1.19–2.71; log-rank P=0.005) and progression-free survival (PFS) (HR 1.78, 95% CI 1.06–3.01; log-rank P=0.028) among all-stage LUSC (Fig. 2B; Table 2). Notably, early-stage 16qMC+ LUSC patients had poorer OS (HR 2.08, 95% CI 1.27–3.24; log-rank P=0.003) (Fig. 2C; Table S5). In the LUAD cohort, all-stage 16qMC+ patients experienced poorer OS (HR 1.84, 95% CI 1.23–2.74; log-rank P=0.002; Fig. 2E; Table 3) as did early-stage patients (HR 2.06, 95% CI 1.26–3.23; log-rank P=0.003; Fig. 2F; Table S6). Multivariable Cox regression analysis indicated 16qMC+ status as a significant predictor of OS (HR 1.71, 95% CI 1.13–2.58; P=0.011) and PFS (HR 1.73, 95% CI 1.00–2.97; P=0.049) among all-stage LUSC patients (Table 2), and of OS among all-stage LUAD patients (HR 1.95, 95% CI 1.31–2.91; P=0.001) (Table 3). In multivariable analysis among early-stage patients, 16qMC+ status remained predictive of poorer OS for LUSC (HR 1.94, 95% CI 1.21–3.12; P=0.006; Table S5) and LUAD patients (HR 2.02, 95% CI 1.25–3.27; P=0.004; Table S6).

Table 2:

Cox regression analysis for all-stage LUSC TCGA patients

| Covariate | Univariable OS: hazard ratio (95% CI) | P value | Univariable PFS: hazard ratio (95% CI) | P value | Multivariable OS: hazard ratio (95% CI) | P value | Multivariable PFS: hazard ratio (95% CI) | P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16q cluster status (16qMC+ vs. −) | 1.79 (1.19–2.71) | 0.006 | 1.78 (1.06–3.01) | 0.030 | 1.71 (1.13–2.58) | 0.011 | 1.73 (1.00–2.97) | 0.049 |
| Gender (Female vs. Male) | 0.83 (0.60–1.16) | 0.277 | 0.89 (0.60–1.32) | 0.564 | | | | |
| Pathologic stage (I/II vs. III/IV) | 0.61 (0.44–0.84) | 0.003 | 0.50 (0.34–0.73) | <0.001 | 0.61 (0.44–0.85) | 0.004 | 0.49 (0.33–0.73) | <0.001 |
| Smoking history (Smoker vs. non) | 0.63 (0.26–1.53) | 0.305 | 0.36 (0.16–0.83) | 0.016 | | | 0.34 (0.15–0.78) | 0.011 |
| Mutation count (High vs. low) | 1.04 (0.78–1.39) | 0.765 | 0.92 (0.65–1.30) | 0.625 | | | | |
| Age at diagnosis | 1.02 (1.00–1.03) | 0.059 | 1.00 (0.98–1.02) | 0.988 | 1.02 (1.00–1.04) | 0.038 | | |

Table 3:

Cox regression analysis for all-stage LUAD TCGA patients

| Covariate | Univariable OS: hazard ratio (95% CI) | P value | Univariable PFS: hazard ratio (95% CI) | P value | Multivariable OS: hazard ratio (95% CI) | P value | Multivariable PFS: hazard ratio (95% CI) | P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16q cluster status (16qMC+ vs. −) | 1.84 (1.23–2.74) | 0.003 | 1.30 (0.86–1.96) | 0.214 | 1.95 (1.31–2.91) | 0.001 | 1.30 (0.86–1.97) | 0.207 |
| Gender (Female vs. Male) | 0.95 (0.71–1.27) | 0.713 | 0.95 (0.72–1.26) | 0.730 | | | | |
| Pathologic stage (I/II vs. III/IV) | 0.37 (0.27–0.51) | <0.001 | 0.62 (0.45–0.86) | 0.004 | 0.36 (0.27–0.50) | <0.001 | 0.62 (0.45–0.86) | 0.004 |
| Smoking history (Smoker vs. non) | 0.91 (0.60–1.39) | 0.676 | 0.96 (0.65–1.44) | 0.859 | | | | |
| Mutation count (High vs. low) | 0.98 (0.73–1.31) | 0.893 | 0.94 (0.71–1.24) | 0.669 | | | | |
| Age at diagnosis | 1.01 (0.99–1.02) | 0.349 | 1.00 (0.98–1.01) | 0.695 | | | | |

16qMC+ tumors in both cohorts had increased mutation counts (Fig. 3A, B) and 16qMC+ LUSC tumors had more copy number alterations (Fig. 3C). To determine whether mutation count was driving the poorer survival among 16qMC+ patients, we stratified by low (≤192) or high (>192) mutation count as previously described for TCGA cohorts.19 We found that 16qMC+ status still correlated with poorer OS among highly-mutated LUSC (Fig. 3D), OS and PFS among lowly-mutated LUAD, and OS among highly-mutated LUAD (Fig. 3E). Given the higher proportion of TP53 mutations among 16qMC+ LUAD tumors, we also examined survival by both TP53 and 16qMC status; although mutated TP53 was associated with poorer OS, 16qMC+ status further differentiated survival among TP53 wild-type patients (Fig. S3). These data indicate that 16qMC+ status could help identify patients who are at higher risk for disease progression.

Figure 3. 16qMC+ tumors have increased overall mutational burden.

(A) Total mutation count for TCGA LUAD and LUSC cohorts. Total mutation count defined as total detected number of non-synonymous mutations. **P<0.01, ****P<0.0001 by ordinary one-way ANOVA with Sidak’s multiple comparisons test. Bars show mean+standard deviation. (B) Percentage (with 95% confidence intervals) of tumors with high mutation count (defined as >192 total mutations) among 16qMC− and 16qMC+ patients. Confidence intervals calculated by the Wilson/Brown method. (C) Total fraction of genome altered (FGA), calculated as the percentage of the genome with copy number gains and/or losses, between 16qMC− and 16qMC+ tumors. *P<0.05 by ordinary one-way ANOVA. Bars show mean+standard deviation. (D-E) KM curves for OS and PFS of 16qMC− and 16qMC+ TCGA LUSC patients (D) and TCGA LUAD patients (E) with either high (>192) or low (≤192) mutation counts. Median OS for 16qMC− vs. 16qMC+ LUSC patients: 4.8 vs. 3.9 years (low mut. count); 5.0 vs. 1.5 years (high mut. count). Median PFS for 16qMC− vs. 16qMC+ LUSC patients: 6.0 vs. 2.7 years (low mut. count). Median OS for 16qMC− vs. 16qMC+ LUAD patients: 4.2 vs. 2.6 years (low mut. count); 6.0 vs. 2.7 years (high mut. count). Median PFS for 16qMC− vs. 16qMC+ LUAD patients: 3.0 vs. 2.2 years (low mut. count); 3.1 vs. 2.4 years (high mut. count). P values calculated by log-rank test.

As collective invasion is observed in numerous carcinomas,20 we analyzed additional TCGA cohorts to test the prognostic value of leader-cell derived 16qMC in other cancer types. Notably, 16qMC+ patients within a TCGA hepatocellular carcinoma (HCC) cohort21 also had significantly increased mutation counts, and poorer survival among all-stage and early-stage disease (Tables S7–9; Fig. S4). As HCC carries poor prognosis and high rates of recurrence, HCC patients could also potentially benefit from 16qMC+ screening.

Given the scattered distribution of mutations in 16qMC+ patients (Figs. 2A, 2D, S4), we determined the prognostic power of the 16qMC genes compared with 1,000 randomly-selected clusters of 8 (LUSC) or 9 (LUAD, HCC) genes. The 16q mutation cluster outperformed the random gene sets in differentiating survival for LUSC (OS: P=0.007; PFS: P=0.025), LUAD (OS: P=0.001), and HCC (OS: P=0.006; PFS: P=0.0290) (see Supplementary Methods).
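The comparison against randomly selected gene clusters can be pictured as an empirical P value: the fraction of random clusters whose survival statistic is at least as extreme as that of the 16q cluster. The exact procedure is given in the Supplementary Methods, which are not reproduced here, so the following is only a minimal sketch under stated assumptions. The function name `empirical_p`, the add-one correction, and the caller-supplied `stat_fn` (for example, a log-rank chi-square computed after stratifying patients by a gene set) are all hypothetical choices for illustration.

```python
import random

def empirical_p(observed_stat, all_genes, set_size, stat_fn, n_sets=1000, seed=0):
    """Empirical P value of an observed statistic against random gene sets.

    observed_stat: statistic for the cluster of interest (larger = stronger).
    all_genes:     list of candidate genes to sample from (assumed input).
    set_size:      genes per random cluster (8 or 9 in the text).
    stat_fn:       hypothetical callable mapping a gene list to a statistic.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_sets):
        genes = rng.sample(all_genes, set_size)  # sample without replacement
        if stat_fn(genes) >= observed_stat:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sets + 1)
```

With 1,000 random clusters, the smallest value this estimator can return is 1/1001, so it cannot resolve differences below roughly 0.001.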

Gene set enrichment analysis of 16qMC+ tumors

Next, differentially expressed genes between 16qMC+ and 16qMC- tumors were determined from RNA-sequencing data for the LUSC and LUAD TCGA patient cohorts, and subjected to gene set enrichment analysis (GSEA).22,23 Several gene sets related to metastasis, recurrence, relapse, prognosis, or survival were significantly associated with 16qMC+ status (false discovery rate <0.05) (Fig. 4A). In both cohorts, among the most positively-enriched gene sets for 16qMC+ patients was “SHEDDEN_LUNG_CANCER_POOR_SURVIVAL_A6,” a gene set predictive of OS in lung adenocarcinoma patients24 (LUSC normalized enrichment score (NES)=6.81, LUAD NES=13.75; Fig. 4B–C). Conversely, “SHEDDEN_LUNG_CANCER_GOOD_SURVIVAL_A4,” a gene set highly expressed in patients with better survival,24 was depleted in 16qMC+ LUSC (NES=−3.46; Fig. 4B) and LUAD (NES=−6.47; Fig. 4C). Also identified were positive enrichment for “WINNEPENNINCKX_MELANOMA_METASTASIS_UP” in 16qMC+ LUSC (NES=4.87; Fig. 4B) and LUAD (NES=8.11; Fig. 4C), negative enrichment of “CHANDRAN_METASTASIS_DN” in 16qMC+ LUSC (NES=−3.59; Fig. 4B) and LUAD (NES=−5.48; Fig. 4C), and positive enrichment of “BIDUS_METASTASIS_UP” in 16qMC+ LUSC (NES=4.86) and LUAD (NES=5.51). Together, these results show that the 16q mutation cluster identifies patients with a similar high-risk expression profile as previously established prognostic gene sets, and that 16qMC+ tumors are consistent with more advanced disease, increased likelihood of recurrence, and poorer patient outcomes.

Figure 4. Metastasis- and prognosis-related gene sets are enriched in 16qMC+ tumors.

(A) Gene sets related to metastasis, recurrence, relapse, prognosis, or survival that were significantly positively- or negatively-enriched (FDR < 0.05) in GSEA of 16qMC+ tumors vs. 16qMC- tumors within the TCGA LUSC and TCGA LUAD cohorts. NES = normalized enrichment score. Dot size indicates the number of core enriched genes, while dot color indicates the proportion of total genes in the given gene set that are enriched in the 16qMC+ population. (B-C) GSEA plots of selected gene sets in LUSC (B) and LUAD (C) cohorts.

Discussion

Current methods for molecular characterization may not sufficiently capture the full genomic and phenotypic landscape of a tumor population,6,25 since rare, yet treatment-resistant and invasive cell populations would be missed11,26. Our previous work begins to address this problem through the SaGA platform, which was used to isolate specialized, highly invasive leader cells from a larger population of collectively invading packs of NSCLC cells.11

We identified a novel, leader cell-derived, ten-gene mutation cluster on chromosome 16q. Although 16q deletions have been found in breast, prostate, and other cancers,27–29 16q alterations in lung cancer have not been widely studied, and co-occurrence of point mutations on 16q has not been reported in any cancer type. In separate cohorts of LUSC and LUAD patients, patients with at least one non-synonymous mutation in any of 8 (LUSC) or 9 (LUAD) of these 16q genes were found to have experienced significantly poorer overall and progression-free survival. These survival differences are maintained in early-stage patients, highlighting the potential clinical utility of this mutation cluster.

The mechanism by which 16qMC+ status differentiates survival requires further study. Although we identified 17 leader-specific mutated genes, only mutations on 16q could differentiate survival, whereas including all 17 genes showed no survival differences in LUSC and LUAD (Log-rank P=0.504 and 0.380, respectively). The majority of 16qMC+ tumors contained only one 16qMC mutation, with no observed effects on expression for the majority of genes (Fig. S2). Although this was initially surprising, it is important to consider that the 16qMC was derived from rare and invasive leader cells; therefore the majority of early stage tumors with these mutations may not yet have a detectable effect on genome-wide expression. As the tumor progresses and metastasis occurs, we would predict that the downstream expression consequences of leader specific mutations would become more apparent. Furthermore, point mutations could impact protein function without affecting gene expression. For example, we previously showed that a leader-specific mutation in ARP3, while not affecting mRNA expression in leader cells, conferred leader cell behavior when introduced into follower cells.16

Interestingly, in addition to LUSC and LUAD, 16qMC+ tumors also contained significantly elevated mutation counts in TCGA HCC, breast, colorectal, stomach, melanoma, and endometrial cancer cohorts (Fig. S4). This indicates that 16qMC+ status could result from a hyper-mutational state, such as microsatellite instability (MSI), in which the 16qMC genes are particularly susceptible to somatic mutations. MSI is observed in lung cancer30,31 albeit less frequently than other cancer types such as colorectal. However, our data show that 16qMC+ status correlates with survival even after stratifying patients by high and low mutation counts (Fig. 3), and thus additional work is needed to determine whether mutation count is contributing to the poorer survival among 16qMC+ patients.

GSEA showed that genes differentially expressed between 16qMC+ and 16qMC- tumors are enriched in gene sets associated with metastasis and patient prognosis.24,32–34 These data show that the 16q mutation cluster can stratify high-risk patients through identification of a single point mutation among ten genes. By comparison, other larger-scale, expression-based gene sets are not as easily translatable to patient care. Targeted sequencing of these ten 16q genes could represent a new strategy for preventing disease recurrence and improving survival in NSCLC, and potentially in HCC as well.

Future studies will focus on prospective cohort analysis of early-stage NSCLC patients to better determine how reproducibly 16qMC+ status can differentiate survival. These results are observed across multiple NSCLC cohorts and extend to HCC; however, to better determine the potential clinical utility of 16qMC+ screening, next steps include prospective analyses in additional NSCLC cohorts using primary patient tissue. Additionally, the issue persists that a biopsy could miss rarer subpopulations of cells.6,25 Thus, sequencing of circulating tumor DNA through liquid biopsies could provide a more complete picture of the tumor genome.35–37 By using SaGA to identify, isolate, and analyze rare leader cells to discover novel biomarkers, we have laid out an approach that could lead to more effective prognostic strategies.

Methods

Identification of leader- and follower-enriched variants

Isolation via SaGA, RNA-sequencing expression, and variant calling for leader and follower cells from the H1299 cell line were performed as previously described.11,16 RNA-sequencing data are deposited in the NCBI SRA database under accession number PRJNA542374.

Patient selection and stratification

For TCGA cohorts in cBioPortal, only patients with available mutation data were included. Patients with at least one non-synonymous mutation in at least one 16qMC gene were categorized as “16qMC+”. Lollipop plots depicting locations of 16q cluster point mutations in each cohort were constructed using MutationMapper through cBioPortal.38,39 Patient clinical data were downloaded from cBioPortal.
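The stratification rule above (at least one non-synonymous mutation in at least one cluster gene) can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline: the function name `classify_16qmc`, the record layout, and the MAF-style variant class labels treated as non-synonymous are assumptions. The example gene set lists only the two LUSC-adjacent genes named in the text (SLC12A3, ZNF778); the full cluster is defined in Table S2, which is not reproduced here.

```python
# Variant classes counted as non-synonymous in this sketch. The labels follow
# common MAF-style naming and are an assumption, not taken from the paper.
NON_SYNONYMOUS = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "Splice_Site",
}

def classify_16qmc(patients, mutations, cluster_genes):
    """Return {patient_id: True/False}, where True means 16qMC+.

    patients:      iterable of patient IDs with available mutation data.
    mutations:     iterable of (patient_id, gene, variant_class) records.
    cluster_genes: set of gene symbols in the cohort-specific 16q cluster.
    """
    positive = {
        pid
        for pid, gene, variant_class in mutations
        if gene in cluster_genes and variant_class in NON_SYNONYMOUS
    }
    return {pid: pid in positive for pid in patients}

# Toy example with hypothetical patient IDs and a partial gene set.
example_cluster = {"SLC12A3", "ZNF778"}
example_mutations = [
    ("P1", "SLC12A3", "Missense_Mutation"),  # counts: cluster gene, non-synonymous
    ("P2", "ZNF778", "Silent"),              # ignored: synonymous
    ("P3", "TP53", "Nonsense_Mutation"),     # ignored: not a cluster gene
]
status = classify_16qmc(["P1", "P2", "P3", "P4"], example_mutations, example_cluster)
```

Patients without any qualifying record, such as the hypothetical P4, default to 16qMC−, which mirrors restricting the cohort to patients with available mutation data before classification.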

Enrichment analysis

For GSEA, previously processed versions of TCGA LUAD, LUSC, and LIHC (HCC) RNA-seq data based on human genome build hg19 were downloaded for the included subsets of patients from the GDC legacy archive (https://portal.gdc.cancer.gov/legacy-archive/search/). Raw RSEM expression counts were filtered for lowly expressed genes (average CPM<1.0) and normalized by the TMM method using edgeR.40,41 Differential expression between 16qMC+ and 16qMC- was calculated for all genes with the limma R package.41 Genes were ranked according to −log10(P value) multiplied by direction of fold change. GSEAPreranked was performed on the ranked gene list with classic enrichment statistics under default settings and C2 curated gene sets (4762 gene sets) from MSigDB 6.2 release using GSEA Desktop v3.0.22,42
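The ranking metric described above, −log10(P value) multiplied by the direction of fold change, can be written out as a short sketch. The function names, the input tuple layout, and the floor applied to P values of exactly zero are assumptions made for illustration; the published analysis was run with limma and GSEA Desktop, not with this code.

```python
import math

def rank_score(p_value, log_fold_change):
    """Signed score: -log10(P) times the sign of the fold change."""
    p = max(p_value, 1e-300)  # assumed floor so that P = 0 does not raise an error
    sign = 1.0 if log_fold_change >= 0 else -1.0
    return -math.log10(p) * sign

def build_ranked_list(results):
    """Sort (gene, p_value, log_fold_change) records from most up to most down."""
    scored = [(gene, rank_score(p, lfc)) for gene, p, lfc in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy differential-expression results with hypothetical gene names.
toy_results = [
    ("GENE_A", 0.001, 2.0),    # strongly up in 16qMC+
    ("GENE_B", 0.5, -0.1),     # weakly down
    ("GENE_C", 0.0001, -3.0),  # strongly down
]
ranked = build_ranked_list(toy_results)
```

Genes with small P values land at the two ends of the list according to direction, which is the input shape a preranked enrichment analysis expects.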

Statistical analysis

Statistical analysis was conducted using SAS Version 9.4 and GraphPad Prism Version 8.2. Ordinary one-way ANOVA with Sidak’s multiple comparisons test was used when three or more conditions were being compared. Confidence intervals of percentages were calculated using the Wilson/Brown method. Patient characteristics were reported as counts with percentages for categorical variables and median with range for numeric variables. A chi-square or Fisher’s exact test, as appropriate, was conducted to identify associations between categorical demographic characteristics and 16qMC status, and an ANOVA or a Kruskal-Wallis test, as appropriate, was conducted to identify associations between continuous demographic factors and 16qMC status.
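Two of the calculations named above can be illustrated with the standard library alone. The sketch below computes an uncorrected Pearson chi-square test for a 2×2 table and a plain Wilson score interval for a proportion. Both are assumptions about the general form of the tests: the published values came from SAS and GraphPad Prism, whose Wilson/Brown interval may differ slightly from the plain Wilson formula used here. The function names are hypothetical, and the example counts are taken from Table 1.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). For 1 degree of freedom the upper-tail probability
    equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default 95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

# LUSC gender counts from Table 1: 16qMC+ (13 female, 38 male),
# 16qMC- (112 female, 310 male).
chi2_gender, p_gender = chi_square_2x2(13, 38, 112, 310)

# LUSC 16qMC+ tumors with high mutation count: 43 of 51 (Table 1).
ci_low, ci_high = wilson_interval(43, 51)
```

For the gender table the uncorrected test gives a P value of about 0.87, in line with the 0.872 reported in Table 1. Cells with small expected counts, such as nonsmokers among 16qMC+ patients, are where Fisher's exact test would be the more appropriate choice.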

OS and PFS were calculated by the Kaplan-Meier method, with P values calculated by the log-rank (Mantel-Cox) test. A univariable Cox proportional hazards regression analysis was performed to determine any significant association of the demographic factors and OS/PFS. Variables significant at an alpha of 0.2 were used for model selection. A multivariable Cox regression analysis using a backward elimination approach was used to select covariates, with removal of covariates of alpha >0.2.
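For readers unfamiliar with these estimators, the sketch below implements the Kaplan-Meier product-limit estimate and a two-group log-rank test in plain Python. It is a teaching sketch under stated assumptions, not the SAS or Prism implementation used for the published results; the function names and the toy follow-up times are hypothetical.

```python
import math
from collections import defaultdict

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1=event, 0=censored)."""
    at_risk = len(times)
    deaths, leaving = defaultdict(int), defaultdict(int)
    for t, e in zip(times, events):
        leaving[t] += 1
        if e:
            deaths[t] += 1
    surv, curve = 1.0, []
    for t in sorted(leaving):
        if deaths[t]:
            surv *= 1.0 - deaths[t] / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving[t]  # events and censored subjects leave the risk set
    return curve

def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) P value with 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df

# Toy data in years (hypothetical): group A fails early, group B late.
group_a = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
group_b = ([6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
median_a = median_survival(kaplan_meier(*group_a))
p_toy = logrank_p(group_a[0], group_a[1], group_b[0], group_b[1])
```

The median survival values quoted in the figure legends correspond to the first time point at which the estimated survival curve falls to 50% or below.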

For subgroup survival analysis based on early/late stage or mutation count categories, KM curves were created based upon 16qMC status for both OS and PFS. To account for the small number of events in the strata, Firth’s penalized regression approach was used within each subgroup. The multivariable analysis was conducted as described above. Similar subgroup analysis was performed for early-stage (I and II) and late stage (III and IV) patients. For the four-group survival analysis by 16qMC status and TP53 mutation status, KM curves for each OS and PFS endpoint were created and log-rank P values were obtained. Pairwise log-rank P values were adjusted using Tukey-Kramer’s method for multiple comparisons.
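The hazard ratios and confidence intervals reported from the Cox models can be illustrated with a single binary covariate such as 16qMC status. The sketch below fits an ordinary, unpenalized Cox model by Newton's method on the Breslow partial likelihood. It is explicitly not Firth's penalized approach and not the SAS procedure used for the published analyses; the function name and the toy data are hypothetical.

```python
import math

def cox_single_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) for one covariate.

    Returns (hazard_ratio, ci_lower, ci_upper) with a 95% Wald interval.
    Unpenalized fit: it will not converge under monotone likelihood, which
    is the situation Firth's penalization is designed to handle.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})

    def score_and_info(beta):
        grad, info = 0.0, 0.0
        for t in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= t]
            failed = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            weights = [math.exp(beta * xi) for xi in risk]
            s0 = sum(weights)
            mean = sum(w * xi for w, xi in zip(weights, risk)) / s0
            second = sum(w * xi * xi for w, xi in zip(weights, risk)) / s0
            grad += sum(failed) - len(failed) * mean
            info += len(failed) * (second - mean * mean)
        return grad, info

    beta = 0.0
    for _ in range(max_iter):
        grad, info = score_and_info(beta)
        if info <= 0.0:
            raise ValueError("no information about the covariate in the risk sets")
        step = grad / info  # Newton step on the concave partial log-likelihood
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta)
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)

# Toy data (hypothetical): x = 1 marks the group that tends to fail earlier.
toy_times = [2, 3, 5, 7, 9, 4, 6, 8, 10, 12]
toy_events = [1] * 10
toy_x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
hr, hr_low, hr_high = cox_single_covariate(toy_times, toy_events, toy_x)
```

With only ten toy patients the interval is wide and spans 1, which illustrates why sparse strata call for the penalized estimation described above.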

Acknowledgements and Funding

This project was funded in part by National Institutes of Health grants R01CA201340-01 and 1U54CA209992 (to A.I.M.). B.P. was supported by NIH Predoctoral Fellowship 1F31CA225049. Support of the Emory Integrated Genomics Core Shared Resource and the Biostatistics and Bioinformatics Shared Resource was provided by the Winship Cancer Institute of Emory University core grant under award number 2P30CA138292. The authors declare no conflicts of interest.

References

  • 1.Jiang Y, Qiu Y, Minn AJ, Zhang NR. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(37):E5528–E5537. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Sun R, Hu Z, Sottoriva A, et al. Between-region genetic divergence reflects the mode and tempo of tumor evolution. Nat Genet. 2017;49(7):1015–1024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.McGranahan N, Swanton C. Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. Cancer Cell. 2015;27(1):15–26. [DOI] [PubMed] [Google Scholar]
  • 4.Lipinski KA, Barber LJ, Davies MN, Ashenden M, Sottoriva A, Gerlinger M. Cancer Evolution and the Limits of Predictability in Precision Cancer Medicine. Trends Cancer. 2016;2(1):49–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Burrell RA, McGranahan N, Bartek J, Swanton C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature. 2013;501(7467):338–345. [DOI] [PubMed] [Google Scholar]
  • 6.Gerlinger M, Rowan AJ, Horswell S, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366(10):883–892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Gerlinger M, McGranahan N, Dewhurst SM, Burrell RA, Tomlinson I, Swanton C. Cancer: evolution within a lifetime. Annu Rev Genet. 2014;48:215–236. [DOI] [PubMed] [Google Scholar]
  • 8.McGranahan N, Swanton C. Clonal Heterogeneity and Tumor Evolution: Past, Present, and the Future. Cell. 2017;168(4):613–628. [DOI] [PubMed] [Google Scholar]
  • 9.Caswell DR, Swanton C. The role of tumour heterogeneity and clonal cooperativity in metastasis, immune evasion and clinical outcome. BMC Med. 2017;15(1):133.
  • 10.de Bruin EC, McGranahan N, Mitter R, et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science. 2014;346(6206):251–256.
  • 11.Konen J, Summerbell E, Dwivedi B, et al. Image-guided genomics of phenotypically heterogeneous populations reveals vascular signaling during symbiotic collective cancer invasion. Nature Communications. 2017;8:15078.
  • 12.Jamal-Hanjani M, Wilson GA, McGranahan N, et al. Tracking the Evolution of Non-Small-Cell Lung Cancer. N Engl J Med. 2017;376(22):2109–2121.
  • 13.Gilbert-Ross M, Konen J, Koo J, et al. Targeting adhesion signaling in KRAS, LKB1 mutant lung adenocarcinoma. JCI Insight. 2017;2(5):e90487.
  • 14.Richardson AM, Havel LS, Koyen AE, et al. Vimentin Is Required for Lung Adenocarcinoma Metastasis via Heterotypic Tumor Cell–Cancer-Associated Fibroblast Interactions during Collective Invasion. Clin Cancer Res. 2018;24(2):420–432.
  • 15.Cheung KJ, Padmanaban V, Silvestri V, et al. Polyclonal breast cancer metastases arise from collective dissemination of keratin 14-expressing tumor cell clusters. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(7):E854–863.
  • 16.Zoeller EL, Pedro B, Konen J, et al. Genetic heterogeneity within collective invasion packs drives leader and follower cell phenotypes. Journal of Cell Science. 2019;132(19):jcs231514.
  • 17.Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research. 2001;29(1):308–311.
  • 18.Hoadley KA, Yau C, Hinoue T, et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell. 2018;173(2):291–304 e296.
  • 19.Colli LM, Machiela MJ, Myers TA, Jessop L, Yu K, Chanock SJ. Burden of Nonsynonymous Mutations among TCGA Cancers and Candidate Immune Checkpoint Inhibitor Responses. Cancer Res. 2016;76(13):3767–3772.
  • 20.Friedl P, Locker J, Sahai E, Segall JE. Classifying collective cancer cell invasion. Nat Cell Biol. 2012;14(8):777–783.
  • 21.Weinstein JN, Collisson EA, Mills GB, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45(10):1113–1120.
  • 22.Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545.
  • 23.Mootha VK, Lindgren CM, Eriksson K-F, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267–273.
  • 24.Director’s Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma, Shedden K, Taylor JMG, et al. Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med. 2008;14:822.
  • 25.Gerlinger M, Horswell S, Larkin J, et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat Genet. 2014;46(3):225–233.
  • 26.Lawson DA, Kessenbrock K, Davis RT, Pervolarakis N, Werb Z. Tumour heterogeneity and metastasis at single-cell resolution. Nat Cell Biol. 2018;20(12):1349–1360.
  • 27.Fabris VT. From chromosomal abnormalities to the identification of target genes in mouse models of breast cancer. Cancer Genet. 2014;207(6):233–246.
  • 28.Cleton-Jansen AM, van Eijk R, Lombaerts M, et al. ATBF1 and NQO1 as candidate targets for allelic loss at chromosome arm 16q in breast cancer: absence of somatic ATBF1 mutations and no role for the C609T NQO1 polymorphism. BMC Cancer. 2008;8:105.
  • 29.Kluth M, Jung S, Habib O, et al. Deletion lengthening at chromosomes 6q and 16q targets multiple tumor suppressor genes and is associated with an increasingly poor prognosis in prostate cancer. Oncotarget. 2017;8(65):108923–108935.
  • 30.Shen C, Wang X, Tian L, Che G. Microsatellite alteration in multiple primary lung cancer. Journal of Thoracic Disease. 2014;6(10):1499–1505.
  • 31.Fong KM, Zimmerman PV, Smith PJ. Microsatellite instability and other molecular abnormalities in non-small cell lung cancer. Cancer Research. 1995;55(1):28–30.
  • 32.Woo HG, Park ES, Cheon JH, et al. Gene Expression–Based Recurrence Prediction of Hepatitis B Virus–Related Human Hepatocellular Carcinoma. Clin Cancer Res. 2008;14(7):2056.
  • 33.Villanueva A, Hoshida Y, Battiston C, et al. Combining Clinical, Pathology, and Gene Expression Data to Predict Recurrence of Hepatocellular Carcinoma. Gastroenterology. 2011;140(5):1501–1512.e1502.
  • 34.Lee J-S, Chu I-S, Heo J, et al. Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. Hepatology. 2004;40(3):667–676.
  • 35.Matikas A, Syrigos KN, Agelaki S. Circulating Biomarkers in Non-Small-Cell Lung Cancer: Current Status and Future Challenges. Clin Lung Cancer. 2016;17(6):507–516.
  • 36.Zhang YC, Zhou Q, Wu YL. The emerging roles of NGS-based liquid biopsy in non-small cell lung cancer. J Hematol Oncol. 2017;10(1):167.
  • 37.Sorber L, Zwaenepoel K, Deschoolmeester V, et al. Circulating cell-free nucleic acids and platelets as a liquid biopsy in the provision of personalized therapy for lung cancer patients. Lung Cancer. 2017;107:100–107.
  • 38.Cerami E, Gao J, Dogrusoz U, et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2012;2(5):401.
  • 39.Gao J, Aksoy BA, Dogrusoz U, et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Science Signaling. 2013;6(269):pl1.
  • 40.Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140.
  • 41.Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
  • 42.Fabregat A, Jupe S, Matthews L, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2017;46(D1):D649–D655.


