Abstract
PURPOSE
Our goal was to identify the opportunities and challenges in analyzing data from the American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange (GENIE), a multi-institutional database derived from clinically driven genomic testing, at both the inter- and the intra-institutional level. Inter-institutionally, we identified genotypic differences between primary and metastatic tumors across the 3 most represented cancers in GENIE. Intra-institutionally, we analyzed the clinical characteristics of the Vanderbilt-Ingram Cancer Center (VICC) subset of GENIE to inform the interpretation of GENIE as a whole.
METHODS
We performed overall cohort matching on the basis of age, ethnicity, and sex of 13,208 patients stratified by cancer type (breast, colon, or lung) and sample site (primary or metastatic). We then determined whether detected variants, at the gene level, were associated with primary or metastatic tumors. We extracted clinical data for the VICC subset from VICC’s clinical data warehouse. Treatment exposures were mapped to a 13-class schema derived from the HemOnc ontology.
RESULTS
Across 756 genes, there were significant differences in all cancer types. In breast cancer, ESR1 variants were over-represented in metastatic samples (odds ratio, 5.91; q < 10⁻⁶). TP53 mutations were over-represented in metastatic samples across all cancers. VICC had a significantly different cancer type distribution than that of GENIE, but patients were well matched with respect to age, sex, and sample type. Treatment data from VICC were used for a bipartite network analysis, which demonstrated some clusters with a mix of histologies and others that were more histology specific.
CONCLUSION
This article demonstrates the feasibility of deriving meaningful insights from GENIE at the inter- and intra-institutional level and illuminates the opportunities and challenges of the data GENIE contains. The results should help guide future development of GENIE, with the goal of fully realizing its potential for accelerating precision medicine.
INTRODUCTION
Recent developments in oncology related to the decreasing costs of sequencing, histology-agnostic indications, and reuse of archival tissues provide new opportunities to accelerate innovations in precision oncology. In particular, one important development is the ability to sequence DNA from formalin-fixed paraffin embedded specimens, even those that were obtained years before the need for a precision approach.1 Whether to rely on archival tissue or on a new biopsy of metastatic recurrence to guide management is unclear.2 Furthermore, the choice of site for rebiopsy is often guided by convenience and safety factors.3,4 Fundamentally, the tendencies in genomic variation between primary and metastatic cancer samples are poorly understood, given the paucity of true paired-sample studies in the literature.5-8 Clinically, primary samples are relatively unlikely to have been exposed to cytotoxic treatment, whereas metastatic samples have often developed clinical resistance in heavily pretreated settings. Past studies have largely focused on small groups of patients (n = approximately 50), with discordant findings.9-18 A major challenge in the field remains reconciling prior studies and providing a broad overview of the genomics of metastases that evolve under treatment pressure.
CONTEXT
Key Objective
Targeted therapies are becoming increasingly available for tumors with specific genotypes. However, tumor samples are usually from the most accessible site because guidelines do not distinguish between primary and metastatic biopsy sites. It is largely unknown how primary and metastatic cancer samples differ from each other on the genomic level after exposure to therapies. The key objective of this study was to provide a broad overview of the genomics of metastasis by using the American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange (GENIE), a multi-institutional cancer genomics database.
Knowledge Generated
We found mutated genes over-represented in both primary and metastatic samples, such as ESR1 in metastatic breast cancer, a resistance mechanism to hormonal therapies. To better understand this heterogeneity, we investigated treatment exposure at one GENIE member institution and found commonalities in the regimens used across cancer types.
Relevance
There can be clinically significant differences between primary and metastatic sites, which may be driven in part by treatment selection.
There has been an increasing effort to consolidate data to create larger precision oncology databases. One such effort is the American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange (GENIE).19 Project GENIE represents the collaboration of 19 member institutions and as of July 2019 had grown to include almost 70,000 samples. Each member institution has its own gene panel, with a total of 1,211 genes represented as of version 5.0. There is an overlap of 44 core genes among the original data set and 191 among the large-scale gene panels, which cover 275 to 429 genes each. As the database continues to grow, the important question remains of how best to consolidate data from these different sources and use them in a meaningful way.
In this 2-part study, we used the public release of Project GENIE from January 2019 (5.0-public). We focused our analysis on the 3 most commonly represented cancers (lung, breast, and colon) to identify genomic alterations between primary and metastatic cohorts within each cancer type. Because treatment exposure details are not yet part of the public GENIE database, we next focused on the patients of one GENIE member institution (Vanderbilt-Ingram Cancer Center [VICC]) to gain deeper insight into the treatment exposure characteristics of GENIE patients (Fig 1A).
FIG 1.

Overview of study. (A) General flow diagram of the study. There are 2 arms, 1 focused on Project Genomics Evidence Neoplasia Information Exchange (GENIE) and 1 on the Vanderbilt-Ingram Cancer Center (VICC) subset. (B) Degree of overlap at the gene level among the 3 centers with large-scale gene arrays (Memorial Sloan Kettering [MSK], Dana-Farber Cancer Institute [DFCI], and VICC). AACR, American Association for Cancer Research; TMB, tumor mutational burden.
METHODS
Sample Data and Raw Data
We downloaded the mutation and clinical characteristics data from GENIE 5.0-public. Cancer type was determined on the basis of the cancer type classification designated by GENIE (which was based on the OncoTree hierarchy20). Because there is heterogeneity in the gene panels in Project GENIE, we limited the number of gene panels to both maximize coverage and reduce unevenness in gene representation. We included all the patients in GENIE who were sequenced at Memorial Sloan Kettering (MSK), Dana-Farber Cancer Institute (DFCI), and VICC. These institutions were selected because they provided large-scale gene coverage and had significant (> 1,000) patient representation in GENIE as of January 2019. These gene panels share a core overlap of 227 genes and cover 756 genes altogether (Fig 1B). Together, samples from these institutions represent more than 70% of the samples available in GENIE 5.0-public.
The samples from MSK have been matched to normal tissue to distinguish germline mutations, but the samples from VICC and DFCI have not. For DFCI samples, putative germline variants have been filtered out using a panel of historical normals or if present in the Exome Sequencing Project database at a frequency ≥ 0.1%, unless the variant is also in the Catalogue of Somatic Mutations in Cancer.21 For VICC, a Bayesian methodology incorporating tissue-specific prior expectations was used to detect somatic mutations. Only the VICC samples with full exon-sequence data were included. A smaller sequence assay containing hotspot-only data (ie, targeted sequencing of common mutations, not full exons) from VICC was excluded. MSK and DFCI data consisted solely of full exon data and therefore all were retained. Patients with multiple tumor samples in Project GENIE were also excluded from the analysis.
For VICC-specific clinical data, we obtained Project GENIE–linked clinical data from the Vanderbilt Cancer Registry with the approval of the institutional review board (No. 171559). Manual curation and basic quality control were performed on the data, including removing from additional analysis patients with no clinical data other than cancer genotype. These patients most likely represented patients who were seen in consultation at VICC but received the majority of their care elsewhere. As before, patients with multiple samples were excluded from the analysis. Clinical data included the stage at diagnosis, date of diagnosis, date of initial treatment, date of radiation treatment, and type of systemic therapy regimen received.
Cohort Matching
Matched cohorts of primary and metastatic samples for each cancer type were created by splitting the samples on the basis of cancer type per GENIE and further subdividing that group into primary and metastatic sample cohorts. Propensity score matching22 was performed using the R package MatchIt to reduce the bias of clinical characteristics of the patients.23 The primary and metastatic sample cohorts were matched on the basis of available clinical variables: age at sequencing, sex (with the exception of those with breast cancer), ethnicity, race, and sequence assay ID. Sequence assays are center specific, although each center may have multiple assays. Matching scores were based on the nearest neighbor scoring, with the exception of the sequence assay ID, which was required to be exact. Matched cohorts were then compared against each other by the matched characteristics to gauge the success of the match.
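As an illustration of the matching step described above, the following is a minimal Python sketch of greedy 1:1 nearest-neighbor matching on a propensity score, with exact matching on sequence assay ID. It is not the MatchIt implementation used in the study. The function name `match_within_assay`, the dictionary keys, the optional `caliper`, the processing order, and the toy scores are all assumptions made for the example, and the propensity scores are taken as already estimated (for example, by logistic regression on age, sex, ethnicity, and race).

```python
from collections import defaultdict


def match_within_assay(primary, metastatic, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on a precomputed propensity score.

    Pairs are only formed between samples with the same sequence assay ID
    (exact matching). Each sample is a dict with the hypothetical keys
    'id', 'assay', and 'score'.
    """
    # Pool of still-unmatched metastatic samples, grouped by assay ID.
    pool = defaultdict(list)
    for sample in metastatic:
        pool[sample["assay"]].append(sample)

    pairs = []
    for p in primary:  # processed in input order (an arbitrary choice)
        candidates = pool[p["assay"]]
        if not candidates:
            continue  # no unmatched metastatic sample on this assay
        best = min(candidates, key=lambda m: abs(m["score"] - p["score"]))
        if caliper is not None and abs(best["score"] - p["score"]) > caliper:
            continue  # nearest neighbor is too far away to accept
        candidates.remove(best)  # matching without replacement
        pairs.append((p["id"], best["id"]))
    return pairs


# Toy data; assay names and scores are invented for illustration.
primary = [
    {"id": "P1", "assay": "ASSAY-A", "score": 0.30},
    {"id": "P2", "assay": "ASSAY-A", "score": 0.62},
    {"id": "P3", "assay": "ASSAY-B", "score": 0.55},
]
metastatic = [
    {"id": "M1", "assay": "ASSAY-A", "score": 0.60},
    {"id": "M2", "assay": "ASSAY-A", "score": 0.35},
    {"id": "M3", "assay": "ASSAY-C", "score": 0.55},
]
print(match_within_assay(primary, metastatic))
```

In this toy example, P3 stays unmatched because no metastatic sample shares its assay. This is the same mechanism by which exact matching on assay ID can reduce the retained cohort.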
Mutation Analysis
All genes covered by the gene panels from the 3 included institutions (MSK, DFCI, and VICC) were included for analysis. All annotated mutations [missense, nonsense, 3′ untranslated region (UTR), 5′ UTR, splice site, and small indels] from all sequenced exons were included as somatic mutations. Copy number (deletions and amplifications) and rearrangement data were not included because their functional consequences are not as well described, and panel-based testing can underestimate amplification or deletion events across broad regions (up to and including whole chromosomes). Genes with somatic mutations were compared between the primary and metastatic cohorts for differential representation using a 2-sided Fisher’s exact test with false discovery rate correction for multiple comparisons (q < .05 was considered statistically significant). The tumor mutational burden (TMB) was estimated by dividing the total number of mutations per sample by the number of base pairs sequenced.24
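The TMB estimate described above reduces to a single division per sample. The short Python sketch below shows it with a panel-specific denominator. Reporting the result per megabase, the function name `tumor_mutational_burden`, and the panel sizes in `PANEL_BASES` are assumptions made for the example; they are not values from the study.

```python
# Hypothetical panel footprints in base pairs; real values depend on each assay.
PANEL_BASES = {
    "ASSAY-A": 1_500_000,
    "ASSAY-B": 1_000_000,
}


def tumor_mutational_burden(n_mutations, bases_sequenced, per_megabase=True):
    """Mutations divided by base pairs sequenced, optionally scaled to per Mb."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    if per_megabase:
        # Multiply before dividing to limit floating-point rounding.
        return n_mutations * 1_000_000 / bases_sequenced
    return n_mutations / bases_sequenced


# A sample with 12 mutations on the hypothetical 1.5-Mb panel.
print(tumor_mutational_burden(12, PANEL_BASES["ASSAY-A"]))
```

Using the footprint of the assay that produced each sample keeps samples from larger and smaller panels on a comparable scale.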
Treatment Network Analysis
For treatment analysis, we used clinical data obtained from the Vanderbilt Cancer Registry as described earlier in the text. The free text of treatment regimens was standardized and was classified into 12 mechanistic treatment categories. Radiation therapy was considered another treatment category, bringing the total number of categories to 13. Broadly speaking, treatment categories were defined by shared mechanism of action (eg, erlotinib and gefitinib, both epidermal growth factor receptor inhibitors). Patients were considered to have received radiation treatment if they had a date of initial radiation listed. We analyzed these data using bipartite networks,25 in which nodes represented either patients or treatment categories, and the edges connecting patient-treatment pairs represented which patients were in specific treatment categories. We used bicluster modularity26 to identify patient subgroups and their most frequently co-occurring treatment categories, and measured the degree of clusteredness using modularity (in which a value > 0.3 is considered strong clustering27). The network was laid out using the Kamada-Kawai force-based algorithm28 and the ExplodeLayout algorithm.29
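To make the notion of clusteredness concrete, the sketch below computes a bipartite modularity score for a patient-treatment network under a given cluster assignment. It assumes Barber's formulation of bipartite modularity, Q = (1/m) Σ (A_ij − k_i d_j / m) over patient-treatment pairs placed in the same cluster, which may differ in detail from the bicluster modularity method cited in the text. It scores a partition and does not search for one. The function name, patient labels, and edges are hypothetical.

```python
from collections import Counter


def bipartite_modularity(edges, patient_cluster, treatment_cluster):
    """Barber-style bipartite modularity for a fixed cluster assignment.

    edges: iterable of (patient, treatment) pairs.
    patient_cluster / treatment_cluster: dicts mapping node -> cluster label.
    """
    edges = set(edges)  # each patient-treatment pair counts once
    m = len(edges)
    if m == 0:
        raise ValueError("the network has no edges")

    # Observed number of edges that fall inside a cluster.
    within = sum(1 for p, t in edges if patient_cluster[p] == treatment_cluster[t])

    # Total degree per cluster on each side of the bipartite graph.
    patient_degree = Counter()
    treatment_degree = Counter()
    for p, t in edges:
        patient_degree[patient_cluster[p]] += 1
        treatment_degree[treatment_cluster[t]] += 1

    # Expected within-cluster edge fraction under the degree-preserving null model.
    expected = sum(
        patient_degree[c] * treatment_degree[c] for c in patient_degree
    ) / (m * m)
    return within / m - expected


# Toy network: two patient groups, one cross-cluster edge (pt2 -> Alkylator).
edges = [
    ("pt1", "AI"), ("pt2", "AI"), ("pt2", "Alkylator"),
    ("pt3", "Alkylator"), ("pt4", "Alkylator"),
]
patient_cluster = {"pt1": 0, "pt2": 0, "pt3": 1, "pt4": 1}
treatment_cluster = {"AI": 0, "Alkylator": 1}
print(round(bipartite_modularity(edges, patient_cluster, treatment_cluster), 2))
```

A score near 0 means the partition captures no more within-cluster edges than expected from node degrees alone, whereas larger values indicate stronger clustering.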
Statistical Analysis
Gene enrichment in the metastatic and primary cohorts from Project GENIE data was determined using Fisher’s exact test with a false discovery rate for multiple comparison testing; q < .05 was considered statistically significant. Comparison of patient characteristics between VICC and Project GENIE was performed using the Wilcoxon matched-pairs signed rank test for categorical variables and the unpaired t test for age. All analyses were performed using the statistical software Prism 8.
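The gene-level testing described above can be sketched in a few lines of standard-library Python: a 2-sided Fisher's exact test per gene, followed by a false discovery rate adjustment across genes. The text does not name the adjustment procedure, so the Benjamini-Hochberg step-up method is assumed here. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Example layout: a/b = metastatic mutated/wild-type,
    c/d = primary mutated/wild-type.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical counts per gene: (met mutated, met wild-type, primary mutated, primary wild-type).
tables = {"GENE_A": (30, 470, 10, 490), "GENE_B": (50, 450, 48, 452)}
pvals = [fisher_exact_two_sided(*t) for t in tables.values()]
for gene, q in zip(tables, benjamini_hochberg(pvals)):
    print(gene, "significant" if q < 0.05 else "not significant")
```

Adjusting across all tested genes within a cancer type, rather than gene by gene, is what makes the q < .05 threshold a statement about the expected proportion of false discoveries.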
RESULTS
GENIE Patient Cohort: Clinical Characteristics
We used genomic and clinical data collected on patients in Project GENIE from MSK, DFCI, and VICC as described in Methods. We next divided the samples by cancer type, as defined by GENIE. The 3 most represented cancers (non–small-cell lung, breast, and colon, in that order) accounted for 40% of the total samples and had a sufficient mixture of primary and metastatic samples to conduct this analysis. We therefore chose to focus our analysis on these 3 cancers. We also excluded samples from patients who had more than 1 sample (6% of the samples) and samples that were not identified as primary or metastatic (< 1%). In total, N = 13,208 samples (5,221 lung, 4,607 breast, and 3,380 colorectal) were included in the study.
Overall, primary samples are better represented than metastatic samples in GENIE (60.7% primary, 34.2% metastatic). To limit confounding, we sought to minimize differences in clinical characteristics between the primary and metastatic sample cohorts through cohort matching. Even without matching, the primary and metastatic cohorts were already fairly well balanced in terms of age, sex, and primary ethnicity (Table 1; Fig 2A). With cohort matching, we were able to retain 73.4% of the original cohort. The most notable difference between the primary and metastatic cohorts was a reduction in primary samples from DFCI, specifically using 1 assay, DFCI-ONCOPANEL2 (Fig 2B; Table 1).
TABLE 1.
Demographic and Clinical Characteristics of Patients in Matched and Unmatched GENIE Cohorts
FIG 2.

Clinical characteristics of the Genomics Evidence Neoplasia Information Exchange (GENIE) cohort are similar between matched and unmatched cohorts. (A) Age distributions of the matched and unmatched primary and metastatic cohorts for breast, colon, and lung cancer. (B) Gene assay makeup of the matched and unmatched primary and metastatic cohorts for breast, colon, and lung cancer. DFCI, Dana-Farber Cancer Institute; MSK, Memorial Sloan Kettering; VICC, Vanderbilt-Ingram Cancer Center.
Differential Mutations Associated With Metastasis by Cancer Type
We first compared the overall mutation burden in primary and metastatic samples. For breast and lung cancers, the average number of mutations was higher in the metastatic versus the primary cohort (Fig 3A; Appendix Table A1). However, in colorectal cancer, the opposite was true. This is consistent with previous studies comparing large cohorts of primary versus metastatic cancers and is likely explained by the tendency of hypermutated colon cancers to be less aggressive.7,8,15,17
FIG 3.
Comparison of primary and metastatic (met) cohorts shows site-specific mutations and differences in tumor mutation burden. (A) Tumor mutation burden for the matched primary and met cohorts in breast, colon, and lung. Tumor mutation burden was calculated by dividing the number of mutations at the gene level by the number of base pairs (bp) sequenced. (B) Genes with mutations significantly over-represented in primary or met samples. Matched primary and met cohorts in each cancer type were compared using Fisher’s exact test. Results with q value < .01 are labeled by gene name. CRC, colorectal.
To identify the genetic differences between primary and metastatic cancers, we aggregated genetic variants by gene and compared them between the primary and metastatic cohorts. Across 756 genes, there were significant differences in all cancer types (Fig 3B; Appendix Table A2). We found that 48 genes were statistically significant between primary and metastatic samples for the 3 cancer types included in the analysis (q < .05): 8 genes in breast cancer, 2 genes in lung cancer, and 38 genes in colorectal cancer. Of these genes, 43 were unique, and 5 were shared among the different cancer types. For lung and breast cancers, all statistically significant mutations were over-represented in metastatic samples compared with the primaries (odds ratio [OR], > 1 for all), whereas for colorectal cancer, all but 1 of the mutated genes (TP53) were over-represented in the primary samples compared with the metastatic samples (OR, < 1).
TP53 was universally associated with metastatic samples across cancer types. The ORs associated with breast, colon, and lung were 1.27 (95% CI, 1.08 to 1.42; q = .048), 1.48 (95% CI, 1.22 to 1.77; q = 3.6 × 10⁻³), and 1.63 (95% CI, 1.41 to 1.84; q < 1 × 10⁻⁶), respectively. Of the represented genes, TP53 was the only gene with a common association across all cancers. The other overlapping genes were ARID2, FANCA, and SMARCA4, which were over-represented in breast metastases and colorectal primaries. In general, more specific mutations were associated with colorectal primaries than with metastases, unlike in breast and lung.
Most prominently, in breast cancer, ESR1 variants were more common in metastatic samples (OR, 5.91 [95% CI, 3.73 to 7.71]; q < 10⁻⁶). Finding ESR1 mutations in metastatic breast cancer is consistent with previous work showing ESR1 mutations to be a mechanism of endocrine therapy resistance, leading to recurrence as a new metastasis.30-34 Of the 59 distinct breast cancer ESR1 missense variants in GENIE, 42 (71%) were not annotated in the Variant Interpretation for Cancer Consortium meta-KB, a compilation of multiple cancer variant knowledge bases. These unannotated variants included some that were found in multiple patients at more than 1 institution.
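For readers who want to reproduce effect sizes like the odds ratios reported above from a 2 × 2 table, the following Python sketch computes a sample odds ratio with a Wald 95% CI on the log scale. The text does not state how its ORs and CIs were derived (software that runs Fisher's exact test often reports a conditional estimate instead), so this method, the continuity correction, the function name, and the counts are all assumptions for illustration.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Sample odds ratio and Wald confidence interval for [[a, b], [c, d]].

    a/b: metastatic mutated/wild-type; c/d: primary mutated/wild-type.
    Returns (odds_ratio, lower, upper). OR > 1 means the gene is
    over-represented in metastatic samples.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 metastatic and 10 of 100 primary samples mutated.
estimate, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR {estimate:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, with the upper bound farther from the estimate than the lower bound.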
Clinical Characteristics of VICC Cohort
The ESR1 finding suggests that some genetic differences between primary and metastatic samples may be representative of treatment resistance mutations. Therefore, prior treatment data would be helpful for interpretation. Unfortunately, Project GENIE does not yet contain treatment data. We therefore did a subset analysis of the VICC cohort.
VICC represents 4% of the overall GENIE cohort (2,302 of 56,970 samples) and 6.2% of this study’s cohort (818 of 13,208). We abstracted data from the cancer registry on 2,111 VICC patients represented in GENIE. After the exclusion of 317 patients for having multiple cancer diagnoses, 1,016 patients had treatment data (the remaining patients did not have treatment data at VICC).
There are a few key differences between the clinical characteristics of the GENIE and VICC cohorts. Among the more prevalent cancers, there is a significantly larger representation of melanoma, leukemia, and myelodysplastic syndromes at VICC than in GENIE (Fig 4B). In addition, there is a nonsignificant increase in the proportion of metastatic cancers at VICC. Age and sex distribution remain roughly the same (Fig 4A; Table 2).
FIG 4.
Analysis of the Vanderbilt-Ingram Cancer Center (VICC) subset of the Genomics Evidence Neoplasia Information Exchange (GENIE) shows common treatment modalities across cancer types. (A) Age distributions of the GENIE and VICC data sets, regardless of cancer type. Only VICC patients with treatment data were included. (B) Cancer type makeup of the GENIE and VICC data sets. The top 10 represented cancer types of GENIE and VICC were included, yielding a union of a total of 13 types. The remaining cancer types were grouped together. (C) Network analysis of treatments received by VICC patients. Patients are represented by nodes, colored by cancer type, and are connected by edges to treatment types received. AI, aromatase inhibitor; Anti-HER2, anti–human epidermal growth factor receptor 2; Cytotoxic Abx, cytotoxic antibiotic; EGFRi, epidermal growth factor receptor inhibitor; GnRHa, gonadotropin-releasing hormone agonist; Immunotx, immunotherapy; RT, radiation therapy; SERM/D, selective estrogen receptor modulator/degrader; VEGFi, vascular endothelial growth factor inhibitor.
TABLE 2.
Demographic and Clinical Characteristics of Patients in GENIE Compared to VICC

Of Vanderbilt samples from all cancers, 846 of 1,875 (45.1%) were from metastases, and 338 of 672 (50.3%) of the samples with annotated stage were stage IV. Grouping by types of therapy (13 in total, including radiation therapy), alkylating agents were the most prevalent (503 of 1,016 [49.5%]) and were used across multiple cancer types (72 of 128 breast [56%], 72 of 106 colorectal [68%], and 98 of 133 lung [74%]), followed by antimetabolites (424 of 1,016 [41.7%]), taxanes (255 of 1,016 [25.1%]), and anthracyclines (175 of 1,016 [17.2%]; Appendix Table A3).
Given that one of the strongest signals of the initial analysis was ESR1, a gene associated with treatment resistance, we next focused on treatment exposure.33,34 However, there is considerable heterogeneity of treatment regimens available in the metastatic setting. Patients with the same broad cancer type may receive different courses of treatment, and patients with different cancer types may receive similar classes of treatment. Visual analytics using bipartite networks can be used to gain a better understanding of this heterogeneity. To that end, we grouped chemotherapy, immune therapies, and targeted therapies according to a schema derived from the HemOnc ontology35,36 and again focused on patients with breast, lung, or colon cancer. Bipartite network analysis of the resulting data revealed 5 biclusters that had strong clusteredness (modularity Q = .35). Furthermore, our analysis showed both cancer-specific regimen clusters (such as selective estrogen receptor modulators in breast cancer) and clusters that showed a mix of histologies (such as a large cluster including both patients with lung cancer and those with colorectal cancer around alkylators; Fig 4C).
DISCUSSION
The increasingly widespread adoption of tumor genome sequencing has led to a wealth of data aggregated into Project GENIE, the largest database of its kind to date.19 This study demonstrates how cohort matching can be used to address source heterogeneity. We performed a comparison of GENIE as a whole with a member institution, VICC, to show the degree of institutional source heterogeneity in clinical variables. In addition, we analyzed the distribution of key clinical variables such as treatment exposure among VICC patients, data which are currently missing in GENIE.
Using a clinically collected database such as GENIE means that the primary and metastatic cohorts are from different patients and include gene panels with incomplete overlap. A previous study has addressed this limitation by focusing on overlapping genes shared by all panels.37 However, this limits the comparison to only 44 genes in the original data set and excludes ESR1, the most differentially mutated gene between primary and metastatic breast cancers in our analysis. In this analysis, we included all sequenced genes and used cohort matching to ensure equal representation of genes in the primary and metastatic cohorts. Cohort matching may be useful for additional analyses of consolidated databases such as Project GENIE.
Cohort matching by institution gene panel also controls for institutional differences in the patient population. Analysis of the VICC cohort compared with GENIE shows significant differences in cancer type distribution. The VICC patients with genomic data but no clinical data likely represent patients who were seen in consultation at VICC but received the majority of their care elsewhere. We did not find VICC-specific genes among our significant results.
Overall, our work is able to replicate previously known findings, but we also found novel associations. ATR, ARID2, ESR1, FGFR4, and TP53, but not FANCA, SMARCA4, or TSC2, were found in an independent analysis of the MSK breast cancer data.8 The latter genes are well represented in the DFCI and VICC data sets, so these differences may be a result of the increased power of the combined data. In colorectal cancer, there is less consensus in primary tumor-specific mutations, although we were able to find some commonalities.12,13,38,39 For example, genes in the PI3K pathway (PIK3CA, PI3KR, and PTEN) are enriched in the primary cohort. PIK3CA mutations are associated with reduced rates of recurrence after primary colon cancer resection.40
The enrichment of ESR1 suggests that treatment resistance markers may represent a significant difference between primary and metastatic samples because of the strong signal from selection pressure.33,34 Although the VICC cohort was underpowered to associate ESR1 mutations with treatment, this remains an important direction for future work. Correlating genetic differences with treatment exposure may aid in identifying other resistance markers in the future.
In our analysis, lung and breast metastatic samples are associated with larger mutation burdens, possibly because of the accumulated genetic complexity. Colorectal cancer metastases exhibited lower estimated TMB than did primary tumors. One explanation is that increased mutation burden leads to increased immune surveillance, which, in turn, leads to decreased metastasis, such as in colorectal cancers with high microsatellite instability.41,42 We also found MSH6 and BRAF, which are associated with microsatellite instability,43 enriched in our colorectal primary sample cohort.
Overall, this study demonstrates the feasibility of a larger scale analysis using data from different institutions. One of the missions of Project GENIE is to generate new hypotheses, which this study shows is feasible.19 As Project GENIE grows in sample size and clinical annotation, so will its ability to detect small, rare differences and confirm overarching trends. Project GENIE has grown over threefold since its initial launch, and its power to detect significant changes has likewise increased. From the initial cohort (Project GENIE v1), our analysis yielded 3 genes with q values < .05, but from the analyzed cohort (Project GENIE v5), our analysis yielded 47. In addition, more clinical annotation, especially treatment data, will allow the correlation of treatment exposure with treatment resistance markers. We must continue to explore how best to use the growing wealth of cancer genomic information in the years to come.
APPENDIX
TABLE A1.
Tumor Mutation Burden in GENIE Cancer Cohorts
TABLE A2.
Genes Significantly Overrepresented in Primary or Metastatic Cohorts in GENIE

TABLE A3.
Treatment Regimens Used in VICC Cancer Cohort by Category
Presented in poster format (preliminary results) at the 30th Anniversary American Association for Cancer Research Special Conference Convergence: Artificial Intelligence, Big Data, and Prediction in Cancer, Newport, RI, October 14-17, 2018.
SUPPORT
Supported by National Institutes of Health Grant Nos. U01 CA231840 (S.K.B. and J.L.W.) and T32 HG008341 (S.M.R.).
AUTHOR CONTRIBUTIONS
Conception and design: Julie Wu, Michele Lenoue-Newton, Mia Levy, Christine Micheel, Yaomin Xu, Jeremy L. Warner
Provision of study material or patients: Samuel M. Rubinstein
Collection and assembly of data: Julie Wu, Samuel M. Rubinstein, Lucy Wang, Mia Levy, Christine Micheel, Jeremy L. Warner
Data analysis and interpretation: Julie Wu, Jordan Bryan, Samuel M. Rubinstein, Raed Zuhour, Mia Levy, Yaomin Xu, Suresh K. Bhavnani, Lester Mackey, Jeremy L. Warner
Manuscript writing: All authors
Final approval of manuscript: All authors
Accountable for all aspects of the work: All authors
AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/po/author-center.
Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.
Michele Lenoue-Newton
Employment: DaVita (I)
Stock and Other Ownership Interests: DaVita (I)
Travel, Accommodations, Expenses: GenomOncology
Mia Levy
Employment: SeqTech Diagnostics (I)
Leadership: Personalis
Stock and Other Ownership Interests: Personalis, GenomOncology
Honoraria: Roche
Consulting or Advisory Role: Personalis, GenomOncology, Roche
Research Funding: GenomOncology
Patents, Royalties, Other Intellectual Property: Royalties from GenomOncology for licensing of MyCancerGenome content
Travel, Accommodations, Expenses: Roche
Christine Micheel
Consulting or Advisory Role: Roche
Research Funding: GenomOncology (Inst), GE Healthcare (Inst)
Lester Mackey
Patents, Royalties, Other Intellectual Property: Royalties from Stanford University Docket No. S15-106 Model for Predicting a Patient’s Future Healthcare Costs
Jeremy L. Warner
Stock and Other Ownership Interests: HemOnc.org
Consulting or Advisory Role: Westat, IBM
Travel, Accommodations, Expenses: IBM
No other potential conflicts of interest were reported.
REFERENCES
1. Robbe P, Popitsch N, Knight SJL, et al. Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: Pilot study for the 100,000 Genomes Project. Genet Med. 2018;20:1196–1205. doi: 10.1038/gim.2017.241.
2. Lopez JS, Banerji U. Combine and conquer: Challenges for targeted therapy combinations in early phase trials. Nat Rev Clin Oncol. 2017;14:57–66. doi: 10.1038/nrclinonc.2016.96.
3. Engstrom PF, Arnoletti JP, Benson AB III, et al. NCCN clinical practice guidelines in oncology: Colon cancer. J Natl Compr Canc Netw. 2009;7:778–831. doi: 10.6004/jnccn.2009.0056.
4. Lindeman NI, Cagle PT, Beasley MB, et al. Molecular testing guideline for selection of lung cancer patients for EGFR and ALK tyrosine kinase inhibitors: Guideline from the College of American Pathologists, International Association for the Study of Lung Cancer, and Association for Molecular Pathology. J Thorac Oncol. 2013;8:823–859. doi: 10.1097/JTO.0b013e318290868f.
5. Klein CA. Parallel progression of primary tumours and metastases. Nat Rev Cancer. 2009;9:302–312. doi: 10.1038/nrc2627.
6. Turajlic S, Swanton C. Metastasis as an evolutionary process. Science. 2016;352:169–175. doi: 10.1126/science.aaf2784.
7. Robinson DR, Wu Y-M, Lonigro RJ, et al. Integrative clinical genomics of metastatic cancer. Nature. 2017;548:297–303. doi: 10.1038/nature23306.
8. Angus L, Smid M, Wilting SM, et al. The genomic landscape of metastatic breast cancer highlights changes in mutation and signature frequencies. Nat Genet. 2019;51:1450–1458. doi: 10.1038/s41588-019-0507-7.
9. Tomlins SA, Mehra R, Rhodes DR, et al. Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007;39:41–51. doi: 10.1038/ng1935.
10. Lapointe J, Li C, Giacomini CP, et al. Genomic profiling reveals alternative genetic pathways of prostate tumorigenesis. Cancer Res. 2007;67:8504–8510. doi: 10.1158/0008-5472.CAN-07-0673.
11. Popławski AB, Jankowski M, Erickson SW, et al. Frequent genetic differences between matched primary and metastatic breast cancer provide an approach to identification of biomarkers for disease progression. Eur J Hum Genet. 2010;18:560–568. doi: 10.1038/ejhg.2009.230.
12. Lin AY, Chua M-S, Choi Y-L, et al. Comparative profiling of primary colorectal carcinomas and liver metastases identifies LEF1 as a prognostic biomarker. PLoS One. 2011;6:e16636. doi: 10.1371/journal.pone.0016636.
13. Brannon AR, Vakiani E, Sylvester BE, et al. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions. Genome Biol. 2014;15:454. doi: 10.1186/s13059-014-0454-7.
14. Kim S-K, Kim S-Y, Kim J-H, et al. A nineteen gene-based risk score classifier predicts prognosis of colorectal cancer patients. Mol Oncol. 2014;8:1653–1666. doi: 10.1016/j.molonc.2014.06.016.
15. Bertucci F, Finetti P, Guille A, et al. Comparative genomic analysis of primary tumors and metastases in breast cancer. Oncotarget. 2016;7:27208–27219. doi: 10.18632/oncotarget.8349.
16. Krøigård AB, Larsen MJ, Thomassen M, et al. Molecular concordance between primary breast cancer and matched metastases. Breast J. 2016;22:420–430. doi: 10.1111/tbj.12596.
17. Goto T, Hirotsu Y, Mochizuki H, et al. Mutational analysis of multiple lung cancers: Discrimination between primary and metastatic lung cancers by genomic profile. Oncotarget. 2017;8:31133–31143. doi: 10.18632/oncotarget.16096.
18. Lawler K, Papouli E, Naceur-Lombardelli C, et al. Gene expression modules in primary breast cancers as risk factors for organotropic patterns of first metastatic spread: A case control study. Breast Cancer Res. 2017;19:113. doi: 10.1186/s13058-017-0881-y.
19. AACR Project GENIE Consortium. AACR Project GENIE: Powering precision medicine through an international consortium. Cancer Discov. 2017;7:818–831. doi: 10.1158/2159-8290.CD-17-0151.
- 20.MSKCC Oncotree. http://oncotree.mskcc.org/#/home
- 21.Tate JG, Bamford S, Jubb HC, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019;47:D941–D947. doi: 10.1093/nar/gky1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46:399–424. doi: 10.1080/00273171.2011.568786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Ho D, Imai K, King G, et al. MatchIt: Nonparametric preprocessing for parametric causal inference. J Stat Softw. 2011;42:1–28. [Google Scholar]
- 24.Baras AS, Stricker T. Consortium on behalf of the APG: Abstract LB-105: Characterization of total mutational burden in the GENIE cohort: Small and large panels can provide TMB information but to varying degrees. Cancer Res. 2017;77:LB-105. [Google Scholar]
- 25.Newman M. Networks: An Introduction. Oxford Scholarship Online; 2010. https://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199206650.001.0001/acprof-9780199206650 [Google Scholar]
- 26.Treviño S, Nyberg A, Genio CID, et al. Fast and accurate determination of modularity and its effect size. J Stat Mech. 2015;P02003:2015. [Google Scholar]
- 27.Newman MEJ. Fast algorithm for detecting community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;69:066133. doi: 10.1103/PhysRevE.69.066133. [DOI] [PubMed] [Google Scholar]
- 28.Kamada T, Kawai S. An algorithm for drawing general undirected graphs. Inf Process Lett. 1989;31:7–15. [Google Scholar]
- 29.Bhavnani SK, Chen T, Ayyaswamy A, et al. Enabling comprehension of patient subgroups and characteristics in large bipartite networks: Implications for precision medicine. AMIA Jt Summits Transl Sci Proc. 2017;2017:21–29. [PMC free article] [PubMed] [Google Scholar]
- 30.Li S, Shen D, Shao J, et al. Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts. Cell Rep. 2013;4:1116–1130. doi: 10.1016/j.celrep.2013.08.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Robinson DR, Wu Y-M, Vats P, et al. Activating ESR1 mutations in hormone-resistant metastatic breast cancer. Nat Genet. 2013;45:1446–1451. doi: 10.1038/ng.2823. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Toy W, Shen Y, Won H, et al. ESR1 ligand-binding domain mutations in hormone-resistant breast cancer. Nat Genet. 2013;45:1439–1445. doi: 10.1038/ng.2822. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Clarke R, Tyson JJ, Dixon JM. Endocrine resistance in breast cancer--An overview and update. Mol Cell Endocrinol. 2015;418:220–234. doi: 10.1016/j.mce.2015.09.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Reinert T, Gonçalves R, Bines J. Implications of ESR1 mutations in hormone receptor-positive breast cancer. Curr Treat Options Oncol. 2018;19:24. doi: 10.1007/s11864-018-0542-0. [DOI] [PubMed] [Google Scholar]
- 35.Malty AM, Jain SK, Yang PC, et al. Computerized approach to creating a systematic ontology of hematology/oncology regimens. JCO Clin Cancer Inform. 2018;2:1–11. doi: 10.1200/CCI.17.00142. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Warner JL, Dymshyts D, Reich CG, et al. HemOnc: A new standard vocabulary for chemotherapy regimen representation in the OMOP common data model. J Biomed Inform. 2019;96:103239. doi: 10.1016/j.jbi.2019.103239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Liu G, Zhan X, Dong C, et al. Genomics alterations of metastatic and primary tissues across 15 cancer types. Sci Rep. 2017;7:13262. doi: 10.1038/s41598-017-13650-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Vermaat JS, Nijman IJ, Koudijs MJ, et al. Primary colorectal cancers and their subsequent hepatic metastases are genetically different: Implications for selection of patients for targeted treatment. Clin Cancer Res. 2012;18:688–699. doi: 10.1158/1078-0432.CCR-11-1965. [DOI] [PubMed] [Google Scholar]
- 39.Yaeger R, Chatila WK, Lipsyc MD, et al. Clinical sequencing defines the genomic landscape of Metastatic colorectal cancer. Cancer Cell. 2018;33:125–136.e3. doi: 10.1016/j.ccell.2017.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Manceau G, Marisa L, Boige V, et al. PIK3CA mutations predict recurrence in localized microsatellite stable colon cancer. Cancer Med. 2015;4:371–382. doi: 10.1002/cam4.370. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Popat S, Hubner R, Houlston RS. Systematic review of microsatellite instability and colorectal cancer prognosis. J Clin Oncol. 2005;23:609–618. doi: 10.1200/JCO.2005.01.086. [DOI] [PubMed] [Google Scholar]
- 42.Kloor M, von Knebel Doeberitz M. The immune biology of microsatellite-unstable cancer. Trends Cancer. 2016;2:121–133. doi: 10.1016/j.trecan.2016.02.004. [DOI] [PubMed] [Google Scholar]
- 43.Boland CR, Goel A. Microsatellite instability in colorectal cancer. Gastroenterology. 2010;138:2073–2087.e3. doi: 10.1053/j.gastro.2009.12.064. [DOI] [PMC free article] [PubMed] [Google Scholar]





