Abstract
Patients with non-small cell lung cancer (NSCLC) who have distant metastases have a poor prognosis. To determine which genomic factors of the primary tumor are associated with metastasis, we analyzed data from 759 patients originally diagnosed with stage I–III NSCLC as part of the AACR Project GENIE Biopharma Collaborative consortium. We found that TP53 mutations were significantly associated with the development of new distant metastases. TP53 mutations were also more prevalent in patients with a history of smoking, suggesting that these patients may be at increased risk for distant metastasis. Our results suggest that additional investigation of the optimal management of patients with early-stage NSCLC harboring TP53 mutations at diagnosis is warranted in light of their higher likelihood of developing new distant metastases.
Subject terms: Cancer genetics, Statistics
Introduction
Distant metastasis in non-small-cell lung cancer (NSCLC) is associated with poor survival, with only 6% of patients alive 5 years after primary diagnosis [1]. About 50% of patients present with distant metastases at the time of diagnosis (i.e., stage IV) [2], and ~34% of patients diagnosed with stage I-II disease develop metastases within five years of diagnosis [3]. While some studies suggest that specific mutations (e.g., in EGFR) increase the risk of distant metastasis [4], other results indicate that these mutations do not significantly affect metastasis development [5]. To further investigate this question, we performed a retrospective analysis of 759 patients with stage I-III NSCLC who underwent targeted sequencing of their primary tumors as part of the AACR Project GENIE BPC NSCLC v2.1-consortium dataset [6] to determine whether specific mutations and copy number alterations common in NSCLC are associated with metastasis to distant sites.
We used multivariable Cox proportional hazards models to quantify the association between common genomic alterations in the primary tumor and the rate of developing distant metastases in NSCLC patients diagnosed with local or locally advanced disease (stages I-IIIB; Fig. 1A, Supplementary Table 1, Methods). We investigated associations between the likelihood of developing metastases and nonsynonymous mutations in 5 of the most commonly mutated genes in NSCLC (TP53, KRAS, EGFR, BRAF, PIK3CA), as well as copy number alterations (CNAs) in 5 of the most commonly amplified genes (EGFR, PIK3CA, MET, KRAS, FGFR1). We found that TP53 mutations were associated with a significantly increased rate of developing metastases to any distant site after diagnosis (Fig. 1B,C; HR = 1.78, HR 95% CI 1.09–2.90, p = 0.033, Wald test with Benjamini-Hochberg (BH) adjustment for multiple hypothesis testing).
We also investigated associations between these mutations and CNAs and the development of metastases to specific distant sites individually (Fig. 1D) and found that TP53 mutations were associated with a significantly increased rate of metastasis to the liver (HR = 2.51, HR 95% CI 1.07–5.93, BH-adjusted p = 0.026, Wald test). However, we observed no significant associations between any genomic alteration and the rate of metastasis to brain or bone specifically (Supplementary Fig. 1). TP53 mutation status was not significantly associated with NSCLC stage at diagnosis (p = 0.21, χ² test; Fig. 1E), but was significantly associated with reduced overall survival in patients diagnosed with stage I-III NSCLC (Fig. 1F and Supplementary Fig. 2; HR = 1.97, HR 95% CI 1.45–2.66, p < 0.0001, Wald test).
Given the prognostic significance of TP53 mutations in NSCLC, we analyzed the location and identity of TP53 mutations found in primary tumors using an expanded cohort of 1,034 patients with stage I-IV disease (Methods). TP53 mutations in cancer have previously been shown to occur mostly in the DNA binding domain [7,8], suggesting that these mutations are likely to impair protein function. Of the 331 patients in our cohort with nonsynonymous point mutations or indels in TP53, 285 had mutations localized within the p53 DNA binding domain, most of which were single nucleotide substitutions (Fig. 2A). However, splice site mutations and frameshift insertions or deletions (n = 52 mutations) were more evenly spread throughout the coding sequence, likely because these mutations have a greater impact on protein function regardless of location.
We also found that TP53 mutations were enriched in patients with a smoking history (p = 0.0023, χ² test; odds ratio 1.66; Fig. 2B). Single nucleotide substitutions in TP53 in smokers had a significantly different pattern of base substitutions than those in never-smokers, with a higher rate of C > A substitutions found in smokers (Fig. 2C). This pattern is similar to the mutational signature associated with tobacco smoking in cancers of the lung and larynx [9]. The different mutational processes active in smokers and never-smokers were shown to result in differences in the frequency of TP53 mutations [10]. We found that the most common point mutation in smokers (R158L) was less common in never-smokers (14/256 point mutations in smokers vs. 0/57 in never-smokers), although this difference was not significant (p = 0.15, χ² test; Fig. 2D). This mutation has previously been shown to be more prevalent in lung cancers [10] and is associated with changes in cell motility and drug sensitivity in vitro [11]. In summary, patients with NSCLC with a history of smoking had more frequent mutations in TP53, likely due to smoking-related mutational processes. Our Cox modeling results (Fig. 1) suggest that this increased TP53 mutation burden is associated with increased risk of developing distant metastases after diagnosis.
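The R158L comparison can be checked directly from the counts given in the text. The Python sketch below is illustrative only: it assumes the χ² test was applied to the 2×2 table of R158L versus other point mutations in smokers and never-smokers, and that Yates' continuity correction was used (the text does not state this, although it is a common software default for 2×2 tables). The function name `chi2_yates_2x2` is hypothetical.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-square test with Yates' continuity correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each |O - E| by 0.5 (never below 0).
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff * diff / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts from the text: R158L in 14 of 256 point mutations in smokers
# and in 0 of 57 point mutations in never-smokers.
stat, p = chi2_yates_2x2(14, 256 - 14, 0, 57 - 0)
print(f"chi-square = {stat:.2f}, p = {p:.2f}")
```

Under these assumptions the p-value comes out at about 0.15, in line with the reported value. Without the continuity correction the same table would give a smaller p-value, so the choice of correction matters when one cell count is zero.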
Our work has several limitations. First, as our study retrospectively examined the effect of genomic alterations on patient outcome, differences in treatment or other factors associated with specific mutations (e.g., administration of targeted therapies to patients with EGFR mutations) made it difficult to isolate the effect of certain genomic changes. Additionally, our study is vulnerable to selection bias and to informative cohort entry [12,13], since it only included patients who underwent primary tumor genomic sequencing, which is more likely to be performed in patients who later developed recurrent or progressive disease.
In summary, we found that TP53 mutations are associated with distant recurrence in patients with NSCLC who were diagnosed with stage I-III disease. Our results suggest that TP53 mutation status should be regularly tracked in all prospective adjuvant trials in early-stage NSCLC, so that the effect of this frequent mutation can be better understood. While previous clinical trials suggest that adjuvant therapy with cisplatin-based regimens does not improve survival in patients with early-stage TP53-mutant NSCLC relative to patients with TP53-wild type disease [14,15], other therapies (e.g., immunotherapy) could provide a survival advantage to this population [16]. Given the potential for distant recurrence in this population, additional investigation of the optimal management strategy for patients with TP53-mutant NSCLC is warranted.
Methods
Participant eligibility and selection
Clinical and genomic data for 1,862 patients with NSCLC were collected as part of AACR Project GENIE (BPC NSCLC version 2.1) (Fig. 1A; Supplementary Table 1). Permission to access the data was granted by the AACR Project GENIE Biopharma Collaborative publications committee. All patient data were anonymized before retrieval. The Dana-Farber/Harvard Cancer Center Institutional Review Board determined that this study did not constitute human subjects research, given its use of a previously collected, deidentified dataset. All research was performed in accordance with the Declaration of Helsinki. The BPC dataset collected data from patients with an NSCLC diagnosis of any stage who received targeted genomic sequencing of a primary tumor and/or a metastasis biopsy at Dana-Farber Cancer Institute, Memorial Sloan-Kettering Cancer Center, or Vanderbilt-Ingram Cancer Center between 1/1/2014 and 12/31/2017, or at Princess Margaret Cancer Center (Toronto, Canada) between 1/1/2014 and 12/31/2015. Additionally, the BPC study only included patients who were between 18 and 89 years of age at the time of sequencing and who were followed for at least two years after sequencing (or until death). For patients who had tumor sequencing performed on a research basis, informed consent for use of genomic and clinical data was obtained; for those who had sequencing performed on a standard of care clinical basis, data were collected under a waiver of informed consent at the respective institutions. For this study, only patients with sequencing of at least one primary tumor sample were included, and only primary tumor sequencing data were used for all analyses. American Joint Committee on Cancer (AJCC) TNM tumor stage was determined in accordance with the guidelines current at the time of diagnosis (AJCC guidelines version 6 or 7).
Only patients with stage I-III disease at diagnosis were used for Cox proportional hazards modeling to study the association between primary tumor genomics, distant metastasis, and survival, while all patients (including patients with stage IV disease) were used to study the pattern of mutations that occur in the TP53 gene in NSCLC.
Clinical and genomic data collection
Targeted sequencing of primary tumor samples was performed using institution-specific clinical next-generation sequencing panels. The tumor sequencing panels and variant calling pipeline used for AACR Project GENIE are as previously described [6].
Imaging records and medical oncologists’ notes were curated according to the PRISSMM framework [17] to determine when and where metastases appeared in each patient. Each radiologist report was reviewed to determine whether cancer was present and in which anatomical sites the tumor was found. These records were used to determine the length of time from diagnosis of the primary tumor to the time at which disease was first observed at each distant site. The time to first distant metastasis was defined as the earliest time after diagnosis at which the patient had an extra-thoracic lymph node or organ metastasis, or a metastasis to the mediastinum, heart, or pleura. No patients in the analysis of association between primary tumor genomics and distant metastases had distant metastases at the time of diagnosis.
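The time-to-event definition above can be illustrated with a short Python sketch. It is not the curation pipeline used for the dataset: the site labels, the `Patient` record, and the function name are hypothetical, and the censoring rule (death or last contact) is taken from the statistical analysis described in the Methods.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical site labels; the field names in the curated dataset are not given in the text.
DISTANT_SITES = {
    "extrathoracic_lymph_node", "brain", "bone", "liver",
    "mediastinum", "heart", "pleura",
}


@dataclass
class Patient:
    # Site label -> days from diagnosis to the first report of disease at that site.
    first_seen_days: Dict[str, float]
    # Days from diagnosis to death or last contact, whichever came first.
    last_followup_days: float


def time_to_first_distant_met(patient: Patient) -> Tuple[float, bool]:
    """Return (time, event): event is True if a distant metastasis was observed."""
    distant_times = [
        days for site, days in patient.first_seen_days.items()
        if site in DISTANT_SITES
    ]
    if distant_times:
        # Event time is the earliest observation at any distant site.
        return min(distant_times), True
    # No distant metastasis observed: right-censor at death or last contact.
    return patient.last_followup_days, False


example = Patient({"lung": 30.0, "liver": 400.0, "brain": 250.0}, 900.0)
print(time_to_first_distant_met(example))
```

Site-specific analyses (bone, brain, liver) would follow the same pattern with `DISTANT_SITES` restricted to the single site of interest.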
Statistical analysis of time to new distant metastases
We used multivariable Cox proportional hazards models to test whether a priori defined static covariates were significantly associated with the development of new distant metastases after diagnosis in patients with stage I-III NSCLC. Six demographic and clinical covariates were included in each model: age at diagnosis, smoking history (current or former smoker vs. never smoker), sex, race, ethnicity, and stage (I, II, or III) at diagnosis. We also used primary tumor SNV/indel information for 5 genes (TP53, KRAS, EGFR, BRAF, PIK3CA) and copy number alteration data for 5 genes (EGFR, PIK3CA, MET, KRAS, FGFR1). Among mutations, only nonsynonymous point mutations, frameshift mutations, and splice site mutations were considered.
Multivariable Cox proportional hazards models for the time to first distant metastasis and for the time to bone, brain, and liver metastases were fit using the coxph function in the R survival package, version 3.2 [18], with right censoring at the date of death or last patient contact, such that the competing risk of death was addressed by analyzing the cause-specific hazard of distant metastasis. Wald test p-values for each covariate were pooled across all mutations/CNAs tested for each metastasis site and adjusted for multiple hypothesis testing [19,20] using the Benjamini–Hochberg method, and covariates with an adjusted p-value < 0.05 were considered significant. Confidence intervals for the hazard ratios were adjusted for multiple comparisons using the Bonferroni method.
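The two multiplicity adjustments described above can be sketched in plain Python. This is a minimal illustration, not the analysis code used in the study (which was run in R): the function names are hypothetical, the example p-values are made up, and the number of tests used for the Bonferroni interval is an assumption, since the text does not state it explicitly.

```python
import math
from statistics import NormalDist


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_hr_ci(beta, se, n_tests, alpha=0.05):
    """Bonferroni-adjusted confidence interval for a hazard ratio exp(beta).

    beta is the Cox log hazard ratio and se its standard error; the
    interval is built on the log scale with alpha split across n_tests.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))
    return math.exp(beta - z * se), math.exp(beta + z * se)


# Made-up p-values for illustration only (not results from the study).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
# Made-up coefficient and standard error; n_tests=10 is an assumption.
print(bonferroni_hr_ci(beta=0.5, se=0.2, n_tests=10))
```

Because the interval is symmetric on the log scale, the geometric mean of its two bounds equals the hazard ratio itself, which gives a quick consistency check on any reported HR and CI pair.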
Statistical analysis of the effect of TP53 mutations on patient survival after NSCLC diagnosis
After observing that mutations in TP53 were associated with increased risk of distant metastasis, we used multivariable Cox proportional hazards modeling to test whether primary tumor TP53 mutation status was associated with overall survival after diagnosis of stage I-III NSCLC. This model incorporated the demographic, clinical, and genomic covariates used in the time-to-metastasis models (age, sex, race, ethnicity, smoking history, stage at diagnosis, and 10 total mutation/copy number alteration variables). Risk set adjustment [21] was not performed, since informative cohort entry has previously been demonstrated in clinico-genomic datasets [12,13], and risk set adjustment could still yield biased results in the event of informative entry. Since this analysis was designed to specifically assess the effect of TP53 mutations on patient survival, no correction for multiple hypotheses was performed.
Acknowledgements
We thank members of the Michor Lab for discussion and comments. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry as well as members of the AACR Project GENIE consortium for their commitment to data sharing (full list of consortium members given in the Supplementary Information). Interpretations are the responsibility of study authors.
Author contributions
D.V.E., K.K., and F.M. designed the study. D.V.E. and K.K. performed all analyses and wrote the initial draft of the paper. Clinical data collection and curation were done by the AACR Project GENIE Consortium, led by D.S., E.L., J.L.W., G.R., M.L.-N., and P.L.B. F.M. supervised the study, with assistance from K.L.K. and biostatistics guidance from P.C. All authors reviewed and edited the final manuscript.
Funding
National Cancer Institute, USA
Data availability
Genomic and clinical data for the AACR Project GENIE BPC NSCLC cohort are publicly available at http://www.synapse.org/bpc.
Competing interests
K.L.K. reports serving as a consultant/advisor to Aetion, receiving funding from the American Association for Cancer Research related to this work, and receiving honoraria from Roche and IBM. J.L.W. reports receiving funding from the American Association for Cancer Research related to this work, and, outside the submitted work, receiving funding from the National Institutes of Health, consulting fees from Westat, Roche, Melax Tech, and Flatiron Health, and ownership of HemOnc.org LLC. D.V.E. is a shareholder of Fractal Therapeutics. F.M. is a co-founder of and has equity in Harbinger Health, has equity in Zephyr AI, and serves as a consultant for Harbinger Health, Zephyr AI, and Red Cell Partners. F.M. declares that none of these relationships are directly or indirectly related to the content of this manuscript. All other authors declare no conflicts.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Debra Van Egeren and Khushi Kohli.
A list of authors and their affiliations appears at the end of the paper.
Contributor Information
Franziska Michor, Email: michor@jimmy.harvard.edu.
for the AACR Project GENIE Consortium represented by Shawn Sweeney:
Michael Fiandalo, Margaret Foti, Yekaterina Khotskaya, Jocelyn Lee, Nicole Peters, Shawn Sweeney, Jean Abraham, James D. Brenton, Carlos Caldas, Gary Doherty, Birgit Nimmervoll, Karen Pinilla, Jose-Ezequiel Martin, Oscar M. Rueda, Stephen-John Sammut, Dilrini Silva, Kajia Cao, Allison P. Heath, Marilyn Li, Jena Lilly, Suzanne MacFarland, John M. Maris, Jennifer L. Mason, Allison M. Morgan, Adam Resnick, Mark Welsh, Yuankun Zhu, Bruce Johnson, Yvonne Li, Lynette Sholl, Ron Beaudoin, Roshni Biswas, Ethan Cerami, Oya Cushing, Deepa Dand, Matthew Ducar, Alexander Gusev, William C. Hahn, Kevin Haigis, Michael Hassett, Katherine A. Janeway, Pasi Jänne, Arundhati Jawale, Jason Johnson, Kenneth L. Kehl, Priti Kumari, Valerie Laucks, Eva Lepisto, Neal Lindeman, James Lindsay, Amanda Lueders, Laura Macconaill, Monica Manam, Tali Mazor, Diana Miller, Ashley Newcomb, John Orechia, Andrea Ovalle, Asha Postle, Daniel Quinn, Brendan Reardon, Barrett Rollins, Priyanka Shivdasani, Angela Tramontano, Eliezer Van Allen, Stephen C. Van Nostrand, Jonathan Bell, Michael B. Datto, Michelle Green, Chris Hubbard, Shannon J. McCall, Niharika B. Mettu, John H. Strickler, Fabrice Andre, Benjamin Besse, Marc Deloger, Semih Dogan, Antoine Italiano, Yohann Loriot, Lacroix Ludovic, Stefan Michels, Jean Scoazec, Alicia Tran-Dien, Gilles Vassal, Christopher E. Freeman, Susan J. Hsiao, Matthew Ingham, Jiuhong Pang, Raul Rabadan, Lira Camille Roman, Richard Carvajal, Raymond DuBois, Maria E. Arcila, Ryma Benayed, Michael F. Berger, Marufur Bhuiya, A. Rose Brannon, Samantha Brown, Debyani Chakravarty, Cynthia Chu, Ino de Bruijn, Jesse Galle, Jianjiong Gao, Stu Gardos, Benjamin Gross, Ritika Kundra, Andrew L. Kung, Marc Ladanyi, Jessica A. Lavery, Xiang Li, Aaron Lisman, Brooke Mastrogiacomo, Caroline McCarthy, Chelsea Nichols, Angelica Ochoa, Katherine S. Panageas, John Philip, Shirin Pillai, Gregory J. Riely, Hira Rizvi, Julia Rudolph, Charles L. Sawyers, Deborah Schrag, Nikolaus Schultz, Julian Schwartz, Robert Sheridan, David Solit, Avery Wang, Manda Wilson, Ahmet Zehir, Hongxin Zhang, Gaofei Zhao, Lailah Ahmed, Philippe L. Bedard, Jeffrey P. Bruce, Helen Chow, Sophie Cooke, Samantha Del Rossi, Sam Felicen, Sevan Hakgor, Prasanna Jagannathan, Suzanne Kamel-Reid, Geeta Krishna, Natasha Leighl, Zhibin Lu, Alisha Nguyen, Leslie Oldfield, Demi Plagianakos, Trevor J. Pugh, Alisha Rizvi, Peter Sabatini, Elizabeth Shah, Nitthusha Singaravelan, Lillian Siu, Gunjan Srivastava, Natalie Stickle, Tracy Stockley, Marian Tang, Carlos Virtaenen, Stuart Watt, Celeste Yu, Brady Bernard, Carlo Bifulco, Julie L. Cramer, Soohee Lee, Brian Piening, Sheila Reynolds, Joseph Slagel, Paul Tittel, Walter Urba, Jake VanCampen, Roshanthi Weerasinghe, Alyssa Acebedo, Justin Guinney, Xindi Guo, Haley Hunter-Zinck, Thomas Yu, Kristen Dang, Valsamo Anagnostou, Alexander Baras, Julie Brahmer, Christopher Gocke, Robert B. Scharpf, Jessica Tao, Victor E. Velculescu, Shlece Alexander, Neil Bailey, Philip Gold, Mariska Bierkens, Jan de Graaf, Jan Hudeček, Gerrit A. Meijer, Kim Monkhorst, Kris G. Samsom, Joyce Sanders, Gabe Sonke, Jelle ten Hoeve, Tony van de Velde, José van den Berg, Emile Voest, George Steinhardt, Sabah Kadri, Wanjari Pankhuri, Peng Wang, Jeremy Segal, Christine Moung, Carlos Espinosa-Mendez, Henry J. Martell, Courtney Onodera, Ana Quintanar Alfaro, E. Alejandro Sweet-Cordero, Eric Talevich, Michelle Turski, Laura Van’t Veer, Amanda Wren, Susana Aguilar, Rodrigo Dienstmann, Francesco Mancuso, Paolo Nuciforo, Josep Tabernero, Cristina Viaplana, Ana Vivancos, Ingrid Anderson, Sandip Chaugai, Joseph Coco, Daniel Fabbri, Doug Johnson, Leigh Jones, Xuanyi Li, Christine Lovly, Sanjay Mishra, Kathleen Mittendorf, Li Wen, Yuanchu James Yang, Chen Ye, Marilyn Holt, Michele L. LeNoue-Newton, Christine M. Micheel, Ben H. Park, Samuel M. Rubinstein, Thomas Stricker, Lucy Wang, Jeremy Warner, Meijian Guan, Guangxu Jin, Liang Liu, Umit Topaloglu, Cetin Urtis, Wei Zhang, Michael D’Eletto, Stephen Hutchison, Janina Longtine, and Zenta Walther
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-022-21448-1.
References
- 1. SEER Cancer statistics review, 1975–2016.
- 2. Chen VW, et al. Analysis of stage and clinical/prognostic factors for lung cancer from SEER registries: AJCC staging and collaborative stage data collection system. Cancer. 2014;120:3781–3792. doi: 10.1002/cncr.29045.
- 3. Kelsey CR, et al. Local recurrence after surgery for early stage lung cancer. Cancer. 2009;115:5218–5227. doi: 10.1002/cncr.24625.
- 4. Galvez C, et al. The role of EGFR mutations in predicting recurrence in early and locally advanced lung adenocarcinoma following definitive therapy. Oncotarget. 2020;11:1953–1960. doi: 10.18632/oncotarget.27602.
- 5. Mak RH, et al. Outcomes by EGFR, KRAS, and ALK genotype after combined modality therapy for locally advanced non–small-cell lung cancer. JCO Precis. Oncol. 2018;2:1–18. doi: 10.1200/PO.17.00219.
- 6. AACR Project GENIE Consortium. AACR Project GENIE: Powering precision medicine through an international consortium. Cancer Discov. 2017;7:818–831. doi: 10.1158/2159-8290.CD-17-0151.
- 7. Baugh EH, Ke H, Levine AJ, Bonneau RA, Chan CS. Why are there hotspot mutations in the TP53 gene in human cancers? Cell Death Differ. 2018;25:154–160. doi: 10.1038/cdd.2017.180.
- 8. Bouaoun L, et al. TP53 variations in human cancers: New lessons from the IARC TP53 database and genomics data. Hum. Mutat. 2016;37:865–876. doi: 10.1002/humu.23035.
- 9. Alexandrov LB, et al. Mutational signatures associated with tobacco smoking in human cancer. Science. 2016;354:618–622. doi: 10.1126/science.aag0299.
- 10. Giacomelli AO, et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 2018;50:1381–1387. doi: 10.1038/s41588-018-0204-y.
- 11. Kong LR, et al. Targeting codon 158 p53-mutant cancers via the induction of p53 acetylation. Nat. Commun. 2020;11:2086. doi: 10.1038/s41467-020-15608-y.
- 12. Kehl KL, Schrag D, Hassett MJ, Uno H. Assessment of temporal selection bias in genomic testing in a cohort of patients with cancer. JAMA Netw. Open. 2020;3:e206976. doi: 10.1001/jamanetworkopen.2020.6976.
- 13. Backenroth D, et al. Accounting for delayed entry in analyses of overall survival in clinico-genomic databases. Cancer Epidemiol. Biomark. Prev. 2022;31:1195–1201. doi: 10.1158/1055-9965.EPI-21-0876.
- 14. Ma X, et al. Significance of TP53 mutations as predictive markers of adjuvant cisplatin-based chemotherapy in completely resected non-small-cell lung cancer. Mol. Oncol. 2014;8:555–564. doi: 10.1016/j.molonc.2013.12.015.
- 15. Tsao M-S, et al. Prognostic and predictive importance of p53 and RAS for adjuvant chemotherapy in non–small-cell lung cancer. JCO. 2007;25:5240–5247. doi: 10.1200/JCO.2007.12.6953.
- 16. Wakelee HA, et al. IMpower010: Primary results of a phase III global study of atezolizumab versus best supportive care after adjuvant chemotherapy in resected stage IB-IIIA non-small cell lung cancer (NSCLC). JCO. 2021;39:8500–8500. doi: 10.1200/JCO.2021.39.15_suppl.8500.
- 17. Schrag, D. Real-World Applications of GENIE and a Taxonomy for Defining Cancer Outcomes. (2018).
- 18. Therneau, T. M. A Package for Survival Analysis in R. (2021).
- 19. Vickerstaff V, Omar RZ, Ambler G. Methods to adjust for multiple comparisons in the analysis and sample size calculation of randomised controlled trials with multiple primary outcomes. BMC Med. Res. Methodol. 2019;19:129. doi: 10.1186/s12874-019-0754-4.
- 20. European Medicines Agency. Guideline on multiplicity issues in clinical trials. (2016).
- 21. Brown S, et al. Implications of selection bias due to delayed study entry in clinical genomic studies. JAMA Oncol. 2022;8:287–291. doi: 10.1001/jamaoncol.2021.5153.