Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: J Thorac Cardiovasc Surg. 2022 Sep 24;165(4):1554–1564. doi: 10.1016/j.jtcvs.2022.09.028

A Unique Gene Signature Predicting Recurrence Free Survival in Stage IA Lung Adenocarcinoma

Shamus R Carr 1,*, Haitao Wang 1,*, Rasika Hudlikar 1, Xiaofan Lu 2, Mary R Zhang 1, Chuong D Hoang 1, Fangrong Yan 2, David S Schrump 1
PMCID: PMC10442056  NIHMSID: NIHMS1838672  PMID: 37608989

Abstract

Objective:

Resected stage IA lung adenocarcinoma (LUAD) has a reported 5-year recurrence free survival (RFS) of 63-81%. A unique gene signature stratifying patients with early stage LUAD as high or low-risk of recurrence would be valuable.

Methods:

GEO datasets combining European and North American LUAD patients (n=684) were filtered for stage IA (n=105) to develop a robust signature for recurrence (RFSscore). A univariate Cox proportional hazards regression model was used to assess associations of gene expression with RFS and OS. A bootstrap approach applied to the identified upregulated genes allowed construction of a model, which was evaluated by the area under the receiver operating characteristic curve (AUC). For the optimal signature, the RFSscore was calculated as a linear combination of the expression of the selected genes weighted by the corresponding Cox regression derived coefficients. Log-rank analysis calculated RFS and OS. Results were validated using the LUAD TCGA transcriptomic NGS based dataset.

Results:

Rigorous bioinformatic analysis identified a signature of 4 genes: KNSTRN, PAFAH1B3, MIF, CHEK1. Kaplan-Meier analysis of stage IA LUAD with this signature resulted in 5-year RFS for low-risk of 90% compared to 53% for high-risk (HR 6.55, 95%CI 2.65-16.18, p-value <0.001), confirming the robustness of the gene signature with its clinical significance. Validation of the signature using TCGA dataset resulted in an AUC of 0.797 and 5-year RFS for low and high-risk stage IA patients being 91% and 67%, respectively (HR 3.44, 95%CI 1.16-10.23, p-value=0.044).

Conclusions:

This 4 gene signature stratifies European and North American patients with pathologically confirmed stage IA LUAD into low and high-risk groups for OS and more importantly RFS.

Keywords: non-small cell lung cancer, adenocarcinoma, prognostic signature, recurrence free survival

INTRODUCTION

Despite surgical resection being the standard of care for stage IA non-small cell lung cancer (NSCLC) patients, 5-year survival ranges from 77-92% with approximately 15-30% developing recurrence.1,2 Additionally, most recurrences are distant and diagnosed within 2 years of resection.

Recent advances in genome-wide sequencing have enabled identification of candidate biomarkers that are associated with both histopathological findings and clinical outcomes in NSCLC.3–8 However, these signatures are inconsistent even when utilizing the same databases. Focusing on a single micro-RNA (miRNA), long non-coding RNA (lncRNA), or gene may also have limited applicability9–11, and many studies continue to group adenocarcinoma and squamous cell carcinoma together, despite different biological signatures.12 Other studies combined stage I and II NSCLC and defined these as “early stage”. While the first issue has been addressed by looking at multiple genes to provide a prognostic signature for early-stage NSCLC13–16, the relevance of analyses restricted to adenocarcinomas and outcomes of only earliest stage tumors has not been fully examined.

As lung adenocarcinoma (LUAD) is a heterogeneous disease, with multiple subtypes and various frequencies of driver mutations in different ethnic groups, analyzing homogeneous patient cohorts may provide more accurate gene signatures corresponding with survival and disease recurrence.

We hypothesized that using such an approach with large, publicly available datasets we could identify and validate a multi-gene prognostic signature that reliably correlates with recurrence free survival (RFS) and overall survival (OS) in patients with stage IA LUAD. A reliable gene signature could prove useful for identifying stage IA patients who might benefit from post-operative adjuvant therapy rather than standard of care surveillance.

METHODS

Cohort datasets of lung adenocarcinoma

The following search terms in GEO datasets were utilized to identify possible datasets: ((“Adenocarcinoma of Lung”[MeSH Terms] OR lung adenocarcinoma[All Fields] OR LUAD[All Fields]) AND “Homo sapiens”[porgn]) AND “expression profiling by array”[DataSet Type]. We manually screened all the results and collected a total of 11 datasets with microarray data for LUAD (Supplemental Table 1). Gene expression data for the TCGA-LUAD cohort were downloaded from UCSC Xena (https://xena.ucsc.edu/), and Affymetrix or Illumina microarray data were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/).

The GEO datasets17 in this study included: GSE30219, GSE37745, GSE50081, GSE43580, GSE19188, GSE68465, GSE14814, and GSE42127 (Supplemental Table 1). Only European and North American datasets were selected to minimize any confounding results due to high EGFR mutation rates in Asian LUAD and more closely resemble ethnic proportions seen in the validation TCGA dataset. The Institutional Review Board of the NIH did not require approval of this study as all datasets are publicly available for download and analysis (45 CFR 46.101(b)).

Clinical information

All clinical information was extracted from either the publications with the clinical datasets, or from the GEO dataset using the R package GEOquery (version 2.6.2) or from UCSC Xena for TCGA-LUAD cohort. For most datasets, the clinical characteristics, including histological type, grade, stage, age, and survival data were available; otherwise, the missing information was referred to as not available (NA).

Raw data processing and normalization

Raw CEL files of Affymetrix U133A and U133 plus 2.0 (Affymetrix, Santa Clara, CA, USA) were downloaded and processed using the Robust Multichip Average (RMA) algorithm as implemented in the R package affy (version 1.72.0) and our in-house R pipeline. For the HumanWG-6 v3.0 and RNAseq platforms, raw files were downloaded from their corresponding databases and similarly processed by in-house R programs. When multiple probes corresponded to the same gene, the most sensitive probe was selected for the expression value. The ComBat method was used to correct for batch effects before downstream analysis. After this filtering, a total of 1193 LUAD patient samples (TCGA n=479; GPL570 n=583; Illumina n=131) from 9 datasets were chosen for further analysis. Differentially expressed genes (DEGs) were identified by comparing microarray / RNAseq data of tumor versus adjacent normal lung tissue, using the limma / DESeq2 packages (fold change > 1 and FDR < 0.05).
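The DEG threshold described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the gene names, fold changes, and raw p-values are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR method (it is the default adjustment in limma and DESeq2), and the fold-change cutoff is assumed to be on the log2 scale.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def upregulated_degs(genes, log2_fc, pvalues, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes above the fold-change cutoff and below the FDR cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, fc, q in zip(genes, log2_fc, fdr)
            if fc > fc_cutoff and q < fdr_cutoff]


# Hypothetical tumor-versus-normal results (not data from this study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2_fc = [2.1, 0.4, 1.6, 1.2]
pvalues = [0.001, 0.002, 0.03, 0.20]

degs = upregulated_degs(genes, log2_fc, pvalues)
print(degs)
```

In this toy input, GENE_B fails the fold-change cutoff and GENE_D fails the FDR cutoff, so only the remaining two genes are retained.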

Prognostic Signature Determination

Figure 1 illustrates the stepwise flow diagram of the analysis in this study. Briefly, candidate genes associated with OS of all LUAD patients from each GEO dataset were identified (step 1). Then, prognostic genes of all overlapping datasets were selected (step 2). These prognostic genes were placed into a risk score model to develop prognostic signatures (step 3). These signatures from combined GEO dataset cohorts were then evaluated for their performance (step 4). Log-rank analysis was used to estimate RFS and OS first using all GEO dataset patients which included stages I through IV (n=583) and then re-analyzed with only stage IA patients (n=105) (step 5). These results were validated in the exact same manner using the TCGA LUAD dataset containing transcriptomic sequences for 479 LUAD patients including 79 with stage IA tumors.

Figure 1:

Flow diagram of in-silico analysis to 1) identify DEGs; 2) identify genes associated with OS and RFS; 3) perform bootstrapping and pairwise AUC; 4) generate a risk model that allows identification of a unique multi-gene signature; 5) log-rank analysis for OS and RFS

Step 1: Identification of Differentially Expressed Genes (DEGs). Each GEO dataset was initially screened for upregulated DEGs. Identification of candidate genes began by separating GEO and TCGA patients, filtering for only patients that were T1, N0, N1, M0, or M1, and grouping them as T1N0M0 and non-T1N0M0. Initial exclusion of pathological stage IA patients permitted selection of the union of the TCGA and GPL570 for common DEGs related to more advanced stage T1 LUAD. These T1 tumors were clinically associated with lymphatic (i.e. N1) and/or vascular invasion (i.e. M1). Different combinations of T1N1/M1, T1N0/M0, and adjacent tissues were then used to identify genes associated with lymphatic involvement or progression of T1 stage IA LUAD.

Step 2: Cox-regression. A univariate Cox proportional hazards regression model was fitted to evaluate the association of gene expression with both overall survival (OS) and progression free survival (PFS) in all datasets from Europe and North America. The union of only those datasets with this clinical outcome data available allowed identification of DEGs. This allowed the widest capture of genes for further analysis. As progression in fully resected stage IA LUAD is a recurrence, PFS was used interchangeably with RFS. Hazard ratios for genes were assessed to identify upregulated/downregulated genes. A preliminary screen of each training cohort was then performed via univariate Cox regression with significance set at a p-value of less than 0.05.
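To make the univariate screen concrete, the following sketch fits a single-covariate Cox model by Newton-Raphson on the partial likelihood and reports a hazard ratio with a Wald 95% CI. It is an illustrative stand-in for the R survival package used in this study; the function name and the ten-patient dataset are hypothetical, and tied event times are handled in the simplest (Breslow) way.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(len(times)):
            if not events[i]:
                continue  # censored subjects enter only through risk sets
            risk = [j for j in range(len(times)) if times[j] >= times[i]]
            weights = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(weights)
            s1 = sum(w * x[j] for w, j in zip(weights, risk))
            s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk))
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    z = beta / se
    return {
        "hr": math.exp(beta),
        "ci_low": math.exp(beta - 1.96 * se),
        "ci_high": math.exp(beta + 1.96 * se),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided Wald p-value
    }


# Hypothetical cohort: months of follow-up, event flag (1 = recurrence),
# and a binary high (1) versus low (0) expression indicator for one gene.
times = [2, 3, 5, 6, 8, 9, 12, 14, 15, 20]
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
high_expr = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

fit = cox_univariate(times, events, high_expr)
print(fit)
```

In a screen of this kind, the same fit would be repeated gene by gene and genes with p < 0.05 carried forward; with only ten hypothetical patients the interval here is wide and crosses 1.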

Finally, we took the union of the common DEGs (step 1) from the GEO and TCGA datasets (n=566), which comprise all non-stage IA patients, and combined it with all genes from the Cox regression (step 2; n=1335) to identify what we termed “Common Genes”. These genes were present in all GEO datasets and the TCGA dataset, and were associated with either OS, PFS, or both. These 272 “Common Genes” were used to test the robustness of the model.
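One way to read the gene-list combination above is as a pair of set operations. The sketch below assumes that the “Common Genes” are those present both in the shared DEG list and in the Cox-associated list (an intersection); this reading is an assumption, and the gene identifiers are placeholders rather than genes from this study.

```python
# Hypothetical upregulated DEG lists per platform (placeholder names).
deg_geo = {"GENE_1", "GENE_2", "GENE_3", "GENE_4"}
deg_tcga = {"GENE_2", "GENE_3", "GENE_4", "GENE_5"}

# Hypothetical genes associated with OS and/or PFS in univariate Cox regression.
cox_os = {"GENE_3", "GENE_6"}
cox_pfs = {"GENE_4", "GENE_7"}

# DEGs shared by both platforms.
common_degs = deg_geo & deg_tcga

# Genes associated with either outcome, or both.
cox_genes = cox_os | cox_pfs

# Assumed definition of "Common Genes": shared DEGs that are also prognostic.
common_genes = common_degs & cox_genes
print(sorted(common_genes))
```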

Step 3: Bootstrapping. Focusing on upregulated genes, pairwise AUC and bootstrapping were applied to the top 3000 genes selected from each platform. Random survival forest was applied using the R package randomForestSRC (version 3.0.2) and repeated 1,000 times. This resulted in 182 genes, which were analyzed to obtain the largest concordance index (C-index) for only stage IA patients (Figure 3). The combination with the largest C-index was considered the optimal signature. The pairwise AUC for only stage IA patients resulted in 138 genes, of which 89 were associated with OS and 49 were associated with RFS. Analysis of this list revealed some overlap. After removal of duplicates, 109 unique genes were utilized for modeling. The model analyzed different combinations from our training cohort, and all combinations with significance were kept and then validated using the TCGA cohort.
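The concordance index used to rank candidate combinations can be sketched as follows, together with a simple percentile bootstrap. This is not the random survival forest procedure applied in this study; it only illustrates Harrell's C-index and resampling with replacement, and the five-patient dataset, function names, and seed are hypothetical.

```python
import random


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable pairs in which the higher score fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else None


def bootstrap_c_index(times, events, scores, n_boot=1000, seed=0):
    """Percentile 95% interval for the C-index from resampled cohorts."""
    rng = random.Random(seed)
    n = len(times)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = concordance_index([times[k] for k in idx],
                              [events[k] for k in idx],
                              [scores[k] for k in idx])
        if c is not None:  # skip resamples without any comparable pair
            values.append(c)
    values.sort()
    lower = values[int(0.025 * len(values))]
    upper = values[min(len(values) - 1, int(0.975 * len(values)))]
    return lower, upper


# Hypothetical follow-up (months), event flags, and risk scores.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.5, 0.1]

c_index = concordance_index(times, events, scores)
lower, upper = bootstrap_c_index(times, events, scores)
print(c_index, lower, upper)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of recurrence times by score.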

Figure 3:

Pairwise AUC for OS and RFS for only stage IA patients (red dots are the 4 genes that were eventually identified in the risk modeling)

Step 4: Model construction. The identified upregulated genes (i.e. the final gene pool) were then used to construct models, and their performance was evaluated by the area under the receiver operating characteristic curve (AUC). Utilizing the gene signature that provided an AUC larger than 0.57, a recurrence free survival score (RFSscore) was calculated for each patient as a linear combination of the expression of the selected upregulated genes, weighted by the corresponding Cox regression derived coefficients.
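A minimal sketch of the score construction is given below. The four gene names come from this study, but the coefficients, expression values, outcome labels, and the median cutoff used to define high and low groups are all hypothetical assumptions. The AUC is computed on a binary recurrence label with the rank (Mann-Whitney) formulation, which simplifies the time-dependent setting of survival data.

```python
from statistics import median

SIGNATURE = ("KNSTRN", "PAFAH1B3", "MIF", "CHEK1")

# Hypothetical Cox coefficients for illustration only.
COEFS = {"KNSTRN": 0.30, "PAFAH1B3": 0.25, "MIF": 0.20, "CHEK1": 0.35}


def rfs_score(expression):
    """Linear combination of gene expression weighted by Cox coefficients."""
    return sum(COEFS[gene] * expression[gene] for gene in SIGNATURE)


def auc(scores, labels):
    """Probability that a recurring patient scores higher than a non-recurring one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical log2 expression values for six patients.
patients = [
    {"KNSTRN": 8.0, "PAFAH1B3": 7.0, "MIF": 10.0, "CHEK1": 6.0},
    {"KNSTRN": 6.0, "PAFAH1B3": 6.0, "MIF": 9.0, "CHEK1": 5.0},
    {"KNSTRN": 7.0, "PAFAH1B3": 8.0, "MIF": 11.0, "CHEK1": 6.0},
    {"KNSTRN": 5.0, "PAFAH1B3": 5.0, "MIF": 8.0, "CHEK1": 4.0},
    {"KNSTRN": 6.0, "PAFAH1B3": 7.0, "MIF": 9.0, "CHEK1": 6.0},
    {"KNSTRN": 5.0, "PAFAH1B3": 6.0, "MIF": 9.0, "CHEK1": 4.0},
]
recurred = [1, 0, 1, 0, 0, 1]  # hypothetical outcome labels

scores = [rfs_score(p) for p in patients]
cutoff = median(scores)  # assumed cutoff; the text does not state one
groups = ["high" if s > cutoff else "low" for s in scores]
model_auc = auc(scores, recurred)
print(groups, round(model_auc, 3))
```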

Step 5: Final analysis. A log-rank analysis was then utilized to calculate RFS and OS.

Statistical analysis

Survival curves were estimated by Kaplan–Meier methods, and differences between survival distributions were assessed with the two-sided log-rank test as implemented in the R package survival (version 3.2-13). The univariate model was computed by using Cox proportional-hazards regression as implemented in the R package survival, and the hazard ratio (HR) with 95% confidence interval (CI) was determined. Because only individual contributions to OS were investigated, only the univariate Cox proportional-hazards model was implemented in this study. Forest plots (R package forestplot, version 1.7) were used to illustrate the analysis results of Cox proportional-hazards regression by showing HR and 95% CI (Table 1). All statistical analyses were performed using the R program (version 4.0.0). All statistical tests were two-sided and considered statistically significant if the p-value was less than 0.05.
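For readers less familiar with these estimators, the sketch below shows the Kaplan-Meier product-limit estimate and a two-group log-rank test in plain Python. It is an illustration under stated assumptions and not the R survival implementation used in this study; the follow-up times and event flags are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    for t in sorted({u for u, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, d in zip(times, events) if u == t and d)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sided log-rank test with 1 degree of freedom: (chi-square, p-value)."""
    event_times = sorted({u for u, d in zip(times_a + times_b,
                                            events_a + events_b) if d})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, d in zip(times_a, events_a) if u == t and d)
        d_b = sum(1 for u, d in zip(times_b, events_b) if u == t and d)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical high-risk and low-risk groups (months, 1 = recurrence).
high_t, high_e = [2, 3, 6, 8, 12], [1, 1, 0, 1, 0]
low_t, low_e = [5, 9, 14, 15, 20], [1, 1, 1, 0, 0]

km_high = kaplan_meier(high_t, high_e)
chi2, p_value = logrank_test(high_t, high_e, low_t, low_e)
print(km_high, round(chi2, 3), round(p_value, 3))
```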

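Table 1:

Hazard Ratio for 4 Gene Signature for Both Training and Validation Sets

| Set | Outcome | Hazard Ratio | 95% CI | p-value |
|---|---|---|---|---|
| Training Set (GEO) | OS – All | 2.70 | 1.93 – 3.76 | < 0.001 |
| Training Set (GEO) | RFS – All | 2.72 | 1.96 – 3.76 | < 0.001 |
| Training Set (GEO) | OS – Stage IA | 4.33 | 2.07 – 9.05 | < 0.001 |
| Training Set (GEO) | RFS – Stage IA | 6.55 | 2.65 – 16.18 | < 0.001 |
| Validation Set (TCGA) | OS – All | 1.98 | 1.45 – 2.68 | < 0.001 |
| Validation Set (TCGA) | RFS – All | 1.75 | 1.23 – 2.50 | 0.002 |
| Validation Set (TCGA) | OS – Stage IA | 3.22 | 1.09 – 9.56 | 0.058 |
| Validation Set (TCGA) | RFS – Stage IA | 3.44 | 1.16 – 10.23 | 0.044 |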

RESULTS

Demographics and Data Set Characteristics

The training dataset consisted of three unique GEO datasets (GSE30219, GSE37745, and GSE50081) as they were the only ones that included stage information, RFS, and OS on a common platform (Supplemental Table 1); collectively these datasets included 583 patients. The TCGA with 479 patients was selected as the validation set. From these datasets we utilized strict criteria (Figure 2) to select only stage IA patients. A total of 105 stage IA patients were in the training set and 79 in the TCGA validation set. The average age of all the stage IA patients in the training set was 63.3 ± 10.0 years; 68% of these patients were male. The median follow up was 64 months (6-221 months) in the training set. In the validation set, the average age was 66.7 ± 9.6 years. Notably in the validation set, only 39% of the patients were male and the median follow up was only 23 months (4-105 months).

Figure 2:

Flowchart showing selection criteria from analyzed individual datasets to arrive at final cohort numbers of stage 1A patients.

Identification of Gene Signature

A single four gene combination resulted in the best AUC in the entire training set with values of 0.708 and 0.732 for RFS and OS, respectively (blue line, Figure 4B and 4A). AUC values increased to 0.790 for RFS and 0.769 for OS when only patients with stage IA LUAD were evaluated (Figure 4D and 4C). The AUC for RFS and OS for all patients in the TCGA validation set were 0.633 and 0.611, respectively (red line, Figure 4B and 4A). Once again, when restricting the evaluation to stage IA patients, the AUC improved to 0.797 and 0.824 for RFS and OS, respectively (red line, Figure 4D and 4C). This four gene signature consists of kinetochore-localized astrin binding protein (KNSTRN), checkpoint kinase 1 (CHEK1), platelet-activating factor acetylhydrolase isoform 1B3 (PAFAH1B3), and macrophage migration inhibitory factor (MIF). Using this signature, patients could be categorized as either high or low expression, and those with high expression were found to have the highest risk of recurrence.

Figure 4.

(A-D): Best calculated AUC for the Risk Model – blue line is training set and red line is validation set. (A) OS of all patients; (B) OS for only stage 1A; (C) RFS for all patients; (D) RFS for only stage 1A

Outcomes

The GEO datasets that contained OS data were then analyzed by log-rank and Kaplan-Meier techniques. All patients in the training cohort were grouped as either high or low-risk based on the identified 4 gene signature. When considering all 583 patients in the GEO datasets, 5-year OS was 78% and 49% for low-expression and high-expression patients, respectively (Figure 5A). Univariate analysis from the Cox proportional hazards regressions for the training set remained significant throughout the analysis (Table 1). The unadjusted estimated HR was 2.70 (95% CI: 1.93-3.76, p < 0.001) for OS for the entire training set and increased to 4.33 (95% CI: 2.07-9.05, p < 0.001) when only stage IA patients were examined. The Kaplan-Meier OS graphs for GEO patients demonstrated better OS for patients with low-expression compared to high-expression signatures (p < 0.001), both for all patients (Figure 5A) and for those with stage IA tumors (Figure 5B). Evaluation for RFS in all patients (Figure 5C) was similar with HR 2.72 (95% CI: 1.96-3.76, p < 0.001), and the HR again increased to 6.55 (95% CI: 2.65-16.18, p < 0.001) when only stage IA patients were evaluated (Figure 5D). The estimated 5-year RFS for only the stage IA low-expression patients was 90%, compared to 5-year RFS for high-expression patients of 53%.
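As a worked check on how a hazard ratio and its interval relate, the sketch below back-calculates the standard error of the log hazard ratio from the reported stage IA RFS estimate (HR 6.55, 95% CI 2.65-16.18). It assumes a Wald-type interval, exp(beta ± 1.96 × SE), which is an assumption about how the interval was derived; the p-values reported in this study may instead come from the log-rank test, so the Wald p-value is shown only for orientation.

```python
import math


def wald_summary(hr, ci_low, ci_high, z_crit=1.96):
    """Recover SE(log HR), the Wald z statistic, and a two-sided p-value."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    # A Wald interval is symmetric on the log scale, so the geometric
    # mean of its limits should reproduce the point estimate.
    midpoint_hr = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return {"se": se, "z": z, "p": p, "midpoint_hr": midpoint_hr}


# Reported training-set estimate for RFS in stage IA patients (Table 1).
summary = wald_summary(6.55, 2.65, 16.18)
print({key: round(value, 5) for key, value in summary.items()})
```

The geometric mean of the limits falls within rounding of the reported hazard ratio, which is consistent with a Wald-type interval, and the implied Wald p-value lies below 0.001.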

Figure 5.

(A-D): Kaplan-Meier Curves for Training Set – blue line is low-expression and red line is high-expression of 4 gene signature: (A) OS of all patients; (B) OS for only stage 1A; (C) RFS for all patients; (D) RFS for only stage 1A (95% CI)

The validation cohort was then stratified as high versus low-expression based on the 4 gene signature detected in the combined GEO sets. The 5-year OS of all TCGA patients (Figure 6A) was 35% in low-expression compared to 26% with high-expression with HR 1.98 (95%CI: 1.45-2.68, p < 0.001), while 5-year RFS (Figure 6C) was 68% and 44% in low and high-expression patients, respectively with HR 1.75 (95%CI: 1.23-2.50, p = 0.002). When evaluating only stage IA patients, RFS (Figure 6D) remained significant with a HR 3.44 (95%CI: 1.16-10.23, p=0.044). The low-expression patients had a 5-year RFS of 91%, compared to 67% for those with a high-expression signature (p=0.044).

Figure 6.

(A-D): Kaplan-Meier Curves for Validation Set – blue line is low-expression and red line is high-expression of 4 gene signature: (A) OS of all patients; (B) OS for only stage 1A; (C) RFS for all patients; (D) RFS for only stage 1A (95% CI)

DISCUSSION

NSCLC continues to be a challenging disease and understanding why some patients recur and others with comparably staged tumors do not continues to be a focus of intense research. Current staging systems provide snapshots of overall populations yet fail to prospectively identify individuals most likely to recur. While histopathological findings such as lymphatic and vascular invasion have been shown to adversely impact outcomes18,19, these assessments are subject to pathological interpretation, incomplete analysis of tumor samples, and imprecise reporting. Furthermore, whereas molecular and gene-expression profiles have been shown to correlate with patient prognosis3,16, these signatures have not been readily implemented for treatment planning except in instances where actionable mutations have been identified. About 36% of NSCLC have no detectable drivers, and KRAS, which until recently was undruggable, is the driver in 25% of these cancers.20 Furthermore, the Lung Cancer Mutation Consortium has demonstrated that the incidence of any identifiable mutation varies widely with ethnicity. Compared to Asian lung cancer patients in whom up to 80% may have actionable mutations, less than 40% of Caucasian patients with these malignancies have such mutations.21

In the current study, we identified a 4-gene signature corresponding with prognosis in LUAD arising in patients from Europe and North America. This multigene-signature stratified stage IA patients as high or low-expression and was associated with recurrence. This is especially important since patients with node-negative LUAD under 4 cm in size generally do not receive adjuvant chemotherapy or immunotherapy outside of a clinical trial yet have recurrence rates that may be as high as 30%.2 Implementation of a molecular assessment based on a validated gene signature could prove useful for identifying patients with high-risk tumors, particularly those which lack actionable mutations, thus impacting post-resection treatment recommendations. However, due to the heterogeneity of NSCLC, it is unlikely that a single multigene signature correlates with RFS or OS in all NSCLC subtypes; each histological type may be associated with a unique prognosis signature, which may also vary with ethnicity. Unfortunately, an analysis by subtype was not possible as most cases were listed as “Adenocarcinoma, NOS”.

Our work extends that of other investigators who have sought to identify gene signatures associated with prognosis in lung cancer patients. Chen et al identified a gene signature that was closely associated with both OS and RFS in NSCLC.22 Their signature differs from the one identified in this analysis, possibly because their analysis included adenocarcinomas as well as squamous cell carcinomas, which are known to have different gene signatures and mutation burdens.12,23 A separate study by He et al identified an 8-gene prognostic signature in early-stage LUAD.13 This study combined datasets from Japan, Sweden, and Canada and included both stage I and II patients, which were defined as “early-stage”. Thus, patients with and without lymph node metastasis were combined. Due to a large percentage of the patients being from Asia, there was also likely a high percentage of patients with EGFR mutations. This may have confounded the results as the validation set used was the TCGA, which has very few Asian patients. Thus, unlike He et al, our study only considered pathologically proven stage IA patients from Europe and North America validated by a dataset from North America. This subtle point, in our opinion, improves the ability to identify a gene signature associated with RFS and OS in early-stage LUAD. Because we could not identify and exclude patients with EGFR mutations and the type and extent of the resections performed were not uniformly captured, the signature we have defined should be further evaluated using databases containing DNA sequences and transcriptome signatures for each patient with full demographic and treatment information.

Our analysis demonstrated that combining the upregulated genes KNSTRN, CHEK1, PAFAH1B3, and MIF allowed stratification of stage IA patients into high or low-risk cohorts. Each of these individual genes has previously been linked to poor outcomes of NSCLC patients.4–8 Interestingly, two of these genes (KNSTRN and CHEK1) drive intracellular processes in cancer cells, while the other two (PAFAH1B3 and MIF) impact interactions of these cells with other stromal elements within the tumor microenvironment.

KNSTRN encodes a kinetochore-localized astrin/SPAG5 binding protein that promotes metaphase-to-anaphase transition and chromosome segregation during mitosis.24 In a previous TCGA analysis, KNSTRN was found to be highly expressed in stage I and node negative LUAD and to correlate inversely with RFS and OS in patients with these tumors.4

CHEK1 encodes a protein that plays an essential role in the maintenance of DNA integrity by affecting S-phase and G2/M phase arrest in response to DNA damage. Over expression of CHEK1 has been associated with chemotherapy resistance25 and poor outcomes in NSCLC in several studies including one using completely different GEO mRNA microarray datasets.7,8,26 Early phase trials targeting CHEK1 with a pharmacologic inhibitor have been associated with significant adverse events.27

PAFAH1B3 has been identified as a key driver in lung and breast carcinomas and to correlate with poor prognosis in these malignancies.5,28 Previous studies demonstrated a possible association of PAFAH1B3 with survival in early-stage NSCLC suggesting that overexpression of PAFAH1B3 might be a potential biomarker in these malignancies.16,29

Lastly, MIF encodes a lymphokine involved in multiple cell-mediated pathways regulating antitumor immunity, immunoregulation, inflammation, and angiogenesis. It also contributes to an immunosuppressive microenvironment favoring tumor growth. Recent studies have demonstrated associations between MIF and lung cancer risk and antitumor effects of MIF inhibition.6,30

A potential limitation of our study pertains to our decision to restrict the analysis to LUAD patients from Europe and North America, thus minimizing the confounding factor of EGFR mutation, the most common oncogenic driver in Asians with nearly 50% incidence, while the reported incidence in Caucasian LUAD is only 10-15%.31,32 While ethnicity was not uniformly captured in all datasets, the populations of Canada, Sweden, and France, which are represented in the GEO datasets used in our analysis, are predominantly Caucasian. On the other hand, restricting our analysis in this manner may strengthen our findings as it provides results that may be directly applicable to certain LUAD patients. Another factor that needs to be considered moving forward is how microarray data are obtained. Since microarray platforms may have different numbers and sequences of probes for the same genes, results may be more a function of which microarray chip is used and not the actual expression levels of individual biomarkers. This is one reason why we chose the GEO datasets used in this study. Only datasets that utilized the same platform and had information pertaining to tumor type, stage, recurrence free survival, and overall survival were analyzed in our study. Analysis for associations between various genes and clinical information such as lymphovascular invasion, spread through airways, extent of resection, histological subtype, and mutation or PD-L1 status is precluded since these parameters are not uniformly captured in the databases. While the percentage of males and females captured in the GEO and TCGA databases differs, this disparity is inherent in the datasets as the entire TCGA for adenocarcinoma is 54% female. While a sex-related survival disparity in NSCLC is known, it is mainly accounted for by treatment-related factors.33 Finally, there is a marked difference in terms of follow up when comparing training and validation datasets. The median follow up was much shorter in the validation set. However, as the vast majority of LUAD recur within two years of resection, the datasets likely captured nearly all patients who recurred. Nevertheless, beyond the differences in gender composition and follow-up, when comparing against a validation set as we did with TCGA, both the number and sequence of probes again differed, which could explain some discrepancies between the results that we have reported and other published literature.

This is the first study to examine only stage IA LUAD and utilize microarray data to identify and validate a unique gene signature associated with recurrence and overall survival in patients with these cancers (Figure 7). Being able to accurately predict likelihood of recurrence in resected stage IA LUAD may alter the care of the patient from the current standard of serial imaging. Patients deemed to be high-risk based upon this gene signature, if validated in both laboratory experiments and larger prospective trials, may be considered for adjuvant therapy due to an elevated risk of recurrence. This treatment paradigm is already used for breast34 and prostate cancer35 patients for optimal personalized treatments. The results of this study warrant further evaluation of this 4-gene signature and strengthen the rationale for individualized follow up and therapy for stage IA patients with LUAD.

Central Picture.

Recurrence Free Survival in Stage IA Lung Adenocarcinoma (95% CI)

Central Picture and Message:

European and North American patients with stage IA lung adenocarcinoma can be stratified into low and high-risk for recurrence based on a 4 gene signature.

Perspective Statement:

Stage IA non-small cell lung adenocarcinoma has a recurrence rate of approximately 15%. A genetic analysis of individual tumors has identified a unique gene signature that provides an individualized method to improve prediction of recurrence in European and North American patients.

Acknowledgment:

This manuscript was presented at the 102nd Annual Meeting of the AATS in Boston, MA (May 14-17, 2022). This research was supported in part by the Intramural Research Program of the NIH (SRC), NIH Intramural Grant (ZIA BC 011115; DSS), and the Stephen J. Solarz Memorial Fund, Foundation for the National Institutes of Health.

Glossary of Abbreviations:

AUC: Area under the receiver operating characteristic curve

CI: Confidence interval

C-index: Concordance index

DEGs: Differentially expressed genes

EGFR: Epidermal growth factor receptor

GEO: Gene Expression Omnibus

HR: Hazard ratio

lncRNA: long non-coding RNA

LUAD: lung adenocarcinoma

miRNA: micro RNA

MIF: Macrophage migration inhibitory factor

NA: not available

NGS: next generation sequencing

NSCLC: non-small cell lung cancer

OS: overall survival

PAFAH1B3: Platelet-activating factor acetylhydrolase 1B3

PFS: progression free survival

RFS: recurrence free survival

RFSscore: robust signature score for recurrence free survival

TCGA: The Cancer Genome Atlas

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

All authors have no conflicts of interest to report.

The Institutional Review Board of the NIH did not require approval of this study as all datasets are publicly available for download and analysis (45 CFR 46.101(b)).

References:

1. Altorki NK, Yip R, Hanaoka T, et al. Sublobar resection is equivalent to lobectomy for clinical stage 1A lung cancer in solid nodules. J Thorac Cardiovasc Surg. 2014;147(2):754–764. doi: 10.1016/j.jtcvs.2013.09.065
2. Wu CF, Fu JY, Yeh CJ, et al. Recurrence Risk Factors Analysis for Stage I Non-small Cell Lung Cancer. Medicine. 2015;94(32):e1337. doi: 10.1097/md.0000000000001337
3. Beer DG, Kardia SLR, Huang CC, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002;8(8):816–824. doi: 10.1038/nm733
4. Deng P, Zhou R, Zhang J, Cao L. Increased Expression of KNSTRN in Lung Adenocarcinoma Predicts Poor Prognosis: A Bioinformatics Analysis Based on TCGA Data. J Cancer. 2021;12(11):3239–3248. doi: 10.7150/jca.51591
5. Yu DH, Huang JY, Liu XP, et al. Effects of hub genes on the clinicopathological and prognostic features of lung adenocarcinoma. Oncol Lett. 2020;19(2):1203–1214. doi: 10.3892/ol.2019.11193
6. Jäger B, Klatt D, Plappert L, et al. CXCR4/MIF axis amplifies tumor growth and epithelial-mesenchymal interaction in non-small cell lung cancer. Cell Signal. 2020;73:109672. doi: 10.1016/j.cellsig.2020.109672
7. Wang L, Qu J, Liang Y, et al. Identification and validation of key genes with prognostic value in non-small-cell lung cancer via integrated bioinformatics analysis. Thorac Cancer. 2020;11(4):851–866. doi: 10.1111/1759-7714.13298
8. Mu R, Liu H, Luo S, et al. Genetic variants of CHEK1, PRIM2 and CDK6 in the mitotic phase-related pathway are associated with nonsmall cell lung cancer survival. Int J Cancer. 2021;149(6):1302–1312. doi: 10.1002/ijc.33702
9. Wu Y, Lyu H, Liu H, Shi X, Song Y, Liu B. Downregulation of the long noncoding RNA GAS5-AS1 contributes to tumor metastasis in non-small cell lung cancer. Sci Rep-uk. 2016;6(1):31093. doi: 10.1038/srep31093
10. Jiang B, Liu J, Zhang YH, et al. Long noncoding RNA LINC00961 inhibits cell invasion and metastasis in human non-small cell lung cancer. Biomed Pharmacother. 2018;97:1311–1318. doi: 10.1016/j.biopha.2017.11.062
11. Chen Y, Min L, Zhang X, et al. Decreased miRNA-148a is associated with lymph node metastasis and poor clinical outcomes and functions as a suppressor of tumor metastasis in non-small cell lung cancer. Oncol Rep. 2013;30(4):1832–1840. doi: 10.3892/or.2013.2611
12. Chen JW, Dhahbi J. Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods. Sci Rep-uk. 2021;11(1):13323. doi: 10.1038/s41598-021-92725-8
13. He R, Zuo S. A Robust 8-Gene Prognostic Signature for Early-Stage Non-small Cell Lung Cancer. Frontiers Oncol. 2019;9:693. doi: 10.3389/fonc.2019.00693
14. Zuo S, Wei M, Zhang H, et al. A robust six-gene prognostic signature for prediction of both disease-free and overall survival in non-small cell lung cancer. J Transl Med. 2019;17(1):152. doi: 10.1186/s12967-019-1899-y
15. Xie H, Xie C. A Six-Gene Signature Predicts Survival of Adenocarcinoma Type of Non-Small-Cell Lung Cancer Patients: A Comprehensive Study Based on Integrated Analysis and Weighted Gene Coexpression Network. Biomed Res Int. 2019;2019:4250613. doi: 10.1155/2019/4250613
16. Lau SK, Boutros PC, Pintilie M, et al. Three-Gene Prognostic Classifier for Early-Stage Non–Small-Cell Lung Cancer. J Clin Oncol. 2007;25(35):5562–5569. doi: 10.1200/jco.2007.12.0352
  • 17.Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2013;41(D1):D991–D995. doi: 10.1093/nar/gks1193 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Pechet TTV, Carr SR, Collins JE, Cohn HE, Farber JL. Arterial invasion predicts early mortality in stage I non-small cell lung cancer. Ann Thorac Surg. 2004;78(5):1748–1753. doi: 10.1016/j.athoracsur.2004.04.061 [DOI] [PubMed] [Google Scholar]
  • 19.Maeda R, Yoshida J, Ishii G, Hishida T, Nishimura M, Nagai K. Prognostic impact of intratumoral vascular invasion in non-small cell lung cancer patients. Thorax. 2010;65(12):1092. doi: 10.1136/thx.2010.141861 [DOI] [PubMed] [Google Scholar]
  • 20.Sholl LM, Aisner DL, Varella-Garcia M, et al. Multi-institutional Oncogenic Driver Mutation Analysis in Lung Adenocarcinoma: The Lung Cancer Mutation Consortium Experience. J Thorac Oncol. 2015;10(5):768–777. doi: 10.1097/jto.0000000000000516 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Steuer CE, Behera M, Berry L, et al. Role of race in oncogenic driver prevalence and outcomes in lung adenocarcinoma: Results from the Lung Cancer Mutation Consortium. Cancer. 2016;122(5):766–772. doi: 10.1002/cncr.29812 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Chen HY, Yu SL, Chen CH, et al. A Five-Gene Signature and Clinical Outcome in Non–Small-Cell Lung Cancer. New Engl J Medicine. 2007;356(1):11–20. doi: 10.1056/nejmoa060096 [DOI] [PubMed] [Google Scholar]
  • 23.Ozaki Y, Muto S, Takagi H, et al. Tumor mutation burden and immunological, genomic, and clinicopathological factors as biomarkers for checkpoint inhibitor treatment of patients with non-small-cell lung cancer. Cancer Immunol Immunother. 2020;69(1):127–134. doi: 10.1007/s00262-019-02446-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Fang L, Seki A, Fang G. SKAP associates with kinetochores and promotes the metaphase-to-anaphase transition. Cell Cycle Georget Tex. 2009;8(17):2819–2827. doi: 10.4161/cc.8.17.9514 [DOI] [PubMed] [Google Scholar]
  • 25.Grabauskiene S, Bergeron EJ, Chen G, et al. Checkpoint kinase 1 protein expression indicates sensitization to therapy by checkpoint kinase 1 inhibition in non–small cell lung cancer. J Surg Res. 2014;187(1):6–13. doi: 10.1016/j.jss.2013.12.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Li Z, Sang M, Tian Z, et al. Identification of key biomarkers and potential molecular mechanisms in lung cancer by bioinformatics analysis. Oncol Lett. 2019;18(5):4429–4440. doi: 10.3892/ol.2019.10796 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Wehler T, Thomas M, Schumann C, et al. A randomized, phase 2 evaluation of the CHK1 inhibitor, LY2603618, administered in combination with pemetrexed and cisplatin in patients with advanced nonsquamous non-small cell lung cancer. Lung Cancer. 2017;108:212–216. doi: 10.1016/j.lungcan.2017.03.001 [DOI] [PubMed] [Google Scholar]
  • 28.Kohnz RA, Mulvihill MM, Chang JW, et al. Activity-Based Protein Profiling of Oncogene-Driven Changes in Metabolism Reveals Broad Dysregulation of PAFAH1B2 and 1B3 in Cancer. Acs Chem Biol. 2015;10(7):1624–1630. doi: 10.1021/acschembio.5b00053 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Wang C, Wang C, Wang C, et al. Importin subunit alpha-2 is identified as a potential biomarker for non-small cell lung cancer by integration of the cancer cell secretome and tissue transcriptome. Int J Cancer. 2011;128(10):2364–2372. doi: 10.1002/ijc.25568 [DOI] [PubMed] [Google Scholar]
  • 30.Kaanane H, Senhaji N, Berradi H, et al. The influence of Interleukin-6, Interleukin-8, Interleukin-10, Interleukin-17, TNF-A, MIF, STAT3 on lung cancer risk in Moroccan population. Cytokine. 2022;151:155806. doi: 10.1016/j.cyto.2022.155806 [DOI] [PubMed] [Google Scholar]
  • 31.Gahr S, Stoehr R, Geissinger E, et al. EGFR mutational status in a large series of Caucasian European NSCLC patients: data from daily practice. Brit J Cancer. 2013;109(7):1821–1828. doi: 10.1038/bjc.2013.511 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Shi Y, Au JSK, Thongprasert S, et al. A Prospective, Molecular Epidemiology Study of EGFR Mutations in Asian Patients with Advanced Non–Small-Cell Lung Cancer of Adenocarcinoma Histology (PIONEER). J Thorac Oncol. 2014;9(2):154–162. doi: 10.1097/jto.0000000000000033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Yu XQ, Yap ML, Cheng ES, et al. Evaluating Prognostic Factors for Sex Differences in Lung Cancer Survival: Findings From a Large Australian Cohort. J Thorac Oncol. 2022;17(5):688–699. doi: 10.1016/j.jtho.2022.01.016 [DOI] [PubMed] [Google Scholar]
  • 34.Sparano JA, Gray RJ, Makower DF, et al. Adjuvant Chemotherapy Guided by a 21-Gene Expression Assay in Breast Cancer. New Engl J Med. 2018;379(2):111–121. doi: 10.1056/nejmoa1804710 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Moschovas MC, Chew C, Bhat S, et al. Association Between Oncotype DX Genomic Prostate Score and Adverse Tumor Pathology After Radical Prostatectomy. European Urology Focus. Published online 2021. doi: 10.1016/j.euf.2021.03.015 [DOI] [PubMed] [Google Scholar]
